Multipart upload with the AWS SDK

Last updated: 2024/09/30 10:35:53

Because the largest object that can be uploaded in a single PUT request is 5 GB, a large object is split into multiple parts that are uploaded separately. Splitting the object this way helps preserve data integrity. It is distinct from multi-threaded upload, which sends those parts concurrently to improve speed. Spreading the transfer across multiple threads optimizes the upload of objects with large file sizes.
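To make the splitting concrete, the following is a minimal sketch of how an object of a given size could be divided into numbered parts. The function name `plan_parts` and the 8 MiB part size are illustrative assumptions (8 MiB matches the default `multipart_chunksize` in boto3, but the part size used by your own configuration may differ); the SDK performs this bookkeeping internally.

```python
MiB = 1024 ** 2


def plan_parts(total_size, part_size):
    """Return (part_number, offset, length) tuples that cover total_size bytes."""
    if total_size < 0 or part_size <= 0:
        raise ValueError("total_size must be >= 0 and part_size must be > 0")
    parts = []
    offset = 0
    part_number = 1  # S3 part numbers start at 1
    while offset < total_size:
        # The last part may be shorter than part_size.
        length = min(part_size, total_size - offset)
        parts.append((part_number, offset, length))
        offset += length
        part_number += 1
    return parts


if __name__ == "__main__":
    # A 20 MiB object with 8 MiB parts yields two full parts and one 4 MiB part.
    for part in plan_parts(20 * MiB, 8 * MiB):
        print(part)
```

Each tuple describes one part that can be uploaded independently, which is what allows several threads to send parts at the same time.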

This topic explains how to use the high-level API of the AWS SDK for Python (boto3).

Note: When you use the AWS SDK to upload a large file (multipart upload), the file is split into multiple segments that are uploaded to the CMC S3 system. During the upload, some segments may be uploaded successfully while others fail because of errors such as network problems, an overloaded S3 system, or your application stopping or hanging. The file is then considered to have failed to upload, and the segments that were already uploaded are treated as incomplete segments (garbage segments) that continue to occupy your storage capacity. We recommend that your application proactively delete these garbage segments to optimize the cost and storage capacity of the project you are using.
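One possible way to remove such leftover segments is to list the multipart uploads that were never completed and abort them. The following is a minimal sketch, assuming the CMC S3 endpoint supports the standard S3 `ListMultipartUploads` and `AbortMultipartUpload` operations (this page does not confirm that). The function name `abort_stale_uploads` and the one-day cutoff are illustrative choices, not part of the SDK.

```python
from datetime import datetime, timedelta, timezone


def abort_stale_uploads(client, bucket, older_than=timedelta(days=1)):
    """Abort multipart uploads in `bucket` that were started before the cutoff.

    `client` is expected to be a boto3 S3 client. Returns the (key, upload_id)
    pairs that were aborted.
    """
    cutoff = datetime.now(timezone.utc) - older_than
    aborted = []
    kwargs = {"Bucket": bucket}
    while True:
        page = client.list_multipart_uploads(**kwargs)
        for upload in page.get("Uploads", []):
            # Skip recent uploads, which may still be in progress.
            if upload["Initiated"] <= cutoff:
                client.abort_multipart_upload(
                    Bucket=bucket, Key=upload["Key"], UploadId=upload["UploadId"]
                )
                aborted.append((upload["Key"], upload["UploadId"]))
        if not page.get("IsTruncated"):
            break
        # Continue listing from where the previous page ended.
        kwargs["KeyMarker"] = page["NextKeyMarker"]
        kwargs["UploadIdMarker"] = page["NextUploadIdMarker"]
    return aborted


# Hypothetical usage (endpoint, credentials, and bucket name are placeholders):
#   import boto3
#   client = boto3.client("s3", endpoint_url="xxxxxxxxxxxxxx",
#                         aws_access_key_id="...", aws_secret_access_key="...")
#   print(abort_stale_uploads(client, "my-bucket"))
```

The age cutoff is a design choice: aborting every listed upload would also cancel transfers that are still running, so the sketch only touches uploads older than the chosen interval.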

import argparse
import boto3
import botocore
import os
import base64
import sys
import threading
from boto3.s3.transfer import TransferConfig

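# Endpoint and credentials (placeholders). The script base64-decodes both keys
# before passing them to boto3, so the values here must be base64-encoded.
s3_endpoint_url = "xxxxxxxxxxxxxx"
s3_access_key_id = "xxxxxxxxxxxxxx="
s3_secret_access_key = "xxxxxxxxxxxxxxxxxxxxxxxxxxxxx=="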


def get_files():
    """Prompt for file paths until the user types 'done' and return them as a list."""
    file_path = []
    path = ''
    while path != 'done':
        # ask for file path
        path = input("Please input the path for file you want to upload or type 'done': ")
        if path != 'done':
            file_path.append(path)
    return file_path


def s3_upload_file(args):
    """Upload every file returned by get_files() to the bucket named in args.bucket."""
    file_paths = get_files()
    for path in file_paths:
        try:
            # Configure an S3 Resource
            # Higher level object oriented API
            s3 = boto3.resource(
                's3',
                '',  # region_name (left empty)
                use_ssl=False,
                verify=False,
                endpoint_url=s3_endpoint_url,
                aws_access_key_id=base64.decodebytes(bytes(s3_access_key_id, 'utf-8')).decode('utf-8'),
                aws_secret_access_key=base64.decodebytes(bytes(s3_secret_access_key, 'utf-8')).decode('utf-8'),
            )
            GB = 1024 ** 3
            # Ensure that multipart uploads only happen if the size of a transfer is larger
            # than S3's size limit for non-multipart uploads, which is 5 GB.
            config = TransferConfig(multipart_threshold=5 * GB, max_concurrency=10, use_threads=True)
            s3.meta.client.upload_file(path, args.bucket, os.path.basename(path),
                                       Config=config, Callback=ProgressPercentage(path))
            # The progress callback writes without a trailing newline, so start a new line first.
            print("\nS3 Uploading successful")
        except botocore.exceptions.EndpointConnectionError:
            print("Network Error: Please Check your Internet Connection")


class ProgressPercentage(object):
    """Callback that prints the cumulative upload progress of one file."""

    def __init__(self, filename):
        self._filename = filename
        self._size = float(os.path.getsize(filename))
        self._seen_so_far = 0
        self._lock = threading.Lock()

    def __call__(self, bytes_amount):
        # To simplify we'll assume this is hooked up
        # to a single filename.
        with self._lock:
            self._seen_so_far += bytes_amount
            percentage = (self._seen_so_far / self._size) * 100
            sys.stdout.write(
                "\r%s %s / %s (%.2f%%)" % (
                    self._filename, self._seen_so_far, self._size, percentage))
            sys.stdout.flush()


if __name__ == '__main__':
    parser = argparse.ArgumentParser(description='UPLOAD A FILE TO CEPH')
    parser.add_argument('bucket', metavar='BUCKET_NAME', type=str,
                        help='Enter the name of the bucket to which file has to be uploaded')

    args = parser.parse_args()
    s3_upload_file(args)
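
The script above passes `s3_access_key_id` and `s3_secret_access_key` through `base64.decodebytes` before handing them to boto3, so the placeholder values are expected to hold base64-encoded keys. The following sketch shows one way to produce and check such values. The helper names `encode_credential` and `decode_credential` and the sample key are hypothetical. Note that base64 is an encoding, not encryption, so the encoded keys should be protected like the plain ones.

```python
import base64


def encode_credential(plain):
    """Return the base64 text to store in s3_access_key_id or s3_secret_access_key."""
    return base64.b64encode(plain.encode("utf-8")).decode("ascii")


def decode_credential(encoded):
    """Decode a stored value using the same call chain as the upload script."""
    return base64.decodebytes(bytes(encoded, "utf-8")).decode("utf-8")


if __name__ == "__main__":
    encoded = encode_credential("EXAMPLE-ACCESS-KEY")  # placeholder, not a real key
    print(encoded)
    print(decode_credential(encoded))
```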