Multipart Upload

Intro

Multipart Upload is a way to upload large files to an S3-compatible Object Storage like GDX Cloud by splitting them into smaller parts and uploading each part in parallel.

Uploading multiple pieces simultaneously improves upload speed and provides better reliability and resumability in case of network errors or interruptions. In fact, if the upload of a single part fails, the other parts remain unaffected and the process can resume at any time. The parts are then combined into a single object on the server.

This has several advantages:

  • Upload speed: uploading parts in parallel makes the overall upload faster.
  • Failure recovery: if the connection drops while uploading, the object is still safe. Parts that have already been uploaded are retained, and the upload can resume with the missing parts.
  • Pause and resume: the upload can be stopped and resumed as needed, without restarting the entire upload.

This feature is ideal for uploading large objects, maximizing network throughput, or for uploading files in an unstable network where failures are common.

Multipart Upload step by step

The multipart upload process involves the following steps:

  1. Initiate the upload: Start the upload process by sending a CreateMultipartUpload API request. This returns an upload ID, which is used to identify the upload in subsequent API requests.
  2. Upload parts: Upload parts of the file in parallel by sending UploadPart API requests with the upload ID and a part number. The part number should start at 1 and increment for each part.
  3. Complete the upload: Once all parts are uploaded, send a CompleteMultipartUpload API request with the upload ID and a list of part numbers and their corresponding ETags (a hash of the data).
  4. Verify the upload: Confirm the successful completion of the upload by downloading the entire file using GetObject and comparing it to the original file.

How to stop an upload
In case of failure, it is possible to abort a multipart upload with an AbortMultipartUpload API request, which discards the uploaded parts and frees up the storage they occupy.
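
As a quick sketch (using the same illustrative endpoint, bucket, and key as the CLI examples later on this page), aborting a pending upload looks like this:

aws s3api abort-multipart-upload --endpoint-url https://s3.gdx.datnass.com --bucket my-gdx-cloud-bucket --key dork.log --upload-id <UploadId>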

Usage

Let’s see how to work with this feature using the GDX Cloud Console or AWS s3api CLI commands.

Using the GDX Cloud Console

  1. Sign in to the GDX Cloud Console
  2. Navigate to your bucket
  3. Click the “Upload” button
  4. Select large files to upload
  5. The multipart upload will be handled automatically for files over a certain size

The console provides a user-friendly interface for managing multipart uploads with progress tracking and automatic retry handling.

Using the AWS S3 API

You can use the AWS s3api CLI commands to manually control multipart uploads:

Create a multipart upload

Let’s start by initiating a multipart upload:

aws s3api create-multipart-upload --endpoint-url https://s3.gdx.datnass.com --bucket my-gdx-cloud-bucket --key dork.log

This will print an UploadId in the output; let’s take note of it.
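
The output looks roughly like this (the UploadId value here is a made-up placeholder):

{
    "Bucket": "my-gdx-cloud-bucket",
    "Key": "dork.log",
    "UploadId": "VXBsb2FkSWQtZXhhbXBsZQ"
}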

Did you forget to write down the UploadId?
No worries: if you forget to write down the UploadId, you can perform a ListMultipartUploads request, which shows the ongoing multipart uploads with their IDs.

Show the ongoing multipart uploads

If at any point we need to check which multipart uploads are still ongoing:

aws s3api list-multipart-uploads --endpoint-url https://s3.gdx.datnass.com --bucket my-gdx-cloud-bucket
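
Trimmed to the relevant fields, the response looks roughly like this (values are illustrative):

{
    "Uploads": [
        {
            "UploadId": "VXBsb2FkSWQtZXhhbXBsZQ",
            "Key": "dork.log",
            "Initiated": "2024-01-01T00:00:00+00:00"
        }
    ]
}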

Upload a few parts

Then we can upload a few parts; let’s make two:

aws s3api upload-part --endpoint-url https://s3.gdx.datnass.com --bucket my-gdx-cloud-bucket --key dork.log --upload-id <UploadId> --part-number 1 --body ~/dork-part1.log

aws s3api upload-part --endpoint-url https://s3.gdx.datnass.com --bucket my-gdx-cloud-bucket --key dork.log --upload-id <UploadId> --part-number 2 --body ~/dork-part2.log

Each of these will print an ETag value in the output; let’s take note of them.
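
The output of each command looks roughly like this; note that the inner double quotes are part of the ETag value:

{
    "ETag": "\"<ETag first part>\""
}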

Did you forget to write down the ETags?
No worries: if you forget to write down the ETags, you can perform a ListParts request, which shows the uploaded parts with their part numbers and ETags.
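
In case you are wondering how the part files used above could be produced: a large file can be cut into fixed-size chunks with a tool like GNU split. A minimal sketch (the 8 MiB chunk size respects the 5 MiB minimum part size listed under Limits; the generated names carry zero-padded suffixes, slightly different from the ones used above):

split -b 8M --numeric-suffixes=1 --additional-suffix=.log ~/dork.log ~/dork-part

This produces ~/dork-part01.log, ~/dork-part02.log, and so on.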

Upload a part by copying from another object

When uploading a part, instead of providing a local body we can also copy data from an existing object, by sending an UploadPartCopy request:

aws s3api upload-part-copy --endpoint-url https://s3.gdx.datnass.com --bucket my-gdx-cloud-bucket --key dork.log --upload-id <UploadId> --part-number 1 --copy-source "my-gdx-cloud-bucket/my-source-object"

Alternatively, we can choose to copy only a portion of the source object. For example, we could copy only the first 1024 bytes:

aws s3api upload-part-copy --endpoint-url https://s3.gdx.datnass.com --bucket my-gdx-cloud-bucket --key dork.log --upload-id <UploadId> --part-number 1 --copy-source "my-gdx-cloud-bucket/my-source-object" --copy-source-range bytes=0-1023

Just like an ordinary part upload, you will need to take note of the ETag value printed.
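
Note that, unlike upload-part, the copy variant nests the ETag inside a CopyPartResult object; trimmed, the output looks roughly like this (the date is illustrative):

{
    "CopyPartResult": {
        "ETag": "\"<ETag first part>\"",
        "LastModified": "2024-01-01T00:00:00+00:00"
    }
}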

Possible error during the copy
Some very old objects might be considered ineligible for copying, resulting in an error saying “Invalid source object”. Please contact us if you encounter this case.

Show the uploaded parts

If at any point we need to check which parts have already been uploaded:

aws s3api list-parts --endpoint-url https://s3.gdx.datnass.com --bucket my-gdx-cloud-bucket --key dork.log --upload-id <UploadId>
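
Trimmed to the fields needed to complete the upload, the response looks roughly like this (sizes are illustrative):

{
    "Parts": [
        {
            "PartNumber": 1,
            "ETag": "\"<ETag first part>\"",
            "Size": 8388608
        },
        {
            "PartNumber": 2,
            "ETag": "\"<ETag second part>\"",
            "Size": 8388608
        }
    ]
}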

Complete a multipart upload

Finally, once all the parts have been uploaded, we can create the final object out of them by completing the upload:

aws s3api complete-multipart-upload --endpoint-url https://s3.gdx.datnass.com --bucket my-gdx-cloud-bucket --key dork.log --upload-id <UploadId> --multipart-upload "Parts=[{ETag=<ETag first part>,PartNumber=1},{ETag=<ETag second part>,PartNumber=2}]"
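
To verify the result (step 4 of the process above), we can download the assembled object and compare it byte for byte with the original; the local paths here are illustrative, and cmp prints nothing when the files are identical:

aws s3api get-object --endpoint-url https://s3.gdx.datnass.com --bucket my-gdx-cloud-bucket --key dork.log ~/dork-downloaded.log

cmp ~/dork.log ~/dork-downloaded.log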

Limits

The multipart upload has some size limitations, summarized in the following list:

  • Maximum object size: 5 TiB
  • Maximum number of parts per upload: 10,000
  • Part numbers: 1 to 10,000 (inclusive)
  • Minimum part size: 5 MiB
  • Maximum part size: 5 GiB
  • Maximum number of parts returned in a ListParts response: 1,000
  • Maximum number of uploads returned in a ListMultipartUploads response: 1,000
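
These limits interact: a maximal 5 TiB object must be split into parts of at least 5 TiB / 10,000 ≈ 525 MiB each, while 10,000 parts at the 5 MiB minimum size add up to only about 48.8 GiB.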