Copying Images and Movies to OCI Object Storage using Rclone
This solution describes how to use Rclone to migrate or copy data from Amazon S3, Microsoft Azure Blob Storage, and other sources to an OCI Object Storage bucket. You may want to copy images, movies, or any other data from on-premises systems, S3, or Azure Blob Storage so that you can run ML pipelines or data integration activities against it; this approach makes that easy.
You can use this to perform a one-time copy or an ongoing sync of the data. Rclone is a command-line program often described as a Swiss Army knife for moving data across storage technologies, both cloud and non-cloud, including on-premises FTP/SFTP servers. You can then use the OCI Cloud Agent to execute this via REST APIs (see the post here for more information).
To use this, you’ll need an OCI account, an Amazon or Azure account, and data stored in S3 or Azure Blob Storage.
OCI Object Storage is a cloud-based object storage service that helps you store, protect, and retrieve any amount of data. Rclone is an open-source command-line program inspired by common Linux file utilities like cp and rsync. It is used to manage files across many cloud storage and non-cloud storage platforms.
Create a new OCI Object Storage bucket in the appropriate OCI Region, or choose an existing bucket, as the destination for the data you want to migrate. Ensure you have write permissions on the destination bucket. Launch an OCI Linux instance (ensure you can SSH to it). This instance also needs access to the AWS or Azure public API endpoints over the Internet, so its subnet needs at least a NAT Gateway, or an Internet Gateway.
Download and install the Rclone command-line program. For installation instructions, see the Rclone installation documentation. This takes a couple of minutes and is very simple.
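As a rough sketch of the install step on a Linux instance, the function below runs the install script published at rclone.org only when rclone is not already on the PATH, then prints the installed version. The function name ensure_rclone is made up for this example, and piping a script into a root shell is something you may prefer to review first or replace with your distribution's package.

```shell
#!/usr/bin/env bash
# Sketch: install rclone only if it is missing, then show its version.
# ensure_rclone is a hypothetical helper name used for illustration.
ensure_rclone() {
  if command -v rclone >/dev/null 2>&1; then
    echo "rclone already installed"
  else
    # Install script published by the rclone project; review it before running as root.
    curl -fsSL https://rclone.org/install.sh | sudo bash || return 1
  fi
  # Confirms the binary runs and reports which version is installed.
  rclone version
}
```

Call ensure_rclone once after you SSH to the instance.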
Copy the rclone.conf sample file, which defines an s3 remote and an oci remote, and edit it as follows.
- Replace YOUR_S3_ACCESS_KEY_ID with your AWS S3 access key ID and YOUR_S3_SECRET_ACCESS_KEY with your S3 secret access key. Change any other parameters as needed, for example replace us-east-1 with the AWS Region where your S3 bucket is located.
- Replace YOUR_OCI_ACCESS_KEY_ID with your OCI access key ID and YOUR_OCI_SECRET_ACCESS_KEY with your OCI secret access key. Change YOUR_OCI_NAMESPACE to your Object Storage namespace (and update the region that appears in the endpoint URL).
- Change any other parameters as needed, for example replace us-ashburn-1 with the OCI Region where your OCI bucket is located.
Save this file to ~/.config/rclone/rclone.conf on your OCI compute instance.
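For illustration, a configuration with these placeholders could look roughly like the sketch below, written to disk through a shell heredoc. This is an assumption about the file's shape, not the author's exact file: the remote names s3 and oci match the validation commands in this post, the OCI remote is assumed to go through the OCI Amazon S3 Compatibility API endpoint (https://NAMESPACE.compat.objectstorage.REGION.oraclecloud.com) with rclone's generic provider = Other setting, and CONF_PATH is a variable introduced only for this example. For the S3 Compatibility API, the OCI access key and secret are typically a Customer Secret Key generated for your OCI user.

```shell
#!/usr/bin/env bash
# Sketch only: writes a two-remote rclone config. All YOUR_* values are placeholders.
# CONF_PATH is a convenience variable for this example.
# Note: this overwrites any existing file at that path.
CONF_PATH="${CONF_PATH:-$HOME/.config/rclone/rclone.conf}"
mkdir -p "$(dirname "$CONF_PATH")"

cat > "$CONF_PATH" <<'EOF'
[s3]
type = s3
provider = AWS
env_auth = false
access_key_id = YOUR_S3_ACCESS_KEY_ID
secret_access_key = YOUR_S3_SECRET_ACCESS_KEY
region = us-east-1

[oci]
type = s3
provider = Other
env_auth = false
access_key_id = YOUR_OCI_ACCESS_KEY_ID
secret_access_key = YOUR_OCI_SECRET_ACCESS_KEY
region = us-ashburn-1
endpoint = https://YOUR_OCI_NAMESPACE.compat.objectstorage.us-ashburn-1.oraclecloud.com
EOF

# The file holds secrets, so restrict it to the current user.
chmod 600 "$CONF_PATH"
```

Both remotes use type = s3 because OCI Object Storage is reached here through its S3-compatible endpoint rather than a native backend.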
To confirm that Rclone is configured and permissions are working properly, verify that Rclone can parse your configuration file and that objects inside your OCI and S3 buckets are accessible. See the following example validation commands. It’s fun to explore the systems from the command line!
List the configured remotes in the configuration file. This ensures that your configuration file is being parsed correctly. Review the output to make sure that it matches your configuration file.
rclone listremotes
The above should return oci: and s3:, for example, based on my config above. Now, let’s list the S3 buckets in the configured account.
rclone lsd s3:
List the files in the S3 bucket. Replace images in this command with an actual S3 bucket name.
rclone ls s3:images
You can now run the same commands against the oci: remote to list its buckets and objects.
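The validation steps for both remotes can be gathered into one small shell function. This is a sketch under assumptions: check_remotes is a hypothetical helper name, and it assumes the same bucket name exists on both the s3 and oci remotes. Only the rclone subcommands (listremotes, lsd, ls) come from rclone itself.

```shell
#!/usr/bin/env bash
# Sketch: run the validation commands against both remotes in one go.
# check_remotes is a hypothetical helper; pass the bucket name to inspect.
check_remotes() {
  local bucket="$1"
  local remote
  if [ -z "$bucket" ]; then
    echo "usage: check_remotes BUCKET" >&2
    return 2
  fi
  # Expect s3: and oci: in the output if the config file parses correctly.
  rclone listremotes || return 1
  for remote in s3 oci; do
    # List the buckets visible to this remote's credentials.
    rclone lsd "${remote}:" || return 1
    # List the objects inside the named bucket.
    rclone ls "${remote}:${bucket}" || return 1
  done
}
```

For example, check_remotes images stops at the first command that fails, which usually points to the remote whose credentials or endpoint need attention.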
Migrate data using Rclone
This command copies data from the source S3 bucket to the destination OCI bucket.
rclone copy s3:images oci:images
This command synchronizes the destination OCI bucket with the source S3 bucket, changing only the destination so that it matches the source.
rclone sync s3:images oci:images
Note that when you use the sync command, data that isn’t present in the source bucket is deleted from the destination OCI bucket. After the initial copy is complete, run the Rclone sync command for ongoing migration so that only files that are new or changed in the source are copied to the destination OCI bucket.
I hope you found this useful. Check out the Rclone documentation for detailed usage, and OCI Data Integration for orchestrating this on a recurring basis. Send me your comments, thoughts, and ideas; I would love to hear them.
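Because sync can delete objects on the destination, it can be worth previewing the changes first. The sketch below runs the sync with rclone's --dry-run flag, applies it only when explicitly confirmed, and then compares both sides with rclone check. The function name safe_sync and the CONFIRM variable are assumptions introduced for this example, not part of rclone.

```shell
#!/usr/bin/env bash
# Sketch: preview a sync before letting it delete anything on the destination.
# safe_sync and CONFIRM are hypothetical names introduced for this example.
safe_sync() {
  local src="$1" dst="$2"
  if [ -z "$src" ] || [ -z "$dst" ]; then
    echo "usage: safe_sync SOURCE DEST" >&2
    return 2
  fi
  # --dry-run reports what would be copied or deleted without changing anything.
  rclone sync --dry-run "$src" "$dst" || return 1
  if [ "${CONFIRM:-no}" != "yes" ]; then
    echo "dry run only; set CONFIRM=yes to apply"
    return 0
  fi
  # Apply the sync for real, showing transfer progress.
  rclone sync --progress "$src" "$dst" || return 1
  # Compare source and destination after the transfer.
  rclone check "$src" "$dst"
}
```

Run safe_sync s3:images oci:images to preview, then CONFIRM=yes safe_sync s3:images oci:images to apply. The same call could be scheduled for the ongoing sync once you are comfortable with what the dry run reports.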