Serverless Data Copy into OCI Object Storage
In this post we will look at using Rclone and the OCI Container Instances service to provide a completely serverless, cloud native solution for migrating and synchronizing data in a multi-cloud environment. You will be able to copy from other clouds, from your on-premises file servers via SFTP, from your Hadoop HDFS filesystems and much more. A few weeks back, native Oracle Cloud Infrastructure (OCI) Object Storage API support was announced in the popular open source Rclone data migration and sync tool (check out the blog here), which was great news. So what are the challenges? There are a few pieces that have to be put together.
There are a few challenges in doing this, such as how to secure the config file for Rclone. This file holds user access keys and secrets for connecting to all kinds of systems: SFTP, AWS S3, Google Drive, the list goes on. So where do we normally store sensitive secrets in OCI? The OCI Vault, of course. That’s the perfect place to store the Rclone config file, and we can store it as is. Next challenge? Before we can call Rclone we must retrieve the config file from the vault and write it to the Rclone config file inside the running container. One option would be to use volumes, but that is not an option for today; instead this can be done easily with a little Python and the OCI Python SDK. How can we copy from on-premises or other cloud providers? The OCI Container Instance can be created in a specific subnet, so you control how much or how little network connectivity the instance has. We will wrap this all up in a Docker image and execute it using OCI Container Instances. Everything is defined within this blog for you to build a container image that allows you to run any Rclone command.
The Architecture
This architecture represents a use case for leveraging the OCI Container Instances service to perform the data copy/sync and OCI Data Integration to orchestrate the copy. It also addresses the challenge of securing the connectivity details using OCI Vault. The following diagram illustrates this architecture.
Let’s see how this is actually done.
How is this done — the code
The Dockerfile and supporting scripts are below. You will need to build the image and push it to OCI once. The Rclone config file is stored in OCI Vault; when you execute the Container Instance you specify the environment variable SECRET_ID with the secret OCID as its value. The command to execute is passed in the entrypoint arguments as a comma-separated list of tokens; we shall see some examples later.
The Dockerfile extends the OCI CLI Docker image, which already contains the Python SDK and the CLI. It installs the latest Rclone and overrides the entrypoint with a script that fetches the config file and then executes Rclone.
Dockerfile.rclone
FROM ghcr.io/oracle/oci-cli:latest
USER root
RUN yum install -y unzip
RUN curl https://rclone.org/install.sh | bash
USER oracle
# this is the bootstrap driver
ADD bootstrap.sh /etc/bootstrap.sh
# The script_rclone.py file, gets the secret from OCI Secret Vault for Rclone
# This secret is the Rclone config file
ADD script_rclone.py /tmp/script_rclone.py
ENTRYPOINT [ "/etc/bootstrap.sh" ]
CMD [ "--help" ]
The bootstrap.sh script invokes the Python script to get the Rclone config file from the OCI Vault and then calls Rclone, passing along all of the entrypoint arguments, so the container can run any Rclone command;
bootstrap.sh
#!/bin/sh
# Execute the python script to generate the Rclone config file, retrieves from OCI Vault and stores
python3 /tmp/script_rclone.py
# Execute Rclone using the entrypoint arguments
rclone --config="/tmp/rclone.conf" "$@"
The Python script is below. It reads the Rclone config from the secret and saves it inside the container; unless the LOCAL environment variable is set (for local testing), it uses resource principal authentication;
script_rclone.py
import io
import os
import oci
import base64
# Reads the Rclone config file from a secret in OCI Vault.
def read_secret_value(secret_client, secret_id):
    response = secret_client.get_secret_bundle(secret_id)
    base64_secret_content = response.data.secret_bundle_content.content
    base64_secret_bytes = base64_secret_content.encode('ascii')
    base64_message_bytes = base64.b64decode(base64_secret_bytes)
    secret_content = base64_message_bytes.decode('ascii')
    return secret_content

secret_id = os.environ.get('SECRET_ID')
local = os.environ.get('LOCAL')
secret_client = None
if local is not None:
    # Local testing: user principal authentication from ~/.oci/config
    config = oci.config.from_file()
    secret_client = oci.secrets.SecretsClient(config=config)
else:
    # Running in OCI: resource principal authentication
    signer = oci.auth.signers.get_resource_principals_signer()
    secret_client = oci.secrets.SecretsClient(config={}, signer=signer)

# Write the config file where bootstrap.sh expects it
secret_contents = read_secret_value(secret_client, secret_id)
with open("/tmp/rclone.conf", "w") as config_file:
    config_file.write(secret_contents)
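As a quick sanity check outside the container, the script can also be run on its own against a local ~/.oci/config. This is only a sketch: the SECRET_ID value below is a placeholder, and LOCAL=true switches the script to user principal authentication as shown in the code above (the output path /tmp/rclone.conf is hard-coded in the script);
# Assumes the OCI Python SDK is installed and ~/.oci/config is configured
export SECRET_ID=ocid1.vaultsecret.oc1.iad.redacted
export LOCAL=true
python3 script_rclone.py
cat /tmp/rclone.conf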
Building the Container
Build the container image using the command below. You will need to log in to the OCI Container Registry in order to push the image later; use your own region, namespace and image names;
docker build -t iad.ocir.io/yournamespace/rclonetool -f Dockerfile.rclone .
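Logging in to the registry mentioned above looks something like the sketch below for the iad (Ashburn) region key used in this post. The namespace and user name are placeholders; the password is an OCI auth token rather than your console password, and federated users may need an identity provider prefix in the user name;
# Log in to OCI Container Registry; substitute your own namespace and user
docker login iad.ocir.io -u 'yournamespace/your.user@example.com'
# When prompted for a password, paste an OCI auth token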
Testing the Container Locally
There are a lot of moving parts here, so it's essential to be able to test this locally before we move on to testing in OCI. You can test locally by using user principal authentication to the OCI Vault in order to get the config file for Rclone; make sure the Rclone config itself also uses the appropriate authentication. You will need your .oci/config file set up locally. The following uses the default profile and prints the Rclone help;
docker run --rm --mount type=bind,source=$HOME/.oci,target=/oracle/.oci iad.ocir.io/yournamespace/rclonetool --help
Rather than just printing the help, you can set up your OCI Vault with the config file and pass the secret OCID to perform a real command. Create a secret in OCI Vault containing a config that uses user principal authentication, for example;
rclone_local.config
[oos]
type = oracleobjectstorage
namespace = redacted
compartment = ocid1.compartment.oc1..redacted
region = us-ashburn-1
provider = user_principal_auth
access_key_id = redacted
secret_access_key = redacted
[s3]
type = s3
provider = AWS
env_auth = false
access_key_id = redacted
secret_access_key = redacted
region = us-east-1
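As a sketch of how this config file can be stored as a secret from the command line, the OCI CLI can create a base64-encoded secret. The vault, key and compartment OCIDs and the secret name below are placeholders, and you should confirm the parameter names with oci vault secret create-base64 --help for your CLI version;
# Store the Rclone config file as a base64-encoded secret in OCI Vault
oci vault secret create-base64 \
  --compartment-id ocid1.compartment.oc1..redacted \
  --vault-id ocid1.vault.oc1.iad.redacted \
  --key-id ocid1.key.oc1.iad.redacted \
  --secret-name rclone-config \
  --secret-content-content "$(base64 -w 0 rclone_local.config)"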
Then run the container locally to test, using the secret OCID and user principal based authentication to OCI;
docker run -e SECRET_ID=ocid1.vaultsecret.oc1.iad.redacted -e LOCAL=true --rm --mount type=bind,source=$HOME/.oci,target=/oracle/.oci iad.ocir.io/yournamespace/rclonetool copy s3://sourcedatabucket2022/taxi1.3Gb.snappy.parquet oos://targetdatabucket2022
Serverless execution in OCI
Similar to running locally we will use OCI Vault to store the secrets, OCI Container Registry for holding the container images and OCI Container Instance service for running the containers in a truly serverless manner.
When running the container in the cloud, rather than using user principal as we did when testing locally, we run without the LOCAL=true variable; by default the code then uses resource principal for OCI authentication. Here is a sample config for copying from S3 to OCI Object Storage using resource principal based authentication (you will need to add policy statements to grant the container instance the permissions it needs; see the sketch after the config below);
rclone.config
[oos]
type = oracleobjectstorage
namespace = redacted
compartment = ocid1.compartment.oc1..redacted
region = us-ashburn-1
provider = resource_principal_auth
[s3]
type = s3
provider = AWS
env_auth = false
access_key_id = redacted
secret_access_key = redacted
region = us-east-1
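The policy setup is not covered in detail here, but as a rough sketch it involves a dynamic group that matches the container instance and a policy granting it access to the secret and the target bucket. The group and compartment names are placeholders, and the resource.type value shown is an assumption you should verify against the Container Instances documentation;
# Dynamic group matching rule (example, verify the resource.type value)
ALL {resource.type = 'computecontainerinstance', resource.compartment.id = 'ocid1.compartment.oc1..redacted'}
# Policy statements (example)
Allow dynamic-group rclone-ci-dg to read secret-family in compartment mycompartment
Allow dynamic-group rclone-ci-dg to manage object-family in compartment mycompartment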
To run in OCI Container Instances, for this example we push the image to OCI Container Registry (other external registries are also supported);
docker push iad.ocir.io/yournamespace/rclonetool
We can then use OCI Container Instances to execute the image; choose your shape and network;
Then select your container image to use, specify the environment variable SECRET_ID with the OCID of the secret containing the config information.
Then the fun part: specifying the Rclone command to run, here copying from S3 to OCI. The entrypoint arguments, as documented in the console, need to be a comma-separated list of tokens, in the form;
- copy,<source>,<target>
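For the S3 to OCI Object Storage copy used in the local test earlier, the entrypoint arguments would look something like the following (the bucket and object names are the placeholders from the example config);
copy,s3://sourcedatabucket2022/taxi1.3Gb.snappy.parquet,oos://targetdatabucket2022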
When the Container Instance is created you will see the containers being created and running;
To debug and diagnose, you can navigate to the specific container within the Container Instance and use “View Logs” to see the output. This is great for understanding what happened when things go wrong: a wrong secret OCID, missing permissions for it, connectivity issues and so on.
Orchestrating in OCI Data Integration
The data copy, and any other container, can be orchestrated and scheduled from within OCI Data Integration. Use a REST task to execute the container, passing the arguments and environment variables. Here is a snippet of the REST task;
You can get this task from the Postman collection here. It’s fully parameterized, so it can be used not only for executing this data copy example but for any other container as well. Below you can see the task being executed, with the container specified along with the other arguments;
You can schedule this task to run on a recurring basis, or add it to a data pipeline to perform the copy before other tasks, for example.
How does this compare to OCI Data Integration’s Data Loader? Let’s compare some of the supported sources below;
That’s a high level summary of the capabilities; see the documentation links in the conclusion for more detailed information.
Conclusion
As you can see, once you containerize your work it becomes very easy to use and manage. Here we leveraged many OCI services, along with open source components, to build the solution. Get started by trying this out; for more information on the concepts discussed in this post, see the following resources:
- Rclone.org
- Download version 1.60.0 with native OCI support
- Rclone documentation for Oracle Cloud Infrastructure Object Storage
- OCI Container Instances
- OCI Container Instances FAQ
- OCI Container Instances documentation
- OCI CLI Docker
- OCI Data Integration Data Loader
We hope that this blog helps as you learn more about Oracle Cloud Infrastructure!