Zipping Objects in OCI

David Allan
Mar 2, 2022

Creating a zip archive from many files is a common need everywhere, from laptops to on-premises compute nodes to the cloud. Here we will see how to use an OCI Function to create a zip archive from a set of objects in Object Storage. The function takes an input payload that defines the objects to zip (either a list of object names or a name prefix), the bucket they are in, the bucket to write the zip to, and the zip file name.

The function streams the objects from Object Storage (using tips from useful Python articles like this), creates the zip as a stream (using the Python module stream-zip) and then uploads the parts to Object Storage using the Object Storage multipart upload APIs.
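To make the streaming step more concrete, here is a minimal sketch of how a stream of zip bytes could be regrouped into fixed-size parts before each part is uploaded. It is not the function's actual code: it assumes the zip arrives as an iterator of arbitrarily sized byte chunks, and the names fixed_size_parts, upload_in_parts and the upload_part callback are hypothetical. In a real function the callback would wrap the Object Storage multipart upload call and return the part's ETag.

```python
from typing import Callable, Iterable, Iterator, List, Tuple


def fixed_size_parts(chunks: Iterable[bytes], part_size: int) -> Iterator[bytes]:
    """Regroup arbitrarily sized chunks into parts of part_size bytes.

    Every part is exactly part_size bytes long except the last one,
    which may be shorter. Only about one part is buffered at a time.
    """
    if part_size <= 0:
        raise ValueError("part_size must be positive")
    buffer = bytearray()
    for chunk in chunks:
        buffer.extend(chunk)
        # Emit as many full parts as the buffer currently holds.
        while len(buffer) >= part_size:
            yield bytes(buffer[:part_size])
            del buffer[:part_size]
    if buffer:
        # Whatever is left over becomes the final, shorter part.
        yield bytes(buffer)


def upload_in_parts(
    chunks: Iterable[bytes],
    part_size: int,
    upload_part: Callable[[int, bytes], str],
) -> List[Tuple[int, str]]:
    """Send each part through upload_part (a hypothetical callback).

    Part numbers start at 1. The returned (part number, etag) pairs are
    the kind of information needed to commit a multipart upload.
    """
    receipts = []
    for part_num, part in enumerate(fixed_size_parts(chunks, part_size), start=1):
        etag = upload_part(part_num, part)
        receipts.append((part_num, etag))
    return receipts
```

Because only one part is held in memory at a time, the size of the archive is not limited by the memory available to the function. The part size itself is a tuning choice and is not specified in this post.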

The initial code for this can be found below. It uses the OCI Object Storage APIs to create a multipart upload (see here). If you have a dependency on 32-bit zip processing, use the parameter ZIP_32 with value ZIP_32.

After creating the function, you can test it from the command line by invoking it with all_files set to a comma-separated list of the files.

echo '{
"all_files":"headerdata.csv,linedata.csv",
"source_bucket":"sourcedata",
"target_bucket":"targetdata",
"target_zip_file":"targetdata.zip"}' | fn invoke distools difunctionzipper

The above specifies that the source files headerdata.csv and linedata.csv will be read from the bucket sourcedata and zipped into a new zip archive named targetdata.zip and stored in bucket targetdata.

You can also specify a file prefix (using file_prefix rather than all_files), so all files that match that prefix will be zipped.

echo '{
"file_prefix":"sales",
"source_bucket":"sourcedata",
"target_bucket":"targetdata",
"target_zip_file":"targetdata.zip"}' | fn invoke distools difunctionzipper

The above specifies that all source objects whose names start with the prefix sales will be read from the bucket sourcedata, zipped into a new zip archive named targetdata.zip and stored in the bucket targetdata.
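The two payload shapes shown above can be handled by a small parsing step at the start of the function. The sketch below is one way to do it, not the function's actual code: the key names come from the examples above, but the helper name parse_zip_request, the shape of its return value and the rule that exactly one of all_files or file_prefix must be supplied are assumptions made for illustration.

```python
import json
from typing import Any, Dict

# Keys that both payload shapes in this post have in common.
REQUIRED_KEYS = ("source_bucket", "target_bucket", "target_zip_file")


def parse_zip_request(raw: str) -> Dict[str, Any]:
    """Validate a JSON payload and normalize how the objects are selected.

    Assumption: exactly one of all_files or file_prefix must be present.
    The real function may treat a payload with both (or neither) differently.
    """
    payload = json.loads(raw)

    missing = [key for key in REQUIRED_KEYS if not payload.get(key)]
    if missing:
        raise ValueError("missing required keys: " + ", ".join(missing))

    has_list = bool(payload.get("all_files"))
    has_prefix = bool(payload.get("file_prefix"))
    if has_list == has_prefix:
        raise ValueError("provide exactly one of all_files or file_prefix")

    request: Dict[str, Any] = {key: payload[key] for key in REQUIRED_KEYS}
    if has_list:
        # all_files is a comma-separated string; drop blanks and spaces.
        names = [name.strip() for name in payload["all_files"].split(",")]
        request["object_names"] = [name for name in names if name]
        request["prefix"] = None
    else:
        request["object_names"] = None
        request["prefix"] = payload["file_prefix"]
    return request
```

Failing early on a malformed payload means a bad request never starts a multipart upload that would then need to be cleaned up.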

Now what?

Well, one of the cool things about OCI Functions is that each function has an endpoint you can invoke it from, so you can integrate with any of the tools in your ecosystem, for example OIC (Oracle Integration Cloud) or OCI Data Integration. Here we will see how we can integrate with OCI Data Integration.

From OCI Data Integration you can create a REST Task to execute the function on a schedule, in a pipeline or wherever you like.

The REST Task uses the OCI Function endpoint URL; you can copy this from your function, and make sure the endpoint is reachable from OCI DI. The REST Task has the request payload defined as a parameter, so you can modify the payload when the task is executed.

Now that this function and the DI Task are defined, it's easy to incorporate them into your data pipelines.

The above example illustrates a data pipeline that loads data into object storage and then zips the objects if successful, otherwise pushes a notification.

Above we’ve seen how we can zip objects in Object Storage and integrate with other tools such as OCI Data Integration and Oracle Integration Cloud. For the reverse, here is an example that unzips a zip file and stores a compressed file.

As you can see, this approach is useful for incorporating all kinds of other REST activities. OCI Functions is a great way of adding custom business logic, and functions are automatically exposed with REST endpoints too! On top of that, as you have seen here, one of the most useful tasks in the OCI Data Integration service is the REST Task, which lets us extend data integration to call all kinds of activities. Hope you found this post useful.

David Allan

Architect at @Oracle The views expressed on this blog are my own and do not necessarily reflect the views of Oracle.