Automate loading data to Exadata in OCI using OCI Data Integration

David Allan
5 min read · Aug 10, 2020


Oracle Cloud Infrastructure Data Integration is an Oracle-managed service that provides extract, transform, and load (ETL) capabilities to target data lake and data mart use cases on Oracle Cloud Infrastructure (OCI). The Data Integration service supports loading and transforming data privately to your Exadata instance in OCI, and can load from on-premises databases through the same private path (see Aditya’s blog post, Connecting Exadata DB systems from Oracle Cloud Infrastructure (OCI) Data Integration).

Exadata features a modern cloud-based architecture with scale-out high-performance database servers, scale-out intelligent storage servers with state-of-the-art PCI flash, and an ultra-fast InfiniBand internal fabric that connects all servers and storage. Exadata can be purchased and deployed on premises as the ideal foundation for a private database cloud, or can be acquired using a subscription model and deployed in the Oracle Public Cloud or Cloud at Customer with all infrastructure management performed by Oracle. This illustration uses Exadata deployed in the Oracle Public Cloud.

The diagram below illustrates an example flow: when a notification is pushed into a topic in OCI, a subscriber for that topic kicks off the extract from on-premises and loads the Exadata instance in the Oracle Public Cloud. The emitted message defines the task to execute; it could carry other information, such as filters and anything else that can be parameterized. The subscriber invokes a function that wraps a Data Integration task, and the task loads the data into an OCI Exadata instance.

Custom message with subscription triggering Data Integration Task to load Exadata.

When I created my workspace, I used private connectivity and specified a network with FastConnect to my on-premises network. I can now create a data asset for Exadata in OCI Data Integration. This is done just like other data asset types; similarly, we can create a data asset for our on-premises databases:

Create data asset for Exadata using scan DNS name.

We can use the data loader task, for example, to easily load data (Object Storage, database data, and so on) into the Exadata database. This is a simple, focused task that lets you identify the source and target and define rules to prepare (shape and transform) the data for the copy.

Let’s look at how we execute this task from a custom notification of our choice. We can use Fn to bind the notification message (what task to execute, which workspace, and so on; see the code below, where the workspace, application, and task are extracted from the function’s input) to the task. Let’s create a new Python-based function to invoke the task:

fn init --runtime python execute-task-topic
cd execute-task-topic/

This generates a stub; change the generated func.py to have the following content:

import io
import json
import time

import oci
from fdk import response
from oci.data_integration.data_integration_client import DataIntegrationClient

def handler(ctx, data: io.BytesIO = None):
    # Authenticate to OCI as the function's resource principal
    signer = oci.auth.signers.get_resource_principals_signer()
    resp = do(signer, data)
    return response.Response(
        ctx, response_data=resp,
        headers={"Content-Type": "application/json"})

def do(signer, data):
    dip = DataIntegrationClient(config={}, signer=signer)
    # The notification message identifies the workspace, application, and task
    body = json.loads(data.getvalue())
    wsid = body["workspace"]
    application = body["application"]
    task_key = body["task"]
    # Reference the task to run by its aggregator key
    md = oci.data_integration.models.RegistryMetadata(aggregator_key=task_key)
    # Use the current time as a unique key for this task run
    trkey = str(int(time.time()))
    details = oci.data_integration.models.CreateTaskRunDetails(
        key=trkey, registry_metadata=md)
    tsk = dip.create_task_run(wsid, application, create_task_run_details=details)
    return tsk.data
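For the function to build and run, its dependencies must be declared. Assuming the standard generated layout, the requirements.txt in the function directory would need the Fn FDK and the OCI Python SDK, for example:

```
fdk
oci
```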

Deploy the function to your application:

fn -v deploy --app distools

You can invoke the function from the command line to test it. We have coded the function to use a JSON payload with values for the workspace, application, and task; the payload could contain whatever you want. Here is an example execution:

echo '{"workspace":"ocid1.disworkspace.oc1.iad.workspaceocid", "application":"add-application-key", "task":"add-task-key" }' | fn invoke distools execute-task-topic

This passes the JSON we designed to the standard input of the function; the function is invoked, and the identified task is executed.
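Before deploying at all, the payload handling from the function can be sanity-checked locally. The sketch below mirrors how the handler reads the message body as a BytesIO stream; the OCID and keys are the same placeholders used in the invoke example:

```python
import io
import json

# The same placeholder payload we pass to `fn invoke`
payload = ('{"workspace":"ocid1.disworkspace.oc1.iad.workspaceocid", '
           '"application":"add-application-key", "task":"add-task-key"}')

# Fn delivers the message body as a stream, like the `data` argument of handler()
data = io.BytesIO(payload.encode("utf-8"))
body = json.loads(data.getvalue())

print(body["workspace"])
print(body["application"])
print(body["task"])
```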

Now we get to the fun part! Using the Notifications service, create a topic and a subscription, and use the function deployed above in the subscription. When a message is published to the topic, the function is executed and kicks off the OCI Data Integration task. The Notifications service also has a built-in way of testing this: the “Publish Message” button lets you enter the payload. Below you can see the values passed for the workspace, application, and task:
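Besides the “Publish Message” button, a message can be published programmatically. A minimal sketch using the OCI Python SDK’s Notifications data-plane client is below; the topic OCID and the other values are placeholders for your own resources, and a valid OCI config is assumed:

```python
import json

def publish_task_request(topic_id, workspace, application, task):
    """Publish a task-execution request to an OCI Notifications topic.

    Assumes the OCI Python SDK is installed and a config file exists at
    the default location; topic_id is a placeholder OCID.
    """
    import oci  # requires credentials at runtime

    config = oci.config.from_file()
    client = oci.ons.NotificationDataPlaneClient(config)
    message = oci.ons.models.MessageDetails(
        title="execute-task",
        body=json.dumps({"workspace": workspace,
                         "application": application,
                         "task": task}))
    return client.publish_message(topic_id, message)

# The body that the subscribed function will receive:
body = json.dumps({"workspace": "ocid1.disworkspace.oc1.iad.workspaceocid",
                   "application": "add-application-key",
                   "task": "add-task-key"})
print(body)
```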

The Notifications service triggers all subscriptions on the topic, in this case executing the function, which in turn kicks off the task. Easy!

You might ask: so what’s the big deal? With Data Integration tasks you can parameterize the source, and even the target, so that you can reuse the logic you have designed across all like-shaped data sources and targets.
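Since the message is JSON we designed ourselves, nothing stops us from extending it to carry parameter values for such a reusable task. The "filter" field below is purely hypothetical; the function as written ignores it, and matching handling would need to be added to pass it to the task run:

```python
import json

# Hypothetical extended payload: "filter" is our own addition, illustrating
# how parameter values could travel in the same notification message.
message = {
    "workspace": "ocid1.disworkspace.oc1.iad.workspaceocid",
    "application": "add-application-key",
    "task": "add-task-key",
    "filter": "REGION = 'EMEA'",
}
payload = json.dumps(message)
print(payload)
```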

Here you have seen multiple Oracle Cloud Infrastructure services work together to load data into Exadata in the Oracle Public Cloud through a private network, leveraging Data Integration, Fn, and the Notifications service to automate the load. Call to action? Go try out the Oracle Cloud Infrastructure Data Integration service and use it with other services including Exadata, Functions, Notifications, and much more!

Check out the Get Going with Oracle Cloud Infrastructure Data Integration blog post for a lot of useful information to get started.


Written by David Allan

Architect at @Oracle The views expressed on this blog are my own and do not necessarily reflect the views of Oracle.
