As technology advances, so does the need for efficient and reliable data integration. The use of cloud computing has made it easier for organizations to store and manage their data, and Oracle Cloud Infrastructure (OCI) provides a suite of powerful tools for data integration orchestration and scheduling.
If you’re currently using a legacy or custom scheduling solution, it’s time to consider making the switch to OCI. Here are just a few of the benefits that OCI can offer:
- Scalability: OCI provides a scalable infrastructure that can handle data integration of any size. This means that as your organization grows, your data integration capabilities can grow with it.
- Cost-effectiveness: With OCI, you only pay for the resources you use. This means that you can reduce your data integration costs by using the resources you need, when you need them.
- Automation: OCI provides automation capabilities that can significantly reduce the time and effort required for data integration tasks. With automated workflows and scheduling, you can focus on other tasks while your data is being integrated in the background.
- Security: OCI provides robust security features to ensure that your data is protected. With features such as encryption and access controls, you can be confident that your data is safe.
- Integration with other Oracle services: OCI integrates seamlessly with other Oracle services, such as Oracle Database, Oracle Analytics Cloud, and Oracle Autonomous Data Warehouse. This means that you can easily connect your data integration processes with other Oracle services to create a comprehensive data management solution.
In addition to these benefits, OCI provides a user-friendly interface that makes it easy to manage your data integration processes. With OCI, you can create, manage, and schedule your data integration workflows with just a few clicks.
Some common use cases:
- schedule Apache Spark jobs: leverage OCI Dataflow, the serverless Spark offering in OCI (there is a built-in task here, and you can always use REST)
- schedule Jupyter Notebooks: run your notebooks in OCI Data Science (see blog here)
- schedule custom Python/Shell scripts: leverage OCI Container Instances to build a completely cloud-based, serverless solution, or Cloud Agent for customer compute nodes
- schedule custom jobs where you bring your own container: example here that uses rclone
The REST task in OCI Data Integration, combined with serverless services such as:
- OCI Functions (serverless compute service for code that typically runs for short durations)
- OCI Container Instances (serverless compute service that enables you to instantly run containers)
- OCI Dataflow (fully managed Apache Spark service that performs processing tasks on extremely large datasets)
- OCI Data Science (fully managed platform for data scientists to build, train, deploy, and manage machine learning models)
lets you schedule and orchestrate any program, whether it is hosted in OCI, in other clouds, or on-premises.
There is a collection of REST tasks for many of the OCI services, including all of the above, in the post below;
We will look at how to:
- create a schedule
- schedule tasks
- monitor task runs
- integrate with events to manage by exception
- define pipelines for orchestration
Creating a schedule
The example below shows how to create a schedule using a custom Cron expression to run every 30 minutes;
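To make the 30-minute cadence concrete, here is a minimal Python sketch that lists the next fire times for such a schedule. It is not the console definition: the Quartz-style expression `0 0/30 * * * ?` is an assumption about the syntax, the helper names `minute_step` and `next_runs` are hypothetical, and the parser handles only the `start/step` form of the minutes field.

```python
from datetime import datetime, timedelta

# Assumed Quartz-style expression: seconds, minutes, hours, day-of-month,
# month, day-of-week. Check the console for the syntax it actually accepts.
CRON_EVERY_30_MIN = "0 0/30 * * * ?"


def minute_step(expression: str) -> tuple[int, int]:
    """Return (start, step) from the minutes field, e.g. '0/30' -> (0, 30)."""
    minutes_field = expression.split()[1]
    start, step = minutes_field.split("/")
    return int(start), int(step)


def next_runs(expression: str, after: datetime, count: int) -> list[datetime]:
    """List the next `count` fire times strictly after `after`.

    Only the 'start/step' minutes form is supported; every other field is
    treated as a wildcard.
    """
    start, step = minute_step(expression)
    fire_minutes = set(range(start, 60, step))
    # Begin at the first whole minute after `after` and walk forward.
    candidate = after.replace(second=0, microsecond=0) + timedelta(minutes=1)
    runs = []
    while len(runs) < count:
        if candidate.minute in fire_minutes:
            runs.append(candidate)
        candidate += timedelta(minutes=1)
    return runs


if __name__ == "__main__":
    for run in next_runs(CRON_EVERY_30_MIN, datetime(2024, 1, 1, 10, 7, 15), 3):
        print(run.isoformat())
```

Previewing fire times like this can be a quick sanity check before a schedule is attached to a task.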
As well as custom Cron expressions, there are prebuilt frequencies, which are simpler ways of defining the schedule. Schedules can also be created using Python and the CLI; there is a Python example here. Let’s see how to schedule a specific task.
Any task can be scheduled. You can also schedule one task many times and configure the parameters differently for each task schedule. Below we see how a task that invokes OCI Dataflow with my custom Spark application can be scheduled. When the task is scheduled you can define:
- whether to enable the task schedule or not
- the schedule to execute the task on
- when to start the task schedule and when to end
- the estimated time for the task run to complete and the number of retries to attempt in case of failure. When a task run exceeds the expected time, an OCI notification event is generated. This lets you manage by exception.
- parameters for this task schedule — anything you parameterize in a task can be configured here.
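The settings in the list above can be pictured as one small record per task schedule. The sketch below is purely illustrative: `TaskScheduleSketch` and all of its field names are hypothetical and are not types from the OCI SDK. It also shows the expected-duration check that underlies managing by exception.

```python
from dataclasses import dataclass, field
from datetime import datetime, timedelta
from typing import Optional


@dataclass
class TaskScheduleSketch:
    """Hypothetical container for the task schedule settings (not an OCI type)."""

    enabled: bool                   # whether the task schedule is enabled
    schedule_name: str              # the schedule the task runs on
    start: datetime                 # when the task schedule starts
    end: Optional[datetime]         # when it ends (None means open-ended)
    expected_duration: timedelta    # estimated time for a run to complete
    retry_attempts: int             # retries to attempt on failure
    parameters: dict = field(default_factory=dict)  # per-schedule parameter values

    def is_active(self, now: datetime) -> bool:
        """True when the schedule is enabled and `now` is inside its window."""
        if not self.enabled or now < self.start:
            return False
        return self.end is None or now <= self.end

    def exceeds_expected_duration(self, started: datetime, now: datetime) -> bool:
        """True when a run has been going longer than the estimate."""
        return now - started > self.expected_duration


# Example values are made up for illustration.
nightly = TaskScheduleSketch(
    enabled=True,
    schedule_name="every_30_minutes",
    start=datetime(2024, 1, 1),
    end=None,
    expected_duration=timedelta(minutes=20),
    retry_attempts=2,
    parameters={"INPUT_PATH": "placeholder-value"},
)
```

Keeping parameters per task schedule is what lets one task be scheduled many times with different values.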
Below you can see the task schedule created;
Task schedules can also be created using Python and the CLI; there is a Python example here. Let’s see how we monitor tasks when they are running.
Scheduled task runs
When tasks are executed you can view the execution as a child of the task run within the console. From here you can see when the task started, whether it is running, when it finished, the parameter values for that run and the next run date, and you can drill into the logs.
Viewing information in the console is one mechanism to understand what is going on. I mentioned integration with OCI events earlier, for when a task takes longer than you would expect. You can also integrate with events for when tasks are started or when they finish; let’s look at that next.
Integration with Events — Managing by Exception
Events are produced when tasks are started, when tasks complete and when they exceed a specific SLA duration. You can intercept these events and perform actions. For example, with the Rules page in OCI you can define a rule that pushes a notification. Below you can see we have defined the rule condition as a Data Integration event for the event type Execute Task - End, with a further condition so that the rule only triggers when the task status is ERROR.
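The filtering idea behind such a rule (a given event type, plus a status of ERROR) can be sketched in a few lines of Python. This is only an approximation of the concept, not the Events service's own matching logic. The event type placeholder, the `data.additionalDetails.status` location and the `matches` helper are all assumptions made for illustration.

```python
# Placeholder only: not the real OCI event type identifier.
EXECUTE_TASK_END = "execute-task-end"

# Hypothetical rule condition. Lists hold the allowed values for an attribute.
# The nesting under data/additionalDetails is an assumed event layout.
RULE_CONDITION = {
    "eventType": [EXECUTE_TASK_END],
    "data": {"additionalDetails": {"status": ["ERROR"]}},
}


def matches(condition: dict, event: dict) -> bool:
    """Return True when the event satisfies every attribute in the condition.

    Nested dicts are matched recursively; a list means "any of these values".
    """
    for key, expected in condition.items():
        if key not in event:
            return False
        actual = event[key]
        if isinstance(expected, dict):
            if not isinstance(actual, dict) or not matches(expected, actual):
                return False
        elif actual not in expected:
            return False
    return True


# Two made-up events: one failed run and one successful run.
failed_run = {
    "eventType": EXECUTE_TASK_END,
    "data": {"additionalDetails": {"status": "ERROR"}},
}
successful_run = {
    "eventType": EXECUTE_TASK_END,
    "data": {"additionalDetails": {"status": "SUCCESS"}},
}

if __name__ == "__main__":
    print(matches(RULE_CONDITION, failed_run))      # only this one should notify
    print(matches(RULE_CONDITION, successful_run))
```

Filtering on status at the rule level means only failures reach the notification topic, which is the point of managing by exception.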
The next part is to define the action. The action we have chosen below is to push a notification to the topic Run_failures;
Within the notification service we can subscribe to messages being posted to this topic and perform different actions, including being notified by email, PagerDuty or Slack, amongst others;
Here we can see that when email is chosen we get to specify an email address. The subscriber will be sent an email, which they have to acknowledge before the emails can be received.
Let’s see now how we orchestrate more complex scenarios.
Pipelines for Orchestrating
Pipelines are tasks for defining more complex scenarios, such as running multiple tasks concurrently and then running further tasks upon completion. You can define conditional paths and also handle error paths. The example below illustrates many of these: concurrent tasks, serial dependencies, error handling, conditional paths, pushing notifications with tasks and so on;
You can easily store state in a data store such as NoSQL tables and handle more creative wait-on-data scenarios (see this blog here).
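The control flow described in this section (concurrent tasks, a dependent step on completion, and an error path) can be sketched in plain Python. This is an analogy for the pipeline concepts, not how OCI Data Integration implements pipelines. `run_pipeline` and its arguments are hypothetical names, and the tasks are stand-in callables.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Any, Callable, Sequence


def run_pipeline(
    parallel_tasks: Sequence[Callable[[], Any]],
    on_success: Callable[[list], Any],
    on_error: Callable[[Exception], Any],
) -> Any:
    """Run tasks concurrently, then take the success path or the error path.

    The success step runs only after every parallel task has finished
    (a serial dependency); the first failure routes to the error step.
    """
    with ThreadPoolExecutor() as pool:
        futures = [pool.submit(task) for task in parallel_tasks]
        try:
            results = [future.result() for future in futures]
        except Exception as exc:
            # Error path, e.g. push a notification about the failure.
            return on_error(exc)
    return on_success(results)


def failing_task() -> int:
    """Stand-in for a task run that ends in error."""
    raise RuntimeError("task failed")


if __name__ == "__main__":
    print(run_pipeline([lambda: 1, lambda: 2], sum, lambda exc: f"notify: {exc}"))
    print(run_pipeline([lambda: 1, failing_task], sum, lambda exc: f"notify: {exc}"))
```

A conditional path would simply be a branch inside `on_success` that inspects the results before choosing the next step.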
That’s a very quick overview of the different capabilities in OCI Data Integration. As you can see, OCI offers a comprehensive suite of data integration orchestration and scheduling capabilities that can significantly enhance your organization’s data management. If you’re currently using a legacy or custom scheduling solution, it’s time to consider making the switch to OCI. With its scalability, cost-effectiveness, automation, security, and integration capabilities, OCI is the ideal choice for modern data integration needs. Check out the references below!