Automating Task Termination on SLA Miss and Best Practices
In this post we will see how to leverage OCI Events to terminate a long-running task that has missed an SLA. Within task schedules and on operators in a pipeline you can configure the expected duration, and OCI Data Integration will raise an event when it is breached.
Limiting concurrent executions
You can configure a task schedule so that a new instance is not created while an existing one from that schedule is still active. This is set in the task schedule editor, and it is also configurable within the task itself.
Defining the SLA — Expected time to complete
You can configure the expected duration for any task and also for operators in a pipeline. When the expected duration has passed and the task is still executing, the service raises an OCI Event. The event payload includes the workspace ID, application key and task run key.
Here we see the expected duration in the task schedule:
Within a pipeline, you can also define the expected time to complete on each operator, as well as conditional paths on the task (the task could be a pipeline, integration task, Spark task, etc.).
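To make the payload description above concrete, here is a minimal Python sketch that pulls the three identifiers out of an event. The outer envelope (`eventType`, `data`, `additionalDetails`) follows the usual OCI Events layout, but the key names `workspaceId`, `applicationKey` and `taskRunKey`, the event type placeholder and all sample values are assumptions made for illustration; check a real event from your tenancy for the exact names.

```python
import json
from typing import NamedTuple


class TaskRunRef(NamedTuple):
    """The three identifiers needed to address one task run."""
    workspace_id: str
    application_key: str
    task_run_key: str


# Made-up, trimmed event. The key names inside additionalDetails are
# hypothetical; inspect a real event to confirm them.
SAMPLE_EVENT = json.dumps({
    "eventType": "<event-type-for-exceeded-expected-duration>",
    "data": {
        "compartmentId": "ocid1.compartment.oc1..example",
        "additionalDetails": {
            "workspaceId": "ocid1.disworkspace.oc1..example",
            "applicationKey": "example-app-key",
            "taskRunKey": "example-task-run-key",
        },
    },
})


def task_run_ref(raw_event: str) -> TaskRunRef:
    """Extract the task run identifiers from a raw JSON event."""
    details = json.loads(raw_event)["data"]["additionalDetails"]
    return TaskRunRef(
        workspace_id=details["workspaceId"],
        application_key=details["applicationKey"],
        task_run_key=details["taskRunKey"],
    )


if __name__ == "__main__":
    print(task_run_ref(SAMPLE_EVENT))
```

A missing key raises `KeyError`, which is probably what you want in a handler: a malformed event should fail loudly rather than terminate the wrong run.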
Action on the Event — Expected Duration
You can then create a rule in OCI Events to perform some action when a task exceeds its expected duration. In the rule you can add further conditions on attributes, for example to match only tasks in a specific compartment, workspace or application.
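Behind the rule editor, the condition is a small JSON document. The sketch below builds one in Python, assuming the general filter layout used by OCI Events rules (an `eventType` list plus optional filters under `data`). The event type string is a placeholder and the `workspaceId` attribute name is hypothetical; copy the real values from the rule editor or from a sample event.

```python
import json
from typing import Optional

# Placeholder: copy the real event type string from the rule editor.
EXPECTED_DURATION_EVENT_TYPE = "<event-type-for-exceeded-expected-duration>"


def build_rule_condition(event_type: str,
                         compartment_id: Optional[str] = None,
                         additional_details: Optional[dict] = None) -> str:
    """Return a rule condition as a JSON string.

    Every filter is optional except the event type; each value is wrapped
    in a list so that more alternatives can be added later.
    """
    condition = {"eventType": [event_type]}
    data = {}
    if compartment_id:
        data["compartmentId"] = [compartment_id]
    if additional_details:
        # Attribute names here are assumptions, e.g. "workspaceId".
        data["additionalDetails"] = {
            name: [value] for name, value in additional_details.items()
        }
    if data:
        condition["data"] = data
    return json.dumps(condition, indent=2)


if __name__ == "__main__":
    print(build_rule_condition(
        EXPECTED_DURATION_EVENT_TYPE,
        compartment_id="ocid1.compartment.oc1..example",
        additional_details={"workspaceId": "ocid1.disworkspace.oc1..example"},
    ))
```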
The action can be any of:
- Notification: subscribe to a notification topic and, for example, automatically get an email or a Slack message, and more.
- Function: you can code whatever you want in any of the supported function languages. It's this that we use below to terminate the task in OCI Data Integration.
- Streaming: publish the event to a stream, then consume it.
Here we see that the rule condition matches on Exceeded Expected Duration and the action calls an OCI Function named terminate-taskrun. The code for that is further below.
This needs a little knowledge of OCI Functions, but it's pretty easy to set up and it's a really useful mechanism for doing all kinds of work. The gist below includes everything you need to create and deploy the function; you should go through the OCI Functions tutorial to get the hang of that.
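As a rough illustration of the shape such a function can take (a sketch under assumptions, not the gist itself), the Python below reads the task run and then asks the service to move it to `TERMINATING`. It assumes the OCI Python SDK's `DataIntegrationClient` with `get_task_run` and `update_task_run`, the `UpdateTaskRunDetails` model, resource principal authentication, and the Python FDK entry point. The `additionalDetails` key names are hypothetical, and passing the run's current object version in the update is an assumption. The `details_factory` parameter is not part of any SDK; it is there only so the core logic can be exercised without the SDK installed.

```python
import io
import json


def terminate_task_run(client, workspace_id, application_key, task_run_key,
                       details_factory=None):
    """Request termination of one task run and return the service's reply.

    `client` is expected to behave like
    oci.data_integration.DataIntegrationClient.
    """
    if details_factory is None:
        # Imported lazily so this module loads even where the SDK is absent.
        import oci
        details_factory = oci.data_integration.models.UpdateTaskRunDetails

    # Read the run first so the update can carry its key and object version
    # (assumed here to be what the update call expects).
    run = client.get_task_run(workspace_id, application_key, task_run_key).data
    details = details_factory(
        key=run.key,
        object_version=run.object_version,
        status="TERMINATING",
    )
    return client.update_task_run(
        workspace_id, application_key, task_run_key, details
    ).data


def handler(ctx, data: io.BytesIO = None):
    """Entry point in the shape the Python FDK expects."""
    import oci
    from fdk import response

    event = json.loads(data.getvalue())
    # Key names below are hypothetical; confirm them against a real event.
    ids = event["data"]["additionalDetails"]

    signer = oci.auth.signers.get_resource_principals_signer()
    client = oci.data_integration.DataIntegrationClient(config={}, signer=signer)
    run = terminate_task_run(
        client, ids["workspaceId"], ids["applicationKey"], ids["taskRunKey"]
    )
    return response.Response(
        ctx,
        response_data=json.dumps({"taskRunKey": ids["taskRunKey"],
                                  "status": run.status}),
        headers={"Content-Type": "application/json"},
    )
```

For the resource principal signer to work, the function needs a dynamic group and a policy that lets it manage task runs in the workspace; the exact policy statements depend on your tenancy setup.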
Summary
In this post we have seen how to leverage OCI Events to terminate a long-running task in OCI Data Integration that has missed an SLA. I hope this is useful; let me know what you think.