Executing Tasks using Java in Oracle Cloud Infrastructure Data Integration
Here we will use the Oracle Cloud Infrastructure Data Integration Java SDK to execute a task that has been published to an application. To execute a task you’ll need the OCID of the workspace, the application key and the task key. Everything else can be defaulted and overridden when needed — I’ll demonstrate both cases here. Tasks can be any of the supported task types in OCI Data Integration — this includes data loader tasks, integration tasks (based on a dataflow), pipeline tasks and so on. In this post you will see how to pass a CSV file, a JSON file and a database table into an integration task. Calling this API has the same effect as invoking “Run” on a task within an application in the OCI Console;
Tasks can have parameters for overriding the data asset, connection, schema or entity of any source or target operator. Tasks can also have parameters for filter conditions and join conditions. Below you can see what happens in the console when you run a task with parameters: the console prompts you to confirm whether you wish to override the defaults;
In the code example below we will see how to execute a task with its default values, and then how to execute it while passing a different object name to be used during execution.
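The default-values case can be sketched as follows. This is a minimal sketch assuming the OCI Java SDK for Data Integration is on the classpath and that your credentials are in the default OCI config file; the workspace OCID, application key and task key shown are placeholders you would replace with your own values.

```java
// Sketch: run a published task with its default parameter values,
// using the OCI Data Integration Java SDK. The task to run is
// identified by its key via the registry metadata's aggregator key.
import com.oracle.bmc.ConfigFileReader;
import com.oracle.bmc.auth.ConfigFileAuthenticationDetailsProvider;
import com.oracle.bmc.dataintegration.DataIntegrationClient;
import com.oracle.bmc.dataintegration.model.CreateTaskRunDetails;
import com.oracle.bmc.dataintegration.model.RegistryMetadata;
import com.oracle.bmc.dataintegration.requests.CreateTaskRunRequest;
import com.oracle.bmc.dataintegration.responses.CreateTaskRunResponse;

public class RunTask {
    public static void main(String[] args) throws Exception {
        // Placeholder identifiers — substitute your own.
        final String workspaceId = "ocid1.disworkspace.oc1..exampleworkspace";
        final String applicationKey = "your-application-key";
        final String taskKey = "your-task-key";

        ConfigFileAuthenticationDetailsProvider provider =
                new ConfigFileAuthenticationDetailsProvider(
                        ConfigFileReader.parseDefault());
        DataIntegrationClient client = DataIntegrationClient.builder()
                .build(provider);

        // No configProvider is set here, so the task's default
        // parameter values are used for the run.
        CreateTaskRunDetails details = CreateTaskRunDetails.builder()
                .registryMetadata(RegistryMetadata.builder()
                        .aggregatorKey(taskKey)
                        .build())
                .build();

        CreateTaskRunResponse response = client.createTaskRun(
                CreateTaskRunRequest.builder()
                        .workspaceId(workspaceId)
                        .applicationKey(applicationKey)
                        .createTaskRunDetails(details)
                        .build());
        System.out.println("Task run key: " + response.getTaskRun().getKey());
        client.close();
    }
}
```

To override parameters rather than take the defaults, the same request additionally carries a config provider (described next), which binds new values to the task's parameters.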
When you want to override parameters, pass the values in the config provider property. This is JSON that has a binding for each parameter. If it’s a database table you can simply pass in the table name; if it’s a file you’ll need to pass in the data format of the data. For example, below you can see dataFormat: the format attribute passes in a model type of CSV_FORMAT, the encoding (UTF-8), the delimiter (,), the quote character (“), the timestamp format (yyyy-MM-dd HH:mm:ss.SSS) and the escape character (\);
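Putting those attributes together, a CSV dataFormat fragment could look something like the following. This is indicative only — the property names follow my reading of the SDK's DataFormat/CsvFormatAttribute model, so verify them against the SDK you are using:

```json
"dataFormat": {
  "type": "CSV",
  "formatAttribute": {
    "modelType": "CSV_FORMAT",
    "encoding": "UTF-8",
    "delimiter": ",",
    "quoteCharacter": "\"",
    "timestampFormat": "yyyy-MM-dd HH:mm:ss.SSS",
    "escapeCharacter": "\\"
  }
}
```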
If it’s JSON, the data format is much simpler: you just need the model type of JSON_FORMAT and the encoding of UTF-8 (for example);
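Again as an indicative sketch, the JSON case reduces to just the two attributes mentioned:

```json
"dataFormat": {
  "type": "JSON",
  "formatAttribute": {
    "modelType": "JSON_FORMAT",
    "encoding": "UTF-8"
  }
}
```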
If it’s a database table, it’s even simpler;
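For a table there is no data format at all — the binding is just the entity key (the connectionKey, schemaName and tableName below are placeholders):

```json
"entity": {
  "key": "dataref:connectionKey/schemaName/TABLE_ENTITY:tableName"
}
```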
In each of these you can see the key has a special format;
dataref:connectionKey/bucketName/FILE_ENTITY:fileName
or
dataref:connectionKey/schemaName/TABLE_ENTITY:tableName
Below you can see an example where the file and its format are passed in as parameters to the execution. The parameter values are passed into the execution via a local class with properties including the entity and the data format, plus a modelType property indicating it is an ENRICHED_ENTITY.
Here is the complete JSON example;
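The sketch below shows how such a complete config provider payload could be assembled, with a hypothetical parameter name INPUT_FILE and placeholder connection/bucket keys; the bindings map keys are your task's parameter names, and each value wraps the ENRICHED_ENTITY described above. Treat the exact property names as assumptions to check against the SDK model classes:

```json
{
  "bindings": {
    "INPUT_FILE": {
      "rootObjectValue": {
        "modelType": "ENRICHED_ENTITY",
        "entity": {
          "key": "dataref:connectionKey/bucketName/FILE_ENTITY:fileName"
        },
        "dataFormat": {
          "type": "CSV",
          "formatAttribute": {
            "modelType": "CSV_FORMAT",
            "encoding": "UTF-8",
            "delimiter": ",",
            "quoteCharacter": "\"",
            "timestampFormat": "yyyy-MM-dd HH:mm:ss.SSS",
            "escapeCharacter": "\\"
          }
        }
      }
    }
  }
}
```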
You can then see the task running in the OCI Console;
If you click on a task run from within the OCI Console, you can see more details of the run, including metrics on throughput and duration.
Data Integration also supports processing directories of objects in a bucket, based on the naming convention for the objects — so if the objects are named ‘20200707/meterx10934.csv’, ‘20200707/meterx10935.csv’ and so on, then using ‘20200707/’ as the file entity name when executing as above will process all objects with that prefix, i.e. in that logical directory.
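In terms of the entity key format shown earlier, the logical-directory case simply uses the prefix in place of a single file name (connectionKey and bucketName again being placeholders):

```json
"entity": {
  "key": "dataref:connectionKey/bucketName/FILE_ENTITY:20200707/"
}
```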
Check out more on the Oracle Cloud Infrastructure Data Integration service.
https://docs.oracle.com/en-us/iaas/data-integration/home.htm