Incremental Extract using OCI Data Integration

  • get privileges in the database to call OCI; this is best done using a resource principal, see below (needs an OCI administrator and a database admin to set up). It can also be done using an OCI user principal by creating a credential in the ADW with your OCI user, for which you will need the user OCID, private key, fingerprint and so on (see the credential sketch after this list)
  • get database permissions to access the PL/SQL OCI packages; the grants are defined in the gist (needs a database admin)
  • define the package below in your database schema to perform the loads; this package is supplied in the gist
  • design a dataflow and integration task to extract and load, then publish it to an application (you define this)
  • populate the metadata configuration table for the solution with the tasks and source/target pairs (you define this)
  • execute the procedure! (you define this)

The dataflow must expose the following fixed set of parameters, which the driver procedure sets at runtime:

  • P_EXTRACT_SCHEMA — the source schema (parameterize the source schema in the source operator)
  • P_EXTRACT_OBJECT — the source entity (parameterize the source entity in the source operator)
  • P_EXTRACT_DATE — a timestamp representing the date to extract from (parameterize the extract date — on the source operator for a BICC source, or in a filter condition or custom SQL for a database)
  • P_TARGET_NAME — the target table name; add a scalar VARCHAR parameter and set the target entity value to ${P_TARGET_NAME}
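
For the resource principal route, the ADMIN user enables it with DBMS_CLOUD_ADMIN.ENABLE_RESOURCE_PRINCIPAL and grants it to your schema. If you use a user principal instead, the credential is created in the ADW with DBMS_CLOUD.CREATE_CREDENTIAL; a minimal sketch, where all values shown are placeholders you must replace with your own:

begin
  -- Placeholder values: supply your own user OCID, tenancy OCID,
  -- private key contents and key fingerprint.
  DBMS_CLOUD.CREATE_CREDENTIAL(
    credential_name => 'MY_OCI_CREDENTIAL',
    user_ocid       => 'ocid1.user.oc1..your_user_ocid',
    tenancy_ocid    => 'ocid1.tenancy.oc1..your_tenancy_ocid',
    private_key     => 'your_private_key_contents',
    fingerprint     => 'your_key_fingerprint');
end;
/

The credential name ('MY_OCI_CREDENTIAL' here) is then passed in place of 'OCI$RESOURCE_PRINCIPAL' when executing the driver procedure.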

Database as Source

For a database as source, the source schema and entity are parameterized in the dataflow using P_EXTRACT_SCHEMA and P_EXTRACT_OBJECT respectively, and a filter operator uses P_EXTRACT_DATE to restrict the rows extracted; a sketch of such a filter condition is shown below.
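
For illustration, the filter operator's condition might reference the extract date parameter like this (a sketch: FILTER_1 and the SRCTAB1.LOAD_DATE names are placeholders for your own operator, entity and incremental column, and this assumes parameters are referenced with the $ prefix in the expression editor):

FILTER_1.SRCTAB1.LOAD_DATE >= $P_EXTRACT_DATE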

Fusion as Source

If this were a Fusion source, no filter operator is needed: the Fusion source operator has a property for incremental extract and an extract date property. Ensure you parameterize the extract date with a parameter named P_EXTRACT_DATE of datatype TIMESTAMP; the source schema and entity are parameterized using P_EXTRACT_SCHEMA and P_EXTRACT_OBJECT respectively.

Object Storage as Target

Your dataflow design can target whatever you desire. The examples above targeted databases, but it's also possible to target Object Storage. When the driver is executed you can pass the connection key for your OCI Object Storage data asset and set the bucket name and object name in the driver table. This may need extending if you also want to include path information such as the year/month/day, for example.
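
For example, a driver table row for an Object Storage target might look like the sketch below, assuming the target schema and table columns are reused to carry the bucket name and object name as described above (connection key, bucket and object names are placeholders):

insert into adrivertable values(
'YourApplicationName',
'YourTaskName',
'your_source_connection_key',
'YOURSRCSCHEMA',
'SRCTAB1',
'your_object_storage_connection_key',
'your-bucket-name',
'srctab1_extract',
NULL, NULL, NULL);
commit;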

Sample Driver procedure

This is an example PL/SQL package with a procedure based on the fixed template of parameters described above.
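
The full package body is supplied in the gist; a minimal sketch of the specification implied by the calls later in this post is shown here (the parameter names are illustrative, the actual signature is in the gist):

create or replace package INCREMENTAL_UTILS as
  -- Executes the published tasks registered in the driver table for the
  -- given application, passing the fixed template of parameters above.
  procedure execute_tasks(
    p_credential  in varchar2,   -- OCI$RESOURCE_PRINCIPAL or a named credential
    p_region      in varchar2,   -- OCI region identifier, e.g. us-ashburn-1
    p_workspace   in varchar2,   -- OCI Data Integration workspace OCID
    p_application in varchar2,   -- published application name
    p_task        in varchar2,   -- task name
    p_mode        in varchar2);  -- FULL or INCREMENTAL
end INCREMENTAL_UTILS;
/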

Sample Data

The example below uses two simple tables to represent our source; they have different numbers and names of columns but share a common LOAD_DATE column, which we will use to incrementally extract from them. The example dataflow I used had a database as source and a database as target.

create table srctab1 (
  C1 VARCHAR2(10),
  C2 VARCHAR2(10),
  LOAD_DATE TIMESTAMP(6) WITH TIME ZONE);

create table srctab2 (
  X VARCHAR2(10),
  Y VARCHAR2(10),
  Z VARCHAR2(10),
  LOAD_DATE TIMESTAMP(6) WITH TIME ZONE);

insert into srctab1
SELECT 'a'||LEVEL, 'a'||LEVEL, CURRENT_TIMESTAMP
FROM DUAL CONNECT BY LEVEL <= 100;

insert into srctab2
SELECT 'a'||LEVEL, 'a'||LEVEL, 'a'||LEVEL, CURRENT_TIMESTAMP
FROM DUAL CONNECT BY LEVEL <= 300;

commit;

Populate the metadata configuration

Populate the driver table with a row for each source/target pair; replace the values with your application name, task name, and the connection key, schema name and table name for both the source and the target. The last three NULL values are the last extract date, the last run date and the last run key; these will be populated during procedure execution.

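If you are building this from scratch, a hypothetical DDL for the driver table consistent with the inserts below might look like this (the actual definition is supplied in the gist; column names and sizes here are illustrative):

create table adrivertable (
  application_name      VARCHAR2(200),
  task_name             VARCHAR2(200),
  source_connection_key VARCHAR2(200),
  source_schema         VARCHAR2(128),
  source_table          VARCHAR2(128),
  target_connection_key VARCHAR2(200),
  target_schema         VARCHAR2(128),
  target_table          VARCHAR2(128),
  last_extract_date     TIMESTAMP(6) WITH TIME ZONE,  -- set by the driver
  last_run_date         TIMESTAMP(6) WITH TIME ZONE,  -- set by the driver
  last_run_key          VARCHAR2(2000));              -- set by the driver
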
insert into adrivertable values(
'YourApplicationName',
'YourTaskName',
'your_source_connection_key',
'YOURSRCSCHEMA',
'SRCTAB1',
'your_target_connection_key',
'YOURTGTSCHEMA',
'TGTTAB1',
NULL, NULL, NULL);

insert into adrivertable values(
'YourApplicationName',
'YourTaskName',
'your_source_connection_key',
'YOURSRCSCHEMA',
'SRCTAB2',
'your_target_connection_key',
'YOURTGTSCHEMA',
'TGTTAB2',
NULL, NULL, NULL);

commit;

Execute the Procedure

For the initial load execution, pass the value FULL:

begin
  INCREMENTAL_UTILS.execute_tasks(
    'OCI$RESOURCE_PRINCIPAL',
    'us-ashburn-1',
    'your_workspace_ocid',
    'YourApplicationName',
    'YourTaskName',
    'FULL');
end;
/

Then insert some additional rows into the source, commit, and run the task again, this time passing the value INCREMENTAL; only the newly added rows are extracted:

insert into srctab1
SELECT 'b'||LEVEL, 'b'||LEVEL, CURRENT_TIMESTAMP
FROM DUAL CONNECT BY LEVEL <= 100;
commit;

begin
  INCREMENTAL_UTILS.execute_tasks(
    'OCI$RESOURCE_PRINCIPAL',
    'us-ashburn-1',
    'your_workspace_ocid',
    'YourApplicationName',
    'YourTaskName',
    'INCREMENTAL');
end;
/
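
After each run you can inspect the audit columns in the driver table; the last extract date, last run date and last run key are updated by the procedure (column names below follow the hypothetical DDL sketch earlier):

select source_table, last_extract_date, last_run_date, last_run_key
from adrivertable;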

Summary

Here we have seen a solution that makes incremental extract and load as simple as possible for the user, and not only for a single source entity: the metadata-driven approach supports many source entities. The example uses an Oracle database source as the illustration, but the same approach supports extracting from Fusion applications using BICC connectivity. The loads can then be scheduled and executed on a regular basis; a sketch of one way to do that is below. Check out earlier posts, such as Executing Data Integration Tasks using PLSQL, and see what else you can do.
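
For example, the incremental run could be scheduled nightly from the database itself with DBMS_SCHEDULER; a minimal sketch (the job name and schedule are illustrative):

begin
  dbms_scheduler.create_job(
    job_name        => 'INCREMENTAL_EXTRACT_JOB',
    job_type        => 'PLSQL_BLOCK',
    job_action      => q'[begin
      INCREMENTAL_UTILS.execute_tasks(
        'OCI$RESOURCE_PRINCIPAL',
        'us-ashburn-1',
        'your_workspace_ocid',
        'YourApplicationName',
        'YourTaskName',
        'INCREMENTAL');
    end;]',
    start_date      => systimestamp,
    repeat_interval => 'FREQ=DAILY;BYHOUR=2',  -- run every day at 02:00
    enabled         => true);
end;
/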
