Reusing Logic in OCI Data Integration

Parameters

Parameters are really useful in OCI Data Integration dataflows and pipelines - they are all about about reusing logic. Working smart and not doing the same thing over and over. Let’s see the different parameter types that can be defined and some use cases to illustrate the capabilities. What can be parameterized?

  • Data Entity
  • Data entity name or pattern
  • Schema/bucket
  • Connection
  • Data Asset
  • Filter / Join conditions
  • Scalar parameters
  • Data Entity — the source and target operators in dataflows can have the data entity parameterized. You can design using a CSV, run passing a CSV or a compressed CSV or many compressed CSVs (patterns of objects)! For tables you may want to modify the table name or even have the SQL Query representing the entity passed into the execution; you can have a data entity that is a reference to an existing table/object/query or a new table/object. A use case for this could be for example a generic dataflow which has a source table say in an Oracle database and a target Parquet object say in Object Storage, each of these can be parameterized and then the task used to export a table specified at runtime along with the folder/object name where the data is written. We can then take this even further and parameterize the data asset and connection representing the database and export from an ADW or ATP or any of the other data asset types!
  • Data Entity name or pattern can be parameterized as a string. Rather than parameterizing the entire entity the name can be parameterized.
  • Schema/Bucket — the source and target operators in dataflows can have the schema parameterized. Many applications will have a cookie cutter approach with a tenant/customer in each schema or same logic for different buckets, the schema content/entities are the same, just switching the schema allows reuse of DataFlow for many uses. The schema can be a database schema or a bucket even.
  • Connection — sometimes the connection mechanism has to be switched at runtime for different uses for different authentication/authorization. The source and target operators in dataflows can have the connection parameterized.
  • Data Asset — many customers will have cookie cutter applications with tenants/customers in different databases for example. If you parameterize the data asset you would have to parameterize the connection also. The source and target operators in dataflows can have the data asset parameterized.
  • Filter/Join conditions as parameters allow controlling what data is to be processed. The entire filter or join can be parameterized. If you have a generic flow that extracts changes but the column to identify changes is different across source tables / objects then you can parameterize the entire condition.
  • Scalar valued (eg text) parameters — this allows allow controlling how expressions are defined at runtime. This can be configured in many operators within dataflow such as within the source operator itself as a SQL query, within other operators such as filter, join, lookup, pivot, split or expression/aggregate expressions. For example you could parameterize part of a filter condition to get orders created after a value. The value can be passed as a parameter (FILTER_1.ORDER_TXNS.ORDER_DATE > $LAST_LOAD_DATE) or you could parameterize an entire source operator using custom SQL and SOURCE_QUERY parameter could be defined as SELECT * from WATERMARKDATA where LAST_UPDATED > CURRENT_DATE-1 it can then be referenced as ${SOURCE_QUERY}. These are incremental extract use cases from a source where we could parameterize either the whole query or by passing the date of the last extract into a dataflow and the parameter will be used in a filter condition.

Rules

Rules are really useful in OCI Data Integration dataflows — they are all about about building generic logic for shaping and transforming data. Many times in designs whether its in a programming language like SQL, Python, Java etc we build hard-wired solutions that are rigid to change.

Wrapping it up

Have you seen some of our other blogs? Check out all the blogs related to Oracle Cloud Infrastructure Data Integration. To learn more, check out some Oracle Cloud Infrastructure Data Integration Tutorials and Documentation.

--

--

Get the Medium app

A button that says 'Download on the App Store', and if clicked it will lead you to the iOS App store
A button that says 'Get it on, Google Play', and if clicked it will lead you to the Google Play store
David Allan

David Allan

80 Followers

Architect at @Oracle developing cloud services for data. Connect on Twitter @i_m_dave