Transforming attributes the productive way in OCI Data Integration
This productivity feature in OCI Data Integration is one of my favorites! It lets users build expressions that work on patterns of attributes such that they don’t have to do the mundane work of defining the same transformation over and over (date formatting, number formatting, common calculations, list goes on and on). It also handles subsequent new attributes later in the data flow’s life such that you do not have keep baby-sitting changes.
This example has a number of attributes with temperature data in Fahrenheit in a csv file that I first do datatype conversion using the pattern and then compute Celsius values via a pattern also (see illustrations online for the computation). My source data looks like this, each entry has multiple temperatures (*_temp) in Fahrenheit for a specific location on a specific date, traditionally you’d have to convert attribute by attribute;
For example to convert the numbers you’d have to do TO_NUMBER(p1_temp, ‘##.#’) and then carry on for each of the other attributes.
For the Fahrenheit to Celcius conversion, you would do (p1_temp — 32) * 5/9, then do (p2_temp — 32) * 5/9 and so on. This is very tedious and also it does not handle change — new attributes added that you would also like to handle this way.
In OCI Data Integration you can perform bulk transformations on sets of attributes by;
- specify a pattern (name plus datatype) for the attributes to match
- specify a new datatype for the resultant attributes or keep the existing one
- specify a new name for the resultant attributes
- use a special token (%MACRO_INPUT%) in the expression.
For example rather than the expression in Figure 1, you can define a pattern for all attributes to match (*_temp), also its only the VARCHAR attributes that pattern is matched for, you can define the new datatype (DECIMAL below) and also the new names ($0 indicates the same name, you could do something like $0_NUM and attributes would have _NUM suffix if desired).
Then to convert to Celsius I’ve added another expression operator and an expression which does the conversion. This time I want to match the attributes which match name *_temp and are of type DECIMAL (since I just converted them). I will also keep the datatype the same since its just performing a numeric calculation.
We can see the data view of what has been done here and inspect all *p1_temp attributes, this will show us the source varchar attribute, the conversion to decimal one and also the computed celsius attribute, I just used the filter on the attributes panel to see this;
Its easy to shape the data and exclude/rename attributes based on patterns, so you can easily shape the data if you are creating new data sets to be consumed. What else? I can parameterize the dataflow above and at runtime pass in different data, some could have 6 temperature attributes, some could have one hundred, you don’t have to waste time curating the dataflow. If next week more attributes are added following the pattern all is well the dataflow does its job and transforms those new attributes!
Leaving you to do other work or enjoying time with family and friends.
For more detailed information on Oracle Cloud Infrastructure Data Integration check out the Tutorials and Documentation plus the Oracle Cloud Infrastructure Data Integration blogs here.