Types and Higher Order Functions in OCI Data Integration

David Allan
5 min readMar 22, 2023

--

OCI Data Integration supports processing of files with nested hierarchy and data types, in this blog we will look at the datatypes and higher order functions specifically related to nested hierarchy types and JSON;

Composite Type — a struct, a way to group several related properties into one place. Each properties in the structure is known as a member of the structure. Unlike an array, a structure can contain many different data types. Composite types can be accessed using: city.city_name.

Array Type — a data structure consisting of a collection of elements, each identified by an array index. Array types can be accessed using index like: json_expression.cities[0].city_name.

Map Type — maps (also known as dictionaries) are data structures storing a collection of key-value pairs. Each key is unique and allows for quick access to values. A real life example of a map could be storing the grades for students in a class (student name is key, grade is value).

The following functions in the expression operator are available to enable users to parse and process hierarchical data:

TRANSFORM

Takes an array and an anonymous function, and sets up a new array by applying the function to each element, and assigning the result to the output array.

TRANSFORM(EXPRESSION_1.HO_JSON.temp, (x -> ((x * 9) / 5) + 32 ))

Here is how to define the TRANSFORM to convert celsius to fahrenheit, use the infer datatype option to get the array datatype in output;

This transforms the array from an array of celsius values to an array of fahrenheit values;

[30, 40]

to the fahrenheit values;

[86, 104]

Another use case is to transform the structures, you may have properties you want to include in the target structure and it also may have different names, below is how to transform an array of objects which have a property called ‘name’ and in the output we want to use ‘objectName’;

TRANSFORM(EXPRESSION_1.HO_JSON.objects, (x-> TO_STRUCT('objectName', x.name) ))

TRANSFORM_KEYS

Takes a map and a function with 2 arguments (key and value), and returns a map in which the keys have the type of the result of the lambda function, and the values have the type of the column map values

TRANSFORM_KEYS(TO_MAP('c1', 1, 'c2', 100), ((k, v) -> concat(k,'id')))

This transform the map from

{"c1": 1, "c2": 100)

to this with different values for the keys (maybe you need to strip values from the key, or add values);

{"c1id": 1, "c2id": 100)

This is how this was setup below, infer datatype was used for this example, the map for demo purposes was created inline, obviously this can be from the upstream input data;

TRANSFORM_VALUES

Takes a map and a function with 2 arguments (key and value), and returns a map in which the values have the type of the result of the lambda functions, and the keys have the type of the column map keys. For example below there is a map of temperatures in celsius we transform the values in the map to fahrenheit.

TRANSFORM_VALUES(TO_MAP('c1', 30, 'c2', 40), ((k, v) -> ((v * 9) / 5) + 32))

This is how this was setup below, infer datatype was used for this example, the map for demo purposes was created inline, obviously this can be from the upstream input data;

This transforms the map from

{"c1": 30, "c2": 40)

to this (you can check an online converter if you like);

{"c1": 84, "c2": 104)

FROM_JSON

Parses a column containing a JSON string into one of the following types, with the specified schema.

  • Map, with String as the key type
  • Struct
  • Array
FROM_JSON('{\"Zip\":94065,\"City\":\"Redwood City\"}', 'STRUCT<Zip: BIGINT, City: STRING>') 

You must infer the datatype for this expression;

Then see the attribute and its datatype in the Attributes tab;

TO_JSON

Converts a column containing a type of Struct or Array of Structs, or a Map or Array of Map, into a JSON string.

TO_JSON(TO_MAP('c1', 1, 'c2', 100))

returns

{"c1": 1, "c2": 100}

It can be defined as a VARCHAR explicitly;

TO_MAP

Creates a new column of Map type. The input columns must be grouped as key-value pairs. The input key columns cannot be null, and must all have the same data type. The input value columns must all have the same data type.

TO_MAP('c1', 1, 'c2', 100)

returns a map type column, it must be defined using infer datatype;

You can then view the output attribute and its type;

TO_STRUCT

Creates a new column of Struct type. The input columns must be grouped as key-value pairs

TO_STRUCT('objectName', x.name)

Define using infer datatype;

Then view the output attribute and its type — it will be defined as COMPOSITE with members of identifier and name;

TO_ARRAY

Creates a new column as Array type. The input columns must all have the same data type.

TO_ARRAY(1,2,3,4,5)

You must use infer data type for this;

Then view the output attribute and the datatype;

This is useful for creating an array from many columns for example this may be quarter values or timestamps.

SCHEMA_OF_JSON

Parses a JSON string and infers the schema in DDL format.

SCHEMA_OF_JSON('[{\"Zip\":94065,\"ZipCodeType\":\"STANDARD\",\"City\":\"Redwood City\",\"State\":\"CA\"}]')

will return

'ARRAY<STRUCT<City:string,State:string,ZipCodeType:string,Zipcode:bigint>>'

That’s a quick overview of a few of the functions within OCI Data Integration for manipulating composite, array and map data, I hope you found it useful and would love to here any comments.

Related resources;

--

--

David Allan
David Allan

Written by David Allan

Architect at @Oracle The views expressed on this blog are my own and do not necessarily reflect the views of Oracle.

No responses yet