Working smart with Data Catalog in the world of Data Governance

A data catalog shows the data assets of an organization and helps identify where they are and what they mean. It helps different data users know exactly where to go when data questions arise. The OCI Data Catalog arranges data into an easy-to-understand format so all data users can consume and use it. Let’s look at how we can productively register data assets in the catalog for data governance.

One way to define data assets is via the OCI Console, where the graphical UI can be used to enter information manually. This involves registering data assets, harvesting their metadata, and enriching it with a domain-specific business glossary that explains what the data means. Registering a data asset takes a number of steps: defining its name, selecting its type (ADW, Hive, Kafka, etc.), entering a database name for databases, creating connections, locating ATP/ADW wallets or generating one via the console (see here for steps), uploading wallets, and so on. There are a lot of manual steps here, and even with just a few databases it can be a chore.

In this article we will look at different techniques to catalog your inventory of systems easily and in an automated manner. The example uses autonomous databases, but we could extend this to other database systems. We will look at:

  • compartment wide discovery using list (List Autonomous Databases in compartment)
  • tenancy wide discovery using search (search for autonomous database resources using Resource Search service)
  • event oriented discovery (using Event Service)

Compartment wide discovery

List autonomous databases

If your databases are organized by compartment, this is a useful way to quickly register them in the data catalog. The API to list autonomous databases in a compartment looks like this:

adbs = db_client.list_autonomous_databases(compartment_id=compartment).data
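The one-liner above returns only the first page of results. As a minimal sketch, assuming the standard SDK response attributes `has_next_page` and `next_page`, the listing can be wrapped so that every page is collected. The helper names (`list_adbs_in_compartment`, `describe_adb`, `list_with_oci`) and the ADW/ATP labels are my own illustration, not taken from the gist.

```python
# Assumed mapping from the SDK's db_workload value to the asset label used in this article.
WORKLOAD_LABEL = {"DW": "ADW", "OLTP": "ATP"}


def list_adbs_in_compartment(db_client, compartment_id):
    """Collect Autonomous Database summaries from every result page of one compartment."""
    adbs, page = [], None
    while True:
        kwargs = {"compartment_id": compartment_id}
        if page:
            kwargs["page"] = page
        resp = db_client.list_autonomous_databases(**kwargs)
        adbs.extend(resp.data)
        if not resp.has_next_page:
            return adbs
        page = resp.next_page


def describe_adb(adb):
    """Reduce a summary to the fields needed to register it as a data asset."""
    return {
        "name": adb.db_name,
        "ocid": adb.id,
        "kind": WORKLOAD_LABEL.get(adb.db_workload, "OTHER"),
    }


def list_with_oci(compartment_id):
    """Wire the helpers to the real SDK; requires the oci package and ~/.oci/config."""
    import oci  # imported lazily so the helpers above stay usable without the SDK

    db_client = oci.database.DatabaseClient(oci.config.from_file())
    return [describe_adb(a) for a in list_adbs_in_compartment(db_client, compartment_id)]
```

Passing the client in as a parameter keeps the paging logic independent of how the client is configured, which also makes it easy to exercise with a stand-in client.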

See the Python gist below for the functions that list autonomous databases and create data assets. It is quite a handy script: it covers the list/search of autonomous databases (see ListAutonomousDatabases here) as well as the creation of the data asset within OCI Data Catalog. The wallet is generated right in the script and uploaded into the data asset in Data Catalog. The only information left to add within the Data Catalog is the user name and password.

Script which supports Data Catalog creation from search/list of data assets
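The wallet step on its own can be sketched roughly as follows. This is not the gist's code: `fetch_wallet`, `wallet_as_base64` and `save_wallet_with_oci` are hypothetical helper names. I am assuming the SDK returns the wallet zip as a streamed response whose bytes are available via `.data.content`, and that a base64 text form is what you would place in an upload request body.

```python
import base64


def fetch_wallet(db_client, adb_ocid, wallet_details):
    """Ask the Database service for a wallet zip and return its raw bytes."""
    resp = db_client.generate_autonomous_database_wallet(adb_ocid, wallet_details)
    return resp.data.content  # assumption: streamed response exposing .content


def wallet_as_base64(wallet_bytes):
    """Encode the wallet zip as ASCII text, convenient for JSON request bodies."""
    return base64.b64encode(wallet_bytes).decode("ascii")


def save_wallet_with_oci(adb_ocid, wallet_password, out_path="wallet.zip"):
    """Wire the helpers to the real SDK; requires the oci package and ~/.oci/config."""
    import oci  # imported lazily so the helpers above stay usable without the SDK

    db_client = oci.database.DatabaseClient(oci.config.from_file())
    details = oci.database.models.GenerateAutonomousDatabaseWalletDetails(
        password=wallet_password
    )
    wallet = fetch_wallet(db_client, adb_ocid, details)
    with open(out_path, "wb") as f:
        f.write(wallet)
    return wallet_as_base64(wallet)
```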

With the gist script above you can create data assets from a compartment listing as follows:

python -c enter_data_catalog_ocid -l enter_compartment_ocid

Tenancy wide discovery

Search for autonomous databases that are AVAILABLE

This is a quick way to find databases across your tenancy and register them in the data catalog. The API to search (see SearchResources here) for autonomous databases looks like this:

search_client = oci.resource_search.ResourceSearchClient(config)
searchbody = oci.resource_search.models.StructuredSearchDetails(query="query autonomousdatabase resources where lifeCycleState = 'AVAILABLE'")
adbs = search_client.search_resources(search_details=searchbody, limit=1).data

You can see the use of this API in the gist above. Note that I also get the autonomous database after the search so that I can determine its type (ADW/ATP) for registering with OCI Data Catalog. Below is the search invocation of the script above; unlike the compartment listing, it takes no additional parameter:

python -c enter_data_catalog_ocid
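The search-then-get flow described above can be sketched like this. It is an illustration under assumptions rather than the gist itself: the helper names are hypothetical, the page size of 100 is arbitrary, and I am assuming each search hit exposes the database OCID as `identifier` and that the full record carries a `db_workload` of `DW` or `OLTP`.

```python
QUERY = "query autonomousdatabase resources where lifeCycleState = 'AVAILABLE'"

# Assumed mapping from the SDK's db_workload value to the asset label used in this article.
WORKLOAD_LABEL = {"DW": "ADW", "OLTP": "ATP"}


def search_all(search_client, search_details, page_size=100):
    """Run a structured search and collect the hits from every result page."""
    items, page = [], None
    while True:
        kwargs = {"search_details": search_details, "limit": page_size}
        if page:
            kwargs["page"] = page
        resp = search_client.search_resources(**kwargs)
        items.extend(resp.data.items)
        if not resp.has_next_page:
            return items
        page = resp.next_page


def resolve_kinds(db_client, items):
    """Fetch each hit's full record to find out whether it is ADW or ATP."""
    resolved = []
    for item in items:
        adb = db_client.get_autonomous_database(item.identifier).data
        kind = WORKLOAD_LABEL.get(adb.db_workload, adb.db_workload)
        resolved.append((adb.db_name, item.identifier, kind))
    return resolved


def search_with_oci():
    """Wire the helpers to the real SDK; requires the oci package and ~/.oci/config."""
    import oci  # imported lazily so the helpers above stay usable without the SDK

    config = oci.config.from_file()
    search_client = oci.resource_search.ResourceSearchClient(config)
    db_client = oci.database.DatabaseClient(config)
    details = oci.resource_search.models.StructuredSearchDetails(query=QUERY)
    return resolve_kinds(db_client, search_all(search_client, details))
```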

Register Data Asset On Event

Create event to create data asset when ADB is created
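One way to approach this, sketched under assumptions: an Events rule matches the database creation event and invokes a function, which pulls the new database's OCID out of the event and hands it to the same registration logic used for list and search. The event type string and the `data` field names below are my assumptions about the event envelope, so check them against the Events documentation before relying on them. `adb_from_event` is a hypothetical helper name.

```python
import json

# Assumed event type emitted when Autonomous Database creation finishes.
ADB_CREATE_END = (
    "com.oraclecloud.databaseservice.autonomous.database.instance.create.end"
)


def adb_from_event(raw_event):
    """Return the new database's identifiers, or None for unrelated events."""
    event = json.loads(raw_event)
    if event.get("eventType") != ADB_CREATE_END:
        return None
    data = event.get("data", {})
    return {
        "ocid": data.get("resourceId"),
        "name": data.get("resourceName"),
        "compartment": data.get("compartmentId"),
    }
```

Inside a function handler, the returned OCID could be passed to a get-database call to determine ADW or ATP, followed by the data asset creation step from the gist.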

To update the passwords of registered data assets you can use the SDK or the CLI as below. Get the keys and you can easily update them via scripts:

oci data-catalog connection update --catalog-id enter_catalog_ocid --data-asset-key enter_data_asset_key --connection-key enter_con_key --enc-properties file://./mydataassetpass.json
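For the SDK route, here is a hedged sketch. I am assuming the encrypted properties take the shape `{"default": {"password": ...}}`, which would also be the content of `mydataassetpass.json` in the CLI call above. The helper names are hypothetical.

```python
import json


def enc_properties_for_password(password):
    """Build the encrypted-properties payload (assumed shape) for a connection."""
    return {"default": {"password": password}}


def write_enc_properties(path, password):
    """Write the payload to a JSON file suitable for --enc-properties file://..."""
    with open(path, "w") as f:
        json.dump(enc_properties_for_password(password), f)


def update_password_with_oci(catalog_id, data_asset_key, connection_key, password):
    """Update a connection's password via the SDK; requires the oci package."""
    import oci  # imported lazily so the helpers above stay usable without the SDK

    dc_client = oci.data_catalog.DataCatalogClient(oci.config.from_file())
    details = oci.data_catalog.models.UpdateConnectionDetails(
        enc_properties=enc_properties_for_password(password)
    )
    return dc_client.update_connection(
        catalog_id, data_asset_key, connection_key, details
    ).data
```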

This can be taken further: you can set up automatic harvesting on these data assets, for example. All these APIs are there to make life easy and help you work smart. That’s all for now!

Read all about the OCI Data Catalog here…

Architect at @Oracle developing cloud services for data. Connect on Twitter @i_m_dave