The data catalog shows an organization’s data assets and helps identify where they are and what they mean, so data users know exactly where to go when data questions arise. The OCI Data Catalog arranges data into an easy-to-understand format that all data users can consume. Let’s look at how we can efficiently register data assets in the catalog for data governance.
One way to define data assets is via the OCI Console, where the graphical UI is used to enter information manually. This involves registering data assets, harvesting their metadata, and enriching it with a domain-specific business glossary that describes what it means. Registering a data asset takes a number of steps: defining its name, selecting its type (ADW, Hive, Kafka, etc.), entering a database name for databases, creating connections, locating ATP/ADW wallets or generating one via the console (see here for steps), uploading wallets, and so on. That is a lot of manual steps, and with more than a few databases it becomes a chore.
In this article we will look at different techniques to catalog your inventory of systems easily and in an automated manner. The example uses autonomous databases, but it could be extended to other database systems that can be registered as data assets. We will look at:
- compartment wide discovery using list (List Autonomous Databases in compartment)
- tenancy wide discovery using search (search for autonomous database resources using Resource Search service)
- event oriented discovery (using Event Service)
Compartment wide discovery
This illustration uses the API that lists autonomous databases in a compartment. Below you can see that API in action in the OCI Console, where we list databases after selecting the compartment in which they reside.
If your databases are organized in such a manner, this is a useful way to quickly register them in the data catalog. The API to list autonomous databases in a compartment looks like this:
adbs = db_client.list_autonomous_databases(compartment_id=compartment).data
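As a rough sketch of how the call above might be used in a script, the helper below pages through the results and keeps only databases in the AVAILABLE state. The helper name `list_available_adbs` is made up for illustration and is not taken from the gist; `db_client` is assumed to be an `oci.database.DatabaseClient`, and `page`, `has_next_page` and `next_page` follow the OCI Python SDK paging convention.

```python
def list_available_adbs(db_client, compartment_id):
    """Return (display_name, db_name, db_workload, ocid) for each AVAILABLE
    autonomous database in one compartment.

    Assumption: db_client is an oci.database.DatabaseClient, for example
    oci.database.DatabaseClient(oci.config.from_file()).
    """
    found = []
    page = None
    while True:
        # Only pass the page token once the service has handed one back.
        kwargs = {"page": page} if page else {}
        response = db_client.list_autonomous_databases(
            compartment_id=compartment_id, **kwargs
        )
        for adb in response.data:
            if adb.lifecycle_state == "AVAILABLE":
                found.append(
                    (adb.display_name, adb.db_name, adb.db_workload, adb.id)
                )
        if not response.has_next_page:
            break
        page = response.next_page
    return found
```

Passing the client in as a parameter keeps the helper easy to exercise with a stand-in object before pointing it at a real tenancy.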
See the Python gist below for how the functions list and create data assets. It is quite a handy script: it encompasses the list/search of autonomous databases (see ListAutonomousDatabases here) and also the creation of the data asset within OCI Data Catalog. The wallet is generated in the script and uploaded into the data asset in Data Catalog. The only information left to add within the Data Catalog is the user name and password.
With the above script you can create data assets from a compartment listing as follows:
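The wallet step can be sketched roughly as follows. This is not the gist’s code: `download_wallet` is a hypothetical helper, `db_client` is assumed to be an `oci.database.DatabaseClient`, and `wallet_details` is assumed to be an `oci.database.models.GenerateAutonomousDatabaseWalletDetails` carrying the wallet password. The saved zip is what would then be uploaded to the data asset’s connection.

```python
from pathlib import Path


def download_wallet(db_client, adb_id, wallet_details,
                    target_dir=".", file_name="wallet.zip"):
    """Generate a wallet for one autonomous database and save it as a zip.

    Assumptions: db_client is an oci.database.DatabaseClient and
    wallet_details is an
    oci.database.models.GenerateAutonomousDatabaseWalletDetails(password=...).
    """
    response = db_client.generate_autonomous_database_wallet(
        autonomous_database_id=adb_id,
        generate_autonomous_database_wallet_details=wallet_details,
    )
    # The SDK hands the zip back as a binary stream; .content reads it fully.
    wallet_path = Path(target_dir) / file_name
    wallet_path.write_bytes(response.data.content)
    return wallet_path
```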
python dc_create_data_assets_for_all_adbs.py -c enter_data_catalog_ocid -l enter_compartment_ocid
Tenancy wide discovery
This illustration uses the resource search capabilities of OCI. Below you can see the search in action in the OCI Console, where we search for autonomous databases in the tenancy that are in the AVAILABLE lifecycle state.
This is a quick way to find databases across your tenancy and register them in the data catalog. The API to search (see SearchResources here) for autonomous databases looks like this:
search_client = oci.resource_search.ResourceSearchClient(config)
searchbody = oci.resource_search.models.StructuredSearchDetails(query="query autonomousdatabase resources where lifeCycleState = 'AVAILABLE'")
adbs = search_client.search_resources(search_details=searchbody, limit=1).data
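A minimal sketch of how the search might be turned into a full tenancy scan is shown below. It pages through the results and, as the script does, fetches each database after the search to determine its type. The helper name, the `page_size` parameter and the `WORKLOAD_TO_TYPE` mapping are illustrative assumptions; `search_client` and `db_client` are assumed to be an `oci.resource_search.ResourceSearchClient` and an `oci.database.DatabaseClient`.

```python
# Map the database workload to the type names used in this article.
WORKLOAD_TO_TYPE = {"DW": "ADW", "OLTP": "ATP"}


def search_available_adbs(search_client, db_client, search_details,
                          page_size=100):
    """Return (display_name, ocid, type) for every database the search finds.

    Assumption: search_details is the StructuredSearchDetails object built
    in the snippet above.
    """
    results = []
    page = None
    while True:
        # Only pass the page token once the service has handed one back.
        kwargs = {"page": page} if page else {}
        response = search_client.search_resources(
            search_details=search_details, limit=page_size, **kwargs
        )
        for item in response.data.items:
            # Fetch the full record to read the workload (ADW vs ATP).
            adb = db_client.get_autonomous_database(item.identifier).data
            db_type = WORKLOAD_TO_TYPE.get(adb.db_workload, adb.db_workload)
            results.append((item.display_name, item.identifier, db_type))
        if not response.has_next_page:
            break
        page = response.next_page
    return results
```

Workloads that are not in the mapping are passed through unchanged, so the caller can decide whether to skip or register them.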
You can see the use of this API in the gist above. Note that I also get the autonomous database after the search so that I can determine its type (ADW/ATP) for registering with OCI Data Catalog. Below you can see the search variant of the script; there is no additional parameter (unlike the compartment listing above):
python dc_create_data_assets_for_all_adbs.py -c enter_data_catalog_ocid
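The two invocations differ only in whether `-l` is present. One way a script could handle that is sketched below with `argparse`; the function names and help texts are illustrative and not copied from the gist.

```python
import argparse


def parse_args(argv=None):
    """Parse the -c (catalog) and optional -l (compartment) options."""
    parser = argparse.ArgumentParser(
        description="Create Data Catalog data assets for autonomous databases."
    )
    parser.add_argument("-c", dest="catalog_id", required=True,
                        help="Data Catalog OCID")
    parser.add_argument("-l", dest="compartment_id", default=None,
                        help="Compartment OCID; omit to search the tenancy")
    return parser.parse_args(argv)


def discovery_mode(args):
    """Pick compartment listing when -l is given, tenancy search otherwise."""
    return "list" if args.compartment_id else "search"
```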
Register Data Asset On Event
While the previous two methods let you quickly register data assets for databases that already exist, in your data governance initiative you may want to register them when an event happens, such as the creation of an autonomous database. For example, you could create a data asset for the newly created autonomous database and/or send an email asking someone to complete the registration by entering the user name and password. The example below uses the event from the container database plus a simple custom Fn function with code similar to that posted in the gist for creating a data asset.
To update the passwords of registered data assets you can use the SDK or the CLI as below. Once you have the keys, you can easily update them via scripts:
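Inside such a function, the first job is to pull the database details out of the event payload. The helper below is a small sketch of that step only. The field names (`eventType`, `data.resourceId`, `data.resourceName`, `data.compartmentId`) follow the OCI Events envelope as I understand it, so treat them as assumptions and check them against a real event from your tenancy; `adb_from_event` is a hypothetical name.

```python
import json


def adb_from_event(raw_body):
    """Extract what is needed to register a data asset from an OCI event.

    raw_body is the JSON event payload as bytes or str. The field names are
    assumptions based on the OCI Events envelope.
    """
    event = json.loads(raw_body)
    data = event.get("data", {})
    return {
        "event_type": event.get("eventType"),
        "adb_id": data.get("resourceId"),
        "name": data.get("resourceName"),
        "compartment_id": data.get("compartmentId"),
    }
```

The function’s handler would pass the request body to this helper and then hand the resulting OCID to the same data asset creation code used in the gist.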
oci data-catalog connection update --catalog-id enter_catalog_ocid --data-asset-key enter_data_asset_key --connection-key enter_con_key --enc-properties file://./mydataassetpass.json
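To script this across several data assets, one option is to build the same command once per connection. The sketch below only assembles the argument lists from the command shown above; `build_update_commands` is a hypothetical helper, and the keys and file path are the same placeholders.

```python
import shlex


def build_update_commands(catalog_id, connections, enc_properties_file):
    """Build one `oci data-catalog connection update` command per connection.

    connections is an iterable of (data_asset_key, connection_key) pairs.
    """
    commands = []
    for data_asset_key, connection_key in connections:
        commands.append([
            "oci", "data-catalog", "connection", "update",
            "--catalog-id", catalog_id,
            "--data-asset-key", data_asset_key,
            "--connection-key", connection_key,
            "--enc-properties", f"file://{enc_properties_file}",
        ])
    return commands


if __name__ == "__main__":
    pairs = [("enter_data_asset_key", "enter_con_key")]
    for cmd in build_update_commands("enter_catalog_ocid", pairs,
                                     "./mydataassetpass.json"):
        # Print for review; subprocess.run(cmd, check=True) would execute it.
        print(shlex.join(cmd))
```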
This can be taken further: for example, you can set up automatic harvesting on these data assets. All these APIs are there to make life easy and let you work smart. That’s all for now!
Read all about the OCI Data Catalog here…