DBMS_CLOUD EXPORT_DATA and DBMS_PARALLEL_EXECUTE a match made in the cloud?

The Oracle database DBMS_CLOUD package was introduced with Oracle Autonomous databases, it provides lots of very cool procedures for interacting with Object Storage store from the Oracle database. It provides much more and can be used in on-premise installations for versions 19c and 21c (How To Setup And Use DBMS_CLOUD Package (Doc ID 2748362.1)) or using database links. As well as OCI Object Storage support it also works equally well with AWS S3 buckets, Microsoft Azure Blob Storage and much more.

The primary uses that I use it for are;

When you have lots of work to do, or lots of content to export, then it’s useful to know some techniques for distributing the work, this may be using DBMS_SCHEDULER to create parallel tasks for example or it could be to parallelize large operations. It’s the latter we will look at here, how to parallel export to OCI Object Storage using the Oracle database itself! Step in the DBMS_PARALLEL_EXECUTE package.

The DBMS_PARALLEL_EXECUTE package gives an easy way to parallelize chunks of work. You define the chunks, you define the work. The example I will look at here is to use this along with the DBMS_CLOUD.EXPORT_DATA procedure. Chunk a source and write partitioned data into OCI Object Storage;

The DBMS_PARALLEL_EXECUTE package has procedures that you can logically think of as;

  • create task — dbms_parallel_execute.create_task
  • create chunks — dbms_parallel_execute.create_chunks_by_sql
  • execute task - dbms_parallel_execute.run_task

There are some constraints into identifying the chunks, it can be either via a ROWID or via a NUMERIC column. You can creatively workaround many challenges with this. For example if using DATE datatype to chunk you can use Julian dates — the continuous count of days since the beginning of the Julian period, we can use to easily calculate elapsed days between two events (start of chunk and end of chunk).

This is an example procedure which needs to have the lower and upper parameters — the date in format YYYYMM is used as a prefix/folder name — below the data is exported as JSON, its also possible to export as Oracle’s own DataPump and other formats;

CREATE OR REPLACE PROCEDURE export_data_initial(p_lower_val IN NUMBER,p_upper_val IN NUMBER) IS 
BEGIN
DBMS_CLOUD.EXPORT_DATA(
credential_name =>'OCI$RESOURCE_PRINCIPAL',
file_uri_list =>'https://objectstorage.us-ashburn-1.oraclecloud.com/n/anamespace/b/abucket/o/stagingdata/SCH/TABLE/'||to_char(TO_DATE(p_lower_val,'J'),'YYYYMM')||'/TABLE.json',
format => json_object('type' value 'json'),
query => 'SELECT * FROM YOURTABLE WHERE YOURDATECOLUMN BETWEEN TO_DATE('||p_lower_val||',''J'') AND TO_DATE('||p_upper_val||',''J'')'
);
END;

The resultant files are written as follows (when the export format is JSON, the filename is partly generated — see here);

The example PLSQL block below chunks the source and the task for each chunk is executed, which as you see above exports to OCI Object Storage.

DECLARE
l_task VARCHAR2(30) := 'test_task3';
l_stmt CLOB;
l_sql_stmt VARCHAR2(32767);
l_chunk_id NUMBER;
l_any_rows BOOLEAN;
task_already_exists EXCEPTION;
PRAGMA exception_init ( task_already_exists, -29498 );
BEGIN
BEGIN
dbms_parallel_execute.drop_task(task_name => l_task);
EXCEPTION WHEN task_already_exists THEN NULL;
END;
dbms_parallel_execute.create_task(task_name => l_task);
l_stmt := 'select to_char(add_months (
date''2022-01-01'',
level - 1
),''J'') as start_id,
to_char(add_months (
date''2022-01-01'',
level
) - 1,''J'') as end_id
from dual
connect by level <= months_between (
date''2022-12-31'',
date''2022-01-01''
) + 1';
dbms_parallel_execute.create_chunks_by_sql(task_name => l_task,
sql_stmt => l_stmt, by_rowid => false);
dbms_parallel_execute.run_task(task_name => l_task, sql_stmt => 'BEGIN export_data_initial(:start_id, :end_id ); END;' , language_flag=> dbms_sql.native, parallel_level => 4);
END;
/

The above is an example that uses the Julian date to chunk by month, you can use the create_chunks_by_sql procedure to create your chunks and also use the date in the procedure to write to Object Storage using that date.

For debugging you can use the user_parallel_execute_chunks view which has information on each chunk including the STATUS and ERROR_MESSAGE. The DBMS_PARALLEL_EXECUTE package also has a RESUME_TASK procedure — this procedure resumes processing the chunks which have not been processed or chunks which are in PROCESSED_WITH_ERRORstate.

Thanks for reading, hope you find this interesting and it stimulates some thoughts and ideas. As you see the DBMS_CLOUD package provides lots of very cool procedures for interacting with Object Storage store from the Oracle database along with DBMS_PARALLEL_EXECUTE you can do some really useful things.

--

--

Get the Medium app

A button that says 'Download on the App Store', and if clicked it will lead you to the iOS App store
A button that says 'Get it on, Google Play', and if clicked it will lead you to the Google Play store
David Allan

David Allan

80 Followers

Architect at @Oracle developing cloud services for data. Connect on Twitter @i_m_dave