DBMS_CLOUD EXPORT_DATA and DBMS_PARALLEL_EXECUTE: a match made in the cloud?
The Oracle database DBMS_CLOUD package was introduced with Oracle Autonomous Database. It provides lots of very cool procedures for interacting with object storage from the Oracle database. It offers much more besides, and can be used in on-premises installations of 19c and 21c (How To Setup And Use DBMS_CLOUD Package (Doc ID 2748362.1)) or over database links. As well as OCI Object Storage, it works equally well with AWS S3 buckets, Microsoft Azure Blob Storage and more.
The primary uses I have for it are:
- Creating external tables over objects, where the objects can be in OCI Object Storage, Amazon S3, Azure Blob Storage and more
- Hybrid partitioned external tables (think hot/cold storage)
- Exporting from databases to object storage
- Copying data into tables from object storage
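As a taste of the first use case, here is a minimal sketch of creating an external table over a CSV object. The credential name, bucket URI and column list are placeholders for illustration; substitute your own values:

```sql
BEGIN
  -- MY_CRED, the bucket URI and the column list are placeholders
  dbms_cloud.create_external_table(
    table_name      => 'SALES_EXT',
    credential_name => 'MY_CRED',
    file_uri_list   => 'https://objectstorage.region.oraclecloud.com/n/namespace/b/bucket/o/sales.csv',
    format          => json_object('type' value 'csv', 'skipheaders' value '1'),
    column_list     => 'sale_date DATE, amount NUMBER'
  );
END;
/
```

Once created, the table can be queried like any other table, with the data read from the object at query time.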
When you have lots of work to do, or lots of content to export, it's useful to know some techniques for distributing the work. This might mean using DBMS_SCHEDULER to create parallel jobs, for example, or it might mean parallelizing a single large operation. It's the latter we will look at here: how to export to OCI Object Storage in parallel using the Oracle database itself! Step in the DBMS_PARALLEL_EXECUTE package.
The DBMS_PARALLEL_EXECUTE package gives an easy way to parallelize chunks of work. You define the chunks, you define the work. The example here uses it along with the DBMS_CLOUD.EXPORT_DATA procedure: chunk a source table and write the partitioned data into OCI Object Storage.
The DBMS_PARALLEL_EXECUTE package has procedures that you can logically think of as:
- create task — dbms_parallel_execute.create_task
- create chunks — dbms_parallel_execute.create_chunks_by_sql
- execute task - dbms_parallel_execute.run_task
There are some constraints on how chunks are identified: it must be either by ROWID or by a NUMBER column. You can creatively work around many challenges with this. For example, to chunk by a DATE column you can use Julian dates (the continuous count of days since the beginning of the Julian period), which make it easy to calculate elapsed days between two events, such as the start and the end of a chunk.
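To see the Julian conversion in isolation, the following queries (against DUAL, with an example date) show a date converted to its Julian day number with the 'J' format model and back again:

```sql
-- A DATE converted to its Julian day number
SELECT TO_CHAR(DATE '2023-01-01', 'J') AS julian_day FROM dual;

-- And a Julian day number converted back to a DATE
-- (2459946 corresponds to 1 January 2023)
SELECT TO_DATE('2459946', 'J') AS the_date FROM dual;
```

Because the Julian day number is a plain NUMBER, it satisfies the NUMERIC-column constraint of create_chunks_by_sql while still representing a date range.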
This is an example procedure that takes the lower and upper Julian bounds as parameters. The date in format YYYYMM is used as a prefix/folder name. Below, the data is exported as JSON; it's also possible to export as Oracle's own Data Pump format and others:
CREATE OR REPLACE PROCEDURE export_data_initial(p_lower_val IN NUMBER, p_upper_val IN NUMBER) IS
BEGIN
  dbms_cloud.export_data(
    credential_name => 'MY_CRED',  -- placeholder credential
    -- YYYYMM prefix derived from the lower Julian bound; the bucket URI is a placeholder
    file_uri_list   => 'https://objectstorage.../b/bucket/o/' || to_char(to_date(p_lower_val, 'J'), 'YYYYMM') || '/data',
    format          => json_object('type' value 'json'),
    query           => 'SELECT * FROM YOURTABLE WHERE YOURDATECOLUMN BETWEEN TO_DATE('||p_lower_val||',''J'') AND TO_DATE('||p_upper_val||',''J'')'
  );
END;
The resultant files are written as follows. When the export format is JSON, the filename is partly generated by the database, with a unique suffix appended to the supplied prefix.
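To check what was written, you can list the objects under the prefix with DBMS_CLOUD.LIST_OBJECTS. A sketch, where the credential, bucket URI and YYYYMM prefix are placeholders matching the example procedure:

```sql
-- List the objects written under the 202301 prefix (placeholder credential/bucket)
SELECT object_name, bytes
  FROM dbms_cloud.list_objects(
         'MY_CRED',
         'https://objectstorage.region.oraclecloud.com/n/namespace/b/bucket/o/202301/');
```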
The example PL/SQL block below chunks the source; the task is then executed for each chunk which, as shown above, exports to OCI Object Storage.
DECLARE
  l_task VARCHAR2(30) := 'test_task3';
  l_stmt CLOB;
  task_not_found EXCEPTION;
  PRAGMA exception_init(task_not_found, -29498);
BEGIN
  -- Drop the task if it is left over from a previous run
  BEGIN
    dbms_parallel_execute.drop_task(task_name => l_task);
  EXCEPTION
    WHEN task_not_found THEN NULL;
  END;

  dbms_parallel_execute.create_task(task_name => l_task);

  -- One chunk per month; start_id and end_id are Julian day numbers.
  -- The DATE literals are example bounds: substitute your own range.
  l_stmt := 'select to_char(add_months(date ''2023-01-01'', level - 1), ''J'') as start_id,
                    to_char(add_months(date ''2023-01-01'', level) - 1, ''J'') as end_id
               from dual
            connect by level <= months_between(date ''2023-12-01'', date ''2023-01-01'') + 1';

  dbms_parallel_execute.create_chunks_by_sql(task_name => l_task,
                                             sql_stmt  => l_stmt,
                                             by_rowid  => false);

  dbms_parallel_execute.run_task(task_name      => l_task,
                                 sql_stmt       => 'BEGIN export_data_initial(:start_id, :end_id); END;',
                                 language_flag  => dbms_sql.native,
                                 parallel_level => 4);
END;
The above example uses Julian dates to chunk by month. You can use the create_chunks_by_sql procedure to create whatever chunks suit your data, and use the date inside the export procedure to build the Object Storage prefix for each chunk.
For debugging you can use the USER_PARALLEL_EXECUTE_CHUNKS view, which has information on each chunk including the STATUS and ERROR_MESSAGE columns. The DBMS_PARALLEL_EXECUTE package also has a RESUME_TASK procedure, which resumes processing the chunks that have not been processed or that finished in an error state.
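A quick sketch of the debugging flow, using the task name from the example above:

```sql
-- Inspect chunk progress and any errors for the task
SELECT chunk_id, status, start_id, end_id, error_message
  FROM user_parallel_execute_chunks
 WHERE task_name = 'test_task3'
 ORDER BY chunk_id;

-- Retry chunks that failed or were never processed
BEGIN
  dbms_parallel_execute.resume_task(task_name => 'test_task3');
END;
/
```

Because each chunk is independent, resuming only re-runs the failed date ranges; the already-exported months are left untouched.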
Thanks for reading; I hope you find this interesting and that it stimulates some thoughts and ideas. As you have seen, the DBMS_CLOUD package provides lots of very cool procedures for interacting with object storage from the Oracle database, and combined with DBMS_PARALLEL_EXECUTE you can do some really useful things.