Discovering Entities in Documents and Images with OCI Vision and Data Integration

Example table in a PDF document from OCI AI Vision service
Vision API in action within OCI Console
  1. the POST API call to create the document job
  2. the retrieval of the document job OCID
  3. polling the document job, using the OCID from step 2, until it is completed
  4. support for terminating the document job
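
The four steps can be sketched as a small Python loop. This is only an illustration of the control flow, not how Data Integration implements it: the three callables (create_job, get_job, cancel_job) are hypothetical stand-ins for the REST calls, and the timeout behaviour (max_polls, poll_interval_s) is our own assumption.

```python
import time
from typing import Any, Callable, Dict

# Terminal states as named in this article's polling condition.
TERMINAL_STATES = {"SUCCEEDED", "FAILED", "TERMINATED"}


def run_document_job(
    create_job: Callable[[], Dict[str, Any]],
    get_job: Callable[[str], Dict[str, Any]],
    cancel_job: Callable[[str], Any],
    poll_interval_s: float = 5.0,
    max_polls: int = 60,
) -> Dict[str, Any]:
    """Create a job, poll it to a terminal state, and cancel it on timeout."""
    job_id = create_job()["id"]      # steps 1 and 2: create, then read the OCID
    for _ in range(max_polls):       # step 3: poll using the OCID
        job = get_job(job_id)
        if job["lifecycleState"] in TERMINAL_STATES:
            return job
        time.sleep(poll_interval_s)
    cancel_job(job_id)               # step 4: terminate a job that ran too long
    raise TimeoutError(f"document job {job_id} did not finish in time")
```

Passing the calls in as functions keeps the sketch independent of any particular HTTP client or signing method.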

Creating the Document Job

Below we create a new REST task (see the OCI Data Integration documentation) and use the CreateDocumentJob API, which is a REST POST API on the Vision service. The URL used below is

Create a REST Task in Data Integration
REGION parameter has been added.
Define REGION parameter value
Content-Type header with value.
Accept header with value.
{
  "features": [
    {"featureType": "TEXT_DETECTION", "generateSearchablePdf": true},
    {"featureType": "DOCUMENT_CLASSIFICATION", "maxResults": 5},
    {"featureType": "LANGUAGE_CLASSIFICATION", "maxResults": 5},
    {"featureType": "KEY_VALUE_DETECTION"},
    {"featureType": "TABLE_DETECTION"}
  ],
  "inputLocation": {
    "sourceType": "OBJECT_LIST_INLINE_INPUT_LOCATION",
    "objectLocations": [
      {
        "bucketName": "a_delta_archive",
        "namespaceName": "mynamespace",
        "objectName": "quarter_numbers_abc.pdf"
      },
      {
        "bucketName": "a_delta_archive",
        "namespaceName": "mynamespace",
        "objectName": "quarter_numbers_xyz.pdf"
      }
    ]
  },
  "outputLocation": {
    "bucketName": "a_delta_archive",
    "namespaceName": "mynamespace",
    "prefix": "visionout"
  },
  "compartmentId": "ocid1.compartment.oc1..mycompartment",
  "displayName": "visiondata",
  "isZipOutputEnabled": false
}
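
If the list of documents changes from run to run, the same request body can be generated rather than typed by hand. Here is a minimal sketch in Python; the helper name build_document_job_body and its parameters are our own invention, and it assumes the input and output share one bucket, as in the body above.

```python
import json
from typing import Any, Dict, List


def build_document_job_body(
    namespace: str,
    bucket: str,
    object_names: List[str],
    compartment_id: str,
    output_prefix: str = "visionout",
    display_name: str = "visiondata",
) -> Dict[str, Any]:
    """Build a request body with the same shape as the JSON body above."""
    return {
        "features": [
            {"featureType": "TEXT_DETECTION", "generateSearchablePdf": True},
            {"featureType": "DOCUMENT_CLASSIFICATION", "maxResults": 5},
            {"featureType": "LANGUAGE_CLASSIFICATION", "maxResults": 5},
            {"featureType": "KEY_VALUE_DETECTION"},
            {"featureType": "TABLE_DETECTION"},
        ],
        "inputLocation": {
            "sourceType": "OBJECT_LIST_INLINE_INPUT_LOCATION",
            # One entry per document to analyse.
            "objectLocations": [
                {"bucketName": bucket, "namespaceName": namespace, "objectName": name}
                for name in object_names
            ],
        },
        "outputLocation": {
            "bucketName": bucket,
            "namespaceName": namespace,
            "prefix": output_prefix,
        },
        "compartmentId": compartment_id,
        "displayName": display_name,
        "isZipOutputEnabled": False,
    }


body = build_document_job_body(
    "mynamespace",
    "a_delta_archive",
    ["quarter_numbers_abc.pdf", "quarter_numbers_xyz.pdf"],
    "ocid1.compartment.oc1..mycompartment",
)
print(json.dumps(body, indent=2))
```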
SYS.RESPONSE_STATUS >= 200 AND SYS.RESPONSE_STATUS < 300 AND CAST(json_path(SYS.RESPONSE_PAYLOAD, '$.lifecycleState') AS String) == 'SUCCEEDED'
Success condition
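
To make the intent of the success condition explicit, here is a rough Python equivalent. It is only a sketch: is_success is a hypothetical name, and treating any 2xx status as the acceptable range is our assumption.

```python
from typing import Any, Dict


def is_success(status: int, payload: Dict[str, Any]) -> bool:
    """True when the HTTP status is 2xx and the job reports SUCCEEDED."""
    return 200 <= status < 300 and payload.get("lifecycleState") == "SUCCEEDED"
```

A 200 response whose job is still running, or has failed, does not count as success, which is why the condition checks the payload as well as the status code.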

Retrieving the Document Job OCID

The documentJobs API is an asynchronous call, so if we want this task to wait until the Vision job is actually complete, we can use the polling feature within Data Integration. Select the check box to configure a polling and termination condition…

Define a polling API call
CAST(json_path(SYS.RESPONSE_PAYLOAD, '$.id') AS String)
Get the document job OCID
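
The expression above reads the top-level id field from the response payload and casts it to a string. A rough Python equivalent, with a made-up payload for illustration (the OCID value is hypothetical):

```python
import json


def extract_job_ocid(response_payload: str) -> str:
    """Return the top-level 'id' field of a JSON response as a string."""
    return str(json.loads(response_payload)["id"])


# Hypothetical response payload, trimmed to the fields used in this article.
sample = '{"id": "ocid1.example.job", "lifecycleState": "ACCEPTED"}'
print(extract_job_ocid(sample))
```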

Polling on the Document Job

To poll the job, we use the GetDocumentJob API from Vision. The URL can reference the document job OCID expression defined above by name, using the #{...} syntax:

https://vision.aiservice.${REGION}.oci.oraclecloud.com/20220125/documentJobs/#{DOCUMENT_JOB_OCID}
Define the polling API
CAST(json_path(SYS.RESPONSE_PAYLOAD, '$.lifecycleState') AS String) != 'SUCCEEDED' AND CAST(json_path(SYS.RESPONSE_PAYLOAD, '$.lifecycleState') AS String) != 'FAILED' AND CAST(json_path(SYS.RESPONSE_PAYLOAD, '$.lifecycleState') AS String) != 'TERMINATED'
Define the polling condition
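
The polling condition reads as "keep polling while the job is in none of the three finished states". A Python sketch of the same check follows; should_keep_polling is a hypothetical name, and how a missing lifecycleState is handled here (keep polling) is our assumption, not documented behaviour.

```python
from typing import Any, Dict

# The three states the polling condition treats as finished.
FINISHED_STATES = ("SUCCEEDED", "FAILED", "TERMINATED")


def should_keep_polling(payload: Dict[str, Any]) -> bool:
    """True while the job has not reached one of the finished states."""
    return payload.get("lifecycleState") not in FINISHED_STATES
```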

Terminating a Document Job

We can define the termination call similarly, using the CancelDocumentJob API:

https://vision.aiservice.${REGION}.oci.oraclecloud.com/20220125/documentJobs/#{DOCUMENT_JOB_OCID}/actions/cancel
Define the terminate/cancel API
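
Both URLs combine a task parameter (${REGION}) with an expression (#{DOCUMENT_JOB_OCID}). To show what the resolved URL looks like, here is a small Python sketch that does the substitution by hand. Data Integration performs this for us at run time; resolve_url is a hypothetical helper, and the region and OCID values are examples only.

```python
CANCEL_URL = (
    "https://vision.aiservice.${REGION}.oci.oraclecloud.com"
    "/20220125/documentJobs/#{DOCUMENT_JOB_OCID}/actions/cancel"
)


def resolve_url(template: str, region: str, job_ocid: str) -> str:
    """Substitute the REGION parameter and DOCUMENT_JOB_OCID expression."""
    return (
        template.replace("${REGION}", region)
        .replace("#{DOCUMENT_JOB_OCID}", job_ocid)
    )


# Example values only: the OCID below is made up.
print(resolve_url(CANCEL_URL, "us-ashburn-1", "ocid1.example.job"))
```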
Use resource principal
REST task is completed

David Allan

Architect at @Oracle developing cloud services for data. Connect on Twitter @i_m_dave