
API

The Indexer API is not generally public and may not be accessible outside of the OSDU environment.

Required roles

The Indexer service requires users (and service accounts) to have dedicated roles in order to use it. Users must be members of one of the following groups:

  • users.datalake.viewers
  • users.datalake.editors
  • users.datalake.admins
  • users.datalake.ops

Roles can be assigned using the Entitlements Service. Please look at the API documentation for specific requirements.

In addition to service roles, users must be members of data groups to access the data.

Required headers

The OSDU Data Platform stores data in different partitions, depending on the different accounts in the OSDU system.

A user may belong to more than one account. After logging into the OSDU portal, you need to select the account you wish to make active. Likewise, when calling the Indexer APIs, you need to specify the active account in the data-partition-id header. The correct data-partition-id can be obtained from the CFS services. The data-partition-id scopes the request to the mapped partition. For example:

data-partition-id: opendes

Optional headers

The correlation-id is a traceable ID used to track the journey of a single request. Its value can be any GUID, passed in the correlation-id header. It is best practice to provide the correlation-id so the request can be tracked through all the services.

correlation-id: 1e0fef08-22fd-49b1-a5cc-dffa21bc0b70

If your service initiates the request, it should generate an ID. If the correlation-id is not provided, the service generates a new one so that the request remains traceable.
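The following Python sketch shows one way a client could follow this convention: reuse a correlation-id it received, or generate a new GUID when it initiates the request itself. The helper name build_indexer_headers is hypothetical; only the header names come from this page.

```python
import uuid
from typing import Optional


def build_indexer_headers(token: str, partition_id: str,
                          correlation_id: Optional[str] = None) -> dict:
    """Assemble request headers for an Indexer call (hypothetical helper)."""
    # Reuse an incoming correlation-id so the whole journey shares one ID;
    # otherwise generate a fresh GUID for a request this client initiates.
    if correlation_id is None:
        correlation_id = str(uuid.uuid4())
    return {
        "authorization": f"Bearer {token}",
        "content-type": "application/json",
        "data-partition-id": partition_id,
        "correlation-id": correlation_id,
    }


# Passing the ID along from an upstream request keeps the trace intact.
headers = build_indexer_headers("<JWT>", "opendes",
                                "1e0fef08-22fd-49b1-a5cc-dffa21bc0b70")
```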

The x-collaboration header is used by a new feature that searches records in different namespaces. It contains the id and application parameters, separated by a comma. The id must be a UUID and indicates the namespace to search in. More information about the collaboration header can be found here.

x-collaboration: id=96d5550e-2b5e-4b84-825c-646339ee5fc7,application=pws
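A minimal Python sketch for building and validating an x-collaboration value. The helper names are hypothetical; the format (id=<uuid>,application=<name>) is taken from the example above, and the parser assumes neither value contains a comma.

```python
import uuid


def build_collaboration_header(namespace_id: str, application: str) -> str:
    """Format an x-collaboration value as "id=<uuid>,application=<name>"."""
    # uuid.UUID raises ValueError for anything that is not a valid UUID,
    # mirroring the rule that the id must be a UUID.
    canonical_id = str(uuid.UUID(namespace_id))
    return f"id={canonical_id},application={application}"


def parse_collaboration_header(value: str) -> dict:
    """Split an x-collaboration value into its id and application parameters."""
    params = dict(part.split("=", 1) for part in value.split(","))
    uuid.UUID(params["id"])  # validate the namespace id
    return params
```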

API Reference

Version info endpoint

Provides build and Git-related information.

Request

GET /api/indexer/v2/info HTTP/1.1

Example response

{
  "groupId": "org.opengroup.osdu",
  "artifactId": "indexer-gc",
  "version": "0.10.0-SNAPSHOT",
  "buildTime": "2021-07-09T14:29:51.584Z",
  "branch": "feature/GONRG-2681_Build_info",
  "commitId": "7777",
  "commitMessage": "Added copyright to version info properties file",
  "connectedOuterServices": [
    {
      "name": "elasticSearch",
      "version": "..."
    },
    {
      "name": "redis",
      "version": "..."
    }
  ]
}

This endpoint reads information from the files generated by the spring-boot-maven-plugin and git-commit-id-plugin plugins. The paths to the generated files must be specified in the matching properties:

  • version.info.buildPropertiesPath
  • version.info.gitPropertiesPath
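As an illustration, the hypothetical helper below condenses an info response into one line, for example to log which build is deployed. It only parses a response that has already been fetched; the field names are those shown in the example response.

```python
import json


def summarize_version_info(info: dict) -> str:
    """Condense a /api/indexer/v2/info response into one line."""
    services = ", ".join(
        f"{s['name']} {s['version']}" for s in info.get("connectedOuterServices", [])
    )
    return (f"{info['artifactId']} {info['version']} "
            f"(branch {info['branch']}, commit {info['commitId']}); "
            f"services: {services or 'none'}")


# Trimmed copy of the example response above.
sample = json.loads("""{
  "artifactId": "indexer-gc", "version": "0.10.0-SNAPSHOT",
  "branch": "feature/GONRG-2681_Build_info", "commitId": "7777",
  "connectedOuterServices": [{"name": "elasticSearch", "version": "..."},
                             {"name": "redis", "version": "..."}]
}""")
print(summarize_version_info(sample))
```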

Reindex

Reindex a 'kind'

The Reindex kind API allows users to re-index a kind without re-ingesting the records via the Storage API. Reindexing a kind is an asynchronous operation: when a user calls this API, it responds with HTTP 200 if it can launch the re-indexing, or with an appropriate error code if it cannot. The current indexing status can be tracked by calling the Search API with a query for this particular kind. Please be advised that re-indexing may take anywhere from a few seconds to a few hours, because multiple factors contribute to latency, such as the number of records in the kind and the current load on the Indexer service.

Request

POST /api/indexer/v2/reindex HTTP/1.1
{
  "kind": "opendes:welldb:wellbore:1.0.0"
}
**Curl**
curl --request POST \
  --url 'https://<host>/api/indexer/v2/reindex' \
  --header 'accept: application/json' \
  --header 'authorization: Bearer <JWT>' \
  --header 'content-type: application/json' \
  --header 'data-partition-id: opendes' \
  --data '{
  "kind": "opendes:welldb:wellbore:1.0.0"
}'


Prerequisite

Users must be members of the users.datalake.admins or users.datalake.ops group.

Query parameters

force_clean
  (optional, Boolean) If there is any inconsistency between the storage records and the index records, use this query parameter to synchronize them. If true, the API drops the current index data, applies the latest schema changes and re-indexes the records. If false, the API applies the latest schema and overwrites records with the same ids. The default value is false.

Request body

kind
  (required, String) Kind to be re-indexed.
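A sketch of preparing the same request in Python with force_clean set, using only the standard library. The base URL is a placeholder, the helper name is hypothetical, and sending the Boolean as the lowercase string "true"/"false" is an assumption about how the service parses it.

```python
import json
import urllib.parse
import urllib.request


def build_reindex_kind_request(base_url: str, token: str, partition_id: str,
                               kind: str, force_clean: bool = False) -> urllib.request.Request:
    """Prepare (but do not send) a POST /api/indexer/v2/reindex request."""
    # force_clean travels as a query parameter; only the kind goes in the body.
    query = urllib.parse.urlencode({"force_clean": str(force_clean).lower()})
    body = json.dumps({"kind": kind}).encode("utf-8")
    return urllib.request.Request(
        f"{base_url}/api/indexer/v2/reindex?{query}",
        data=body,
        method="POST",
        headers={
            "accept": "application/json",
            "authorization": f"Bearer {token}",
            "content-type": "application/json",
            "data-partition-id": partition_id,
        },
    )


req = build_reindex_kind_request("https://<host>", "<JWT>", "opendes",
                                 "opendes:welldb:wellbore:1.0.0", force_clean=True)
# urllib.request.urlopen(req) would send it; HTTP 200 means re-indexing was launched.
```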

Reindex given records

The Reindex records API allows users to re-index specific records by providing their record ids, without re-ingesting the records via the Storage API. Reindexing records is an asynchronous operation: when a user calls this API, it responds with HTTP 202 if it can launch the re-indexing, or with an appropriate error code if it cannot. The response body indicates which of the given records were re-indexed and which were not found in storage. Up to 1000 records are supported per API call.

Request

POST /api/indexer/v2/reindex/records HTTP/1.1
{
  "recordIds": ["opendes:work-product-component--WellLog:17763fcc18864f4f8eab62e320f8913d", "opendes:work-product-component--WellLog:566edebc-1a9f-4f4d-9a30-ed458e959ac7"]
}
**Curl**
curl --request POST \
  --url 'https://<host>/api/indexer/v2/reindex/records' \
  --header 'accept: application/json' \
  --header 'authorization: Bearer <JWT>' \
  --header 'content-type: application/json' \
  --header 'data-partition-id: opendes' \
  --data '{
  "recordIds": ["opendes:work-product-component--WellLog:17763fcc18864f4f8eab62e320f8913d", "opendes:work-product-component--WellLog:566edebc-1a9f-4f4d-9a30-ed458e959ac7"]
}'


Prerequisite

Users must be members of the users.datalake.admins or users.datalake.ops group.

Request body

recordIds
  (required, Array of String) Storage records to be re-indexed.

Example response

{
  "reIndexedRecords": [
    "opendes:work-product-component--WellLog:566edebc-1a9f-4f4d-9a30-ed458e959ac7"
  ],
  "notFoundRecords": [
    "opendes:work-product-component--WellLog:17763fcc18864f4f8eab62e320f8913d"
  ]
}
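Because each call accepts at most 1000 record ids, a longer list has to be split across several calls. The hypothetical helpers below show one way to batch the ids and merge the per-call response bodies; sending each batch (for example with urllib.request) is left out of the sketch.

```python
from typing import Dict, Iterable, Iterator, List

MAX_RECORDS_PER_CALL = 1000  # documented limit for /api/indexer/v2/reindex/records


def batch_record_ids(record_ids: Iterable[str],
                     size: int = MAX_RECORDS_PER_CALL) -> Iterator[List[str]]:
    """Split record ids into lists of at most `size` ids, one per request body."""
    batch: List[str] = []
    for record_id in record_ids:
        batch.append(record_id)
        if len(batch) == size:
            yield batch
            batch = []
    if batch:
        yield batch


def merge_reindex_responses(responses: Iterable[dict]) -> Dict[str, List[str]]:
    """Combine reIndexedRecords / notFoundRecords across several calls."""
    merged: Dict[str, List[str]] = {"reIndexedRecords": [], "notFoundRecords": []}
    for response in responses:
        for key in merged:
            merged[key].extend(response.get(key, []))
    return merged
```

Each batch becomes the recordIds array of one request, and the merged result has the same shape as a single response.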

Reindex all records

The FullReindex API allows users to re-index all the records in a given data partition without re-ingesting them via the Storage API. Reindexing all records is an asynchronous operation: when a user calls this API, it responds with HTTP 200 if it can launch the re-indexing, or with an appropriate error code if it cannot. The response body indicates which given records were re-indexed and which ones were not found in storage.

Request

PATCH /api/indexer/v2/reindex HTTP/1.1
**Curl**
curl --request PATCH \
  --url 'https://<host>/api/indexer/v2/reindex' \
  --header 'accept: application/json' \
  --header 'authorization: Bearer <JWT>' \
  --header 'content-type: application/json' \
  --header 'data-partition-id: opendes' 


Prerequisite

Users must be members of the users.datalake.ops group.

Query parameters

force_clean
  (optional, Boolean) If a kind has been previously indexed with a schema and you wish to apply the latest schema changes before re-indexing, use this query parameter. If true, the API drops the current index schema, applies the latest schema changes and re-indexes the records. If false, the API uses the same schema and overwrites records with the same ids. The default value is false.
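A Python sketch of preparing the full-reindex call, with a placeholder base URL and a hypothetical helper name. It appends force_clean only when it is true and otherwise relies on the documented default of false; no request body is sent.

```python
import urllib.request


def build_full_reindex_request(base_url: str, token: str, partition_id: str,
                               force_clean: bool = False) -> urllib.request.Request:
    """Prepare (but do not send) a PATCH /api/indexer/v2/reindex request."""
    url = f"{base_url}/api/indexer/v2/reindex"
    if force_clean:
        # Omitting the parameter leaves the documented default (false) in effect.
        url += "?force_clean=true"
    return urllib.request.Request(
        url,
        method="PATCH",  # data defaults to None, so no body is sent
        headers={
            "accept": "application/json",
            "authorization": f"Bearer {token}",
            "data-partition-id": partition_id,
        },
    )
```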

Delete API

The Delete API deletes the index for a specific kind. Only members of the users.datalake.ops Entitlements group can call this API.

DELETE /api/indexer/v2/index?kind=opendes:welldb:wellbore:1.0.0
**Curl**
curl --request DELETE \
  --url 'https://<host>/api/indexer/v2/index?kind=opendes:welldb:wellbore:1.0.0' \
  --header 'authorization: Bearer <JWT>' \
  --header 'content-type: application/json' \
  --header 'data-partition-id: opendes' 
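A sketch of preparing the delete call in Python. The helper is hypothetical and the base URL a placeholder; it percent-encodes the kind with urllib.parse.urlencode, which is not strictly required for colons but keeps the query string well-formed for any kind value.

```python
import urllib.parse
import urllib.request


def build_delete_index_request(base_url: str, token: str, partition_id: str,
                               kind: str) -> urllib.request.Request:
    """Prepare (but do not send) a DELETE /api/indexer/v2/index request."""
    # The kind is passed as a query parameter; urlencode escapes reserved characters.
    query = urllib.parse.urlencode({"kind": kind})
    return urllib.request.Request(
        f"{base_url}/api/indexer/v2/index?{query}",
        method="DELETE",
        headers={
            "authorization": f"Bearer {token}",
            "data-partition-id": partition_id,
        },
    )
```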


Data Partition provision

Configures the Search backend for a data partition.

PUT /api/indexer/v2/partitions/provision HTTP/1.1
**Curl**
curl --request PUT \
  --url 'https://<host>/api/indexer/v2/partitions/provision' \
  --header 'accept: application/json' \
  --header 'authorization: Bearer <JWT>' \
  --header 'content-type: application/json' \
  --header 'data-partition-id: opendes'


Prerequisite

Users must be members of the users.datalake.ops group.

NOTE: This API should be run at least once during data partition provisioning to configure the required resources and settings.

Schema change

Listener endpoint for schema change events produced by the Schema service.

Note: This is an internal API and should not be exposed publicly.

Request

POST /api/indexer/v2/_dps/task-handlers/schema-worker HTTP/1.1
{
    "messageId": "676894654",
    "publishTime": "2017-03-19T00:00:00",
    "attributes": {
        "data-partition-id": "opendes",
        "correlation-id": "b5a281bd-f59d-4db2-9939-b2d85036fc7e"
    },
    "data": "[{\"kind\":\"slb:indexer:test-data--SchemaEventIntegration:1.0.0\",\"op\":\"create\"}]"
}

Request body

messageId
  (optional, String) Event message id.

publishTime
  (optional, String) Event publish time.

attributes.data-partition-id
  (required, String) Data partition id for which this message is targeted.

attributes.correlation-id
  (optional, String) Correlation-id to enable tracing.

data
  (required, String) Schema change event message as a JSON string. Only create and update events are supported.
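Note that data is a JSON-encoded string rather than a nested array, so the event list is serialized once on its own and again as part of the envelope. The hypothetical helper below shows this in Python and rejects operations other than create and update; it is a sketch for building test envelopes locally, not a description of how the Schema service publishes events.

```python
import json
from typing import List, Optional

SUPPORTED_OPS = {"create", "update"}  # per the data field description above


def build_schema_event(partition_id: str, changes: List[dict],
                       correlation_id: Optional[str] = None) -> dict:
    """Wrap schema change events in the schema-worker request envelope."""
    for change in changes:
        if change.get("op") not in SUPPORTED_OPS:
            raise ValueError(f"unsupported schema event op: {change.get('op')!r}")
    attributes = {"data-partition-id": partition_id}
    if correlation_id is not None:
        attributes["correlation-id"] = correlation_id
    # messageId and publishTime are optional and omitted here.
    return {
        "attributes": attributes,
        # "data" holds the event list as a JSON string, not as a nested array.
        "data": json.dumps(changes),
    }
```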

Record change

Listener endpoint for record change events produced by the Storage service.

Note: This is an internal API and should not be exposed publicly.

Request

POST /api/indexer/v2/_dps/task-handlers/index-worker HTTP/1.1
{
    "messageId": "676895654",
    "publishTime": "2024-03-19T00:00:00",
    "attributes": {
        "data-partition-id": "opendes",
        "correlation-id": "b5a281bd-f59d-4db2-9939-b2d85036fc7e"
    },
    "data": "[{\"id\":\"opendes:master-data--Basin:0cfbffb72b2344aea5a8f92bbffd3953\",\"kind\":\"osdu:wks:master-data--Basin:1.0.0\",\"op\":\"create\"}]"
}

Request body

messageId
  (optional, String) Event message id.

publishTime
  (optional, String) Event publish time.

attributes.data-partition-id
  (required, String) Data partition id for which this message is targeted.

attributes.correlation-id
  (optional, String) Correlation-id to enable tracing.

data
  (required, String) Record change event message as a JSON string. Supported record-change event payload samples can be found here.
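On the receiving side, the data string has to be decoded a second time after the envelope itself is parsed. A minimal sketch, assuming each event carries id, kind and op as in the example above; the helper name is hypothetical.

```python
import json
from collections import defaultdict
from typing import Dict, List


def group_record_changes(message: dict) -> Dict[str, List[str]]:
    """Decode an index-worker message and group record ids by operation."""
    # "data" is itself a JSON string, so it needs its own json.loads call.
    changes = json.loads(message["data"])
    grouped: Dict[str, List[str]] = defaultdict(list)
    for change in changes:
        grouped[change["op"]].append(change["id"])
    return dict(grouped)
```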

Monitor indexing progress

Once the index-worker processes a record change event, it publishes the event processing (indexing) status to the indexing-progress topic. Consumers can subscribe to this topic to monitor indexing progress for a data partition.

Please be advised that Core services do not provide any additional message-level filtering, so consumers will see status updates for all kinds in a given data partition.

Troubleshoot Indexing Issues

Get indexing status

The Indexer service adds internal metadata to each record that registers its indexing status. The metadata includes the status and the date and time of the last indexing, which helps you inspect the details of indexing. The format of the index metadata block is as follows:

{
  "index": {
    "trace": [
      String,
      String
    ],
    "statusCode": Integer,
    "lastUpdateTime": Datetime
  }
}

Example:

{
  "results": [
    {
      "index": {
        "trace": [
          "datetime parsing error: unknown format for attribute: endDate | value: 9000-01-01T00:00:00.0000000",
          "datetime parsing error: unknown format for attribute: startDate | value: 1990-01-01T00:00:00.0000000"
        ],
        "statusCode": 400,
        "lastUpdateTime": "2018-11-16T01:44:08.687Z"
      }
    }
  ],
  "totalCount": 31895
} 

Details of the index block:

  1. trace: This field collects all the issues encountered during indexing, concatenated using '|'. This is a string field.

  2. statusCode: This field indicates the category of the error. This is an integer field. It can have the following values:
       - 200: All OK.
       - 404: The schema is missing in the Schema service.
       - 400: Some fields were not properly mapped to the defined schema, for example the schema defines a field as int but the input record has a text value for that attribute.

  3. lastUpdateTime: This field captures the last time the record was updated by the Indexer service. This is a datetime field, so you can run range queries on it.
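A small Python sketch that turns the index block of one search result into a readable status. The helper and its wording are hypothetical; the status codes are the ones listed above, and any other code is reported as unrecognized.

```python
from typing import List, Tuple

# index.statusCode values as documented in the list above.
STATUS_MEANINGS = {
    200: "all OK",
    400: "fields not properly mapped to the schema",
    404: "schema missing in Schema service",
}


def describe_index_status(record: dict) -> Tuple[str, List[str]]:
    """Return a readable status and the trace messages for one search result."""
    if "index" not in record:
        # Search omits this block unless "index" is listed in returnedFields.
        raise KeyError("no 'index' block; add \"index\" to returnedFields")
    index = record["index"]
    code = index.get("statusCode")
    meaning = STATUS_MEANINGS.get(code, f"unrecognized statusCode {code}")
    trace = index.get("trace", [])
    if isinstance(trace, str):
        # Keep a plain string as one message; messages themselves may contain '|'.
        trace = [trace] if trace else []
    return meaning, list(trace)
```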

You can query the index status using the following example query:

POST /search/v2/query HTTP/1.1
{
  "kind": "*:*:*:*",
  "query": "index.statusCode:404",
  "limit": 1000,
  "returnedFields": [ "id", "index" ]
}
**Curl**
curl --request POST \
  --url 'https://<host>/search/v2/query' \
  --header 'authorization: Bearer <JWT>' \
  --header 'content-type: application/json' \
  --header 'data-partition-id: opendes' \
  --data '{"kind": "*:*:*:*", "query": "index.statusCode:404", "limit": 1000, "returnedFields": ["id", "index"]}'


Note: By default, the API response excludes the index attribute block. You must include the index field in returnedFields to see it in the response.

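The above query returns up to 1000 records whose schema is missing in the Schema service (statusCode 404). To find records with field mapping mismatches, query index.statusCode:400 instead.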
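The same query body can be generated for any status code, for example 400 to find field mismatches. A minimal sketch with a hypothetical helper name; it always requests the index field, since that field is otherwise excluded from the response.

```python
import json


def build_index_status_query(status_code: int, limit: int = 1000) -> dict:
    """Build a Search query body that finds records with a given index.statusCode."""
    return {
        "kind": "*:*:*:*",
        "query": f"index.statusCode:{status_code}",
        "limit": limit,
        # "index" must be requested explicitly; it is excluded by default.
        "returnedFields": ["id", "index"],
    }


# Body for records whose fields did not match the schema (statusCode 400).
print(json.dumps(build_index_status_query(400)))
```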

Please refer to the Search service documentation for examples on different kinds of search queries.