API

Search API access

Required roles

The Search service requires that users have dedicated roles in order to use it. Users must be a member of:

  • users.datalake.viewers,
  • users.datalake.editors, or
  • users.datalake.admins.

Roles can be assigned using the Entitlements service. Please look at the API documentation for specific requirements.

In addition to service roles, users must be a member of data groups to access the data.

Required headers

The OSDU Data Platform stores data in different partitions, depending on the different accounts in the OSDU system.

A user may belong to more than one account. As a user, after logging into the OSDU portal, you must select the account you wish to be active. Likewise, when using the Search APIs, you must specify the active account in the header called Data-Partition-Id. The Data-Partition-Id enables the search within the mapped partition.

Data-Partition-Id: opendes

Optional headers

The Correlation-Id is a traceable ID used to track the journey of a single request. Its value can be a GUID, passed in the header under the key Correlation-Id. It is a best practice to provide the Correlation-Id so that the request can be tracked through all the services.

Correlation-Id: 1e0fef08-22fd-49b1-a5cc-dffa21bc0b70

If the service is initiating the request, an ID should be generated. If the Correlation-Id is not provided, then a new ID will be generated by the service so that the request will be traceable.

The x-collaboration header supports a new feature that searches records in different namespaces. It contains id and application parameters, separated by a comma. The id must be a UUID, and it identifies the namespace to search. More information about the collaboration header can be found here.

x-collaboration: id=96d5550e-2b5e-4b84-825c-646339ee5fc7,application=pws
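As an illustration, the required and optional headers above could be assembled in Python as follows. This is a minimal sketch: the function name build_headers and its parameters are hypothetical, and the Authorization header is taken from the cURL examples in this document rather than from this section.

```python
import uuid


def build_headers(token, partition_id, correlation_id=None, collaboration=None):
    """Assemble HTTP headers for a Search API call.

    `collaboration` is an optional (namespace_id, application) pair.
    All names here are illustrative, not part of the Search service.
    """
    headers = {
        "Authorization": f"Bearer {token}",
        "Content-Type": "application/json",
        "Data-Partition-Id": partition_id,
        # Generate an ID when the caller does not supply one, so the
        # request can be traced from the client side as well.
        "Correlation-Id": correlation_id or str(uuid.uuid4()),
    }
    if collaboration is not None:
        namespace_id, application = collaboration
        # The id must be a UUID; uuid.UUID raises ValueError otherwise.
        uuid.UUID(str(namespace_id))
        headers["x-collaboration"] = f"id={namespace_id},application={application}"
    return headers


headers = build_headers(
    "<JWT>",
    "opendes",
    collaboration=("96d5550e-2b5e-4b84-825c-646339ee5fc7", "pws"),
)
```

Generating the Correlation-Id on the client is optional, since the service creates one when it is missing, but it lets the caller log the same ID that the services will use.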

Permissions

Endpoint URL | Method | Minimum permissions required | Data permissions required
/search/v2/query | POST | users.datalake.viewers | Yes
/search/v2/query_with_cursor | POST | users.datalake.viewers | Yes

Normalization

Retrieved data from the OSDU Data Platform is normalized to a common standard that allows for comparison from multiple data sources. We currently support conversion for only Unit, CRS, and DateTime, whose common standards are in SI, WGS84, and UTC respectively.

For any attribute that has an AbstractSpatialLocation schema reference, the coordinates can be supplied in an AsIngestedCoordinates attribute, using an AbstractAnyCrsFeatureCollection schema reference, or in a WGS84Coordinates attribute, using an AbstractFeatureCollection schema reference. However, the Search service cannot use the AsIngestedCoordinates in a meaningful way, so they are not indexed.

NOTE: See the indexer service for a feature flag (featureFlag.asIngestedCoordinates.enabled) which can be used to enable indexing of the AsIngestedCoordinates. When enabled, search queries can use a range query on the FirstPoint X and Y coordinates (starting with release M22).

The Indexer service uses the Storage service's frame of reference conversion API (/records:batch API) for conversion. If the Storage API returns valid converted WGS84Coordinates for the AsIngestedCoordinates, then the converted coordinates are indexed. If the conversion fails, then the Indexer does not index the shape in the WGS84Coordinates attribute; instead, it indexes a conversion error for the record with a 400 error code. Please refer to Get indexing status for details on index status. Only the WGS84Coordinates are returned in the search response.

How conversion is handled for AsIngestedCoordinates and WGS84Coordinates:

  1. If the WGS84Coordinates block is provided in the record, then we take the values from the block. The Storage /records:batch API does no conversion.
  2. If only the AsIngestedCoordinates block is provided in the record, then the Storage /records:batch API performs the conversion for the coordinates provided. If the conversion fails, then indexing of the attribute is skipped and an error message is indexed instead. You can query the indexing status using Get indexing status.
  3. If BOTH the AsIngestedCoordinates and WGS84Coordinates blocks are provided in the record, then the AsIngestedCoordinates block is ignored, and it takes the values from the WGS84Coordinates block. The Storage /records:batch API does no conversion.
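The three cases above can be sketched as a small decision function. This is not the Indexer's implementation: the function name, the convert callback (a stand-in for the Storage /records:batch conversion), and the use of ValueError to signal a failed conversion are all assumptions made for illustration. The attribute key Wgs84Coordinates follows the spelling used in the query examples in this document.

```python
def coordinates_to_index(spatial_location, convert):
    """Return (wgs84_coordinates, error) for one spatial location value.

    `convert` is a hypothetical stand-in for the Storage conversion: it
    takes the AsIngestedCoordinates block and returns WGS84 coordinates,
    or raises ValueError when the conversion fails.
    """
    wgs84 = spatial_location.get("Wgs84Coordinates")
    as_ingested = spatial_location.get("AsIngestedCoordinates")

    # Cases 1 and 3: a WGS84 block is present, so it is used as-is and
    # any AsIngestedCoordinates block is ignored. No conversion runs.
    if wgs84 is not None:
        return wgs84, None

    # Case 2: only AsIngestedCoordinates is present, so conversion runs.
    if as_ingested is not None:
        try:
            return convert(as_ingested), None
        except ValueError as exc:
            # Conversion failed: skip the shape, keep the error message.
            return None, str(exc)

    # Neither block is present: nothing to index.
    return None, None
```

Returning the error alongside the coordinates mirrors the behavior described above, where a failed conversion results in an indexed error message rather than an indexed shape.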

NOTE: If a storage record has correct frame of reference conversion information (meta block), then the record is always normalized and indexed according to the common standards mentioned above. Users can only query the Search service on standardized indexed records.

Query

The OSDU Data Platform search provides a JSON-style domain-specific language that you can use to execute queries. The Query request URL and example follow:

POST /search/v2/query HTTP/1.1
{
  "kind": "osdu:wks:master-data--Well:1.0.0",
  "query": "data.FacilityName:\"A34\"",
  "offset": 0,
  "limit": 30,
  "sort": {
    "field": ["id"],
    "order": ["ASC"]
  },
  "queryAsOwner": false,
  "spatialFilter": {
    "field": "data.SpatialLocation.Wgs84Coordinates",
    "byBoundingBox": {
      "topLeft": {
        "latitude": 90,
        "longitude": -180
      },
      "bottomRight": {
        "latitude": -90,
        "longitude": 180
      }
    }
  },
  "trackTotalCount": true,
  "returnedFields": [ "kind", "id", "data.FacilityName", "data.SpatialLocation.Wgs84Coordinates", "data.TechnicalAssuranceID" ]
}
cURL
curl --request POST \
  --url '/search/v2/query' \
  --header 'accept: application/json' \
  --header 'authorization: Bearer <JWT>' \
  --header 'content-type: application/json' \
  --header 'data-partition-id: opendes' \
  --data '{
  "kind": "osdu:wks:master-data--Well:1.0.0",
  "query": "data.FacilityName:\"A34\"",
  "offset": 0,
  "limit": 30,
  "sort": {
    "field": ["id"],
    "order": ["ASC"]
  },
  "queryAsOwner": false,
  "spatialFilter": {
    "field": "data.SpatialLocation.Wgs84Coordinates",
    "byBoundingBox": {
      "topLeft": {
        "latitude": 90,
        "longitude": -180
      },
      "bottomRight": {
        "latitude": -90,
        "longitude": 180
      }
    }
  },
  "trackTotalCount": true,
   "returnedFields": [ "kind", "id", "data.FacilityName", "data.SpatialLocation.Wgs84Coordinates", "data.TechnicalAssuranceID" ]
}'


Example response:

{
  "results": [
    {
      "data": {
        "SpatialLocation.Wgs84Coordinates": {
          "geometries": [
            {
              "coordinates": [
                173.2900972,
                -39.4324222
              ],
              "type": "point"
            }
          ],
          "type": "geometrycollection"
        },
        "TechnicalAssuranceID": "opendes:reference-data--TechnicalAssuranceType:Suitable:",
        "FacilityName": "A34"
      },
      "kind": "osdu:wks:master-data--Well:1.0.0",
      "id": "opendes:master-data--Well:ca3271c789964d54a1c4d873d2c1aef1"
    }
  ],
  "totalCount": 4644
}

Note: Once records have been successfully ingested by the Storage service, it can take at least 30 seconds for them to become searchable via the Search service in the OSDU Data Platform. Record-level indexing status can be retrieved via index status.
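The same request can be prepared from Python using only the standard library. The sketch below builds the request object without sending it; the host https://osdu.example.com and the function name build_query_request are hypothetical, and the token is left as a placeholder.

```python
import json
import urllib.request


def build_query_request(base_url, token, partition_id, body):
    """Prepare (but do not send) a POST to /search/v2/query."""
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/search/v2/query",
        data=json.dumps(body).encode("utf-8"),
        method="POST",
        headers={
            "Accept": "application/json",
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
            "Data-Partition-Id": partition_id,
        },
    )


request = build_query_request(
    "https://osdu.example.com",  # hypothetical host
    "<JWT>",
    "opendes",
    {
        "kind": "osdu:wks:master-data--Well:1.0.0",
        # json.dumps takes care of escaping the inner double quotes.
        "query": 'data.FacilityName:"A34"',
        "limit": 30,
        "returnedFields": ["kind", "id", "data.FacilityName"],
    },
)

# To execute the call against a real deployment:
# with urllib.request.urlopen(request) as response:
#     result = json.load(response)
```

Building the body as a Python dictionary and serializing it with json.dumps avoids hand-escaping the quotes inside the query string.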

Parameters

Parameter | Description
kind | The kind of records to query. kind is a unique identifier (or a tag) given to the schema. Kind is case-insensitive. For details about the schema, refer to Schema Service. In the query, kind is a required field, and its value can be a single schema identity or a list of schema identities, such as "osdu:wks:master-data--Well:1.0.0" or ["osdu:wks:master-data--Well:1.0.0", "osdu:wks:master-data--Wellbore:1.0.0"].
query | The query string is based on Lucene query string syntax, supplemented with a specific format for describing queries to fields of object arrays indexed with the nested hint. A query can have a maximum of 1024 clauses.
offset | The starting offset from which to return results.
limit | The maximum number of results to return from the given offset. If no limit is provided, then 10 items are returned. The minimum and maximum number of items that a query can fetch are 1 and 1000 respectively. To fetch a larger set of items, use the query_with_cursor API.
sort | Allows you to add one or more sorts on specific fields. The length of fields and the length of order must match. The order value must be either ASC or DESC (case insensitive). For more details and limitations about this feature, refer to Sort.
queryAsOwner | If true, the result only contains the records that the user owns. If false, the result contains all records that the user is entitled to see. The default value is false.
spatialFilter | A spatial filter to apply. See Geo-spatial queries for details.
trackTotalCount | If true, tracks the accurate count of records matching the query; otherwise the count is partial. Partial count queries are more performant. The default is false, in which case 10000 is returned when more than 10000 records match.
aggregateBy | Allows you to get the unique values of a given field. See Aggregate Queries.
returnedFields | Specifies the fields on which to project the results.

Important: Field names in request parameters are case-sensitive. Field values are case-insensitive, unless you are querying for an exact match with a keyword subfield for the attribute.

Note: The sum of offset and limit cannot be more than 10,000. See the Query with cursor topic for more efficient ways to do deep scrolling.
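A client can check these paging limits before sending a request. The following is a minimal sketch based on the limits stated in this section; the function name check_paging is hypothetical, and rejecting a negative offset is an assumption rather than documented behavior.

```python
MAX_LIMIT = 1000      # largest page size accepted by /query
MAX_WINDOW = 10_000   # offset + limit may not exceed this
DEFAULT_LIMIT = 10    # applied when no limit is given


def check_paging(offset=0, limit=None):
    """Return (offset, limit), or raise ValueError if a limit is exceeded."""
    if limit is None:
        limit = DEFAULT_LIMIT
    if offset < 0:
        # Assumption: negative offsets are treated as invalid.
        raise ValueError("offset must not be negative")
    if not 1 <= limit <= MAX_LIMIT:
        raise ValueError(f"limit must be between 1 and {MAX_LIMIT}")
    if offset + limit > MAX_WINDOW:
        raise ValueError("offset + limit exceeds 10,000; use query_with_cursor")
    return offset, limit
```

For example, check_paging(9000, 1000) passes, while check_paging(9500, 1000) raises because the window would end at 10,500.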

Query by kind

kind is a required field, formatted as authority/data-partition-id:data-source-id:entity-type:schema-version. You can retrieve the list of available kinds by using the Storage service (GET /query/kinds API). You can search documents by providing kind as shown:

  • Search documents just by providing a single kind:
POST /search/v2/query HTTP/1.1
{
  "kind": "osdu:wks:master-data--Wellbore:1.0.0"
}
cURL
curl --request POST \
  --url '/search/v2/query' \
  --header 'accept: application/json' \
  --header 'authorization: Bearer <JWT>' \
  --header 'content-type: application/json' \
  --header 'data-partition-id: opendes' \
  --data '{
  "kind": "osdu:wks:master-data--Wellbore:1.0.0"
}'


  • Search documents just by providing multiple kinds:
POST /search/v2/query HTTP/1.1
{
  "kind": ["osdu:wks:master-data--Well:1.0.0","osdu:wks:master-data--Wellbore:1.0.0"]
}
cURL
curl --request POST \
  --url '/search/v2/query' \
  --header 'accept: application/json' \
  --header 'authorization: Bearer <JWT>' \
  --header 'content-type: application/json' \
  --header 'data-partition-id: opendes' \
  --data '{
  "kind": ["osdu:wks:master-data--Well:1.0.0","osdu:wks:master-data--Wellbore:1.0.0"]
}'


The query returns up to 10 (default limit) documents for the kind.

Wildcard queries on kind are also supported; refer to Cross kind queries for more information.

Additional kind attributes

The OSDU Data Platform indexer splits the kind value and adds a few new fields (authority, source, namespace, and type) to the indexed record. These terms can then be queried with the query request parameter.

For example, osdu:wks:master-data--Wellbore:1.0.0 adds the following new indexed attributes:

{
  "authority": "osdu",
  "source": "wks",
  "namespace": "osdu:wks",
  "type": "master-data--Wellbore"
}

The OSDU Data Platform can now be queried based on any of these attributes.
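The split described above can be reproduced on the client side, for example to predict which attribute values a given kind will produce. This is an illustrative sketch, not the indexer's code; the function name kind_attributes is hypothetical.

```python
def kind_attributes(kind):
    """Derive the extra indexed attributes from a four-part kind string."""
    # A kind has exactly four colon-separated parts; unpacking raises
    # ValueError for anything else.
    authority, source, entity_type, _version = kind.split(":")
    return {
        "authority": authority,
        "source": source,
        "namespace": f"{authority}:{source}",
        "type": entity_type,
    }


attributes = kind_attributes("osdu:wks:master-data--Wellbore:1.0.0")
```

For the kind shown above, the result matches the indexed attributes listed in the example.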

kind case sensitivity

Search treats the kind request parameter as case-insensitive, so a query matches records across kinds that differ only in case.

For example, suppose there are 2 records of kind osdu:wks:USER:1.1.0 and 2 records of kind osdu:wks:user:1.1.0. The following query returns 4 records, because the Search service considers these 2 kinds to be the same:

POST /search/v2/query HTTP/1.1
{
  "kind": ["osdu:wks:user:1.1.0"]
}

Text queries

The OSDU Data Platform provides comprehensive query options in Lucene query syntax. The query string is parsed into a series of terms and operators. A term can be a single word, such as "producing" or "well", or a phrase, surrounded by double quotes, such as "producing well", which searches for all the words in the phrase, in the same order. The default operator for the query is OR.

You can search a field in the document using the form <field-name>:<value>. If the field is not defined, then the query defaults to all queryable fields: it automatically attempts to determine the existing fields in the index’s mapping that are queryable, and performs the search on those fields.

The query language is quite comprehensive and can be intimidating at first glance, but the best way to actually learn it is to start with a few basic examples.

Note: kind is a required parameter and is omitted for brevity in the following examples. Also, all storage record properties are in the data block. Any reference to a field inside the block should be prefixed with data.

Examples

  • Search all fields that contain the text 'well':
{
  "query": "well"
}

Note: If <field-name> is not specified, the query string will automatically attempt to determine the existing fields in the index’s mapping that are queryable, and perform the search on those fields. The search query will be more performant if field names are specified in the query instead of searching across all queryable attributes. The following examples cover this:

  • Where the Basin field contains "Permian":
{
  "query": "data.Basin:Permian"
}
  • Where the Rig_Contractor field contains "Ocean" or "Drilling". OR is the default operator:
{
  "query": "data.Rig_Contractor:(Ocean OR Drilling)"
}

or

{
  "query": "data.Rig_Contractor:(Ocean Drilling)"
}
  • Where the Rig_Contractor field contains the exact phrase "Ocean Drilling":
{
  "query": "data.Rig_Contractor:\"Ocean Drilling\""
}

The Search service offers additional query patterns to query precise values. For details see exact match.

  • Where any of the fields ValueList.OriginalValue, ValueList.Value, or ValueList.AppDataType contains "PRODUCING" or "DUAINE". (Note that you need to escape the * with a backslash.)
{
  "query": "data.ValueList.\\*:(PRODUCING DUAINE)"
}

text field indexing

By default, the search back-end server analyzes the values of text fields and text array fields (including text fields with nested x-osdu-indexing hints) during indexing. The Indexer service analyzer changes text field values as follows:

  • Removes most punctuation & prepositions.
  • Divides the remaining content into individual words, called tokens.
  • Changes the tokens to lowercase.

To better support precise exact match, aggregation, and sort on text fields, an additional field that is not analyzed (as per the rules mentioned above) is indexed for every text field (on kinds indexed after April 2021). As an example, if a record has a text field named data.name, then the indexer adds the non-analyzed field data.name.keyword. These added fields can be identified as field-name.keyword. In addition, installations with the keywordLower feature flag enabled have an additional keywordLower field, which allows case-insensitive precise search.

Note 1: The keyword and keywordLower field value can have a maximum of 256 characters, and only exact match is supported for this field, so a partial field value query will not return any response. If a text field is longer than 256 characters, then both keyword and keywordLower fields will have only the first 256 characters.

Note 2: text array fields indexed with the flattened indexing hint are not analyzed during indexing by default and do not require a keyword subfield. Exact match, aggregation, and sort queries on such fields do not require the keyword suffix.

Exact match

Use the exact match query to search records on text fields based on a precise value, such as a well ID or name, using the keyword subfield mentioned here.

Because the indexed keyword subfield is not analyzed, a query on this field is case-sensitive, and no escaping is required for the special characters covered in the reserved characters section.

Here is an example query that contains two special characters, space and period, without any escaping:

{
    "query": "data.name.keyword:\"Spillpath DA no.109\""
}

The keywordLower subfield (available on OSDU Data Platform deployments with the keywordLower feature enabled) has only one difference: it allows case-insensitive search. Example:

{
    "query": "data.name.keywordLower:\"spillpath da no.109\""
}

Query null or empty values

A text field's keyword subfield can also be used to query records by null or empty value on text attributes. Null value search and index workflows are only supported on text fields.

Here is a sample query to search null value:

{
   "kind": "osdu:wks:master-data--Well:1.0.0",
   "query": "data.FacilityID.keyword:null"
}

Here is an example query to search empty value:

{
   "kind": "osdu:wks:master-data--Well:1.0.0",
   "query": "data.FacilityID.keyword:\"\""
}

Exists query

Returns documents that contain an indexed value for a field. Use the _exists_ prefix for a field to search to see if the field exists.

While a text field is deemed non-existent if the JSON value is null, the following values indicate that the field does exist:

  • Empty strings, such as "".
  • A keyword subfield with an explicit null value.

Similarly, a text array field is considered non-existent if the JSON value is null or []. A text array containing null and another value, such as [null, "abc"], indicates that the field does exist.

Example request:

  • Where the text field Status has any non-null value:
{
  "query": "_exists_:data.Status"
}

Reserved characters

If you need to use any of the characters which function as operators in your query itself (and not as operators), then you must escape them with a leading backslash. For example, to search for (1+1)=2, you would need to write your query as \(1\+1\)\=2.

The reserved characters are: + - = && || > < ! ( ) { } [ ] ^ " ~ * ? : \ /

Failing to escape these special characters correctly could lead to a syntax error which prevents your query from running.

Note: < and > can’t be escaped at all. The only way to prevent them from attempting to create a range query is to remove them from the query string entirely.
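Escaping can be automated when query values come from user input. The sketch below escapes each reserved character with a leading backslash and removes < and >, which cannot be escaped. The function name escape_query_value is hypothetical, and handling && and || by escaping each & and | individually is an assumption based on how Lucene's own escaping utility treats those characters.

```python
# Characters listed as reserved in this section. && and || are covered
# by escaping each & and | individually (an assumption, see above).
RESERVED = set('+-=&|!(){}[]^"~*?:\\/')


def escape_query_value(text):
    """Escape reserved characters so they are matched literally."""
    out = []
    for ch in text:
        if ch in "<>":
            # < and > cannot be escaped, so they are dropped entirely.
            continue
        if ch in RESERVED:
            out.append("\\")
        out.append(ch)
    return "".join(out)


escaped = escape_query_value("(1+1)=2")
```

Note that the escaped string still has to be JSON-encoded when placed in the request body, which doubles each backslash; serializing the body with a JSON library handles this automatically.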

Wildcards

Wildcard searches can be run on individual terms using ? to replace a single character and * to replace zero or more characters.

{
  "query": "data.Rig_Contractor:Oc?an Dr*"
}

Be aware that wildcard queries can use an enormous amount of memory and can therefore affect performance. They should be used very sparingly.

Note: Leading wildcards are disabled by the OSDU Data Platform Search service. Allowing a wildcard at the beginning of a word, such as "*ean", is particularly heavy because all the terms in the index need to be examined, just in case they match.

Grouping

Multiple terms or clauses can be grouped together with parentheses to form sub-queries.

{
  "query": "data.Rig_Contractor:(Ocean OR Drilling) AND Exploration NOT Basin"
}

Date Format

If you need to use the date in your query, it must be in one of the following formats:

 date-opt-time = date-element ['T' [time-element] [offset]]

 Example: 2017-12-29T00:00:00.987

 Please note that the time element is optional.

 date-element = std-date-element

 std-date-element  = yyyy ['-' MM ['-' dd]]

 Example: 2017-12-29
 time-element = HH [minute-element] | [fraction]

 minute-element = ':' mm [second-element] | [fraction]

 second-element = ':' ss [fraction]

 fraction = ('.' | ',') digit+

 offset = 'Z' | (('+' | '-') HH [':' mm [':' ss [('.' | ',') SSS]]])

For more information, refer to Date format.

Query nested arrays objects

Starting with OSDU's M6 release, you can set nested hints on the object array nodes of a data schema. This leads to accurate indexing of those array objects in the underlying search backend.

nested attributes can be queried using the Search service in the form of the nested() function:

  • For one level "nested array":
{
  "query": "nested(<path-to-root-nested-array-node>, <root-nested-array-object-fields-query>)"
}
  • For nested (multi-level) "nested array" queries:
{
  "query": "nested(<path-to-root-nested-array-node>, nested(<path-to-subrootA-nested-array-node>, <subrootA-nested-array-object-fields-query>))"
}

Multi-level nested queries are not limited in their depth. You nest them as required by the particular schema.

Below are several examples of root and multi-level nested queries. The syntax of those queries is the same as described in the previous sections. The only distinction is that the conditions are scoped to the fields of the objects in the array named by the first argument of the enclosing nested(path, (conditions)) function.

Single-level one condition nested query

  • Where work-product-component--WellboreMarkerSet has any marker with MarkerMeasuredDepth field value greater than 10000:
{
  "kind" : "osdu:wks:work-product-component--WellboreMarkerSet:1.0.0",
  "query": "nested(data.Markers, (MarkerMeasuredDepth:(>10000)))"
}

Single-level several conditions nested query

  • Where master-data--Wellbore has any vertical measurement with a VerticalMeasurement field value greater than 100 and a VerticalMeasurementPathID field value of osdu-openness:reference-data--VerticalMeasurementPath:ELEV::
{
    "kind": "osdu:wks:master-data--Wellbore:1.0.0",
    "query": "nested(data.VerticalMeasurements, (VerticalMeasurement:(>100) AND VerticalMeasurementPathID:\"osdu-openness:reference-data--VerticalMeasurementPath:ELEV:\"))"
}

Combination of single-level nested queries

  • Where work-product-component--WellboreMarkerSet has any marker with MarkerMeasuredDepth field value greater than 10000 or SurfaceDipAzimuth field value less than 360:
{
  "kind" : "osdu:wks:work-product-component--WellboreMarkerSet:1.0.0", 
  "query":"nested(data.Markers, (MarkerMeasuredDepth:(>10000))) OR nested(data.Markers, (SurfaceDipAzimuth:(<360)))"
}

Multi-level nested queries

Assume a marker object has a nested Revisions array of Revision objects that have the fields: RevisionDate and RevisionEngineer. An indexed document might then look like this:

{
  "data": {
    "Markers": [
      {
        "MarkerMeasuredDepth": 12345.6,
        "PositiveVerticalDelta": 12345.6,
        "Revisions": [
          {
            "RevisionDate": "2020-02-13T09:13:15.55+0000",
            "RevisionEngineer": "John Smith"
          }
        ]
      }
    ]
  }
}

You might want to search for work-product-component--WellboreMarkerSet that has any marker revised on a certain date by a certain engineer:

{
  "kind" : "osdu:wks:work-product-component--WellboreMarkerSet:1.0.0", 
  "query":"nested(data.Markers, nested(data.Markers.Revisions, (RevisionDate:\"2020-02-13T09:13:15.55+0000\" AND RevisionEngineer:\"John Smith\")))"
}
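Multi-level nested query strings like the one above can be assembled programmatically, which helps keep the parentheses balanced. This is a minimal sketch; the helper names nested and quoted are hypothetical and simply format strings in the nested(path, conditions) form described in this section.

```python
def nested(path, inner):
    """Wrap a condition (or another nested() call) in nested(path, ...)."""
    return f"nested({path}, {inner})"


def quoted(value):
    """Wrap a value in double quotes for a phrase match."""
    return f'"{value}"'


# Conditions scoped to the objects of the Revisions array.
revision_filter = (
    f"(RevisionDate:{quoted('2020-02-13T09:13:15.55+0000')}"
    f" AND RevisionEngineer:{quoted('John Smith')})"
)

# Each level names the full path to its array node.
query = nested("data.Markers", nested("data.Markers.Revisions", revision_filter))

body = {
    "kind": "osdu:wks:work-product-component--WellboreMarkerSet:1.0.0",
    "query": query,
}
```

The resulting query value is the same string as in the example request above, before JSON escaping of the inner quotes.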

Nested and non-nested queries parts combinations

We can combine both types of queries in one request, such as in the following example:

{
  "kind" : "osdu:wks:work-product-component--WellboreMarkerSet:1.0.0",
  "query":"data.Name:\"Example Name\" AND nested(data.Markers, (MarkerMeasuredDepth:(>10000)))"
}

Note: Supported boolean operators for nested queries are AND, OR, NOT. These operators are case-sensitive.

Filtering using grouping with nested syntax

Due to a limitation of the current nested query parser, an exception is thrown when grouping is used with the nested syntax. As a workaround, you can rewrite the query so that it does not involve grouping.

Case 1:

Instead of:

{
  "kind":"*:*:*:*",
  "query":"((nested(data.VerticalMeasurements, (VerticalMeasurementID:\"KB\"))) AND (( kind: \"opendes:welldb:wellbore:1.1.3\") OR (kind: \"opendes:wks:wellbore:1.0.0\") OR ( kind: \"opendes:welldb:wellbore:1.1.4\"))) AND NOT data.DocumentRelationshipType:\"child\" NOT type:\"page\""
}

Use:


{
    "kind": "*:*:*:*",
    "query": "nested(data.VerticalMeasurements, (VerticalMeasurementID:\"KB\")) AND (kind: \"opendes:welldb:wellbore:1.1.3\" OR kind: \"opendes:wks:wellbore:1.0.0\" OR kind: \"opendes:welldb:wellbore:1.1.4\") NOT data.DocumentRelationshipType:\"child\" NOT type:\"page\""
}

Case 2:

Instead of:


{
  "kind":"*:*:*:*",
  "query":"((nested(data.VerticalMeasurements, (VerticalMeasurementID:\"KB\"))) AND (kind: \"opendes:wks:wellbore:1.0.0\")) AND NOT data.DocumentRelationshipType:\"child\" NOT type:\"page\""
}

Use:


{
    "kind": "*:*:*:*", 
    "query": "nested(data.VerticalMeasurements, (VerticalMeasurementID:\"KB\")) AND kind: \"opendes:wks:wellbore:1.0.0\" NOT data.DocumentRelationshipType:\"child\" NOT type:\"page\""
}

Aggregation

Allows the user to get the unique values of the field specified by the aggregateBy request parameter. It supports text, numeric, and boolean fields. A maximum of 1000 unique values can be returned by this request.

Here is a sample query:

POST /search/v2/query HTTP/1.1
{
  "kind": "osdu:wks:*:*",
  "aggregateBy": "kind"
}
cURL
curl --request POST \
  --url '/search/v2/query' \
  --header 'accept: application/json' \
  --header 'authorization: Bearer <JWT>' \
  --header 'content-type: application/json' \
  --header 'data-partition-id: opendes' \
  --data '{
  "kind": "osdu:wks:*:*",
  "aggregateBy": "kind"
}'


Example Response:

{
  "results": [
    {
    },
    {
    }
  ],
  "aggregations": [
    {
      "key": "osdu:wks:master-data--Wellbore:1.2.0",
      "count": 1058850
    },
    {
      "key": "osdu:wks:master-data--Wellbore:1.0.0",
      "count": 615603
    },
    {
      "key": "osdu:wks:work-product-component--Document:1.0.0",
      "count": 345894
    }
  ],
  "totalCount": 10000
}

Please see Aggregation by nested arrays objects & Aggregation on text fields sections for more details on aggregations.
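On the client side, the aggregations array of the response can be turned into a simple lookup table. The sketch below assumes a response shaped like the example above; the function name aggregation_counts is hypothetical.

```python
def aggregation_counts(response):
    """Map each aggregation key to its record count."""
    return {
        bucket["key"]: bucket["count"]
        for bucket in response.get("aggregations", [])
    }


# Response shaped like the example above (results omitted).
response = {
    "aggregations": [
        {"key": "osdu:wks:master-data--Wellbore:1.2.0", "count": 1058850},
        {"key": "osdu:wks:master-data--Wellbore:1.0.0", "count": 615603},
        {"key": "osdu:wks:work-product-component--Document:1.0.0", "count": 345894},
    ],
    "totalCount": 10000,
}
counts = aggregation_counts(response)
```

Using response.get with an empty default means a response without an aggregations array yields an empty dictionary instead of an error.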

Aggregation by nested arrays objects

nested attributes can be aggregated by using the nested(<path-to-root-nested-array-node>, <root-nested-array-object-fields-query>) function.

{
  "kind" : "osdu:wks:work-product-component--WellboreMarkerSet:1.0.0", 
  "aggregateBy": "nested(data.Markers, MarkerMeasuredDepth)"
}

Aggregation on text fields

As mentioned in the text field indexing section, a non-analyzed subfield is added to each text field (including fields decorated with nested indexing hints in the schema) to enable the aggregation workflow. This subfield can be used to perform aggregation queries on text fields.

  • Aggregate by FacilityName:
{
  "kind": "osdu:wks:*:*",
  "aggregateBy": "data.FacilityName.keyword"
}
  • Aggregate by nested attribute TechnicalAssuranceTypeID:
{
  "kind": "osdu:wks:work-product-component--SeismicTraceData:*",
  "aggregateBy": "nested(data.TechnicalAssurances, TechnicalAssuranceTypeID.keyword)"
}

Fields decorated with flattened indexing hints on schema are by default non-analyzed. They do not require any subfield to enable aggregations.

  • Aggregate by flattened attribute FacilitySpecificationText:
{
  "kind": "osdu:wks:master-data--Wellbore:1.0.0",
  "aggregateBy": "data.FacilitySpecifications.FacilitySpecificationText"
}

Caution: aggregations in the response may be empty if the correct field is not supplied in the aggregation query, for example when the keyword suffix is missing on a text field inside the data block.

Highlight

Specifying the optional highlightedFields field adds to each result an additional dictionary, under the key highlight, with phrases from the selected fields that matched the query. Highlighting works only for text and keyword fields. A field of a different type, or one that does not exist in the mapping, is ignored. At most 5 fragments of up to 200 characters each are returned for a single field.

Request:

{
  "kind": "*:*:master-data--Well:*",
  "query": "Example",
  "highlightedFields": ["data.FacilityName", "data.NameAliases.*"]
}

Record:

{
  "kind": "osdu:wks:master-data--Well:1.0.0",
  "id": "osdu:master-data--Well:example",
  "data": {
    "FacilityName": "Example test",
    "NameAliases": [
      {
        "AliasName": "Example test"
      },
      {
        "AliasName": "Example test 2" 
      },
      {
        "AliasName": "Another name" 
      }
    ],
    "Source": "Example test"
  }
}

Search response:

{
  "results": [
    {
      "data": {
          ...
      },
      "kind": "osdu:wks:master-data--Well:1.0.0",
      "id": "osdu:master-data--Well:example",
      "highlight": {
        "data.FacilityName": [
          "<em>Example</em> test"
        ],
        "data.NameAliases.AliasName": [
          "<em>Example</em> test",
          "<em>Example</em> test 2"
        ]
      }
    },
     ...
  ]
}
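The matched terms can be pulled out of the highlight fragments by looking for the <em> markers shown in the response above. This is an illustrative sketch that assumes fragments always use plain <em>...</em> tags as in the example; the function name highlighted_terms is hypothetical.

```python
import re

# Non-greedy match so that several <em> spans in one fragment are
# captured separately.
EM_TAG = re.compile(r"<em>(.*?)</em>")


def highlighted_terms(result):
    """Return {field: [matched terms]} from a result's highlight block."""
    terms = {}
    for field, fragments in result.get("highlight", {}).items():
        found = []
        for fragment in fragments:
            found.extend(EM_TAG.findall(fragment))
        terms[field] = found
    return terms


# One entry of "results", shaped like the response above.
result = {
    "id": "osdu:master-data--Well:example",
    "highlight": {
        "data.FacilityName": ["<em>Example</em> test"],
        "data.NameAliases.AliasName": [
            "<em>Example</em> test",
            "<em>Example</em> test 2",
        ],
    },
}
terms = highlighted_terms(result)
```

Results without a highlight block produce an empty dictionary, so the function can be applied to every entry in results.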

Sort

The sort query allows you to add one or more sorts on specific fields. Each sort can be reversed as well.

The sort feature supports text, int, float, double, long, datetime, nested object, and nested array of objects. Sorting on a geo-point, or a geo-shape type is not supported.

Records that either lack the sort field or have an empty value for it are listed last in the result.

Consider the following scenario:

  1. The opendes data partition has two kinds for welldb data source: opendes:welldb:wellbore:1.0.0 and opendes:welldb:well:1.0.0.
  2. The data.Id in opendes:welldb:wellbore:1.0.0 has been ingested as int, but data.Id in opendes:welldb:well:1.0.0 has been ingested as text.
  3. opendes:welldb:wellbore:1.0.0 has 10 records in total and 5 of them have an empty value in the data.Id field.
  4. opendes:welldb:well:1.0.0 also has 10 records in total and all of them have values in the data.Id field.
POST /search/v2/query HTTP/1.1
{
  "kind": "opendes:welldb:*:*",
  "sort": {
    "field": ["data.Id"],
    "order": ["ASC"]
  }
}
cURL
curl --request POST \
  --url '/search/v2/query' \
  --header 'accept: application/json' \
  --header 'authorization: Bearer <JWT>' \
  --header 'content-type: application/json' \
  --header 'data-partition-id: opendes' \
  --data '{
  "kind": "opendes:welldb:*:*",
  "sort": {
    "field": ["data.Id"],
    "order": ["ASC"]
  }
}'


The above request payload asks the Search service to sort on data.Id in ascending order. The expected response has "totalCount": 10 instead of 20. The 10 returned records come only from opendes:welldb:wellbore:1.0.0; records from opendes:welldb:well:1.0.0 are not returned because their data.Id is of data type text (see Sort on text fields for details). The 5 records that have an empty data.Id value are listed last.

Note: The Search service does not validate whether the provided sort field exists or is of a supported data type. Different kinds may have attributes with the same name but different data types. Therefore, it is the user's responsibility to be aware of this and validate it in their own workflow.

The sort query could be very expensive, especially if the given kind is too broad, such as "kind": "*:*:*:*". The current time-out threshold is 60 seconds. A 504 error, "Request timed out after waiting for 1m" will be returned if the request times out. Consider making the kind parameter as narrow as possible while using the sort feature.
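Because the field and order lists of the sort parameter must have the same length, it can be convenient to build the sort block from (field, order) pairs. This is a minimal sketch; the function name build_sort is hypothetical, and normalizing the order to uppercase is a choice made here, since the service accepts either case.

```python
def build_sort(*pairs):
    """Build the sort block from (field, order) pairs.

    Building both lists together guarantees they have equal length.
    """
    fields, orders = [], []
    for field, order in pairs:
        if order.upper() not in ("ASC", "DESC"):
            raise ValueError(f"order must be ASC or DESC, got {order!r}")
        fields.append(field)
        orders.append(order.upper())
    return {"field": fields, "order": orders}


body = {
    "kind": "opendes:welldb:*:*",
    "sort": build_sort(("data.Id", "asc")),
}
```

This only checks the shape of the sort block. As noted above, whether the field exists or has a sortable type in every matching kind remains the caller's responsibility.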

Sort on text fields

As mentioned in the text field indexing section, a non-analyzed subfield is added to each text field to enable the sort workflow. This subfield can be used to perform sort queries on text fields.

  • Sort by FacilityName
POST /search/v2/query HTTP/1.1
{
  "kind": "osdu:wks:*:*",
  "sort": {
    "field": ["data.FacilityName.keyword"],
    "order": ["ASC"]
  }
}

To avoid uppercase letters being sorted before lowercase letters, the query can use the keywordLower subfield instead of keyword.

cURL
curl --request POST \
  --url '/search/v2/query' \
  --header 'accept: application/json' \
  --header 'authorization: Bearer <JWT>' \
  --header 'content-type: application/json' \
  --header 'data-partition-id: opendes' \
  --data '{
  "kind": "osdu:wks:*:*",
  "sort": {
    "field": ["data.FacilityName.keyword"],
    "order": ["ASC"]
  }
}'


Sort on nested text fields

Sorting on a text field decorated with nested indexing hints requires the following syntax:

nested(path, field, mode)

mode can have the following values:

  • min: sort by the minimum value in the array.
  • max: sort by the maximum value in the array.

Sorting on nested fields allow to specify condition filter which object on nested path has to fulfill to be taken into account in mode function. Often repeating the query part referring to nested path is useful. Filter is attached to top level nested field in case of sorting on field nested multiple times. Filter syntax is the same as query top level parameter, however entire query has to be within nested() context.

  • Sort by nested attribute FacilityEventTypeID where the FacilityEventTypeID value is not equal to test
POST /search/v2/query HTTP/1.1
{
  "kind": "osdu:wks:master-data--Wellbore:1.1.0",
  "sort": {
    "field": [
          "nested(data.FacilityEvents, FacilityEventTypeID.keyword, min)"
    ],
    "order": [
        "ASC"
    ],
    "filter": [
        "nested(data.FacilityEvents, (NOT FacilityEventTypeID.keyword:test))"
    ]
  }
}
cURL
curl --request POST \
  --url '/search/v2/query' \
  --header 'accept: application/json' \
  --header 'authorization: Bearer <JWT>' \
  --header 'content-type: application/json' \
  --header 'data-partition-id: opendes' \
  --data '{
  "kind": "osdu:wks:master-data--Wellbore:1.1.0",
  "sort": {
    "field": [
          "nested(data.FacilityEvents, FacilityEventTypeID.keyword, min)"
    ],
    "order": [
        "ASC"
    ],
    "filter": [
        "nested(data.FacilityEvents, (NOT FacilityEventTypeID.keyword:test))"
    ]
  }
}'


Please take a look at nested sort documentation for more details & examples.
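
The nested(path, field, mode) strings are easy to mistype, so it can help to assemble them programmatically. The helpers below are a sketch under the assumption that only the min and max modes listed above are used; the function names are hypothetical and do not belong to any OSDU library.

```python
ALLOWED_MODES = ("min", "max")  # the modes listed in this section


def nested_sort_field(path, field, mode):
    """Return a nested(path, field, mode) sort expression."""
    if mode not in ALLOWED_MODES:
        raise ValueError(f"mode must be one of {ALLOWED_MODES}")
    return f"nested({path}, {field}, {mode})"


def nested_sort_filter(path, condition):
    """Wrap a filter condition in a nested() context for the given path."""
    return f"nested({path}, ({condition}))"


sort = {
    "field": [nested_sort_field("data.FacilityEvents", "FacilityEventTypeID.keyword", "min")],
    "order": ["ASC"],
    "filter": [nested_sort_filter("data.FacilityEvents", "NOT FacilityEventTypeID.keyword:test")],
}
print(sort)
```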

Range queries

Ranges can be specified for date, numeric, or text fields. Inclusive ranges are specified with square brackets [min TO max] and exclusive ranges with curly brackets {min TO max}. Here are some examples:

  • All SpudDate in 2012:
{
  "query": "data.SpudDate:[2012-01-01 TO 2012-12-31]"
}
  • Count 1..5:
{
  "query": "data.Count:[1 TO 5]"
}
  • Count from 10 upwards:
{
  "query": "data.Count:[10 TO *]"
}
  • Ranges with one side unbounded can use the following syntax:
{
  "query": "data.ProjDepth:>10"
}
  • To combine an upper and a lower bound with the simplified syntax, join two clauses with an AND operator:
{
  "query": "data.ProjDepth:(>=10 AND <20)"
}
  • jobStatus tags between IN_PROGRESS & SUCCESS:
{
  "query": "tags.jobStatus:{IN_PROGRESS TO SUCCESS}"
}
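
The bracket rules above can be wrapped in a small helper when queries are generated by code. This is an illustrative sketch only; range_clause is a hypothetical name, and the helper does no escaping of special characters in values.

```python
def range_clause(field, lower="*", upper="*", inclusive=True):
    """Return a range clause: [lower TO upper] if inclusive, {lower TO upper} otherwise."""
    open_bracket, close_bracket = ("[", "]") if inclusive else ("{", "}")
    return f"{field}:{open_bracket}{lower} TO {upper}{close_bracket}"


print(range_clause("data.SpudDate", "2012-01-01", "2012-12-31"))
print(range_clause("data.Count", 10))
print(range_clause("tags.jobStatus", "IN_PROGRESS", "SUCCESS", inclusive=False))
```

Leaving a bound at its default of * produces a range that is unbounded on that side.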

Geo-spatial queries

The OSDU Data Platform supports geo-point (lat/lon pairs) and geo-shape fields based on the GeoJSON standard. The spatialFilter and query groups in the request have an AND relationship. If both of the criteria are defined in the query, then the Search service will return results which match both clauses.

The queries in this group are Geo distance, Geo polygon, and Bounding box. Only one spatial criterion can be used when defining the filter.

Note 1: Geo-spatial fields that are indexed with a GeoJSON FeatureCollection payload have a different structure in the Search service query response than in storage records, because they are optimized for the search use case. They are not valid GeoJSON. To retrieve valid GeoJSON, use the Storage service's record API.

Note 2: The Search backend requires all geo-shapes to be compliant with the GeoJSON and OGC standards. Users may see indexing issues if geo-shapes are not compliant. The most common violations of these standards are geo-shapes with duplicate coordinates and self-intersecting polygons. Once users have retrieved the record-level indexing status or error message via index status, they are expected to fix the geo-shape and retry ingestion to address such issues.

Geo distance query

Filters documents to include only hits that exist within a specific distance from a geo point.

POST /search/v2/query HTTP/1.1
{
  "kind": "osdu:wks:master-data--Wellbore:1.0.0",
  "spatialFilter": {
    "field": "data.ProjectedBottomHoleLocation.Wgs84Coordinates",
    "byDistance": {
      "point": {
        "latitude": 37.450727,
        "longitude": -122.174762
        },
        "distance": 1500
    }
  },
  "offset": 0,
  "limit": 30
}
cURL
curl --request POST \
  --url '/search/v2/query' \
  --header 'accept: application/json' \
  --header 'authorization: Bearer <JWT>' \
  --header 'content-type: application/json' \
  --header 'data-partition-id: opendes' \
  --data '{
  "kind": "osdu:wks:master-data--Wellbore:1.0.0",
  "spatialFilter": {
    "field": "data.ProjectedBottomHoleLocation.Wgs84Coordinates",
    "byDistance": {
      "point": {
        "latitude": 37.450727,
        "longitude": -122.174762
        },
        "distance": 1500
    }
  },
  "offset": 0,
  "limit": 30
}'


Parameter Description
field The geo-point or geo-shape field in the index on which filtering will be performed.
distance The radius of the circle centered on the specified location. Points which fall within this circle are considered to be matches. The distance can be specified in various units. See Distance units.
point.latitude Latitude of field.
point.longitude Longitude of field.

Distance units

If no unit is specified, then the default unit of the distance parameter is meter. Distance can be specified in other units, such as "1km" or "2mi" (2 miles).

Note: In the current version, the Search API only supports distance in meters. In future versions, distance in other units will be made available. The maximum value of distance is 1.5E308.
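
To reason about which points a byDistance filter is likely to match, the circle test can be approximated on the client. The sketch below uses the haversine formula on a spherical Earth with a radius of 6,371,000 m; both are assumptions of this example, and the distance calculation actually used by the Search backend is not specified here. The candidate point is taken from the bounding box example in this document.

```python
import math

EARTH_RADIUS_M = 6_371_000  # mean Earth radius in meters (assumption of this sketch)


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in meters between two lat/lon points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlambda = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def within_distance(center, point, distance_m):
    """True if point lies inside the circle of radius distance_m around center."""
    d = haversine_m(center["latitude"], center["longitude"],
                    point["latitude"], point["longitude"])
    return d <= distance_m


center = {"latitude": 37.450727, "longitude": -122.174762}
candidate = {"latitude": 37.438485, "longitude": -122.156110}
print(within_distance(center, candidate, 1500))
```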

Bounding box query

A query allowing you to filter hits based on a point location within a bounding box.

POST /search/v2/query HTTP/1.1
{
  "kind": "osdu:wks:master-data--Wellbore:1.0.0",
  "spatialFilter": {
    "field": "data.ProjectedBottomHoleLocation.Wgs84Coordinates",
    "byBoundingBox": {
      "topLeft": {
        "latitude": 37.450727,
        "longitude": -122.174762
        },
      "bottomRight": {
        "latitude": 37.438485,
        "longitude": -122.156110
      }
    }
  },
  "offset": 0,
  "limit": 30
}
cURL
curl --request POST \
  --url '/search/v2/query' \
  --header 'accept: application/json' \
  --header 'authorization: Bearer <JWT>' \
  --header 'content-type: application/json' \
  --header 'data-partition-id: opendes' \
  --data '{
  "kind": "osdu:wks:master-data--Wellbore:1.0.0",
  "spatialFilter": {
    "field": "data.ProjectedBottomHoleLocation.Wgs84Coordinates",
    "byBoundingBox": {
      "topLeft": {
        "latitude": 37.450727,
        "longitude": -122.174762
        },
      "bottomRight": {
        "latitude": 37.438485,
        "longitude": -122.156110
      }
    }
  },
  "offset": 0,
  "limit": 30
}'


Parameter Description
field The geo-point or geo-shape field in the index on which filtering will be performed.
topLeft.latitude The latitude of top left corner of bounding box.
topLeft.longitude The longitude of top left corner of bounding box.
bottomRight.latitude The latitude of bottom right corner of bounding box.
bottomRight.longitude The longitude of bottom right corner of bounding box.
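
Swapped corners are an easy mistake to make with this filter. The helper below is a client-side sanity check, sketched under the assumption that the box does not cross the antimeridian; whether and how the Search service itself validates the corners is not stated in this document. The function name and its argument order are hypothetical.

```python
def bounding_box_filter(field, top, left, bottom, right):
    """Build a spatialFilter with byBoundingBox after checking corner order."""
    if not (-90 <= bottom <= top <= 90):
        raise ValueError("expected -90 <= bottom <= top <= 90")
    if not (-180 <= left <= right <= 180):
        # Boxes crossing the antimeridian are out of scope for this sketch.
        raise ValueError("expected -180 <= left <= right <= 180")
    return {
        "field": field,
        "byBoundingBox": {
            "topLeft": {"latitude": top, "longitude": left},
            "bottomRight": {"latitude": bottom, "longitude": right},
        },
    }


spatial_filter = bounding_box_filter(
    "data.ProjectedBottomHoleLocation.Wgs84Coordinates",
    top=37.450727, left=-122.174762, bottom=37.438485, right=-122.156110,
)
print(spatial_filter)
```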

Geo polygon query

A query allowing you to filter hits that only fall within a closed polygon.

POST /search/v2/query HTTP/1.1
{
  "kind": "osdu:wks:master-data--Wellbore:1.0.0",
  "spatialFilter": {
    "field": "data.ProjectedBottomHoleLocation.Wgs84Coordinates",
    "byGeoPolygon": {
      "points": [
        {"longitude":-90.65, "latitude":28.56},
        {"longitude":-90.65, "latitude":35.56},
        {"longitude":-85.65, "latitude":35.56},
        {"longitude":-85.65, "latitude":28.56},
        {"longitude":-90.65, "latitude":28.56} 
      ]
    }
  },
  "offset": 0,
  "limit": 30
}
cURL
curl --request POST \
  --url '/search/v2/query' \
  --header 'accept: application/json' \
  --header 'authorization: Bearer <JWT>' \
  --header 'content-type: application/json' \
  --header 'data-partition-id: opendes' \
  --data '{
  "kind": "osdu:wks:master-data--Wellbore:1.0.0",
  "spatialFilter": {
    "field": "data.ProjectedBottomHoleLocation.Wgs84Coordinates",
    "byGeoPolygon": {
     "points": [
        {"longitude":-90.65, "latitude":28.56},
        {"longitude":-90.65, "latitude":35.56},
        {"longitude":-85.65, "latitude":35.56},
        {"longitude":-85.65, "latitude":28.56},
        {"longitude":-90.65, "latitude":28.56} 
      ]
    }
  },
  "offset": 0,
  "limit": 30
}'


Parameter Description
field The geo-point or geo-shape field in the index on which filtering will be performed.
points The list of geo-points describing the polygon.
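
In the example above, the first and last points are identical, so the polygon is explicitly closed. The sketch below builds the filter and closes the ring when the caller leaves it open. Whether the service requires an explicitly closed ring is not stated here, so treat the closing step as a defensive assumption; the function name is hypothetical.

```python
def geo_polygon_filter(field, points):
    """Build a spatialFilter with byGeoPolygon, closing the ring if needed."""
    if len(points) < 3:
        raise ValueError("a polygon needs at least three points")
    ring = list(points)
    if ring[0] != ring[-1]:
        # Repeat the first point at the end so the ring is closed.
        ring.append(dict(ring[0]))
    return {"field": field, "byGeoPolygon": {"points": ring}}


corners = [
    {"longitude": -90.65, "latitude": 28.56},
    {"longitude": -90.65, "latitude": 35.56},
    {"longitude": -85.65, "latitude": 35.56},
    {"longitude": -85.65, "latitude": 28.56},
]
polygon_filter = geo_polygon_filter(
    "data.ProjectedBottomHoleLocation.Wgs84Coordinates", corners
)
print(len(polygon_filter["byGeoPolygon"]["points"]))
```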

Geo polygon intersection query

A query allowing you to filter hits intersecting a closed polygon.

POST /search/v2/query HTTP/1.1
{
  "kind": "osdu:wks:master-data--Wellbore:1.0.0",
  "spatialFilter": {
    "field": "data.ProjectedBottomHoleLocation.Wgs84Coordinates",
    "byIntersection": {
      "polygons": [
        {
          "points": [
            {"longitude":-90.65, "latitude":28.56},
            {"longitude":-90.65, "latitude":35.56},
            {"longitude":-85.65, "latitude":35.56},
            {"longitude":-85.65, "latitude":28.56},
            {"longitude":-90.65, "latitude":28.56} 
          ]
        }
      ]
    }
  },
  "offset": 0,
  "limit": 30
}
cURL
curl --request POST \
  --url '/search/v2/query' \
  --header 'accept: application/json' \
  --header 'authorization: Bearer <JWT>' \
  --header 'content-type: application/json' \
  --header 'data-partition-id: opendes' \
  --data '{
  "kind": "osdu:wks:master-data--Wellbore:1.0.0",
  "spatialFilter": {
    "field": "data.ProjectedBottomHoleLocation.Wgs84Coordinates",
    "byIntersection": {
      "polygons": [
        {
          "points": [
            {"longitude":-90.65, "latitude":28.56},
            {"longitude":-90.65, "latitude":35.56},
            {"longitude":-85.65, "latitude":35.56},
            {"longitude":-85.65, "latitude":28.56},
            {"longitude":-90.65, "latitude":28.56} 
          ]
        }
      ]
    }
  },
  "offset": 0,
  "limit": 30
}'


Parameter Description
field The geo-point or geo-shape field in the index on which filtering will be performed.
points The list of geo-points describing the polygon.

Phrase completion (autocomplete) preview

This feature is available on OSDU deployments with the autocomplete feature flag and the bagOfWords indexer feature enabled. Users can retrieve phrase completions along with, or instead of, the results, which may help them build more accurate queries. Suggestion behavior is currently based on the completion suggester.

POST /search/v2/query HTTP/1.1
{
  "kind": "osdu:wks:master-data--WellPlanningWellbore:1.0.0",
  "query": "awseastusa",
  "suggestPhrase": "someuniquesurveyprogramid"
}

Response:
{
    "results": [...],
    "aggregations": ...,
    "phraseSuggestions": [
        "osdu:master-data--SurveyProgram:SomeUniqueSurveyProgramID:"
    ],
    "totalCount": ...
}

Query with cursor

While a search request returns a single page of results, the query_with_cursor API can be used to retrieve large numbers of results, or even all results, from a single search request, in much the same way as you would use a cursor on a traditional database.

The Cursor API is not intended for real-time user requests, but rather for processing large amounts of data.

The parameters passed in the request body are exactly the same as for the query API, with a few exceptions: offset and aggregateBy are not valid parameters, and trackTotalCount is always true in the query_with_cursor API.

Note: The results that are returned from a query_with_cursor request reflect the state of the index at the time that the initial search request was made, like a snapshot in time. Subsequent changes to documents (index, update, or delete) only affect future search requests.

In order to use the query_with_cursor request, the initial search request should use the following endpoint:

POST /search/v2/query_with_cursor HTTP/1.1
{
  "kind": "osdu:wks:master-data--Well:1.0.0",
  "query": "data.FacilityName:\"A34\"",
  "limit": 30,
  "spatialFilter": {
    "field": "data.SpatialLocation.Wgs84Coordinates",
    "byBoundingBox": {
      "topLeft": {
        "latitude": 48.450727,
        "longitude": -122.174762
      },
      "bottomRight": {
        "latitude": 37.450727,
        "longitude": -122.174762
      }
    }
  },
  "returnedFields": [ "id", "data.FacilityName" ]
}
cURL
curl --request POST \
  --url '/search/v2/query_with_cursor' \
  --header 'accept: application/json' \
  --header 'authorization: Bearer <JWT>' \
  --header 'content-type: application/json' \
  --header 'data-partition-id: opendes' \
  --data '{
  "kind": "osdu:wks:master-data--Well:1.0.0",
  "query": "data.FacilityName:\"A34\"",
  "limit": 30,
  "spatialFilter": {
    "field": "data.SpatialLocation.Wgs84Coordinates",
    "byBoundingBox": {
      "topLeft": {
        "latitude": 48.450727,
        "longitude": -122.174762
      },
      "bottomRight": {
        "latitude": 37.450727,
        "longitude": 22.174762
      }
    }
  },
  "returnedFields": [ "id", "data.FacilityName" ]
}'


The successful response from the above request will include a "cursor", which should be passed to the next call of the query_with_cursor API in order to retrieve the next batch of results.

POST /search/v2/query_with_cursor HTTP/1.1
{
  "kind": "osdu:wks:master-data--Well:1.0.0",
  "cursor": "cursor-key"
}
cURL
curl --request POST \
  --url '/search/v2/query_with_cursor' \
  --header 'accept: application/json' \
  --header 'authorization: Bearer <JWT>' \
  --header 'content-type: application/json' \
  --header 'data-partition-id: opendes' \
  --data '{
  "kind": "osdu:wks:master-data--Well:1.0.0",
  "cursor": "cursor-key"
}'


Caution: As subsequent batches of results are retrieved by the query_with_cursor API, API users should not expect a different cursor value in each query_with_cursor response.

Note: To process the next query_with_cursor request, the Search service keeps the search context alive for 1 minute, which is the time required to process the next batch of results. Each cursor request sets a new expiry time. The cursor will expire after 1 minute and will not return any more results if the requests are not made within the specified time.
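
The request sequence described above can be sketched as a generator that keeps passing the cursor back until no more results arrive. This is an illustration, not an official client: post is a caller-supplied function that sends a body to /search/v2/query_with_cursor and returns the decoded JSON response, and the stop condition (an empty results list or a missing cursor) is an assumption of this sketch. The fake_post function only simulates three responses so that the example runs without a network.

```python
def iter_cursor_results(post, first_body):
    """Yield every record from a query_with_cursor search, batch by batch."""
    response = post(first_body)
    while True:
        results = response.get("results") or []
        yield from results
        cursor = response.get("cursor")
        if not cursor or not results:
            break  # assumed end-of-results signal
        # Follow-up requests carry the kind and the cursor, as shown above.
        response = post({"kind": first_body["kind"], "cursor": cursor})


# Simulated service responses; the cursor value may repeat between pages.
pages = [
    {"results": [{"id": "a"}, {"id": "b"}], "cursor": "cursor-key"},
    {"results": [{"id": "c"}], "cursor": "cursor-key"},
    {"results": [], "cursor": None},
]
calls = []


def fake_post(body):
    calls.append(body)
    return pages[len(calls) - 1]


first = {"kind": "osdu:wks:master-data--Well:1.0.0", "limit": 2}
ids = [record["id"] for record in iter_cursor_results(fake_post, first)]
print(ids)
```

Because the search context is kept alive for only 1 minute between requests, any per-batch processing done by the consumer of this generator should finish well within that window.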

Cross kind queries

The OSDU Data Platform search supports cross kind queries. A typical kind is formatted as authority/data-partition-id:data-source-id:entity-type:schema-version. Each segment separated by ':' can be replaced with a wildcard character to support cross kind search.

  • Search across all data-sources, types, and versions for opendes authority:
POST /search/v2/query HTTP/1.1
{
  "kind": "opendes:*:*:*"
}
cURL
curl --request POST \
  --url '/search/v2/query' \
  --header 'accept: application/json' \
  --header 'authorization: Bearer <JWT>' \
  --header 'content-type: application/json' \
  --header 'data-partition-id: opendes' \
  --data '{
  "kind": "opendes:*:*:*"
}'


  • Search across all data-sources for the well type with schema version 1.0.0:
POST /search/v2/query HTTP/1.1
{
  "kind": "opendes:*:well:1.0.0"
}
cURL
curl --request POST \
  --url '/search/v2/query' \
  --header 'accept: application/json' \
  --header 'authorization: Bearer <JWT>' \
  --header 'content-type: application/json' \
  --header 'data-partition-id: opendes' \
  --data '{
  "kind": "opendes:*:well:1.0.0"
}'


  • Search across all types and versions for the welldb namespace in opendes:
POST /search/v2/query HTTP/1.1
{
  "kind": "opendes:welldb:*:*"
}
cURL
curl --request POST \
  --url '/search/v2/query' \
  --header 'accept: application/json' \
  --header 'authorization: Bearer <JWT>' \
  --header 'content-type: application/json' \
  --header 'data-partition-id: opendes' \
  --data '{
  "kind": "opendes:welldb:*:*"
}'
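
The three patterns above differ only in which segments are wildcards. A small builder makes that explicit; this is a sketch, and the function and parameter names are hypothetical.

```python
def kind_pattern(authority="*", source="*", entity_type="*", version="*"):
    """Join the four kind segments with ':'; omitted segments become wildcards."""
    return ":".join([authority, source, entity_type, version])


print(kind_pattern(authority="opendes"))
print(kind_pattern(authority="opendes", entity_type="well", version="1.0.0"))
print(kind_pattern(authority="opendes", source="welldb"))
```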


Common discovery within and across kind via VirtualProperties

A single schema can define multiple properties for geo-spatial data. For example, the Wellbore schema defines both the GeographicBottomHoleLocation and ProjectedBottomHoleLocation properties. The JSON key used for spatial data is also not consistent across schemas.

This causes issues for common Search workflows, such as finding all entities that exist within a given area: users do not know which property to query against for each type, so finding all entities in a given area is complicated.

Beyond spatial data, this is a common problem across different data types. For instance, in the Wellbore schema the name is represented by the property FacilityName; however, this key is not used for the name in other schemas.

OSDU's M10 release introduced a new x-osdu-virtual-properties property on schemas to address common discovery within and across kinds. This optional attribute defines a common property mapping. x-osdu-virtual-properties can be used to map any properties to a new property name that can be used for consumption. Schemas can then declare the same virtual property to allow easier cross-schema consumption. The Indexer service indexes new attributes based on the x-osdu-virtual-properties declaration.

Here is an example of a query on one such property:

POST /search/v2/query HTTP/1.1
{
  "kind": "*:*:*:*", 
  "spatialFilter": { 
    "field": "data.VirtualProperties.DefaultLocation.Wgs84Coordinates", 
    "byGeoPolygon": { 
      "points": [
         {"longitude":-90.65, "latitude":28.56}, 
         {"longitude":-90.65, "latitude":35.56}, 
         {"longitude":-85.65, "latitude":35.56}, 
         {"longitude":-85.65, "latitude":28.56}, 
         {"longitude":-90.65, "latitude":28.56} 
      ]
    }
  }
}
cURL
curl --request POST \
  --url '/search/v2/query' \
  --header 'accept: application/json' \
  --header 'authorization: Bearer <JWT>' \
  --header 'content-type: application/json' \
  --header 'data-partition-id: opendes' \
  --data '{
  "kind": "*:*:*:*",
  "spatialFilter": { 
    "field": "data.VirtualProperties.DefaultLocation.Wgs84Coordinates", 
    "byGeoPolygon": { 
      "points": [ 
         {"longitude":-90.65, "latitude":28.56}, 
         {"longitude":-90.65, "latitude":35.56}, 
         {"longitude":-85.65, "latitude":35.56}, 
         {"longitude":-85.65, "latitude":28.56}, 
         {"longitude":-90.65, "latitude":28.56} 
      ]
    }
  }
}'


More information on supported default VirtualProperties can be found in the Data Definitions schema documentation.

Note: The declared virtual property is never added to the Storage record. It is used only by the Indexer service to index a new attribute and make the data discoverable based on this property.

Exclude kinds with authority as system-meta-data in wildcard query

Some applications or systems may need their system meta-data to be searchable via OSDU search, but the system meta-data is not expected to be included in the results of a normal keyword search. To exclude system meta-data from normal searches, the OSDU community proposed "system-meta-data" as the reserved authority for system meta-data kinds, which are excluded unless they are explicitly specified in the query.

For example, assume there is a system kind called system-meta-data:schema-service:schema:1.0.0 for schema metadata. When users search for data with the keyword wellbore as below:

{
  "kind": "*:*:*:*",
  "query": "wellbore"
} 

The meta-data from the kind system-meta-data:schema-service:schema:1.0.0 will be excluded from the search result by default.

To search meta-data with the keyword wellbore from the kind system-meta-data:schema-service:schema:1.0.0, users should explicitly specify the kind, as in the example below:

{
  "kind": "system-meta-data:schema-service:schema:1.0.0",
  "query": "wellbore"
} 
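
The exclusion rule can be pictured with a small client-side model. This sketch is not the service's implementation: it assumes that "explicitly specified" means the authority segment of the kind parameter is literally system-meta-data, and it uses shell-style wildcard matching per segment. The second kind in the sample list is taken from other examples in this document.

```python
from fnmatch import fnmatchcase

RESERVED_AUTHORITY = "system-meta-data"


def kinds_searched(kind_pattern, indexed_kinds):
    """Return the kinds a pattern would cover, hiding reserved kinds by default."""
    pattern_parts = kind_pattern.split(":")
    explicit = pattern_parts[0] == RESERVED_AUTHORITY
    selected = []
    for kind in indexed_kinds:
        parts = kind.split(":")
        if len(parts) != len(pattern_parts):
            continue
        if not all(fnmatchcase(p, q) for p, q in zip(parts, pattern_parts)):
            continue
        if parts[0] == RESERVED_AUTHORITY and not explicit:
            continue  # wildcard authority does not reach system meta-data
        selected.append(kind)
    return selected


indexed = [
    "system-meta-data:schema-service:schema:1.0.0",
    "osdu:wks:master-data--Wellbore:1.0.0",
]
print(kinds_searched("*:*:*:*", indexed))
print(kinds_searched("system-meta-data:schema-service:schema:1.0.0", indexed))
```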

References to OSDU data-definitions documents:

  • 6.1.2 Record kind
  • Appendix D.1.3 Schema Identifier kind Limitations

Version info API

Provides build and git related information for the Search service.

GET /api/search/v2/info HTTP/1.1
cURL
curl --request GET \
  --url '/api/search/v2/info' \
  --header 'accept: application/json' \
  --header 'authorization: Bearer <JWT>' \
  --header 'data-partition-id: opendes'


Example response:

{
  "groupId": "org.opengroup.osdu",
  "artifactId": "search-azure",
  "version": "0.19.3",
  "buildTime": "2023-05-08T18:57:10.854Z",
  "branch": "master",
  "commitId": "e39447ef448287538a273dc46393b3d9f795c0c1",
  "commitMessage": "Merged PR 20150: Use 0.20.0-rc5 of core lib azure",
  "connectedOuterServices": [
    {
      "name": "elasticSearch",
      "version": "..."
    },
    {
      "name": "redis",
      "version":"..."
    }
  ]
}

This endpoint takes information from files generated by the spring-boot-maven-plugin and git-commit-id-plugin plugins. The paths to the generated files must be specified in the matching properties:

  • version.info.buildPropertiesPath
  • version.info.gitPropertiesPath

Get indexing status

The Indexer service adds internal metadata to each record which registers the status of the indexing. The metadata includes the status and the last indexing date and time. This additional meta block helps to see the details of indexing. The format of the index meta block is as follows:

{
  "index": {
      "trace": [
          "Type: String",
          "Type: String"
      ],
      "statusCode": "Type: Integer",
      "lastUpdateTime": "Type: Datetime"
  }
}

Details of the index block:

  1. trace: This field collects all the issues related to the indexing and concatenates using '|'. This is a string field.

  2. statusCode: This field determines the category of the error. This is an integer field. It can have the following values:
       • 200 - All OK
       • 404 - Schema is missing in Schema service.
       • 400 - Some fields were not properly mapped with the schema defined, for example when the schema defines a field as int but the input record has a text value for that attribute.

  3. lastUpdateTime: This field captures the last time the record was updated by the Indexer service. This is a datetime field, so you can do range queries on this field.

You can query the index status using the following example query:

POST /search/v2/query HTTP/1.1
{
  "kind": "*:*:*:*",
  "query": "index.statusCode:400",
  "limit": 1000,
  "returnedFields": [ "id", "index" ]
}
cURL
curl --request POST \
  --url /search/v2/query \
  --header 'Authorization: Bearer <JWT>' \
  --header 'Content-Type: application/json' \
  --header 'Data-Partition-Id: opendes' \
  --data '{"kind": "*:*:*:*","query": "index.statusCode:400","returnedFields": ["index"]}'


Example response:

{
    "results": [
        {
            "index": {
                "trace": [
                    "datetime parsing error: unknown format for attribute: endDate | value: 9000-01-01T00:00:00.0000000",
                    "datetime parsing error: unknown format for attribute: startDate | value: 1990-01-01T00:00:00.0000000"
                ],
                "statusCode": 400,
                "lastUpdateTime": "2018-11-16T01:44:08.687Z"
            }
        }
    ],
    "totalCount": 31895
} 

Note: By default, the API response excludes the index attribute block. You must specify the index field in returnedFields in order to see it in the response.

The above query returns all records which had problems due to field mismatches.
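
When many records are returned, it can help to condense each index block into one line. The sketch below maps the status codes listed earlier to their descriptions and counts the trace messages. The function name and the sample record id are hypothetical; the index block mirrors the example response above.

```python
STATUS_MEANINGS = {
    200: "All OK",
    404: "Schema is missing in Schema service",
    400: "Some fields were not properly mapped with the schema defined",
}


def summarize_index_status(results):
    """Return one summary line per record that carries an index block."""
    lines = []
    for record in results:
        index = record.get("index")
        if index is None:
            continue  # index was not requested in returnedFields
        code = index.get("statusCode")
        meaning = STATUS_MEANINGS.get(code, "unknown status")
        traces = len(index.get("trace", []))
        record_id = record.get("id", "unknown id")
        lines.append(f"{record_id}: {code} ({meaning}), {traces} trace message(s)")
    return lines


sample_results = [
    {
        "id": "example-record-id",  # hypothetical id, for illustration only
        "index": {
            "trace": [
                "datetime parsing error: unknown format for attribute: endDate | value: 9000-01-01T00:00:00.0000000",
                "datetime parsing error: unknown format for attribute: startDate | value: 1990-01-01T00:00:00.0000000",
            ],
            "statusCode": 400,
            "lastUpdateTime": "2018-11-16T01:44:08.687Z",
        },
    }
]
for line in summarize_index_status(sample_results):
    print(line)
```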

Known issues/limitations

nested query

  • The following features are not functional with the current nested implementation:
  • The sort query filter for nested fields can currently be attached only to the top-level nested field. This lack of control over the attachment level may impact some rare use cases on structures nested multiple times; however, supporting it would require a completely different syntax that is very hard to understand.
  • The current nested query parser throws an exception if grouping is used with nested syntax. As a workaround, you can rewrite the query so that it does not involve grouping. An example can be found here.

Cursor query

  • By default, only 500 concurrent cursor requests can be active on each data partition. A cursor expires after a 1-minute timeout or when the last page of the result is reached. Users may see a response with a 429 error code and a "Too many requests" error message if the request load exceeds this limit.
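
One common way for a client to cope with the 429 response is to retry with exponential backoff. The sketch below is a client-side suggestion, not documented service behavior: post is a caller-supplied function returning a (status_code, payload) pair, and the attempt count and delays are arbitrary assumptions. The fake_post function and the recorded delays exist only so the example runs without a network or real waiting.

```python
import time


def post_with_retry(post, body, max_attempts=4, base_delay_s=1.0, sleep=time.sleep):
    """Call post(body), backing off exponentially while the status is 429."""
    for attempt in range(max_attempts):
        status, payload = post(body)
        if status != 429 or attempt == max_attempts - 1:
            return status, payload
        sleep(base_delay_s * 2 ** attempt)  # wait 1s, 2s, 4s, ...


# Simulated responses: two rejections followed by a success.
responses = [
    (429, {"message": "Too many requests"}),
    (429, {"message": "Too many requests"}),
    (200, {"results": []}),
]
delays = []


def fake_post(body):
    return responses.pop(0)


status, payload = post_with_retry(
    fake_post, {"kind": "osdu:wks:master-data--Well:1.0.0"}, sleep=delays.append
)
print(status, delays)
```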

All queries

  • The maximum size of the response is 100MB. If the response size exceeds 100MB, a 413 status code response will be returned without any search results.