Storage Service
Introduction
After performing the basic user management procedures (create users and groups, assign users to groups, etc.) through the Entitlements Service, the OSDU developer can use the Storage Service to ingest metadata generated by applications into the Data Ecosystem. The Storage Service provides a set of APIs to manage the entire metadata life cycle, including ingestion (persistence), modification, deletion, versioning, and data schema.
Record structure
From the Storage Service perspective, the metadata to be ingested is called a record. Below is a basic example of a Data Ecosystem record, followed by a brief explanation of each field:
{
  "id": "data-partition-id:hello:123456",
  "kind": "schema-authority:wks:hello:1.0.0",
  "acl": {
    "viewers": ["data.default.viewers@data-partition-id.[osdu.opengroup.org]"],
    "owners": ["data.default.owners@data-partition-id.[osdu.opengroup.org]"]
  },
  "legal": {
    "legaltags": ["data-partition-id-sample-legaltag"],
    "otherRelevantDataCountries": ["FR","US","CA"]
  },
  "data": {
    "msg": "Hello World, Data Ecosystem!"
  },
  "createUser": "user@email.com",
  "createTime": "2023-03-28T10:31:09.890Z",
  "modifyUser": "user@email.com",
  "modifyTime": "2023-03-28T10:31:09.890Z"
}
- id: (optional) Unique identifier in the Data Ecosystem. When not provided, the service will create and assign an id to the record. Must follow the naming convention {Data-Partition-Id}:{object-type}:{uuid}.
- kind: (mandatory) Kind of data being ingested. Must follow the naming convention {Schema-Authority}:{dataset-name}:{record-type}:{version}.
- acl: (mandatory) Group of users who have access to the record.
- acl.viewers: List of valid groups which will have view/read privileges over the record. We follow the naming convention such that data groups begin with "data.".
- acl.owners: List of valid groups which will have write privileges over the record. We follow the naming convention such that data groups begin with "data.".
- legal: (mandatory) Attributes which represent the legal constraints associated with the record.
- legal.legaltags: List of legal tag names associated with the record.
- legal.otherRelevantDataCountries: List of other relevant data countries. Must have at least 2 values: where the data was ingested from and where Data Ecosystem stores the data.
- data: (mandatory) Record payload represented as a list of key-value pairs.
- createUser: ID of the user who created the record.
- createTime: Time at which the record was created.
- modifyUser: ID of the user who last updated that specific version of the record (not present in the first version of the record).
- modifyTime: Time at which that version of the record was updated (not present in the first version of the record).
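The field rules above can be checked on the client side before a record is sent. The following is a minimal sketch, not the service's own validation logic: the function name validate_record and the error messages are hypothetical, and it only encodes the constraints stated in the list (mandatory fields, the id and kind naming conventions, the "data." group prefix, and the minimum of two country values).

```python
from typing import Any

# Fields the list above marks as mandatory.
MANDATORY_FIELDS = ("kind", "acl", "legal", "data")


def validate_record(record: dict[str, Any]) -> list[str]:
    """Return a list of problems found in a record; an empty list means none were found."""
    errors: list[str] = []

    for name in MANDATORY_FIELDS:
        if name not in record:
            errors.append(f"missing mandatory field: {name}")

    # id is optional, but when present it needs three non-empty parts.
    if "id" in record:
        parts = str(record["id"]).split(":", 2)
        if len(parts) != 3 or not all(parts):
            errors.append("id must follow {Data-Partition-Id}:{object-type}:{uuid}")

    # kind needs exactly four non-empty, colon-separated parts.
    if "kind" in record:
        parts = str(record["kind"]).split(":")
        if len(parts) != 4 or not all(parts):
            errors.append(
                "kind must follow {Schema-Authority}:{dataset-name}:{record-type}:{version}"
            )

    # Data groups are expected to begin with "data.".
    acl = record.get("acl", {})
    for role in ("viewers", "owners"):
        for group in acl.get(role, []):
            if not group.startswith("data."):
                errors.append(f"acl.{role} group does not begin with 'data.': {group}")

    # At least two countries: where the data came from and where it is stored.
    if "legal" in record:
        countries = record["legal"].get("otherRelevantDataCountries", [])
        if len(countries) < 2:
            errors.append("legal.otherRelevantDataCountries needs at least 2 values")

    return errors
```

Calling validate_record on the example record shown earlier returns an empty list. The service remains the authority on what is accepted, so a check like this is only a way to catch obvious mistakes early.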
Note: modifyUser and modifyTime values are only updated for data-block updates using the PATCH or PUT APIs. Metadata updates using the PATCH API do not create a new record version nor update the modifyTime and modifyUser attributes.
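The versioning behaviour described in the note can be illustrated with a toy in-memory model. This is a sketch under stated assumptions, not the Storage Service implementation: the classes ToyRecord and RecordVersion and their methods are hypothetical, and treating acl as an example of metadata is an assumption made only for illustration.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import Any, Optional


def _now() -> str:
    """Current UTC time as an ISO 8601 string."""
    return datetime.now(timezone.utc).isoformat()


@dataclass
class RecordVersion:
    data: dict[str, Any]
    create_user: str
    create_time: str
    modify_user: Optional[str] = None  # not present in the first version
    modify_time: Optional[str] = None  # not present in the first version


@dataclass
class ToyRecord:
    """Hypothetical stand-in for a stored record and its versions."""
    acl: dict[str, list[str]]
    versions: list[RecordVersion] = field(default_factory=list)

    @classmethod
    def create(cls, user: str, data: dict[str, Any],
               acl: dict[str, list[str]]) -> "ToyRecord":
        record = cls(acl=acl)
        record.versions.append(RecordVersion(dict(data), user, _now()))
        return record

    def update_data(self, user: str, data: dict[str, Any]) -> None:
        """Data-block update: adds a version and sets modifyUser/modifyTime on it."""
        first = self.versions[0]
        self.versions.append(
            RecordVersion(dict(data), first.create_user, first.create_time,
                          modify_user=user, modify_time=_now())
        )

    def update_metadata(self, acl: dict[str, list[str]]) -> None:
        """Metadata update: no new version, modify attributes left untouched."""
        self.acl = acl
```

In this model, calling update_metadata leaves the number of versions unchanged, while update_data appends a version whose modify_user and modify_time are set.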
Schema structure
Another important concept in the Data Ecosystem Storage Service is the schema. A schema is a structure, also defined in JSON, which provides data type information for the record fields. In other words, the schema defines whether a given field in the record is a string, an integer, a float, a geopoint, etc.
It is important to note that only fields that have associated schema information are indexed by the Search Service. For this reason, the OSDU developer must create the schema for the kind of their records before starting to ingest records into the Data Ecosystem.
Schemas and records are tied together by the kind attribute. In addition, a given kind can have either no schema or exactly one schema associated with it. With that in mind, the OSDU developer can use the Schema Service APIs for schema management.
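The relationship between kind, schema, and indexing can be sketched as follows. This is a simplified illustration, not the actual schema format or the Search Service's indexing logic: the SCHEMAS mapping, the flat field-to-type layout, and the function indexable_fields are all hypothetical.

```python
from typing import Any, Optional

# Hypothetical, simplified registry: kind -> {field name: data type}.
# Keying by kind means a kind has either no schema or exactly one.
SCHEMAS: dict[str, dict[str, str]] = {
    "schema-authority:wks:hello:1.0.0": {"msg": "string"},
}


def find_schema(kind: str) -> Optional[dict[str, str]]:
    """Look up the schema tied to a kind, or None if there is none."""
    return SCHEMAS.get(kind)


def indexable_fields(record: dict[str, Any]) -> dict[str, str]:
    """Return the data fields that have type information in the schema for the record's kind."""
    schema = find_schema(record["kind"])
    if schema is None:
        return {}  # no schema for this kind, so no field has type information
    return {name: schema[name] for name in record.get("data", {}) if name in schema}
```

For a record of kind schema-authority:wks:hello:1.0.0 whose data block contains msg and an extra field not described by the schema, only msg is returned. A record whose kind has no schema yields an empty result, which is why the schema should exist before ingestion starts.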
Note that all schema APIs in the Storage Service are now deprecated; the Schema Service is now used to manage schemas.