Compacting

The most common ¹ application of compacting is to simplify RDF property URIs to something more handy.

It’s as simple as:

Define a context describing your own shorthands for properties and, when needed, indicating their values are RDF resources/blanks (in contrary to RDF literals). For example here we say https://vocabs.acdh.oeaw.ac.at/schema#hasTitle should be shortened to title and https://vocabs.acdh.oeaw.ac.at/schema#hasCreator should be shortened to creator while its value should be treated as an RDF node URI (in JSON-LD):
```
{
  "title": "https://vocabs.acdh.oeaw.ac.at/schema#hasTitle",
  "creator": {
    "@id": "https://vocabs.acdh.oeaw.ac.at/schema#hasCreator",
    "@type": "@id"
  },
  "subject": {
    "@id": "https://vocabs.acdh.oeaw.ac.at/schema#hasSubject",
    "@container": "@language"
  }
}
```

Apply it to data to get nice property names:

data (in JSON-LD)

{
  "@graph": [
    {
      "@id": "http://foo/1",
      "https://vocabs.acdh.oeaw.ac.at/schema#hasTitle": [
        {"@value": "English title", @language: "en"},
        {"@value": "Deutcher Titel", @language: "de"},
      ],
      "https://vocabs.acdh.oeaw.ac.at/schema#hasCreator": {
        "@id": "http://id.acdh.oeaw.ac.at/somePerson"
      }
    }
}

result (in JSON-LD):

{
  "@graph": [
    {
      "@id": "http://foo/1",
      "title": [
        {"@value": "English title", @language: "en"},
        {"@value": "Deutcher Titel", @language: "de"},
      ],
      "creator": "http://id.acdh.oeaw.ac.at/somePerson"
    }
}

If you are processing JSON-LD, the library you’re using (you’re using one, don’t you? if not, please read this) should be able to perform the compacting for you and then you can just deserialize the resulting JSON into you programming language data structures (see a chapter below).

If you are dealing with other RDF serialization, you may easily perform the compacting on your own - see a chapter below.

Where to get a context from?

There are two obvious sources:

Your own preferences. You know what properties your webapp needs and what are corresponding RDF property URIs, so you prepare the context on your own.
Generally this should be your choice.
The ARCHE repository schema reported by the {apiBase}/describe endpoint, e.g. https://arche.acdh.oeaw.ac.at/api/describe.
It’s not a good long-term solution because ARCHE schema may not contain mappings for properties you need and for sure contains mappings for properties you don’t care about. Also there’s no guarantee of stability of this schema - it can change when ARCHE internals need it at no one will ask about your opinion. But if you need just any context to try out how compacting works, it will do the job.

Compacting JSON-LD

First let’s try compacting the JSON-LD itself.

If you are a JavaScript/Node.JS programmer, the result will be just an JS object with a structure you should find easy to deal with.

If you are using other programming language, you can either parse the resulting JSON or check the “manual compacting” methods described in the next chapter.

JavaScript/Node.JS example

(jsonld library required)

// define the context
const context = {
  "title": {
    "@id": "https://vocabs.acdh.oeaw.ac.at/schema#hasTitle",
    "@container": ["@language", "@set"]
  },
  "creator": {
    "@id": "https://vocabs.acdh.oeaw.ac.at/schema#hasCreator",
    "@type": "@id",
    "@container": "@set"
  },
  "subject": {
    "@id": "https://vocabs.acdh.oeaw.ac.at/schema#hasSubject",
    "@container": ["@language", "@set"]
  }
};
// get sample ARCHE resource data in JSON-LD
// for clarity fetch only single resource metadata and only properties mentioned in the context
const req = new XMLHttpRequest();
req.open("GET", "https://id.acdh.oeaw.ac.at/schnitzler/bahrschnitzler", true);
req.setRequestHeader('Accept', 'application/ld+json');
req.setRequestHeader('X-METADATA-READ-MODE', 'resource');
req.setRequestHeader('X-RESOURCE-PROPERTIES', ['https://vocabs.acdh.oeaw.ac.at/schema#hasTitle', 'https://vocabs.acdh.oeaw.ac.at/schema#hasSubject' , 'https://vocabs.acdh.oeaw.ac.at/schema#hasCreator'].join(','));

// response processing handle
req.onreadystatechange = function () {
  if (req.readyState === XMLHttpRequest.DONE) {
    const data = JSON.parse(req.responseText);
    // before compacting
    console.log(data);
    const promise = jsonld.compact(data, context).then(function (x) {
      // after compacting
      console.log(x);
    });
    Promise.resolve(promise);
  }
};
req.send();

Python example

(rdflib and request libraries required)

import requests
from rdflib import Graph

# definte a custom context
context = {
  'title': {
    '@id': 'https://vocabs.acdh.oeaw.ac.at/schema#hasTitle',
    '@container': ['@language', '@set']
  },
  'creator': {
    '@id': 'https://vocabs.acdh.oeaw.ac.at/schema#hasCreator',
    '@type': '@id',
    '@container': '@set'
  },
  'subject': {
    '@id': 'https://vocabs.acdh.oeaw.ac.at/schema#hasSubject',
    '@container': ['@language', '@set']
  }
}

# get sample ARCHE resource data in JSON-LD
# for clarity fetch only single resource metadata and only properties mentioned in the context
resp = requests.get(
  'https://hdl.handle.net/21.11115/0000-000E-C8A6-5',
  headers={
    'Accept': 'application/ld+json',
    'X-METADATA-READ-MODE': 'resource', 
    'X-RESOURCE-PROPERTIES': 'https://vocabs.acdh.oeaw.ac.at/schema#hasTitle,https://vocabs.acdh.oeaw.ac.at/schema#hasSubject,https://vocabs.acdh.oeaw.ac.at/schema#hasCreator'
  }
)

# before compacting
print(resp.text)
# after compacting
graph = Graph().parse(data=resp.text, format='json-ld')
print(graph.serialize(format='json-ld', context=context))

PHP example

(guzzlehttp/guzzle and ml/json-ld libraries required)

# definte a custom context
$context = '{
  "title": {
    "@id": "https://vocabs.acdh.oeaw.ac.at/schema#hasTitle",
    "@container": ["@language", "@set"]
  },
  "creator": {
    "@id": "https://vocabs.acdh.oeaw.ac.at/schema#hasCreator",
    "@type": "@id",
    "@container": "@set"
  },
  "subject": {
    "@id": "https://vocabs.acdh.oeaw.ac.at/schema#hasSubject",
    "@container": ["@language", "@set"]
  }
}';

# get sample ARCHE resource data in JSON-LD
# for clarity fetch only single resource metadata and only properties mentioned in the context
$headers = [
  'Accept' => 'application/ld+json',
  'X-METADATA-READ-MODE' => 'resource', 
  'X-RESOURCE-PROPERTIES' => 'https://vocabs.acdh.oeaw.ac.at/schema#hasTitle,https://vocabs.acdh.oeaw.ac.at/schema#hasSubject,https://vocabs.acdh.oeaw.ac.at/schema#hasCreator'
];
$client = new GuzzleHttp\Client(['headers' => $headers]);
$response = $client->request('get', 'https://hdl.handle.net/21.11115/0000-000E-C8A6-5', $headers);
$data = (string) $response->getBody();

# before compacting
print_r(json_decode($data));
# after compacting
print_r(ML\JsonLD\JsonLD::compact($data, $context));

General remarks

Typically JSON-LD libraries tend to simplify the output if possible, e.g.

Make property values an array only if it has multiple values.
- In examples above are using "@container": "@set" to enforce values after compacting to always be an array.
Skip the @graph property if there is only one subject in the graph.
- I am not aware of any compacting context settings which would assure the @graph property is always at the top level of a JSON-LD document so this is something you probably need to check and assure on your own with something like:
```
if (!document['graph']) {
  document = {
    '@context': document['context'];
    '@graph': [document]
  };
  delete document['@graph'][0]['@context'];
}
```

Manual compacting

If you are programming in another programming language, you may find it more intuitive to directly perform “compacting” to the datatypes provided by your language (e.g. Python dictionaries or PHP arrays).

Below a few examples in Python and PHP using the schema section of the ARCHE repository config as a context.

Remarks:

In the JSON-LD examples object triple values store just an IRI of the target node. In the examples above we store the reference to the target node object which makes traversing the graph slightly easier.
In JSON-LD examples we used "@container": "@language" (so-called language Indexing) for some properties. This made property values being grouped by values language tag. This is not implemented in examples below (it is not hard to add though).
In the code below we always store property values as arrays, just like with "@container": "@set" int JSON-LD examples above.

Python example

(rdflib and requests libraries required)

import rdflib
import requests

# fetch the context
context = requests.get('https://arche.acdh.oeaw.ac.at/api/describe', headers={'Accept': 'application/json'})
context = context.json()['schema']
# flip the context so it's uri->shortName
context = {v: k for k, v in context.items() if isinstance(v, str)}

# parse RDF data
rawrdf = requests.get('https://hdl.handle.net/21.11115/0000-000E-C8A6-5', headers={'Accept': 'application/n-triples'})
data = rdflib.Graph()
data.parse(data=rawrdf.text, format="nt")

# create Python-native data model based on dictionaries
nodes = {}
for (sbj, prop, obj) in data:
  sbj = str(sbj)
  prop = str(prop)
  # skip RDF properties for which we don't know the mapping
  if prop not in context:
    continue
    
  # map prop name according to the context
  prop = context[prop]

  # if the triple points to another node in the graph, maintain the reference
  if not isinstance(obj, rdflib.term.Literal):
    if str(obj) not in nodes:
      nodes[str(obj)] = {'__uri__': str(obj)}
    obj = nodes[str(obj)]
    
  # manage the data
  if sbj not in nodes:
    nodes[sbj] = {'__uri__': sbj}
  if prop not in nodes[sbj]:
    nodes[sbj][prop] = []
  nodes[sbj][prop].append(obj)

# print results:
for uri, node in nodes.items():
  print(f"{uri}:")
  for prop, values in node.items():
    if prop == '__uri__':
      continue
    print(f"\t{prop}:")
    for val in values:
      if isinstance(val, dict):
        print(f"\t\treference to the {val['__uri__']} node")
      else:
        print(f"\t\t{val}")

PHP example

(guzzlehttp/guzzle, sweetrdf/quick-rdf-io and sweetrdf/quick-rdf libraries required)

$client = new GuzzleHttp\Client();
$dataFactory = new quickRdf\DataFactory();

# fetch the context
$context = $client->request('get', 'https://arche.acdh.oeaw.ac.at/api/describe', ['headers' => ['Accept' => 'application/json']]);
$context = json_decode($context->getBody(), true)['schema'];
# flip the context so it's uri->shortName
$context = array_filter($context, fn($x) => is_scalar($x));
$context = array_flip($context);

# parse RDF data
$data = $client->request('get', 'https://hdl.handle.net/21.11115/0000-000E-C8A6-5', ['headers' => ['Accept' => 'application/n-triples']]);
$data = quickRdfIo\Util::parse($data, $dataFactory);

# perform the mapping
$nodes = [];
foreach ($data as $triple) {
  $sbj = $triple->getSubject()->getValue();
  $prop = $triple->getPredicate()->getValue();
  $obj = $triple->getObject();
  
  # skip RDF properties for which we don't know the mapping
  if (!isset($context[$prop])) {
    continue;
  }
  # map prop name according to the context
  $prop = $context[$prop];

  # if the triple points to another node in the graph, maintain the reference
  if (!($obj instanceof rdfInterface\LiteralInterface)) {
    $objUri = $obj->getValue();
    if (!isset($nodes[$objUri])) {
      $nodes[$objUri] = (object) ['__uri__' => $objUri];
    }
    $obj = $nodes[$objUri];
  }
    
  # manage the data
  if (!isset($nodes[$sbj])) {
    $nodes[$sbj] = (object) ['__uri__' => $sbj];
  }
  if (!isset($nodes[$sbj]->$prop)) {
    $nodes[$sbj]->$prop = [];
  }
  $nodes[$sbj]->$prop[] = $obj;
}

# print results
print_r($nodes);

PHP example using arche-lib Repo object

(acdhcd/arche-lib library required)

# initialize the RepoDb object and perform a simple search
$repo = acdhOeaw\arche\lib\Repo::factoryFromUrl('https://arche.acdh.oeaw.ac.at/api');
$schema = $repo->getSchema();
$searchTerm = new acdhOeaw\arche\lib\SearchTerm($schema->pid, 'https://hdl.handle.net/21.11115/0000-000E-C8A6-5');
$searchCfg = new acdhOeaw\arche\lib\SearchConfig();
$searchCfg->metadataMode = 'neighbors';
$graph = $repo->getGraphBySearchTerms([$searchTerm], $searchCfg);

# define context
$matchProp = $schema->searchMatch;
$context = [
  'https://vocabs.acdh.oeaw.ac.at/schema#hasTitle' => 'title',
  'https://vocabs.acdh.oeaw.ac.at/schema#hasDescription' => 'description',
  'https://vocabs.acdh.oeaw.ac.at/schema#hasAvailableDate' => 'createDate',
  'https://vocabs.acdh.oeaw.ac.at/schema#hasUpdatedDate' => 'modificationDate',
  'https://vocabs.acdh.oeaw.ac.at/schema#hasCreator' => 'creator',
  'https://vocabs.acdh.oeaw.ac.at/schema#hasRightsHolder' => 'rightsHolder',
  $matchProp => '__match__',
];

# map results to PHP objects according to the context
$nodes = [];
foreach ($graph->resources() as $res) {
  // $res is of type EasyRdf\Resource
  $obj = (object) ['uri' => $res->getUri()];
  $toMap = array_intersect($res->propertyUris(), array_keys($context));
  foreach ($toMap as $i) {
    $prop = $context[$i];
    $obj->$prop = [];
    foreach ($res->allLiterals($i) as $v) {
      $obj->$prop[(string) $v->getLang()] = $v->getValue();
    }
    foreach ($res->allResources($i) as $v) {
      $obj->$prop[] = $v->getUri();
    }
  }
  $nodes[$res->getUri()] = $obj;
}

# find and display nodes matching the search
$matches = array_filter($nodes, fn($x) => isset($x->__match__));
print_r($matches);

PHP example using arche-lib RepoDb object

(acdhcd/arche-lib library required)

# initialize the RepoDb object and perform a simple search
$repo = acdhOeaw\arche\lib\RepoDb::factory('pathToConfig.yaml');
$schema = $repo->getSchema();
$searchTerm = new acdhOeaw\arche\lib\SearchTerm($schema->pid, 'https://hdl.handle.net/21.11115/0000-000E-C8A6-5');
$searchCfg = new acdhOeaw\arche\lib\SearchConfig();
$searchCfg->metadataMode = 'neighbors';
$pdoStmt = $repo->getPdoStatementBySearchTerms([$searchTerm], $searchCfg);

# define context
$matchProp = $schema->searchMatch;
$context = [
  'https://vocabs.acdh.oeaw.ac.at/schema#hasTitle' => 'title',
  'https://vocabs.acdh.oeaw.ac.at/schema#hasDescription' => 'description',
  'https://vocabs.acdh.oeaw.ac.at/schema#hasAvailableDate' => 'createDate',
  'https://vocabs.acdh.oeaw.ac.at/schema#hasUpdatedDate' => 'modificationDate',
  'https://vocabs.acdh.oeaw.ac.at/schema#hasCreator' => 'creator',
  'https://vocabs.acdh.oeaw.ac.at/schema#hasRightsHolder' => 'rightsHolder',
  $matchProp => '__match__',
];

# map results to PHP objects according to the context
$nodes = [];
while ($triple = $pdoStmt->fetchObject()) {
  # $triple is an object with properties id, property, type, lang, value

  # skip RDF properties for which we don't know the mapping
  if (!isset($context[$triple->property])) {
    continue;
  }
  # map prop name according to the context
  $prop = $context[$triple->property];

  # if the triple points to another node in the graph, maintain the reference
  if ($triple->type === 'REL') {
    if (!isset($nodes[$triple->value])) {
      $nodes[$triple->value] = (object) ['__id__' => $triple->value];
    }
  }
    
  # manage the data
  if (!isset($nodes[$triple->id])) {
    $nodes[$triple->id] = (object) ['__id__' => $triple->id];
  }
  if (!isset($nodes[$triple->id]->$prop)) {
    $nodes[$triple->id]->$prop = [];
  }
  switch ($triple->type) {
    case 'ID':
      $nodes[$triple->id]->$prop[] = $triple->value;
      break;
    case 'REL':
      $nodes[$triple->id]->$prop[] = $nodes[$triple->value];
      break;
    default:
      $nodes[$triple->id]->$prop[(string) $triple->lang] = $triple->value;
  }
}

# find and display nodes matching the search
$matches = array_filter($nodes, fn($x) => isset($x->__match__));
print_r($matches);

And the only one described here but feel free to read this, this, this and that.↩︎

RDF Compacting

2023-10-17

Introduction

Compacting

Where to get a context from?

Compacting JSON-LD

JavaScript/Node.JS example

Python example

PHP example

General remarks

Manual compacting

Python example

PHP example

PHP example using arche-lib Repo object

PHP example using arche-lib RepoDb object