As described in more details here dealing with RDF data can be pretty troublesome.
To make mapping of the RDF to object-oriented programming languages data model simpler a few standardized algorithms have been developed. For us the most important ones are
While these algorithms have been originally developed for JSON-LD, they can be generally applied to any RDF data set.
Here we focus on the first one - the compacting.
The most common 1 application of compacting is to simplify RDF property URIs to something more handy.
It’s as simple as:
Define a context describing your own shorthands for properties
and, when needed, indicating their values are RDF resources/blanks (in
contrary to RDF literals). For example here we say
https://vocabs.acdh.oeaw.ac.at/schema#hasTitle
should be
shortened to title
and
https://vocabs.acdh.oeaw.ac.at/schema#hasCreator
should be
shortened to creator
while its value should be treated as
an RDF node URI (in JSON-LD):
Apply it to data to get nice property names:
data (in JSON-LD)
result (in JSON-LD):
If you are processing JSON-LD, the library you’re using (you’re using one, don’t you? if not, please read this) should be able to perform the compacting for you and then you can just deserialize the resulting JSON into you programming language data structures (see a chapter below).
If you are dealing with other RDF serialization, you may easily perform the compacting on your own - see a chapter below.
There are two obvious sources:
{apiBase}/describe
endpoint, e.g. https://arche.acdh.oeaw.ac.at/api/describe.First let’s try compacting the JSON-LD itself.
If you are a JavaScript/Node.JS programmer, the result will be just an JS object with a structure you should find easy to deal with.
If you are using other programming language, you can either parse the resulting JSON or check the “manual compacting” methods described in the next chapter.
(jsonld library required)
// define the context
const context = {
"title": {
"@id": "https://vocabs.acdh.oeaw.ac.at/schema#hasTitle",
"@container": ["@language", "@set"]
},
"creator": {
"@id": "https://vocabs.acdh.oeaw.ac.at/schema#hasCreator",
"@type": "@id",
"@container": "@set"
},
"subject": {
"@id": "https://vocabs.acdh.oeaw.ac.at/schema#hasSubject",
"@container": ["@language", "@set"]
}
};
// get sample ARCHE resource data in JSON-LD
// for clarity fetch only single resource metadata and only properties mentioned in the context
const req = new XMLHttpRequest();
req.open("GET", "https://id.acdh.oeaw.ac.at/schnitzler/bahrschnitzler", true);
req.setRequestHeader('Accept', 'application/ld+json');
req.setRequestHeader('X-METADATA-READ-MODE', 'resource');
req.setRequestHeader('X-RESOURCE-PROPERTIES', ['https://vocabs.acdh.oeaw.ac.at/schema#hasTitle', 'https://vocabs.acdh.oeaw.ac.at/schema#hasSubject' , 'https://vocabs.acdh.oeaw.ac.at/schema#hasCreator'].join(','));
// response processing handle
req.onreadystatechange = function () {
if (req.readyState === XMLHttpRequest.DONE) {
const data = JSON.parse(req.responseText);
// before compacting
console.log(data);
const promise = jsonld.compact(data, context).then(function (x) {
// after compacting
console.log(x);
});
Promise.resolve(promise);
}
};
req.send();
(rdflib
and request
libraries required)
import requests
from rdflib import Graph
# definte a custom context
context = {
'title': {
'@id': 'https://vocabs.acdh.oeaw.ac.at/schema#hasTitle',
'@container': ['@language', '@set']
},
'creator': {
'@id': 'https://vocabs.acdh.oeaw.ac.at/schema#hasCreator',
'@type': '@id',
'@container': '@set'
},
'subject': {
'@id': 'https://vocabs.acdh.oeaw.ac.at/schema#hasSubject',
'@container': ['@language', '@set']
}
}
# get sample ARCHE resource data in JSON-LD
# for clarity fetch only single resource metadata and only properties mentioned in the context
resp = requests.get(
'https://hdl.handle.net/21.11115/0000-000E-C8A6-5',
headers={
'Accept': 'application/ld+json',
'X-METADATA-READ-MODE': 'resource',
'X-RESOURCE-PROPERTIES': 'https://vocabs.acdh.oeaw.ac.at/schema#hasTitle,https://vocabs.acdh.oeaw.ac.at/schema#hasSubject,https://vocabs.acdh.oeaw.ac.at/schema#hasCreator'
}
)
# before compacting
print(resp.text)
# after compacting
graph = Graph().parse(data=resp.text, format='json-ld')
print(graph.serialize(format='json-ld', context=context))
(guzzlehttp/guzzle
and ml/json-ld
libraries
required)
# definte a custom context
$context = '{
"title": {
"@id": "https://vocabs.acdh.oeaw.ac.at/schema#hasTitle",
"@container": ["@language", "@set"]
},
"creator": {
"@id": "https://vocabs.acdh.oeaw.ac.at/schema#hasCreator",
"@type": "@id",
"@container": "@set"
},
"subject": {
"@id": "https://vocabs.acdh.oeaw.ac.at/schema#hasSubject",
"@container": ["@language", "@set"]
}
}';
# get sample ARCHE resource data in JSON-LD
# for clarity fetch only single resource metadata and only properties mentioned in the context
$headers = [
'Accept' => 'application/ld+json',
'X-METADATA-READ-MODE' => 'resource',
'X-RESOURCE-PROPERTIES' => 'https://vocabs.acdh.oeaw.ac.at/schema#hasTitle,https://vocabs.acdh.oeaw.ac.at/schema#hasSubject,https://vocabs.acdh.oeaw.ac.at/schema#hasCreator'
];
$client = new GuzzleHttp\Client(['headers' => $headers]);
$response = $client->request('get', 'https://hdl.handle.net/21.11115/0000-000E-C8A6-5', $headers);
$data = (string) $response->getBody();
# before compacting
print_r(json_decode($data));
# after compacting
print_r(ML\JsonLD\JsonLD::compact($data, $context));
Typically JSON-LD libraries tend to simplify the output if possible, e.g.
"@container": "@set"
to
enforce values after compacting to always be an array.@graph
property if there is only one subject
in the graph.
I am not aware of any compacting context settings which would
assure the @graph
property is always at the top level of a
JSON-LD document so this is something you probably need to check and
assure on your own with something like:
If you are programming in another programming language, you may find it more intuitive to directly perform “compacting” to the datatypes provided by your language (e.g. Python dictionaries or PHP arrays).
Below a few examples in Python and PHP using the schema
section of the ARCHE repository config as a context.
Remarks:
"@container": "@language"
(so-called language Indexing) for some properties. This made
property values being grouped by values language tag. This is not
implemented in examples below (it is not hard to add though)."@container": "@set"
int JSON-LD examples
above.(rdflib
and requests
libraries
required)
import rdflib
import requests
# fetch the context
context = requests.get('https://arche.acdh.oeaw.ac.at/api/describe', headers={'Accept': 'application/json'})
context = context.json()['schema']
# flip the context so it's uri->shortName
context = {v: k for k, v in context.items() if isinstance(v, str)}
# parse RDF data
rawrdf = requests.get('https://hdl.handle.net/21.11115/0000-000E-C8A6-5', headers={'Accept': 'application/n-triples'})
data = rdflib.Graph()
data.parse(data=rawrdf.text, format="nt")
# create Python-native data model based on dictionaries
nodes = {}
for (sbj, prop, obj) in data:
sbj = str(sbj)
prop = str(prop)
# skip RDF properties for which we don't know the mapping
if prop not in context:
continue
# map prop name according to the context
prop = context[prop]
# if the triple points to another node in the graph, maintain the reference
if not isinstance(obj, rdflib.term.Literal):
if str(obj) not in nodes:
nodes[str(obj)] = {'__uri__': str(obj)}
obj = nodes[str(obj)]
# manage the data
if sbj not in nodes:
nodes[sbj] = {'__uri__': sbj}
if prop not in nodes[sbj]:
nodes[sbj][prop] = []
nodes[sbj][prop].append(obj)
# print results:
for uri, node in nodes.items():
print(f"{uri}:")
for prop, values in node.items():
if prop == '__uri__':
continue
print(f"\t{prop}:")
for val in values:
if isinstance(val, dict):
print(f"\t\treference to the {val['__uri__']} node")
else:
print(f"\t\t{val}")
(guzzlehttp/guzzle
, sweetrdf/quick-rdf-io
and sweetrdf/quick-rdf
libraries required)
$client = new GuzzleHttp\Client();
$dataFactory = new quickRdf\DataFactory();
# fetch the context
$context = $client->request('get', 'https://arche.acdh.oeaw.ac.at/api/describe', ['headers' => ['Accept' => 'application/json']]);
$context = json_decode($context->getBody(), true)['schema'];
# flip the context so it's uri->shortName
$context = array_filter($context, fn($x) => is_scalar($x));
$context = array_flip($context);
# parse RDF data
$data = $client->request('get', 'https://hdl.handle.net/21.11115/0000-000E-C8A6-5', ['headers' => ['Accept' => 'application/n-triples']]);
$data = quickRdfIo\Util::parse($data, $dataFactory);
# perform the mapping
$nodes = [];
foreach ($data as $triple) {
$sbj = $triple->getSubject()->getValue();
$prop = $triple->getPredicate()->getValue();
$obj = $triple->getObject();
# skip RDF properties for which we don't know the mapping
if (!isset($context[$prop])) {
continue;
}
# map prop name according to the context
$prop = $context[$prop];
# if the triple points to another node in the graph, maintain the reference
if (!($obj instanceof rdfInterface\LiteralInterface)) {
$objUri = $obj->getValue();
if (!isset($nodes[$objUri])) {
$nodes[$objUri] = (object) ['__uri__' => $objUri];
}
$obj = $nodes[$objUri];
}
# manage the data
if (!isset($nodes[$sbj])) {
$nodes[$sbj] = (object) ['__uri__' => $sbj];
}
if (!isset($nodes[$sbj]->$prop)) {
$nodes[$sbj]->$prop = [];
}
$nodes[$sbj]->$prop[] = $obj;
}
# print results
print_r($nodes);
(acdhcd/arche-lib
library required)
# initialize the RepoDb object and perform a simple search
$repo = acdhOeaw\arche\lib\Repo::factoryFromUrl('https://arche.acdh.oeaw.ac.at/api');
$schema = $repo->getSchema();
$searchTerm = new acdhOeaw\arche\lib\SearchTerm($schema->pid, 'https://hdl.handle.net/21.11115/0000-000E-C8A6-5');
$searchCfg = new acdhOeaw\arche\lib\SearchConfig();
$searchCfg->metadataMode = 'neighbors';
$graph = $repo->getGraphBySearchTerms([$searchTerm], $searchCfg);
# define context
$matchProp = $schema->searchMatch;
$context = [
'https://vocabs.acdh.oeaw.ac.at/schema#hasTitle' => 'title',
'https://vocabs.acdh.oeaw.ac.at/schema#hasDescription' => 'description',
'https://vocabs.acdh.oeaw.ac.at/schema#hasAvailableDate' => 'createDate',
'https://vocabs.acdh.oeaw.ac.at/schema#hasUpdatedDate' => 'modificationDate',
'https://vocabs.acdh.oeaw.ac.at/schema#hasCreator' => 'creator',
'https://vocabs.acdh.oeaw.ac.at/schema#hasRightsHolder' => 'rightsHolder',
$matchProp => '__match__',
];
# map results to PHP objects according to the context
$nodes = [];
foreach ($graph->resources() as $res) {
// $res is of type EasyRdf\Resource
$obj = (object) ['uri' => $res->getUri()];
$toMap = array_intersect($res->propertyUris(), array_keys($context));
foreach ($toMap as $i) {
$prop = $context[$i];
$obj->$prop = [];
foreach ($res->allLiterals($i) as $v) {
$obj->$prop[(string) $v->getLang()] = $v->getValue();
}
foreach ($res->allResources($i) as $v) {
$obj->$prop[] = $v->getUri();
}
}
$nodes[$res->getUri()] = $obj;
}
# find and display nodes matching the search
$matches = array_filter($nodes, fn($x) => isset($x->__match__));
print_r($matches);
(acdhcd/arche-lib
library required)
# initialize the RepoDb object and perform a simple search
$repo = acdhOeaw\arche\lib\RepoDb::factory('pathToConfig.yaml');
$schema = $repo->getSchema();
$searchTerm = new acdhOeaw\arche\lib\SearchTerm($schema->pid, 'https://hdl.handle.net/21.11115/0000-000E-C8A6-5');
$searchCfg = new acdhOeaw\arche\lib\SearchConfig();
$searchCfg->metadataMode = 'neighbors';
$pdoStmt = $repo->getPdoStatementBySearchTerms([$searchTerm], $searchCfg);
# define context
$matchProp = $schema->searchMatch;
$context = [
'https://vocabs.acdh.oeaw.ac.at/schema#hasTitle' => 'title',
'https://vocabs.acdh.oeaw.ac.at/schema#hasDescription' => 'description',
'https://vocabs.acdh.oeaw.ac.at/schema#hasAvailableDate' => 'createDate',
'https://vocabs.acdh.oeaw.ac.at/schema#hasUpdatedDate' => 'modificationDate',
'https://vocabs.acdh.oeaw.ac.at/schema#hasCreator' => 'creator',
'https://vocabs.acdh.oeaw.ac.at/schema#hasRightsHolder' => 'rightsHolder',
$matchProp => '__match__',
];
# map results to PHP objects according to the context
$nodes = [];
while ($triple = $pdoStmt->fetchObject()) {
# $triple is an object with properties id, property, type, lang, value
# skip RDF properties for which we don't know the mapping
if (!isset($context[$triple->property])) {
continue;
}
# map prop name according to the context
$prop = $context[$triple->property];
# if the triple points to another node in the graph, maintain the reference
if ($triple->type === 'REL') {
if (!isset($nodes[$triple->value])) {
$nodes[$triple->value] = (object) ['__id__' => $triple->value];
}
}
# manage the data
if (!isset($nodes[$triple->id])) {
$nodes[$triple->id] = (object) ['__id__' => $triple->id];
}
if (!isset($nodes[$triple->id]->$prop)) {
$nodes[$triple->id]->$prop = [];
}
switch ($triple->type) {
case 'ID':
$nodes[$triple->id]->$prop[] = $triple->value;
break;
case 'REL':
$nodes[$triple->id]->$prop[] = $nodes[$triple->value];
break;
default:
$nodes[$triple->id]->$prop[(string) $triple->lang] = $triple->value;
}
}
# find and display nodes matching the search
$matches = array_filter($nodes, fn($x) => isset($x->__match__));
print_r($matches);