Introduction

ARCHE provides metadata in RDF which is not the most intuitive format for most programmers. The aim of this guide is to help avoid most common caveats connected with dealing with RDF.

Summary

This document goes down to two rules:

RDF data model for object-oriented programmers

RDF represents graphs with directed and labelled edges which has pretty intuitive object mapping:

There are three kinds of graph nodes in the RDF:

So far doesn’t sound bad but the devil is in the details.

The most common caveats are discussed bellow.

The bottom line is you should always use a dedicated RDF library to deal with RDF because handling it by hand is too complex and error prone.

Property names being URIs

The main problem is RDF edge labels - being mapped to object properties - are URIs. This brings two issues:

  • URIs may contain characters banned in property names of you programming language.
  • URIs are long and cumbersome to write. You don’t want to refer to each property (previous point issue aside) as myObject.http://some.namespace/someProperty.

A solution to this problem is called compacting and is discussed here.

Object IDs being URIs

Identifiers of RDF graph nodes - mapped to objects - are also URIs:

  • If you are attached to integer IDs, then sorry, it’s not the case.
  • You should pay attention to blank nodes as their identifiers can be compared only within a single RDF graph (and if they come from different RDF graphs they always differ, even if their URIs match!).

Literal objects immutability

Another complexity is brought by literals.

Literals have compound structure with value, datatype and language tag. Because of that they must be modeled as objects. The important thing is these objects have to be immutable.

Let’s consider a following RDF graph:

Resource1(someUri) --someProperty----\
                                    Literal1(someValue, someDatatype, someLang)
Resource2(otherUri) --someProperty---/

with two objects (Resource1 and Resource2) sharing a reference to a common literal1 object.

After doing something like (it’s pseudo code and not any particular programming language) resource1.someProperty.setValue(NEWVALUE) the graph should look as follows:

Resource1(someUri) --someProperty--> Literal1(NEWVALUE, someDatatype, someLang)

Resource2(otherUri) --someProperty--> Literal2(someValue, someDatatype, someLang)

and not:

Resource1(someUri) --someProperty----\
                                    Literal1(NEWVALUE, someDatatype, someLang)
Resource2(otherUri) --someProperty---/

By the way it means the resource1->someProperty->setValue(NEWVALUE) syntax should not exist in the first place and the right one should be resource1.someProperty = resource1.someProperty.withValue(NEWVALUE) or during the parsing of the RDF graph into objects of your programming language a separate Literal1 objects should be created for resource1.someProperty value and resource2.someProperty value.

Serialization formats

RDF is an abstract data model which can be expressed in (far too many to be honest) different serialization formats. Just to name the most important:

What you should remember is that a given RDF data set may have virtually countless valid serializations even in a single serialization format.

This is particularly important for RDF-XML and JSON-LD serialization where you can feel tempted to assume the data structure is stable and can be directly used by your app. It can’t.

The only safe way to go is to use a dedicated RDF parsing library which will deal with all those ambiguities.

Example

Let’s take a simple RDF graph:

Resource(http://foo/1) --http://bar/1--> Resource(http://foo/2)
    |                                     |
 http://bar/1                          http://baz
    |                                     |
    v                                     v
   Blank(_://b1) --http://baz--> Literal(baz)

and consider a small subset of its possible serializations.

Remember, all examples below represent exactly the same RDF graph!

  • n-triples - lines below in any order

    <http://foo/1> <http://bar/1> <http://foo/2> .
    <http://foo/1> <http://bar/1> _:b1 .
    <http://foo/2> <http://baz> "baz" .
    _:b1 <http://baz> "baz" .
  • turtle

    • all n-triples serializations (as n-triples is a subset of the turtle format)
    •   @prefix foo: <http://foo/> .
        foo:1 <http://bar/1> foo:2 ,
                           [<http://baz> "baz"] .
        foo:2 <http://baz> "baz" .
    •   @prefix foo: <http://bar/> .
        <http://foo/1> foo:1         <http://foo/2> ,
                                    _:b1 .
        <http://foo/2> <http://baz> "baz" .
       _:b1            <http://baz> "baz" .
    • and countless others…
  • JSON-LD:

    • flattened

      {
        "@graph": [
          {
            "@id": "_:b1",
            "http://baz": [
              {"@value": "baz"}
            ]
          },
          {
            "@id": "http://foo/1",
            "http://bar/1": [
              {"@id": "http://foo/2"},
              {"@id": "_:b1"}
            ]
          },
          {
            "@id": "http://foo/2",
            "http://baz": [
              {"@value": "baz"}
            ]
          }
        ]
      }
    • compacted (just an example, different contexts can be used resulting with different serializations)

      {
        "@context": {
          "bar": "http://bar/1",
          "baz": "http://baz"
        },
        "@graph": [
          {
            "@id": "_:b1",
            "baz": "baz"
          },
          {
            "@id": "http://foo/1",
            "bar": [
              {
                "@id": "http://foo/2"
              },
              {
                "@id": "_:b1"
              }
            ]
          },
          {
            "@id": "http://foo/2",
            "baz": "baz"
          }
        ]
      }
    • with no particular standardization

      {
        "@graph": [
          {
            "@id": "_:b1",
            "http://baz": {"@value": "baz"}
          },
          {
            "@id": "http://foo/1",
            "http://bar/1": [
              {
                "@id": "http://foo/2",
                "http://baz": [
                   {"@value": "baz"}
                ]
              },
              {"@id": "_:b1"}
            ]
          }
        ]
      }
    • and countless others…