ARCHE Suite documentation

Documentation for the ARCHE repository software stack

View the Project on GitHub acdh-oeaw/arche-docs

Dissemination services

Introduction

The way data are stored in arche is quite often not suitable for direct use, especially by humans. Who wants to browse through metadata in raw RDF? Or read a book as raw TEI-XML?

Dissemination services address this issue by transforming arche resources into formats useful for dissemination. Examples include rendering TEI-XML into an HTML webpage a human being can easily view, or transforming RDF metadata into a BibLaTeX bibliographic entry you can use in your bibliography management software.

It’s worth noting that in most cases it’s not feasible to avoid dissemination services by ingesting resources that are already in dissemination-friendly formats into arche. First, there are typically plenty of dissemination formats we want to offer. Storing all of them would cause a lot of data duplication and all the problems that come with it, most importantly keeping the copies in sync and the growing data volume. Second, it would make long-term maintenance troublesome. Whenever we want to adjust a given dissemination format, we would need to reprocess all repository resources (in technical terms, it breaks the separation between data and presentation).

Definitions

In spoken language the term dissemination service has two meanings, which are used interchangeably:

  1. The service providing the transformation.
    This is the technically correct meaning.
    • It’s worth noting that, from the arche architecture point of view, the repository browsing GUI and the OAI-PMH endpoint are also dissemination services in this meaning. The former disseminates metadata as a webpage and the latter transforms a resource’s arche metadata into OAI-PMH metadata.
  2. A set of rules describing which dissemination services (in the 1st meaning) are available for a given resource.
    To avoid confusion this is called a mapping below.

Architecture

It’s worth noting that the arche-core is not aware of dissemination services. The whole dissemination service logic is provided by higher layers of the arche software stack: arche-lib-disserv and arche-resolver.

Dissemination service

A dissemination service in arche is any web service able to consume data stored in arche.

It can be hosted anywhere, and it doesn’t matter whether it can deal only with arche resources or also with other data. It just has to be reachable using an HTTP GET request, and it must be able to fetch an arche resource on its own (e.g. from a URL passed to it as a part of the GET request).
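To make the requirement concrete, here is a minimal PHP sketch of the core of such a service. It assumes the resource URL arrives in a resURL query parameter, which mirrors the URL template of the minimal service description further down. It also assumes the HTTP fetch is injected as a callable so the logic can be tried without a network. The handleDissemination() function and its trivial HTML wrapping are hypothetical and are not part of any arche library.

```php
<?php
declare(strict_types=1);

/**
 * Hypothetical core of a dissemination service: it receives the GET query
 * parameters, fetches the arche resource on its own and transforms it.
 *
 * @param array<string, string> $query GET query parameters (e.g. $_GET)
 * @param callable(string): string $fetch Returns the resource body for a URL
 * @return array{0: int, 1: string} HTTP status code and response body
 */
function handleDissemination(array $query, callable $fetch): array
{
    // assumption: the resource URL is passed in a "resURL" query parameter
    $resUrl = $query['resURL'] ?? '';
    // the service must locate the resource purely from the request data
    if (filter_var($resUrl, FILTER_VALIDATE_URL) === false) {
        return [400, 'missing or invalid resURL parameter'];
    }
    $data = $fetch($resUrl);
    // placeholder "transformation": wrap the raw data in a trivial HTML page
    $html = '<html><body><pre>' . htmlspecialchars($data) . '</pre></body></html>';
    return [200, $html];
}

// a fake fetcher standing in for an HTTP GET request to arche
$fakeFetch = fn(string $url): string => "<TEI>content of $url</TEI>";
[$status, $body] = handleDissemination(['resURL' => 'https://arche.acdh.oeaw.ac.at/api/108253'], $fakeFetch);
echo "$status\n$body\n";
```

In a real deployment the function would be called with $_GET and a real HTTP client. The returned status would then be passed to http_response_code() before echoing the body.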

Limitations

A dissemination service must be able to fetch the arche resource on its own based on the data passed to it by a GET request. It means:

Mappings data model

(Dissemination service) mappings are a set of rules for:

  1. Matching arche resources with dissemination services able to transform them.
  2. Generating an HTTP request to the dissemination service based on the arche resource metadata.

The data model for both is determined by the arche-lib-disserv library.

Mappings are defined in RDF and represented in the arche as a set of (pretty ordinary) resources.

This means you prepare mapping definitions as an RDF file and ingest them into arche like any other set of metadata.

The arche-lib-disserv doesn’t enforce any particular RDF property names, which means you must choose them on your own and define them in a config file. In the examples below, the cfg.schema.dissServ.propertyCfgName syntax denotes RDF properties to be defined by you.
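The cfg.schema.dissServ.propertyCfgName notation can be read as a path into that config file, with "cfg" standing for its root. The PHP sketch below resolves such a path against a config already loaded into a nested array. The cfgLookup() helper, the exact nesting and the example.org property URIs are hypothetical placeholders, not arche’s actual configuration.

```php
<?php
declare(strict_types=1);

/**
 * Resolves a dotted path such as "schema.dissServ.returnFormat" (the part after
 * "cfg.") in a nested config array, e.g. one parsed from a YAML config file.
 */
function cfgLookup(array $cfg, string $path): string
{
    $node = $cfg;
    foreach (explode('.', $path) as $key) {
        if (!is_array($node) || !array_key_exists($key, $node)) {
            throw new InvalidArgumentException("config path '$path' not found");
        }
        $node = $node[$key];
    }
    if (!is_string($node)) {
        throw new InvalidArgumentException("config path '$path' is not a property URI");
    }
    return $node;
}

// hypothetical config: the property URIs are chosen by you, not by arche-lib-disserv
$cfg = [
    'schema' => [
        'label'    => 'https://example.org/schema#hasTitle',
        'dissServ' => [
            'class'        => 'https://example.org/schema#DisseminationService',
            'location'     => 'https://example.org/schema#serviceLocation',
            'returnFormat' => 'https://example.org/schema#hasReturnType',
        ],
    ],
];
echo cfgLookup($cfg, 'schema.dissServ.returnFormat'), "\n";
```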

An extensive example of mappings can be found here, with the RDF property names it uses defined here.

Fundamental dissemination service metadata

Each dissemination service has to:

A minimal dissemination service description could look as follows:

<https://myDissServ/URI> <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> cfg.schema.dissServ.class ;
                         cfg.schema.dissServ.location                      "https://service.location/template?resURL={RES_URI}" ;
                         cfg.schema.dissServ.returnFormat                  "text/html" .

Please read further for more details.

Matching resources with dissemination services

Matching is done based on matching rules:

The matching rules system is a compromise between flexibility and complexity. While it handles the most common scenarios, it can’t express complex rules, e.g. ones with multiple independent alternatives like (A or B) and (C or D).
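The actual rule vocabulary is defined by arche-lib-disserv. Purely to illustrate why (A or B) and (C or D) is out of reach, the PHP sketch below assumes a simplified model. In it, each rule is a (property, value, required) triple, all required rules must match, and if any optional rules exist at least one of them must match. Under that assumption a service can express "A and B and (C or D)", but only a single OR group. The serviceMatches() function and the property names in the example are hypothetical.

```php
<?php
declare(strict_types=1);

/**
 * Evaluates a simplified, hypothetical rule model against resource metadata.
 *
 * @param list<array{property: string, value: string, required: bool}> $rules
 * @param array<string, list<string>> $metadata property => list of values
 */
function serviceMatches(array $rules, array $metadata): bool
{
    $optionalCount = 0;
    $optionalHit   = false;
    foreach ($rules as $rule) {
        $hit = in_array($rule['value'], $metadata[$rule['property']] ?? [], true);
        if ($rule['required']) {
            if (!$hit) {
                return false; // every required rule must be fulfilled
            }
        } else {
            $optionalCount++;
            $optionalHit = $optionalHit || $hit;
        }
    }
    // if optional rules exist, at least one of them has to match (the single OR group)
    return $optionalCount === 0 || $optionalHit;
}

// hypothetical rules: "is a Resource AND (is TEI OR is XML)"
$rules = [
    ['property' => 'rdf:type',  'value' => 'acdh:Resource',       'required' => true],
    ['property' => 'hasFormat', 'value' => 'application/tei+xml', 'required' => false],
    ['property' => 'hasFormat', 'value' => 'application/xml',     'required' => false],
];
$teiResource = ['rdf:type' => ['acdh:Resource'], 'hasFormat' => ['application/tei+xml']];
$pdfResource = ['rdf:type' => ['acdh:Resource'], 'hasFormat' => ['application/pdf']];
var_dump(serviceMatches($rules, $teiResource));
var_dump(serviceMatches($rules, $pdfResource));
```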

Besides that, there are two purely technical requirements for rules:

Examples:

Dissemination service request URL generation

Once we know a given dissemination service is valid for a given repository resource, we need to be able to generate the HTTP GET request that triggers the dissemination.

The URL is generated by substituting the placeholders of a template with values coming from the arche resource’s metadata.

Every dissemination service defines its own URL template.

The URL template may contain two kinds of placeholders:

Every parameter value can be transformed using a set of predefined functions:

A few examples of dissemination service URL templates:

For an extensive example please take a look here.
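The PHP sketch below shows the substitution step. It assumes placeholders of the form {NAME} or {NAME|fn1|fn2}, where each fn is a transformation applied from left to right. {RES_URI} comes from the minimal service description above. The fillTemplate() function, the pipe syntax and the url and base64 function names are hypothetical stand-ins for the predefined functions arche-lib-disserv actually provides.

```php
<?php
declare(strict_types=1);

/**
 * Fills {NAME} and {NAME|fn|fn...} placeholders in a URL template (hypothetical syntax).
 *
 * @param array<string, string> $values placeholder values, e.g. taken from resource metadata
 */
function fillTemplate(string $template, array $values): string
{
    // hypothetical transformation functions
    $functions = [
        'url'    => fn(string $v): string => rawurlencode($v),
        'base64' => fn(string $v): string => base64_encode($v),
    ];
    return preg_replace_callback(
        '/\{([A-Za-z_][A-Za-z0-9_]*)((?:\|[a-z0-9]+)*)\}/',
        function (array $m) use ($values, $functions): string {
            if (!array_key_exists($m[1], $values)) {
                throw new InvalidArgumentException("no value for placeholder {$m[1]}");
            }
            $value = $values[$m[1]];
            // apply the piped functions from left to right
            foreach (array_filter(explode('|', $m[2])) as $fn) {
                if (!isset($functions[$fn])) {
                    throw new InvalidArgumentException("unknown function $fn");
                }
                $value = $functions[$fn]($value);
            }
            return $value;
        },
        $template
    );
}

$template = 'https://service.location/template?resURL={RES_URI|url}';
echo fillTemplate($template, ['RES_URI' => 'https://arche.acdh.oeaw.ac.at/api/108253']), "\n";
```

URL-encoding values before putting them into a query string keeps characters like ":" and "/" in the resource URL from being misread by the dissemination service.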

User-defined parameters

You can define your own URL template parameters which will be substituted based either on the arche resource’s metadata values or values provided to the resolver.

Each user-defined parameter:

An example RDF definition of a parameter can look as follows:

<parameter> <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> cfg.schema.dissServ.parameterClass ;
            cfg.schema.dissServ.parent                        <https://myDissServ/URI> ;
            cfg.schema.label                                  "XSL"@en ;
            cfg.schema.dissServ.parameterDefaultValue         "https://tei4arche.acdh-dev.oeaw.ac.at/xsl/test.xsl" ;
            cfg.schema.dissServ.parameterRdfProperty          "https://vocabs.acdh.oeaw.ac.at/schema#hasCustomXsl" .

The corresponding dissemination service URL template placeholder is {XSL}.
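The PHP sketch below shows how the value for {XSL} could be determined. It assumes a precedence order: a value passed to the resolver wins, then the resource’s hasCustomXsl metadata value, then the parameterDefaultValue. Both this precedence order and the resolveParameter() helper are assumptions for illustration only.

```php
<?php
declare(strict_types=1);

/**
 * Hypothetical resolution of a user-defined parameter value.
 *
 * @param array{name: string, rdfProperty: string, default: string} $param
 * @param array<string, list<string>> $metadata resource metadata (property => values)
 * @param array<string, string> $requestValues values passed to the resolver
 */
function resolveParameter(array $param, array $metadata, array $requestValues): string
{
    // 1. an explicit value provided to the resolver wins (assumed precedence)
    if (isset($requestValues[$param['name']])) {
        return $requestValues[$param['name']];
    }
    // 2. otherwise use the resource's own metadata value, if present
    $fromMeta = array_values($metadata[$param['rdfProperty']] ?? []);
    if (count($fromMeta) > 0) {
        return $fromMeta[0];
    }
    // 3. fall back to the parameter's default value
    return $param['default'];
}

$xslParam = [
    'name'        => 'XSL',
    'rdfProperty' => 'https://vocabs.acdh.oeaw.ac.at/schema#hasCustomXsl',
    'default'     => 'https://tei4arche.acdh-dev.oeaw.ac.at/xsl/test.xsl',
];
echo resolveParameter($xslParam, [], []), "\n";
```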

Choosing dissemination service best matching user’s request

So far we know how to describe which dissemination services match which arche resources, and how the HTTP GET request disseminating a given resource with a given service is created. What is still missing is how a dissemination service is matched with a user’s request.

Here the dissemination service’s cfg.schema.dissServ.returnFormat property plays the key role.

When making a request to the resolver, the user can specify a preferred output format. This can be done either explicitly, by including the format=desiredFormat query parameter in the URL or the Accept: desiredFormat HTTP header, or implicitly. For example, web browsers always include text/html in the Accept HTTP header without asking the user.
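The PHP sketch below shows how the requested formats could be extracted from a request. It assumes that an explicit format query parameter takes precedence over the Accept header. The result lists media types ordered by the client’s quality value, which defaults to 1.0 as in the HTTP Accept header. The requestedFormats() helper and the precedence rule are hypothetical; this is not arche-resolver’s code.

```php
<?php
declare(strict_types=1);

/**
 * Returns [mediaType => q] sorted by descending q, keeping request order for ties.
 *
 * @param array<string, string> $query GET parameters
 */
function requestedFormats(array $query, string $acceptHeader): array
{
    // assumption: an explicit format parameter overrides the Accept header
    $raw = isset($query['format']) && $query['format'] !== '' ? $query['format'] : $acceptHeader;
    $formats = [];
    foreach (explode(',', $raw) as $pos => $item) {
        $parts = array_map('trim', explode(';', $item));
        $type  = strtolower(array_shift($parts));
        if ($type === '') {
            continue;
        }
        $q = 1.0; // default quality value, as in HTTP
        foreach ($parts as $p) {
            if (preg_match('/^q\s*=\s*([0-9.]+)$/i', $p, $m)) {
                $q = (float) $m[1];
            }
        }
        $formats[] = ['type' => $type, 'q' => $q, 'pos' => $pos];
    }
    // sort by q descending, then by original position ascending
    usort($formats, fn(array $a, array $b): int => [$b['q'], $a['pos']] <=> [$a['q'], $b['pos']]);
    $result = [];
    foreach ($formats as $f) {
        $result[$f['type']] ??= $f['q']; // keep the first (highest q) occurrence
    }
    return $result;
}

print_r(requestedFormats([], 'text/xml,text/html;q=0.9'));
```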

Knowing the output format desired by the user, the resolver checks whether a dissemination service matching the requested arche resource has the desired value of the cfg.schema.dissServ.returnFormat RDF property.

Things get a little more complicated when many dissemination services are able to provide the requested output format. This is a particularly common situation when the client requests the text/html output format. In such a case the so-called quality value is taken into account.

A dissemination service’s cfg.schema.dissServ.returnFormat property value may contain not only the output format name but also a quality value. The quality value syntax and semantics follow those of the HTTP Accept header (see here), including the default value of 1.0.

Out of the dissemination services providing the requested output format, the one with the highest quality value is chosen. If several dissemination services share the highest quality value, the first one encountered is used.
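The PHP sketch below shows this selection rule. It assumes the resolver walks the client’s formats in order of preference. For the first format that any matching service provides, it picks the service with the highest quality value in its returnFormat, and ties go to the first service encountered. How client and service quality values are combined is an assumption here, not necessarily arche-resolver’s exact algorithm. The chooseService() and parseReturnFormat() functions are hypothetical, and the service names mirror the example that follows.

```php
<?php
declare(strict_types=1);

/** Splits a returnFormat value such as "text/html;q=0.1" into [mediaType, q]. */
function parseReturnFormat(string $value): array
{
    $parts = array_map('trim', explode(';', $value));
    $type  = strtolower(array_shift($parts));
    $q     = 1.0; // default quality value, as in the HTTP Accept header
    foreach ($parts as $p) {
        if (preg_match('/^q\s*=\s*([0-9.]+)$/i', $p, $m)) {
            $q = (float) $m[1];
        }
    }
    return [$type, $q];
}

/**
 * @param list<string> $requested requested media types, most preferred first
 * @param array<string, string> $services service id => returnFormat value, in encounter order
 */
function chooseService(array $requested, array $services): ?string
{
    foreach ($requested as $format) {
        $best  = null;
        $bestQ = -1.0;
        foreach ($services as $id => $returnFormat) {
            [$type, $q] = parseReturnFormat($returnFormat);
            // strict ">" keeps the first service encountered on ties
            if ($type === $format && $q > $bestQ) {
                $best  = (string) $id;
                $bestQ = $q;
            }
        }
        if ($best !== null) {
            return $best;
        }
    }
    return null; // no service provides any of the requested formats
}

$services = [
    'service1' => 'text/html;q=0.1',
    'service2' => 'text/html',
    'service3' => 'text/html',
];
echo chooseService(['text/xml', 'text/html'], $services), "\n";
```

Under these assumptions, no service offers text/xml, so text/html is tried next. There service2 and service3 tie at 1.0, and service2 is picked because it is encountered first.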

E.g. when a user requests text/xml,text/html;q=0.9 and the following dissemination services match the requested resource:

<service1> cfg.schema.dissServ.returnFormat "text/html;q=0.1" .
<service2> cfg.schema.dissServ.returnFormat "text/html" .
<service3> cfg.schema.dissServ.returnFormat "text/html" .

then:

It’s worth noting that:

Using arche-lib-disserv

Dissemination services matching a given arche resource

Knowing the resource URL:

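include 'vendor/autoload.php';
$resUrl = 'https://arche.acdh.oeaw.ac.at/api/108253';
// set up the repository connection based on the resource URL
$repo   = acdhOeaw\arche\lib\Repo::factoryFromUrl($resUrl);
// use the arche-lib-disserv resource class, which provides getDissServices()
$res    = new acdhOeaw\arche\lib\disserv\RepoResource($resUrl, $repo);
$availableDissServ = $res->getDissServices();
// print every available return format with the corresponding dissemination request URL
foreach ($availableDissServ as $retType => $dissService) {
    echo "$retType: " . $dissService->getRequest($res)->getUri() . "\n";
}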

Using search:

include 'vendor/autoload.php';
// download the repository configuration description and store it locally
file_put_contents('config.yaml', file_get_contents('https://arche.acdh.oeaw.ac.at/api/describe'));
$repo       = acdhOeaw\arche\lib\Repo::factory('config.yaml');
// search for all top collections
$term       = new acdhOeaw\arche\lib\SearchTerm('http://www.w3.org/1999/02/22-rdf-syntax-ns#type', 'https://vocabs.acdh.oeaw.ac.at/schema#TopCollection');
$cfg        = new acdhOeaw\arche\lib\SearchConfig();
// return search results as arche-lib-disserv resource objects
$cfg->class = '\acdhOeaw\arche\lib\disserv\RepoResource';
$resources  = $repo->getResourcesBySearchTerms([$term], $cfg);
foreach ($resources as $res) {
    echo "----------\n" . $res->getUri() . "\n";
    foreach ($res->getDissServices() as $retType => $dissService) {
        echo "$retType: " . $dissService->getRequest($res)->getUri() . "\n";
    }
}

Arche resources matching a given dissemination service

include 'vendor/autoload.php';
file_put_contents('config.yaml', file_get_contents('https://arche.acdh.oeaw.ac.at/api/describe'));
$repo         = acdhOeaw\arche\lib\Repo::factory('config.yaml');
$term         = new acdhOeaw\arche\lib\SearchTerm('http://www.w3.org/1999/02/22-rdf-syntax-ns#type', 'https://vocabs.acdh.oeaw.ac.at/schema#DisseminationService');
$cfg          = new acdhOeaw\arche\lib\SearchConfig();
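// return search results as dissemination service objects
$cfg->class   = '\acdhOeaw\arche\lib\disserv\dissemination\Service';
$dissServices = $repo->getResourcesBySearchTerms([$term], $cfg);
// the search result may be a generator, which can't be indexed with [0],
// so take the first dissemination service by iterating
foreach ($dissServices as $dissServ) {
    break;
}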
print_r($dissServ->getFormats());
// get up to 5 resources matching the given dissemination service
foreach($dissServ->getMatchingResources(5) as $res) {
    echo $res->getUri() . ": " . $dissServ->getRequest($res)->getUri() . "\n";
}