Conventions

General advice

Search workflow

A search API call is handled in a few steps:

  1. Finding resources matching search conditions.
    This is done either based on an explicitly given SQL query (with the sql and sqlParam[] request parameters) or by so-called search terms built from the property[], value[], operator[], type[] and language[] request parameters.
  2. Ordering matched resources.
    This is done based on the orderBy[], orderByLang and orderByCollation request parameters. If orderBy[] isn’t specified, results are ordered by the internal ARCHE resource id.
  3. Applying paging.
    This is done based on the offset and limit request parameters. If they aren’t provided, all matched resources are included.
  4. Generating technical triples annotating search results.
  5. Applying the readMode, resourceProperties and relativesProperties (read more here and here).

Specifying the search condition

With search terms

The simplest way of performing the search is by specifying so-called search terms.

A search term is a condition matching an RDF triple based on the triple’s property and/or object. If an ARCHE resource has RDF triples having it as a subject and matching all requested search terms, it matches the search.

A single search term is defined by (almost) any combination of the corresponding property[], operator[], value[], type[] and language[] request parameters.

  • The only forbidden combination is specifying only the operator[] as this is not enough information to formulate any condition.
  • Both property[] and value[] can supply either single or multiple values which are taken as alternatives.
  • The property[] can be inverted by prefixing it with a ^. This implicitly enforces type[]=URI for the value[], if value[] is specified.
  • The default operator[] is =.
  • Default property[], value[], type[] and language[] are “any”.
    • The type[] might be implicitly enforced by the operator[], by the presence of language[] or by requesting an inverted property[].
  • When denoting an ARCHE resource as a value[], any identifier of the resource can be used (e.g. https://arche.acdh.oeaw.ac.at/api/23174, https://hdl.handle.net/21.11115/0000-000C-20E3-F, https://id.acdh.oeaw.ac.at/uuid/512c8b7b-1427-4310-8606-43b8faf5619b and https://id.acdh.oeaw.ac.at/ODeeg/Collections/AT-Vienna-KHM/KHM-ANSA-IV3456/3D-data/3Dscan_raw-data/KHM-ANSA-IV3456_raw3d.zip are equally valid ways of referring to the same resource).
  • You have to use full URIs while specifying property[], type[] or URI value[]. ARCHE API doesn’t allow you to define namespace aliases and it doesn’t come with a set of predefined ones.
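
The matching rule described above ("any of" within a term, "all of" across terms) can be illustrated with a small in-memory sketch. This is not how ARCHE implements the search; the Term class and the resource_matches() function are hypothetical names introduced only for this illustration, and only the default = operator is modelled.

```python
from dataclasses import dataclass
from typing import List, Optional, Set, Tuple

# (property, value) pairs of the triples having a given resource as a subject
Triple = Tuple[str, str]


@dataclass
class Term:
    """A simplified search term: None stands for the default "any"."""
    properties: Optional[Set[str]] = None
    values: Optional[Set[str]] = None

    def matches(self, triple: Triple) -> bool:
        prop, value = triple
        # multiple properties/values within one term are alternatives
        prop_ok = self.properties is None or prop in self.properties
        value_ok = self.values is None or value in self.values
        return prop_ok and value_ok


def resource_matches(triples: List[Triple], terms: List[Term]) -> bool:
    """A resource matches if every term is satisfied by at least one triple."""
    return all(any(term.matches(t) for t in triples) for term in terms)


triples = [("hasTitle", "foo"), ("hasDescription", "baz")]
print(resource_matches(triples, [Term(values={"foo", "bar"})]))
print(resource_matches(triples, [Term(properties={"hasTitle"}, values={"baz"})]))
```

The first call succeeds because one of the alternative values is present; the second fails because no single triple satisfies both the property and the value condition of the term.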

Examples (for an explanation of the bracket syntax see the next chapters):

  • property[]=https://vocabs.acdh.oeaw.ac.at/schema#isTitleImageOf - find all resources being a title image (recognized by the existence of an actual acdh:isTitleImageOf relation)
  • value[0][]=foo&value[0][]=bar - find all resources having any property value equal to “foo” or “bar”
  • value[]=foo&operator[]=@@ - find all resources with any property matching a full text search for “foo”
  • property[0][]=https://vocabs.acdh.oeaw.ac.at/schema#hasTitle&property[0][]=https://vocabs.acdh.oeaw.ac.at/schema#hasDescription&value[]=foo&operator[]=@@ - find all resources having either acdh:hasTitle or acdh:hasDescription matching a full text search for “foo”
  • type[]=relation - find all resources having a triple pointing to another resource
  • property[]=^https://vocabs.acdh.oeaw.ac.at/schema#isPartOf&value[]=https://some.id - find all resources being children of the https://some.id resource
    (this can probably be done more efficiently by just using the right readMode on the https://some.id metadata endpoint)
  • property[]=https://vocabs.acdh.oeaw.ac.at/schema#hasTitle&language[]=ja - find all resources having acdh:hasTitle in Japanese
  • value[]=https://orcid.org/0000-0001-5853-2534&type[]=relation - find all resources pointing to the resource with id https://orcid.org/0000-0001-5853-2534
  • value[]=POINT (31.8181 30.7884)&operator[]=&& - find all resources spatially intersecting with the 31.8181E 30.7884N point

If multiple property[]/operator[]/value[]/type[]/language[] parameters are defined by the request, they are grouped into single search terms definitions by the same (implicit or explicit) key. Continue reading for details.

Multiple search terms parsing

When parsing a GET search request or a POST request with the body encoded as application/x-www-form-urlencoded, each property[]/operator[]/value[]/type[]/language[] parameter value is assigned a key using the following rules:

  • If the parameter[] syntax is used, the key is assigned automatically by taking the next number after the last existing numeric key.
  • If the parameter[key] syntax is used, the specified key is used.
    • The key can be numeric or string.
    • If the specified key already exists, the previous value is overwritten (also if the previously existing key was assigned implicitly).

For example:

property[]=x&property[]=y               => property: {0: x, 1: y}
property[]=a&property[1]=b&property[]=c => property: {0: a, 1: b, 2: c}
property[2]=a&property[]=b              => property: {2: a, 3: b}
property[foo]=a&property[]=b            => property: {foo: a, 0: b}
property[]=a&property[0]=b              => property: {0: b}

Then parameter values with the same key are grouped to form search terms, e.g. property[0]=x&property[1]=y&value[1]=a results in two search terms:

  • with the key 0 and condition property[]=x
  • with the key 1 and condition property[]=y and value[]=a
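
The key assignment and grouping rules above can be reproduced with a short sketch. It only imitates the documented behaviour for flat parameter[] and parameter[key] names (the nested parameter[key][] form is ignored) and is not the server's actual parser; assign_keys() and group_terms() are hypothetical helper names.

```python
import re
from urllib.parse import parse_qsl

TERM_PARAMS = ("property", "operator", "value", "type", "language")


def assign_keys(query, name):
    """Assign keys to all values of one parameter, e.g. name='property'."""
    result = {}
    next_key = 0  # next implicit key: one after the highest numeric key so far
    pattern = re.compile(re.escape(name) + r"\[([^\]]*)\]$")
    for param, value in parse_qsl(query, keep_blank_values=True):
        found = pattern.match(param)
        if found is None:
            continue  # another parameter or a nested parameter[key][] form
        key = found.group(1)
        if key == "":
            key = next_key  # parameter[] syntax: implicit key
        elif key.isdigit():
            key = int(key)  # parameter[1] syntax: explicit numeric key
        result[key] = value  # an already existing key is overwritten
        if isinstance(key, int):
            next_key = max(next_key, key + 1)
    return result


def group_terms(query):
    """Group parameter values sharing a key into search terms."""
    terms = {}
    for name in TERM_PARAMS:
        for key, value in assign_keys(query, name).items():
            terms.setdefault(key, {})[name] = value
    return terms


print(assign_keys("property[foo]=a&property[]=b", "property"))
print(group_terms("property[0]=x&property[1]=y&value[1]=a"))
```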

Passing multiple values to property[] and value[]

property[] and value[] allow you to specify a set of allowed values which is interpreted as “any of”.

When using the GET search request or a POST request with the body encoded as application/x-www-form-urlencoded this should be encoded using the parameter[key][]=value1&parameter[key][]=value2&(...) syntax, e.g. value[0][]=foo&value[0][]=bar.

  • The parameter[key][0]=value1&parameter[key][1]=value2&(...) syntax will also work but the parameter[][]=value1&parameter[][]=value2&(...) syntax won’t (as it will result in parameter: {0: [value1], 1: [value2]} instead of parameter: {0: [value1, value2]})

Preparing a search query using HTTP client libraries

Hopefully most of the time you won’t create ARCHE search API requests by hand but you’ll use some HTTP client library provided by your programming language.

If you are lucky, the library will just do the job for you, e.g.

// jQuery
jQuery.ajax({
  url: 'https://arche.acdh.oeaw.ac.at/api/search',
  method: 'GET', // POST would work equally well
  data: {
    "property":  ["someProp", "otherProp"                           ],
    "value":     [""        , ["otherPropValue1", "otherPropValue2"]],
    "readMode":  "resource",
    "format":    "text/turtle"
  },
  success: function(d) {console.log(d)}
})
// or with explicit keys
jQuery.ajax({
  url: 'https://arche.acdh.oeaw.ac.at/api/search',
  method: 'GET', // POST would work equally well
  data: {
    "property":  {"0": "someProp", "1": "otherProp"},
    "value":     {                 "1": ["otherPropValue1", "otherPropValue2"]},
    "readMode":  "resource",
    "format":    "text/turtle"
  },
  success: function(d) {console.log(d)}
})
// PHP
$searchParam = [
    'property' => ['someProp', 'otherProp'                           ],
    'value'    => [''        , ['otherPropValue1', 'otherPropValue2']],
    'readMode' => 'resource',
    'format'   => 'text/turtle'
];

// just with file_get_contents() - only GET possible
$response = file_get_contents('https://arche.acdh.oeaw.ac.at/api/search?' . http_build_query($searchParam));
echo $response;

// PSR-7 & PSR-18 way provided by Guzzle - both GET and POST possible
$client = new GuzzleHttp\Client();
$getRequest = new GuzzleHttp\Psr7\Request(
  'GET', 
  'https://arche.acdh.oeaw.ac.at/api/search?' . http_build_query($searchParam)
);
$getResponse = $client->sendRequest($getRequest);
echo $getResponse->getBody();
$postRequest = new GuzzleHttp\Psr7\Request(
  'POST', 
  'https://arche.acdh.oeaw.ac.at/api/search',
  ['Content-Type' => 'application/x-www-form-urlencoded'], 
  http_build_query($searchParam)
);
$postResponse = $client->sendRequest($postRequest);
echo $postResponse->getBody();

and if you’re unlucky and your HTTP client library can’t serialize complex objects into URL query (which most notably affects Python users), you need to prepare the request a little more carefully:

# Python with requests
import requests 
resp = requests.get(
  'https://arche.acdh.oeaw.ac.at/api/search',
  params={
    'property[]': ['someProp', 'otherProp'],
    'value[1][]': ['otherPropValue1', 'otherPropValue2'],
    'readMode': 'resource',
    'format': 'text/turtle'
  }
)
print(resp.text)
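
If you prefer to keep the nested structure in your Python code, a small helper can flatten it into parameter names with explicit keys (the parameter[key][0]=value1 form mentioned earlier). This is only a sketch; flatten_params() is a hypothetical helper written for this example and not part of any library.

```python
from urllib.parse import urlencode


def flatten_params(params, prefix=None):
    """Flatten nested dicts/lists into (name, value) pairs with explicit keys."""
    items = params.items() if isinstance(params, dict) else enumerate(params)
    pairs = []
    for key, value in items:
        name = str(key) if prefix is None else f"{prefix}[{key}]"
        if isinstance(value, (dict, list, tuple)):
            pairs.extend(flatten_params(value, name))  # descend one level
        else:
            pairs.append((name, value))
    return pairs


search = {
    "property": ["someProp", "otherProp"],
    "value": {1: ["otherPropValue1", "otherPropValue2"]},
    "readMode": "resource",
    "format": "text/turtle",
}
pairs = flatten_params(search)
print(pairs)
print(urlencode(pairs))  # ready to be appended to the search URL
```

The resulting list of pairs can also be passed as the params argument of requests.get().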

Last but not least, if you’re using PHP, you might want to use the arche-lib, which provides object wrappers for the search terms, ARCHE repository, etc.

With explicit SQL query

The search terms are rather simple to use but provide limited flexibility. If your search is too complex for them or if you prefer to use raw SQL, the search API allows you to do it.

  • You should pass your SQL query using the sql request parameter.
  • It’s recommended to pass all literal values used in the query through the sqlParam[] parameter.
    • That way you don’t need to care about properly escaping SQL-reserved characters.
    • It’s even more important if some values aren’t hardcoded in your app code but come from user input (that being said, a properly set up ARCHE instance will run your query as a user without any data modification rights, so there should be no data deletion threat).
  • Your query must return a column named id which will be matched against ARCHE internal resource ids (the id column in the resources table). All other columns returned by your query will be just discarded.
  • To plan your query you’ll probably need to take a look at the ARCHE database schema. The most commonly used data reside in three tables (but you might be also interested in the full_text_search and spatial_search tables):
    • identifiers (id, ids) storing resource identifiers with id being internal resource id and ids all URI ids of a given resource
    • relations(id, target_id, property) storing RDF graph edges
    • metadata(mid, id, property, type, lang, value, value_n, value_t) storing all triples with literal values.
      • mid is an internal triple id which is rather useless for you
      • id, property, type, lang and value store triple’s subject, predicate, value type, value lang tag and the value itself
      • value_n stores parsed numeric value for values of numeric types - this column can be used for proper numeric comparison of values
      • value_t is like value_n, but for values of the date/datetime types
  • Remember your query is used only for selecting resources matching the search (point 1. in the search workflow).

Example (passing literal values used in the query using the sqlParam[] request parameter):

Find all resources with acdh:hasTitle in Japanese:

https://arche.acdh.oeaw.ac.at/api/search
  ?sql=SELECT id FROM metadata WHERE property = ? AND lang = ?
  &sqlParam[]=https://vocabs.acdh.oeaw.ac.at/schema#hasTitle
  &sqlParam[]=ja
  
https://arche.acdh.oeaw.ac.at/api/search?sql=SELECT%20id%20FROM%20metadata%20WHERE%20property%20%3D%20%3F%20AND%20lang%20%3D%20%3F&sqlParam%5B%5D=https%3A%2F%2Fvocabs.acdh.oeaw.ac.at%2Fschema%23hasTitle&sqlParam%5B%5D=ja
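
The encoded URL above doesn’t have to be written by hand. A minimal sketch using only the Python standard library follows; quote_via=quote is used so that spaces are encoded as %20 like in the example above (the default quote_plus would emit + instead).

```python
from urllib.parse import quote, urlencode

BASE_URL = "https://arche.acdh.oeaw.ac.at/api/search"

params = {
    "sql": "SELECT id FROM metadata WHERE property = ? AND lang = ?",
    # one sqlParam[] value per ? placeholder, in the same order
    "sqlParam[]": ["https://vocabs.acdh.oeaw.ac.at/schema#hasTitle", "ja"],
}

# doseq=True repeats the sqlParam[] name for every value of the list
query = urlencode(params, doseq=True, quote_via=quote, safe="")
url = BASE_URL + "?" + query
print(url)
```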

Ordering results

Simple case

Just use the orderBy[] request parameter coupled, if needed, with the orderByLang parameter.

For example, let’s search for all resources bigger than 13 GB, ordering results by their acdh:hasTitle (for a detailed discussion of API parameter encoding take a look at this section):

https://arche.acdh.oeaw.ac.at/api/search
  ?property[0]=https://vocabs.acdh.oeaw.ac.at/schema#hasRawBinarySize
  &operator[0]=>
  &value[0]=13000000000
  &orderBy[0]=https://vocabs.acdh.oeaw.ac.at/schema#hasTitle
  &readMode=resource
  &format=text/turtle
  
https://arche.acdh.oeaw.ac.at/api/search?property%5B0%5D=https%3A%2F%2Fvocabs.acdh.oeaw.ac.at%2Fschema%23hasRawBinarySize&operator%5B0%5D=%3E&value%5B0%5D=13000000000&orderBy%5B0%5D=https%3A%2F%2Fvocabs.acdh.oeaw.ac.at%2Fschema%23hasTitle&readMode=resource&format=text%2Fturtle

And take a look at the returned data skipping uninteresting properties:

<https://arche.acdh.oeaw.ac.at/api/> <search://count> "7"^^<http://www.w3.org/2001/XMLSchema#integer> .

<https://arche.acdh.oeaw.ac.at/api/23174>
    <search://match>       "true"^^<http://www.w3.org/2001/XMLSchema#boolean>;
    acdh:hasRawBinarySize  "17222812879"^^<http://www.w3.org/2001/XMLSchema#long>;
    <search://order>       "7"^^<http://www.w3.org/2001/XMLSchema#positiveInteger>;
    <search://orderValue1> "KHM-ANSA-IV3456_raw3d.zip";
    acdh:hasTitle          "KHM-ANSA-IV3456_raw3d.zip"@en.
<https://arche.acdh.oeaw.ac.at/api/37779>
    <search://match>       "true"^^<http://www.w3.org/2001/XMLSchema#boolean>;
    acdh:hasRawBinarySize  "13039382156"^^<http://www.w3.org/2001/XMLSchema#long>;
    <search://order>       "6"^^<http://www.w3.org/2001/XMLSchema#positiveInteger>;
    <search://orderValue1> "KHM-ANSA-IV431_raw3d.zip";
    acdh:hasTitle          "KHM-ANSA-IV431_raw3d.zip"@en.
<https://arche.acdh.oeaw.ac.at/api/46542>
    <search://match>       "true"^^<http://www.w3.org/2001/XMLSchema#boolean>;
    acdh:hasRawBinarySize   "15927776264"^^<http://www.w3.org/2001/XMLSchema#nonNegativeInteger>;
    <search://order>        "5"^^<http://www.w3.org/2001/XMLSchema#positiveInteger>;
    <search://orderValue1>  "sfm_raw_04-p1.zip";
    acdh:hasTitle           "sfm_raw_04-p1.zip"@de.
(...)

Discussion:

  • The mapping of technical annotation properties to actual URIs follows the schema reported by the https://arche.acdh.oeaw.ac.at/api/describe, so that
    • {repoCfg}$.schema.searchMatch is <search://match>,
    • {repoCfg}$.schema.searchOrder is <search://order>,
    • {repoCfg}$.schema.searchOrderValue + {N} is <search://orderValue1>, <search://orderValue2>, etc.
  • All resources matched by the search are marked with <search://match> true. The one which isn’t - <https://arche.acdh.oeaw.ac.at/api/> - is a technical resource used to indicate global search result properties like the number of resources matched by the search (here 7).
  • The requested order can be read from the <search://order> property values. Ascending order is used by default.
    • If you want a descending order, just prepend the property URI with a ^ in the orderBy[] parameter, e.g. orderBy[0]=^https://vocabs.acdh.oeaw.ac.at/schema#hasTitle.
    • If you want to order by more than one property, provide multiple orderBy[] request parameters, e.g. orderBy[0]=firstOrderByThisProperty&orderBy[1]=thenOrderByThatProperty.
      • Multiple orderBy[] parameter values are first sorted by their (implicit or explicit) key, e.g. orderBy[foo]=someProp&orderBy[bar]=otherProp will order results first by the otherProp values and only then by the someProp values (because the bar key is smaller than the foo key).
  • Values actually used for ordering are provided in the <search://orderValue1> property.
    • As requested they are just equal to the acdh:hasTitle property value of a given resource, only without the language tag. Here it’s obvious but things can quickly get complicated if a resource has more than one title (see the next chapters).
    • There will be as many <search://orderValueN> properties in the output as there were orderBy[] parameters in the request, e.g. if you requested orderBy[0]=someProp&orderBy[1]=otherProp, the output will contain both <search://orderValue1> (storing someProp values actually used for sorting) and <search://orderValue2> (storing otherProp values actually used for sorting).
      • The N-th <search://orderValueN> property stores values of the n-th-order sorting property, so e.g. for orderBy[foo]=someProp&orderBy[bar]=otherProp, the <search://orderValue1> provides values of the otherProp and <search://orderValue2> provides values of the someProp.
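
The relation between orderBy[] keys and the <search://orderValueN> properties can be sketched as follows. The sketch assumes all keys are strings that are compared as plain strings; how ARCHE compares mixed numeric and string keys isn’t covered here. order_value_properties() is a hypothetical helper name.

```python
def order_value_properties(order_by):
    """Map search://orderValueN properties to orderBy[] values sorted by key."""
    # assumption: keys are strings and are compared as plain strings
    by_key = [order_by[key] for key in sorted(order_by)]
    return {
        f"search://orderValue{n}": prop
        for n, prop in enumerate(by_key, start=1)
    }


# corresponds to orderBy[foo]=someProp&orderBy[bar]=otherProp
mapping = order_value_properties({"foo": "someProp", "bar": "otherProp"})
print(mapping)
```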

Collation

Different languages have different opinions on character order. It’s possible that the rule used by the ARCHE instance isn’t in line with what you expect. Fortunately this can be easily checked and controlled:

  • Inspect the $.collation.default value of the data returned by the /describe REST API endpoint to find out which collation is used by default by a given ARCHE instance (e.g. https://arche.acdh.oeaw.ac.at/api/describe reports en_US.UTF-8).
  • Use the orderByCollation request parameter to enforce ordering according to a given collation.
    • Inspect the $.collation.available to get the list of all collations available on a given ARCHE instance.
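
A minimal sketch of validating a collation before sending it as orderByCollation. It assumes the /describe response has already been fetched and parsed into a Python dict; choose_collation() is a hypothetical helper and the list of available collations below is made-up sample data (only en_US.UTF-8 comes from the example above).

```python
def choose_collation(describe, wanted=None):
    """Return the collation to send as orderByCollation."""
    collation = describe["collation"]
    if wanted is None:
        return collation["default"]  # what the instance would use anyway
    if wanted not in collation["available"]:
        raise ValueError(f"collation {wanted!r} is not available")
    return wanted


# hypothetical, already parsed /describe data
describe = {
    "collation": {
        "default": "en_US.UTF-8",
        "available": ["en_US.UTF-8", "de_AT.UTF-8"],
    }
}
print(choose_collation(describe))
print(choose_collation(describe, "de_AT.UTF-8"))
```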

Multiple values of property used for ordering

A resource may have multiple values of a property used for results ordering. A typical case is labels in multiple languages but you shouldn’t optimistically assume it’s the only possible case. Consult the metadata schema to check if a given property may have multiple values and if they have a language tag.

In case of multiple property values ARCHE implements two rules:

  • If the orderByLang request parameter is provided, all values with a non-matching language tag are excluded.
    • If a value has no language tag (technically speaking if its type is other than rdfs:langString), it’s also included.
    • If there’s no value left for a given resource, it’s ordered as the last.
    • If a property has multiple values with the desired language tag, all of them are taken and the rule from the next point is applied.
    • It’s a global setting. You can’t assign different values for different orderBy[] parameter values.
  • The lowest value among the available ones is used for the sorting.
    • This is a fully arbitrary ARCHE design decision. A single value has to be picked and the lowest one is used.

E.g. let’s assume we have the following resources:

<res1> <hasTitle> "foo" ,
                  "bar"@en ;
       <hasAuthor> "Alice" .
<res2> <hasTitle> "bar"@en ,
                  "baz"@de ;
       <hasAuthor> "John" .
<res3> <otherProp> "placeholder" .

which all match the search. Now,

  • For orderBy[]=hasTitle&orderBy[]=^hasAuthor&orderByLang=en the order will be res2, res1, res3 because:
    • For hasTitle of res1 we take the lowest among foo (qualifies because it has no lang tag) and bar (qualifies because its lang tag matches the orderByLang), giving us bar.
    • For res2 we skip baz and keep bar because both have a lang tag but only the latter matches the orderByLang.
    • For res3 we get nothing, so it ends up at the end of the sort.
    • As res1 and res2 have the same sorting order according to hasTitle, we continue to the second order by property for them - hasAuthor. For res1 it’s Alice and for res2 it’s John but as a reverse order was requested (note the ^ in orderBy[]=^hasAuthor), sorting is done in reverse order and res2 comes before res1.
  • For orderBy[]=^hasTitle&orderByLang=de the order will be res1, res2, res3 because:
    • For res1 the foo value of the hasTitle is taken (it doesn’t have lang tag)
    • For res2 the baz value of the hasTitle is taken (matches the orderByLang)
    • res3 has no hasTitle so it goes to the end.
    • As the descending order was requested foo comes before baz and therefore res1 before res2.
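
Both rules and the two examples above can be reproduced with a short simulation. It is only an illustration of the documented behaviour, not ARCHE’s implementation: values are compared as plain Python strings (no collation), and sort_value() and order_resources() are hypothetical helper names.

```python
def sort_value(resource, prop, lang=None):
    """Pick the single value used for sorting or None if nothing qualifies."""
    candidates = [
        value
        for value, tag in resource.get(prop, [])
        # untagged values always qualify, tagged ones only if the tag matches
        if lang is None or tag is None or tag == lang
    ]
    return min(candidates) if candidates else None  # the lowest value wins


def order_resources(resources, order_by, lang=None):
    names = list(resources)
    # apply stable sorts from the least to the most significant property
    for spec in reversed(order_by):
        descending = spec.startswith("^")
        prop = spec[1:] if descending else spec
        values = {n: sort_value(resources[n], prop, lang) for n in names}
        with_value = [n for n in names if values[n] is not None]
        without_value = [n for n in names if values[n] is None]
        with_value.sort(key=values.get, reverse=descending)
        names = with_value + without_value  # resources lacking a value go last
    return names


# (value, language tag) pairs; None means no language tag
resources = {
    "res1": {"hasTitle": [("foo", None), ("bar", "en")], "hasAuthor": [("Alice", None)]},
    "res2": {"hasTitle": [("bar", "en"), ("baz", "de")], "hasAuthor": [("John", None)]},
    "res3": {"otherProp": [("placeholder", None)]},
}
print(order_resources(resources, ["hasTitle", "^hasAuthor"], "en"))
print(order_resources(resources, ["^hasTitle"], "de"))
```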

Lack of value and non-literal values

If a given resource lacks a triple of the orderBy[] property with a literal value, then it’s put at the end of the search results.

By the way, it means ARCHE doesn’t allow ordering by object property values. This is because an ARCHE resource may have any number of (equally important) identifiers, making it impossible to tell which one should be used (at least without introducing additional complexity to the API).

Unsupported features

  • Ordering by properties of linked resources.
    Let’s say you want to order by a title of a parent resource and then by a resource title.
    This is currently impossible.
  • Ordering by dynamically created properties.
    Especially when searching with an explicit SQL query you might want to order by a property created on-the-fly.
    This is impossible for security reasons. To prevent leaking arbitrary data from the database, a well-defined barrier between the search query and the output is needed, and this barrier lets through only the ids of resources matching the search.

Paging

Search results may be paged. This is controlled by the offset and limit parameters which work exactly how they sound.

Example: fetch the 3rd page (10 results per page) of the children of https://hdl.handle.net/21.11115/0000-000E-CE35-F (some Karl Kraus subcollection), ordering them by acdh:hasTitle:

https://arche.acdh.oeaw.ac.at/api/search
  ?property[]=https://vocabs.acdh.oeaw.ac.at/schema#isPartOf
  &value[]=https://hdl.handle.net/21.11115/0000-000E-CE35-F
  &orderBy[]=https://vocabs.acdh.oeaw.ac.at/schema#hasTitle
  &limit=10
  &offset=20
  &readMode=resource
  &format=text/turtle
  
https://arche.acdh.oeaw.ac.at/api/search?property%5B%5D=https%3A%2F%2Fvocabs.acdh.oeaw.ac.at%2Fschema%23isPartOf&value%5B%5D=https%3A%2F%2Fhdl.handle.net%2F21.11115%2F0000-000E-CE35-F&orderBy%5B0%5D=https%3A%2F%2Fvocabs.acdh.oeaw.ac.at%2Fschema%23hasTitle&limit=10&offset=20&readMode=resource&format=text%2Fturtle
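
Translating a page number into offset and limit is simple arithmetic. A minimal sketch building the request above for any page follows; page_params() and page_url() are hypothetical helper names and pages are assumed to be numbered from 1.

```python
from urllib.parse import urlencode

BASE_URL = "https://arche.acdh.oeaw.ac.at/api/search"
SCHEMA = "https://vocabs.acdh.oeaw.ac.at/schema#"


def page_params(page, per_page):
    """Return the limit/offset parameters for a 1-based page number."""
    if page < 1 or per_page < 1:
        raise ValueError("page and per_page must be positive")
    return {"limit": per_page, "offset": (page - 1) * per_page}


def page_url(collection, page, per_page=10):
    params = {
        "property[]": SCHEMA + "isPartOf",
        "value[]": collection,
        "orderBy[]": SCHEMA + "hasTitle",
        **page_params(page, per_page),
        "readMode": "resource",
        "format": "text/turtle",
    }
    return BASE_URL + "?" + urlencode(params)


print(page_url("https://hdl.handle.net/21.11115/0000-000E-CE35-F", page=3))
```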