Conventions

RDF property URIs are often shortened using the following prefixes:

acdh https://vocabs.acdh.oeaw.ac.at/schema#
acdhi https://id.acdh.oeaw.ac.at

The {repoCfg}$.X.Y syntax means an $.X.Y JSON path over the repository configuration returned by its describe REST API endpoint, e.g. {repoCfg}$.schema.label on https://arche.acdh.oeaw.ac.at/api resolves to https://vocabs.acdh.oeaw.ac.at/schema#hasTitle.
Full search URL examples always come in pairs:
- A human-readable version with non-URL-encoded parameter values and every parameter written in a new line.
- A copy-paste friendly (but human-unreadable) version allowing you to test it easily in a browser/curl/postman/whatsoever.
Short examples of particular API parameters are always provided in non-URL-encoded form (read “just copy-pasting them into the browser/curl may not work”).
Most search URL examples use readMode=resource and format=text/turtle to provide the most human-readable output, allowing you to focus on the topic being discussed. In real-world usage you’re likely to use a different readMode and/or output format.
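
The relation between the two versions of an example can be sketched in Python as follows. This is only an illustration based on the standard library; the encode_readable name is made up for this guide and it assumes one parameter per line, just like in the human-readable examples here.

```python
from urllib.parse import quote


def encode_readable(readable: str) -> str:
    """Turns a multi-line, non-URL-encoded example into a single encoded URL.

    Assumes the first non-empty line is the endpoint URL and that every
    following line holds exactly one parameter prefixed with "?" or "&".
    """
    lines = [line.strip() for line in readable.splitlines() if line.strip()]
    base, params = lines[0], lines[1:]
    encoded = []
    for param in params:
        # drop the leading "?" or "&" and split on the first "=" only,
        # so values containing "=" or "&" stay intact
        name, _, value = param[1:].partition('=')
        encoded.append(quote(name, safe='') + '=' + quote(value, safe=''))
    return base + '?' + '&'.join(encoded) if encoded else base


example = """
https://arche.acdh.oeaw.ac.at/api/search
  ?property[]=https://vocabs.acdh.oeaw.ac.at/schema#hasTitle
  &language[]=ja
  &readMode=resource
  &format=text/turtle
"""
print(encode_readable(example))
```

Percent-encoding every reserved character (safe='') is stricter than most servers require, but it keeps characters like #, & and = inside values from being misread.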

General advice

If you haven’t read the “using RDF in webapps” guide, please do so first. It should help avoid many misconceptions stemming from the fact that the ARCHE REST API provides metadata in RDF.
Before making a complex search, think for a moment if what you want can’t be achieved with a readMode. A request using the readMode and a simple search condition will be simpler, easier to understand and is likely to run faster.
If your search is likely to match hundreds of resources and you don’t use paging, you’ll be better off using resourceProperties and relativesProperties parameters - see here.
When querying the API from your app, use POST rather than GET. That way you’ll avoid the risk of hitting the too long request URL issue.
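
As an illustration of the last point, a POST search request can be prepared with nothing but the Python standard library. This is a minimal sketch: the build_search_request helper is a hypothetical name, and the request is only constructed, not sent.

```python
from urllib.parse import urlencode
from urllib.request import Request


def build_search_request(base_url: str, params: list) -> Request:
    """Prepares (but doesn't send) a form-encoded POST search request."""
    body = urlencode(params).encode('ascii')
    return Request(
        base_url.rstrip('/') + '/search',
        data=body,
        headers={'Content-Type': 'application/x-www-form-urlencoded'},
        method='POST',
    )


request = build_search_request(
    'https://arche.acdh.oeaw.ac.at/api',
    [
        ('property[]', 'https://vocabs.acdh.oeaw.ac.at/schema#hasTitle'),
        ('language[]', 'ja'),
        ('readMode', 'resource'),
        ('format', 'text/turtle'),
    ],
)
# Sending is left out on purpose; urllib.request.urlopen(request) would do it.
print(request.get_method(), request.full_url, len(request.data))
```

All parameters travel in the request body, so the URL length stays constant no matter how many search terms are passed.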

Search workflow

A search API call is handled in a few steps:

Finding resources matching search conditions.
This is done either based on an explicitly given SQL query (with sql and sqlParam[] request parameters) or by so-called search terms built from property[], value[], operator[], type[] and language[] request parameters.
Ordering matched resources.
This is done based on the orderBy[], orderByLang and orderByCollation request parameters and, if orderBy[] isn’t specified, by the internal ARCHE resource id.
Applying paging.
This is done based on the offset and limit request parameters. If they aren’t provided, all matched resources are included.
Generation of technical triples annotating search results.
Applying the readMode, resourceProperties and relativesProperties (read more here and here).

Technical RDF properties provided by the search

The search results are annotated with special technical RDF properties:

subject	property	object value type	object value description
`{restAPIbaseURL}`	`{repoCfg}$.schema.searchCount`	`xsd:integer`	total number of resources matched by the search
`resourceURI`	`{repoCfg}$.schema.searchMatch`	`"true"^^xsd:boolean`	marks resources matching the search (to distinguish them from the ones fetched because of the readMode)
`resourceURI`	`{repoCfg}$.schema.searchOrder`	`xsd:positiveInteger`	order of the resource within the search results according to the `orderBy[]` request parameter(s) - see the Ordering results chapter below - only when the `orderBy[]` request parameter(s) was provided
`resourceURI`	`{repoCfg}$.schema.searchOrderValue{N}`	mixed	actual value of the RDF property indicated by the `orderBy[{N}]` request parameter used for ordering the results - see the Ordering results chapter below - only when the `orderBy[]` request parameter(s) was provided
`resourceURI`	`{repoCfg}$.schema.searchFts{N}`	`xsd:string`	`{N}`-th highlighted full text search match - only when a full text search was performed
`resourceURI`	`{repoCfg}$.schema.searchFtsProperty{N}`	object or `xsd:string`	RDF property of the `{N}`-th full text search match or a `BINARY` literal if match in the binary content - only when a full text search was performed
`resourceURI`	`{repoCfg}$.schema.searchFtsQuery{N}`	`xsd:string`	Full text search highlighting query of the `{N}`-th full text search match - only when a full text search was performed

To see how these properties look in the output, please jump to the example in the ordering results - simple case section.
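
The {repoCfg}$.X.Y notation used in the table can be resolved with a few lines of Python. The sketch below works on a hand-written excerpt of the configuration, put together from the values visible in the examples of this guide; a real instance returns much more from its describe endpoint, and both function names are hypothetical.

```python
def resolve(repo_cfg: dict, path: str):
    """Resolves a simple "$.X.Y" path against a nested dict."""
    if not path.startswith('$.'):
        raise ValueError('only "$.X.Y" style paths are supported')
    value = repo_cfg
    for step in path[2:].split('.'):
        value = value[step]
    return value


def numbered(repo_cfg: dict, path: str, n: int) -> str:
    """Builds the {N}-suffixed variant of a property, e.g. searchOrderValue{N}."""
    return f'{resolve(repo_cfg, path)}{n}'


# Hand-written excerpt (an assumption), not a full describe response.
repo_cfg = {
    'schema': {
        'label': 'https://vocabs.acdh.oeaw.ac.at/schema#hasTitle',
        'searchCount': 'search://count',
        'searchMatch': 'search://match',
        'searchOrder': 'search://order',
        'searchOrderValue': 'search://orderValue',
        'searchFts': 'search://fts',
    }
}

print(resolve(repo_cfg, '$.schema.searchMatch'))
print(numbered(repo_cfg, '$.schema.searchOrderValue', 2))
```

Resolving the property URIs from the configuration instead of hardcoding them keeps client code working against ARCHE instances which use a different schema.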

Specifying the search condition

With search terms

The simplest way of performing the search is by specifying so-called search terms.

A search term is a condition matching an RDF triple based on the triple’s property and/or object. If an ARCHE resource has RDF triples having it as a subject and matching all requested search terms, it matches the search.

A single search term is defined by (almost) any combination of corresponding property[], operator[], value[], type[] and language[] request parameters.

The only forbidden combination is specifying only the operator[] as this is not enough information to formulate any condition.
Both property[] and value[] can supply either single or multiple values which are taken as alternatives.
The property[] can be inverted by prepending it with a ^. This implicitly enforces the type[]=URI for the value[], if value[] is specified.
The default operator[] is =.
Default property[], value[], type[] and language[] are “any”.
- The type[] might be implicitly enforced by operator[], presence of language[] or requesting an inverted property[].
When denoting an ARCHE resource as a value[], any identifier of a resource can be used (e.g. https://arche.acdh.oeaw.ac.at/api/23174, https://hdl.handle.net/21.11115/0000-000C-20E3-F, https://id.acdh.oeaw.ac.at/uuid/512c8b7b-1427-4310-8606-43b8faf5619b and https://id.acdh.oeaw.ac.at/ODeeg/Collections/AT-Vienna-KHM/KHM-ANSA-IV3456/3D-data/3Dscan_raw-data/KHM-ANSA-IV3456_raw3d.zip are equally valid ways of referring to the same resource).
You have to use full URIs while specifying property[], type[] or URI value[]. ARCHE API doesn’t allow you to define namespace aliases and it doesn’t come with a set of predefined ones.

Examples (for explanation of the brackets syntax see the next chapters):

property[]=https://vocabs.acdh.oeaw.ac.at/schema#isTitleImageOf - find all resources being a title image (recognized by the existence of an actual acdh:isTitleImageOf relation)
value[0][]=foo&value[0][]=bar - find all resources having any property value equal to “foo” or “bar”
value[]=foo&operator[]=@@ - find all resources with any property matching a full text search for “foo”
property[0][]=https://vocabs.acdh.oeaw.ac.at/schema#hasTitle&property[0][]=https://vocabs.acdh.oeaw.ac.at/schema#hasDescription&value[]=foo&operator[]=@@ - find all resources having either acdh:hasTitle or acdh:hasDescription matching a full text search for “foo”
type[]=relation - find all resources having a triple pointing to another resource
property[]=^https://vocabs.acdh.oeaw.ac.at/schema#isPartOf&value[]=https://some.id - find all resources being children of the https://some.id resource
(this can be probably done more efficiently by just using the right readMode on the https://some.id metadata endpoint)
property[]=https://vocabs.acdh.oeaw.ac.at/schema#hasTitle&language[]=ja - find all resources having acdh:hasTitle in Japanese
value[]=https://orcid.org/0000-0001-5853-2534&type[]=relation - find all resources pointing to the resource with id https://orcid.org/0000-0001-5853-2534
- In comparison to value[]=https://orcid.org/0000-0001-5853-2534 - find all resources pointing to the resource with id https://orcid.org/0000-0001-5853-2534 and the resource with id https://orcid.org/0000-0001-5853-2534 itself, because this search term also matches the <someResource> <id property> <https://orcid.org/0000-0001-5853-2534> triple.
value[]=POINT (31.8181 30.7884)&operator[]=&& - find all resources spatially intersecting with the 31.8181E 30.7884N point

If multiple property[]/operator[]/value[]/type[]/language[] parameters are defined by the request, they are grouped into single search term definitions by the same (implicit or explicit) key. Continue reading for details.

Multiple search terms parsing

When parsing the GET search request or a POST request with the body encoded as application/x-www-form-urlencoded, each property[]/operator[]/value[]/type[]/language[] parameter value is assigned a key using the following rules:

If the parameter[] syntax is used, the key is assigned automatically by taking the next number after the last existing numeric key.
If the parameter[key] syntax is used, the specified key is used.
- The key can be numeric or string.
- If the specified key already exists, the previous value is overwritten (also if the previously existing key was assigned implicitly).

For example:

property[]=x&property[]=y               => property: {0: x, 1: y}
property[]=a&property[1]=b&property[]=c => property: {0: a, 1: b, 2: c}
property[2]=a&property[]=b              => property: {2: a, 3: b}
property[foo]=a&property[]=b            => property: {foo: a, 0: b}
property[]=a&property[0]=b              => property: {0: b}

Then parameter values with the same key are grouped to form search terms, e.g. property[0]=x&property[1]=y&value[1]=a results in two search terms:

with the key 0 and condition property[]=x
with the key 1 and condition property[]=y and value[]=a
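
The key assignment and grouping rules can be emulated in a few lines of Python. This is a sketch written for this guide to make the rules tangible, not the code ARCHE actually runs; it only handles the flat parameter[] and parameter[key] forms and silently skips everything else (including the nested parameter[key][] syntax).

```python
import re
from urllib.parse import parse_qsl

PARAM_RE = re.compile(r'^(\w+)\[([^\]]*)\]$')


def assign_keys(query: str) -> dict:
    """Assigns an explicit key to every parameter[...] value of a query string."""
    params = {}
    for name, value in parse_qsl(query, keep_blank_values=True):
        match = PARAM_RE.match(name)
        if match is None:
            continue  # not a flat parameter[] / parameter[key] parameter
        param, key = match.groups()
        values = params.setdefault(param, {})
        if key == '':
            # implicit key: the next number after the highest numeric key
            numeric = [k for k in values if isinstance(k, int)]
            key = max(numeric) + 1 if numeric else 0
        elif key.isdigit():
            key = int(key)
        values[key] = value  # an already existing key gets overwritten
    return params


def group_terms(params: dict) -> dict:
    """Groups parameter values sharing the same key into search terms."""
    terms = {}
    for param, values in params.items():
        for key, value in values.items():
            terms.setdefault(key, {})[param] = value
    return terms


print(assign_keys('property[2]=a&property[]=b'))
print(group_terms(assign_keys('property[0]=x&property[1]=y&value[1]=a')))
```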

Passing multiple values to property[] and value[]

property[] and value[] allow specifying a set of allowed values which is interpreted as “any of”.

When using the GET search request or a POST request with the body encoded as application/x-www-form-urlencoded this should be encoded using the parameter[key][]=value1&parameter[key][]=value2&(...) syntax, e.g. value[0][]=foo&value[0][]=bar.

The parameter[key][0]=value1&parameter[key][1]=value2&(...) syntax will also work but the parameter[][]=value1&parameter[][]=value2&(...) syntax won’t (as it will result in parameter: {0: [value1], 1: [value2]} instead of parameter: {0: [value1, value2]})
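
A small helper can produce the explicit-key syntax for you. The sketch below (the flatten_terms name and the list-of-dicts input format are assumptions made for this illustration) always emits explicit keys, which sidesteps the parameter[][] pitfall described above.

```python
from urllib.parse import urlencode


def flatten_terms(terms: list) -> list:
    """Turns a list of search term dicts into (name, value) request pairs.

    The position of a term in the list is used as its explicit key.
    List values become parameter[key][] entries ("any of").
    """
    pairs = []
    for key, term in enumerate(terms):
        for name, value in term.items():
            if isinstance(value, (list, tuple)):
                pairs.extend((f'{name}[{key}][]', v) for v in value)
            else:
                pairs.append((f'{name}[{key}]', value))
    return pairs


terms = [
    {'property': 'someProp'},
    {'property': 'otherProp', 'value': ['otherPropValue1', 'otherPropValue2']},
]
pairs = flatten_terms(terms)
print(pairs)
print(urlencode(pairs))
```

The resulting list of pairs can be passed to any HTTP client which accepts repeated parameter names.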

Preparing a search query using HTTP client libraries

Hopefully most of the time you won’t create ARCHE search API requests by hand but you’ll use some HTTP client library provided by your programming language.

If you are lucky, the library will just do the job for you, e.g.

// jQuery
jQuery.ajax({
  url: 'https://arche.acdh.oeaw.ac.at/api/search',
  method: 'GET', // POST would work equally well
  data: {
    "property":  ["someProp", "otherProp"                           ],
    "value":     [""        , ["otherPropValue1", "otherPropValue2"]],
    "readMode":  "resource",
    "format":    "text/turtle"
  },
  success: function(d) {console.log(d)}
})
// or with explicit keys
jQuery.ajax({
  url: 'https://arche.acdh.oeaw.ac.at/api/search',
  method: 'GET', // POST would work equally well
  data: {
    "property":  {"0": "someProp", "1": "otherProp"},
    "value":     {                 "1": ["otherPropValue1", "otherPropValue2"]},
    "readMode":  "resource",
    "format":    "text/turtle"
  },
  success: function(d) {console.log(d)}
})

// PHP
$searchParam = [
    'property' => ['someProp', 'otherProp'                           ],
    'value'    => [''        , ['otherPropValue1', 'otherPropValue2']],
    'readMode' => 'resource',
    'format'   => 'text/turtle'
];

// just with file_get_contents() - only GET possible
$response = file_get_contents('https://arche.acdh.oeaw.ac.at/api/search?' . http_build_query($searchParam));
echo $response;

// PSR-7 & PSR-18 way provided by Guzzle - both GET and POST possible
$client = new GuzzleHttp\Client();
$getRequest = new GuzzleHttp\Psr7\Request(
  'GET', 
  'https://arche.acdh.oeaw.ac.at/api/search?' . http_build_query($searchParam)
);
$getResponse = $client->sendRequest($getRequest);
echo $getResponse->getBody();
$postRequest = new GuzzleHttp\Psr7\Request(
  'POST', 
  'https://arche.acdh.oeaw.ac.at/api/search',
  ['Content-Type' => 'application/x-www-form-urlencoded'], 
  http_build_query($searchParam)
);
$postResponse = $client->sendRequest($postRequest);
echo $postResponse->getBody();

and if you’re unlucky and your HTTP client library can’t serialize complex objects into URL query (which most notably affects Python users), you need to prepare the request a little more carefully:

# Python with requests
import requests 
resp = requests.get(
  'https://arche.acdh.oeaw.ac.at/api/search',
  params={
    'property[]': ['someProp', 'otherProp'],
    'value[1][]': ['otherPropValue1', 'otherPropValue2'],
    'readMode': 'resource',
    'format': 'text/turtle'
  }
)
print(resp.text)

Last but not least if you’re using PHP, you might want to use the arche-lib which provides object wrappers for the search terms, ARCHE repository, etc.

With explicit SQL query

The search terms are rather simple to use but provide limited flexibility. If your search is too complex for them or if you prefer to use raw SQL, the search API allows you to do so.

You should pass your SQL query using the sql request parameter.
It’s recommended to pass all literal values used in the query through the sqlParam[] parameter.
- That way you don’t need to care about properly escaping SQL-reserved characters.
- It’s even more important if some values aren’t hardcoded in your app code but come from the user input (that being said, a properly set up ARCHE instance will run your query as a user without any data modification rights, so there should be no data deletion threat).
Your query must return a column named id which will be matched against ARCHE internal resource ids (the id column in the resources table). All other columns returned by your query will be just discarded.
To plan your query you’ll probably need to take a look at the ARCHE database schema. The most commonly used data reside in three tables (but you might be also interested in the full_text_search and spatial_search tables):
- identifiers (id, ids) storing resource identifiers with id being internal resource id and ids all URI ids of a given resource
- relations(id, target_id, property) storing RDF graph edges
- metadata(mid, id, property, type, lang, value, value_n, value_t) storing all triples with literal values.
  - mid is an internal triple id which is rather useless for you
  - id, property, type, lang and value store triple’s subject, predicate, value type, value lang tag and the value itself
  - value_n stores parsed numeric value for values of numeric types - this column can be used for proper numeric comparison of values
  - value_t like value_n, just for values of type date/datetime
Remember your query is used only for selecting resources matching the search (point 1. in the search workflow).

Example (passing literal values used in the query using the sqlParam[] request parameter):

Find all resources with acdh:hasTitle in Japanese:

https://arche.acdh.oeaw.ac.at/api/search
  ?sql=SELECT id FROM metadata WHERE property = ? AND lang = ?
  &sqlParam[]=https://vocabs.acdh.oeaw.ac.at/schema#hasTitle
  &sqlParam[]=ja
  
https://arche.acdh.oeaw.ac.at/api/search?sql=SELECT%20id%20FROM%20metadata%20WHERE%20property%20%3D%20%3F%20AND%20lang%20%3D%20%3F&sqlParam%5B%5D=https%3A%2F%2Fvocabs.acdh.oeaw.ac.at%2Fschema%23hasTitle&sqlParam%5B%5D=ja
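
The same request can be assembled programmatically. The sketch below uses only the Python standard library; sql_search_query is a hypothetical helper name and the placeholder check is deliberately naive (it would miscount a literal ? inside the SQL text).

```python
from urllib.parse import quote, urlencode


def sql_search_query(sql: str, values: list) -> str:
    """Builds the query string for an SQL-based search.

    Literal values go to sqlParam[] in the order of the "?" placeholders.
    """
    if sql.count('?') != len(values):
        raise ValueError('number of placeholders and values differs')
    pairs = [('sql', sql)] + [('sqlParam[]', str(v)) for v in values]
    return urlencode(pairs, quote_via=quote)


query = sql_search_query(
    'SELECT id FROM metadata WHERE property = ? AND lang = ?',
    ['https://vocabs.acdh.oeaw.ac.at/schema#hasTitle', 'ja'],
)
print('https://arche.acdh.oeaw.ac.at/api/search?' + query)
```

The returned string can be appended to the search endpoint URL for a GET request or sent as an application/x-www-form-urlencoded POST body.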

Ordering results

Simple case

Just use the orderBy[] request parameter coupled, if needed, with the orderByLang parameter.

For example let’s search for all resources bigger than 13 GB, ordering results by their acdh:hasTitle (for the detailed discussion on API parameters encoding take a look at this section):

https://arche.acdh.oeaw.ac.at/api/search
  ?property[0]=https://vocabs.acdh.oeaw.ac.at/schema#hasRawBinarySize
  &operator[0]=>
  &value[0]=13000000000
  &orderBy[0]=https://vocabs.acdh.oeaw.ac.at/schema#hasTitle
  &readMode=resource
  &format=text/turtle
  
https://arche.acdh.oeaw.ac.at/api/search?property%5B0%5D=https%3A%2F%2Fvocabs.acdh.oeaw.ac.at%2Fschema%23hasRawBinarySize&operator%5B0%5D=%3E&value%5B0%5D=13000000000&orderBy%5B0%5D=https%3A%2F%2Fvocabs.acdh.oeaw.ac.at%2Fschema%23hasTitle&readMode=resource&format=text%2Fturtle

And take a look at the returned data skipping uninteresting properties:

<https://arche.acdh.oeaw.ac.at/api/> <search://count> "7"^^<http://www.w3.org/2001/XMLSchema#integer> .

<https://arche.acdh.oeaw.ac.at/api/23174>
    <search://match>       "true"^^<http://www.w3.org/2001/XMLSchema#boolean>;
    acdh:hasRawBinarySize  "17222812879"^^<http://www.w3.org/2001/XMLSchema#long>;
    <search://order>       "7"^^<http://www.w3.org/2001/XMLSchema#positiveInteger>;
    <search://orderValue1> "KHM-ANSA-IV3456_raw3d.zip";
    acdh:hasTitle          "KHM-ANSA-IV3456_raw3d.zip"@en.
<https://arche.acdh.oeaw.ac.at/api/37779>
    <search://match>       "true"^^<http://www.w3.org/2001/XMLSchema#boolean>;
    acdh:hasRawBinarySize  "13039382156"^^<http://www.w3.org/2001/XMLSchema#long>;
    <search://order>       "6"^^<http://www.w3.org/2001/XMLSchema#positiveInteger>;
    <search://orderValue1> "KHM-ANSA-IV431_raw3d.zip";
    acdh:hasTitle          "KHM-ANSA-IV431_raw3d.zip"@en.
<https://arche.acdh.oeaw.ac.at/api/46542>
    <search://match>       "true"^^<http://www.w3.org/2001/XMLSchema#boolean>;
    acdh:hasRawBinarySize   "15927776264"^^<http://www.w3.org/2001/XMLSchema#nonNegativeInteger>;
    <search://order>        "5"^^<http://www.w3.org/2001/XMLSchema#positiveInteger>;
    <search://orderValue1>  "sfm_raw_04-p1.zip";
    acdh:hasTitle           "sfm_raw_04-p1.zip"@de.
(...)

Discussion:

The mapping of technical annotation properties to actual URIs follows the schema reported by the https://arche.acdh.oeaw.ac.at/api/describe, so that
- {repoCfg}$.schema.searchMatch is <search://match>,
- {repoCfg}$.schema.searchOrder is <search://order>,
- {repoCfg}$.schema.searchOrderValue + {N} is <search://orderValue1>, <search://orderValue2>, etc.
All resources matched by the search are marked with <search://match> true. The one which isn’t - <https://arche.acdh.oeaw.ac.at/api/> - is a technical resource used to indicate global search result properties like the number of resources matched by the search (here 7).
The requested order can be read from the <search://order> property values. By default the order is ascending.
- If you want a descending order, just prepend the property URI with a ^ in the orderBy[] parameter, e.g. orderBy[0]=^https://vocabs.acdh.oeaw.ac.at/schema#hasTitle.
- If you want to order by more than one property, provide many orderBy[] request parameters, e.g. orderBy[0]=firstOrderByThisProperty&orderBy[1]=thenOrderByThatProperty.
  - Multiple orderBy[] parameter values are first sorted by their (implicit or explicit) key, e.g. orderBy[foo]=someProp&orderBy[bar]=otherProp will order results first by the otherProp values and only then by the someProp values (because the bar key is smaller than the foo key).
Values actually used for ordering are provided in the <search://orderValue1> property.
- As requested they are just equal to the acdh:hasTitle property value of a given resource, only without the language tag. Here it’s obvious but things can quickly get complicated if a resource has more than one title - see the next chapters.
- There will be as many <search://orderValueN> properties in the output as orderBy[] parameters provided in the request, e.g. if you requested orderBy[0]=someProp&orderBy[1]=otherProp, the output will contain both <search://orderValue1> (storing someProp values actually used for sorting) and <search://orderValue2> (storing otherProp values actually used for sorting).
  - The N-th <search://orderValueN> property stores values of the n-th-order sorting property, so e.g. for orderBy[foo]=someProp&orderBy[bar]=otherProp, the <search://orderValue1> provides values of the otherProp and <search://orderValue2> provides values of the someProp.

Collation

Different languages have different opinions on character order. It’s possible that the rule used by the ARCHE instance isn’t in line with what you expect. Fortunately this can be easily checked and controlled:

Inspect the $.collation.default value of the data returned by the /describe REST API endpoint to know what’s the collation used by default by a given ARCHE instance. (e.g. https://arche.acdh.oeaw.ac.at/api/describe reports en_US.UTF-8).
Use the orderByCollation request parameter to enforce ordering according to a given collation.
- Inspect the $.collation.available to get the list of all collations available on a given ARCHE instance.

Multiple values of property used for ordering

A resource may have multiple values of a property used for results ordering. A typical case is labels in multiple languages but you shouldn’t optimistically assume it’s the only possible case. Consult the metadata schema to check if a given property may have multiple values and if they have a language tag.

In case of multiple property values ARCHE implements two rules:

If the orderByLang request parameter is provided, all values with a non-matching language tag are excluded.
- If a value has no language tag (technically speaking if its type is other than rdfs:langString), it’s also included.
- If there’s no value left for a given resource, it’s ordered as the last.
- If a property has multiple values with the desired language tag, all of them are taken and the rule from the next point is applied.
- It’s a global setting. You can’t assign different values for different orderBy[] parameter values.
The lowest value among the available ones is used for the sorting.
- This is a fully arbitrary ARCHE design decision. We need to pick a single value, so we use the lowest one.

E.g. let’s assume we have following resources:

<res1> <hasTitle> "foo" ,
                  "bar"@en ;
       <hasAuthor> "Alice" .
<res2> <hasTitle> "bar"@en ,
                  "baz"@de ;
       <hasAuthor> "John" .
<res3> <otherProp> "placeholder" .

which all match the search. Now,

For orderBy[]=hasTitle&orderBy[]=^hasAuthor&orderByLang=en the order will be res2, res1, res3 because:
- For hasTitle of res1 we take the lowest among foo (qualifies because it has no lang tag) and bar (qualifies because its lang tag matches the orderByLang), giving us bar.
- For res2 we skip baz and keep bar because both have lang tag but only the latter matches the orderByLang.
- For res3 we get nothing, so it ends up at the end of the sort.
- As both res1 and res2 have the same sorting order according to the hasTitle, we continue to the second order by property for them - hasAuthor. For res1 it’s Alice and for res2 it’s John but as a reverse order was requested (note the ^ in orderBy[]=^hasAuthor), sorting is done in reverse order and res2 comes before res1.
For orderBy[]=^hasTitle&orderByLang=de the order will be res1, res2, res3 because:
- For res1 the foo value of the hasTitle is taken (it doesn’t have lang tag)
- For res2 the baz value of the hasTitle is taken (matches the orderByLang)
- res3 has no hasTitle so it goes to the end.
- As the descending order was requested foo comes before baz and therefore res1 before res2.
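
Both rules can be replayed on the example data with a short Python simulation. It is only a sketch of the behaviour described in this chapter, written for illustration: it compares values with plain Python string comparison (so it ignores collations) and the data layout and function names are assumptions.

```python
def sort_value(values, order_by_lang):
    """Picks the value used for sorting from a list of (text, lang) tuples.

    lang is None for values without a language tag. Returns None if no
    value qualifies.
    """
    candidates = [
        text for text, lang in values
        if order_by_lang is None or lang is None or lang == order_by_lang
    ]
    return min(candidates) if candidates else None


def order_resources(resources, order_by, order_by_lang=None):
    """Orders resource ids; a leading ^ in a property means descending."""
    ordered = list(resources)
    # stable sorts applied from the least to the most significant property
    for prop in reversed(order_by):
        descending = prop.startswith('^')
        name = prop[1:] if descending else prop
        keyed = [
            (sort_value(resources[res].get(name, []), order_by_lang), res)
            for res in ordered
        ]
        present = [item for item in keyed if item[0] is not None]
        missing = [res for value, res in keyed if value is None]
        present.sort(key=lambda item: item[0], reverse=descending)
        # resources without a usable value always go to the end
        ordered = [res for _, res in present] + missing
    return ordered


resources = {
    'res1': {'hasTitle': [('foo', None), ('bar', 'en')],
             'hasAuthor': [('Alice', None)]},
    'res2': {'hasTitle': [('bar', 'en'), ('baz', 'de')],
             'hasAuthor': [('John', None)]},
    'res3': {'otherProp': [('placeholder', None)]},
}
print(order_resources(resources, ['hasTitle', '^hasAuthor'], 'en'))
print(order_resources(resources, ['^hasTitle'], 'de'))
```

The two calls print the res2, res1, res3 and res1, res2, res3 orders discussed above.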

Lack of value and non-literal values

If a given resource lacks a triple of the orderBy[] property with a literal value, then it’s put at the end of the search results.

By the way, it means ARCHE doesn’t allow ordering by object property values. This is because an ARCHE resource may have any number of (equally important) identifiers, making it impossible to tell which one should be used (at least without introducing additional complexity to the API).

Unsupported features

Ordering by properties of linked resources.
Let’s say you want to order by a title of a parent resource and then by a resource title.
This is currently impossible.
Ordering by dynamically created properties.
Especially when searching with an explicit SQL query you might want to order by a property created on-the-fly.
This is impossible for security reasons. To prevent leaking arbitrary data from the database, a well-defined barrier between the search query and the output is needed, and this barrier allows passing only the ids of resources matching the search.

Paging

Search results may be paged. This is controlled by the offset and limit parameters which work exactly how they sound.

You almost certainly want to combine paging with explicit ordering (see the previous chapter).
- It’s technically safe to use paging without providing orderBy[]. In such a case ARCHE resources are ordered by their internal identifiers which doesn’t provide any intuitive order but assures a stable ordering.
You can always check the total number of resources matching the search by inspecting the <restAPIbaseURL> <{repoCfg}$.schema.searchCount> "count" . response triple, e.g.
```
<https://arche.acdh.oeaw.ac.at/api/> <search://count> "7"^^<http://www.w3.org/2001/XMLSchema#integer> .
```
The requested values of the offset and limit aren’t included in the output. ARCHE hopes you can remember them :-)

Example: fetch the 3rd page (10 results per page) of resources being part of https://hdl.handle.net/21.11115/0000-000E-CE35-F (some Karl Kraus subcollection), ordering them by acdh:hasTitle:

https://arche.acdh.oeaw.ac.at/api/search
  ?property[]=https://vocabs.acdh.oeaw.ac.at/schema#isPartOf
  &value[]=https://hdl.handle.net/21.11115/0000-000E-CE35-F
  &orderBy[]=https://vocabs.acdh.oeaw.ac.at/schema#hasTitle
  &limit=10
  &offset=20
  &readMode=resource
  &format=text/turtle
  
https://arche.acdh.oeaw.ac.at/api/search?property%5B%5D=https%3A%2F%2Fvocabs.acdh.oeaw.ac.at%2Fschema%23isPartOf&value%5B%5D=https%3A%2F%2Fhdl.handle.net%2F21.11115%2F0000-000E-CE35-F&orderBy%5B0%5D=https%3A%2F%2Fvocabs.acdh.oeaw.ac.at%2Fschema%23hasTitle&limit=10&offset=20&readMode=resource&format=text%2Fturtle
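
The arithmetic behind the example is simple enough to wrap in two helpers. Both function names are made up for this sketch; the only inputs are a 1-based page number, the page size and the total reported by the search count triple.

```python
import math


def page_params(page: int, per_page: int) -> dict:
    """Returns the limit and offset parameters for a 1-based page number."""
    if page < 1 or per_page < 1:
        raise ValueError('page and per_page must be positive')
    return {'limit': per_page, 'offset': (page - 1) * per_page}


def page_count(search_count: int, per_page: int) -> int:
    """Number of pages needed to show search_count matched resources."""
    return math.ceil(search_count / per_page)


print(page_params(3, 10))
print(page_count(7, 10))
```

As the response doesn’t repeat the requested offset and limit, the client has to keep track of the current page itself.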

Full text search

On the resource matching side (1st step of the search workflow) the full text search works pretty intuitively:

Just use @@ as the operator[].
If you want to limit search to a given property(ies), use the property[] parameter.
- You can use a special BINARY property name to limit the search to the binary payload of resources.
Remember that what is covered with the full text search index depends on the given repository instance config (see the fullTextSearch section, e.g. of https://github.com/acdh-oeaw/arche-docker-config/blob/arche/yaml/repo.yaml).
- The binary content is indexed only for resources with certain MIME types and only up to a certain size.
- Certain RDF properties might be excluded from the indexing or only certain properties can be indexed.
- For the https://arche.acdh.oeaw.ac.at instance only text/plain, text/xml, text/turtle, text/html, text/csv, application/xml, application/pdf and application/json binary payloads of size up to 1 GB and all metadata properties are being indexed.
For the exact description of how the full text search phrase is parsed please refer to the PostgreSQL documentation on the websearch_to_tsquery() function here and here.

Examples:

Search for resources containing the Japan-Bibliographie phrase.

https://arche.acdh.oeaw.ac.at/api/search
  ?operator[]=@@
  &value[]=Japan-Bibliographie
  &readMode=resource
  &format=text/turtle

https://arche.acdh.oeaw.ac.at/api/search?operator%5B%5D=%40%40&value%5B%5D=Japan-Bibliographie&readMode=resource&format=text%2fturtle

Search for resources containing Alexandria in their binary payload.

https://arche.acdh.oeaw.ac.at/api/search
  ?operator[]=@@
  &property[]=BINARY
  &value[]=Alexandria
  &readMode=resource
  &format=text/turtle

https://arche.acdh.oeaw.ac.at/api/search?operator%5B%5D=%40%40&property%5B%5D=BINARY&value%5B%5D=Alexandria&readMode=resource&format=text%2fturtle

Highlighting the results

When you use a search terms-based search, the highlighting works out of the box.

The highlighted matches can be found in the <resource> <{repoCfg}$.schema.searchFts{N}> "highlighted text" RDF properties in the output where {N} is a consecutive number from 1 to the number of matched metadata properties, e.g. (for the clarity we skip all resource metadata properties with the resourceProperties[]=propertyWhichDoesNotExist parameter leaving only technical properties generated by the search):

https://arche.acdh.oeaw.ac.at/api/search
  ?operator[]=@@
  &value[]=Japan-Bibliographie
  &readMode=resource
  &format=text/turtle
  &resourceProperties[]=propertyWhichDoesNotExist

https://arche.acdh.oeaw.ac.at/api/search?operator%5B%5D=%40%40&value%5B%5D=Japan-Bibliographie&readMode=resource&format=text%2fturtle&resourceProperties%5B%5D=propertyWhichDoesNotExist

resulting in something like

@prefix n0: <https://arche.acdh.oeaw.ac.at/api/>.
@prefix n1: <search://>.
@prefix n2: <https://arche.acdh.oeaw.ac.at/>.
@prefix n3: <https://vocabs.acdh.oeaw.ac.at/schema#>.

<https://arche.acdh.oeaw.ac.at/api/> n1:count "3"^^<http://www.w3.org/2001/XMLSchema#integer>.
<https://arche.acdh.oeaw.ac.at/api/30465> 
    n1:match "true"^^<http://www.w3.org/2001/XMLSchema#boolean>;
    n1:fts1 "<b>Japan</b>-<b>Bibliographie</b> 1980–2000 (JB 80) – Thesaurus \n       2019-04-04Z \n       https://creativecommons.org/licenses/by/4.0/ \n       496descriptors";
    n1:ftsQuery1 "Japan-Bibliographie";
    n1:ftsProperty1 "BINARY".
<https://arche.acdh.oeaw.ac.at/api/24690>
    n1:match "true"^^<http://www.w3.org/2001/XMLSchema#boolean>;
    n1:fts1 "Deutschsprachige <b>Japan</b>-<b>Bibliographie</b> 1980-2000 Datenbank mit über 30.000 bibliographischen Einträgen"@de;
    n1:ftsQuery1 "Japan-Bibliographie";
    n1:ftsProperty1 n3:hasAlternativeTitle;
    n1:fts2 "<b>Japan</b>-<b>Bibliographie</b> 1980-2000"@de;
    n1:ftsQuery2 "Japan-Bibliographie";
    n1:ftsProperty2 n3:hasTitle.
<https://arche.acdh.oeaw.ac.at/api/40725>
    n1:match "true"^^<http://www.w3.org/2001/XMLSchema#boolean>;
    n1:fts1 "Deutschsprachige <b>Japan</b>-<b>Bibliographie</b> 1980-2000 Datenbank mit über 30.000 bibliographischen Einträgen"@de;
    n1:ftsQuery1 "Japan-Bibliographie";
    n1:ftsProperty1 n3:hasAlternativeTitle.

We can see four kinds of technical triples here:

search://match triples marking resources matching the search
search://fts1 and search://fts2 (search://fts{N} in general) triples providing the highlighted search matches
- We can see that in case of two resources there was only a single match and in case of one resource, two matches were found.
search://ftsQuery1 and search://ftsQuery2 (search://ftsQuery{N} in general) triples informing which highlighting query was used to perform the highlighting of a corresponding search://fts{N} triple value.
- Here the value is always the same as we have only a single full text search filter, but see the more advanced example below.
search://ftsProperty1 and search://ftsProperty2 (search://ftsProperty{N} in general) triples informing which metadata property matched the full text search.
- We can see that, depending on the resource, it was either a binary payload, an acdh:hasAlternativeTitle or an acdh:hasTitle and in case of one resource there were two matches (both in the acdh:hasAlternativeTitle and the acdh:hasTitle).

In case of multiple full text search filters, highlighting is by default provided for all of them.

E.g. let’s search for resources containing both Japan-Bibliographie and Datenbank phrases (here we also employ the trick to filter out non-technical properties from the output):

https://arche.acdh.oeaw.ac.at/api/search
  ?operator[]=@@
  &value[]=Japan-Bibliographie
  &operator[]=@@
  &value[]=Datenbank
  &readMode=resource
  &format=text/turtle
  &resourceProperties[]=propertyWhichDoesNotExist

https://arche.acdh.oeaw.ac.at/api/search?operator%5B%5D=%40%40&value%5B%5D=Japan-Bibliographie&operator%5B%5D=%40%40&value%5B%5D=Datenbank&readMode=resource&format=text%2fturtle&resourceProperties%5B%5D=propertyWhichDoesNotExist

resulting in something like

@prefix n0: <https://arche.acdh.oeaw.ac.at/api/>.
@prefix n1: <search://>.
@prefix n2: <https://arche.acdh.oeaw.ac.at/>.
@prefix n3: <https://vocabs.acdh.oeaw.ac.at/schema#>.

<https://arche.acdh.oeaw.ac.at/api/> n1:count "2"^^<http://www.w3.org/2001/XMLSchema#integer>.
<https://arche.acdh.oeaw.ac.at/api/40725> 
    n1:match "true"^^<http://www.w3.org/2001/XMLSchema#boolean>;
    n1:fts1 "<b>Datenbank</b> mit über 30.000 bibliographischen Einträgen, die nach folgenden Kriterien gesammelt wurden:\n\t* Japanbezug\n\t* deutschsprachig\n\t* veröffentlicht"@de;
    n1:ftsQuery1 "Datenbank";
    n1:ftsProperty1 n3:hasDescription;
    n1:fts2 "Deutschsprachige Japan-Bibliographie 1980-2000 <b>Datenbank</b> mit über 30.000 bibliographischen Einträgen"@de;
    n1:ftsQuery2 "Datenbank";
    n1:ftsProperty2 n3:hasAlternativeTitle;
    n1:fts3 "Deutschsprachige <b>Japan</b>-<b>Bibliographie</b> 1980-2000 Datenbank mit über 30.000 bibliographischen Einträgen"@de;
    n1:ftsQuery3 "Japan-Bibliographie";
    n1:ftsProperty3 n3:hasAlternativeTitle.
<https://arche.acdh.oeaw.ac.at/api/24690> 
    n1:match "true"^^<http://www.w3.org/2001/XMLSchema#boolean>;
    n1:fts1 "<b>Japan</b>-<b>Bibliographie</b> 1980-2000"@de;
    n1:ftsQuery1 "Japan-Bibliographie";
    n1:ftsProperty1 n3:hasTitle;
    n1:fts2 "<b>Datenbank</b> mit über 30.000 bibliographischen Einträgen, die nach folgenden Kriterien gesammelt wurden:\n\t* Japanbezug\n\t* deutschsprachig\n\t* veröffentlicht"@de;
    n1:ftsQuery2 "Datenbank";
    n1:ftsProperty2 n3:hasDescription;
    n1:fts3 "Deutschsprachige Japan-Bibliographie 1980-2000 <b>Datenbank</b> mit über 30.000 bibliographischen Einträgen"@de;
    n1:ftsQuery3 "Datenbank";
    n1:ftsProperty3 n3:hasAlternativeTitle;
    n1:fts4 "Deutschsprachige <b>Japan</b>-<b>Bibliographie</b> 1980-2000 Datenbank mit über 30.000 bibliographischen Einträgen"@de;
    n1:ftsQuery4 "Japan-Bibliographie";
    n1:ftsProperty4 n3:hasAlternativeTitle.

As we can see, more highlighted results are provided now, and the search://ftsQuery{N} properties help to determine which highlighted phrase comes from which full text search query.
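
The pairing of search://fts{N}, search://ftsQuery{N} and search://ftsProperty{N} values can be sketched in a few lines of Python. This is only an illustration: it assumes the triples of a single resource have already been parsed into (predicate, value) pairs with full predicate URIs, and the group_highlights name is made up for this example.

```python
import re
from collections import defaultdict

# Matches the numbered highlighting predicates: search://fts{N},
# search://ftsQuery{N} and search://ftsProperty{N}.
FTS_PREDICATE = re.compile(r"^search://fts(Query|Property)?(\d+)$")


def group_highlights(triples):
    """Group the highlighted snippets of one resource by highlighting query.

    `triples` is an iterable of (predicate, value) pairs describing a single
    resource (an assumption of this sketch, not an API feature).
    Returns {query: [(property, snippet), ...]}.
    """
    slots = defaultdict(dict)  # N -> {"": snippet, "Query": ..., "Property": ...}
    for predicate, value in triples:
        match = FTS_PREDICATE.match(predicate)
        if match:
            kind = match.group(1) or ""
            slots[int(match.group(2))][kind] = value

    grouped = defaultdict(list)
    for n in sorted(slots):
        slot = slots[n]
        grouped[slot.get("Query")].append((slot.get("Property"), slot.get("")))
    return dict(grouped)
```

Applied to the second resource of the output above, this would collect one snippet list under the Japan-Bibliographie key and another under the Datenbank key.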

Adjusting the highlighting configuration

If you want to adjust the way the highlighting is performed, please read this documentation first and then provide the desired configuration values using ftsStartSel[], ftsStopSel[], ftsMinWords[], ftsMaxWords[], ftsShortWord[], ftsHighlightAll[], ftsMaxFragments[] and ftsFragmentDelimiter[] request parameters, e.g. to change the default <b> tag used for highlighting to the <em> one:

https://arche.acdh.oeaw.ac.at/api/search
  ?operator[]=@@
  &value[]=Japan-Bibliographie
  &readMode=resource
  &format=text/turtle
  &ftsStartSel[]=<em>
  &ftsStopSel[]=</em>

https://arche.acdh.oeaw.ac.at/api/search?operator%5B%5D=%40%40&value%5B%5D=Japan-Bibliographie&readMode=resource&format=text%2fturtle&ftsStartSel%5B%5D=%3Cem%3E&ftsStopSel%5B%5D=%3C%2Fem%3E

In case of multiple full text search filters, parameters can be specified separately for each of them, e.g. to highlight the Japan-Bibliographie matches with <em> and Datenbank matches with <b>:

https://arche.acdh.oeaw.ac.at/api/search
  ?operator[]=@@
  &value[]=Japan-Bibliographie
  &ftsStartSel[]=<em>
  &ftsStopSel[]=</em>
  &operator[]=@@
  &value[]=Datenbank
  &ftsStartSel[]=<b>
  &ftsStopSel[]=</b>
  &readMode=resource
  &format=text/turtle

https://arche.acdh.oeaw.ac.at/api/search?operator%5B%5D=%40%40&value%5B%5D=Japan-Bibliographie&ftsStartSel%5B%5D=%3Cem%3E&ftsStopSel%5B%5D=%3C%2Fem%3E&operator%5B%5D=%40%40&value%5B%5D=Datenbank&ftsStartSel%5B%5D=%3Cb%3E&ftsStopSel%5B%5D=%3C%2Fb%3E&readMode=resource&format=text%2fturtle
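
If you build such URLs in code, the per-filter parameters have to be emitted in the same order as the filters they belong to. A minimal Python sketch using only the standard library follows; the fts_search_url helper and its argument layout are assumptions made for this example, not part of the API.

```python
from urllib.parse import urlencode

BASE_URL = "https://arche.acdh.oeaw.ac.at/api/search"


def fts_search_url(filters, read_mode="resource", fmt="text/turtle"):
    """Build a search URL with one full text search filter per entry.

    `filters` is a list of (phrase, start_tag, stop_tag) tuples. Each filter
    contributes its own operator[], value[], ftsStartSel[] and ftsStopSel[]
    entries, so the N-th highlighting setting lines up with the N-th filter.
    """
    params = []
    for phrase, start_tag, stop_tag in filters:
        params += [
            ("operator[]", "@@"),
            ("value[]", phrase),
            ("ftsStartSel[]", start_tag),
            ("ftsStopSel[]", stop_tag),
        ]
    params += [("readMode", read_mode), ("format", fmt)]
    # urlencode() percent-encodes both the parameter names and the values.
    return BASE_URL + "?" + urlencode(params)


url = fts_search_url([
    ("Japan-Bibliographie", "<em>", "</em>"),
    ("Datenbank", "<b>", "</b>"),
])
```

A list of tuples is used instead of a dict because the same parameter name has to be repeated and the order has to be preserved.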

Last but not least, the queries used to perform the highlighting can be specified explicitly using the ftsQuery[] parameter, and the properties to which they are applied can be limited with the ftsProperty[] parameter. This is particularly useful when performing an SQL query-based search (see below) but can also be used in some advanced scenarios.

If the ftsQuery[] parameter is provided, it overrides the highlighting queries extracted from the search terms. E.g. to search for resources containing both the Japan-Bibliographie and 2000 phrases but highlight only the Japan-Bibliographie matches, and only in the acdh:hasTitle and acdh:hasAlternativeTitle metadata properties:

https://arche.acdh.oeaw.ac.at/api/search
  ?operator[]=@@
  &value[]=Japan-Bibliographie
  &operator[]=@@
  &value[]=2000
  &ftsQuery[]=Japan-Bibliographie
  &ftsProperty[0][0]=https://vocabs.acdh.oeaw.ac.at/schema#hasTitle
  &ftsProperty[0][1]=https://vocabs.acdh.oeaw.ac.at/schema#hasAlternativeTitle
  &readMode=resource
  &format=text/turtle
  
https://arche.acdh.oeaw.ac.at/api/search?operator%5B%5D=%40%40&value%5B%5D=Japan-Bibliographie&operator%5B%5D=%40%40&value%5B%5D=2000&ftsQuery%5B%5D=Japan-Bibliographie&ftsProperty%5B0%5D%5B0%5D=https%3A%2F%2Fvocabs.acdh.oeaw.ac.at%2Fschema%23hasTitle&ftsProperty%5B0%5D%5B1%5D=https%3A%2F%2Fvocabs.acdh.oeaw.ac.at%2Fschema%23hasAlternativeTitle&readMode=resource&format=text%2fturtle

It is worth noting that in this case:

The syntax for specifying multiple metadata properties to be used for the highlighting is the same as for filtering using the search terms (a nested array).
We have one resource matched by the search with no highlighted results (because the match was in the binary and we limited highlighting to acdh:hasTitle and acdh:hasAlternativeTitle metadata properties).
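
The nested array syntax is easy to get wrong by hand, so here is a hedged Python sketch that expands a list of highlighting queries into request parameters. The highlight_params name and the input layout are hypothetical; only the ftsQuery[] and ftsProperty[i][j] parameter names come from the example above.

```python
from urllib.parse import urlencode


def highlight_params(queries):
    """Expand [(query, [property URIs]), ...] into request parameters.

    The i-th query is appended as ftsQuery[] and its j-th property becomes
    ftsProperty[i][j], mirroring the nested array used in the example URL.
    """
    params = []
    for i, (query, properties) in enumerate(queries):
        params.append(("ftsQuery[]", query))
        for j, prop in enumerate(properties):
            params.append((f"ftsProperty[{i}][{j}]", prop))
    return params


params = highlight_params([
    ("Japan-Bibliographie", [
        "https://vocabs.acdh.oeaw.ac.at/schema#hasTitle",
        "https://vocabs.acdh.oeaw.ac.at/schema#hasAlternativeTitle",
    ]),
])
query_string = urlencode(params)
```

The resulting query_string can be appended to the search terms part of the URL with an & separator.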

Full text search within an SQL query search

You can also use the full text search while performing an SQL-based search. The SQL query for performing the full text search is as follows:

SELECT coalesce(fts.id, iid, m.id) AS id
FROM full_text_search fts LEFT JOIN metadata m USING (mid)
WHERE websearch_to_tsquery('simple', 'SEARCH PHRASE') @@ segments

If you want to limit the search to given properties or to the binary content, add one of the following to the WHERE clause:

AND fts.id IS NOT NULL for searching only in the binary content
AND property IN ('list', 'of', 'allowed', 'properties') for searching only in given metadata properties other than resource identifiers
AND iid IS NOT NULL for searching in resource identifiers
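
The three variants can be assembled programmatically. The Python sketch below is only an illustration: the fts_sql helper and its scope values are invented for this example, and it assumes that the property names can be passed as additional ? placeholders bound positionally through sqlParam[], in the same way as the search phrase.

```python
BASE_SQL = (
    "SELECT coalesce(fts.id, iid, m.id) AS id "
    "FROM full_text_search fts LEFT JOIN metadata m USING (mid) "
    "WHERE websearch_to_tsquery('simple', ?) @@ segments"
)


def fts_sql(phrase, scope=None, properties=()):
    """Return (sql, sql_params) for a full text search limited to `scope`.

    scope: None (search everywhere), "binary", "identifiers" or "properties".
    """
    sql, params = BASE_SQL, [phrase]
    if scope == "binary":
        sql += " AND fts.id IS NOT NULL"
    elif scope == "identifiers":
        sql += " AND iid IS NOT NULL"
    elif scope == "properties":
        if not properties:
            raise ValueError("at least one property is required")
        # One placeholder per property (assumption: bound via sqlParam[]).
        placeholders = ", ".join("?" for _ in properties)
        sql += f" AND property IN ({placeholders})"
        params += list(properties)
    elif scope is not None:
        raise ValueError(f"unknown scope: {scope}")
    return sql, params
```

Each element of the returned parameter list would then be sent as a separate sqlParam[] request parameter, in order.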

E.g. to search for resources containing Alexandria in their binary payload:

https://arche.acdh.oeaw.ac.at/api/search
  ?sql=SELECT coalesce(fts.id, iid, m.id) AS id
       FROM full_text_search fts LEFT JOIN metadata m USING (mid)
       WHERE websearch_to_tsquery('simple', ?) @@ segments
             AND fts.id IS NOT NULL
  &sqlParam[]=Alexandria
  &readMode=resource
  &format=text/turtle

https://arche.acdh.oeaw.ac.at/api/search?sql=SELECT%20coalesce%28fts.id%2C%20iid%2C%20m.id%29%20AS%20id%20FROM%20full_text_search%20fts%20LEFT%20JOIN%20metadata%20m%20USING%20%28mid%29%20WHERE%20websearch_to_tsquery%28%27simple%27%2C%20%3F%29%20%40%40%20segments%20AND%20fts.id%20IS%20NOT%20NULL&sqlParam%5B%5D=Alexandria&readMode=resource&format=text%2fturtle

To highlight full text search matches while using the SQL query search, the highlighting phrase has to be specified using the ftsQuery[] parameter. If you limited the search to particular properties, you should limit the highlighting accordingly using the ftsProperty[] parameter.

E.g. to search for resources containing Alexandria in their binary payload with highlighting of the matching phrases:

https://arche.acdh.oeaw.ac.at/api/search
  ?sql=SELECT coalesce(fts.id, iid, m.id) AS id
       FROM full_text_search fts LEFT JOIN metadata m USING (mid)
       WHERE websearch_to_tsquery('simple', ?) @@ segments
             AND fts.id IS NOT NULL
  &sqlParam[]=Alexandria
  &ftsQuery[]=Alexandria
  &ftsProperty[]=BINARY
  &readMode=resource
  &format=text/turtle

https://arche.acdh.oeaw.ac.at/api/search?sql=SELECT%20coalesce%28fts.id%2C%20iid%2C%20m.id%29%20AS%20id%20FROM%20full_text_search%20fts%20LEFT%20JOIN%20metadata%20m%20USING%20%28mid%29%20WHERE%20websearch_to_tsquery%28%27simple%27%2C%20%3F%29%20%40%40%20segments%20AND%20fts.id%20IS%20NOT%20NULL&sqlParam%5B%5D=Alexandria&ftsQuery%5B%5D=Alexandria&ftsProperty%5B%5D=BINARY&readMode=resource&format=text%2fturtle
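
Because SQL-based searches quickly produce long URLs, they are a natural candidate for the POST recommendation from the general advices. The Python sketch below prepares such a request with the standard library only; it assumes the endpoint accepts the same parameters as a form-encoded POST body, and the build_post_search helper is hypothetical.

```python
from urllib.parse import urlencode
from urllib.request import Request

SEARCH_URL = "https://arche.acdh.oeaw.ac.at/api/search"

SQL = (
    "SELECT coalesce(fts.id, iid, m.id) AS id "
    "FROM full_text_search fts LEFT JOIN metadata m USING (mid) "
    "WHERE websearch_to_tsquery('simple', ?) @@ segments "
    "AND fts.id IS NOT NULL"
)


def build_post_search(phrase, fts_property="BINARY"):
    """Prepare (but do not send) a POST search request with highlighting."""
    fields = [
        ("sql", SQL),
        ("sqlParam[]", phrase),
        ("ftsQuery[]", phrase),  # highlight the same phrase we search for
        ("ftsProperty[]", fts_property),
        ("readMode", "resource"),
        ("format", "text/turtle"),
    ]
    body = urlencode(fields).encode("ascii")
    headers = {"Content-Type": "application/x-www-form-urlencoded"}
    return Request(SEARCH_URL, data=body, headers=headers, method="POST")


request = build_post_search("Alexandria")
# To actually run the search:
# from urllib.request import urlopen
# with urlopen(request) as response:
#     turtle = response.read().decode("utf-8")
```

Keeping the request construction separate from sending it makes the parameters easy to inspect before hitting the API.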

ARCHE Search API for programmers

2023-08-25