RDF property URIs are often shortened using the following prefixes:
acdh https://vocabs.acdh.oeaw.ac.at/schema#
acdhi https://id.acdh.oeaw.ac.at
The {repoCfg}$.X.Y syntax means the $.X.Y JSON path evaluated over the repository configuration returned by its describe REST API endpoint, e.g. {repoCfg}$.schema.label on https://arche.acdh.oeaw.ac.at/api resolves to https://vocabs.acdh.oeaw.ac.at/schema#hasTitle.
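As a rough illustration of this notation, the sketch below resolves a simple dotted path of the $.X.Y kind against a configuration that has already been parsed into a Python dictionary. The resolve_repo_cfg helper and the sample_cfg data are hypothetical; a real configuration comes from the describe endpoint and contains many more keys.

```python
def resolve_repo_cfg(cfg: dict, path: str):
    """Resolve a simple '$.X.Y' path against a parsed repository configuration."""
    if not path.startswith("$."):
        raise ValueError("only simple '$.X.Y' paths are supported in this sketch")
    node = cfg
    for key in path[2:].split("."):
        node = node[key]  # raises KeyError when the path does not exist
    return node


# Hypothetical excerpt of a configuration, limited to the value quoted in the text
sample_cfg = {"schema": {"label": "https://vocabs.acdh.oeaw.ac.at/schema#hasTitle"}}
print(resolve_repo_cfg(sample_cfg, "$.schema.label"))
```

Only plain dotted paths are handled here; full JSON path features (filters, wildcards) are deliberately left out.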
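Full search URL examples always come in pairs: first a human-readable, non-URL-encoded form split into lines, then the same URL in the URL-encoded form.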
Short examples of particular API parameters are always provided in non-URL-encoded form (read “just copy-pasting them into the browser/curl may not work”).
Most search URL examples use readMode=resource and format=text/turtle to provide the most human-readable output, which lets you focus on the topic being discussed. In real-world usage you’re likely to use a different readMode and/or output format.
readMode and a simple search condition will be simpler, easier to understand and is likely to run faster.

A search API call is handled in a few steps:
1. Resources matching the search are found, either with an SQL query (the sql and sqlParam[] request parameters) or with so-called search terms built from the property[], value[], operator[], type[] and language[] request parameters.
2. Matched resources are ordered according to the orderBy[], orderByLang and orderByCollation request parameters and, if orderBy[] isn’t specified, by the internal ARCHE resource id.
3. Paging is applied according to the offset and limit request parameters. If they aren’t provided, all matched resources are included.
4. Metadata of the resulting resources are gathered according to readMode, resourceProperties and relativesProperties (read more here and here).

The search results are annotated with special technical RDF properties:
subject | property | object value type | object value description |
---|---|---|---|
{restAPIbaseURL} | {repoCfg}$.schema.searchCount | xsd:integer | total number of resources matched by the search |
resourceURI | {repoCfg}$.schema.searchMatch | "true"^^xsd:boolean | marks resources matching the search (to distinguish them from the ones fetched because of the readMode) |
resourceURI | {repoCfg}$.schema.searchOrder | xsd:positiveInteger | order of the resource within the search results according to the orderBy[] request parameter(s) - see the Ordering results chapter below - only when the orderBy[] request parameter(s) was provided |
resourceURI | {repoCfg}$.schema.searchOrderValue{N} | mixed | actual value of the RDF property indicated by the orderBy[{N}] request parameter used for ordering the results - see the Ordering results chapter below - only when the orderBy[] request parameter(s) was provided |
resourceURI | {repoCfg}$.schema.searchFts{N} | xsd:string | {N}-th highlighted full text search match - only when a full text search was performed |
resourceURI | {repoCfg}$.schema.searchFtsProperty{N} | object or xsd:string | RDF property of the {N}-th full text search match or a BINARY literal if the match is in the binary content - only when a full text search was performed |
resourceURI | {repoCfg}$.schema.searchFtsQuery{N} | xsd:string | full text search highlighting query of the {N}-th full text search match - only when a full text search was performed |
To see how these properties look in the output, please jump to the example in the ordering results - simple case section.
The simplest way of performing the search is by specifying so-called search terms.
A search term is a condition matching an RDF triple based on triple’s property and/or object. If an ARCHE resource has RDF triples having it as a subject and matching all requested search terms, it matches the search.
A single search term is defined by (almost) any combination of the corresponding property[], operator[], value[], type[] and language[] request parameters.
- The only combination that doesn’t work is a lone operator[], as this is not enough information to formulate any condition.
- property[] and value[] can each supply either a single value or multiple values which are taken as alternatives.
- property[] can be inverted by prepending it with a ^. This implicitly enforces type[]=URI for the value[], if value[] is specified.
- The default operator[] is =.
- The default for property[], value[], type[] and language[] is “any”.
  - type[] might be implicitly enforced by operator[], the presence of language[] or by requesting an inverted property[].
- When a value[] refers to a resource, any identifier of that resource can be used (e.g. https://arche.acdh.oeaw.ac.at/api/23174, https://hdl.handle.net/21.11115/0000-000C-20E3-F, https://id.acdh.oeaw.ac.at/uuid/512c8b7b-1427-4310-8606-43b8faf5619b and https://id.acdh.oeaw.ac.at/ODeeg/Collections/AT-Vienna-KHM/KHM-ANSA-IV3456/3D-data/3Dscan_raw-data/KHM-ANSA-IV3456_raw3d.zip are equally valid ways of referring to the same resource).
- Always use full URIs in property[], type[] or a URI value[]. The ARCHE API doesn’t allow you to define namespace aliases and it doesn’t come with a set of predefined ones.

Examples (for an explanation of the brackets syntax see the next chapters):
- property[]=https://vocabs.acdh.oeaw.ac.at/schema#isTitleImageOf - find all resources being a title image (recognized by the existence of an actual acdh:isTitleImageOf relation)
- value[0][]=foo&value[0][]=bar - find all resources having any property value equal to “foo” or “bar”
- value[]=foo&operator[]=@@ - find all resources with any property matching a full text search for “foo”
- property[0][]=https://vocabs.acdh.oeaw.ac.at/schema#hasTitle&property[0][]=https://vocabs.acdh.oeaw.ac.at/schema#hasDescription&value[]=foo&operator[]=@@ - find all resources having either acdh:hasTitle or acdh:hasDescription matching a full text search for “foo”
- type[]=relation - find all resources having a triple pointing to another resource
- property[]=^https://vocabs.acdh.oeaw.ac.at/schema#isPartOf&value[]=https://some.id - find all resources being children of the https://some.id resource
- property[]=https://vocabs.acdh.oeaw.ac.at/schema#hasTitle&language[]=ja - find all resources having acdh:hasTitle in Japanese
- value[]=https://orcid.org/0000-0001-5853-2534&type[]=relation - find all resources pointing to the resource with id https://orcid.org/0000-0001-5853-2534
- value[]=https://orcid.org/0000-0001-5853-2534 - find all resources pointing to the resource with id https://orcid.org/0000-0001-5853-2534 and the resource with id https://orcid.org/0000-0001-5853-2534 itself, because this search term also matches the <someResource> <id property> <https://orcid.org/0000-0001-5853-2534> triple
- value[]=POINT (31.8181 30.7884)&operator[]=&& - find all resources spatially intersecting with the 31.8181E 30.7884N point

If multiple property[]/operator[]/value[]/type[]/language[] parameters are defined by the request, they are grouped into single search term definitions by the same (implicit or explicit) key. Continue reading for details.
When parsing a GET search request or a POST request with the body encoded as application/x-www-form-urlencoded, each property[]/operator[]/value[]/type[]/language[] parameter value is assigned a key using the following rules:
- If the parameter[] syntax is used, the key is assigned automatically by taking the next number after the last existing numeric key.
- If the parameter[key] syntax is used, the specified key is used.
For example:
property[]=x&property[]=y => property: {0: x, 1: y}
property[]=a&property[1]=b&property[]=c => property: {0: a, 1: b, 2: c}
property[2]=a&property[]=b => property: {2: a, 3: b}
property[foo]=a&property[]=b => property: {foo: a, 0: b}
property[]=a&property[0]=b => property: {0: b}
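The two key assignment rules can be emulated in a few lines of Python. This is only a sketch of the rules as stated above, not the code ARCHE actually runs; the assign_keys helper is hypothetical and handles single-level brackets only (parameter[] and parameter[key]).

```python
import re
from urllib.parse import parse_qsl


def assign_keys(query: str) -> dict:
    """Emulate the key assignment rules for single-level bracket parameters."""
    result = {}
    for name, value in parse_qsl(query, keep_blank_values=True):
        match = re.fullmatch(r"([^\[\]]+)\[([^\[\]]*)\]", name)
        if match is None:
            continue  # not a parameter[...] style name, ignored in this sketch
        param, key = match.groups()
        values = result.setdefault(param, {})
        if key == "":
            # parameter[]: next number after the highest existing numeric key
            numeric = [k for k in values if isinstance(k, int)]
            key = max(numeric) + 1 if numeric else 0
        elif key.isdigit():
            key = int(key)
        values[key] = value  # an explicit key overwrites an earlier value
    return result


print(assign_keys("property[2]=a&property[]=b"))
print(assign_keys("property[foo]=a&property[]=b"))
```

Running it on the examples above is a quick way to check which key a given parameter will end up with before sending a request.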
Then parameter values with the same key are grouped to form search terms, e.g. property[0]=x&property[1]=y&value[1]=a results in two search terms:
- one with key 0 and condition property[]=x
- one with key 1 and conditions property[]=y and value[]=a
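The grouping step can be sketched as follows. The group_terms helper is hypothetical and assumes the parameters were already turned into a mapping of parameter name to key to value, as described above.

```python
def group_terms(params: dict) -> dict:
    """Group keyed parameter values into search terms sharing the same key."""
    terms = {}
    for name, keyed_values in params.items():
        for key, value in keyed_values.items():
            terms.setdefault(key, {})[name] = value
    return terms


# property[0]=x&property[1]=y&value[1]=a after key assignment
keyed = {"property": {0: "x", 1: "y"}, "value": {1: "a"}}
print(group_terms(keyed))
# {0: {'property': 'x'}, 1: {'property': 'y', 'value': 'a'}}
```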
property[] and value[] allow you to specify a set of allowed values which is interpreted as “any of”.
When using a GET search request or a POST request with the body encoded as application/x-www-form-urlencoded, this should be encoded using the parameter[key][]=value1&parameter[key][]=value2&(...) syntax, e.g. value[0][]=foo&value[0][]=bar.
- The parameter[key][0]=value1&parameter[key][1]=value2&(...) syntax will also work, but the parameter[][]=value1&parameter[][]=value2&(...) syntax won’t (as it results in parameter: {0: [value1], 1: [value2]} instead of parameter: {0: [value1, value2]}).

Hopefully most of the time you won’t create ARCHE search API requests by hand but you’ll use some HTTP client library provided by your programming language.
If you are lucky, the library will just do the job for you, e.g.
// jQuery
jQuery.ajax({
url: 'https://arche.acdh.oeaw.ac.at/api/search',
method: 'GET', // POST would work equally well
data: {
"property": ["someProp", "otherProp" ],
"value": ["" , ["otherPropValue1", "otherPropValue2"]],
"readMode": "resource",
"format": "text/turtle"
},
success: function(d) {console.log(d)}
})
// or with explicit keys
jQuery.ajax({
url: 'https://arche.acdh.oeaw.ac.at/api/search',
method: 'GET', // POST would work equally well
data: {
"property": {"0": "someProp", "1": "otherProp"},
"value": { "1": ["otherPropValue1", "otherPropValue2"]},
"readMode": "resource",
"format": "text/turtle"
},
success: function(d) {console.log(d)}
})
// PHP
$searchParam = [
'property' => ['someProp', 'otherProp' ],
'value' => ['' , ['otherPropValue1', 'otherPropValue2']],
'readMode' => 'resource',
'format' => 'text/turtle'
];
// just with file_get_contents() - only GET possible
$response = file_get_contents('https://arche.acdh.oeaw.ac.at/api/search?' . http_build_query($searchParam));
echo $response;
// PSR-7 & PSR-18 way provided by Guzzle - both GET and POST possible
$client = new GuzzleHttp\Client();
$getRequest = new GuzzleHttp\Psr7\Request(
'GET',
'https://arche.acdh.oeaw.ac.at/api/search?' . http_build_query($searchParam)
);
$getResponse = $client->sendRequest($getRequest);
echo $getResponse->getBody();
$postRequest = new GuzzleHttp\Psr7\Request(
'POST',
'https://arche.acdh.oeaw.ac.at/api/search',
['Content-Type' => 'application/x-www-form-urlencoded'],
http_build_query($searchParam)
);
$postResponse = $client->sendRequest($postRequest);
echo $postResponse->getBody();
and if you’re unlucky and your HTTP client library can’t serialize complex objects into URL query (which most notably affects Python users), you need to prepare the request a little more carefully:
# Python with requests
import requests
resp = requests.get(
'https://arche.acdh.oeaw.ac.at/api/search',
params={
'property[]': ['someProp', 'otherProp'],
'value[1][]': ['otherPropValue1', 'otherPropValue2'],
'readMode': 'resource',
'format': 'text/turtle'
}
)
print(resp.text)
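If you would rather keep the search definition as a nested Python structure, a small helper can flatten it into explicit bracket keys (the parameter[key][0]=value1 style described earlier). The flatten_params function below is a hypothetical sketch, not part of requests or of any ARCHE library.

```python
def flatten_params(data, prefix=""):
    """Flatten nested dicts/lists into (name, value) pairs with bracket keys."""
    if isinstance(data, dict):
        items = data.items()
    elif isinstance(data, (list, tuple)):
        items = enumerate(data)
    else:
        return [(prefix, str(data))]
    pairs = []
    for key, value in items:
        name = f"{prefix}[{key}]" if prefix else str(key)
        pairs.extend(flatten_params(value, name))
    return pairs


search = {
    "property": ["someProp", "otherProp"],
    "value": {1: ["otherPropValue1", "otherPropValue2"]},
    "readMode": "resource",
    "format": "text/turtle",
}
pairs = flatten_params(search)
for name, value in pairs:
    print(name, "=", value)
```

The resulting list of pairs can be passed as the params argument of requests.get(), which accepts a list of tuples as well as a dictionary.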
Last but not least if you’re using PHP, you might want to use the arche-lib which provides object wrappers for the search terms, ARCHE repository, etc.
The search terms are rather simple to use but provide limited flexibility. If your search is too complex for them or if you prefer to use plain SQL, the search API allows you to do it.
- Provide the query using the sql request parameter.
- Pass values of the query parameters using the sqlParam[] parameter.
- The query should return a column named id which will be matched against ARCHE internal resource ids (the id column in the resources table). All other columns returned by your query will be just discarded.
- The tables you are most likely to query are (in addition to the full_text_search and spatial_search tables):
  - identifiers(id, ids) storing resource identifiers, with id being the internal resource id and ids all URI ids of a given resource
  - relations(id, target_id, property) storing RDF graph edges
  - metadata(mid, id, property, type, lang, value, value_n, value_t) storing all triples with literal values.
    - mid is an internal triple id which is rather useless for you
    - id, property, type, lang and value store the triple’s subject, predicate, value type, value lang tag and the value itself
    - value_n stores the parsed numeric value for values of numeric types - this column can be used for proper numeric comparison of values
    - value_t is like value_n, just for values of type date/datetime

Example (passing literal values used in the query using the sqlParam[] request parameter):
Find all resources with acdh:hasTitle in Japanese:
https://arche.acdh.oeaw.ac.at/api/search
?sql=SELECT id FROM metadata WHERE property = ? AND lang = ?
&sqlParam[]=https://vocabs.acdh.oeaw.ac.at/schema#hasTitle
&sqlParam[]=ja
https://arche.acdh.oeaw.ac.at/api/search?sql=SELECT%20id%20FROM%20metadata%20WHERE%20property%20%3D%20%3F%20AND%20lang%20%3D%20%3F&sqlParam%5B%5D=https%3A%2F%2Fvocabs.acdh.oeaw.ac.at%2Fschema%23hasTitle&sqlParam%5B%5D=ja
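The URL-encoded form of such a request doesn’t have to be prepared by hand. As a sketch, the Python standard library can produce it from the readable parameter values; only urllib.parse is used here and nothing is sent over the network.

```python
from urllib.parse import quote, urlencode

base_url = "https://arche.acdh.oeaw.ac.at/api/search"
params = [
    ("sql", "SELECT id FROM metadata WHERE property = ? AND lang = ?"),
    ("sqlParam[]", "https://vocabs.acdh.oeaw.ac.at/schema#hasTitle"),
    ("sqlParam[]", "ja"),
]
# quote_via=quote encodes spaces as %20 (the default quote_plus would use +)
query = urlencode(params, quote_via=quote, safe="")
url = f"{base_url}?{query}"
print(url)
```

A list of pairs is used instead of a dictionary because the sqlParam[] name has to be repeated.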
Just use the orderBy[] request parameter coupled, if needed, with the orderByLang parameter.
For example, let’s search for all resources bigger than 13 GB, ordering results by their acdh:hasTitle (for a detailed discussion on API parameters encoding take a look at this section):
https://arche.acdh.oeaw.ac.at/api/search
?property[0]=https://vocabs.acdh.oeaw.ac.at/schema#hasRawBinarySize
&operator[0]=>
&value[0]=13000000000
&orderBy[0]=https://vocabs.acdh.oeaw.ac.at/schema#hasTitle
&readMode=resource
&format=text/turtle
https://arche.acdh.oeaw.ac.at/api/search?property%5B0%5D=https%3A%2F%2Fvocabs.acdh.oeaw.ac.at%2Fschema%23hasRawBinarySize&operator%5B0%5D=%3E&value%5B0%5D=13000000000&orderBy%5B0%5D=https%3A%2F%2Fvocabs.acdh.oeaw.ac.at%2Fschema%23hasTitle&readMode=resource&format=text%2Fturtle
And take a look at the returned data skipping uninteresting properties:
<https://arche.acdh.oeaw.ac.at/api/> <search://count> "7"^^<http://www.w3.org/2001/XMLSchema#integer> .
<https://arche.acdh.oeaw.ac.at/api/23174>
<search://match> "true"^^<http://www.w3.org/2001/XMLSchema#boolean>;
acdh:hasRawBinarySize "17222812879"^^<http://www.w3.org/2001/XMLSchema#long>;
<search://order> "7"^^<http://www.w3.org/2001/XMLSchema#positiveInteger>;
<search://orderValue1> "KHM-ANSA-IV3456_raw3d.zip";
acdh:hasTitle "KHM-ANSA-IV3456_raw3d.zip"@en.
<https://arche.acdh.oeaw.ac.at/api/37779>
<search://match> "true"^^<http://www.w3.org/2001/XMLSchema#boolean>;
acdh:hasRawBinarySize "13039382156"^^<http://www.w3.org/2001/XMLSchema#long>;
<search://order> "6"^^<http://www.w3.org/2001/XMLSchema#positiveInteger>;
<search://orderValue1> "KHM-ANSA-IV431_raw3d.zip";
acdh:hasTitle "KHM-ANSA-IV431_raw3d.zip"@en.
<https://arche.acdh.oeaw.ac.at/api/46542>
<search://match> "true"^^<http://www.w3.org/2001/XMLSchema#boolean>;
acdh:hasRawBinarySize "15927776264"^^<http://www.w3.org/2001/XMLSchema#nonNegativeInteger>;
<search://order> "5"^^<http://www.w3.org/2001/XMLSchema#positiveInteger>;
<search://orderValue1> "sfm_raw_04-p1.zip";
acdh:hasTitle "sfm_raw_04-p1.zip"@de.
(...)
Discussion:
- On this ARCHE instance {repoCfg}$.schema.searchCount is <search://count>, {repoCfg}$.schema.searchMatch is <search://match>, {repoCfg}$.schema.searchOrder is <search://order>, and {repoCfg}$.schema.searchOrderValue + {N} is <search://orderValue1>, <search://orderValue2>, etc.
- All returned resources but one are marked with <search://match> true. The one which isn’t - <https://arche.acdh.oeaw.ac.at/api/> - is a technical resource used to indicate global search result properties like the number of resources matched by the search (here 7).
- The position of each resource in the results is given by the <search://order> property values. The ascending order seems to be kept.
  - To reverse the order, prepend the property with ^ in the orderBy[] parameter, e.g. orderBy[0]=^https://vocabs.acdh.oeaw.ac.at/schema#hasTitle.
  - To order by multiple properties, provide multiple orderBy[] request parameters, e.g. orderBy[0]=firstOrderByThisProperty&orderBy[1]=thenOrderByThatProperty.
    - orderBy[] parameter values are first sorted by their (implicit or explicit) key, e.g. orderBy[foo]=someProp&orderBy[bar]=otherProp will order results first by the otherProp values and only then by the someProp values (because the bar key is smaller than the foo key).
- The values actually used for ordering are reported in the <search://orderValue1> property.
  - They match the acdh:hasTitle property value of a given resource, they just lack the language tag. Here it’s obvious but things can quickly get complicated if a resource has more than one title.
  - There are as many <search://orderValueN> properties in the output as orderBy[] parameters provided in the request, e.g. if you requested orderBy[0]=someProp&orderBy[1]=otherProp, the output will contain both <search://orderValue1> (storing someProp values actually used for sorting) and <search://orderValue2> (storing otherProp values actually used for sorting).
    - The <search://orderValueN> property stores values of the n-th-order sorting property, so e.g. for orderBy[foo]=someProp&orderBy[bar]=otherProp, <search://orderValue1> provides values of otherProp and <search://orderValue2> provides values of someProp.

Different languages have different opinions on the order of characters. It’s possible that the rule used by the ARCHE instance isn’t in line with what you expect. Fortunately this can be easily checked and controlled:
- Check the $.collation.default value of the data returned by the /describe REST API endpoint to find out which collation a given ARCHE instance uses by default (e.g. https://arche.acdh.oeaw.ac.at/api/describe reports en_US.UTF-8).
- Use the orderByCollation request parameter to enforce ordering according to a given collation.
  - Check $.collation.available to get the list of all collations available on a given ARCHE instance.

A resource may have multiple values of a property used for results ordering. A typical case is labels in multiple languages but you shouldn’t optimistically assume it’s the only possible case. Consult the metadata schema to check if a given property may have multiple values and if they have a language tag.
In case of multiple property values ARCHE implements two rules:
- If the orderByLang request parameter is provided, all values with a non-matching language tag are excluded.
  - If a value has no language tag (it isn’t an rdf:langString), it’s also included.
- orderBy[] parameter values.

E.g. let’s assume we have the following resources:
<res1> <hasTitle> "foo" ,
"bar"@en ;
<hasAuthor> "Alice" .
<res2> <hasTitle> "bar"@en ,
"baz"@de ;
<hasAuthor> "John" .
<res3> <otherProp> "placeholder" .
which all match the search. Now,
- with orderBy[]=hasTitle&orderBy[]=^hasAuthor&orderByLang=en the order will be res2, res1, res3 because:
  - For hasTitle of res1 we take the lowest among foo (qualifies because it has no lang tag) and bar (qualifies because its lang tag matches the orderByLang), leaving us with bar.
  - For res2 we skip baz and keep bar because both have a lang tag but only the latter matches the orderByLang.
  - For res3 we get nothing, so it ends up at the end of the sort.
  - As res1 and res2 have the same sorting order according to hasTitle, we continue to the second order by property for them - hasAuthor. For res1 it’s Alice and for res2 it’s John but as a reverse order was requested (note the ^ in orderBy[]=^hasAuthor), sorting is done in reverse order and res2 comes before res1.
- with orderBy[]=^hasTitle&orderByLang=de the order will be res1, res2, res3 because:
  - For res1 the foo value of hasTitle is taken (it doesn’t have a lang tag).
  - For res2 the baz value of hasTitle is taken (it matches the orderByLang).
  - res3 has no hasTitle so it goes to the end.
  - In reverse order foo comes before baz and therefore res1 before res2.

If a given resource lacks a triple of the orderBy[] property with a literal value, then it’s put at the end of the search results.
By the way, this means ARCHE doesn’t allow ordering by object property values. This is because an ARCHE resource may have any number of (equally important) identifiers, making it impossible to tell which one should be used (at least without introducing additional complexity to the API).
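The worked example from this chapter can be replayed with a short simulation. It is only a sketch under several assumptions: plain Python string comparison stands in for the database collation, the resource name stands in for the internal ARCHE id, and for a reversed property the highest qualifying value is picked (the text above only states that the lowest one is taken in the ascending case). The pick_value and order_resources helpers are hypothetical.

```python
def pick_value(values, lang, descending):
    """Choose the single value of a property that is used for sorting."""
    candidates = [
        text for text, tag in values
        if lang is None or tag is None or tag == lang
    ]
    if not candidates:
        return None
    # assumption: highest value when the order is reversed, lowest otherwise
    return max(candidates) if descending else min(candidates)


def order_resources(resources, order_by, lang=None):
    """Return resource ids ordered by the given orderBy[] values."""
    ids = sorted(resources)  # stand-in for ordering by the internal id
    # sort by the least significant property first; Python sorts are stable
    for spec in reversed(order_by):
        descending = spec.startswith("^")
        prop = spec[1:] if descending else spec
        picked = {
            rid: pick_value(resources[rid].get(prop, []), lang, descending)
            for rid in ids
        }
        present = [rid for rid in ids if picked[rid] is not None]
        missing = [rid for rid in ids if picked[rid] is None]
        present.sort(key=lambda rid: picked[rid], reverse=descending)
        ids = present + missing  # resources without a value go to the end
    return ids


# (value, language tag) pairs mirroring the res1, res2 and res3 example
resources = {
    "res1": {"hasTitle": [("foo", None), ("bar", "en")], "hasAuthor": [("Alice", None)]},
    "res2": {"hasTitle": [("bar", "en"), ("baz", "de")], "hasAuthor": [("John", None)]},
    "res3": {"otherProp": [("placeholder", None)]},
}
print(order_resources(resources, ["hasTitle", "^hasAuthor"], "en"))
print(order_resources(resources, ["^hasTitle"], "de"))
```

Both calls reproduce the orders derived step by step above: res2, res1, res3 for the first request and res1, res2, res3 for the second one.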
Search results may be paged. This is controlled by the offset and limit parameters which work exactly how they sound.
You almost certainly want to combine paging with explicit ordering (see the previous chapter).
- Paging also works without orderBy[]. In such a case ARCHE resources are ordered by their internal identifiers, which doesn’t provide any intuitive order but assures a stable ordering.

You can always check the total number of resources matching the search by inspecting the
<restAPIbaseURL> <{repoCfg}$.schema.searchCount> "count" .
response triple, e.g.
<https://arche.acdh.oeaw.ac.at/api/> <search://count> "7"^^<http://www.w3.org/2001/XMLSchema#integer> .
The requested values of the offset
and
limit
aren’t included in the output. ARCHE hopes you can
remember them :-)
Example: fetch 3rd page (10 results per page) of https://hdl.handle.net/21.11115/0000-000E-CE35-F (some Karl Kraus subcollection) ordering them by acdh:hasTitle:
https://arche.acdh.oeaw.ac.at/api/search
?property[]=https://vocabs.acdh.oeaw.ac.at/schema#isPartOf
&value[]=https://hdl.handle.net/21.11115/0000-000E-CE35-F
&orderBy[]=https://vocabs.acdh.oeaw.ac.at/schema#hasTitle
&limit=10
&offset=20
&readMode=resource
&format=text/turtle
https://arche.acdh.oeaw.ac.at/api/search?property%5B%5D=https%3A%2F%2Fvocabs.acdh.oeaw.ac.at%2Fschema%23isPartOf&value%5B%5D=https%3A%2F%2Fhdl.handle.net%2F21.11115%2F0000-000E-CE35-F&orderBy%5B0%5D=https%3A%2F%2Fvocabs.acdh.oeaw.ac.at%2Fschema%23hasTitle&limit=10&offset=20&readMode=resource&format=text%2Fturtle
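Since the requested offset and limit aren’t echoed in the response, the client has to keep track of them. A minimal sketch of the arithmetic, with hypothetical helper names and pages counted from 1:

```python
import math


def page_params(page: int, per_page: int) -> dict:
    """Translate a 1-based page number into the limit and offset parameters."""
    if page < 1 or per_page < 1:
        raise ValueError("page and per_page must be positive")
    return {"limit": per_page, "offset": (page - 1) * per_page}


def page_count(total: int, per_page: int) -> int:
    """Number of pages needed for the total reported by the search count triple."""
    return math.ceil(total / per_page)


print(page_params(3, 10))  # {'limit': 10, 'offset': 20}
print(page_count(7, 10))   # 1
```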
On the resource matching side (1st step of the search workflow) the full text search works pretty intuitively:
- Use @@ as the operator[].
- The search can be limited to given properties with the property[] parameter.
  - Use the special BINARY property name to limit the search to the binary payload of resources.
- What gets indexed depends on the repository configuration (see the fullTextSearch section, e.g. of https://github.com/acdh-oeaw/arche-docker-config/blob/arche/yaml/repo.yaml).
  - text/plain, text/xml, text/turtle, text/html, text/csv, application/xml, application/pdf and application/json binary payloads of size up to 1 GB and all metadata properties are being indexed.
- The search phrase is processed with the websearch_to_tsquery() function; read more about it here and here.

Examples:
Search for resources containing the Japan-Bibliographie phrase.
https://arche.acdh.oeaw.ac.at/api/search
?operator[]=@@
&value[]=Japan-Bibliographie
&readMode=resource
&format=text/turtle
https://arche.acdh.oeaw.ac.at/api/search?operator%5B%5D=%40%40&value%5B%5D=Japan-Bibliographie&readMode=resource&format=text%2fturtle
Search for resources containing Alexandria in their binary payload.
https://arche.acdh.oeaw.ac.at/api/search
?operator[]=@@
&property[]=BINARY
&value[]=Alexandria
&readMode=resource
&format=text/turtle
https://arche.acdh.oeaw.ac.at/api/search?operator%5B%5D=%40%40&property%5B%5D=BINARY&value%5B%5D=Alexandria&readMode=resource&format=text%2fturtle
When you use a search terms-based search, the highlighting works out of the box.
The highlighted matches can be found in the
<resource> <{repoCfg}$.schema.searchFts{N}> "highlighted text"
RDF properties in the output, where {N} is a consecutive number from 1 to the number of matched metadata properties, e.g. (for clarity we skip all resource metadata properties with the resourceProperties[]=propertyWhichDoesNotExist parameter, leaving only technical properties generated by the search):
https://arche.acdh.oeaw.ac.at/api/search
?operator[]=@@
&value[]=Japan-Bibliographie
&readMode=resource
&format=text/turtle
&resourceProperties[]=propertyWhichDoesNotExist
https://arche.acdh.oeaw.ac.at/api/search?operator%5B%5D=%40%40&value%5B%5D=Japan-Bibliographie&readMode=resource&format=text%2fturtle&resourceProperties%5B%5D=propertyWhichDoesNotExist
resulting in something like
@prefix n0: <https://arche.acdh.oeaw.ac.at/api/>.
@prefix n1: <search://>.
@prefix n2: <https://arche.acdh.oeaw.ac.at/>.
@prefix n3: <https://vocabs.acdh.oeaw.ac.at/schema#>.
<https://arche.acdh.oeaw.ac.at/api/> n1:count "3"^^<http://www.w3.org/2001/XMLSchema#integer>.
<https://arche.acdh.oeaw.ac.at/api/30465>
n1:match "true"^^<http://www.w3.org/2001/XMLSchema#boolean>;
n1:fts1 "<b>Japan</b>-<b>Bibliographie</b> 1980–2000 (JB 80) – Thesaurus \n 2019-04-04Z \n https://creativecommons.org/licenses/by/4.0/ \n 496descriptors";
n1:ftsQuery1 "Japan-Bibliographie";
n1:ftsProperty1 "BINARY".
<https://arche.acdh.oeaw.ac.at/api/24690>
n1:match "true"^^<http://www.w3.org/2001/XMLSchema#boolean>;
n1:fts1 "Deutschsprachige <b>Japan</b>-<b>Bibliographie</b> 1980-2000 Datenbank mit über 30.000 bibliographischen Einträgen"@de;
n1:ftsQuery1 "Japan-Bibliographie";
n1:ftsProperty1 n3:hasAlternativeTitle;
n1:fts2 "<b>Japan</b>-<b>Bibliographie</b> 1980-2000"@de;
n1:ftsQuery2 "Japan-Bibliographie";
n1:ftsProperty2 n3:hasTitle.
<https://arche.acdh.oeaw.ac.at/api/40725>
n1:match "true"^^<http://www.w3.org/2001/XMLSchema#boolean>;
n1:fts1 "Deutschsprachige <b>Japan</b>-<b>Bibliographie</b> 1980-2000 Datenbank mit über 30.000 bibliographischen Einträgen"@de;
n1:ftsQuery1 "Japan-Bibliographie";
n1:ftsProperty1 n3:hasAlternativeTitle.
We can see four kinds of technical triples here:
- search://match triples marking resources matching the search
- search://fts1 and search://fts2 (search://fts{N} in general) triples providing the highlighted search matches
- search://ftsQuery1 and search://ftsQuery2 (search://ftsQuery{N} in general) triples informing which highlighting query was used to perform the highlighting of the corresponding search://fts{N} triple value
- search://ftsProperty1 and search://ftsProperty2 (search://ftsProperty{N} in general) triples informing which metadata property matched the full text search
  - For the metadata matches above it’s acdh:hasAlternativeTitle or acdh:hasTitle, and in case of one resource there were two matches (both in acdh:hasAlternativeTitle and acdh:hasTitle).

In case of multiple full text search filters, highlighting is by default provided for all of them.
E.g. let’s search for resources containing both Japan-Bibliographie and Datenbank phrases (here we also employ the trick to filter out non-technical properties from the output):
https://arche.acdh.oeaw.ac.at/api/search
?operator[]=@@
&value[]=Japan-Bibliographie
&operator[]=@@
&value[]=Datenbank
&readMode=resource
&format=text/turtle
&resourceProperties[]=propertyWhichDoesNotExist
https://arche.acdh.oeaw.ac.at/api/search?operator%5B%5D=%40%40&value%5B%5D=Japan-Bibliographie&operator%5B%5D=%40%40&value%5B%5D=Datenbank&readMode=resource&format=text%2fturtle&resourceProperties%5B%5D=propertyWhichDoesNotExist
resulting in something like
@prefix n0: <https://arche.acdh.oeaw.ac.at/api/>.
@prefix n1: <search://>.
@prefix n2: <https://arche.acdh.oeaw.ac.at/>.
@prefix n3: <https://vocabs.acdh.oeaw.ac.at/schema#>.
<https://arche.acdh.oeaw.ac.at/api/> n1:count "2"^^<http://www.w3.org/2001/XMLSchema#integer>.
<https://arche.acdh.oeaw.ac.at/api/40725>
n1:match "true"^^<http://www.w3.org/2001/XMLSchema#boolean>;
n1:fts1 "<b>Datenbank</b> mit über 30.000 bibliographischen Einträgen, die nach folgenden Kriterien gesammelt wurden:\n\t* Japanbezug\n\t* deutschsprachig\n\t* veröffentlicht"@de;
n1:ftsQuery1 "Datenbank";
n1:ftsProperty1 n3:hasDescription;
n1:fts2 "Deutschsprachige Japan-Bibliographie 1980-2000 <b>Datenbank</b> mit über 30.000 bibliographischen Einträgen"@de;
n1:ftsQuery2 "Datenbank";
n1:ftsProperty2 n3:hasAlternativeTitle;
n1:fts3 "Deutschsprachige <b>Japan</b>-<b>Bibliographie</b> 1980-2000 Datenbank mit über 30.000 bibliographischen Einträgen"@de;
n1:ftsQuery3 "Japan-Bibliographie";
n1:ftsProperty3 n3:hasAlternativeTitle.
<https://arche.acdh.oeaw.ac.at/api/24690>
n1:match "true"^^<http://www.w3.org/2001/XMLSchema#boolean>;
n1:fts1 "<b>Japan</b>-<b>Bibliographie</b> 1980-2000"@de;
n1:ftsQuery1 "Japan-Bibliographie";
n1:ftsProperty1 n3:hasTitle;
n1:fts2 "<b>Datenbank</b> mit über 30.000 bibliographischen Einträgen, die nach folgenden Kriterien gesammelt wurden:\n\t* Japanbezug\n\t* deutschsprachig\n\t* veröffentlicht"@de;
n1:ftsQuery2 "Datenbank";
n1:ftsProperty2 n3:hasDescription;
n1:fts3 "Deutschsprachige Japan-Bibliographie 1980-2000 <b>Datenbank</b> mit über 30.000 bibliographischen Einträgen"@de;
n1:ftsQuery3 "Datenbank";
n1:ftsProperty3 n3:hasAlternativeTitle;
n1:fts4 "Deutschsprachige <b>Japan</b>-<b>Bibliographie</b> 1980-2000 Datenbank mit über 30.000 bibliographischen Einträgen"@de;
n1:ftsQuery4 "Japan-Bibliographie";
n1:ftsProperty4 n3:hasAlternativeTitle.
As we can see there are more highlighted results provided now and the
search://ftsQuery{N}
properties can be useful to determine
which highlighted phrase comes from which full text search query.
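On the client side the numbered properties are easier to work with once they are grouped by their {N} suffix. The sketch below assumes the technical properties of a single resource were already read into a dictionary keyed by property URI (how you parse the RDF is up to you) and that the instance uses the search:// property names shown above. The group_highlights helper is hypothetical and the sample values are abbreviated from the output above.

```python
import re


def group_highlights(properties: dict) -> list:
    """Group fts{N}, ftsQuery{N} and ftsProperty{N} values by their number."""
    matches = {}
    for name, value in properties.items():
        found = re.fullmatch(r"search://(fts|ftsQuery|ftsProperty)(\d+)", name)
        if found:
            kind, number = found.group(1), int(found.group(2))
            matches.setdefault(number, {})[kind] = value
    return [matches[n] for n in sorted(matches)]


# Abbreviated technical properties of one resource
sample = {
    "search://match": "true",
    "search://fts1": "<b>Datenbank</b> mit über 30.000 bibliographischen Einträgen",
    "search://ftsQuery1": "Datenbank",
    "search://ftsProperty1": "https://vocabs.acdh.oeaw.ac.at/schema#hasDescription",
    "search://fts2": "Deutschsprachige Japan-Bibliographie 1980-2000 <b>Datenbank</b>",
    "search://ftsQuery2": "Datenbank",
    "search://ftsProperty2": "https://vocabs.acdh.oeaw.ac.at/schema#hasAlternativeTitle",
    "search://fts3": "Deutschsprachige <b>Japan</b>-<b>Bibliographie</b> 1980-2000 Datenbank",
    "search://ftsQuery3": "Japan-Bibliographie",
    "search://ftsProperty3": "https://vocabs.acdh.oeaw.ac.at/schema#hasAlternativeTitle",
}
highlights = group_highlights(sample)
for item in highlights:
    print(item["ftsQuery"], "->", item["fts"])
```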
If you want to adjust the way the highlighting is performed, please read this documentation first and then provide the desired configuration values using the ftsStartSel[], ftsStopSel[], ftsMinWords[], ftsMaxWords[], ftsShortWord[], ftsHighlightAll[], ftsMaxFragments[] and ftsFragmentDelimiter[] request parameters, e.g. to change the default <b> tag used for highlighting to the <em> one:
https://arche.acdh.oeaw.ac.at/api/search
?operator[]=@@
&value[]=Japan-Bibliographie
&readMode=resource
&format=text/turtle
&ftsStartSel[]=<em>
&ftsStopSel[]=</em>
https://arche.acdh.oeaw.ac.at/api/search?operator%5B%5D=%40%40&value%5B%5D=Japan-Bibliographie&readMode=resource&format=text%2fturtle&ftsStartSel%5B%5D=%3Cem%3E&ftsStopSel%5B%5D=%3C%2Fem%3E
In case of multiple full text search filters, parameters can be
specified separately for each of them, e.g. to highlight the
Japan-Bibliographie matches with <em>
and
Datenbank matches with <b>
:
https://arche.acdh.oeaw.ac.at/api/search
?operator[]=@@
&value[]=Japan-Bibliographie
&ftsStartSel[]=<em>
&ftsStopSel[]=</em>
&operator[]=@@
&value[]=Datenbank
&ftsStartSel[]=<b>
&ftsStopSel[]=</b>
&readMode=resource
&format=text/turtle
https://arche.acdh.oeaw.ac.at/api/search?operator%5B%5D=%40%40&value%5B%5D=Japan-Bibliographie&ftsStartSel%5B%5D=%3Cem%3E&ftsStopSel%5B%5D=%3C%2Fem%3E&operator%5B%5D=%40%40&value%5B%5D=Datenbank&ftsStartSel%5B%5D=%3Cb%3E&ftsStopSel%5B%5D=%3C%2Fb%3E&readMode=resource&format=text%2fturtle
Last but not least the query(ies) used to perform the highlighting
can be specified explicitly using the ftsQuery[]
parameter
and properties to which they are applied can be limited with the
ftsProperty[]
parameter. This is particularly useful when
performing an SQL query-based search (see below) but can be also used
for some advanced scenarios.
If the ftsQuery[]
parameter is provided it overrides
highlighting queries extracted from the search terms, e.g. to search for
resources containing both Japan-Bibliographie and 2000
phrases but highlight only Japan-Bibliographie phrase matches
and only in acdh:hasTitle
and
acdh:hasAlternativeTitle
metadata property:
https://arche.acdh.oeaw.ac.at/api/search
?operator[]=@@
&value[]=Japan-Bibliographie
&operator[]=@@
&value[]=2000
&ftsQuery[]=Japan-Bibliographie
&ftsProperty[0][0]=https://vocabs.acdh.oeaw.ac.at/schema#hasTitle
&ftsProperty[0][1]=https://vocabs.acdh.oeaw.ac.at/schema#hasAlternativeTitle
&readMode=resource
&format=text/turtle
https://arche.acdh.oeaw.ac.at/api/search?operator%5B%5D=%40%40&value%5B%5D=Japan-Bibliographie&operator%5B%5D=%40%40&value%5B%5D=2000&ftsQuery%5B%5D=Japan-Bibliographie&ftsProperty%5B0%5D%5B0%5D=https%3A%2F%2Fvocabs.acdh.oeaw.ac.at%2Fschema%23hasTitle&ftsProperty%5B0%5D%5B1%5D=https%3A%2F%2Fvocabs.acdh.oeaw.ac.at%2Fschema%23hasAlternativeTitle&readMode=resource&format=text%2fturtle
It is worth noting that in this case:
acdh:hasTitle and acdh:hasAlternativeTitle metadata properties).

You can also use the full text search while performing an SQL-based search. The SQL query for performing the full text search goes as follows:
SELECT coalesce(fts.id, iid, m.id) AS id
FROM full_text_search fts LEFT JOIN metadata m USING (mid)
WHERE websearch_to_tsquery('simple', 'SEARCH PHRASE') @@ segments
If you want to limit the search to a given property or to the binary content, you should add to the WHERE clause:
- AND fts.id IS NOT NULL for searching only in the binary content
- AND property IN ('list', 'of', 'allowed', 'properties') for searching only in given metadata properties other than resource identifiers
- AND iid IS NOT NULL for searching in resource identifiers

E.g. to search for resources containing Alexandria in their binary payload:
https://arche.acdh.oeaw.ac.at/api/search
?sql=SELECT coalesce(fts.id, iid, m.id) AS id
FROM full_text_search fts LEFT JOIN metadata m USING (mid)
WHERE websearch_to_tsquery('simple', ?) @@ segments
AND fts.id IS NOT NULL
&sqlParam[]=Alexandria
&readMode=resource
&format=text/turtle
https://arche.acdh.oeaw.ac.at/api/search?sql=SELECT%20coalesce%28fts.id%2C%20iid%2C%20m.id%29%20AS%20id%20FROM%20full_text_search%20fts%20LEFT%20JOIN%20metadata%20m%20USING%20%28mid%29%20WHERE%20websearch_to_tsquery%28%27simple%27%2C%20%3F%29%20%40%40%20segments%20AND%20fts.id%20IS%20NOT%20NULL&sqlParam%5B%5D=Alexandria&readMode=resource&format=text%2fturtle
To highlight full text search matches while using the SQL query
search, the highlighting phrase has to be specified using the
ftsQuery[]
parameter and if you limited the search to
particular properties, you should limit highlighting accordingly using
the ftsProperty[]
parameter.
E.g. to search for resources containing Alexandria in their binary payload with highlighting of the matching phrases:
https://arche.acdh.oeaw.ac.at/api/search
?sql=SELECT coalesce(fts.id, iid, m.id) AS id
FROM full_text_search fts LEFT JOIN metadata m USING (mid)
WHERE websearch_to_tsquery('simple', ?) @@ segments
AND fts.id IS NOT NULL
&sqlParam[]=Alexandria
&ftsQuery[]=Alexandria
&ftsProperty[]=BINARY
&readMode=resource
&format=text/turtle
https://arche.acdh.oeaw.ac.at/api/search?sql=SELECT%20coalesce%28fts.id%2C%20iid%2C%20m.id%29%20AS%20id%20FROM%20full_text_search%20fts%20LEFT%20JOIN%20metadata%20m%20USING%20%28mid%29%20WHERE%20websearch_to_tsquery%28%27simple%27%2C%20%3F%29%20%40%40%20segments%20AND%20fts.id%20IS%20NOT%20NULL&sqlParam%5B%5D=Alexandria&ftsQuery%5B%5D=Alexandria&ftsProperty%5B%5D=BINARY&readMode=resource&format=text%2fturtle
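The SQL variants described in this chapter can be assembled programmatically. The fts_sql helper below is a hypothetical sketch: it passes the search phrase and the property names as ? placeholders to be supplied with sqlParam[], and it assumes that placeholders are also accepted inside the IN (...) list, which the text above doesn’t confirm.

```python
def fts_sql(phrase, binary=False, properties=None, identifiers=False):
    """Build the full text search SQL and the matching sqlParam[] values."""
    sql = (
        "SELECT coalesce(fts.id, iid, m.id) AS id "
        "FROM full_text_search fts LEFT JOIN metadata m USING (mid) "
        "WHERE websearch_to_tsquery('simple', ?) @@ segments"
    )
    params = [phrase]
    if binary:
        sql += " AND fts.id IS NOT NULL"  # only the binary content
    if properties:
        placeholders = ", ".join("?" for _ in properties)
        sql += f" AND property IN ({placeholders})"  # only given properties
        params.extend(properties)
    if identifiers:
        sql += " AND iid IS NOT NULL"  # only resource identifiers
    return sql, params


sql, sql_params = fts_sql("Alexandria", binary=True)
request_params = [("sql", sql)] + [("sqlParam[]", p) for p in sql_params]
request_params += [
    ("ftsQuery[]", "Alexandria"),
    ("ftsProperty[]", "BINARY"),
    ("readMode", "resource"),
    ("format", "text/turtle"),
]
for name, value in request_params:
    print(name, "=", value)
```

The three restrictions are meant to be used one at a time; combining them with AND would most likely match nothing.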