
SkosVocabulary extends MetadataCollection

A specialization of the MetadataCollection class for ingesting SKOS vocabularies.

Given an RDF graph with exactly one node of type skos:ConceptScheme, it ingests the skos:ConceptScheme and skos:Concept nodes as well as, depending on the configuration:

  • skos:Collection and skos:OrderedCollection nodes
  • nodes that are objects of RDF triples whose subjects are the nodes mentioned above

All other nodes in the RDF graph are removed by the preprocess() method.
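The node selection described above can be pictured with a minimal, self-contained PHP sketch. It assumes a simplified graph (an array mapping subject URIs to lists of rdf:type URIs), a hypothetical selectSkosNodes() helper and made-up example.org-style URIs; it is not the class's implementation, which works on an RDF dataset, and it leaves out the optional ingestion of triple objects. Throwing an InvalidArgumentException for a graph without exactly one scheme is a choice made for the sketch.

```php
<?php
const SKOS = 'http://www.w3.org/2004/02/skos/core#';

// Hypothetical helper: which nodes survive preprocessing.
// Graph shape (assumed): subject URI => list of rdf:type URIs.
function selectSkosNodes(array $graph, bool $importCollections): array
{
    $allowed = [SKOS . 'ConceptScheme', SKOS . 'Concept'];
    if ($importCollections) {
        $allowed[] = SKOS . 'Collection';
        $allowed[] = SKOS . 'OrderedCollection';
    }
    // The input must contain exactly one skos:ConceptScheme node.
    $schemes = array_filter(
        $graph,
        fn(array $types): bool => in_array(SKOS . 'ConceptScheme', $types, true)
    );
    if (count($schemes) !== 1) {
        throw new InvalidArgumentException('Exactly one skos:ConceptScheme node is required');
    }
    // Keep nodes having at least one allowed type; all other nodes are dropped.
    return array_keys(array_filter(
        $graph,
        fn(array $types): bool => count(array_intersect($types, $allowed)) > 0
    ));
}

$graph = [
    'https://vocab.example/scheme' => [SKOS . 'ConceptScheme'],
    'https://vocab.example/c1'     => [SKOS . 'Concept'],
    'https://vocab.example/coll'   => [SKOS . 'Collection'],
    'https://vocab.example/other'  => ['http://xmlns.com/foaf/0.1/Person'],
];
print_r(selectSkosNodes($graph, false)); // scheme and c1 only
```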

Tags
author

zozlak

Table of Contents

Constants

ALLOWED_CONFLICT_REASONS_REGEX  = '/Resource [0-9]+ locked|Transaction [0-9]+ locked|Owned by other request|Lock not available|duplicate key value|deadlock detected/'
CREATE  = 2
ERRMODE_FAIL  = 'fail'
ERRMODE_INCLUDE  = 'include'
ERRMODE_PASS  = 'pass'
EXACTMATCH_DROP  = 'drop'
EXACTMATCH_KEEP  = 'keep'
EXACTMATCH_LITERAL  = 'literal'
EXACTMATCH_MERGE  = 'merge'
NETWORKERROR_SLEEP  = 3
NMSP_DC  = 'http://purl.org/dc/elements/1.1/'
NMSP_DCT  = 'http://purl.org/dc/terms/'
NMSP_SKOS  = 'http://www.w3.org/2004/02/skos/core#'
RELATIONS_DROP  = 'drop'
RELATIONS_KEEP  = 'keep'
RELATIONS_LITERAL  = 'literal'
SKIP  = 1
STATE_NEW  = 'new'
STATE_OK  = 'ok'
STATE_UPDATE  = 'update'

Properties

$debug  : bool|int
Turns debug messages on.
$addTitle  : bool
Should the title property be added automatically for ingested resources missing it.
$repo  : Repo
Repository connection object
$addParentProperty  : bool
Should skos:Concept, skos:Collection and skos:OrderedCollection resources be connected with the skos:ConceptScheme repository resource using the repository's parent RDF property?
$allowedNmsp  : array<string|int, string>|null
$allowedResourceNmsp  : array<string|int, string>|null
$autoCommit  : int
Number of resources automatically triggering a commit (0 - no auto commit)
$exactMatchMode  : string
How to handle skos:exactMatch triples with object outside the current vocabulary
$exactMatchModeSchema  : string
How to handle skos:exactMatch triples with object within the current vocabulary
$file  : string
$format  : string
$importCollections  : bool
Should skos:Collection and skos:OrderedCollection resources be ingested?
$normalizer  : UriNormalizer
$preprocessed  : bool
Is the metadata graph preprocessed already?
$relationsMode  : string
How to handle skos:semanticRelation triples other than skos:exactMatch with an object outside the current vocabulary
$relationsModeSchema  : string
How to handle skos:semanticRelation triples other than skos:exactMatch with an object within the current vocabulary
$resource  : RepoResource|null
Parent resource for all imported graph nodes
$schema  : Schema
$skosRelations  : array<string|int, string>
$state  : string
$titleProperties  : array<string|int, string>
RDF properties to use for repository resource titles.
$vocabularyUrl  : NamedNodeInterface

Methods

__construct()  : mixed
Creates a new metadata parser.
__destruct()  : mixed
forceUpdate()  : self
fromUrl()  : self
getState()  : string
Returns the state of the vocabulary in the repository.
import()  : array<string|int, RepoResource|ClientException>
Ingests the vocabulary and removes obsolete vocabulary entities (repository resources which were not a part of the ingestion but point to the schema repository resource with skos:inScheme or repoCfg:parent)
preprocess()  : MetadataCollection
Performs preprocessing - removes literal IDs, promotes URIs to IDs, etc.
setAddParentProperty()  : self
When $add is set to true, all repository resources representing imported SKOS entities are linked with the skos:ConceptScheme repository resource using the repository's parent property.
setAddTitle()  : MetadataCollection
Sets if the title property should be automatically added for ingested resources which are missing it.
setAllowedNamespaces()  : self
Sets the RDF property filter for SKOS resources.
setAllowedResourceNamespaces()  : self
Defines namespaces of RDF properties allowed to keep object values.
setAutoCommit()  : MetadataCollection
Controls the automatic commit behaviour.
setExactMatchMode()  : self
Sets up skos:exactMatch RDF triple handling depending on whether the object belongs to the current vocabulary.
setImportCollections()  : self
Sets up whether skos:Collection and skos:OrderedCollection nodes should be ingested into the repository.
setResource()  : MetadataCollection
Sets the repository resource being parent of all resources in the graph imported by the import() method.
setSkosRelationsMode()  : self
Sets up skos:semanticRelation RDF triple handling depending on whether the object belongs to the current vocabulary.
setTitleProperties()  : self
Sets up the RDF properties from which repository resource titles for SKOS entities are derived.
assureLiterals()  : void
assureParents()  : void
assureTitles()  : void
dropNodes()  : void
dropProperties()  : void
filterResources()  : array<string|int, TermInterface>
Returns the set of resources to be imported, skipping all others.
fixReferences()  : void
To avoid creating duplicate resources, ensures that every resource is referenced across the whole graph with only one URI
mergeConcepts()  : void
processExactMatches()  : array<string|int, string>
processRelations()  : void
promoteBNodesToUris()  : void
Promotes BNodes to their first ID and fixes references to them.
promoteUrisToIds()  : void
Promotes subjects that are fully qualified URLs to ids.
removeLiteralIds()  : void
Removes literal ids from the graph.
removeObsolete()  : void
sanitizeResource()  : DatasetNode
Cleans up resource metadata.

Constants

ALLOWED_CONFLICT_REASONS_REGEX

public mixed ALLOWED_CONFLICT_REASONS_REGEX = '/Resource [0-9]+ locked|Transaction [0-9]+ locked|Owned by other request|Lock not available|duplicate key value|deadlock detected/'
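The constant is a PCRE pattern, so it can be tested directly with preg_match(). Judging by its name and the $retriesOnConflict parameters, it presumably identifies conflicts that are safe to retry; the sketch below only shows which messages match, using a hypothetical isRetryableConflict() helper and made-up error messages.

```php
<?php
// Same pattern as SkosVocabulary::ALLOWED_CONFLICT_REASONS_REGEX, copied verbatim.
const ALLOWED_CONFLICT_REASONS_REGEX = '/Resource [0-9]+ locked|Transaction [0-9]+ locked|Owned by other request|Lock not available|duplicate key value|deadlock detected/';

// Hypothetical helper: does an error message describe one of the allowed conflict reasons?
function isRetryableConflict(string $message): bool
{
    return preg_match(ALLOWED_CONFLICT_REASONS_REGEX, $message) === 1;
}

var_dump(isRetryableConflict('Resource 1234 locked'));     // bool(true)
var_dump(isRetryableConflict('ERROR: deadlock detected')); // bool(true)
var_dump(isRetryableConflict('Resource not found'));       // bool(false)
```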

NMSP_DC

public mixed NMSP_DC = 'http://purl.org/dc/elements/1.1/'

NMSP_SKOS

public mixed NMSP_SKOS = 'http://www.w3.org/2004/02/skos/core#'

Properties

$debug

Turns debug messages on.

public static bool|int $debug = false

There are three levels:

  • false or 0 - no debug messages at all
  • true or 1 - basic information on preprocessing stages and detailed information on ingestion progress
  • 2 - detailed information on both preprocessing and ingestion progress
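The three levels can be read as a simple threshold table. The hypothetical debugEnabled() helper below encodes that table and nothing more; it is not how the class decides internally. In the class itself the level is set through the public static property, for example SkosVocabulary::$debug = 2;

```php
<?php
// Hypothetical helper encoding the documented $debug levels.
function debugEnabled(bool|int $debug, bool $preprocessing, bool $detailed): bool
{
    $level = (int) $debug; // false => 0, true => 1
    if ($level <= 0) {
        return false;      // level 0: no debug messages at all
    }
    if ($preprocessing && $detailed) {
        return $level >= 2; // detailed preprocessing information needs level 2
    }
    return true;           // basic preprocessing and all ingestion information from level 1
}
```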

$addTitle

Should the title property be added automatically for ingested resources missing it.

protected bool $addTitle = false

$addParentProperty

Should skos:Concept, skos:Collection and skos:OrderedCollection resources be connected with the skos:ConceptScheme repository resource using the repository's parent RDF property?

private bool $addParentProperty = true

$allowedNmsp

private array<string|int, string>|null $allowedNmsp = null

$allowedResourceNmsp

private array<string|int, string>|null $allowedResourceNmsp = []

$autoCommit

Number of resources automatically triggering a commit (0 - no auto commit)

private int $autoCommit = 0

$exactMatchMode

How to handle skos:exactMatch triples with object outside the current vocabulary

private string $exactMatchMode = self::EXACTMATCH_MERGE

$exactMatchModeSchema

How to handle skos:exactMatch triples with object within the current vocabulary

private string $exactMatchModeSchema = self::EXACTMATCH_MERGE

$importCollections

Should skos:Collection and skos:OrderedCollection resources be ingested?

private bool $importCollections = false

$preprocessed

Is the metadata graph preprocessed already?

private bool $preprocessed = false

$relationsMode

How to handle skos:semanticRelation triples other than skos:exactMatch with an object outside the current vocabulary

private string $relationsMode = self::RELATIONS_DROP

$relationsModeSchema

How to handle skos:semanticRelation triples other than skos:exactMatch with an object within the current vocabulary

private string $relationsModeSchema = self::RELATIONS_KEEP

$skosRelations

private static array<string|int, string> $skosRelations = [\zozlak\RdfConstants::SKOS_BROADER, \zozlak\RdfConstants::SKOS_BROADER_TRANSITIVE, \zozlak\RdfConstants::SKOS_BROAD_MATCH, \zozlak\RdfConstants::SKOS_CLOSE_MATCH, \zozlak\RdfConstants::SKOS_EXACT_MATCH, \zozlak\RdfConstants::SKOS_HAS_TOP_CONCEPT, \zozlak\RdfConstants::SKOS_IN_SCHEME, \zozlak\RdfConstants::SKOS_MAPPING_RELATION, \zozlak\RdfConstants::SKOS_NARROWER, \zozlak\RdfConstants::SKOS_NARROWER_TRANSITIVE, \zozlak\RdfConstants::SKOS_NARROW_MATCH, \zozlak\RdfConstants::SKOS_RELATED, \zozlak\RdfConstants::SKOS_RELATED_MATCH, \zozlak\RdfConstants::SKOS_SEMANTIC_RELATION, \zozlak\RdfConstants::SKOS_TOP_CONCEPT_OF]

$titleProperties

RDF properties to use for repository resource titles.

private array<string|int, string> $titleProperties = [\zozlak\RdfConstants::SKOS_PREF_LABEL, \zozlak\RdfConstants::SKOS_ALT_LABEL]

Methods

__construct()

Creates a new metadata parser.

public __construct(Repo $repo, string $file[, mixed $format = null ][, string|null $uri = null ]) : mixed
Parameters
$repo : Repo
$file : string
$format : mixed = null
$uri : string|null = null

fromUrl()

public static fromUrl(Repo $repo, string $url) : self
Parameters
$repo : Repo
$url : string
Return values
self

getState()

Returns the state of the vocabulary in the repository:

public getState() : string
  • SkosVocabulary::STATE_NEW - there's no such vocabulary in the repository
  • SkosVocabulary::STATE_OK - the vocabulary is the same as in the repository
  • SkosVocabulary::STATE_UPDATE - there is a corresponding vocabulary in the repository but it requires updating
Return values
string
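A caller typically branches on the returned string. The sketch below uses the documented string values of the three state constants and a hypothetical needsImport() policy (import only when the vocabulary is new or outdated); with a real instance the argument would be the return value of getState().

```php
<?php
// String values of the documented state constants.
const STATE_NEW = 'new';
const STATE_OK = 'ok';
const STATE_UPDATE = 'update';

// Hypothetical policy: (re)ingest only when the repository copy is missing or outdated.
function needsImport(string $state): bool
{
    return match ($state) {
        STATE_NEW, STATE_UPDATE => true,
        STATE_OK => false,
        default => throw new UnexpectedValueException("Unknown state: $state"),
    };
}
```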

import()

Ingests the vocabulary and removes obsolete vocabulary entities (repository resources which were not a part of the ingestion but point to the schema repository resource with skos:inScheme or repoCfg:parent)

public import([string $namespace = '' ][, int $singleOutNmsp = self::CREATE ][, string $errorMode = self::ERRMODE_FAIL ][, int $concurrency = 3 ][, int $retriesOnConflict = 3 ]) : array<string|int, RepoResource|ClientException>
Parameters
$namespace : string = ''
$singleOutNmsp : int = self::CREATE
$errorMode : string = self::ERRMODE_FAIL
$concurrency : int = 3
$retriesOnConflict : int = 3
Return values
array<string|int, RepoResource|ClientException>

setAddParentProperty()

When $add is set to true, all repository resources representing imported SKOS entities are linked with the skos:ConceptScheme repository resource using the repository's parent property.

public setAddParentProperty(bool $add) : self
Parameters
$add : bool
Return values
self

setAllowedNamespaces()

Sets the RDF property filter for SKOS resources.

public setAllowedNamespaces(array<string|int, string>|null $nmsp) : self

Repository id and label properties are always allowed.

Parameters
$nmsp : array<string|int, string>|null

null allows all properties

Return values
self
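The filter rule can be sketched as a namespace prefix match. The hypothetical filterProperties() helper below assumes that a property is allowed when its URI starts with one of the given namespaces, and the repository id and label property URIs are placeholders, not the repository's real ones.

```php
<?php
// Hypothetical sketch: keep properties from allowed namespaces plus the always-allowed ones.
function filterProperties(array $properties, ?array $allowedNmsp, array $alwaysAllowed): array
{
    if ($allowedNmsp === null) {
        return $properties; // null allows all properties
    }
    return array_values(array_filter(
        $properties,
        function (string $property) use ($allowedNmsp, $alwaysAllowed): bool {
            if (in_array($property, $alwaysAllowed, true)) {
                return true; // repository id and label properties
            }
            foreach ($allowedNmsp as $nmsp) {
                if (str_starts_with($property, $nmsp)) {
                    return true;
                }
            }
            return false;
        }
    ));
}

// Placeholder URIs standing in for the repository's id and label properties.
$always = ['https://repo.example/schema#id', 'https://repo.example/schema#label'];
$props = [
    'http://www.w3.org/2004/02/skos/core#prefLabel',
    'http://purl.org/dc/terms/created',
    'https://repo.example/schema#id',
];
print_r(filterProperties($props, ['http://www.w3.org/2004/02/skos/core#'], $always)); // dcterms:created is dropped
```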

setAllowedResourceNamespaces()

Defines namespaces of RDF properties allowed to keep object values.

public setAllowedResourceNamespaces(array<string|int, string>|null $allowed) : self

SKOS properties, id, parent and rdf:type RDF properties are always allowed to have object values.

Object values of other properties of SKOS entities will be turned into literals of type xsd:anyURI.

This approach prevents the creation of unnecessary repository resources, but it can make the resulting data incompatible with the ontologies it follows (datatype and object properties are mutually exclusive in OWL), which may or may not be a problem for you.

Parameters
$allowed : array<string|int, string>|null

List of allowed namespaces. When null, all object values are kept.

Return values
self
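The conversion rule can be sketched per triple. The hypothetical convertObject() helper assumes namespace prefix matching, and the repository id and parent property URIs are passed in as placeholders. Note that the property's declared default is [], so out of the box only the always-allowed properties keep object values.

```php
<?php
const XSD_ANYURI = 'http://www.w3.org/2001/XMLSchema#anyURI';
const NMSP_SKOS = 'http://www.w3.org/2004/02/skos/core#';
const RDF_TYPE = 'http://www.w3.org/1999/02/22-rdf-syntax-ns#type';

// Hypothetical sketch: keep the object as a URI or turn it into an xsd:anyURI literal.
function convertObject(string $property, string $object, ?array $allowedNmsp, array $repoProps): array
{
    $keep = $allowedNmsp === null                   // null keeps all object values
        || str_starts_with($property, NMSP_SKOS)    // SKOS properties
        || $property === RDF_TYPE                   // rdf:type
        || in_array($property, $repoProps, true);   // repository id and parent properties
    if (!$keep) {
        foreach ($allowedNmsp as $nmsp) {
            if (str_starts_with($property, $nmsp)) {
                $keep = true;
                break;
            }
        }
    }
    return $keep
        ? ['uri' => $object]
        : ['literal' => $object, 'datatype' => XSD_ANYURI];
}

print_r(convertObject('http://purl.org/dc/terms/creator', 'https://example.org/p1', [], [])); // becomes a literal
```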

setAutoCommit()

Controls the automatic commit behaviour.

public setAutoCommit(int $count) : MetadataCollection

Even when you use autocommit, you should commit your transaction after Indexer::index(). The only exception is setting auto commit to 1, which commits each and every resource separately, but you probably don't want to do that for performance reasons.

Parameters
$count : int

number of resources automatically triggering a commit (0 - no auto commit)

Return values
MetadataCollection
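Why a final commit is still needed becomes clear when counting where auto commits happen. The hypothetical autoCommitPoints() helper assumes a commit after every $count processed resources; it illustrates the counting rule only, not the class's transaction handling.

```php
<?php
// Hypothetical sketch: after which processed resources an auto commit would fire.
function autoCommitPoints(int $resources, int $count): array
{
    $points = [];
    if ($count <= 0) {
        return $points; // 0 - no auto commit
    }
    for ($i = 1; $i <= $resources; $i++) {
        if ($i % $count === 0) {
            $points[] = $i; // auto commit after the $i-th resource
        }
    }
    return $points;
}

print_r(autoCommitPoints(10, 4)); // commits after resources 4 and 8
```

With 10 resources and a count of 4, resources 9 and 10 remain uncommitted until the final, explicit commit.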

setExactMatchMode()

Sets up skos:exactMatch RDF triple handling depending on whether the object belongs to the current vocabulary.

public setExactMatchMode(string $inVocabulary, string $notInVocabulary) : self

Both parameters can take the following values:

  • SkosVocabulary::EXACTMATCH_KEEP - leave the triple as it is
  • SkosVocabulary::EXACTMATCH_DROP - remove the triple
  • SkosVocabulary::EXACTMATCH_MERGE - merge subject and object into one repository resource
  • SkosVocabulary::EXACTMATCH_LITERAL - turn the triple's object into a literal of type xsd:anyURI (please note this produces RDF which doesn't follow SKOS, as SKOS relations are OWL object properties)
Parameters
$inVocabulary : string
$notInVocabulary : string
Return values
self
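The four modes can be illustrated on a single triple in a toy graph. The sketch below uses the documented string values of the EXACTMATCH_* constants and a hypothetical applyExactMatch() helper over an assumed array-based graph; merging is shown only for the case where both nodes are in the graph, and how the class handles identifiers or external objects during a merge is not shown.

```php
<?php
const NMSP_SKOS = 'http://www.w3.org/2004/02/skos/core#';
const XSD_ANYURI = 'http://www.w3.org/2001/XMLSchema#anyURI';

// Hypothetical sketch of the four modes for one skos:exactMatch triple.
// Graph shape (assumed): subject URI => property URI => list of values,
// where a string value is a URI reference and an array value is a literal.
function applyExactMatch(array $graph, string $subject, string $object, string $mode): array
{
    $property = NMSP_SKOS . 'exactMatch';
    // All exactMatch values of the subject except the processed one.
    $others = array_values(array_filter(
        $graph[$subject][$property] ?? [],
        fn($value): bool => $value !== $object
    ));
    switch ($mode) {
        case 'keep':    // EXACTMATCH_KEEP: leave the triple as it is
            return $graph;
        case 'drop':    // EXACTMATCH_DROP: remove the triple
            $graph[$subject][$property] = $others;
            return $graph;
        case 'literal': // EXACTMATCH_LITERAL: the object becomes an xsd:anyURI literal
            $others[] = ['@value' => $object, '@type' => XSD_ANYURI];
            $graph[$subject][$property] = $others;
            return $graph;
        case 'merge':   // EXACTMATCH_MERGE: one node, so one repository resource, remains
            $graph[$subject][$property] = $others;
            foreach ($graph[$object] ?? [] as $objProperty => $values) {
                $graph[$subject][$objProperty] = array_merge($graph[$subject][$objProperty] ?? [], $values);
            }
            unset($graph[$object]);
            return $graph;
        default:
            throw new InvalidArgumentException("Unknown mode: $mode");
    }
}

$a = 'https://vocab.example/a';
$b = 'https://vocab.example/b';
$graph = [
    $a => [NMSP_SKOS . 'exactMatch' => [$b], NMSP_SKOS . 'prefLabel' => [['@value' => 'A']]],
    $b => [NMSP_SKOS . 'altLabel' => [['@value' => 'B']]],
];
print_r(applyExactMatch($graph, $a, $b, 'merge')); // $b's altLabel moves to $a and $b is removed
```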

setImportCollections()

Sets up whether skos:Collection and skos:OrderedCollection nodes should be ingested into the repository.

public setImportCollections(bool $import) : self
Parameters
$import : bool
Return values
self

setSkosRelationsMode()

Sets up skos:semanticRelation RDF triple handling depending on whether the object belongs to the current vocabulary.

public setSkosRelationsMode(string $inVocabulary, string $notInVocabulary) : self

Both parameters can take the following values:

  • SkosVocabulary::RELATIONS_KEEP - leave the triple as it is
  • SkosVocabulary::RELATIONS_DROP - remove the triple
  • SkosVocabulary::RELATIONS_LITERAL - turn the triple's object into a literal of type xsd:anyURI (please note this produces RDF which doesn't follow SKOS, as SKOS relations are OWL object properties)
Parameters
$inVocabulary : string
$notInVocabulary : string
Return values
self
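Which of the two parameters applies depends only on where the triple's object lies. The hypothetical relationAction() helper assumes that "in the vocabulary" means the object is one of the vocabulary's own nodes; the example uses the class defaults, RELATIONS_KEEP ('keep') within and RELATIONS_DROP ('drop') outside the vocabulary.

```php
<?php
// Hypothetical sketch: pick the configured mode for a relation triple by its object.
function relationAction(string $object, array $vocabularyNodes, string $inVocabulary, string $notInVocabulary): string
{
    return in_array($object, $vocabularyNodes, true) ? $inVocabulary : $notInVocabulary;
}

$nodes = ['https://vocab.example/c1', 'https://vocab.example/c2'];
echo relationAction('https://vocab.example/c2', $nodes, 'keep', 'drop'), "\n"; // keep
echo relationAction('https://other.example/x', $nodes, 'keep', 'drop'), "\n";  // drop
```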

setTitleProperties()

Sets up the RDF properties from which repository resource titles for SKOS entities are derived.

public setTitleProperties(array<string|int, string> $properties) : self

The first property providing a title value is used.

Parameters
$properties : array<string|int, string>
Return values
self
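The "first property wins" rule can be sketched as a loop over the configured order. The hypothetical pickTitle() helper and the simplified metadata array (property URI => list of values) are assumptions; the property order shown is the class default, skos:prefLabel then skos:altLabel.

```php
<?php
const SKOS_PREF_LABEL = 'http://www.w3.org/2004/02/skos/core#prefLabel';
const SKOS_ALT_LABEL = 'http://www.w3.org/2004/02/skos/core#altLabel';

// Hypothetical sketch: the first property (in the configured order) having a value wins.
function pickTitle(array $metadata, array $titleProperties): ?array
{
    foreach ($titleProperties as $property) {
        if (!empty($metadata[$property])) {
            return $metadata[$property]; // all values of that property
        }
    }
    return null; // no title source found
}

$metadata = [SKOS_ALT_LABEL => ['Alternative label']];
print_r(pickTitle($metadata, [SKOS_PREF_LABEL, SKOS_ALT_LABEL])); // no prefLabel, so altLabel is used
```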

assureLiterals()

private assureLiterals(array<string|int, TermInterface> $entities) : void
Parameters
$entities : array<string|int, TermInterface>

assureParents()

private assureParents(array<string|int, TermInterface> $entities) : void
Parameters
$entities : array<string|int, TermInterface>

assureTitles()

private assureTitles(array<string|int, TermInterface> $entities) : void
Parameters
$entities : array<string|int, TermInterface>

dropNodes()

private dropNodes(array<string|int, TermInterface> $entities) : void
Parameters
$entities : array<string|int, TermInterface>

dropProperties()

private dropProperties(array<string|int, TermInterface> $entities) : void
Parameters
$entities : array<string|int, TermInterface>

filterResources()

Returns the set of resources to be imported, skipping all others.

private filterResources(string $namespace, int $singleOutNmsp) : array<string|int, TermInterface>
Parameters
$namespace : string

repository resources will be created for all resources in this namespace

$singleOutNmsp : int

should repository resources be created for URIs outside $namespace (MetadataCollection::SKIP or MetadataCollection::CREATE)

Return values
array<string|int, TermInterface>
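The namespace rule can be sketched with the documented values SKIP = 1 and CREATE = 2. The hypothetical resourcesToImport() helper assumes that "in this namespace" means a URI prefix match; with the import() default namespace '' every URI matches, because str_starts_with() returns true for an empty prefix.

```php
<?php
const SKIP = 1;
const CREATE = 2;

// Hypothetical sketch: URIs in $namespace always qualify; URIs outside it only with CREATE.
function resourcesToImport(array $uris, string $namespace, int $singleOutNmsp): array
{
    return array_values(array_filter(
        $uris,
        fn(string $uri): bool => str_starts_with($uri, $namespace) || $singleOutNmsp === CREATE
    ));
}

$uris = ['https://vocab.example/c1', 'https://other.example/x'];
print_r(resourcesToImport($uris, 'https://vocab.example/', SKIP)); // only the vocab.example URI
```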

fixReferences()

To avoid creating duplicate resources, ensures that every resource is referenced across the whole graph with only one URI

private fixReferences() : void

As it doesn't matter which URI is chosen, the resource's own URI is a convenient choice

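The rewriting step can be sketched as an alias table. The hypothetical canonicalizeReferences() helper assumes each resource's identifiers are known up front (canonical subject URI => all of its ids) and that triples are plain [subject, property, object] arrays; object references not found in the table are left untouched.

```php
<?php
// Hypothetical sketch: rewrite every object reference to the resource's own (subject) URI.
function canonicalizeReferences(array $ids, array $triples): array
{
    $canonical = [];
    foreach ($ids as $uri => $aliases) {
        foreach ($aliases as $alias) {
            $canonical[$alias] = $uri; // every alias points to the one URI kept
        }
    }
    return array_map(
        fn(array $t): array => [$t[0], $t[1], $canonical[$t[2]] ?? $t[2]],
        $triples
    );
}

$ids = ['https://vocab.example/c1' => ['https://vocab.example/c1', 'https://mirror.example/c1']];
$triples = [
    ['https://vocab.example/c2', 'http://www.w3.org/2004/02/skos/core#broader', 'https://mirror.example/c1'],
    ['https://vocab.example/c2', 'http://www.w3.org/2004/02/skos/core#related', 'https://other.example/x'],
];
print_r(canonicalizeReferences($ids, $triples)); // broader now points to https://vocab.example/c1
```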
mergeConcepts()

private mergeConcepts(TermInterface $into, TermInterface $res) : void
Parameters
$into : TermInterface
$res : TermInterface

processExactMatches()

private processExactMatches(array<string|int, TermInterface> $entities) : array<string|int, string>
Parameters
$entities : array<string|int, TermInterface>
Return values
array<string|int, string>

processRelations()

private processRelations(array<string|int, NamedNodeInterface> $entities) : void
Parameters
$entities : array<string|int, NamedNodeInterface>

promoteBNodesToUris()

Promotes BNodes to their first ID and fixes references to them.

private promoteBNodesToUris() : void

promoteUrisToIds()

Promotes subjects that are fully qualified URLs to ids.

private promoteUrisToIds() : void

removeLiteralIds()

Removes literal ids from the graph.

private removeLiteralIds() : void

removeObsolete()

private removeObsolete(array<string|int, RepoResource|ClientException> $imported[, int $concurrency = 3 ][, int $retriesOnConflict = 3 ]) : void
Parameters
$imported : array<string|int, RepoResource|ClientException>
$concurrency : int = 3
$retriesOnConflict : int = 3
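Following the import() description, obsolete entities are the resources that point to the schema resource but were not part of the current ingestion, which boils down to a set difference. The hypothetical findObsolete() helper below works on plain URI lists; how the class obtains these lists, and the $concurrency and $retriesOnConflict handling, are not shown.

```php
<?php
// Hypothetical sketch: linked to the scheme resource (skos:inScheme or the parent property)
// but not among the resources just imported means obsolete.
function findObsolete(array $linkedToScheme, array $importedUris): array
{
    return array_values(array_diff($linkedToScheme, $importedUris));
}

$linked = ['https://repo.example/1', 'https://repo.example/2', 'https://repo.example/3'];
$imported = ['https://repo.example/1', 'https://repo.example/3'];
print_r(findObsolete($linked, $imported)); // only https://repo.example/2
```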

sanitizeResource()

Cleans up resource metadata.

private sanitizeResource(TermInterface $res) : DatasetNode
Parameters
$res : TermInterface
Tags
throws
InvalidArgumentException
Return values
DatasetNode

        