SkosVocabulary
extends MetadataCollection
in package
A specialization of the MetadataCollection class for ingesting SKOS vocabularies.
Given an RDF graph with exactly one node of type skos:ConceptScheme, it ingests the skos:ConceptScheme and skos:Concept nodes as well as, depending on the configuration:
- skos:Collection and skos:OrderedCollection nodes
- nodes that are objects of RDF triples in the above-mentioned nodes
All other nodes in the RDF graph are removed by the preprocess() method.
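A minimal usage sketch of the workflow described above, using only the public methods documented on this page. The helper name `ingestVocabulary` and its `$withCollections` parameter are hypothetical, and the argument is typed as a plain `object` so that the sketch runs without the library installed. Transaction handling on the Repo object is omitted, and skipping the import when the state is `ok` is an assumption about a sensible workflow, not documented behaviour.

```php
<?php
/**
 * Hypothetical helper, not part of the library.
 * $vocabulary is expected to expose the public methods documented on this page.
 */
function ingestVocabulary(object $vocabulary, bool $withCollections = false): array
{
    // 'ok' is the documented value of STATE_OK: the repository copy is up to date.
    if ($vocabulary->getState() === 'ok') {
        return [];
    }
    // Optionally ingest skos:Collection and skos:OrderedCollection nodes as well.
    $vocabulary->setImportCollections($withCollections);
    // First argument: objects inside the vocabulary, second: objects outside it.
    $vocabulary->setExactMatchMode('merge', 'keep');   // EXACTMATCH_MERGE, EXACTMATCH_KEEP
    $vocabulary->setSkosRelationsMode('keep', 'drop'); // RELATIONS_KEEP, RELATIONS_DROP
    // Drop all nodes that are not part of the vocabulary, then ingest the rest.
    $vocabulary->preprocess();
    return $vocabulary->import();
}
```

With the library available, the object passed in would be created with `new SkosVocabulary($repo, $file)` or `SkosVocabulary::fromUrl($repo, $url)`, as listed under Methods.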
Table of Contents
Constants
- ALLOWED_CONFLICT_REASONS_REGEX = '/Resource [0-9]+ locked|Transaction [0-9]+ locked|Owned by other request|Lock not available|duplicate key value|deadlock detected/'
- CREATE = 2
- ERRMODE_FAIL = 'fail'
- ERRMODE_INCLUDE = 'include'
- ERRMODE_PASS = 'pass'
- EXACTMATCH_DROP = 'drop'
- EXACTMATCH_KEEP = 'keep'
- EXACTMATCH_LITERAL = 'literal'
- EXACTMATCH_MERGE = 'merge'
- NETWORKERROR_SLEEP = 3
- NMSP_DC = 'http://purl.org/dc/elements/1.1/'
- NMSP_DCT = 'http://purl.org/dc/terms/'
- NMSP_SKOS = 'http://www.w3.org/2004/02/skos/core#'
- RELATIONS_DROP = 'drop'
- RELATIONS_KEEP = 'keep'
- RELATIONS_LITERAL = 'literal'
- SKIP = 1
- STATE_NEW = 'new'
- STATE_OK = 'ok'
- STATE_UPDATE = 'update'
Properties
- $debug : bool|int
- Turns debug messages on.
- $addTitle : bool
- Should the title property be added automatically for ingested resources missing it.
- $repo : Repo
- Repository connection object
- $addParentProperty : bool
- Should skos:Concept, skos:Collection and skos:OrderedCollection resources be connected to the skos:ConceptScheme repository resource with the repository's parent RDF property?
- $allowedNmsp : array<string|int, string>|null
- $allowedResourceNmsp : array<string|int, string>|null
- $autoCommit : int
- Number of resources that automatically triggers a commit (0 - no auto commit)
- $exactMatchMode : string
- How to handle skos:exactMatch triples with object outside the current vocabulary
- $exactMatchModeSchema : string
- How to handle skos:exactMatch triples with object within the current vocabulary
- $file : string
- $format : string
- $importCollections : bool
- Should skos:Collection and skos:OrderedCollection resources be ingested?
- $normalizer : UriNormalizer
- $preprocessed : bool
- Is the metadata graph preprocessed already?
- $relationsMode : string
- How to handle skos:semanticRelation triples other than skos:exactMatch with object outside the current vocabulary
- $relationsModeSchema : string
- How to handle skos:semanticRelation triples other than skos:exactMatch with object within the current vocabulary
- $resource : RepoResource|null
- Parent resource for all imported graph nodes
- $schema : Schema
- $skosRelations : array<string|int, string>
- $state : string
- $titleProperties : array<string|int, string>
- RDF properties to use for repository resource titles.
- $vocabularyUrl : NamedNodeInterface
Methods
- __construct() : mixed
- Creates a new metadata parser.
- __destruct() : mixed
- forceUpdate() : self
- fromUrl() : self
- getState() : string
- Returns the state of the vocabulary in the repository.
- import() : array<string|int, RepoResource|ClientException>
- Ingests the vocabulary and removes obsolete vocabulary entities (repository resources which were not a part of the ingestion but point to the schema repository resource with skos:inScheme or repoCfg:parent)
- preprocess() : MetadataCollection
- Performs preprocessing - removes literal IDs, promotes URIs to IDs, etc.
- setAddParentProperty() : self
- When $add is set to true, all repository resources representing imported SKOS entities are linked to the skos:ConceptScheme repository resource with the repository's parent property.
- setAddTitle() : MetadataCollection
- Sets whether the title property should be automatically added to ingested resources which are missing it.
- setAllowedNamespaces() : self
- Sets the RDF property filter for SKOS resources.
- setAllowedResourceNamespaces() : self
- Defines namespaces of RDF properties allowed to keep object values.
- setAutoCommit() : MetadataCollection
- Controls the automatic commit behaviour.
- setExactMatchMode() : self
- Sets up handling of skos:exactMatch RDF triples depending on whether the object belongs to the current vocabulary.
- setImportCollections() : self
- Sets whether skos:Collection and skos:OrderedCollection nodes should be ingested into the repository.
- setResource() : MetadataCollection
- Sets the repository resource that is the parent of all resources in the graph imported by the import() method.
- setSkosRelationsMode() : self
- Sets up handling of skos:semanticRelation RDF triples depending on whether the object belongs to the current vocabulary.
- setTitleProperties() : self
- Sets up which RDF properties a repository resource title for skos entities should be derived from.
- assureLiterals() : void
- assureParents() : void
- assureTitles() : void
- dropNodes() : void
- dropProperties() : void
- filterResources() : array<string|int, TermInterface>
- Returns the set of resources to be imported, skipping all others.
- fixReferences() : void
- To avoid creating duplicate resources, every resource must be referenced across the whole graph with only one URI.
- mergeConcepts() : void
- processExactMatches() : array<string|int, string>
- processRelations() : void
- promoteBNodesToUris() : void
- Promotes BNodes to their first ID and fixes references to them.
- promoteUrisToIds() : void
- Promotes subjects being fully qualified URLs to ids.
- removeLiteralIds() : void
- Removes literal ids from the graph.
- removeObsolete() : void
- sanitizeResource() : DatasetNode
- Cleans up resource metadata.
Constants
ALLOWED_CONFLICT_REASONS_REGEX
public
mixed
ALLOWED_CONFLICT_REASONS_REGEX
= '/Resource [0-9]+ locked|Transaction [0-9]+ locked|Owned by other request|Lock not available|duplicate key value|deadlock detected/'
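The pattern above can be tried out directly with `preg_match()`. The sketch below copies the constant's value and wraps it in a hypothetical helper, `isTransientConflict()`. That the library uses the pattern to decide which conflicts are worth retrying (compare the `$retriesOnConflict` parameter of import()) is an assumption based on the constant's name, not something this page states.

```php
<?php
// Value copied verbatim from ALLOWED_CONFLICT_REASONS_REGEX.
const CONFLICT_REGEX = '/Resource [0-9]+ locked|Transaction [0-9]+ locked|Owned by other request|Lock not available|duplicate key value|deadlock detected/';

// Hypothetical helper: true when the message matches one of the listed conflict reasons.
function isTransientConflict(string $message): bool
{
    return preg_match(CONFLICT_REGEX, $message) === 1;
}

var_dump(isTransientConflict('Resource 123 locked')); // bool(true)
var_dump(isTransientConflict('deadlock detected'));   // bool(true)
var_dump(isTransientConflict('Resource abc locked')); // bool(false), digits are required
var_dump(isTransientConflict('404 Not Found'));       // bool(false)
```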
CREATE
public
mixed
CREATE
= 2
ERRMODE_FAIL
public
mixed
ERRMODE_FAIL
= 'fail'
ERRMODE_INCLUDE
public
mixed
ERRMODE_INCLUDE
= 'include'
ERRMODE_PASS
public
mixed
ERRMODE_PASS
= 'pass'
EXACTMATCH_DROP
public
mixed
EXACTMATCH_DROP
= 'drop'
EXACTMATCH_KEEP
public
mixed
EXACTMATCH_KEEP
= 'keep'
EXACTMATCH_LITERAL
public
mixed
EXACTMATCH_LITERAL
= 'literal'
EXACTMATCH_MERGE
public
mixed
EXACTMATCH_MERGE
= 'merge'
NETWORKERROR_SLEEP
public
mixed
NETWORKERROR_SLEEP
= 3
NMSP_DC
public
mixed
NMSP_DC
= 'http://purl.org/dc/elements/1.1/'
NMSP_DCT
public
mixed
NMSP_DCT
= 'http://purl.org/dc/terms/'
NMSP_SKOS
public
mixed
NMSP_SKOS
= 'http://www.w3.org/2004/02/skos/core#'
RELATIONS_DROP
public
mixed
RELATIONS_DROP
= 'drop'
RELATIONS_KEEP
public
mixed
RELATIONS_KEEP
= 'keep'
RELATIONS_LITERAL
public
mixed
RELATIONS_LITERAL
= 'literal'
SKIP
public
mixed
SKIP
= 1
STATE_NEW
public
mixed
STATE_NEW
= 'new'
STATE_OK
public
mixed
STATE_OK
= 'ok'
STATE_UPDATE
public
mixed
STATE_UPDATE
= 'update'
Properties
$debug
Turns debug messages on.
public
static bool|int
$debug
= false
There are three levels:
- false or 0 - no debug messages at all
- true or 1 - basic information on preprocessing stages and detailed information on ingestion progress
- 2 - detailed information on both preprocessing and ingestion progress
$addTitle
Should the title property be added automatically for ingested resources missing it.
protected
bool
$addTitle
= false
$repo
Repository connection object
protected
Repo
$repo
$addParentProperty
Should skos:Concept, skos:Collection and skos:OrderedCollection resources be connected to the skos:ConceptScheme repository resource with the repository's parent RDF property?
private
bool
$addParentProperty
= true
$allowedNmsp
private
array<string|int, string>|null
$allowedNmsp
= null
$allowedResourceNmsp
private
array<string|int, string>|null
$allowedResourceNmsp
= []
$autoCommit
Number of resources that automatically triggers a commit (0 - no auto commit)
private
int
$autoCommit
= 0
$exactMatchMode
How to handle skos:exactMatch triples with object outside the current vocabulary
private
string
$exactMatchMode
= self::EXACTMATCH_MERGE
$exactMatchModeSchema
How to handle skos:exactMatch triples with object within the current vocabulary
private
string
$exactMatchModeSchema
= self::EXACTMATCH_MERGE
$file
private
string
$file
$format
private
string
$format
$importCollections
Should skos:Collection and skos:OrderedCollection resources be ingested?
private
bool
$importCollections
= false
$normalizer
private
UriNormalizer
$normalizer
$preprocessed
Is the metadata graph preprocessed already?
private
bool
$preprocessed
= false
$relationsMode
How to handle skos:semanticRelation triples other than skos:exactMatch with object outside the current vocabulary
private
string
$relationsMode
= self::RELATIONS_DROP
$relationsModeSchema
How to handle skos:semanticRelation triples other than skos:exactMatch with object within the current vocabulary
private
string
$relationsModeSchema
= self::RELATIONS_KEEP
$resource
Parent resource for all imported graph nodes
private
RepoResource|null
$resource
= null
$schema
private
Schema
$schema
$skosRelations
private
static array<string|int, string>
$skosRelations
= [\zozlak\RdfConstants::SKOS_BROADER, \zozlak\RdfConstants::SKOS_BROADER_TRANSITIVE, \zozlak\RdfConstants::SKOS_BROAD_MATCH, \zozlak\RdfConstants::SKOS_CLOSE_MATCH, \zozlak\RdfConstants::SKOS_EXACT_MATCH, \zozlak\RdfConstants::SKOS_HAS_TOP_CONCEPT, \zozlak\RdfConstants::SKOS_IN_SCHEME, \zozlak\RdfConstants::SKOS_MAPPING_RELATION, \zozlak\RdfConstants::SKOS_NARROWER, \zozlak\RdfConstants::SKOS_NARROWER_TRANSITIVE, \zozlak\RdfConstants::SKOS_NARROW_MATCH, \zozlak\RdfConstants::SKOS_RELATED, \zozlak\RdfConstants::SKOS_RELATED_MATCH, \zozlak\RdfConstants::SKOS_SEMANTIC_RELATION, \zozlak\RdfConstants::SKOS_TOP_CONCEPT_OF]
$state
private
string
$state
$titleProperties
RDF properties to use for repository resource titles.
private
array<string|int, string>
$titleProperties
= [\zozlak\RdfConstants::SKOS_PREF_LABEL, \zozlak\RdfConstants::SKOS_ALT_LABEL]
$vocabularyUrl
private
NamedNodeInterface
$vocabularyUrl
Methods
__construct()
Creates a new metadata parser.
public
__construct(Repo $repo, string $file[, mixed $format = null ][, string|null $uri = null ]) : mixed
Parameters
- $repo : Repo
- $file : string
- $format : mixed = null
- $uri : string|null = null
__destruct()
public
__destruct() : mixed
forceUpdate()
public
forceUpdate() : self
Return values
self
fromUrl()
public
static fromUrl(Repo $repo, string $url) : self
Parameters
- $repo : Repo
- $url : string
Return values
self
getState()
Returns the state of the vocabulary in the repository:
public
getState() : string
- SkosVocabulary::STATE_NEW - there's no such vocabulary in the repository
- SkosVocabulary::STATE_OK - the vocabulary is the same as in the repository
- SkosVocabulary::STATE_UPDATE - there is a corresponding vocabulary in the repository but it requires updating
Return values
string
import()
Ingests the vocabulary and removes obsolete vocabulary entities (repository resources which were not a part of the ingestion but point to the schema repository resource with skos:inScheme or repoCfg:parent)
public
import([string $namespace = '' ][, int $singleOutNmsp = self::CREATE ][, string $errorMode = self::ERRMODE_FAIL ][, int $concurrency = 3 ][, int $retriesOnConflict = 3 ]) : array<string|int, RepoResource|ClientException>
Parameters
- $namespace : string = ''
- $singleOutNmsp : int = self::CREATE
- $errorMode : string = self::ERRMODE_FAIL
- $concurrency : int = 3
- $retriesOnConflict : int = 3
Return values
array<string|int, RepoResource|ClientException>
preprocess()
Performs preprocessing - removes literal IDs, promotes URIs to IDs, etc.
public
preprocess() : MetadataCollection
Return values
MetadataCollection
setAddParentProperty()
When $add is set to true, all repository resources representing imported SKOS entities are linked to the skos:ConceptScheme repository resource with the repository's parent property.
public
setAddParentProperty(bool $add) : self
Parameters
- $add : bool
Return values
self
setAddTitle()
Sets whether the title property should be automatically added to ingested resources which are missing it.
public
setAddTitle(bool $add) : MetadataCollection
Parameters
- $add : bool
Return values
MetadataCollection
setAllowedNamespaces()
Sets the RDF property filter for SKOS resources.
public
setAllowedNamespaces(array<string|int, string>|null $nmsp) : self
Repository id and label properties are always allowed.
Parameters
- $nmsp : array<string|int, string>|null
- null allows all properties
Return values
self
setAllowedResourceNamespaces()
Defines namespaces of RDF properties allowed to keep object values.
public
setAllowedResourceNamespaces(array<string|int, string>|null $allowed) : self
SKOS properties, id, parent and rdf:type RDF properties are always allowed to have object values.
Object values of other properties of SKOS entities will be turned into literals of type xsd:anyURI.
This approach prevents the creation of unnecessary repository resources but can make the resulting data incompatible with the ontologies they follow (as datatype and object properties are mutually exclusive in OWL), which may or may not be a problem for you.
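The rule described above can be sketched as a small predicate. This is an illustration of the documented behaviour, not the library's code: the function name `keepsObjectValue()`, the `$repoProps` parameter (standing in for the repository-specific id and parent properties) and the prefix-based namespace matching are all assumptions.

```php
<?php
const NMSP_SKOS = 'http://www.w3.org/2004/02/skos/core#';
const RDF_TYPE  = 'http://www.w3.org/1999/02/22-rdf-syntax-ns#type';

/**
 * Hypothetical predicate: may a triple with the given property keep its object value?
 * A false result means the object would become a literal of type xsd:anyURI.
 */
function keepsObjectValue(string $property, ?array $allowed, array $repoProps = []): bool
{
    // null means all object values are kept.
    if ($allowed === null) {
        return true;
    }
    // SKOS properties, rdf:type and the repository id/parent properties are always allowed.
    if ($property === RDF_TYPE
        || in_array($property, $repoProps, true)
        || strpos($property, NMSP_SKOS) === 0) {
        return true;
    }
    // Otherwise the property must fall within one of the allowed namespaces.
    foreach ($allowed as $nmsp) {
        if (strpos($property, $nmsp) === 0) {
            return true;
        }
    }
    return false;
}

var_dump(keepsObjectValue('http://purl.org/dc/terms/creator', []));                            // bool(false)
var_dump(keepsObjectValue('http://purl.org/dc/terms/creator', ['http://purl.org/dc/terms/'])); // bool(true)
```

Note that the documented default of `$allowedResourceNmsp` is an empty array, so under this reading only the always-allowed properties keep object values unless the setter is called.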
Parameters
- $allowed : array<string|int, string>|null
- List of allowed namespaces. When null, all object values are kept.
Return values
self
setAutoCommit()
Controls the automatic commit behaviour.
public
setAutoCommit(int $count) : MetadataCollection
Even when you use auto commit, you should commit your transaction after Indexer::index() (the only exception is when you set auto commit to 1, forcing each and every resource to be committed separately, but you probably don't want to do that for performance reasons).
Parameters
- $count : int
- number of resources that automatically triggers a commit (0 - no auto commit)
Return values
MetadataCollection
setExactMatchMode()
Sets up handling of skos:exactMatch RDF triples depending on whether the object belongs to the current vocabulary.
public
setExactMatchMode(string $inVocabulary, string $notInVocabulary) : self
Both parameters can take the following values:
- SkosVocabulary::EXACTMATCH_KEEP - leave the triple as it is
- SkosVocabulary::EXACTMATCH_DROP - remove the triple
- SkosVocabulary::EXACTMATCH_MERGE - merge subject and object into one repository resource
- SkosVocabulary::EXACTMATCH_LITERAL - turn the triple's object into a literal of type xsd:anyURI (please note it produces RDF which doesn't follow SKOS, as SKOS relations are OWL object properties)
Parameters
- $inVocabulary : string
- $notInVocabulary : string
Return values
self
setImportCollections()
Sets whether skos:Collection and skos:OrderedCollection nodes should be ingested into the repository.
public
setImportCollections(bool $import) : self
Parameters
- $import : bool
Return values
self
setResource()
Sets the repository resource that is the parent of all resources in the graph imported by the import() method.
public
setResource(RepoResource|null $res) : MetadataCollection
Parameters
- $res : RepoResource|null
Return values
MetadataCollection
setSkosRelationsMode()
Sets up handling of skos:semanticRelation RDF triples depending on whether the object belongs to the current vocabulary.
public
setSkosRelationsMode(string $inVocabulary, string $notInVocabulary) : self
Both parameters can take the following values:
- SkosVocabulary::RELATIONS_KEEP - leave the triple as it is
- SkosVocabulary::RELATIONS_DROP - remove the triple
- SkosVocabulary::RELATIONS_LITERAL - turn the triple's object into a literal of type xsd:anyURI (please note it produces RDF which doesn't follow SKOS, as SKOS relations are OWL object properties)
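The effect of the three modes on a single triple can be illustrated as follows. This is a sketch of the documented semantics, not the library's implementation: the array-based triple model and the function name `applyRelationsMode()` are hypothetical, and the mode strings are the documented values of the RELATIONS_KEEP, RELATIONS_DROP and RELATIONS_LITERAL constants.

```php
<?php
const XSD_ANY_URI = 'http://www.w3.org/2001/XMLSchema#anyURI';

/**
 * Hypothetical model: a triple is an array with 'subject', 'property' and 'object' keys,
 * where 'object' is ['type' => 'uri'|'literal', 'value' => string, 'datatype' => ?string].
 * Returns the resulting triple, or null when the triple is removed.
 */
function applyRelationsMode(array $triple, string $mode): ?array
{
    switch ($mode) {
        case 'keep': // RELATIONS_KEEP: leave the triple as it is
            return $triple;
        case 'drop': // RELATIONS_DROP: remove the triple
            return null;
        case 'literal': // RELATIONS_LITERAL: object becomes an xsd:anyURI literal
            $triple['object'] = [
                'type'     => 'literal',
                'value'    => $triple['object']['value'],
                'datatype' => XSD_ANY_URI,
            ];
            return $triple;
        default:
            throw new InvalidArgumentException("Unknown mode: $mode");
    }
}
```

The same three outcomes apply to the keep, drop and literal values of setExactMatchMode(); the merge value is not modelled here because it affects whole resources rather than a single triple.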
Parameters
- $inVocabulary : string
- $notInVocabulary : string
Return values
self
setTitleProperties()
Sets up which RDF properties a repository resource title for skos entities should be derived from.
public
setTitleProperties(array<string|int, string> $properties) : self
The first property providing a title value is used.
Parameters
- $properties : array<string|int, string>
Return values
self
assureLiterals()
private
assureLiterals(array<string|int, TermInterface> $entities) : void
Parameters
- $entities : array<string|int, TermInterface>
assureParents()
private
assureParents(array<string|int, TermInterface> $entities) : void
Parameters
- $entities : array<string|int, TermInterface>
assureTitles()
private
assureTitles(array<string|int, TermInterface> $entities) : void
Parameters
- $entities : array<string|int, TermInterface>
dropNodes()
private
dropNodes(array<string|int, TermInterface> $entities) : void
Parameters
- $entities : array<string|int, TermInterface>
dropProperties()
private
dropProperties(array<string|int, TermInterface> $entities) : void
Parameters
- $entities : array<string|int, TermInterface>
filterResources()
Returns the set of resources to be imported, skipping all others.
private
filterResources(string $namespace, int $singleOutNmsp) : array<string|int, TermInterface>
Parameters
- $namespace : string
- repository resources will be created for all resources in this namespace
- $singleOutNmsp : int
- should repository resources be created for URIs outside $namespace (MetadataCollection::SKIP or MetadataCollection::CREATE)
Return values
array<string|int, TermInterface>
fixReferences()
To avoid creating duplicate resources, every resource must be referenced across the whole graph with only one URI.
private
fixReferences() : void
As it doesn't matter which one exactly, the resource URI itself is a convenient choice.
mergeConcepts()
private
mergeConcepts(TermInterface $into, TermInterface $res) : void
Parameters
- $into : TermInterface
- $res : TermInterface
processExactMatches()
private
processExactMatches(array<string|int, TermInterface> $entities) : array<string|int, string>
Parameters
- $entities : array<string|int, TermInterface>
Return values
array<string|int, string>
processRelations()
private
processRelations(array<string|int, NamedNodeInterface> $entities) : void
Parameters
- $entities : array<string|int, NamedNodeInterface>
promoteBNodesToUris()
Promotes BNodes to their first ID and fixes references to them.
private
promoteBNodesToUris() : void
promoteUrisToIds()
Promotes subjects being fully qualified URLs to ids.
private
promoteUrisToIds() : void
removeLiteralIds()
Removes literal ids from the graph.
private
removeLiteralIds() : void
removeObsolete()
private
removeObsolete(array<string|int, RepoResource|ClientException> $imported[, int $concurrency = 3 ][, int $retriesOnConflict = 3 ]) : void
Parameters
- $imported : array<string|int, RepoResource|ClientException>
- $concurrency : int = 3
- $retriesOnConflict : int = 3
sanitizeResource()
Cleans up resource metadata.
private
sanitizeResource(TermInterface $res) : DatasetNode
Parameters
- $res : TermInterface