MetadataCollection
extends Dataset
Class for importing whole metadata graph into the repository.
Table of Contents
Constants
- ALLOWED_CONFLICT_REASONS_REGEX = '/Resource [0-9]+ locked|Transaction [0-9]+ locked|Owned by other request|Lock not available|duplicate key value|deadlock detected/'
- CREATE = 2
- ERRMODE_FAIL = 'fail'
- ERRMODE_INCLUDE = 'include'
- ERRMODE_PASS = 'pass'
- NETWORKERROR_SLEEP = 3
- SKIP = 1
Properties
- $debug : bool|int
- Turns debug messages on.
- $addTitle : bool
- Should the title property be added automatically for ingested resources missing it.
- $repo : Repo
- Repository connection object
- $autoCommit : int
- Number of resources after which a commit is triggered automatically (0 - no auto commit)
- $normalizer : UriNormalizer
- $preprocessed : bool
- Is the metadata graph preprocessed already?
- $resource : RepoResource|null
- Parent resource for all imported graph nodes
- $schema : Schema
Methods
- __construct() : mixed
- Creates a new metadata parser.
- import() : array<string|int, RepoResource|ClientException>
- Imports the whole graph by looping over all resources.
- preprocess() : MetadataCollection
- Performs preprocessing - removes literal IDs, promotes URIs to IDs, etc.
- setAddTitle() : MetadataCollection
- Sets whether the title property should be added automatically for ingested resources which are missing it.
- setAutoCommit() : MetadataCollection
- Controls the automatic commit behaviour.
- setResource() : MetadataCollection
- Sets the repository resource that becomes the parent of all resources in the graph imported by the import() method.
- filterResources() : array<string|int, TermInterface>
- Returns the set of resources to be imported, skipping all others.
- fixReferences() : void
- To avoid creating duplicated resources, every resource must be referenced across the whole graph with only one URI.
- promoteBNodesToUris() : void
- Promotes BNodes to their first ID and fixes references to them.
- promoteUrisToIds() : void
- Promotes subjects being fully qualified URLs to ids.
- removeLiteralIds() : void
- Removes literal ids from the graph.
- sanitizeResource() : DatasetNode
- Cleans up resource metadata.
Constants
ALLOWED_CONFLICT_REASONS_REGEX
public
mixed
ALLOWED_CONFLICT_REASONS_REGEX
= '/Resource [0-9]+ locked|Transaction [0-9]+ locked|Owned by other request|Lock not available|duplicate key value|deadlock detected/'
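The pattern lists the conflict messages treated as "allowed". A minimal sketch of matching error messages against it is shown below. The helper `isAllowedConflict()` and the sample messages are hypothetical and only illustrate the regular expression; how the library applies the constant internally is not documented here.

```php
<?php
// Pattern copied verbatim from the constant documented above.
const ALLOWED_CONFLICT_REASONS_REGEX = '/Resource [0-9]+ locked|Transaction [0-9]+ locked|Owned by other request|Lock not available|duplicate key value|deadlock detected/';

// Hypothetical helper (not part of the library): reports whether an error
// message matches one of the allowed conflict reasons.
function isAllowedConflict(string $message): bool
{
    return preg_match(ALLOWED_CONFLICT_REASONS_REGEX, $message) === 1;
}

// Made-up sample messages, for illustration only.
var_dump(isAllowedConflict('Resource 1234 locked'));     // matches "Resource [0-9]+ locked"
var_dump(isAllowedConflict('ERROR: deadlock detected')); // matches "deadlock detected"
var_dump(isAllowedConflict('Resource not found'));       // matches none of the alternatives
```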
CREATE
public
mixed
CREATE
= 2
ERRMODE_FAIL
public
mixed
ERRMODE_FAIL
= 'fail'
ERRMODE_INCLUDE
public
mixed
ERRMODE_INCLUDE
= 'include'
ERRMODE_PASS
public
mixed
ERRMODE_PASS
= 'pass'
NETWORKERROR_SLEEP
public
mixed
NETWORKERROR_SLEEP
= 3
SKIP
public
mixed
SKIP
= 1
Properties
$debug
Turns debug messages on.
public
static bool|int
$debug
= false
There are three levels:
- false or 0 - no debug messages at all
- true or 1 - basic information on preprocessing stages and detailed information on ingestion progress
- 2 - detailed information on both preprocessing and ingestion progress
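The three levels can be summarized with a small sketch. The function `describeDebug()` is a hypothetical helper written for this illustration (it requires PHP 8.0 or newer) and is not part of the class; it only encodes the table above.

```php
<?php
// Hypothetical helper (not part of the library): maps a $debug value to the
// amount of information reported for each phase, following the list above.
function describeDebug(bool|int $debug): array
{
    $level = (int) $debug; // false => 0, true => 1
    return [
        'preprocessing' => match (true) {
            $level >= 2  => 'detailed',
            $level === 1 => 'basic',
            default      => 'none',
        },
        'ingestion' => $level >= 1 ? 'detailed' : 'none',
    ];
}

print_r(describeDebug(true));
```

With the real class the level would be set through the static property, for example `MetadataCollection::$debug = 2;`.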
$addTitle
Should the title property be added automatically for ingested resources missing it.
protected
bool
$addTitle
= false
$repo
Repository connection object
protected
Repo
$repo
$autoCommit
Number of resources after which a commit is triggered automatically (0 - no auto commit)
private
int
$autoCommit
= 0
$normalizer
private
UriNormalizer
$normalizer
$preprocessed
Is the metadata graph preprocessed already?
private
bool
$preprocessed
= false
$resource
Parent resource for all imported graph nodes
private
RepoResource|null
$resource
= null
$schema
private
Schema
$schema
Methods
__construct()
Creates a new metadata parser.
public
__construct(Repo $repo, mixed $input[, string|null $format = null ]) : mixed
Parameters
- $repo : Repo
- $input : mixed
- $format : string|null = null
import()
Imports the whole graph by looping over all resources.
public
import(string $namespace, int $singleOutNmsp[, string $errorMode = self::ERRMODE_FAIL ][, int $concurrency = 3 ][, int $retries = 6 ]) : array<string|int, RepoResource|ClientException>
A repository resource is created for every node containing at least one identifier and:
- with at least one outgoing edge (there's at least one triple having the node as a subject) of a property other than the identifier property
- or being within $namespace
- or when $singleOutNmsp equals MetadataCollection::CREATE
Resources without the identifier property are skipped as we are unable to identify them on the next import (which would lead to duplication).
A resource with a fully qualified URI is considered as having the identifier property value (its URI is promoted to it).
Resources in the graph can denote relationships in any way, but all object URIs already existing in the repository and all object URIs in the $namespace will be turned into ACDH ids.
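The selection rule described above can be sketched as a plain predicate. This is only an illustration under stated assumptions: the node shape (`ids`, `hasNonIdProperties`), the function name `shouldImport()` and the prefix test used for "being within $namespace" are hypothetical, and the real class works on an RDF dataset rather than on arrays. The example URLs are made up.

```php
<?php
// Values taken from the documented constants SKIP and CREATE.
const SKIP = 1;
const CREATE = 2;

/**
 * Hypothetical sketch of the documented selection rule.
 *
 * @param array{ids: string[], hasNonIdProperties: bool} $node assumed node shape
 */
function shouldImport(array $node, string $namespace, int $singleOutNmsp): bool
{
    // Nodes without any identifier cannot be recognized on the next import.
    if (count($node['ids']) === 0) {
        return false;
    }
    // At least one outgoing edge of a property other than the identifier property.
    if ($node['hasNonIdProperties']) {
        return true;
    }
    // Assumption: "within $namespace" means an identifier starts with $namespace.
    foreach ($node['ids'] as $id) {
        if (str_starts_with($id, $namespace)) {
            return true;
        }
    }
    // Otherwise the node is created only when explicitly requested.
    return $singleOutNmsp === CREATE;
}

$nmsp = 'https://example.org/id/';
$outside = ['ids' => ['https://other.example/x'], 'hasNonIdProperties' => false];
var_dump(shouldImport($outside, $nmsp, SKIP));   // node outside the namespace, SKIP
var_dump(shouldImport($outside, $nmsp, CREATE)); // same node, CREATE
```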
Parameters
- $namespace : string
  repository resources will be created for all resources in this namespace
- $singleOutNmsp : int
  should repository resources be created representing URIs outside $namespace (MetadataCollection::SKIP or MetadataCollection::CREATE)
- $errorMode : string = self::ERRMODE_FAIL
  what should happen if an error is encountered? One of:
  - MetadataCollection::ERRMODE_FAIL - the first encountered error throws an exception.
  - MetadataCollection::ERRMODE_PASS - the first encountered error turns off the autocommit but ingestion is continued. When all resources are processed and there were no errors, an array of RepoResource objects is returned. If there was an error, an exception is thrown.
  - MetadataCollection::ERRMODE_INCLUDE - the first encountered error turns off the autocommit but ingestion is continued. The returned array contains RepoResource objects for successful ingestions and Exception objects for failed ones.
- $concurrency : int = 3
  number of parallel requests to the repository allowed during the import
- $retries : int = 6
  how many ingestion attempts should be made if the repository resource is locked by another request or a network error occurs
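The three error modes differ only in what happens after a failure. The sketch below imitates that behaviour with plain callables standing in for single-resource ingestions. `ingestAll()` and the task list are hypothetical; concurrency, retries and switching off the autocommit are left out, and rethrowing the first error in the "pass" mode is an assumption (the documentation only says that an exception is thrown).

```php
<?php
// Values taken from the documented ERRMODE_* constants.
const ERRMODE_FAIL = 'fail';
const ERRMODE_PASS = 'pass';
const ERRMODE_INCLUDE = 'include';

/**
 * Hypothetical sketch of the error mode semantics.
 *
 * @param array<string, callable> $tasks each callable imitates one ingestion
 * @return array<string, mixed> results, plus Exception objects in 'include' mode
 */
function ingestAll(array $tasks, string $errorMode): array
{
    $results = [];
    $firstError = null;
    foreach ($tasks as $key => $task) {
        try {
            $results[$key] = $task();
        } catch (Exception $e) {
            if ($errorMode === ERRMODE_FAIL) {
                throw $e; // stop at the first error
            }
            $firstError ??= $e; // remember it and keep going
            if ($errorMode === ERRMODE_INCLUDE) {
                $results[$key] = $e; // report the failure in place
            }
        }
    }
    if ($errorMode === ERRMODE_PASS && $firstError !== null) {
        throw $firstError; // assumption: the first error is the one rethrown
    }
    return $results;
}

$tasks = [
    'a' => function () { return 'resource a'; },
    'b' => function () { throw new RuntimeException('ingestion failed'); },
    'c' => function () { return 'resource c'; },
];
foreach (ingestAll($tasks, ERRMODE_INCLUDE) as $key => $result) {
    echo $key, ': ', $result instanceof Exception ? 'error' : 'ok', "\n";
}
```

A caller using the "include" mode therefore has to check every element of the returned array with `instanceof` before treating it as a resource.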
Return values
array<string|int, RepoResource|ClientException>
preprocess()
Performs preprocessing - removes literal IDs, promotes URIs to IDs, etc.
public
preprocess() : MetadataCollection
Return values
MetadataCollection
setAddTitle()
Sets whether the title property should be added automatically for ingested resources which are missing it.
public
setAddTitle(bool $add) : MetadataCollection
Parameters
- $add : bool
Return values
MetadataCollection
setAutoCommit()
Controls the automatic commit behaviour.
public
setAutoCommit(int $count) : MetadataCollection
Even when you use auto commit, you should commit your transaction after Indexer::index() (the only exception is when you set auto commit to 1, which forces every resource to be committed separately, but you probably don't want to do that for performance reasons).
Parameters
- $count : int
  number of resources after which a commit is triggered automatically (0 - no auto commit)
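Why a final commit is still needed can be seen from a small simulation. `simulateAutoCommit()` is a hypothetical helper that assumes a commit is triggered after every $count-th ingested resource; it does not talk to a repository.

```php
<?php
// Hypothetical simulation (not part of the library): counts the automatic
// commits and the resources left in the open transaction at the end.
function simulateAutoCommit(int $resourceCount, int $autoCommit): array
{
    $commits = 0;
    $uncommitted = 0;
    for ($i = 0; $i < $resourceCount; $i++) {
        $uncommitted++;
        // 0 disables auto commit entirely.
        if ($autoCommit > 0 && $uncommitted >= $autoCommit) {
            $commits++;
            $uncommitted = 0;
        }
    }
    return ['commits' => $commits, 'uncommitted' => $uncommitted];
}

print_r(simulateAutoCommit(10, 4)); // two resources are still uncommitted at the end
print_r(simulateAutoCommit(10, 1)); // every resource committed separately
```

Under this assumption only a count of 1 (or a count that happens to divide the number of resources) leaves nothing behind, which matches the advice to commit explicitly after the import.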
Return values
MetadataCollection
setResource()
Sets the repository resource that becomes the parent of all resources in the graph imported by the import() method.
public
setResource(RepoResource|null $res) : MetadataCollection
Parameters
- $res : RepoResource|null
Return values
MetadataCollection
filterResources()
Returns the set of resources to be imported, skipping all others.
private
filterResources(string $namespace, int $singleOutNmsp) : array<string|int, TermInterface>
Parameters
- $namespace : string
  repository resources will be created for all resources in this namespace
- $singleOutNmsp : int
  should repository resources be created for URIs outside $namespace (MetadataCollection::SKIP or MetadataCollection::CREATE)
Return values
array<string|int, TermInterface>
fixReferences()
To avoid creating duplicated resources, every resource must be referenced across the whole graph with only one URI.
private
fixReferences() : void
As it does not matter which URI is chosen, the resource URI itself is a convenient choice.
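The idea can be sketched on a list of plain triples. This is an illustration only: `unifyReferences()`, the `[subject, property, object]` array layout and the property name `id` are hypothetical, and the real method operates on the RDF dataset held by the class.

```php
<?php
/**
 * Hypothetical sketch: rewrites every object that is a known identifier of a
 * resource into that resource's own URI, so each resource is referenced in
 * one way only.
 *
 * @param array<int, array{0: string, 1: string, 2: string}> $triples
 * @return array<int, array{0: string, 1: string, 2: string}>
 */
function unifyReferences(array $triples, string $idProp): array
{
    // Map each identifier value to the URI of the resource carrying it.
    $canonical = [];
    foreach ($triples as [$subject, $property, $object]) {
        if ($property === $idProp) {
            $canonical[$object] = $subject;
        }
    }
    // Rewrite references; identifier triples themselves stay untouched.
    $fixed = [];
    foreach ($triples as [$subject, $property, $object]) {
        if ($property !== $idProp && isset($canonical[$object])) {
            $object = $canonical[$object];
        }
        $fixed[] = [$subject, $property, $object];
    }
    return $fixed;
}

$triples = [
    ['https://example.org/a', 'id', 'https://alias.example/a'],
    ['https://example.org/b', 'isPartOf', 'https://alias.example/a'],
];
print_r(unifyReferences($triples, 'id'));
```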
promoteBNodesToUris()
Promotes BNodes to their first ID and fixes references to them.
private
promoteBNodesToUris() : void
promoteUrisToIds()
Promotes subjects being fully qualified URLs to ids.
private
promoteUrisToIds() : void
removeLiteralIds()
Removes literal ids from the graph.
private
removeLiteralIds() : void
sanitizeResource()
Cleans up resource metadata.
private
sanitizeResource(TermInterface $res) : DatasetNode
Parameters
- $res : TermInterface