Documentation

MetadataCollection extends Dataset

Class for importing a whole metadata graph into the repository.

Tags
author

zozlak

Table of Contents

Constants

ALLOWED_CONFLICT_REASONS_REGEX  = '/Resource [0-9]+ locked|Transaction [0-9]+ locked|Owned by other request|Lock not available|duplicate key value|deadlock detected/'
CREATE  = 2
ERRMODE_FAIL  = 'fail'
ERRMODE_INCLUDE  = 'include'
ERRMODE_PASS  = 'pass'
NETWORKERROR_SLEEP  = 3
SKIP  = 1

Properties

$debug  : bool|int
Turns debug messages on.
$addTitle  : bool
Should the title property be added automatically for ingested resources missing it.
$repo  : Repo
Repository connection object
$autoCommit  : int
Number of resources that automatically triggers a commit (0 - no auto commit)
$normalizer  : UriNormalizer
$preprocessed  : bool
Is the metadata graph preprocessed already?
$resource  : RepoResource|null
Parent resource for all imported graph nodes
$schema  : Schema

Methods

__construct()  : mixed
Creates a new metadata parser.
import()  : array<string|int, RepoResource|ClientException>
Imports the whole graph by looping over all resources.
preprocess()  : MetadataCollection
Performs preprocessing - removes literal IDs, promotes URIs to IDs, etc.
setAddTitle()  : MetadataCollection
Sets if the title property should be automatically added for ingested resources which are missing it.
setAutoCommit()  : MetadataCollection
Controls the automatic commit behaviour.
setResource()  : MetadataCollection
Sets the repository resource that is the parent of all resources in the graph imported by the import() method.
filterResources()  : array<string|int, TermInterface>
Returns the set of resources to be imported, skipping all others.
fixReferences()  : void
To avoid creating duplicated resources, every resource must be referenced across the whole graph with only one URI.
promoteBNodesToUris()  : void
Promotes BNodes to their first ID and fixes references to them.
promoteUrisToIds()  : void
Promotes subjects being fully qualified URLs to ids.
removeLiteralIds()  : void
Removes literal ids from the graph.
sanitizeResource()  : DatasetNode
Cleans up resource metadata.

Constants

ALLOWED_CONFLICT_REASONS_REGEX

public mixed ALLOWED_CONFLICT_REASONS_REGEX = '/Resource [0-9]+ locked|Transaction [0-9]+ locked|Owned by other request|Lock not available|duplicate key value|deadlock detected/'
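
The pattern lists error messages that name a lock or a concurrent write conflict. A minimal sketch of how such a pattern can classify an error message follows. The constant is copied into a standalone global constant, and isTransientConflict() is a hypothetical helper that is not part of the class.

```php
<?php
// Pattern copied verbatim from MetadataCollection::ALLOWED_CONFLICT_REASONS_REGEX.
const ALLOWED_CONFLICT_REASONS_REGEX = '/Resource [0-9]+ locked|Transaction [0-9]+ locked|Owned by other request|Lock not available|duplicate key value|deadlock detected/';

// Hypothetical helper (not part of the library): true when the message
// names one of the conflict reasons listed in the pattern.
function isTransientConflict(string $message): bool
{
    return preg_match(ALLOWED_CONFLICT_REASONS_REGEX, $message) === 1;
}

var_dump(isTransientConflict('Resource 123 locked')); // bool(true)
var_dump(isTransientConflict('Access denied'));       // bool(false)
```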

Properties

$debug

Turns debug messages on.

public static bool|int $debug = false

There are three levels:

  • false or 0 - no debug messages at all
  • true or 1 - basic information on preprocessing stages and detailed information on ingestion progress
  • 2 - detailed information on both preprocessing and ingestion progress

$addTitle

Should the title property be added automatically for ingested resources missing it.

protected bool $addTitle = false

$autoCommit

Number of resources that automatically triggers a commit (0 - no auto commit)

private int $autoCommit = 0

$preprocessed

Is the metadata graph preprocessed already?

private bool $preprocessed = false

Methods

__construct()

Creates a new metadata parser.

public __construct(Repo $repo, mixed $input[, string|null $format = null ]) : mixed
Parameters
$repo : Repo
$input : mixed
$format : string|null = null
Tags
see
Util::parse

import()

Imports the whole graph by looping over all resources.

public import(string $namespace, int $singleOutNmsp[, string $errorMode = self::ERRMODE_FAIL ][, int $concurrency = 3 ][, int $retries = 6 ]) : array<string|int, RepoResource|ClientException>

A repository resource is created for every node containing at least one identifier and:

  • with at least one outgoing edge (there's at least one triple having the node as a subject) of a property other than the identifier property
  • or being within $namespace
  • or when $singleOutNmsp equals MetadataCollection::CREATE

Resources without identifier property are skipped as we are unable to identify them on the next import (which would lead to duplication).

A resource with a fully qualified URI is considered to have an identifier property value (its URI is promoted to it).

Resources in the graph can denote relationships in any way but all object URIs already existing in the repository and all object URIs in the $namespace will be turned into ACDH ids.
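
The creation rule above can be restated as a predicate. This is only a sketch of the documented rule, not the library's code. shouldCreateResource(), MODE_SKIP and MODE_CREATE are hypothetical names, and "being within $namespace" is assumed to mean that at least one identifier starts with the namespace string.

```php
<?php
const MODE_SKIP = 1;   // mirrors the value of MetadataCollection::SKIP
const MODE_CREATE = 2; // mirrors the value of MetadataCollection::CREATE

/**
 * Hypothetical predicate restating the documented rule.
 *
 * @param array<string> $ids identifiers of the node (a fully qualified
 *   subject URI counts as an identifier)
 * @param bool $hasNonIdProperty true when the node is a subject of at least
 *   one triple with a property other than the identifier property
 */
function shouldCreateResource(array $ids, bool $hasNonIdProperty, string $namespace, int $singleOutNmsp): bool
{
    if (count($ids) === 0) {
        return false; // no identifier: always skipped
    }
    if ($hasNonIdProperty || $singleOutNmsp === MODE_CREATE) {
        return true;
    }
    // Assumption: namespace membership is a plain prefix match.
    foreach ($ids as $id) {
        if (str_starts_with($id, $namespace)) {
            return true;
        }
    }
    return false;
}
```

Under these assumptions, a node that only appears as an object, has an identifier outside the namespace and is imported with MODE_SKIP is not created.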

Parameters
$namespace : string

repository resources will be created for all resources in this namespace

$singleOutNmsp : int

should repository resources be created representing URIs outside $namespace (MetadataCollection::SKIP or MetadataCollection::CREATE)

$errorMode : string = self::ERRMODE_FAIL

what should happen if an error is encountered? One of:

  • MetadataCollection::ERRMODE_FAIL - the first encountered error throws an exception.
  • MetadataCollection::ERRMODE_PASS - the first encountered error turns off the autocommit but ingestion is continued. When all resources are processed and there were no errors, an array of RepoResource objects is returned. If there was an error, an exception is thrown.
  • MetadataCollection::ERRMODE_INCLUDE - the first encountered error turns off the autocommit but ingestion is continued. The returned array contains RepoResource objects for successful ingestions and Exception objects for failed ones.
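
The three error modes can be illustrated with a small standalone loop. This is a sketch under assumptions: ingestAll() is a hypothetical function, the mode strings mirror the constant values listed on this page, the autocommit switch-off is left out, and rethrowing the first error in 'pass' mode is a guess, since the text above only says that an exception is thrown.

```php
<?php
/**
 * Hypothetical driver: runs $ingest for every item and applies the error mode.
 *
 * @param array<string|int, mixed> $items
 * @param callable $ingest function taking one item; may throw
 * @return array<string|int, mixed> results, and exceptions in 'include' mode
 */
function ingestAll(array $items, callable $ingest, string $errorMode = 'fail'): array
{
    $results = [];
    $firstError = null;
    foreach ($items as $key => $item) {
        try {
            $results[$key] = $ingest($item);
        } catch (Exception $e) {
            if ($errorMode === 'fail') {
                throw $e; // ERRMODE_FAIL: stop at the first error
            }
            $firstError ??= $e;
            if ($errorMode === 'include') {
                $results[$key] = $e; // ERRMODE_INCLUDE: keep the exception in the output
            }
        }
    }
    if ($errorMode === 'pass' && $firstError !== null) {
        throw $firstError; // ERRMODE_PASS: fail only after every item was tried
    }
    return $results;
}
```

With 'include' the caller has to check each returned element, for example with instanceof, before treating it as a successfully ingested resource.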
$concurrency : int = 3

number of parallel requests to the repository allowed during the import

$retries : int = 6

how many ingestion attempts should be made if the repository resource is locked by another request or a network error occurs
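
A retry loop of the kind $retries describes might look like the sketch below. It is an illustration only: withRetries() and RETRYABLE_REGEX are hypothetical names, only lock conflicts recognized by message are retried (network errors are left out), and the pause length is a parameter. Whether the library waits NETWORKERROR_SLEEP seconds between attempts is an assumption based on the constant's name.

```php
<?php
// Copy of the pattern documented as ALLOWED_CONFLICT_REASONS_REGEX.
const RETRYABLE_REGEX = '/Resource [0-9]+ locked|Transaction [0-9]+ locked|Owned by other request|Lock not available|duplicate key value|deadlock detected/';

/**
 * Hypothetical helper: calls $attempt until it succeeds, a non-retryable
 * error occurs, or $retries attempts were made in total.
 */
function withRetries(callable $attempt, int $retries = 6, int $sleepSeconds = 0): mixed
{
    while (true) {
        try {
            return $attempt();
        } catch (Exception $e) {
            $retries--;
            $retryable = preg_match(RETRYABLE_REGEX, $e->getMessage()) === 1;
            if (!$retryable || $retries <= 0) {
                throw $e; // give up: unknown error or attempts exhausted
            }
            sleep($sleepSeconds); // pause before the next attempt
        }
    }
}
```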

Tags
throws
InvalidArgumentException
throws
IndexerException
throws
ClientException
Return values
array<string|int, RepoResource|ClientException>

setAutoCommit()

Controls the automatic commit behaviour.

public setAutoCommit(int $count) : MetadataCollection

Even when you use autocommit, you should commit your transaction after Indexer::index() (the only exception is when you set auto commit to 1, which forces every resource to be committed separately, but you probably don't want to do that for performance reasons).
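
The reason a final commit is still needed can be seen in a small simulation. This is a sketch assuming a plain counter that commits every $autoCommit resources. simulateAutoCommit() is a hypothetical function and does not talk to a repository.

```php
<?php
/**
 * Hypothetical simulation of counter-based auto commit.
 *
 * @return array{commits: int, uncommitted: int}
 */
function simulateAutoCommit(int $resourceCount, int $autoCommit): array
{
    $commits = 0;
    $pending = 0;
    for ($i = 0; $i < $resourceCount; $i++) {
        $pending++; // one more resource ingested in the open transaction
        if ($autoCommit > 0 && $pending >= $autoCommit) {
            $commits++;   // automatic commit
            $pending = 0;
        }
    }
    return ['commits' => $commits, 'uncommitted' => $pending];
}

print_r(simulateAutoCommit(10, 4)); // commits: 2, uncommitted: 2
```

With 10 resources and an auto commit of 4, the last 2 resources stay uncommitted until the transaction is committed explicitly. Only an auto commit of 1 leaves nothing behind.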

Parameters
$count : int

number of resources that automatically triggers a commit (0 - no auto commit)

Return values
MetadataCollection

filterResources()

Returns the set of resources to be imported, skipping all others.

private filterResources(string $namespace, int $singleOutNmsp) : array<string|int, TermInterface>
Parameters
$namespace : string

repository resources will be created for all resources in this namespace

$singleOutNmsp : int

should repository resources be created for URIs outside $namespace (MetadataCollection::SKIP or MetadataCollection::CREATE)

Return values
array<string|int, TermInterface>

fixReferences()

To avoid creating duplicated resources, every resource must be referenced across the whole graph with only one URI.

private fixReferences() : void

It doesn't matter which URI is chosen, so the resource URI itself is a convenient choice.
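
The idea can be sketched on a plain list of triples. This is not the method's actual code: canonicalizeReferences() is a hypothetical function, triples are modelled as [subject, property, object] arrays of strings, and literals are not distinguished from URIs.

```php
<?php
/**
 * Hypothetical sketch: rewrites every object that is an identifier of some
 * subject so that it points to the subject URI itself.
 *
 * @param array<array{0: string, 1: string, 2: string}> $triples
 * @return array<array{0: string, 1: string, 2: string}>
 */
function canonicalizeReferences(array $triples, string $idProperty): array
{
    // Map every identifier to the URI of the resource that owns it.
    $canonical = [];
    foreach ($triples as [$s, $p, $o]) {
        if ($p === $idProperty) {
            $canonical[$o] = $s;
        }
    }
    // Replace references made through identifiers; identifier triples stay as they are.
    $out = [];
    foreach ($triples as [$s, $p, $o]) {
        if ($p !== $idProperty && isset($canonical[$o])) {
            $o = $canonical[$o];
        }
        $out[] = [$s, $p, $o];
    }
    return $out;
}
```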

promoteBNodesToUris()

Promotes BNodes to their first ID and fixes references to them.

private promoteBNodesToUris() : void

promoteUrisToIds()

Promotes subjects being fully qualified URLs to ids.

private promoteUrisToIds() : void

removeLiteralIds()

Removes literal ids from the graph.

private removeLiteralIds() : void

sanitizeResource()

Cleans up resource metadata.

private sanitizeResource(TermInterface $res) : DatasetNode
Parameters
$res : TermInterface
Tags
throws
InvalidArgumentException
Return values
DatasetNode

        