NAME
Datahub::Factory::Importer::EIZ - Import data from the ErfgoedInzicht OAI-PMH endpoint
SYNOPSIS
use Datahub::Factory::Importer::EIZ;
use Data::Dumper qw(Dumper);

my $oai = Datahub::Factory::Importer::EIZ->new(
    endpoint               => 'https://endpoint.eiz.be/oai',
    metadata_prefix        => 'oai_lido',
    set                    => '2011',
    pid_module             => 'rcf',
    pid_username           => 'datahub',
    pid_password           => 'datahub',
    pid_rcf_container_name => 'datahub',
);

$oai->importer->each(sub {
    my $item = shift;
    print Dumper($item);
});
DESCRIPTION
Datahub::Factory::Importer::EIZ imports data from the ErfgoedInzicht OAI-PMH endpoint. By default it uses the ListRecords verb to return all records in the oai_lido format. It is also possible to return only the records from a single set, or only those created, modified or deleted between two dates (from and until).
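As an illustration of how these options relate to the OAI-PMH protocol, the sketch below builds the URL of a ListRecords request by hand. The helper list_records_url is hypothetical and is not part of Datahub::Factory; it only shows which request arguments (verb, metadataPrefix, set, from, until) a selective harvest ends up sending, assuming plain ASCII values.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of Datahub::Factory): builds the URL of an
# OAI-PMH ListRecords request from options named like the importer's.
sub list_records_url {
    my ($endpoint, %opts) = @_;
    my %args = (
        verb           => 'ListRecords',
        metadataPrefix => $opts{metadata_prefix} // 'oai_lido',
    );
    # set, from and until are the optional selective-harvesting arguments.
    for my $key (qw(set from until)) {
        $args{$key} = $opts{$key} if defined $opts{$key};
    }
    my $query = join '&', map {
        my $value = $args{$_};
        # Percent-encode everything outside the unreserved character set
        # (assumes byte strings, which is enough for dates and set names).
        $value =~ s/([^A-Za-z0-9\-._~])/sprintf '%%%02X', ord $1/ge;
        "$_=$value";
    } sort keys %args;
    return "$endpoint?$query";
}

my $url = list_records_url(
    'https://endpoint.eiz.be/oai',
    set   => '2011',
    from  => '2017-01-01',
    until => '2017-12-31',
);
print "$url\n";
```

The arguments are sorted only to make the output predictable; OAI-PMH does not prescribe an order.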
It automatically deals with resumptionTokens, so client code does not have to implement paging.
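For readers unfamiliar with resumptionTokens, the following sketch shows the kind of paging loop that client code would otherwise have to write. It runs against three fake in-memory pages rather than a real endpoint; fetch_page and harvest_all are hypothetical names used for illustration and do not reflect how the module is implemented.

```perl
use strict;
use warnings;

# Three fake response pages standing in for an OAI-PMH endpoint. Each page
# carries some records and, except for the last one, a resumptionToken.
my %pages = (
    ''       => { records => [ 'rec-1', 'rec-2' ], token => 'page-2' },
    'page-2' => { records => [ 'rec-3', 'rec-4' ], token => 'page-3' },
    'page-3' => { records => [ 'rec-5' ],          token => undef },
);

# Hypothetical stand-in for a single HTTP request to the endpoint.
sub fetch_page {
    my ($token) = @_;
    return $pages{ $token // '' };
}

# Keep requesting pages until a response no longer carries a token.
sub harvest_all {
    my @records;
    my $token;
    do {
        my $page = fetch_page($token);
        push @records, @{ $page->{records} };
        $token = $page->{token};
    } while (defined $token);
    return @records;
}

my @records = harvest_all();
print scalar(@records), " records harvested\n";
```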
To support PIDs, it uses Rackspace Cloud Files to fetch PID CSVs and converts them to temporary SQLite tables. Provide pid_username, pid_password and pid_rcf_container_name.
PARAMETERS
The endpoint parameter and some PID module parameters are required.
To link PIDs (Persistent Identifiers) to MSK records, it is necessary to use the PID module to fetch a CSV from either a Rackspace Cloud Files instance (protected by username and password) or a public website. Depending on whether you choose Rackspace or a website, different options must be set. If an option is not applicable to your selected module, you can skip the parameter or set it to undef.
The CSV files are converted to SQLite tables inside /tmp and can be used in your fixes. See msk.fix for an example.
endpoint
-
URL of the OAI endpoint.
handler( sub {} | $object | 'NAME' | '+NAME' )
-
Handler to transform each record from XML DOM (XML::LibXML::Element) into a Perl hash.
Handlers can be provided as a function reference, an instance of a Perl package that implements 'parse', or a package NAME. Package names should be prepended by + or prefixed with Catmandu::Importer::OAI::Parser. E.g. foobar will create a Catmandu::Importer::OAI::Parser::foobar instance. By default the handler Catmandu::Importer::OAI::Parser::oai_dc is used for metadataPrefix oai_dc, Catmandu::Importer::OAI::Parser::marcxml for marcxml, Catmandu::Importer::OAI::Parser::mods for mods, Catmandu::Importer::OAI::Parser::Lido for Lido and Catmandu::Importer::OAI::Parser::struct for other formats. In addition there is Catmandu::Importer::OAI::Parser::raw to return the XML as it is.
metadata_prefix
-
Any metadata prefix the endpoint supports. Defaults to oai_lido.
set
-
Optionally, a set to get records from.
from
-
Optionally, a lower bound date: only records created, modified or deleted on or after this date are returned.
until
-
Optionally, an upper bound date: only records created, modified or deleted on or before this date are returned.
username
-
Optional HTAccess username.
password
-
Optional HTAccess password.
pids_path
-
Path to a CSV file containing the list of PIDs.
aat_path
-
Path to a CSV file containing AAT terms vocabulary.
creators_path
-
Path to a CSV file containing Creator terms vocabulary.
generate_vocabularies
-
Generate a temporary SQLite database containing the vocabularies from the CSV files (1 or 0, defaults to 1).
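The naming rule described for the handler parameter can be summarised in a few lines of Perl. The helper resolve_handler_name below is hypothetical and is not the code Catmandu itself uses; it is one reading of the rule, under the assumption that a name which already starts with the parser namespace is left untouched.

```perl
use strict;
use warnings;

# Hypothetical helper that mirrors the naming rule described for the
# handler parameter; it is not the code Catmandu itself uses.
sub resolve_handler_name {
    my ($name) = @_;
    my $ns = 'Catmandu::Importer::OAI::Parser';
    # A leading '+' marks a fully qualified package name.
    return $1 if $name =~ /^\+(.+)$/;
    # Assumption: names already inside the parser namespace are kept as is.
    return $name if index($name, "${ns}::") == 0;
    # Anything else is looked up below the parser namespace.
    return "${ns}::$name";
}

print resolve_handler_name('foobar'), "\n";
print resolve_handler_name('+My::Parser'), "\n";
```

Here My::Parser is a made-up package name; any package that implements 'parse' could be used in its place.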
ATTRIBUTES
importer
-
An importer that can be used in your script.
AUTHOR
Pieter De Praetere <pieter at packed.be>
Matthias Vandermaesen <matthias dot vandermaesen at vlaamsekunstcollectie.be>
COPYRIGHT
Copyright 2017- PACKED vzw, Vlaamse Kunstcollectie vzw
LICENSE
This library is free software; you can redistribute it and/or modify it under the same terms as Perl itself.