NAME
Datahub::Factory::Command::transport - Implements the 'transport' command.
DESCRIPTION
This command allows data managers to (a) fetch data from a (local) source, (b) transform the data to LIDO using a fix, and (c) upload the LIDO-transformed data to a Datahub instance.
COMMAND LINE INTERFACE
--pipeline - Location of the pipeline configuration file.
--general - Location of the general configuration file.
--importer - Location of the importer configuration file.
--fixer - Location of the fixer configuration file.
--exporter - Location of the exporter configuration file.
--verbose - Set this flag to get pretty (human-readable) output of the ETL process.
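If you need to mirror this option set elsewhere, for example in a wrapper script, it fits in a few lines of Python. This is a hypothetical sketch using the standard argparse module, not the command's actual Perl implementation. The source does not say which options are mandatory, so all of them are optional here.

```python
import argparse

# Options that each take the location of one configuration file.
CONFIG_OPTIONS = ("pipeline", "general", "importer", "fixer", "exporter")


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical Python mirror of the 'transport' command's options."""
    parser = argparse.ArgumentParser(prog="transport")
    for name in CONFIG_OPTIONS:
        parser.add_argument(
            f"--{name}",
            metavar="FILE",
            help=f"Location of the {name} configuration file.",
        )
    # --verbose is a plain flag: True when present, False otherwise.
    parser.add_argument(
        "--verbose",
        action="store_true",
        help="Pretty output of the ETL process.",
    )
    return parser


if __name__ == "__main__":
    args = build_parser().parse_args(["--pipeline", "pipeline.ini", "--verbose"])
    print(args.pipeline, args.verbose)
```

Options that are not given stay None, so a wrapper can tell which configuration files were actually supplied.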
Pipeline configuration file
The pipeline configuration file is in the INI format and its location is provided to the application using the --pipeline switch.
The file is broadly divided into two parts: the first (shorter) part configures the pipeline itself and sets the plugins to use for the import, fix and export actions. The second part sets options specific to the plugins in use.
Pipeline configuration
This part has three sections: [Importer], [Fixer] and [Exporter]. Each section has one required option, plugin; set it to the plugin you want to use for that action. The [Importer] section additionally takes the id_path option, described under Importer plugin below.
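A minimal sketch of reading these three sections with Python's standard configparser module. The application itself is written in Perl and may parse INI files differently, so treat this as an illustration only; read_pipeline is a hypothetical name. The sketch also enforces the rule, stated for the [Fixer] section, that only the Fix plugin is supported.

```python
import configparser

# The three actions configured in the first part of the pipeline file.
ACTIONS = ("Importer", "Fixer", "Exporter")

# Example pipeline file, copied from the plugin configuration example in this document.
EXAMPLE = """
[Importer]
plugin = OAI
id_path = 'lidoRecID.0._'
[plugin_importer_OAI]
endpoint = https://oai.my.museum/oai
[Fixer]
plugin = Fix
[plugin_fixer_Fix]
file_name = '/home/datahub/my.fix'
[Exporter]
plugin = YAML
[plugin_exporter_YAML]
"""


def read_pipeline(text: str) -> dict:
    """Return the plugin name chosen for each action."""
    config = configparser.ConfigParser()
    config.read_string(text)
    plugins = {}
    for action in ACTIONS:
        if not config.has_section(action):
            raise ValueError(f"missing [{action}] section")
        # Raises configparser.NoOptionError if 'plugin' is absent.
        plugins[action] = config.get(action, "plugin")
    # Only the Fix plugin is supported for [Fixer].
    if plugins["Fixer"] != "Fix":
        raise ValueError("[Fixer] only supports plugin = Fix")
    return plugins


if __name__ == "__main__":
    print(read_pipeline(EXAMPLE))
```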
All currently supported plugins are in the Importer and Exporter folders. For the [Fixer] section, only the Fix plugin is supported.
Supported Importer plugins:
Supported Exporter plugins:
Plugin configuration
[Importer]
plugin = OAI
id_path = 'lidoRecID.0._'
[plugin_importer_OAI]
endpoint = https://oai.my.museum/oai
[Fixer]
plugin = Fix
[plugin_fixer_Fix]
file_name = '/home/datahub/my.fix'
[Exporter]
plugin = YAML
[plugin_exporter_YAML]
All plugins have their own configuration options in sections called [plugin_type_name], where type is importer, exporter or fixer and name is the name of the plugin.
Each plugin defines its own options as parameters; any parameter the plugin accepts can be set as an item in its configuration section.
If a plugin requires no options, you still need to create its (empty) configuration section (e.g. [plugin_exporter_YAML] in the above example).
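The section-naming convention and the empty-section rule can be expressed as two small helpers. This Python sketch is hypothetical (plugin_section and plugin_options are not part of Datahub::Factory) and assumes the configuration has been loaded with the standard configparser module.

```python
import configparser

# Plugin types that can appear in a [plugin_type_name] section name.
PLUGIN_TYPES = ("importer", "fixer", "exporter")


def plugin_section(plugin_type: str, name: str) -> str:
    """Build the section name for a plugin, e.g. plugin_importer_OAI."""
    if plugin_type not in PLUGIN_TYPES:
        raise ValueError(f"unknown plugin type: {plugin_type}")
    return f"plugin_{plugin_type}_{name}"


def plugin_options(config: configparser.ConfigParser, plugin_type: str, name: str) -> dict:
    """Return a plugin's options; its section must exist even when it is empty."""
    section = plugin_section(plugin_type, name)
    if not config.has_section(section):
        raise KeyError(f"missing [{section}] section (create it even if it is empty)")
    # raw=True returns values as written, without '%' interpolation.
    return dict(config.items(section, raw=True))


if __name__ == "__main__":
    config = configparser.ConfigParser()
    config.read_string(
        "[plugin_importer_OAI]\nendpoint = https://oai.my.museum/oai\n[plugin_exporter_YAML]\n"
    )
    print(plugin_options(config, "importer", "OAI"))
    print(plugin_options(config, "exporter", "YAML"))
```

An empty section yields an empty dictionary, while a missing section is an error, which matches the rule above.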
Importer plugin
The id_path option, set in the [Importer] section, contains the path (in Fix syntax) to the identifier of each record in your data after the fix has been applied, but before the record is submitted to the exporter. It is used for reporting and logging.
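To illustrate what a path such as 'lidoRecID.0._' selects, here is a simplified resolver in Python: numeric segments index into lists and all other segments are dictionary keys. The record shape (a lidoRecID list whose entries keep their text under the _ key) is a hypothetical example, and the real Fix path syntax supports more than this sketch (for example wildcards).

```python
from typing import Any


def resolve_path(record: Any, path: str) -> Any:
    """Follow a dot-separated path through nested dicts and lists.

    Simplified sketch: numeric segments index lists, other segments are
    dictionary keys. Returns None as soon as a step cannot be followed.
    """
    current = record
    for segment in path.split("."):
        if isinstance(current, list):
            if not segment.isdecimal() or int(segment) >= len(current):
                return None
            current = current[int(segment)]
        elif isinstance(current, dict):
            if segment not in current:
                return None
            current = current[segment]
        else:
            return None
    return current


if __name__ == "__main__":
    # Hypothetical record after the fix has been applied.
    record = {"lidoRecID": [{"_": "ID-0001"}]}
    print(resolve_path(record, "lidoRecID.0._"))
```

Returning None instead of raising mirrors the purpose of id_path: a record without an identifier should still be reportable rather than crash the logger.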
Fixer plugin
[plugin_fixer_Fix]
condition = record.institution_name
fixers = FOO, BAR
[plugin_fixer_Fix]
file_name = /home/datahub/my.fix
The [plugin_fixer_Fix] section can either load a single fix file directly (via the file_name option) or be configured to conditionally load a different fix file per record, so that multiple fix files can be used for the same data stream (e.g. when two institutions with different data models use the same API endpoint). The latter is done by setting the condition and fixers options.
Conditional fixers
[plugin_fixer_Fix]
condition = record.institution_name
fixers = FOO, BAR
[plugin_fixer_FOO]
condition = 'Museum of Foo'
file_name = '/home/datahub/foo.fix'
[plugin_fixer_BAR]
condition = 'Museum of Bar'
file_name = '/home/datahub/bar.fix'
If you want to split the data stream into multiple (smaller) streams, each with its own fix file, set the appropriate options in the [plugin_fixer_Fix] block. Note that id_path is still mandatory.
Set condition to the Fix-compatible path in the original stream that holds the value you want to use to split the stream.
Provide a comma-separated list of fixer plugins in fixers.
For every fixer plugin in fixers, create a configuration block called [plugin_fixer_name], where name is the plugin name as listed in fixers, and provide the following options:
condition - The value that the condition path from [plugin_fixer_Fix] must have for the record to belong to this block.
file_name - The location of the fix file that must be executed for every record in this block.
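The conditional dispatch can be sketched in Python using the Conditional fixers example. Several assumptions are made here and are not confirmed by the source: the configuration is read with the standard configparser module, surrounding single quotes are not part of a value, the condition path consists of plain keys only (so record.institution_name is read literally from a hypothetical record), and a record whose value matches no block gets no fix file (None). The source does not state what the real plugin does with unmatched records.

```python
import configparser
from typing import Any, Dict, Optional

# The "Conditional fixers" example from this document.
CONDITIONAL = """
[plugin_fixer_Fix]
condition = record.institution_name
fixers = FOO, BAR
[plugin_fixer_FOO]
condition = 'Museum of Foo'
file_name = '/home/datahub/foo.fix'
[plugin_fixer_BAR]
condition = 'Museum of Bar'
file_name = '/home/datahub/bar.fix'
"""


def strip_quotes(value: str) -> str:
    """Assumption: surrounding single quotes are not part of the value."""
    if len(value) >= 2 and value[0] == value[-1] == "'":
        return value[1:-1]
    return value


def get_path(record: Any, path: str) -> Any:
    """Follow a dot-separated path of dictionary keys; None if a key is missing."""
    current = record
    for key in path.split("."):
        if not isinstance(current, dict) or key not in current:
            return None
        current = current[key]
    return current


def build_dispatch(config: configparser.ConfigParser) -> Dict[str, str]:
    """Map each fixer block's condition value to its fix file."""
    names = [n.strip() for n in config.get("plugin_fixer_Fix", "fixers").split(",")]
    dispatch: Dict[str, str] = {}
    for name in names:
        section = f"plugin_fixer_{name}"
        value = strip_quotes(config.get(section, "condition"))
        dispatch[value] = strip_quotes(config.get(section, "file_name"))
    return dispatch


def select_fix_file(config: configparser.ConfigParser, record: Any) -> Optional[str]:
    """Return the fix file for a record, or None if no block matches."""
    path = strip_quotes(config.get("plugin_fixer_Fix", "condition"))
    value = get_path(record, path)
    if not isinstance(value, str):
        return None
    return build_dispatch(config).get(value)


if __name__ == "__main__":
    config = configparser.ConfigParser()
    config.read_string(CONDITIONAL)
    record = {"record": {"institution_name": "Museum of Bar"}}
    print(select_fix_file(config, record))  # /home/datahub/bar.fix
```

Building the lookup table once per configuration keeps the per-record work to a single path lookup; a long-running importer would cache build_dispatch rather than rebuild it for every record as this short sketch does.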
Example configuration file
[Importer]
plugin = Adlib
id_path = 'record.id'
[Fixer]
plugin = Fix
[Exporter]
plugin = Datahub
[plugin_importer_Adlib]
file_name = '/tmp/adlib.xml'
data_path = 'recordList.record.*'
[plugin_fixer_Fix]
file_name = '/tmp/msk.fix'
[plugin_exporter_Datahub]
datahub_url = https://my.thedatahub.io
datahub_format = LIDO
oauth_client_id = datahub
oauth_client_secret = datahub
oauth_username = datahub
oauth_password = datahub
AUTHORS
Matthias Vandermaesen <matthias.vandermaesen@vlaamsekunstcollectie.be> Pieter De Praetere <pieter@packed.be>
COPYRIGHT
Copyright 2016 - PACKED vzw, Vlaamse Kunstcollectie vzw
LICENSE
This library is free software; you can redistribute it and/or modify it under the terms of the GPLv3.