NAME
ETL::Pipeline::Input::Xml - Records from an XML file
SYNOPSIS
use ETL::Pipeline;
ETL::Pipeline->new( {
input => ['Xml', matching => 'Data.xml', root => '/Root'],
mapping => {Name => 'Name', Address => 'Address'},
output => ['UnitTest']
} )->process;
DESCRIPTION
ETL::Pipeline::Input::Xml defines an input source that reads records from an XML file. Individual records are found under the "root" node. Fields are accessed with a relative XML path.
METHODS & ATTRIBUTES
Arguments for "input" in ETL::Pipeline
root
The root attribute holds the XPath for the top node. "next_record" iterates over root's children.
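As a rough illustration of what "root" selects, the sketch below uses XML::XPath directly. The sample document, the Record element name, and the '/Root/*' expression for "root's children" are assumptions made for this example; they are not taken from the module's source.

```perl
use strict;
use warnings;
use XML::XPath;

# Hypothetical sample document; the element names are illustrative only.
my $xml = <<'XML';
<Root>
  <Record><Name>John Doe</Name><Address>1 Main St</Address></Record>
  <Record><Name>Jane Doe</Name><Address>2 Oak Ave</Address></Record>
</Root>
XML

my $xpath = XML::XPath->new( xml => $xml );

# Each child element of the root node is treated as one record.
my @records = $xpath->findnodes( '/Root/*' );

# Resolve a field path relative to each record node.
my @names = map {
    my ($field) = $xpath->findnodes( 'Name', $_ );
    defined $field ? $field->string_value : undef;
} @records;

print "$_\n" for @names;
```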
Called from "process" in ETL::Pipeline
get
get returns a list of values from matching nodes. The field name is an XPath, relative to "root". See http://www.w3schools.com/xpath/xpath_functions.asp for more information on XPaths.
XML lends itself to recursive records. What happens when you need two fields under the same subnode? For example, a person involved can have both a name and a role. The names and roles go together. How do you get them together?
get supports subnodes as additional parameters. Pass the top node as the first parameter. Pass the subnode names in subsequent parameters. The values are returned in the same order as the parameters. get returns undef for any nonexistent subnodes.
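One way to picture the subnode behavior is the following sketch built on XML::XPath. The related_fields helper and the sample record are hypothetical and exist only for this example; the module's actual implementation may differ.

```perl
use strict;
use warnings;
use XML::XPath;

# Hypothetical record; element names mirror the examples in this section.
my $xml = <<'XML';
<Record>
  <PersonInvolved><Name>John Doe</Name><Role>Husband</Role></PersonInvolved>
  <PersonInvolved><Name>Jane Doe</Name></PersonInvolved>
</Record>
XML

my $xpath = XML::XPath->new( xml => $xml );
my ($record) = $xpath->findnodes( '/Record' );

# Hypothetical helper, not part of ETL::Pipeline. For every node matching
# $top it returns one array reference holding the subnode values in the
# order requested, with undef for a subnode that does not exist.
sub related_fields {
    my ($xp, $context, $top, @subnodes) = @_;
    return map {
        my $node = $_;
        [ map {
            my ($match) = $xp->findnodes( $_, $node );
            defined $match ? $match->string_value : undef;
        } @subnodes ];
    } $xp->findnodes( $top, $context );
}

my @people = related_fields( $xpath, $record, 'PersonInvolved', 'Name', 'Role' );
```

In this sample the second person has no Role element, so the second array reference carries undef in that position.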
Here are some examples...
# Return a single value from a single field.
$etl->get( 'Name' );
'John Doe'
# Return a list from multiple fields with the same name.
$etl->get( 'PersonInvolved/Name' );
('John Doe', 'Jane Doe')
# Return a list from subnodes.
$etl->get( 'PersonInvolved', 'Name' );
('John Doe', 'Jane Doe')
# Return a list of related fields from subnodes.
$etl->get( 'PersonInvolved', 'Name', 'Role' );
(['John Doe', 'Husband'], ['Jane Doe', 'Wife'])
In the "mapping" in ETL::Pipeline, those examples look like this...
{Name => 'Name'}
{Name => 'PersonInvolved/Name'}
{Name => ['PersonInvolved', 'Name']}
{Name => ['PersonInvolved', 'Name', 'Role']}
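The related-fields form yields one array reference per top node. A minimal sketch of unpacking that structure follows; the list is hard-coded from the example above rather than read from a live pipeline, and the 'Unknown' default is an assumption added for illustration.

```perl
use strict;
use warnings;

# Stand-in for: my @people = $etl->get( 'PersonInvolved', 'Name', 'Role' );
my @people = ( ['John Doe', 'Husband'], ['Jane Doe', 'Wife'] );

my @lines;
for my $person (@people) {
    my ($name, $role) = @$person;
    # A missing subnode comes back as undef, so supply a default.
    $role //= 'Unknown';
    push @lines, "$name ($role)";
}

print "$_\n" for @lines;
```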
next_record
This method advances to the next record: it selects the next node from "node_set" and stores it in "current".
"configure" builds the node set once, when it opens the file. next_record iterates over this in-memory list.
configure
configure opens the XML file and extracts the node set. "next_record" then iterates over the node set.
finish
finish does nothing. It exists only because "process" in ETL::Pipeline requires it.
Other Methods & Attributes
attribute
The attribute method returns the value of an attribute on the root node. For example, deleted records may have an attribute like ACTION="DELETE". "skip_if" in ETL::Pipeline::Input can use attribute and bypass these records.
$etl->input( 'Xml',
skip_if => sub { $_->input->attribute( 'ACTION' ) eq 'DELETE' },
matching => 'Data.xml',
root => '/File'
);
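The same filtering idea can be sketched with XML::XPath alone. This example assumes the ACTION attribute sits on each record element, which the text above does not confirm, and the sample file contents are hypothetical.

```perl
use strict;
use warnings;
use XML::XPath;

# Hypothetical file contents; only the ACTION="DELETE" attribute comes
# from the description above.
my $xml = <<'XML';
<File>
  <Record ACTION="DELETE"><Name>John Doe</Name></Record>
  <Record><Name>Jane Doe</Name></Record>
</File>
XML

my $xpath = XML::XPath->new( xml => $xml );

# Keep records whose ACTION attribute is absent or is not DELETE.
# The "// ''" default avoids an uninitialized-value warning when the
# attribute is missing.
my @kept = grep {
    ( $_->getAttribute( 'ACTION' ) // '' ) ne 'DELETE'
} $xpath->findnodes( '/File/*' );

# Collect the Name of each surviving record.
my @names = map {
    my ($field) = $xpath->findnodes( 'Name', $_ );
    defined $field ? $field->string_value : undef;
} @kept;

print "$_\n" for @names;
```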
current
The current attribute holds the currently selected node (record). "next_record" automatically sets current.
node_set
The node_set attribute holds the node set of records. It is the list of records in this file. "configure" automatically sets node_set.
xpath
The xpath attribute holds the current XML::XPath object. It is automatically set by the "next_record" method.
SEE ALSO
ETL::Pipeline, ETL::Pipeline::Input, ETL::Pipeline::Input::File, XML::XPath
AUTHOR
Robert Wohlfarth <robert.j.wohlfarth@vanderbilt.edu>
LICENSE
Copyright 2016 (c) Vanderbilt University Medical Center
This program is free software; you can redistribute it and/or modify it under the same terms as Perl itself.