NAME

ETL::Pipeline::Input::Xml - Records from an XML file

SYNOPSIS

use ETL::Pipeline;
ETL::Pipeline->new( {
  input   => ['Xml', matching => 'Data.xml', root => '/Root'],
  mapping => {Name => 'Name', Address => 'Address'},
  output  => ['UnitTest']
} )->process;

DESCRIPTION

ETL::Pipeline::Input::Xml defines an input source that reads records from an XML file. Individual records are found under the "root" node. Fields are accessed with a relative XML path.

METHODS & ATTRIBUTES

Arguments for "input" in ETL::Pipeline

root

The root attribute holds the XPath for the top node. "next_record" iterates over root's children.

Called from "process" in ETL::Pipeline

get

get returns a list of values from matching nodes. The field name is an XPath, relative to "root". See http://www.w3schools.com/xpath/xpath_functions.asp for more information on XPaths.

XML lends itself to recursive records. What happens when you need two fields under the same subnode? For example, a person involved can have both a name and a role. The names and roles go together. How do you get them together?

get supports subnodes as additional parameters. Pass the top node as the first parameter. Pass the subnode names in subsequent parameters. The values are returned in the same order as the parameters. get returns undef for any nonexistent subnodes.

Here are some examples...

# Return a single value from a single field.
$etl->get( 'Name' );
'John Doe'

# Return a list from multiple fields with the same name.
$etl->get( 'PersonInvolved/Name' );
('John Doe', 'Jane Doe')

# Return a list from subnodes.
$etl->get( 'PersonInvolved', 'Name' );
('John Doe', 'Jane Doe')

# Return a list of related fields from subnodes.
$etl->get( 'PersonInvolved', 'Name', 'Role' );
(['John Doe', 'Husband'], ['Jane Doe', 'Wife'])

In the "mapping" in ETL::Pipeline, those examples look like this...

{Name => 'Name'}
{Name => 'PersonInvolved/Name'}
{Name => ['PersonInvolved', 'Name']}
{Name => ['PersonInvolved', 'Name', 'Role']}
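
As a rough illustration of the subnode form, the sketch below runs the same kind of lookup directly against XML::XPath. The sample document, the /File/Record path, and the get_related helper are all hypothetical. They are assumptions made for this example and are not necessarily how get works internally.

```perl
use strict;
use warnings;
use XML::XPath;

# Hypothetical sample data; element names follow the examples above.
my $xml = <<'XML';
<File>
  <Record>
    <Name>John Doe</Name>
    <PersonInvolved><Name>John Doe</Name><Role>Husband</Role></PersonInvolved>
    <PersonInvolved><Name>Jane Doe</Name><Role>Wife</Role></PersonInvolved>
  </Record>
</File>
XML

# Hypothetical helper that mimics the subnode form of get. It returns one
# entry per top node: a plain value for a single subnode, or an array
# reference when several subnodes are requested.
sub get_related {
    my ($xp, $record, $top, @subnodes) = @_;
    my @rows;
    foreach my $node ($xp->findnodes( $top, $record )) {
        my @values = map {
            # First match relative to this top node, or undef if none.
            my ($first) = $xp->findnodes( $_, $node );
            defined $first ? $first->string_value : undef;
        } @subnodes;
        push @rows, (@values == 1 ? $values[0] : \@values);
    }
    return @rows;
}

my $xp = XML::XPath->new( xml => $xml );
my ($record) = $xp->findnodes( '/File/Record' );

# ('John Doe', 'Jane Doe')
my @names = get_related( $xp, $record, 'PersonInvolved', 'Name' );

# (['John Doe', 'Husband'], ['Jane Doe', 'Wife'])
my @pairs = get_related( $xp, $record, 'PersonInvolved', 'Name', 'Role' );

# Age does not exist, so the second value in each pair is undef.
my @partial = get_related( $xp, $record, 'PersonInvolved', 'Name', 'Age' );
```

Evaluating each subnode path relative to its own top node is what keeps a name and its role together, instead of producing two unrelated lists.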

next_record

This method selects the next record. It takes the next node from "node_set" and stores it in "current".

"configure" builds the node set when it opens the file. next_record iterates over this in-memory list.

configure

configure opens the XML file and extracts the node set. "next_record" then iterates over the node set.
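
The division of work between configure and "next_record" can be sketched with plain XML::XPath calls. This is a minimal illustration only: the sample data, the /File/Record path, and the variable names are hypothetical, and the module's real attributes may store the node set differently.

```perl
use strict;
use warnings;
use XML::XPath;

# Hypothetical sample file contents.
my $xp = XML::XPath->new( xml => <<'XML' );
<File>
  <Record><Name>John Doe</Name></Record>
  <Record><Name>Jane Doe</Name></Record>
</File>
XML

# "configure" step: evaluate the root path once and keep the matches in memory.
my @node_set = $xp->findnodes( '/File/Record' );

# "next_record" step: take one node at a time until the list is empty.
my @seen;
while (defined( my $current = shift @node_set )) {
    # Field paths are resolved relative to the current record node.
    my ($name) = $xp->findnodes( 'Name', $current );
    push @seen, $name->string_value;
}
```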

finish

finish does nothing, but "process" in ETL::Pipeline requires it.

Other Methods & Attributes

attribute

The attribute method returns the value of an attribute on the root node. For example, deleted records may have an attribute like ACTION="DELETE". "skip_if" in ETL::Pipeline::Input can use attribute and bypass these records.

# Bypass any record marked for deletion.
$etl->input( 'Xml',
    skip_if  => sub { $_->input->attribute( 'ACTION' ) eq 'DELETE' },
    matching => 'Data.xml',
    root     => '/File'
);
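
For comparison, here is the same kind of filter written against XML::XPath directly. It is only a sketch: the sample data and the /File/Record path are hypothetical, and it uses getAttribute from XML::XPath's element nodes rather than this module's attribute method.

```perl
use strict;
use warnings;
use XML::XPath;

# Hypothetical sample file: the first record is flagged for deletion.
my $xp = XML::XPath->new( xml => <<'XML' );
<File>
  <Record ACTION="DELETE"><Name>John Doe</Name></Record>
  <Record><Name>Jane Doe</Name></Record>
</File>
XML

my @kept;
foreach my $record ($xp->findnodes( '/File/Record' )) {
    # Treat a missing ACTION attribute as an empty string.
    my $action = $record->getAttribute( 'ACTION' ) // '';
    next if $action eq 'DELETE';

    my ($name) = $xp->findnodes( 'Name', $record );
    push @kept, $name->string_value;
}
```

Defaulting the attribute to an empty string avoids an "uninitialized value" warning on records that carry no ACTION attribute at all.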

current

The current attribute holds the currently selected node (record). "next_record" automatically sets current.

node_set

The node_set attribute holds the node set of records. It is the list of records in this file. "configure" automatically sets node_set.

xpath

The xpath attribute holds the current XML::XPath object. It is automatically set by the "next_record" method.

SEE ALSO

ETL::Pipeline, ETL::Pipeline::Input, ETL::Pipeline::Input::File, XML::XPath

AUTHOR

Robert Wohlfarth <robert.j.wohlfarth@vanderbilt.edu>

LICENSE

Copyright 2016 (c) Vanderbilt University Medical Center

This program is free software; you can redistribute it and/or modify it under the same terms as Perl itself.