NAME

ETL::Pipeline::Input::XmlFiles - Records in individual XML files

SYNOPSIS

use ETL::Pipeline;
ETL::Pipeline->new( {
  input   => ['XmlFiles', from => 'Documents'],
  mapping => {Name => '/Root/Name', Address => '/Root/Address'},
  output  => ['UnitTest']
} )->process;

DESCRIPTION

ETL::Pipeline::Input::XmlFiles defines an input source that reads multiple XML files from a directory. Each XML file contains exactly one record. Fields are accessed with the full XML path.

METHODS & ATTRIBUTES

Arguments for "input" in ETL::Pipeline

from

from tells ETL::Pipeline::Input::XmlFiles where to find the data files. By default, ETL::Pipeline::Input::XmlFiles looks in "data_in" in ETL::Pipeline. Set from to search a different directory instead.

If from is a regular expression, the code finds the first directory whose name matches. If from is a relative path, it is expected to reside under "data_in" in ETL::Pipeline. An absolute path is exact.
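The three forms of from described above can be sketched as follows. This is only an illustration: the folder names are hypothetical, and the %from_examples hash exists just to hold the alternatives side by side.

```perl
use strict;
use warnings;

# Hypothetical folder names, one entry per form of "from".
my %from_examples = (
    # Regular expression: the first directory whose name matches.
    regex    => ['XmlFiles', from => qr/^Documents/i],

    # Relative path: expected to reside under "data_in".
    relative => ['XmlFiles', from => 'Documents/2016'],

    # Absolute path: used exactly as written.
    absolute => ['XmlFiles', from => '/data/incoming/Documents'],
);

# Any one of these array references can serve as the "input" value,
# in place of ['XmlFiles', from => 'Documents'] from the SYNOPSIS.
printf "%-8s => %s\n", $_, $from_examples{$_}[2] foreach sort keys %from_examples;
```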

...

ETL::Pipeline::Input::XmlFiles accepts any of the tests provided by Path::Iterator::Rule. The value of the argument is passed directly into the test. For boolean tests that take no argument (e.g. readable, exists), pass an undef value.

ETL::Pipeline::Input::XmlFiles automatically applies the file and iname filters. Do not pass file through "input" in ETL::Pipeline. You may pass in name or iname to override the default filter of *.xml.
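As a sketch of how Path::Iterator::Rule tests might be passed through, the argument list below combines an iname override, a boolean test, and a depth limit. The folder name and file pattern are hypothetical. readable and max_depth are tests from Path::Iterator::Rule, and the assumption here is that they are forwarded unchanged, as the text above describes.

```perl
use strict;
use warnings;

# Hypothetical "input" arguments.
my @input = (
    'XmlFiles',
    from      => 'Documents',
    iname     => '*.record.xml',  # Overrides the default *.xml filter.
    readable  => undef,           # Boolean test, so the value is undef.
    max_depth => 1,               # Only files directly inside the folder.
);

# Use it as: input => \@input
my (undef, %arguments) = @input;
print join( ', ', sort keys %arguments ), "\n";
```

Note that file does not appear in the list, because the input source applies it on its own.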

Called from "process" in ETL::Pipeline

get

get returns a list of values from matching nodes. The field name is an XPath. See http://www.w3schools.com/xpath/xpath_functions.asp for more information on XPaths.

XML lends itself to recursive records. What happens when you need two fields under the same subnode? For example, a person involved can have both a name and a role. The names and roles go together. How do you get them together?

get supports subnodes as additional parameters. Pass the top node as the first parameter. Pass the subnode names in subsequent parameters. The values are returned in the same order as the parameters. get returns undef for any nonexistent subnodes.

Here are some examples...

# Return a single value from a single field.
$etl->get( '/Root/Name' );
# Returns: 'John Doe'

# Return a list from multiple fields with the same name.
$etl->get( '/Root/PersonInvolved/Name' );
# Returns: ('John Doe', 'Jane Doe')

# Return a list from subnodes.
$etl->get( '/Root/PersonInvolved', 'Name' );
# Returns: ('John Doe', 'Jane Doe')

# Return a list of related fields from subnodes.
$etl->get( '/Root/PersonInvolved', 'Name', 'Role' );
# Returns: (['John Doe', 'Husband'], ['Jane Doe', 'Wife'])

In the "mapping" in ETL::Pipeline, those examples look like this...

{Name => '/Root/Name'}
{Name => '/Root/PersonInvolved/Name'}
{Name => ['/Root/PersonInvolved', 'Name']}
{Name => ['/Root/PersonInvolved', 'Name', 'Role']}
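To make the subnode lookup concrete, here is a standalone sketch that collects related fields with XML::XPath directly. It is not the module's implementation, only an approximation of the behavior described for get. The sample XML and the variable names are made up for illustration.

```perl
use strict;
use warnings;
use XML::XPath;

# Made-up record matching the examples above.
my $xml = <<'XML';
<Root>
  <PersonInvolved><Name>John Doe</Name><Role>Husband</Role></PersonInvolved>
  <PersonInvolved><Name>Jane Doe</Name><Role>Wife</Role></PersonInvolved>
</Root>
XML

my $xpath = XML::XPath->new( xml => $xml );

# One row per top node, one column per subnode name.
my @rows;
foreach my $node ($xpath->findnodes( '/Root/PersonInvolved' )) {
    my @values;
    foreach my $subnode (qw/Name Role/) {
        # Search relative to the current top node; undef if it is missing.
        my ($match) = $xpath->findnodes( $subnode, $node );
        push @values, defined( $match ) ? $match->string_value : undef;
    }
    push @rows, \@values;
}

print join( ' / ', map { $_ // '(undef)' } @$_ ), "\n" foreach @rows;
```

Each row keeps a name together with its role, which is the point of passing subnode names as separate parameters rather than querying each path on its own.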

next_record

This method parses the next file in the folder.

ETL::Pipeline::Input::XmlFiles builds a list of file names when it first starts. next_record iterates over this in-memory list, so it will not parse any new files saved into the folder after that point.

configure

configure does nothing. It exists only because "process" in ETL::Pipeline requires it.

finish

finish does nothing. It exists only because "process" in ETL::Pipeline requires it.

Other Methods & Attributes

exists

The exists method tells you whether the given path exists or not. It returns a boolean value. True means that the given node exists in this XML file. False means that it does not.

exists accepts an XPath string as the only parameter. You can learn more about XPath here: http://www.w3schools.com/xpath/xpath_functions.asp.
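The same kind of check can be tried outside the pipeline with XML::XPath, which has an exists method of its own. This is a minimal sketch on a made-up document. Whether the module's exists simply delegates to XML::XPath is an assumption, not something stated here.

```perl
use strict;
use warnings;
use XML::XPath;

# Made-up record with a Name node but no Address node.
my $xpath = XML::XPath->new( xml => '<Root><Name>John Doe</Name></Root>' );

my $has_name    = $xpath->exists( '/Root/Name'    ) ? 1 : 0;
my $has_address = $xpath->exists( '/Root/Address' ) ? 1 : 0;

print "Name: $has_name, Address: $has_address\n";
```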

file

The file attribute holds a Path::Class::File object for the current XML file. You can use it for accessing the file name or directory.

file is automatically set by "next_record".
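For reference, a Path::Class::File object exposes the file name and directory like this. The path below is hypothetical and built by hand. Inside a pipeline the object would come from the file attribute instead.

```perl
use strict;
use warnings;
use Path::Class qw/file/;

# Hypothetical path standing in for the current XML file.
my $file = file( 'Documents', 'record1.xml' );

my $name   = $file->basename;  # File name without the directory.
my $folder = $file->dir;       # Path::Class::Dir object for the folder.

print "File $name is in folder $folder\n";
```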

iterator

Path::Class::Rule creates an iterator that returns each file in turn. iterator holds it for "next_record".

xpath

The xpath attribute holds the current XML::XPath object. It is automatically set by the "next_record" method.

SEE ALSO

ETL::Pipeline, ETL::Pipeline::Input, ETL::Pipeline::Input::XML, Path::Class::File, Path::Class::Rule, Path::Iterator::Rule, XML::XPath

AUTHOR

Robert Wohlfarth <robert.j.wohlfarth@vanderbilt.edu>

LICENSE

Copyright 2016 (c) Vanderbilt University Medical Center

This program is free software; you can redistribute it and/or modify it under the same terms as Perl itself.