NAME
ETL::Pipeline::Input::File::List - Role for input sources with multiple files
SYNOPSIS
# In the input source...
use Moose;
with 'ETL::Pipeline::Input';
with 'ETL::Pipeline::Input::File::List';
...
sub run {
    my ($self, $etl) = @_;
    ...
    while (my $path = $self->next_path( $etl )) {
        ...
    }
}
DESCRIPTION
This is a role used by input sources. It defines everything you need to process multiple input files of the same format. The role uses Path::Class::Rule to locate matching files.
Your input source calls the "next_path" method in a loop. That's it. The role automatically processes constructor arguments that match Path::Class::Rule criteria. It then builds a list of matching files the first time your code calls "next_path".
METHODS & ATTRIBUTES
Arguments for "input" in ETL::Pipeline
ETL::Pipeline::Input::File::List accepts any of the tests provided by Path::Iterator::Rule. The value of the argument is passed directly into the test. For boolean tests (e.g. readable, exists), pass an undef value.
ETL::Pipeline::Input::File automatically applies the file filter. Do not pass file through "input" in ETL::Pipeline.
iname is the test that I use most often. It matches the file name, supports wildcards and regular expressions, and is case insensitive.
# Search using a regular expression...
$etl->input( 'XmlFiles', iname => qr/\.xml$/ );
# Search using a file glob...
$etl->input( 'XmlFiles', iname => '*.xml' );
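The argument handling described above can be pictured with a short sketch. This is not necessarily how the role implements it. It only shows how a hash of test names and values could be applied to a Path::Class::Rule object, with undef meaning "call the test with no arguments". The helper name build_rule is hypothetical, and the sketch assumes Path::Class::Rule is installed from CPAN.

```perl
use strict;
use warnings;
use Path::Class::Rule;

# Hypothetical helper (not part of ETL::Pipeline): turn a list of
# test => value pairs into a rule object.
sub build_rule {
    my (%criteria) = @_;

    # Start with the "file" filter, mirroring what the role applies for you.
    my $rule = Path::Class::Rule->new->file;

    for my $test (sort keys %criteria) {
        my $value = $criteria{$test};
        if (defined $value) {
            # Tests such as iname take the value as their argument.
            $rule->$test( $value );
        } else {
            # Boolean tests such as readable take no argument.
            $rule->$test;
        }
    }
    return $rule;
}

# Readable XML files, matched case-insensitively, under the current directory.
my $rule = build_rule( iname => '*.xml', readable => undef );
print "$_\n" for $rule->all( '.' );
```

Under the same assumptions, the equivalent criteria passed through the pipeline would be iname => '*.xml', readable => undef after the input source name.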
path
Path::Class::File object for the currently selected file. This is the first file that matches the criteria. When you call "next_path", it finds the next match and sets path.
So path always points to the current file. It should be used by your input source class as the file name.
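# Inside the input source class...
while ($self->next_path( $etl )) {
    # Stop with a useful message if the file cannot be opened.
    open( my $io, '<', $self->path ) or die 'Unable to open ' . $self->path . ": $!";
    ...
}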
undef means no more matches.
Methods
next_path
Looks for the next match in the list and sets the "path" attribute. It also returns the matching path. Your input source class should set up a loop that calls this method and processes one file per iteration.
next_path takes one parameter: the ETL::Pipeline object. The method matches files in "data_in" in ETL::Pipeline.
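The contract described above (build the list on the first call, set "path", return the match, return undef when the list is exhausted) can be illustrated with a small stand-alone sketch. The FileListSketch class is hypothetical and uses only core Perl. It filters a fixed list of names instead of searching "data_in", and its next_path takes no ETL::Pipeline argument, so treat it as a model of the behavior rather than the role's actual code.

```perl
use strict;
use warnings;

package FileListSketch;

# Hypothetical stand-in for the role: holds candidate names instead of
# searching a directory.
sub new {
    my ($class, @candidates) = @_;
    return bless {
        candidates => [@candidates],
        matches    => undef,    # built lazily by next_path
        path       => undef,    # the currently selected file
    }, $class;
}

sub path { return $_[0]{path} }

sub next_path {
    my ($self) = @_;

    # Build the list of matches on the first call only.
    $self->{matches} //= [ grep { /\.xml$/i } @{ $self->{candidates} } ];

    # shift returns undef on an empty list, which signals "no more matches".
    $self->{path} = shift @{ $self->{matches} };
    return $self->{path};
}

package main;

my $list = FileListSketch->new( 'a.xml', 'notes.txt', 'B.XML' );
while (my $path = $list->next_path) {
    print "Processing $path\n";
}
```

Once the loop ends, both the return value of next_path and the path accessor are undef, which is the signal an input source can use to stop reading.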
SEE ALSO
ETL::Pipeline, ETL::Pipeline::Input, Path::Class::File, Path::Class::Rule, Path::Iterator::Rule
AUTHOR
Robert Wohlfarth <robert.j.wohlfarth@vumc.org>
LICENSE
Copyright 2021 (c) Vanderbilt University Medical Center
This program is free software; you can redistribute it and/or modify it under the same terms as Perl itself.