NAME

ETL::Pipeline::Input::File::List - Role for input sources with multiple files

SYNOPSIS

  # In the input source...
  use Moose;
  with 'ETL::Pipeline::Input';
  with 'ETL::Pipeline::Input::File::List';
  ...
  sub run {
    my ($self, $etl) = @_;
    ...
    while (my $path = $self->next_path( $etl )) {
      ...
    }
  }

DESCRIPTION

This is a role used by input sources. It defines everything you need to process multiple input files of the same format. The role uses Path::Class::Rule to locate matching files.

Your input source calls the "next_path" method in a loop. That's it. The role automatically processes constructor arguments that match Path::Class::Rule criteria. It then builds a list of matching files the first time your code calls "next_path".

METHODS & ATTRIBUTES

Arguments for "input" in ETL::Pipeline

ETL::Pipeline::Input::File::List accepts any of the tests provided by Path::Iterator::Rule. The value of the argument is passed directly into the test. For boolean tests (e.g. readable, exists, etc.), pass an undef value.

ETL::Pipeline::Input::File::List automatically applies the file filter. Do not pass file through "input" in ETL::Pipeline.

iname is the test that I use most often. It matches against the file name, accepts either a wildcard pattern or a regular expression, and is case insensitive.

  # Search using a regular expression...
  $etl->input( 'XmlFiles', iname => qr/\.xml$/ );

  # Search using a file glob...
  $etl->input( 'XmlFiles', iname => '*.xml' );

path

Path::Class::File object for the currently selected file. This is the first file that matches the criteria. When you call "next_path", it finds the next match and sets path.

So path always points to the current file. Your input source class should use it as the file name.

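  # Inside the input source class...
  while ($self->next_path( $etl )) {
    # Stop with a clear message if the file cannot be opened.
    open my $io, '<', $self->path
      or die 'Unable to open ' . $self->path . ": $!";
    ...
  }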

An undef value means that there are no more matches.

Methods

next_path

Looks for the next match in the list and sets the "path" attribute. It also returns the matching path. Your input source class should set up a loop that calls this method and processes one file per iteration.

next_path takes one parameter: the ETL::Pipeline object. The method matches files in "data_in" in ETL::Pipeline.
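
The behavior described above (build the list on the first call, set "path", return undef when nothing is left) can be illustrated with a small stand-alone sketch. This is not the role's actual implementation. It assumes a hypothetical Sketch::FileList class, uses the core File::Find module in place of Path::Class::Rule, takes a root directory in place of the ETL::Pipeline object and its "data_in", and returns plain strings rather than Path::Class::File objects.

```perl
use strict;
use warnings;

# Hypothetical stand-in for an input source. NOT the role's real code.
package Sketch::FileList;

use File::Find ();

sub new {
    my ($class, %args) = @_;
    return bless {
        root    => $args{root},     # plays the part of "data_in"
        pattern => $args{pattern},  # plays the part of an "iname" regex
        matches => undef,           # filled in lazily by next_path
        path    => undef,           # currently selected file
    }, $class;
}

# Accessor for the currently selected file.
sub path { return $_[0]{path} }

sub next_path {
    my ($self) = @_;

    # Build the list of matching files on the first call only.
    unless (defined $self->{matches}) {
        my @found;
        File::Find::find(
            {
                no_chdir => 1,
                wanted   => sub {
                    my $name = $File::Find::name;
                    push @found, $name
                        if -f $name && $name =~ $self->{pattern};
                },
            },
            $self->{root},
        );
        $self->{matches} = [ sort @found ];
    }

    # shift returns undef once the list is empty, which ends the loop.
    $self->{path} = shift @{ $self->{matches} };
    return $self->{path};
}

package main;

use File::Temp ();

# Create a scratch directory with two XML files and one other file.
my $dir = File::Temp::tempdir( CLEANUP => 1 );
for my $name (qw(a.xml b.XML notes.txt)) {
    open my $fh, '>', "$dir/$name" or die "Cannot create $dir/$name: $!";
    close $fh;
}

my $source = Sketch::FileList->new( root => $dir, pattern => qr/\.xml$/i );

my @seen;
while (defined( my $path = $source->next_path )) {
    # A real input source would open and parse $source->path here.
    push @seen, $path;
}
print scalar(@seen), " XML file(s) processed\n";
```

The sketch consumes the list with shift, so each file is visited once and "path" ends up as undef after the loop, which mirrors the "no more matches" signal documented for the "path" attribute.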

SEE ALSO

ETL::Pipeline, ETL::Pipeline::Input, Path::Class::File, Path::Class::Rule, Path::Iterator::Rule

AUTHOR

Robert Wohlfarth <robert.j.wohlfarth@vumc.org>

LICENSE

Copyright 2021 (c) Vanderbilt University Medical Center

This program is free software; you can redistribute it and/or modify it under the same terms as Perl itself.