NAME

ETL::Pipeline::Input::FileListing - Input source for a disk folder

SYNOPSIS

use ETL::Pipeline;
ETL::Pipeline->new( {
  input   => ['FileListing', from => 'Documents', name => qr/\.jpg$/i],
  mapping => {FileName => 'File', FullPath => 'Path'},
  output  => ['UnitTest']
} )->process;

DESCRIPTION

ETL::Pipeline::Input::FileListing defines an input source that reads a disk directory. It returns information about each individual file. Use this input source when you need information about the files and not their content.

METHODS & ATTRIBUTES

Arguments for "input" in ETL::Pipeline

from

from tells ETL::Pipeline::Input::FileListing where to find the files. By default, ETL::Pipeline::Input::FileListing looks in "data_in" in ETL::Pipeline. Set from to look somewhere else.

If from is a regular expression, the code finds the first directory whose name matches. If from is a relative path, it is expected to reside under "data_in" in ETL::Pipeline. An absolute path is used exactly as given.
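The three cases above can be sketched with core modules only. This is a minimal illustration, not the module's own code: resolve_from is a hypothetical helper, and the list of candidate directory names is passed in by hand, as an assumption about what was found under "data_in".

```perl
use strict;
use warnings;
use File::Spec;

# Hypothetical helper (not part of ETL::Pipeline). $candidates stands in
# for the directory names that exist under $data_in.
sub resolve_from {
    my ($data_in, $from, $candidates) = @_;

    # No "from" at all: fall back to data_in.
    return $data_in unless defined $from;

    # Regular expression: take the first directory whose name matches.
    if (ref( $from ) eq 'Regexp') {
        my ($first) = grep { $_ =~ $from } @$candidates;
        return defined( $first )
            ? File::Spec->catdir( $data_in, $first )
            : undef;
    }

    # Absolute path: use it as given.
    return $from if File::Spec->file_name_is_absolute( $from );

    # Relative path: it lives under data_in.
    return File::Spec->catdir( $data_in, $from );
}

my $by_default  = resolve_from( '/data', undef, [] );
my $by_relative = resolve_from( '/data', 'Documents', [] );
my $by_absolute = resolve_from( '/data', '/srv/files', [] );
my $by_regex    = resolve_from( '/data', qr/^Doc/, ['Archive', 'Documents'] );
```

The paths used here are made up for the example and assume a Unix-like system.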

...

ETL::Pipeline::Input::FileListing accepts any of the tests provided by Path::Iterator::Rule. The value of the argument is passed directly into the test. For boolean tests (e.g. readable, exists), pass an undef value.

ETL::Pipeline::Input::FileListing automatically applies the file filter. Do not pass file through "input" in ETL::Pipeline.

name is the most commonly used argument. It accepts a glob or regular expression to match file names.
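As a rough illustration of how tests combine, the list below mixes a test that takes a value with a boolean test. The particular tests chosen (size, readable) and the folder name are assumptions for the example; size is a Path::Iterator::Rule test that accepts a comparison string.

```perl
use strict;
use warnings;

# Arguments as they might be handed to "input". Apart from "from", each key
# after 'FileListing' is the name of a Path::Iterator::Rule test.
my @input = (
    'FileListing',
    from     => 'Documents',   # relative, so looked up under data_in
    name     => qr/\.pdf$/i,   # regular expression on the file name
    size     => '>10K',        # test that takes a value
    readable => undef,         # boolean test, so the value is undef
);
```

The list would then be passed as input => \@input to ETL::Pipeline->new, in place of the literal array reference shown in the SYNOPSIS.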

Called from "process" in ETL::Pipeline

get

get retrieves one field about the currently selected file. get can also call a method on the Path::Class::File object and return the result. Any additional arguments for get are passed directly into the method.

# ETL::Pipeline::Input::FileListing fields.
$etl->get( 'Inside' );
$etl->get( 'File' );

# Path::Class::File methods.
$etl->get( 'basename' );
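One way to picture the two lookups is a small dispatcher that tries the named fields first and then falls back to a method call. This is only a sketch under assumptions: get_value and My::FileObject are hypothetical stand-ins, and the order in which the real module checks fields and methods is not stated here.

```perl
use strict;
use warnings;

# Hypothetical stand-in for the file object. The real one is a
# Path::Class::File.
package My::FileObject;
sub new      { my ($class, $name) = @_; return bless { name => $name }, $class }
sub basename { return $_[0]{name} }
sub prefixed { my ($self, $prefix) = @_; return $prefix . $self->{name} }

package main;

# Hypothetical dispatcher: named fields first, then methods on the object.
sub get_value {
    my ($record, $name, @arguments) = @_;

    # A field of the current record wins if it exists.
    return $record->{$name} if exists $record->{$name};

    # Otherwise call the method, forwarding any extra arguments.
    my $object = $record->{Object};
    return $object->$name( @arguments ) if $object->can( $name );

    return undef;
}

my $record = {
    File   => 'photo.jpg',
    Object => My::FileObject->new( 'photo.jpg' ),
};
my $field  = get_value( $record, 'File' );
my $method = get_value( $record, 'prefixed', 'copy_of_' );
```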

ETL::Pipeline::Input::FileListing provides these fields...

Extension

The file extension, without a leading period.

File

The file name with the extension. No directory information.

Folder

The full directory where this file resides.

Inside

The relative directory name where this file resides: the directories below "from" that lead to the file. You can use this to re-create the directory structure.

Path

The complete path name of the file (directory, name, and extension). You can use this to access the file contents.

Relative

The relative path name of the file. This is the part that comes after the "from" directory.

Object

The Path::Class::File object for this entry.
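To make the relationship between these fields concrete, here is a sketch that derives them from a "from" directory and a full path using core modules. It is an approximation, not the module's code: listing_fields is a hypothetical helper, the sample paths are invented, and the exact strings the real module returns (for example, for a file sitting directly in "from") may differ.

```perl
use strict;
use warnings;
use File::Spec;

# Hypothetical helper: build the documented fields from two path strings.
sub listing_fields {
    my ($from, $path) = @_;

    # Split the full path into its directory and file name parts.
    my ($volume, $directories, $file) = File::Spec->splitpath( $path );
    my $folder = File::Spec->canonpath(
        File::Spec->catpath( $volume, $directories, '' ) );

    # Everything after the last period, without the period itself.
    my ($extension) = $file =~ /\.([^.]+)$/;

    return (
        Extension => $extension,
        File      => $file,
        Folder    => $folder,
        Inside    => File::Spec->abs2rel( $folder, $from ),
        Path      => $path,
        Relative  => File::Spec->abs2rel( $path, $from ),
    );
}

my %fields = listing_fields(
    '/data/Documents',
    '/data/Documents/2016/trip/photo.JPG',
);
print "$_ = $fields{$_}\n" for sort keys %fields;
```

Joining Folder and File gives back Path, and joining Inside and File gives back Relative, which is what makes Inside useful for rebuilding the directory tree elsewhere.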

next_record

next_record advances to the next file in the listing and makes it the current record. It returns a boolean. True means success. False means it reached the end of the listing (no more files).

while ($input->next_record) {
  ...
}

configure

configure does nothing, but "process" in ETL::Pipeline requires it.

finish

finish does nothing, but "process" in ETL::Pipeline requires it.

Other Methods & Attributes

current

current holds the current record as a hash reference.

iterator

Path::Class::Rule creates an iterator that returns each file in turn. iterator holds it for "next_record".
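The interplay of iterator, current, and next_record can be sketched with a closure standing in for the real iterator. My::Listing below is a hypothetical class written for this example; it only mimics the behavior described above and stores a single File field per record.

```perl
use strict;
use warnings;

# Hypothetical stand-in class, not the real input source.
package My::Listing;

sub new {
    my ($class, @files) = @_;
    my @queue = @files;
    return bless {
        iterator => sub { shift @queue },   # returns undef when exhausted
        current  => undef,                  # hash reference for the record
    }, $class;
}

# Pull the next file from the iterator and make it the current record.
sub next_record {
    my ($self) = @_;
    my $file = $self->{iterator}->();
    return 0 unless defined $file;
    $self->{current} = { File => $file };
    return 1;
}

# Read one field from the current record.
sub get {
    my ($self, $field) = @_;
    return $self->{current}{$field};
}

package main;

my $input = My::Listing->new( 'a.jpg', 'b.jpg' );
my @seen;
while ($input->next_record) {
    push @seen, $input->get( 'File' );
}
print join( ', ', @seen ), "\n";
```

In the real pipeline, get is called on the ETL::Pipeline object rather than on the input source directly; the sketch skips that layer to keep the loop visible.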

SEE ALSO

ETL::Pipeline, ETL::Pipeline::Input, Path::Class::File, Path::Class::Rule, Path::Iterator::Rule

AUTHOR

Robert Wohlfarth <robert.j.wohlfarth@vanderbilt.edu>

LICENSE

Copyright 2016 (c) Vanderbilt University Medical Center

This program is free software; you can redistribute it and/or modify it under the same terms as Perl itself.