NAME
ETL::Pipeline::Input::FileListing - Input source of a disk folder
SYNOPSIS
use ETL::Pipeline;
ETL::Pipeline->new( {
  input   => ['FileListing', from => 'Documents', name => qr/\.jpg$/i],
  mapping => {FileName => 'File', FullPath => 'Path'},
  output  => ['UnitTest']
} )->process;
DESCRIPTION
ETL::Pipeline::Input::FileListing defines an input source that reads a disk directory. It returns information about each individual file. Use this input source when you need information about the files and not their content.
METHODS & ATTRIBUTES
Arguments for "input" in ETL::Pipeline
from
from tells ETL::Pipeline::Input::FileListing where to find the files. By default, it looks in "data_in" in ETL::Pipeline. Set from to look somewhere else.
If from is a regular expression, the code finds the first directory whose name matches. If from is a relative path, it is expected to reside under "data_in" in ETL::Pipeline. An absolute path is used exactly as given.
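The three forms of from can be sketched roughly as follows. This is an illustration only, built on core modules. resolve_from is a hypothetical helper, not part of ETL::Pipeline, and the sketch assumes that a regular expression is matched against directories directly under "data_in" in sorted order; the real lookup may differ.

```perl
use strict;
use warnings;
use File::Spec;

# Hypothetical helper that mimics the documented rules for "from".
# $data_in stands in for "data_in" in ETL::Pipeline.
sub resolve_from {
    my ($data_in, $from) = @_;

    # No "from": fall back to the default directory.
    return $data_in unless defined $from;

    # Regular expression: first directory under $data_in whose name matches.
    # (Sorted order is an assumption made for this sketch.)
    if (ref $from eq 'Regexp') {
        opendir( my $dh, $data_in ) or die "Cannot open $data_in: $!";
        my @names = sort grep { !/^\.\.?$/ } readdir $dh;
        closedir $dh;
        for my $name (@names) {
            my $candidate = File::Spec->catdir( $data_in, $name );
            return $candidate if -d $candidate && $name =~ $from;
        }
        return undef;
    }

    # Absolute paths are used exactly as given.
    return $from if File::Spec->file_name_is_absolute( $from );

    # Relative paths live under $data_in.
    return File::Spec->catdir( $data_in, $from );
}
```

For example, resolve_from( $data_in, 'Documents' ) yields the Documents directory under $data_in, while resolve_from( $data_in, qr/^doc/i ) searches $data_in for a matching directory name.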
...
ETL::Pipeline::Input::FileListing accepts any of the tests provided by Path::Iterator::Rule. The value of the argument is passed directly into the test. For boolean tests (e.g. readable, exists), pass undef as the value.
ETL::Pipeline::Input::FileListing automatically applies the file filter. Do not pass file through "input" in ETL::Pipeline.
name
name is the most commonly used argument. It accepts a glob or regular expression to match file names.
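A few argument lists, written as plain data, may make the rules above more concrete. Each array reference is meant as a value for the input argument of ETL::Pipeline->new, as in the SYNOPSIS. The directory name, the glob patterns, and the size threshold are made up for illustration; readable and size are tests from Path::Iterator::Rule.

```perl
use strict;
use warnings;

# name as a regular expression (taken from the SYNOPSIS).
my $by_regex = ['FileListing', from => 'Documents', name => qr/\.jpg$/i];

# name as a glob.
my $by_glob = ['FileListing', from => 'Documents', name => '*.jpg'];

# A boolean test (readable) takes undef. A test with a parameter (size)
# receives its value unchanged. Note that "file" is not passed; the
# input source applies that filter on its own.
my $readable_only = [
    'FileListing',
    readable => undef,
    size     => '>=1024',
    name     => '*.csv',
];
```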
Called from "process" in ETL::Pipeline
get
get retrieves one field about the currently selected file. get can also call a method on the Path::Class::File object and return its result. Any additional arguments for get are passed directly to that method.
# ETL::Pipeline::Input::FileListing fields.
$etl->get( 'Inside' );
$etl->get( 'File' );
# Path::Class::File methods.
$etl->get( 'basename' );
ETL::Pipeline::Input::FileListing provides these fields:
- Extension
The file extension, without a leading period.
- File
The file name with the extension. No directory information.
- Folder
The full directory where this file resides.
- Inside
The directory where this file resides, relative to "from". You can use this to re-create the directory structure.
- Path
The complete path name of the file (directory, name, and extension). You can use this to access the file contents.
- Relative
The relative path name of the file. This is the part that comes after the "from" directory.
- Object
The Path::Class::File object for this entry.
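To show how these fields relate to one another, here is a rough sketch that derives similar values for a single file using core modules. describe_file and the example paths are hypothetical, and this is not how ETL::Pipeline::Input::FileListing computes the fields. In particular, the value of Inside for a file sitting directly in "from" may differ from the real module. The Object field is left out.

```perl
use strict;
use warnings;
use File::Basename qw(fileparse);
use File::Spec;

# Hypothetical helper: derives values analogous to the documented fields
# from the "from" directory and one file's complete path.
sub describe_file {
    my ($from, $path) = @_;

    my ($file, $folder) = fileparse( $path );
    $folder = File::Spec->canonpath( $folder );    # drop the trailing separator

    # Text after the last period, or an empty string if there is none.
    my ($extension) = $file =~ /\.([^.]+)$/;

    return {
        Extension => $extension // '',
        File      => $file,
        Folder    => $folder,
        Inside    => File::Spec->abs2rel( $folder, $from ),
        Path      => $path,
        Relative  => File::Spec->abs2rel( $path, $from ),
    };
}

my $fields = describe_file( '/data/Documents', '/data/Documents/2016/May/photo.jpg' );
printf "%-9s %s\n", $_, $fields->{$_} for sort keys %$fields;
```

On a Unix-like system, Inside comes out as 2016/May and Relative as 2016/May/photo.jpg for this example.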
next_record
next_record advances to the next file in the listing and makes it the current record. It returns a boolean: true means success, and false means it reached the end of the listing (no more files).
while ($input->next_record) {
  ...
}
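The loop contract can be illustrated with a small stand-in object. Sketch::Listing is a hypothetical class written for this example. It only imitates the documented behavior of next_record and current with a fixed list of names, where the real input source reads from its iterator.

```perl
use strict;
use warnings;

# Hypothetical stand-in for the input source; not the real class.
package Sketch::Listing {
    sub new {
        my ($class, @files) = @_;
        return bless { pending => [@files], current => undef }, $class;
    }

    # Advance to the next file. True on success, false when none remain.
    sub next_record {
        my ($self) = @_;
        return 0 unless @{ $self->{pending} };
        $self->{current} = { File => shift @{ $self->{pending} } };
        return 1;
    }

    # The record selected by the last successful next_record call.
    sub current { return $_[0]{current} }
}

my $input = Sketch::Listing->new( 'a.jpg', 'b.jpg' );
my @seen;
while ($input->next_record) {
    push @seen, $input->current->{File};
}
print "@seen\n";
```

The loop ends on the first false return, so the body only ever sees a valid current record.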
configure
configure does nothing, but "process" in ETL::Pipeline requires it.
finish
finish does nothing, but "process" in ETL::Pipeline requires it.
Other Methods & Attributes
current
current holds the current record as a hash reference.
iterator
Path::Class::Rule creates an iterator that returns each file in turn. iterator holds it for "next_record".
SEE ALSO
ETL::Pipeline, ETL::Pipeline::Input, Path::Class::File, Path::Class::Rule, Path::Iterator::Rule
AUTHOR
Robert Wohlfarth <robert.j.wohlfarth@vanderbilt.edu>
LICENSE
Copyright 2016 (c) Vanderbilt University Medical Center
This program is free software; you can redistribute it and/or modify it under the same terms as Perl itself.