NAME

ETL::Pipeline::Input::File - Role for file-based input sources

SYNOPSIS

# In the input source...
use Moose;
with 'ETL::Pipeline::Input::File';
...

# In the ETL::Pipeline script...
ETL::Pipeline->new( {
  work_in   => {search => 'C:\Data', find => qr/Ficticious/},
  input     => ['Excel', matching => qr/\.xlsx?$/          ],
  mapping   => {Name => 'A', Address => 'B', ID => 'C'     },
  constants => {Type => 1, Information => 'Demographic'    },
  output    => ['SQL', table => 'NewData'                  ],
} )->process;

# Or with a specific file...
ETL::Pipeline->new( {
  work_in   => {search => 'C:\Data', find => qr/Ficticious/},
  input     => ['Excel', file => 'ExportedData.xlsx'       ],
  mapping   => {Name => 'A', Address => 'B', ID => 'C'     },
  constants => {Type => 1, Information => 'Demographic'    },
  output    => ['SQL', table => 'NewData'                  ],
} )->process;

DESCRIPTION

ETL::Pipeline::Input::File provides methods and attributes common to file-based input sources. It makes file searches available for any file format. With ETL::Pipeline::Input::File, you can...

Specify the exact path to the file.
Or search the file system for a matching name.

For setting an exact path, see the "file" attribute. For searches, see the "matching" attribute.

File vs. DataFile

ETL::Pipeline::Input::DataFile extends ETL::Pipeline::Input::File. This role, ETL::Pipeline::Input::File, makes no assumptions about the file format. It works with CSV text files, MS Access databases, spreadsheets, XML, or any other format found on disk.

ETL::Pipeline::Input::DataFile assumes that each record is stored on one row and that the data is divided into fields (columns).

METHODS & ATTRIBUTES

Arguments for "input" in ETL::Pipeline

matching

matching locates the first file that matches the given pattern. The pattern can be a glob or regular expression. matching sets "file" to the first file that matches. Search patterns are case insensitive.

# Search using a regular expression...
$etl->input( 'Excel', matching => qr/\.xlsx$/i );

# Search using a file glob...
$etl->input( 'Excel', matching => '*.xlsx' );
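
The sketch below illustrates the "first match wins" idea using only core Perl. It is not the module's implementation. pattern_to_regex and first_match are hypothetical helpers, and the sketch matches against a plain list of names instead of searching the file system. A glob is translated into a case insensitive regular expression. A compiled regular expression is used unchanged, so the caller supplies the /i flag, as in the example above.

```perl
use strict;
use warnings;
use List::Util qw/first/;

# Hypothetical helper: accept a glob string or a compiled regular expression
# and return a regular expression.
sub pattern_to_regex {
    my ($pattern) = @_;

    # A qr// object carries its own flags, so use it as given.
    return $pattern if ref( $pattern ) eq 'Regexp';

    # Translate the glob: "*" matches any run of characters, "?" matches one
    # character, and everything else is taken literally.
    my $regex = join '', map {
          $_ eq '*' ? '.*'
        : $_ eq '?' ? '.'
        :             quotemeta( $_ )
    } split //, $pattern;
    return qr/^$regex$/i;
}

# Hypothetical helper: return the first name that matches, or undef.
sub first_match {
    my ($pattern, @names) = @_;
    my $regex = pattern_to_regex( $pattern );
    return first { $_ =~ $regex } @names;
}

my @names = ('notes.txt', 'Export.XLSX', 'other.xlsx');
print first_match( '*.xlsx'      , @names ), "\n";
print first_match( qr/\.xlsx?$/i, @names ), "\n";
```

Both calls stop at the first name in the list that matches, which is why the order of the candidates matters when more than one file fits the pattern.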

For unusual cases, matching also accepts a code reference. matching executes the subroutine against each file name and sets "file" to the first file where the subroutine returns a true value.

matching passes two parameters into the subroutine...

The ETL::Pipeline object
The Path::Class::File object

# File larger than 2K...
$etl->input( 'Excel', matching => sub {
  my ($etl, $file) = @_;
  # Path::Class::File reports the size through its stat method.
  return (!$file->is_dir && $file->stat->size > 2048 ? 1 : 0);
} );

matching searches inside the directory named by "data_in" in ETL::Pipeline.
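
As a rough illustration of a callback driven search, the following sketch scans one directory and returns the first entry that the subroutine accepts. It uses only core Perl. first_accepted is a hypothetical helper, the callback receives a plain path string where the module passes a Path::Class::File object, and undef stands in for the ETL::Pipeline object. Whether the module descends into subdirectories is not stated here, so the sketch looks at the top level only.

```perl
use strict;
use warnings;
use File::Spec;
use File::Temp qw/tempdir/;

# Hypothetical helper: return the first entry in $directory for which
# $callback->( $context, $path ) is true. Entries are visited in sorted order.
sub first_accepted {
    my ($context, $directory, $callback) = @_;

    opendir( my $handle, $directory ) or die "Cannot open $directory: $!";
    my @entries = sort grep { $_ ne '.' && $_ ne '..' } readdir $handle;
    closedir $handle;

    foreach my $entry (@entries) {
        my $path = File::Spec->catfile( $directory, $entry );
        return $path if $callback->( $context, $path );
    }
    return;
}

# Build a scratch directory holding one small file and one large file.
my $data_in = tempdir( CLEANUP => 1 );
foreach my $spec (['a_small.dat', 10], ['b_large.dat', 4096]) {
    my ($name, $bytes) = @$spec;
    my $path = File::Spec->catfile( $data_in, $name );
    open( my $out, '>', $path ) or die "Cannot write $path: $!";
    print {$out} 'x' x $bytes;
    close $out;
}

# Accept regular files larger than 2K. -s returns the size in bytes.
my $found = first_accepted( undef, $data_in, sub {
    my ($etl, $path) = @_;
    return (-f $path && (-s $path) > 2048) ? 1 : 0;
} );
print "$found\n";
```

The callback rejects a_small.dat and accepts b_large.dat, so the search ends there without testing any later entries.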

file

file holds a Path::Class::File object pointing to the input file. If "input" in ETL::Pipeline does not set file, then the "matching" attribute searches the file system for a match. If "input" in ETL::Pipeline sets file, then "matching" is ignored.

file is relative to "data_in" in ETL::Pipeline, unless you set it to an absolute path name. With "matching", the search is always limited to "data_in" in ETL::Pipeline.

# File inside of "data_in"...
$etl->input( 'Excel', file => 'Data.xlsx' );

# Absolute path name...
$etl->input( 'Excel', file => 'C:\Data.xlsx' );
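
The rule "relative to data_in unless absolute" can be sketched with the core File::Spec module. This is an illustration under stated assumptions, not the module's code. resolve_input_file is a hypothetical helper, and the directory names are made up for the example.

```perl
use strict;
use warnings;
use File::Spec;

# Hypothetical helper: anchor a relative name at $data_in and leave an
# absolute name untouched.
sub resolve_input_file {
    my ($data_in, $file) = @_;
    return $file if File::Spec->file_name_is_absolute( $file );
    return File::Spec->catfile( $data_in, $file );
}

# Example locations, built portably from the file system root.
my $data_in  = File::Spec->catdir ( File::Spec->rootdir, 'data', 'incoming' );
my $absolute = File::Spec->catfile( File::Spec->rootdir, 'elsewhere', 'Data.xlsx' );

print resolve_input_file( $data_in, 'Data.xlsx' ), "\n";
print resolve_input_file( $data_in, $absolute   ), "\n";
```

The first call yields a path inside the example data_in directory. The second call returns the absolute path unchanged.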

AUTHOR

Robert Wohlfarth <robert.j.wohlfarth@vanderbilt.edu>

LICENSE

This program is free software; you can redistribute it and/or modify it under the same terms as Perl itself.
