NAME
ETL::Pipeline::Input::DelimitedText - Input source for CSV, tab, or pipe delimited files
SYNOPSIS
  use ETL::Pipeline;
  ETL::Pipeline->new( {
    input   => ['DelimitedText', matching => qr/\.csv$/i],
    mapping => {First => 'Header1', Second => 'Header2'},
    output  => ['UnitTest']
  } )->process;
DESCRIPTION
ETL::Pipeline::Input::DelimitedText defines an input source for reading CSV (comma separated values), tab delimited, or pipe delimited files. It uses Text::CSV for parsing.
METHODS & ATTRIBUTES
Arguments for "input" in ETL::Pipeline
ETL::Pipeline::Input::DelimitedText implements ETL::Pipeline::Input::File and ETL::Pipeline::Input::TabularFile. It supports all of the attributes from these roles.
In addition, ETL::Pipeline::Input::DelimitedText makes available all of the options for Text::CSV. See Text::CSV for a list.
  # Pipe delimited, allowing embedded new lines.
  $etl->input( 'DelimitedText',
    matching => qr/\.dat$/i,
    sep_char => '|',
    binary   => 1
  );
Called from "process" in ETL::Pipeline
get
get retrieves one field from the current record. get accepts one parameter. That parameter can be an index number, a column name, or a regular expression to match against column names.
  $etl->get( 0 );
  $etl->get( 'First' );
  $etl->get( qr/\bfirst\b/i );
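The three kinds of parameter can be illustrated with a minimal sketch. The helper find_field below is hypothetical and is not part of ETL::Pipeline. It assumes that names are compared exactly and that a regular expression returns the first matching column; the real "get" may behave differently.

```perl
use strict;
use warnings;

# Hypothetical helper, NOT the module's implementation: look up a field by
# zero-based index, exact column name, or regular expression.
sub find_field {
    my ($record, $names, $key) = @_;

    # A compiled regular expression: return the first column whose name matches.
    if (ref( $key ) eq 'Regexp') {
        for my $i (0 .. $#$names) {
            return $record->[$i] if $names->[$i] =~ $key;
        }
        return undef;
    }

    # All digits: treat the key as a zero-based position.
    return $record->[$key] if $key =~ m/^\d+$/;

    # Anything else: treat the key as an exact column name (an assumption).
    for my $i (0 .. $#$names) {
        return $record->[$i] if $names->[$i] eq $key;
    }
    return undef;
}

my @names  = ('First', 'Second');
my @record = ('alpha', 'beta');

print find_field( \@record, \@names, 0 ), "\n";
print find_field( \@record, \@names, 'Second' ), "\n";
print find_field( \@record, \@names, qr/\bfirst\b/i ), "\n";
```

Checking for a regular expression first keeps the index and name cases simple, since a qr// object would otherwise be stringified before comparison.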
next_record
next_record reads one record from the file for processing. It returns a boolean: true means success, false means it reached the end of the file.
  while ($input->next_record) {
    ...
  }
get_column_names
get_column_names reads the field names from the first row in the file. "get" can match field names using regular expressions.
configure
configure opens the file for reading. It takes care of loading the column names, if your file has them.
finish
finish closes the file.
Other Methods & Attributes
record
ETL::Pipeline::Input::DelimitedText stores each record as a list of fields. Each field is identified by its position in the file, starting at zero. This attribute holds the current record.
fields
Returns a list of fields from the current record. It dereferences "record".
number_of_fields
This method returns the number of fields in the current record.
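The relationship between "record", "fields", and "number_of_fields" can be pictured with plain Perl data. This is only an illustration: the variable names are assumptions, and the array reference stands in for the attribute rather than calling the module.

```perl
use strict;
use warnings;

# Stand-in for the "record" attribute: an array reference, indexed from zero.
my $record = ['alpha', 'beta', 'gamma'];

# What "fields" is described as returning: the dereferenced list.
my @fields = @$record;

# What "number_of_fields" is described as returning: the size of that list.
my $count = scalar @fields;

print "$count fields; first is $record->[0]\n";
```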
csv
csv holds a Text::CSV object for reading the file. You can set options for Text::CSV in the "input" in ETL::Pipeline command.
handle
The Perl file handle for reading data. Text::CSV operates on a handle. "next_record" needs the handle.
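How a Text::CSV object and a file handle work together can be sketched outside of ETL::Pipeline. The example below reads pipe delimited data from an in-memory handle with Text::CSV's getline method. The sample data and variable names are assumptions, and the loop only approximates what "get_column_names" and "next_record" are described as doing.

```perl
use strict;
use warnings;
use Text::CSV;

# Sample pipe delimited data; the last record has an embedded newline.
my $data = qq{First|Second\nalpha|beta\n"multi\nline"|gamma\n};

# In-memory file handle standing in for the "handle" attribute.
open( my $handle, '<', \$data ) or die "Cannot open in-memory file: $!";

# Same options as the pipe delimited example earlier in this document.
my $csv = Text::CSV->new( {sep_char => '|', binary => 1} )
    or die 'Cannot create Text::CSV object: ' . Text::CSV->error_diag;

# First row: column names.
my $header = $csv->getline( $handle );
my @names  = @$header;

# Remaining rows: one array reference per record, undef at end of file.
my @rows;
while (my $row = $csv->getline( $handle )) {
    push @rows, $row;
}
close $handle;

print 'Columns: ', join( ', ', @names ), "\n";
print 'Records: ', scalar( @rows ), "\n";
```

The binary option is what allows the quoted field to span two lines without a parse error.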
SEE ALSO
ETL::Pipeline, ETL::Pipeline::Input, ETL::Pipeline::Input::File, ETL::Pipeline::Input::TabularFile
AUTHOR
Robert Wohlfarth <robert.j.wohlfarth@vanderbilt.edu>
LICENSE
Copyright 2016 (c) Vanderbilt University Medical Center
This program is free software; you can redistribute it and/or modify it under the same terms as Perl itself.