NAME

ETL::Pipeline::Input::DelimitedText - Input source for CSV, tab, or pipe delimited files

SYNOPSIS

use ETL::Pipeline;
ETL::Pipeline->new( {
  input   => ['DelimitedText', matching => qr/\.csv$/i],
  mapping => {First => 'Header1', Second => 'Header2'},
  output  => ['UnitTest']
} )->process;

DESCRIPTION

ETL::Pipeline::Input::DelimitedText defines an input source for reading CSV (comma separated values), tab delimited, or pipe delimited files. It uses Text::CSV for parsing.

METHODS & ATTRIBUTES

Arguments for "input" in ETL::Pipeline

ETL::Pipeline::Input::DelimitedText implements ETL::Pipeline::Input::File and ETL::Pipeline::Input::TabularFile. It supports all of the attributes from these roles.

In addition, ETL::Pipeline::Input::DelimitedText makes available all of the options for Text::CSV. See Text::CSV for a list.

# Pipe delimited, allowing embedded newlines.
$etl->input( 'DelimitedText', 
  matching => qr/\.dat$/i, 
  sep_char => '|', 
  binary => 1
);
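
To see what these two options do on their own, here is a minimal sketch that hands them straight to Text::CSV, outside of any pipeline. The sample data and variable names are made up for illustration, and the sketch assumes the options reach Text::CSV unchanged.

```perl
use strict;
use warnings;
use Text::CSV;

# The same options as the example above, given directly to Text::CSV.
my $csv = Text::CSV->new( { sep_char => '|', binary => 1 } )
  or die 'Cannot use Text::CSV: ' . Text::CSV->error_diag;

# Hypothetical sample data: the quoted second field spans two lines.
my $data = qq{1|"line one\nline two"|end\n};
open my $handle, '<', \$data or die "Cannot open in-memory file: $!";

# getline returns an array reference holding the fields of one record.
my $row = $csv->getline( $handle );
close $handle;

my $count = scalar @$row;
print "Parsed $count fields\n";
```

With binary turned on, the quoted field keeps its embedded newline and the whole thing parses as a single record.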

Called from "process" in ETL::Pipeline

get

get retrieves one field from the current record. It accepts one parameter: an index number, a column name, or a regular expression to match against column names.

$etl->get( 0 );
$etl->get( 'First' );
$etl->get( qr/\bfirst\b/i );
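
The three kinds of parameter can be told apart with ordinary Perl checks. The following is only a sketch of that idea, not the module's actual code. The lookup function, the column names, and the data are all hypothetical, and the real get may treat edge cases (such as several matching columns) differently.

```perl
use strict;
use warnings;

# Hypothetical header row and data row.
my @names  = ( 'First Name', 'Last Name', 'City' );
my @record = ( 'Ada', 'Lovelace', 'London' );

# Resolve a selector (index, exact name, or regular expression) to a value.
sub lookup {
    my ($selector) = @_;

    # A compiled regular expression: use the first column whose name matches.
    if (ref( $selector ) eq 'Regexp') {
        for my $i (0 .. $#names) {
            return $record[$i] if $names[$i] =~ $selector;
        }
        return;
    }

    # A string of digits: treat it as a zero-based position.
    return $record[$selector] if $selector =~ m/^\d+$/;

    # Anything else: compare against the column names exactly.
    for my $i (0 .. $#names) {
        return $record[$i] if $names[$i] eq $selector;
    }
    return;
}

my $by_index = lookup( 0 );
my $by_name  = lookup( 'City' );
my $by_regex = lookup( qr/\blast\b/i );
```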

next_record

next_record reads one record from the file for processing. It returns a boolean: true means it read a record, false means it reached the end of the file.

while ($input->next_record) {
  ...
}

get_column_names

get_column_names reads the field names from the first row in the file. Once they are loaded, "get" can look up fields by name or by a regular expression matched against the names.

configure

configure opens the file for reading. It takes care of loading the column names, if your file has them.
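
The open, read header, loop, close sequence described in this section can be sketched with Text::CSV alone. This is an illustration under assumptions, not the module's implementation: the in-memory data and the variable names are invented, and the real class keeps its state in attributes rather than lexical variables.

```perl
use strict;
use warnings;
use Text::CSV;

# Hypothetical file contents, read through an in-memory handle.
my $data = "Header1,Header2\nA,B\nC,D\n";
open my $handle, '<', \$data or die "Cannot open in-memory file: $!";

my $csv = Text::CSV->new( { binary => 1 } )
  or die 'Cannot use Text::CSV: ' . Text::CSV->error_diag;

# The first row holds the column names.
my $names = $csv->getline( $handle );

# Each later call returns one record, or undef at the end of the file.
my @rows;
while (my $fields = $csv->getline( $handle )) {
    push @rows, $fields;
}

close $handle;
print 'Read ', scalar( @rows ), " records\n";
```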

finish

finish closes the file.

Other Methods & Attributes

record

ETL::Pipeline::Input::DelimitedText stores each record as a list of fields. Each field is identified by its position in the file, starting at zero. This attribute holds the current record.

fields

Returns a list of fields from the current record. It dereferences "record".

number_of_fields

This method returns the number of fields in the current record.
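
The relationship between "record", "fields", and "number_of_fields" amounts to an array reference, its dereferenced list, and that list's length. A plain Perl sketch with made-up data shows the same relationship. The variables only stand in for the attribute and methods.

```perl
use strict;
use warnings;

# Stands in for "record": an array reference, indexed from zero.
my $record = ['Ada', 'Lovelace', 'London'];

# Stands in for "fields": the dereferenced list.
my @fields = @$record;

# Stands in for "number_of_fields": the length of that list.
my $number_of_fields = scalar @fields;

# Position zero is the first field in file order.
my $first = $record->[0];
```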

csv

csv holds a Text::CSV object for reading the file. You can set options for Text::CSV through the "input" in ETL::Pipeline command.

handle

The Perl file handle for reading data. Text::CSV operates on a handle. "next_record" needs the handle.

SEE ALSO

ETL::Pipeline, ETL::Pipeline::Input, ETL::Pipeline::Input::File, ETL::Pipeline::Input::TabularFile

AUTHOR

Robert Wohlfarth <robert.j.wohlfarth@vanderbilt.edu>

LICENSE

Copyright 2016 (c) Vanderbilt University Medical Center

This program is free software; you can redistribute it and/or modify it under the same terms as Perl itself.