NAME
ETL::Pipeline::Output - Role for ETL::Pipeline output destinations
SYNOPSIS
    use Moose;
    with 'ETL::Pipeline::Output';

    sub write_record {
        # Add code to save your data here
        ...
    }
DESCRIPTION
ETL::Pipeline reads data from an input source, transforms it, and writes the information to an output destination. This role defines the required methods and attributes for output destinations. Every output destination must implement ETL::Pipeline::Output.
ETL::Pipeline works by calling the methods defined in this role. The role presents a common interface. It works as a shim, tying database or file access modules with ETL::Pipeline. For example, SQL databases may use DBI or DBIx::Class.
Adding a new output destination
While ETL::Pipeline provides a couple of generic output destinations, the real value of ETL::Pipeline comes from adding your own, business specific, destinations...
1. Create a Perl module. Name it ETL::Pipeline::Output::...
2. Make it a Moose object:
   use Moose;
3. Include the role:
   with 'ETL::Pipeline::Output';
4. Add the "write_record" method:
   sub write_record { ... }
5. Add the "set" method:
   sub set { ... }
6. Add the "new_record" method:
   sub new_record { ... }
7. Add the "configure" method:
   sub configure { ... }
8. Add the "finish" method:
   sub finish { ... }
Ta-da! Your output destination is ready to use:
$etl->output( 'YourNewDestination' );
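To make those steps concrete, here is a minimal sketch of a hypothetical destination that appends each record to a delimited text file. The module name, the path option, and the field handling are all invented for this example; a real destination would define whatever options its own back end needs and receive them from "output" in ETL::Pipeline.

    package ETL::Pipeline::Output::DelimitedText;

    use Carp;
    use Moose;

    with 'ETL::Pipeline::Output';

    # Hypothetical option: where to write the output file.
    has 'path' => (is => 'ro', isa => 'Str', required => 1);

    # Private storage for the record currently being built.
    has '_current' => (is => 'rw', isa => 'HashRef', default => sub { {} });
    has '_handle'  => (is => 'rw');

    sub configure {
        my $self = shift;
        open my $fh, '>', $self->path
            or croak 'Unable to open ' . $self->path . ": $!";
        $self->_handle( $fh );
    }

    sub set {
        my ($self, $field, $value) = @_;
        $self->_current->{$field} = $value;
    }

    sub new_record {
        my $self = shift;
        $self->_current( {} );
    }

    sub write_record {
        my $self = shift;
        my %record = %{ $self->_current };

        # Recoverable failure: report it through "error" and keep going.
        print { $self->_handle }
            join( '|', map { $record{$_} // '' } sort keys %record ), "\n"
            or return $self->error( "Cannot write record: $!" );

        return 1;
    }

    sub finish {
        my $self = shift;
        close $self->_handle if $self->_handle;
    }

    no Moose;
    __PACKAGE__->meta->make_immutable;

    1;

Attaching it would then look something like $etl->output( 'DelimitedText', path => 'export.txt' ) - where the path option, again, exists only in this sketch.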
Provided out of the box
ETL::Pipeline comes with a couple of generic output destinations...
ETL::Pipeline::Output::Hash
    Stores records in a Perl hash. Useful for loading support files and tying them together later.

ETL::Pipeline::Output::Perl
    Executes a subroutine against the record. Useful for debugging data issues.
METHODS & ATTRIBUTES
pipeline
pipeline returns the ETL::Pipeline object using this output destination. You can access information about the pipeline inside the methods.
"output" in ETL::Pipeline automatically sets this attribute.
Arguments for "output" in ETL::Pipeline
Note: This role defines no attributes that are set with the "output" in ETL::Pipeline command. Each child class defines its own options.
Called from "process" in ETL::Pipeline
set
set temporarily saves the value of an individual output field. "write_record" will later copy these values to the correct destination.
"process" in ETL::Pipeline calls set inside of a loop - once for each field. set accepts two parameters:
There is no return value.
Couldn't you just use a hash?
set allows your output destination to choose the in-memory storage that best fits. This might be a hash, a list, or an object of some type. set merely provides a common interface for ETL::Pipeline.
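For example, a destination that writes fixed-position columns might keep the pending record in an array rather than a hash. The column map below is invented purely for illustration.

    # Hypothetical map from field names to column positions.
    has '_columns' => (
        is      => 'ro',
        isa     => 'HashRef[Int]',
        default => sub { { name => 0, address => 1, phone => 2 } },
    );

    # The record being built, as an ordered list of column values.
    has '_current' => (is => 'rw', isa => 'ArrayRef', default => sub { [] });

    sub set {
        my ($self, $field, $value) = @_;
        my $position = $self->_columns->{$field};
        $self->_current->[$position] = $value if defined $position;
    }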
write_record
write_record sends the current record to its final destination. "process" in ETL::Pipeline calls this method once for each record. write_record is the last thing done with this record.
write_record returns a boolean flag. A true value means success saving the record. A false value indicates an error.
When your code encounters an error, call the "error" method like this...
return $self->error( 'Error message here' );
"error" returns a false value. The default "error" does nothing. To save errors, override "error" and add the new functionality. When overriding "error", it is not necessary to return anything. ETL::Pipeline::Output ensures that "error" always returns false.
For fatal errors, use the croak command from Carp instead.
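Putting those rules together, a write_record might separate recoverable problems from fatal ones along these lines. The _handle attribute and _save_current_record helper are stand-ins invented for this sketch.

    use Carp;

    sub write_record {
        my $self = shift;

        # Fatal: the destination itself is unusable, so stop the pipeline.
        croak 'write_record called before configure' unless $self->_handle;

        # Recoverable: only this record failed, so report it and move on.
        my $saved = eval { $self->_save_current_record };
        return $self->error( "Record not saved: $@" ) unless $saved;

        return 1;
    }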
new_record
new_record starts a brand new, clean record. "write_record" automatically calls new_record each time it finishes - even when the save fails. The original record, along with its error, is lost.
configure
configure prepares the output destination. It can open files, make database connections, or anything else required before saving the first record.
Why not do this in the class constructor? Some roles add automatic configuration. Those roles use the usual Moose method modifiers, which would not work with the constructor.
This configure - for the output destination - is called after the "configure" in ETL::Pipeline::Input of the input source. This method can expect that the input source is fully configured and ready for use.
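For instance, a database-backed destination might open its connection here. The dsn option below is an assumption made for this sketch, not something the role provides.

    use Carp;
    use DBI;

    has 'dsn'  => (is => 'ro', isa => 'Str', required => 1);  # hypothetical option
    has '_dbh' => (is => 'rw');

    sub configure {
        my $self = shift;
        my $dbh = DBI->connect( $self->dsn, undef, undef, { PrintError => 0 } )
            or croak 'Cannot connect to ' . $self->dsn . ': ' . DBI->errstr;
        $self->_dbh( $dbh );
    }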
finish
finish shuts down the output destination. It can close files, disconnect from the database, or anything else required to cleanly terminate the output.
Why not do this in the class destructor? Some roles add automatic functionality via Moose method modifiers. This would not work with a destructor.
This finish - for the output destination - is called before the "finish" in ETL::Pipeline::Input of the input source. This method should expect that the input source has reached end-of-file by this point, but is not closed yet.
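Continuing the database sketch under "configure" above, finish simply undoes whatever configure set up:

    sub finish {
        my $self = shift;
        $self->_dbh->disconnect if $self->_dbh;
        $self->_dbh( undef );
    }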
Other methods and attributes
record_number
The record_number attribute tells you how many records "write_record" has saved in total. The first record is number 1.
ETL::Pipeline::Output automatically increments the counter after "write_record". The "write_record" method should not change record_number.
decrement_record_number
This method decreases "record_number" by one. It can be used to back out header records from the count.
increment_record_number
This method increases "record_number" by one.
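As a purely illustrative sketch - reusing the invented attributes from the examples above - a destination that pushes a header row through "write_record" could keep that row out of the total:

    # The header row goes through write_record, so it gets counted...
    $self->set( $_, $_ ) for keys %{ $self->_columns };
    $self->write_record;

    # ...and is then backed out, so record_number reflects data records only.
    $self->decrement_record_number;

    printf "%d data records saved so far\n", $self->record_number;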
error
error handles errors from "write_record". The default error discards any error messages. Override error if you want to capture the messages and/or the records that caused them.
error always returns a false value - even if you override it.
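A destination that wants to keep failed records for later review might override error along these lines. The _errors attribute is invented for the sketch, and the role still forces the false return value.

    has '_errors' => (is => 'ro', isa => 'ArrayRef', default => sub { [] });

    sub error {
        my ($self, $message) = @_;

        # Remember the message and which record it happened on.
        push @{ $self->_errors }, {
            record  => $self->record_number,
            message => $message,
        };
        warn $message;

        # No need to return anything - the role guarantees a false value.
    }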
SEE ALSO
ETL::Pipeline, ETL::Pipeline::Input, ETL::Pipeline::Output::Hash, ETL::Pipeline::Output::Perl, ETL::Pipeline::Output::UnitTest
AUTHOR
Robert Wohlfarth <robert.j.wohlfarth@vanderbilt.edu>
LICENSE
Copyright 2016 (c) Vanderbilt University
This program is free software; you can redistribute it and/or modify it under the same terms as Perl itself.