NAME
ETL::Pipeline::Output - Role for ETL::Pipeline output destinations
SYNOPSIS
use Moose;
with 'ETL::Pipeline::Output';
sub open {
# Add code to open the output destination
...
}
sub write {
# Add code to save your data here
...
}
sub close {
# Add code to close the destination
...
}
DESCRIPTION
An output destination fulfills the load part of ETL. This is where the data ends up - the output of the process.
A destination can be anything: a database, a file, or any other data store. Destinations are customized to your environment, and you will probably only have a few.
ETL::Pipeline interacts with the output destination in 3 stages...
- 1. Open - connect to the database, open the file, whatever setup is appropriate for your destination.
- 2. Write - called once per record. This is the part that actually performs the output.
- 3. Close - finished processing and cleanly shut down the destination.
This role sets the requirements for these 3 methods. It should be consumed by all output destination classes. ETL::Pipeline relies on the destination having this role.
How do I create an output destination?
ETL::Pipeline provides a couple of generic output destinations as examples or for very simple uses. The real value of ETL::Pipeline comes from adding your own, business specific, destinations...
- 1. Start a new Perl module. I recommend putting it in the ETL::Pipeline::Output namespace. ETL::Pipeline will pick it up automatically.
- 2. Make your module a Moose class - use Moose;.
- 3. Consume this role - with 'ETL::Pipeline::Output';.
- 4. Write the "open", "close", and "write" methods.
- 5. Add any attributes for your class.
The new destination is ready to use, like this...
$etl->output( 'YourNewDestination' );
You can leave off the leading ETL::Pipeline::Output::.
When ETL::Pipeline calls "open" or "close", it passes the ETL::Pipeline object as the only parameter. When ETL::Pipeline calls "write", it passes two parameters - the ETL::Pipeline object and the record. The record is a Perl hash.
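To make that calling convention concrete, here is a minimal sketch of the three-stage protocol. The MyDestination package and the driver loop are hypothetical stand-ins: a real destination would be a Moose class that consumes ETL::Pipeline::Output, and the real caller is ETL::Pipeline itself. Plain Perl OO is used only to keep the sketch self-contained.

```perl
use strict;
use warnings;

# Hypothetical stand-in for a destination class. A real one would be a
# Moose class consuming 'ETL::Pipeline::Output'.
package MyDestination;

sub new { return bless { records => [] }, shift }

sub open {
    my ($self, $etl) = @_;
    # Connect to the database, open the file - whatever setup you need.
}

sub write {
    my ($self, $etl, $record) = @_;
    # $record is a Perl hash reference.
    push @{ $self->{records} }, $record;
}

sub close {
    my ($self, $etl) = @_;
    # Disconnect, flush buffers, close file handles, etc.
    return scalar @{ $self->{records} };
}

package main;

# Sketch of the loop ETL::Pipeline runs internally. $etl stands in for
# the real ETL::Pipeline object.
my $etl         = undef;
my $destination = MyDestination->new;

$destination->open( $etl );
$destination->write( $etl, $_ ) for {ID => 1}, {ID => 2};
my $count = $destination->close( $etl );

print "$count records written\n";
```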
Example destinations
ETL::Pipeline comes with a couple of generic output destinations...
- ETL::Pipeline::Output::Hash - Stores records in a Perl hash. Useful for loading support files and tying them together later.
- ETL::Pipeline::Output::Perl - Executes a subroutine against the record. Useful for debugging data issues.
Why this way?
My work involves a small number of destinations that rarely change and a greater number of sources that do change. So I designed ETL::Pipeline to minimize time writing new input sources. The trade off was slightly more complex output destinations.
Upgrading from older versions
ETL::Pipeline version 3 is not compatible with output destinations from older versions. You will need to rewrite your custom output destinations.
- Change configure to "open".
- Change finish to "close".
- Change write_record to "write".
- Remove set and new_record. All records are Perl hashes.
- Adjust attributes as necessary.
METHODS & ATTRIBUTES
close
Shut down the output destination. This method may close files, disconnect from the database, or anything else required to cleanly terminate the output.
close receives one parameter - the ETL::Pipeline object.
The output destination is closed after the input source, at the end of the ETL process.
open
Prepare the output destination for use. It can open files, make database connections, or anything else required to access the destination.
open receives one parameter - the ETL::Pipeline object.
The output destination is opened before the input source, at the beginning of the ETL process.
write
Send a single record to the destination. The ETL process calls this method in a loop. It receives two parameters - the ETL::Pipeline object, and the current record as a Perl hash.
If your code encounters an error, write can call "error" in ETL::Pipeline with the error message. "error" in ETL::Pipeline automatically includes the record count with the error message. You should add any other troubleshooting information such as file names or key fields.
sub write {
    my ($self, $etl, $record) = @_;

    my $id = $record->{ID};
    $etl->error( "Error message here for id $id" );
}
For fatal errors, I recommend using the croak command from Carp.
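As a sketch of that advice, this hypothetical write method croaks when it hits a problem it cannot recover from - here, a record with no ID field. Carp is a core module; the package name and the ID field are invented for illustration, and plain Perl OO stands in for a Moose class.

```perl
use strict;
use warnings;
use Carp qw(croak);

package My::Destination;

sub new { return bless {}, shift }

sub write {
    my ($self, $etl, $record) = @_;

    # Fatal: there is no sensible way to save a record without its key field.
    croak 'Record has no ID field' unless defined $record->{ID};

    # ... save the record ...
    return 1;
}

package main;

my $dest = My::Destination->new;

# croak dies, so the caller sees it as an exception.
my $ok  = eval { $dest->write( undef, {} ); 1 };
my $err = $@;
print $ok ? "wrote record\n" : "fatal: $err";
```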
SEE ALSO
ETL::Pipeline, ETL::Pipeline::Input, ETL::Pipeline::Output::Hash, ETL::Pipeline::Output::Perl, ETL::Pipeline::Output::UnitTest
AUTHOR
Robert Wohlfarth <robert.j.wohlfarth@vumc.org>
LICENSE
Copyright 2021 (c) Vanderbilt University
This program is free software; you can redistribute it and/or modify it under the same terms as Perl itself.