NAME
ETL::Pipeline::Output::Memory - Store records in memory
SYNOPSIS
    # Save the records into a giant list.
    use ETL::Pipeline;
    ETL::Pipeline->new( {
        input   => ['UnitTest'],
        mapping => {First => 'Header1', Second => 'Header2'},
        output  => ['Memory']
    } )->process;

    # Save the records into a hash, keyed by an identifier.
    use ETL::Pipeline;
    ETL::Pipeline->new( {
        input   => ['UnitTest'],
        mapping => {First => 'Header1', Second => 'Header2'},
        output  => ['Memory', key => 'First']
    } )->process;
DESCRIPTION
ETL::Pipeline::Output::Memory writes each record into a Perl data structure in memory. The records can be accessed later in the same script. This output destination is useful when processing multiple input files.
ETL::Pipeline::Output::Memory offers two ways of storing the records - in a hash or in a list. ETL::Pipeline::Output::Memory always puts records into the list. If the "key" attribute is set, then ETL::Pipeline::Output::Memory also saves records into the hash.
The hash can be used for faster look-up. Use "key" when the record contains an identifier.
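The two storage shapes described above can be sketched in plain Perl. This is only an illustration of the data structures, not the module's own code: the sample records and the variable names (@input, @list, %hash, $key) are made up for the example.

```perl
use strict;
use warnings;

# Hypothetical sample records, as they might look after mapping.
my @input = (
    { First => 'A', Second => 1 },
    { First => 'B', Second => 2 },
    { First => 'A', Second => 3 },
);

# Assumed field name, as in: output => ['Memory', key => 'First']
my $key = 'First';

my (@list, %hash);
for my $record (@input) {
    # The list always receives every record, in input order.
    push @list, $record;

    # With a key, the same reference is also filed under its identifier.
    # Each hash value is an array reference of records sharing that key.
    push @{ $hash{ $record->{$key} } }, $record if defined $key;
}

printf "%d records, %d identifiers\n", scalar @list, scalar keys %hash;
```

Two of the three sample records share the identifier 'A', which is why the count of identifiers can be smaller than the count of records.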
METHODS & ATTRIBUTES
Arguments for "output" in ETL::Pipeline
key
Optional. If you want to store the records in a hash, then this is the field name whose value becomes the key. When set, records go into "hash".
If you don't specify a key, then records are stored only in the array, "list", in the order they were read.
Attributes
hash
Hash reference used when "key" is set. The key is the value of the field identified by "key". The value is an array reference. The array contains all of the records with that same key.
list
list is an array reference that stores records. The records are saved in the same order as they are read from the input source. Each list element is a hash reference (the record).
list always has a complete set of records, whether "key" is set or not.
Methods
close
This method doesn't do anything. There's nothing to close or shut down.
number_of_ids
Count of unique identifiers. This may not be the same as the number of records. One key may have multiple records.
number_of_ids only works if the "key" attribute was set.
number_of_records
Count of records currently in storage.
open
This method doesn't do anything. There's nothing to open or set up.
records
Returns a list of all the records currently in storage. The list contains hash references - one reference for each record.
with_id
with_id returns a list of records for a given key. Pass in a value for the key and with_id returns an array reference of records.
with_id only works if the "key" attribute was set.
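As a rough picture of why a keyed look-up needs the hash, the sketch below compares scanning a list with a single hash access. It uses plain Perl stand-ins (@list and %hash are hypothetical variables, not the module's attributes) and does not show how with_id itself is implemented.

```perl
use strict;
use warnings;

# Hypothetical stand-ins for the stored list and hash.
my @list = (
    { First => 'A', Second => 1 },
    { First => 'B', Second => 2 },
    { First => 'A', Second => 3 },
);
my %hash;
push @{ $hash{ $_->{First} } }, $_ for @list;

# Without a key, finding matches means scanning every record.
my @scanned = grep { $_->{First} eq 'A' } @list;

# With a key, one hash access finds the same records.
my @looked_up = @{ $hash{'A'} // [] };

# In this sketch, an identifier that was never stored yields an empty list.
my @missing = @{ $hash{'Z'} // [] };
```

Both approaches return the same two records for 'A'. The hash version avoids walking the whole list.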
write
Save the current record into memory. Your script can access the records after calling "process" in ETL::Pipeline like this: $etl->output->records. Both "records" and "with_id" can be used.
If "key" is set, write saves the record in both "hash" and "list". We're storing a reference, not a copy, so there's very little cost. And it allows methods such as "number_of_records" to work.
WARNING: This method stores a reference to the original record. If the input source re-uses the hash or embedded references, it will update all of the currently stored values too. ETL::Pipeline::Output::Memory does not make a copy.
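The effect described in this warning can be reproduced in plain Perl. The sketch assumes a hypothetical input source that re-uses one hash (%buffer) for every row. None of these names come from ETL::Pipeline.

```perl
use strict;
use warnings;

my %buffer;    # hypothetical: one hash re-used for every row

# Storing a reference to the re-used hash, as the warning describes.
my @stored;
for my $value (qw/one two/) {
    %buffer = ( First => $value );
    push @stored, \%buffer;
}
# Both elements point at the same hash, so both show the last value.
my @aliased = map { $_->{First} } @stored;     # ('two', 'two')

# Storing a shallow copy of each row keeps the values apart.
my @copies;
for my $value (qw/one two/) {
    %buffer = ( First => $value );
    push @copies, { %buffer };
}
my @separate = map { $_->{First} } @copies;    # ('one', 'two')
```

A shallow copy like { %buffer } only protects the top-level fields. Embedded references inside a record would still be shared, so a record with nested data would need a deep copy.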
SEE ALSO
ETL::Pipeline, ETL::Pipeline::Output
AUTHOR
Robert Wohlfarth <robert.j.wohlfarth@vumc.org>
LICENSE
Copyright 2021 (c) Vanderbilt University
This program is free software; you can redistribute it and/or modify it under the same terms as Perl itself.