NAME
Spreadsheet::Compare::Reader - Abstract Reader Base Class
SYNOPSIS
package Spreadsheet::Compare::MyReader;
use Mojo::Base 'Spreadsheet::Compare::Reader';
sub setup {...}
sub fetch {...}
DESCRIPTION
Spreadsheet::Compare::Reader is an abstract base class for spreadsheet reader backends. The reader classes available in this distribution are:
Spreadsheet::Compare::Reader::CSV for CSV files
Spreadsheet::Compare::Reader::DB for databases
Spreadsheet::Compare::Reader::FIX for fixed-width column files
Spreadsheet::Compare::Reader::WB for various spreadsheet formats like XLSX, ODS, ...
This module defines the methods and attributes used by a Spreadsheet::Compare::Reader subclass. The methods setup and fetch must be overridden by the derived class; otherwise they croak.
When subclassing, consider using Spreadsheet::Compare::Common for convenience.
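The override contract above can be illustrated with a small, self-contained sketch in plain Perl. It deliberately avoids Mojo::Base and the real Spreadsheet::Compare classes so that it runs on its own; the package names My::AbstractReader, My::IncompleteReader and My::StaticReader are hypothetical, and the stubs only mimic the documented behaviour (setup and fetch croak unless a subclass overrides them).

```perl
use strict;
use warnings;

# Hypothetical stand-in for an abstract base class. This is NOT the real
# Spreadsheet::Compare::Reader (which is built on Mojo::Base).
package My::AbstractReader;
use Carp qw(croak);

sub new   { my ( $class, %args ) = @_; return bless {%args}, $class }
sub setup { croak ref( $_[0] ) . ' must implement setup' }
sub fetch { croak ref( $_[0] ) . ' must implement fetch' }

# A subclass that forgets to override anything inherits the croaking stubs.
package My::IncompleteReader;
our @ISA = ('My::AbstractReader');

# A subclass that overrides both required methods.
package My::StaticReader;
our @ISA = ('My::AbstractReader');

sub setup {
    my ($self) = @_;
    $self->{rows} = [ [ 1, 'a' ], [ 2, 'b' ] ];    # e.g. open a file here
    return $self;
}

sub fetch {
    my ( $self, $size ) = @_;
    return [ splice @{ $self->{rows} }, 0, $size ];    # next $size rows
}

package main;

my $ok    = My::StaticReader->new->setup;
my $batch = $ok->fetch(1);    # first row only

my $err = eval { My::IncompleteReader->new->setup; 1 } ? '' : $@;
print "croaked: $err";
```

Keeping the stubs in the base class means a missing override fails loudly at the first call instead of silently returning nothing.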
ATTRIBUTES
Unless stated otherwise, read-write attributes can be set as options in the config file passed to Spreadsheet::Compare or spreadcomp.
can_chunk
(readonly) Will be set to a true value by the Reader module if the Reader supports chunking.
chunk
possible values: <column> or { column => <column>, regex => <regex> }
default: undef
Process the input in batches defined by the content of a column. When the regex form is used, the regex must contain a capturing group; the captured value is used as the identifier for the chunk. For example:
chunk:
  column: RECORD_NBR
  regex: '(\d{2})$'
This setting takes the last two digits of the values in column RECORD_NBR, resulting in up to 100 batches. This is useful for very large files that do not fit entirely into memory (see "LIMITING MEMORY USAGE" in Spreadsheet::Compare). The batches are read sequentially to save memory.
All records are read twice, first to create the lookup info for the chunks and second for the actual data. This significantly increases execution time.
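To make the regex form concrete, here is a minimal sketch of how a chunk setting could be turned into a chunker closure. The helper name make_chunker and the representation of a record as a hash reference keyed by column name are assumptions for illustration, not the module's internals.

```perl
use strict;
use warnings;

# Hypothetical helper: build a closure that maps a record (assumed to be a
# hash ref keyed by column name) to a chunk name, for either "chunk" form.
sub make_chunker {
    my ($chunk) = @_;

    # Plain <column> form: the column value itself is the chunk name.
    return sub { $_[0]{$chunk} } unless ref $chunk;

    # { column => ..., regex => ... } form: use the first capture group.
    my ( $col, $re ) = @{$chunk}{qw(column regex)};
    my $qr = qr/$re/;
    return sub {
        my ($rec) = @_;
        my ($id) = ( $rec->{$col} // '' ) =~ $qr;    # undef if no match
        return $id;
    };
}

my $chunker = make_chunker( { column => 'RECORD_NBR', regex => '(\d{2})$' } );
my @names   = map { $chunker->( { RECORD_NBR => $_ } ) } qw(1000417 1000599 23);
print "@names\n";    # 17 99 23
```

In a two-pass setup like the one described above, the first pass would call such a closure on every record to learn which chunk names exist, and the second pass would read one chunk at a time.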
chunker
(readonly) A reference to a generated subroutine that returns the chunk name for a record based on the settings from "chunk". This will be called from the Reader subclasses.
exhausted
(readonly) Will be true if the reader has no more records to read.
has_header
possible values: bool
default: undef
Specify whether the file contains a header line.
header
(readonly) A reference to an array with the header names or, if there is no named header, the zero-based column indexes.
identity
possible values: <array of column numbers or names>
default: []
Defines the columns used to identify and match a single record. If "has_header" is true, the header names can be used; otherwise the zero-based column numbers serve as header names.
Examples of config file entries:
identity: [rec_nbr, rec_type]
identity:
  - rec_nbr
  - rec_type
identity: [3, 4, 17]
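As a rough sketch of how an identity list might be resolved against "header" and turned into a lookup key, consider the following. The helpers identity_indexes and identity_key, the array-ref row layout and the "\x{1F}" join separator are all hypothetical choices, not taken from the module.

```perl
use strict;
use warnings;

# Hypothetical: resolve identity entries (header names or zero-based column
# numbers) to column indexes using the header array.
sub identity_indexes {
    my ( $identity, $header ) = @_;
    my %pos;
    @pos{@$header} = 0 .. $#$header;
    return [
        map {
            exists $pos{$_} ? $pos{$_}
              : /^\d+$/     ? $_
              :               die "unknown identity column '$_'\n"
        } @$identity
    ];
}

# Hypothetical: join the identity values of one row into a single key,
# using a separator assumed not to occur in the data.
sub identity_key {
    my ( $idx, $row ) = @_;
    return join "\x{1F}", map { $row->[$_] // '' } @$idx;
}

my $header = [qw(rec_nbr rec_type amount)];
my $idx    = identity_indexes( [qw(rec_nbr rec_type)], $header );    # [0, 1]
my $key    = identity_key( $idx, [ 42, 'A', 9.5 ] );

# Without a named header, "header" holds the indexes themselves.
my $idx2 = identity_indexes( [ 3, 4, 17 ], [ 0 .. 20 ] );    # [3, 4, 17]
```

Because an unnamed header consists of the indexes themselves, the same lookup handles both the name and the number form.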
index
(readonly) 0 for the reader on the left and 1 for the reader on the right side of the comparison.
result
(readonly) A reference to an array with the data read by the most recent call to "fetch".
side
(readonly) 'left' for the reader on the left and 'right' for the reader on the right side of the comparison.
side_name
possible values: <string>
default: ''
The name for the side of the comparison used for reporting.
skip
possible values: <key value pairs>
default: undef
Skip records by column content. Keys must be column names (when the input has column headers, see "has_header") or column numbers; the values are interpreted as regular expressions. A leading '!' negates the regex.
Example (the negated regex is quoted because YAML would otherwise read the leading '!' as a tag):
skip:
  Name: ^XYZ-
  Price: '!\d'
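The following is a minimal sketch of how a skipper closure could be generated from a skip setting, including the leading '!' negation. The helper make_skipper, the hash-ref record layout, and the rule that a record is skipped as soon as any single key matches are assumptions; the documentation above does not say how multiple keys are combined.

```perl
use strict;
use warnings;

# Hypothetical: compile "skip" (column => regex, leading '!' negates) into a
# closure returning true if the record should be skipped. Assumes a record
# is skipped when ANY rule matches.
sub make_skipper {
    my ($skip) = @_;
    my @rules;
    for my $col ( sort keys %$skip ) {
        my $re     = $skip->{$col};
        my $negate = $re =~ s/^!//;    # strip '!' and remember it
        push @rules, [ $col, qr/$re/, $negate ];
    }
    return sub {
        my ($rec) = @_;
        for my $rule (@rules) {
            my ( $col, $qr, $negate ) = @$rule;
            my $match = ( $rec->{$col} // '' ) =~ $qr ? 1 : 0;
            return 1 if ( $match xor $negate );
        }
        return 0;
    };
}

my $skipper = make_skipper( { Name => '^XYZ-', Price => '!\d' } );
my @records = (
    { Name => 'XYZ-001', Price => '10' },      # skipped: Name matches ^XYZ-
    { Name => 'ABC-002', Price => 'n/a' },     # skipped: Price has no digit
    { Name => 'ABC-003', Price => '7.50' },    # kept
);
my @kept = grep { !$skipper->($_) } @records;
print scalar(@kept), "\n";    # 1
```

Compiling the regexes once when the closure is built, rather than per record, keeps the per-record check cheap.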
skipper
(readonly) A reference to a generated subroutine that returns true or false depending on whether the record should be skipped according to the value of "skip". This will be called from the Reader subclasses.
METHODS
The methods "setup" and "fetch" must be overridden by derived classes.
fetch($size)
Fetch up to $size records from the source. Afterwards, the records are available in "result".
setup()
Called by Spreadsheet::Compare::Single at the start of a comparison. Use it for setup tasks before the first fetch (e.g. opening a file, reading the header, ...).
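To show how setup and fetch might fit together, here is a hypothetical, self-contained reader over an in-memory comma-separated string. The package My::LineReader and its text argument are invented for illustration; only the roles of setup, fetch, header, result and exhausted follow the descriptions in this document. The class marks itself exhausted when a fetch hits the end of input.

```perl
use strict;
use warnings;

# Hypothetical minimal reader (not part of Spreadsheet::Compare): setup reads
# the header, fetch fills "result" with up to $size rows and sets
# "exhausted" once the end of input is reached.
package My::LineReader;

sub new {
    my ( $class, %args ) = @_;
    return bless { result => [], exhausted => 0, %args }, $class;
}

sub setup {
    my ($self) = @_;
    open my $fh, '<', \$self->{text} or die "cannot open input: $!";
    $self->{fh} = $fh;
    chomp( my $first = <$fh> // '' );
    $self->{header} = [ split /,/, $first ];
    return $self;
}

sub fetch {
    my ( $self, $size ) = @_;
    my $fh = $self->{fh};
    my @batch;
    while ( @batch < $size ) {
        my $line = <$fh>;
        if ( !defined $line ) { $self->{exhausted} = 1; last }
        chomp $line;
        push @batch, [ split /,/, $line ];
    }
    $self->{result} = \@batch;
    return $self;
}

package main;

my $r = My::LineReader->new( text => "id,name\n1,a\n2,b\n3,c\n" )->setup;
my @sizes;
until ( $r->{exhausted} ) {
    $r->fetch(2);
    push @sizes, scalar @{ $r->{result} };
}
print "@sizes\n";    # 2 1
```

A caller that loops until exhausted may see one final batch that is shorter than $size, or even empty, so it should not assume every fetch returns exactly $size records.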