NAME

Spreadsheet::Compare::Reader - Abstract Reader Base Class

SYNOPSIS

package Spreadsheet::Compare::MyReader;
use Mojo::Base 'Spreadsheet::Compare::Reader';

sub setup {...}
sub fetch {...}

DESCRIPTION

Spreadsheet::Compare::Reader is an abstract base class for spreadsheet reader backends. Available reader classes in this distribution are

This module defines the methods and attributes that are used by Spreadsheet::Compare::Reader subclasses. The methods setup and fetch must be overridden by the derived class; calling them without an override will croak.

When subclassing, consider using Spreadsheet::Compare::Common for convenience.

ATTRIBUTES

Unless stated otherwise, read-write attributes can be set as options in the config file passed to Spreadsheet::Compare or spreadcomp.

can_chunk

(readonly) Will be set to a true value by the Reader module if the Reader supports chunking.

chunk

possible values: <column>
                 or
                 { column => <column>, regex => <regex> },
default: undef

Process the input in batches defined by the content of a column. When the regex form is used, the regex must contain a capture group; the captured value is used as the identifier for the chunk. For example:

chunk:
    column: RECORD_NBR
    regex: '(\d{2})$'

will take the last two digits of the numbers in column RECORD_NBR, resulting in up to 100 batches. This is useful for very large files that do not fit entirely into memory (see "LIMITING MEMORY USAGE" in Spreadsheet::Compare). Reading for each batch will be handled sequentially to save memory.

All records will be read twice, first for creating the lookup info for the chunks and second for the actual data. This will significantly increase execution time.
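The two passes described above can be sketched in plain Perl. This is only an illustration under stated assumptions, not the distribution's implementation: the semicolon-separated sample data, the use of byte offsets as lookup info, and the names %offsets and %size_of are all hypothetical.

```perl
use strict;
use warnings;

# Hypothetical sample input; the first field plays the role of RECORD_NBR.
my $data = join '', map { "$_\n" } qw(100457;a 200312;b 300457;c 400312;d);
open my $fh, '<', \$data or die "cannot open in-memory file: $!";

my $chunk_re = qr/(\d{2})$/;    # same regex as in the config example

# Pass 1: remember the byte offset of every record, grouped by chunk name.
my %offsets;
while (1) {
    my $pos  = tell $fh;
    my $line = <$fh>;
    last unless defined $line;
    chomp $line;
    my ($record_nbr) = split /;/, $line;
    my ($chunk)      = $record_nbr =~ $chunk_re;
    push @{ $offsets{ $chunk // '' } }, $pos;
}

# Pass 2: load one chunk at a time, so only that batch is held in memory.
my %size_of;
for my $chunk ( sort keys %offsets ) {
    my @batch;
    for my $pos ( @{ $offsets{$chunk} } ) {
        seek $fh, $pos, 0 or die "seek failed: $!";
        my $line = <$fh>;
        chomp $line;
        push @batch, [ split /;/, $line ];
    }
    $size_of{$chunk} = @batch;
    print "chunk $chunk: ", scalar(@batch), " record(s)\n";
}
```

Every record is touched once per pass, which is where the additional execution time comes from.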

chunker

(readonly) A reference to a generated subroutine that returns the chunk name for a record based on the settings from "chunk". This will be called from the Reader subclasses.
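As a rough illustration of what such a generated subroutine might look like, here is a minimal standalone sketch. The helper name make_chunker, the array-reference record layout, and the empty-string fallback for non-matching values are assumptions made for this example, not the module's actual code.

```perl
use strict;
use warnings;

# Hypothetical helper, not part of the distribution: build a closure
# that maps a record (array reference) to its chunk name.
sub make_chunker {
    my ( $chunk, $header ) = @_;

    # Accept both forms: a plain column or { column => ..., regex => ... }.
    my ( $column, $regex ) =
        ref($chunk) eq 'HASH'
        ? @{$chunk}{qw(column regex)}
        : ( $chunk, undef );

    # Resolve the column name to its zero-based index via the header.
    my %index_of = map { $header->[$_] => $_ } 0 .. $#$header;
    die "unknown chunk column '$column'" unless exists $index_of{$column};
    my $idx = $index_of{$column};

    my $re = defined $regex ? qr/$regex/ : undef;

    return sub {
        my ($record) = @_;
        my $value = $record->[$idx] // '';
        return $value unless $re;

        # The first capture group is the chunk name; values that do not
        # match fall into one shared (assumed) fallback chunk.
        return $value =~ $re ? $1 : '';
    };
}

my $chunker = make_chunker(
    { column => 'RECORD_NBR', regex => '(\d{2})$' },
    [qw(RECORD_NBR NAME)],
);
print $chunker->( [ '100457', 'foo' ] ), "\n";    # prints 57
```

Building the closure once and calling it per record keeps the column lookup and regex compilation out of the read loop.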

exhausted

(readonly) Will be true if the reader has no more records to read.

has_header

possible values: bool
default: undef

Specify whether the file contains a header line.

header

(readonly) A reference to an array with the header names or (when there is no named header) the zero-based column indexes.

identity

possible values: <array of column numbers or names>
default: []

Defines the columns used to identify and match a single record. If "has_header" is true, the header names can be used. If not, the column numbers (zero-based) will be used as header names.

Examples of config file entries:

  identity: [rec_nbr, rec_type]

  identity:
    - rec_nbr
    - rec_type

  identity: [3, 4, 17]
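To make the idea of an identity concrete, the following sketch builds a match key from the configured columns. The helper name make_identity_sub and the '_' separator are assumptions for illustration only; how the distribution actually combines the values is not specified here.

```perl
use strict;
use warnings;

# Hypothetical helper: return a closure that builds the identity key
# of a record (array reference) from the configured identity columns.
sub make_identity_sub {
    my ( $identity, $header, $sep ) = @_;
    $sep //= '_';    # assumed separator between the column values

    # Map header names (or zero-based numbers) to array positions.
    my %index_of = map { $header->[$_] => $_ } 0 .. $#$header;

    my @idx;
    for my $col (@$identity) {
        die "unknown identity column '$col'" unless exists $index_of{$col};
        push @idx, $index_of{$col};
    }

    return sub {
        my ($record) = @_;
        return join $sep, map { $_ // '' } @{$record}[@idx];
    };
}

my $id_of = make_identity_sub(
    [qw(rec_nbr rec_type)],
    [qw(rec_nbr name rec_type)],
);
print $id_of->( [ '17', 'widget', 'A' ] ), "\n";    # prints 17_A
```

Two records from the left and right side with the same key would then be treated as the same record for comparison purposes.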

index

(readonly) 0 for the reader on the left and 1 for the reader on the right side of the comparison.

result

(readonly) A reference to an array with the currently read data after a call to "fetch".

side

(readonly) 'left' for the reader on the left and 'right' for the reader on the right side of the comparison.

side_name

possible values: <string>
default: ''

The name for the side of the comparison used for reporting.

skip

possible values: <key value pairs>
default: undef

Skip lines by column content. Keys must be column names (when the input has column headers, see "has_header") or column numbers; the values are interpreted as regular expressions. A leading '!' negates the regex.

Example:

skip:
  Name: ^XYZ-
  Price: '!\d'

skipper

(readonly) A reference to a generated subroutine that returns true or false depending on whether the record should be skipped according to the value of "skip". This will be called from the Reader subclasses.
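A generated skip subroutine could look roughly like the sketch below. The helper name make_skipper and the rule that a record is skipped as soon as any single entry applies are assumptions; the documentation above does not say how multiple entries are combined.

```perl
use strict;
use warnings;

# Hypothetical helper: build a closure that decides whether a record
# (array reference) should be skipped according to the skip settings.
sub make_skipper {
    my ( $skip, $header ) = @_;
    return sub { 0 } unless $skip && %$skip;

    my %index_of = map { $header->[$_] => $_ } 0 .. $#$header;

    my @tests;
    for my $col ( sort keys %$skip ) {
        die "unknown skip column '$col'" unless exists $index_of{$col};
        my $pattern = $skip->{$col};
        my $negate  = ( $pattern =~ s/^!// ) ? 1 : 0;    # leading '!' negates
        push @tests, [ $index_of{$col}, qr/$pattern/, $negate ];
    }

    return sub {
        my ($record) = @_;
        for my $test (@tests) {
            my ( $idx, $re, $negate ) = @$test;
            my $hit = ( ( $record->[$idx] // '' ) =~ $re ) ? 1 : 0;

            # Assumed semantics: one applicable rule is enough to skip.
            return 1 if $hit != $negate;
        }
        return 0;
    };
}

my $skipper = make_skipper(
    { Name => '^XYZ-', Price => '!\d' },
    [qw(Name Price)],
);
for my $record ( [ 'XYZ-1', '10' ], [ 'ABC', 'n/a' ], [ 'ABC', '10' ] ) {
    printf "%-6s %-4s %s\n", @$record, $skipper->($record) ? 'skip' : 'keep';
}
```

With these settings, a record is skipped when Name starts with XYZ- or when Price contains no digit.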

METHODS

The methods "setup" and "fetch" have to be overridden by derived classes.

fetch($size)

Fetch $size records from the source.

setup()

Will be called by Spreadsheet::Compare::Single at the start of a comparison. This is for setup tasks that have to happen before the first fetch (e.g. opening a file, reading the header, ...).
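The division of work between setup and fetch can be illustrated with a small standalone class. To stay runnable without the distribution installed, this sketch deliberately does not inherit from Spreadsheet::Compare::Reader; the class name My::LineReader, the hash-based attributes, the semicolon-separated input, the record count returned by fetch, and replacing result on every fetch are all assumptions.

```perl
use strict;
use warnings;

package My::LineReader;    # hypothetical stand-in for a Reader subclass

sub new {
    my ( $class, %args ) = @_;
    return bless { has_header => 1, result => [], exhausted => 0, %args },
        $class;
}

# setup: open the source and read the header before the first fetch.
sub setup {
    my ($self) = @_;
    open my $fh, '<', $self->{source} or die "cannot open source: $!";
    $self->{fh} = $fh;
    if ( $self->{has_header} ) {
        my $line = <$fh>;
        if ( defined $line ) {
            chomp $line;
            $self->{header} = [ split /;/, $line ];
        }
    }
    return $self;
}

# fetch: read up to $size records into result and flag exhaustion at EOF.
sub fetch {
    my ( $self, $size ) = @_;
    my $fh = $self->{fh};
    my @records;
    while ( @records < $size ) {
        my $line = <$fh>;
        unless ( defined $line ) {
            $self->{exhausted} = 1;
            last;
        }
        chomp $line;
        push @records, [ split /;/, $line, -1 ];
    }
    $self->{result} = \@records;
    return scalar @records;
}

package main;

my $data   = "id;name\n1;foo\n2;bar\n3;baz\n";
my $reader = My::LineReader->new( source => \$data );    # in-memory file
$reader->setup;
until ( $reader->{exhausted} ) {
    my $count = $reader->fetch(2);
    print "fetched $count record(s)\n";
}
```

A real subclass would use Mojo::Base with the base class as shown in the SYNOPSIS and rely on the attributes documented above instead of a plain hash.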