NAME
Spreadsheet::Compare::Single - Module for comparing two spreadsheet datasets
SYNOPSIS
use Spreadsheet::Compare::Single;
my $single = Spreadsheet::Compare::Single->new(%args);
my $result = $single->compare();
DESCRIPTION
Spreadsheet::Compare::Single analyses differences between two similar record sets according to a defined configuration set.
ATTRIBUTES
All attributes return the object on setting and the current value if called without parameter.
$single->attr($value);
my $value = $single->attr;
They will usually be set by Spreadsheet::Compare, after reading the values from a config file.
allow_duplicates
possible values: 0|1
default: 0
Try to match identical records even when a unique identity cannot be constructed. This can significantly increase compare times on large datasets.
below_limit_is_equal
possible values: 0|1
default: 0
Normally differences that are inside configured limits will still be counted as differences (only marked visually as low priority). Setting below_limit_is_equal to a true value will result in the record counted as equal.
convert_numbers
possible values: 0|1
default: 0
Convert content that is treated as a numerical value to an actual numeric value (by simply adding 0). This is e.g. handy for having numerical data in Excel output instead of strings that look like numbers. This will not affect the optional 'All' report that can be created with the report_all_data option.
decimal_separator
possible values: <string>
default: '.'
Decimal separator for numerical values
diff_relative
possible values: <list of column names>
default: []
Report the relative instead of the absolute difference if "report_diff_row" in Spreadsheet::Compare::Reporter is set to a true value.
Example (as YAML config):
diff_relative: [2,3,4]
or
diff_relative:
- Price
- Quantity
digital_grouping_symbol
possible values: <string>
default: ','
Digital grouping symbol for numerical values
fetch_size
possible values: <integer>
default: 1000
When "is_sorted" is set, "fetch_size" determines the number of records fetched into memory at a time.
fetch_limit
possible values: <integer>
default: 0
When "is_sorted" is set, "fetch_limit" determines the number of fetches (of size "fetch_size") before the comparison stops. This is useful during setup with large datasets where you may have columns that are different for every row and that you better add to the ignore list. Just remember to unset this value once you are done.
ignore
possible values: <list of columns>
default: empty list
Columns to ignore while comparing data. If "header" in Spreadsheet::Compare::Reader is set the column names have to be used. Else use the zero based column number.
ignore_strings
possible values: 0|1
default: 0
Only compare numerical data. This skips comparisons where both sides are not considered to be numerical values. This depends on the setting for "decimal_separator" and "digital_grouping_symbol"
is_sorted
possible values: 0|1
default: 0
Assume data is sorted by identity. This is needed for fetching data in smaller batches (see "fetch_size") to use less memory.
left
possible values: <string>
default: 'left'
Name for the input on the left side of the comparison. Used for reporting.
limit_abs
possible values: <number or key/value pairs>
default: 0
Single value or one entry per column for specifying absolute tolorance intervals. Differences inside the tolerance interval will be counted and reported separately from differences outside of it. The default value of 0 means no tolerance limit, the value 'none' skips the limit check with the side effect that the deviation will not be considered in statistics output (column with highest absolute deviation). The special key '__default__' can be used to set a default for all (numerical) columns, and subsequently setting a different limit on selected columns.
Example (as YAML config):
limit_abs: 0.01
or
limit_abs:
__default__: 0.01
Price: 0.0001
Quantity: 1
Size: none
limit_rel
possible values: <number or keys/values>
default: undef
Single value or one entry per column for specifying relative tolerance intervals (decimal value, not a percentage). Differences inside the tolerance interval will be counted and reported separately from differences outside of it. The default value of 0 means no tolerance limit, the value 'none' skips the limit check with the side effect that the deviation will not be considered in statistics output (column with highest relative deviation). The special key '__default__' can be used to set a default for all (numerical) columns, and subsequently setting a different limit on selected columns.
limit_rel: 0.01
or
limit_rel:
__default__: 0.1
Price: 0.01
Quantity: 1
Size: none
readers
This attribute cannot be set from a config file.
possible values: <list of exactly 2 Reader objects>
default: []
The readers have to be two objects of Spreadsheet::Compare::Reader subclasses representing the left and the right side of the comparison.
right
possible values: <string>
default: 'right'
Name for the input on the right side of the comparison. Used for reporting.
title
possible values: <string>
default: ''
A title for the comparison
CONSTRUCTOR
new
my $single = Spreadsheet::Compare::Single->new(%attributes);
my $single = Spreadsheet::Compare::Single->new(\%attributes);
or
my $single = Spreadsheet::Compare::Single->new
->title('Regression Test 1')
->readers([$r_left, $r_right]);
Construct a new Spreadsheet::Compare::Single object. All comparison attributes can be given to the constructor or set individualy via their chainable set methods.
METHODS
compare
Run all configured tests of the run configuration. Return a hashref with counters:
{
add => <number of additional records on the right>,
diff => <number of found differences>,
dup => <number of duplicate rows (maximum of left and right)>,
left => <number of records on the left>,
limit => <number of record with differences below set ste limits>,
miss => <number of records missing on the right>,
right => <number of records on the right>,
same => <number of identical records>,
}
Before running compare the readers have to be set.
EVENTS
Spreadsheet::Compare::Single is a Mojo::EventEmitter.
The reporting events correspond to the methods that are implemented by sublasses of Spreadsheet::Compare::Reporter. Spreadsheet::Compare will subscribe to the events and call the methods.
Spreadsheet::Compare::Single emits the following events
add_stream
$single->on(add_stream => sub ($obj, $name) {
say "new stream $name for ", $obj->title;
});
Reporting event. Signaling that a new reporting stream should be created and will be later referenced for reporting data lines.
The possible stream names are 'Differences', 'Missing', 'Additional', 'Duplicates' and 'All'.
after_fetch
$single->on(after_fetch => sub ($obj) {
say "next fetch for ", $obj->title;
});
Emitted directly after a fetch from the readers.
counters
require Data::Dumper;
$single->on(counters => sub ($obj, $counters) {
say "next fetch for ", $obj->title, ":", Dumper($counters);
});
Emitted for every handled record. Don't rely on the numbers of calls, but on the content of the %$counters hash if you want to know how many lines where actually read from the readers. This can be used for progress reporting.
final_counters
require Data::Dumper;
$single->on(final_counters => sub ($obj, $counters) {
say "next fetch for $title:", Dumper($counters);
});
Emitted after completing a single comparison.
mark_header
$single->on(mark_header => sub ($obj, 'Differences', $mask) {
# mark columns
});
Reporting event. Emitted after completing a single comparison with a mask describing which columns had differences (key:column_index, value:false/true)
write_fmt_row
$single->on(write_fmt_row => sub ($obj, 'Differences', $record) {
# write record to stream;
});
Reporting event. Write a formatted record to the 'Differences' output stream The record is an Spreadsheet::Compare::Record and will contain information about the differences found (see "limit_mask" in Spreadsheet::Compare::Record).
write_header
$single->on(write_header => sub ($obj, $stream) {
# write header for stream;
});
Reporting event. Write the header for an output stream.
write_row
$single->on(write_row => sub ($obj, $stream, $record) {
# write record to stream;
});
Reporting event. Write a default formatted record to an output stream. The record is an Spreadsheet::Compare::Record