NAME
Spreadsheet::Reader::Format::ParseExcelFormatStrings - Convert Excel format strings to code
SYNOPSYS
#!/usr/bin/env perl
package MyPackage;
use Moose;
use lib '../../../../lib';
extends 'Spreadsheet::Reader::Format::FmtDefault';
with 'Spreadsheet::Reader::Format::ParseExcelFormatStrings';
package main;
my $parser = MyPackage->new( epoch_year => 1904 );
my $conversion = $parser->parse_excel_format_string( '[$-409]dddd, mmmm dd, yyyy;@' );
print 'For conversion named: ' . $conversion->name . "\n";
for my $unformatted_value ( '7/4/1776 11:00.234 AM', 0.112311 ){
print "Unformatted value: $unformatted_value\n";
print "..coerces to: " . $conversion->assert_coerce( $unformatted_value ) . "\n";
}
###########################
# SYNOPSIS Screen Output
# 01: For conversion named: DATESTRING
# 02: Unformatted value: 7/4/1776 11:00.234 AM
# 03: ..coerces to: Thursday, July 04, 1776
# 04: Unformatted value: 0.112311
# 05: ..coerces to: Monday, January 01, 1900
###########################
DESCRIPTION
This is the parser that converts Excel custom format strings into code that can be used to transform values into output matching the form defined by the format string. The goal of this code is to support as much as possible the definition of excel custom format strings . If you find cases where this parser and the Excel definition or excecution differ please log a case in github.
This parser converts the format strings to Type::Tiny objects that have the appropriate built in coercions. Any replacement of this engine for use with Spreadsheet::Reader::Format and Spreadsheet::Reader::ExcelXML must output objects that have the methods 'display_name' and 'assert_coerce'. 'display_name' is used by the overall package to determine the cell type and should return a unique name containing an indication of the output data type with either 'DATE' or 'NUMBER' in the name. Otherwise the cell type is assumed to be text. Spreadsheet::Reader::ExcelXML uses 'assert_coerce' as the method to transform the raw value to the formatted value.
Excel format strings can have up to four parts separated by semi-colons. The four parts are positive, zero, negative, and text. In the Excel application the text section is just a pass through. This is how excel handles dates earlier than 1900sh . This parser deviates from that for dates. Since this parser provides code that parses Excel date numbers into a DateTime object (and then potentially back to a differently formatted string) it also attempts to parse strings to DateTime objects if the cell has a date format applied. All other types of Excel number conversions still treat strings as a pass through.
To replace this module just build a Moose::Role that delivers the method parse_excel_format_string and get_defined_conversion. See the documentation for the format interface to integrate into the package.
Caveat Utilitor
The decimal (real number) to fraction conversion implementation here is processing intensive. I erred on the side of accuracy over speed. While I tried my best to provide equivalent accuracy to the Excel output I was unable to duplicate the results in all cases. In those cases this package provides a more precise result than Excel. If you are experiencing delays when reading fraction formatted values then this package is a place to investigate. In order to get the most accurate answer this parser initially uses the continued fraction algorythm to calculate a possible fraction for the pased $decimal value with the setting of 20 max iterations and a maximum denominator width defined by the format string. If that does not resolve satisfactorily it then calculates -all- over/under numerators with decreasing denominators from the maximum denominator (based on the format string) all the way to the denominator of 2 and takes the most accurate result. There is no early-out available in this computation so if you reach this point for multi digit denominators things slow down. (Not that continued fractions are computationally so cheap.) However, dual staging the calculation this way yields either the same result as Excel or a more accurate result while providing a possible early out in the continued fraction portion. I was unable to even come close to Excel output otherwise. If you have a faster conversion or just want to opt out for specific cells without replacing this whole parser then use the worksheet method "set_custom_formats( $key => $format_object_or_string )" in Spreadsheet::Reader::ExcelXML::Worksheet. hint: $format_object_or_string = '@' will set a pass through.
requires
These are method(s) used by this role but not provided by the role. Any class consuming this role will not build without first providing this(ese) methods prior to loading this role.
get_defined_excel_format
Definition: Used to return the standard error string for a defined format position.
See "defined_excel_translations" in Spreadsheet::Reader::Format::FmtDefault
Methods
These are the methods provided by this role to whatever class or instance inherits this role. For additional ParseExcelFormatStrings options see the Attributes section.
parse_excel_format_string( $string, $name )
Definition: This is the method to convert Excel format strings into Type::Tiny objects with built in coercions. The type coercion objects are then used to convert unformatted values into formatted values using the assert_coerce method. Coercions built by this module allow for the format string to have up to four parts separated by semi-colons. These four parts correlate to four different data input ranges. The four parts are positive, zero, negative, and text. If three substrings are sent then the data input is split to (positive and zero), negative, and text. If two input types are sent the data input is split between numbers and text. One input type is a take all comers type with the exception of dates. When dates are built by this module it always adds a possible from-text conversion to process Excel pre-1900ish dates. This is because Excel does not record dates prior to 1900ish as numbers. All date unformatted values are then processed into and then potentially back out of DateTime objects. This requires "Chained Coercions" in Type::Tiny::Manual::Coercions. The two packages used for conversion to DateTime objects are DateTime::Format::Flexible and DateTimeX::Format::Excel.
Accepts: an Excel number format string and a conversion name stored in the Type::Tiny object. This package will auto-generate a name if none is given
Returns: a Type::Tiny object with type coercions and pre-filters set for each input type from the formatting string
get_defined_conversion( $position )
Definition: This is a helper method that combines the call to "get_defined_excel_format( $position )" in Spreadsheet::Reader::Format::FmtDefault and parse_excel_format_string above in order to get the final result with one call.
Accepts: an Excel default format position
Returns: a Type::Tiny object with type coercions and pre-filters set for each input type from the formatting string
Attributes
Data passed to new when creating a class or instance containing this role. For modification of these attributes see the listed 'attribute methods'. For more information on attributes see Moose::Manual::Attributes.
workbook_inst
Definition: This role works better if it has access to two workbook methods there are defaults built in if the workbook is not connected but the package no longer responds dynamically when that connection is broken. This instance is a way for this role to see those settings.
Required: No but it's really nice
Range: an instance of the Spreadsheet::Reader::ExcelXML class
attribute methods Methods provided to adjust this attribute
set_workbook_inst( $instance )
Definition: sets the workbook instance
delegated methods Methods provided from the object stored in the attribute
method_name => method_delegated_from_link
set_error( $error_string ) => "set_error" in Spreadsheet::Reader::ExcelXML
get_epoch_year => "get_epoch_year" in Spreadsheet::Reader::ExcelXML
cache_formats
Definition: In order to save re-building the coercion each time they are requested, the built coercions can be cached with the format string as the key. This attribute sets whether caching is turned on or not. In rare cases with lots of unique formats this would allow a reduction in RAM consumtion at the price of speed.
Range: Boolean
Default: 1 = caching is on
attribute methods Methods provided to adjust this attribute
get_cache_behavior
Definition: returns the state of the attribute
set_cache_behavior( $bool )
Definition: sets the value of the attribute to $Bool
Range: Boolean 1 = cache formats, 0 = Don't cache formats
datetime_dates
Definition: It may be that you desire the full DateTime object as output rather than the finalized datestring when converting unformatted date data to formatted date data. This attribute sets whether data coersions are built to do the full conversion or just to a DateTime object in return.
Default: 0 = unformatted values are coerced completely to date strings (1 = stop at DateTime objects)
attribute methods Methods provided to adjust this attribute.
get_date_behavior
Definition: returns the value of the attribute
set_date_behavior( $bool )
Definition: sets the attribute value (only new coercions are affected)
Accepts: Boolean values
Delegated to the workbook class: yes
european_first
Definition: This is a way to check for DD-MM-YY formatting of inbound (read from the file) date stringsprior to checking for MM-DD-YY. Since the package always checks both ways when the number is ambiguous the goal is to catch data where the substring for DD < 13 and assign it correctly.
Default: 0 = MM-DD-YY[YY] is tested first
attribute methods Methods provided to adjust this attribute
get_european_first
Definition: returns the value of the attribute
set_european_first( $bool )
Definition: sets the value of the attribute
Range: Boolean 0 = MM-DD-YY is tested first, 1 = DD-MM-YY is tested first
SUPPORT
TODO
1. Attempt to merge _split_decimal_integer and _integer_and_decimal
AUTHOR
COPYRIGHT
This program is free software; you can redistribute it and/or modify it under the same terms as Perl itself.
The full text of the license can be found in the LICENSE file included with this module.
This software is copyrighted (c) 2016 by Jed Lund
DEPENDENCIES
SEE ALSO
Spreadsheet::ParseExcel - Excel 2003 and earlier
Spreadsheet::XLSX - 2007+
Spreadsheet::ParseXLSX - 2007+
All lines in this package that use Log::Shiras are commented out