NAME
Spreadsheet::XLSX::Reader::LibXML::ParseExcelFormatStrings - Convert Excel format strings to code
SYNOPSYS
See the "SYNOPSYS" in Spreadsheet::XLSX::Reader::LibXML::FormatInterface
DESCRIPTION
This is the parser that converts Excel custom format strings into code that can be used to transform values into output matching the form defined by the format string. The goal of this code is to support as much as possible the definition of excel custom format strings . If you find cases where this parser and the Excel definition differ please log a case in github.
This parser converts the format strings to Type::Tiny objects that have the appropriate built in coercions. Any replacement of this engine for use with Spreadsheet::XLSX::Reader::LibXML must output code that has the methods 'display_name' and 'assert_coerce'. Display name is used by the overall package to determine the cell type and should return a name containing the work date or number for date and number types othewise the cell type is assumed to be text. The package uses 'assert_coerce' as the method to transform the raw value to the formatted value. Excel defines the format strings as number conversions only (They do not act on text). Excel format strings can have up to four parts separated by semi-colons. The four parts are positive, zero, negative, and text. In Excel the text section is just a pass through. This is how excel handles dates earlier than 1900sh. This parser deviates from that for dates. Since this parser provides code that parses Excel date numbers into a DateTime object (and then potentially back to a differently formatted string) it also attempts to parse strings to DateTime objects if the cell has a date format applied. All other types of Excel number conversions still treat strings as a pass through.
To replace this module just build a Moose::Role that delivers the method parse_excel_format_string and get_defined_conversion. See the documentation for the format interface to integrate it in the package formatter .
For an explanation of functionality for a fully built Formatter class see the documentation for Spreadsheet::XLSX::Reader::LibXML::FormatInterface. This will include the class Spreadsheet::XLSX::Reader::LibXML::FmtDefault.
Caveat Utilitor
The decimal (real number) to fraction conversion implementation here is processing intensive. I erred on the side of accuracy over speed. While I tried my best to provide equivalent accuracy to the Excel output I was unable to duplicate the results in all cases. In those cases this package provides a more precise result than Excel. If you are experiencing delays when reading fraction formatted values then this package is a place to investigate. In order to get the most accurate answer this parser initially uses the continued fraction algorythm to calculate a possible fraction for the pased $decimal value with the setting of 20 max iterations and a maximum denominator width defined by the format string. If that does not resolve satisfactorily it then calculates -all- over/under numerators with decreasing denominators from the maximum denominator (based on the format string) all the way to the denominator of 2 and takes the most accurate result. There is no early-out available in this computation so if you reach this point for multi digit denominators it is computationally intensive. (Not that continued fractions are computationally so cheap.) However, dual staging the calculation this way yields either the same result as Excel or a more accurate result while providing a possible early out in the continued fraction portion. I was unable to even come close to Excel output otherwise. If you have a faster conversion or just want to opt out for specific cells without replacing this whole parser then use the worksheet method "set_custom_formats( $key => $format_object_or_string )" in Spreadsheet::XLSX::Reader::LibXML::Worksheet. $format_object_or_string = '@' is a pass through.
requires
These are method(s) used by this role but not provided by the role. Any class consuming this role will not build without first providing this(ese) methods prior to loading this role.
get_defined_excel_format
Definition: Used to return the standard error string for a defined format position.
See "defined_excel_translations" in Spreadsheet::XLSX::Reader::LibXML::FmtDefault
Methods
These are the methods provided by this role to whatever class or instance inherits this role. For additional ParseExcelFormatStrings options see the Attributes section. See the documentation for "format_inst" in Spreadsheet::XLSX::Reader::LibXML to see which methods are delegated all the way to the workbook level.
parse_excel_format_string( $string, $name )
Definition: This is the method to convert Excel format strings into Type::Tiny objects with built in coercions. The type coercion objects are then used to convert unformatted values into formatted values using the assert_coerce method. Coercions built by this module allow for the format string to have up to four parts separated by semi-colons. These four parts correlate to four different data input ranges. The four parts are positive, zero, negative, and text. If three substrings are sent then the data input is split to (positive and zero), negative, and text. If two input types are sent the data input is split between numbers and text. One input type is a take all comers type with the exception of dates. When dates are built by this module it always adds a possible from-text conversion to process Excel pre-1900ish dates. This is because Excel does not record dates prior to 1900ish as numbers. All date unformatted values are then processed into and then potentially back out of DateTime objects. This requires "Chained Coercions" in Type::Tiny::Manual::Coercions. The two packages used for conversion to DateTime objects are DateTime::Format::Flexible and DateTimeX::Format::Excel.
Accepts: an Excel number format string and a conversion name stored in the Type::Tiny object. This package will auto-generate a name if none is given
Returns: a Type::Tiny object with type coercions and pre-filters set for each input type from the formatting string
get_defined_conversion( $position )
Definition: This is a helper method that combines the call to "get_defined_excel_format( $position )" in Spreadsheet::XLSX::Reader::LibXML::FmtDefault and parse_excel_format_string above in order to get the final result with one call.
Accepts: an Excel default format position
Returns: a Type::Tiny object with type coercions and pre-filters set for each input type from the formatting string
Attributes
Data passed to new when creating a class or instance containing this role. For modification of these attributes see the listed 'attribute methods'. For more information on attributes see Moose::Manual::Attributes. See the documentation for "format_inst" in Spreadsheet::XLSX::Reader::LibXML to see which attribute methods are delegated all the way to the workbook level.
workbook_inst
Definition: This role needs visibility to several attributes held at the workbook level. This instance is a way for this role to see those settings.
Required: Since the only way this package is used is when it is installed in the workbook object as an instance the workbook itself will add this instance when it recieves the formatter instance.
Range: an instance of the Spreadsheet::XLSX::Reader::LibXML class
attribute methods Methods provided to adjust this attribute
set_workbook_inst( $instance )
Definition: sets the workbook instance
delegated methods Methods provided from the object stored in the attribute
method_name => method_delegated_from_link
error => "error" in Spreadsheet::XLSX::Reader::LibXML
set_error( $error_string ) => "set_error" in Spreadsheet::XLSX::Reader::LibXML
clear_error => "clear_error" in Spreadsheet::XLSX::Reader::LibXML
get_epoch_year => "get_epoch_year" in Spreadsheet::XLSX::Reader::LibXML
cache_formats
Definition: In order to save re-building the coercion each time they are requested, the built coercions can be cached with the format string as the key. This attribute sets whether caching is turned on or not.
Rang: Boolean
Default: 1 = caching is on
attribute methods Methods provided to adjust this attribute
get_cache_behavior
Definition: returns the state of the attribute
set_cache_behavior( $Bool )
Definition: sets the value of the attribute to $Bool
Range: Boolean 1 = cache formats, 0 = Don't cache formats
datetime_dates
Definition: It may be that you desire the full DateTime object as output rather than the finalized datestring when converting unformatted date data to formatted date data. This attribute sets whether data coersions are built to do the full conversion or just to a DateTime object in return.
Default: 0 = unformatted values are coerced completely to date strings (1 = stop at DateTime)
attribute methods Methods provided to adjust this attribute.
get_date_behavior
Definition: returns the value of the attribute
set_date_behavior( $Bool )
Definition: sets the attribute value (only new coercions are affected)
Accepts: Boolean values
Delegated to the workbook class: yes
european_first
Definition: This is a way to check for DD-MM-YY formatting of string dates prior to checking for MM-DD-YY. Since this checks both ways the goal is to catch ambiguous data where the substring for DD < 13 and assign it correctly.
Default: 0 = MM-DD-YY[YY] is tested first
attribute methods Methods provided to adjust this attribute
get_european_first
Definition: returns the value of the attribute
set_european_first( $Bool )
Definition: sets the value of the attribute
Range: Boolean 0 = MM-DD-YY is tested first, 1 = DD-MM-YY is tested first
SUPPORT
TODO
1. Attempt to merge _split_decimal_integer and _integer_and_decimal
AUTHOR
COPYRIGHT
This program is free software; you can redistribute it and/or modify it under the same terms as Perl itself.
The full text of the license can be found in the LICENSE file included with this module.
This software is copyrighted (c) 2014, 2015 by Jed Lund
DEPENDENCIES
version 0.77
Carp - confess
Type::Tiny - 1.000
DateTimeX::Format::Excel - 0.012
Clone - clone
Spreadsheet::XLSX::Reader::LibXML::Types
requires;
get_excel_region
set_error
get_defined_excel_format
SEE ALSO
Spreadsheet::ParseExcel - Excel 2003 and earlier
Spreadsheet::XLSX - 2007+
Spreadsheet::ParseXLSX - 2007+
All lines in this package that use Log::Shiras are commented out