NAME
Spreadsheet::XLSX::Reader::LibXML::ParseExcelFormatStrings - Parser of XLSX format strings
SYNOPSYS
See the "SYNOPSYS" in Spreadsheet::XLSX::Reader::LibXML::FmtDefault
DESCRIPTION
To use the general package for excel parsing out of the box please review the documentation for Workbooks , Worksheets , and Cells
This is a general purpose Moose Role that will convert Excel format strings into Type::Tiny objects in order to implement the conversion defined by the format string. Excel defines the format strings as number conversions only (They do not act on text). Excel format strings can have up to four parts separated by semi-colons. The four parts are positive, zero, negative, and text. In Excel the text section is just a pass through. This is how excel handles dates earlier than 1900sh. This parser deviates from that for dates. Since this parser parses dates into a DateTime objects (and then potentially back to a differently formatted string) it also attempts to parse strings to DateTime objects if the cell has a date format applied. All other types of Excel number conversions still treat strings as a pass through.
To replace this module just build a Moose::Role that delivers the method parse_excel_format_string and get_defined_conversion. Then use it when building a replacement for Spreadsheet::XLSX::Reader::LibXML::FmtDefault.
The decimal (real number) to fractions conversion can be top heavy to build. If you are experiencing delays when reading values then this is another place to investigate. In order to get the most accurate answer this parser initially uses the continued fraction algorythm to calculate a possible fraction for the pased $decimal value with the setting of 20 max iterations and a maximum denominator width defined by the format string. If that does not resolve satisfactorily it then calculates an over/under numerator with decreasing denominators from the maximum denominator (based on the format string) all the way to the denominator of 2 and takes the most accurate result. There is no early-out set in this computation so if you reach this point for multi digit denominators it is computationally intensive. (Not that continued fractions are computationally so cheap.). However, doing the calculation this way generally yields the same result as Excel. In some few cases the result is more accurate. I was unable to duplicate the results from Excel exactly (or even come close otherwise). If you have a faster conversion then implemenation of the speed-up can be acheived by substituting the fraction coercion using "set_custom_formats( { $key => $conversion } )" in Spreadsheet::XLSX::Reader::LibXML::GetCell
requires
These are method(s) used by this role but not provided by the role. Any class consuming this role will not build without first providing these methods prior to loading this role.
get_excel_region
Definition: Used to return the two letter region ID. This ID is then used by DateTime::Format::Flexible to interpret date strings. Currently this method is provided by Spreadsheet::XLSX::Reader::LibXML::FmtDefault and (potentially) reset when that instance is loaded to the parser.
set_error
Definition: Used to set the error string in a shared error instance.
get_defined_excel_format
Definition: Used to return the default error string for a defined position.
See "defined_excel_translations" in Spreadsheet::XLSX::Reader::LibXML::FmtDefault
Primary Methods
These are the primary ways to use this Role. For additional ParseExcelFormatStrings options see the Attributes section.
parse_excel_format_string( $string, $name )
Definition: This is the method to convert Excel format strings into Type::Tiny objects with built in coercions. The type coercion objects are then used to convert unformatted values into formatted values using the assert_coerce method. Coercions built by this module allow for the format string to have up to four parts separated by semi-colons. These four parts correlate to four different data input ranges. The four parts are positive, zero, negative, and text. If three substrings are sent then the data input is split to (positive and zero), negative, and text. If two input types are sent the data input is split between numbers and text. One input type is a take all comers type with the exception of dates. When dates are built by this module it always adds a possible from-text conversion to process Excel pre-1900ish dates. This is because Excel does not record dates prior to 1900ish as numbers. All date unformatted values are then processed into and then potentially back out of DateTime objects. This requires "Chained Coercions" in Type::Tiny::Manual::Coercions. The two packages used for conversion to DateTime objects are DateTime::Format::Flexible and DateTimeX::Format::Excel.
Accepts: an Excel number format string and a conversion name stored in the Type::Tiny object. This package will auto-generate a name if none is given
Returns: a Type::Tiny object with type coercions and pre-filters set for each input type from the formatting string
Delegated to the workbook class: yes
get_defined_conversion( $position )
Definition: This is a helper method that combines the call to "get_defined_excel_format( $position )" in Spreadsheet::XLSX::Reader::LibXML::FmtDefault and parse_excel_format_string above in order to get all the information with one request.
Accepts: an Excel format position
Returns: a Type::Tiny object with type coercions and pre-filters set for each input type from the formatting string
Delegated to the workbook class: no
Attributes
Data passed to new when creating the Spreadsheet::XLSX::Reader::LibXML::FmtDefault instance. For modification of these attributes see the listed 'attribute methods'. For more information on attributes see Moose::Manual::Attributes. Most of these are not exposed to the top level of Spreadsheet::XLSX::Reader::LibXML.
epoch_year
Definition: This is the epoch year in the Excel sheet. It differentiates between Windows and Apple Excel implementations. For more information see DateTimeX::Format::Excel. It is generally (re)set by the workbook when the formatter instance is passed to the workbook.
Default: 1900
Range: 1900 or 1904
attribute methods Methods provided to adjust this attribute
get_epoch_year
Definition: returns the value of the attribute
Delegated to the workbook class: no
set_epoch_year
Definition: sets the value of the attribute
Delegated to the workbook class: no
datetime_dates
Definition: It may be that you desire the full DateTime object as output rather than the finalized datestring when converting unformatted date data to formatted date data. This attribute sets whether data coersions are built to do the full conversion or just to a DateTime object level. It is generally (re)set by the workbook when the formatter instance is passed to the workbook.
Default: 0 = unformatted values are coerced completely to date strings (1 = stop at DateTime)
attribute methods Methods provided to adjust this attribute.
get_date_behavior
Definition: returns the value of the attribute
Delegated to the workbook class: yes
set_date_behavior( $Bool )
Definition: sets the attribute value (only new coercions are affected)
Accepts: Boolean values
Delegated to the workbook class: yes
cache_formats
Definition: In order to save re-building the coercion each time they are used, the built coercions can be cached with the format string as the key. This attribute sets whether caching is turned on or not.
Default: 1 = caching is on
attribute methods Methods provided to adjust this attribute
get_cache_behavior
Definition: returns the value of the attribute
Delegated to the workbook class: inherited
set_cache_behavior
Definition: sets the value of the attribute
Range: Boolean 1 = cache formats, 0 = Don't cache formats
Delegated to the workbook class: inherited
SUPPORT
TODO
1. Attempt to merge _split_decimal_integer and _integer_and_decimal
AUTHOR
COPYRIGHT
This program is free software; you can redistribute it and/or modify it under the same terms as Perl itself.
The full text of the license can be found in the LICENSE file included with this module.
This software is copyrighted (c) 2014, 2015 by Jed Lund
DEPENDENCIES
version 0.77
Carp - confess
Type::Tiny - 1.000
DateTimeX::Format::Excel - 0.012
Clone - clone
Spreadsheet::XLSX::Reader::LibXML::Types
requires;
get_excel_region
set_error
get_defined_excel_format
SEE ALSO
Spreadsheet::ParseExcel - Excel 2003 and earlier
Spreadsheet::XLSX - 2007+
Spreadsheet::ParseXLSX - 2007+
All lines in this package that use Log::Shiras are commented out