NAME
Spreadsheet::XLSX::Reader::LibXML::ParseExcelFormatStrings - Parser of XLSX format strings
SYNOPSIS
#!/usr/bin/env perl
package MyPackage;
use Moose;
with 'Spreadsheet::XLSX::Reader::LibXML::FmtDefault';
# call 'with' a second time to ensure that the prior methods are recorded
with 'Spreadsheet::XLSX::Reader::LibXML::ParseExcelFormatStrings';
package main;
my $parser = MyPackage->new( epoch_year => 1904 );
my $conversion = $parser->parse_excel_format_string( '[$-409]dddd, mmmm dd, yyyy;@' );
print 'For conversion named: ' . $conversion->name . "\n";
for my $unformatted_value ( '7/4/1776 11:00.234 AM', 0.112311 ){
    print "Unformatted value: $unformatted_value\n";
    print "..coerces to: " . $conversion->assert_coerce( $unformatted_value ) . "\n";
}
###########################
# SYNOPSIS Screen Output
# 01: For conversion named: DATESTRING_0
# 02: Unformatted value: 7/4/1776 11:00.234 AM
# 03: ..coerces to: Thursday, July 04, 1776
# 04: Unformatted value: 0.112311
# 05: ..coerces to: Friday, January 01, 1904
###########################
DESCRIPTION
This documentation is written to explain ways to use this module when writing your own Excel parser or extending this package. To use the general package for Excel parsing out of the box, please review the documentation for Workbooks, Worksheets, and Cells.
This is a general purpose Moose Role that converts Excel format strings into Type::Tiny objects in order to implement the conversion defined by the format string. Excel defines the format strings as number conversions only (they do not act on text). Excel format strings can have up to four parts separated by semicolons. The four parts are positive, zero, negative, and text. In Excel the text section is just a pass-through, which is how Excel handles dates earlier than 1900ish. This parser deviates from that for dates. Since this parser parses dates into DateTime objects (and then potentially back to a differently formatted string), it also attempts to parse strings to DateTime objects if the cell has a date format applied. All other types of Excel number conversions still treat strings as a pass-through.
To replace this module, just build a Moose::Role that delivers the method parse_excel_format_string. Then set the attribute "format_string_parser" in Spreadsheet::XLSX::Reader::LibXML with the new role name.
The decimal (real number) to fraction conversion can be expensive to build. If you are experiencing delays when reading values then this is another place to investigate. In order to get the most accurate answer this parser initially uses the continued fraction algorithm to calculate a possible fraction for the passed $decimal value, with a setting of 20 max iterations and a maximum denominator width defined by the format string. If that does not resolve satisfactorily it then calculates an over/under numerator with decreasing denominators from the maximum denominator (based on the format string) all the way down to a denominator of 2 and takes the most accurate result. There is no early-out set in this computation, so if you reach this point for multi-digit denominators it is computationally intensive. (Not that continued fractions are computationally cheap either.) However, doing the calculation this way generally yields the same result as Excel, and in a few cases the result is more accurate. I was unable to duplicate the results from Excel exactly (or even come close otherwise). The speed-up can be achieved by substituting the fraction coercion using set_custom_formats.
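The two-stage search described above can be sketched in core Perl. This is a minimal illustration and not the module's own code: the function names continued_fraction and over_under_fraction, the 1e-12 cut-off, and the tie-breaking rule (the smaller denominator wins) are all assumptions made for the example. Only the limit of 20 iterations and the idea of a maximum denominator taken from the format string come from the description.

```perl
use strict;
use warnings;
use POSIX qw( floor );

# Stage 1: walk the continued fraction convergents of $decimal, stopping
# after $max_iterations terms or before the denominator exceeds the limit.
sub continued_fraction {
    my ( $decimal, $max_iterations, $max_denominator ) = @_;
    my ( $num_prev, $num ) = ( 0, 1 );
    my ( $den_prev, $den ) = ( 1, 0 );
    my $x = $decimal;
    for ( 1 .. $max_iterations ) {
        my $whole    = floor( $x );
        my $num_next = $whole * $num + $num_prev;
        my $den_next = $whole * $den + $den_prev;
        last if $den_next > $max_denominator;
        ( $num_prev, $num ) = ( $num, $num_next );
        ( $den_prev, $den ) = ( $den, $den_next );
        my $remainder = $x - $whole;
        last if $remainder < 1e-12;    # assumed cut-off for "exact enough"
        $x = 1 / $remainder;
    }
    return ( $num, $den );
}

# Stage 2: for every denominator from the maximum down to 2, try the
# numerator just under and just over the target and keep the closest.
# There is no early exit, which is why this stage is the slow one.
sub over_under_fraction {
    my ( $decimal, $max_denominator ) = @_;
    my ( $best_num, $best_den, $best_error );
    for my $den ( reverse 2 .. $max_denominator ) {
        my $under = floor( $decimal * $den );
        for my $num ( $under, $under + 1 ) {
            my $error = abs( $decimal - $num / $den );
            # Assumption: on a tie the later (smaller) denominator wins
            next if defined $best_error and $error > $best_error;
            ( $best_num, $best_den, $best_error ) = ( $num, $den, $error );
        }
    }
    return ( $best_num, $best_den );
}

# Two denominator digits, as in a format string like '# ??/??'
my $max_denominator = 10**2 - 1;
my ( $num, $den ) = continued_fraction( 3.14159265, 20, $max_denominator );
print "continued fraction: $num/$den\n";
( $num, $den ) = over_under_fraction( 0.75, 9 );
print "over/under search:  $num/$den\n";
```

The second stage visits every denominator, so its cost grows with each extra denominator digit in the format string. That is the reason a custom fraction coercion may be worth substituting when speed matters.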
requires
These are method(s) used by this role but not provided by the role. Any class consuming this role will not build without first providing these methods prior to loading this role.
get_excel_region
Definition: Used to return the two letter region ID. This ID is then used by DateTime::Format::Flexible to interpret date strings. Currently this method is provided by Spreadsheet::XLSX::Reader::LibXML::FmtDefault when it is loaded as a role of Spreadsheet::XLSX::Reader::LibXML::Styles.
Primary Methods
These are the primary ways to use this Role. For additional ParseExcelFormatStrings options see the Attributes section.
parse_excel_format_string( $string )
Definition: This is the method that converts Excel format strings into Type::Tiny objects with built-in coercions. The type coercion objects are then used to convert unformatted values into formatted values using the assert_coerce method. Coercions built by this module allow for the format string to have up to four parts separated by semicolons. These four parts correlate to four different data input ranges: positive, zero, negative, and text. If three substrings are sent then the data input is split into (positive and zero), negative, and text. If two substrings are sent the data input is split between numbers and text. A single substring takes all comers, with the exception of dates. When dates are built by this module it always adds a possible from-text conversion to process Excel pre-1900ish dates, because Excel does not record dates prior to 1900ish as numbers. All unformatted date values are then processed into, and then potentially back out of, DateTime objects. This requires "Chained Coercions" in Type::Tiny::Manual::Coercions. The two packages used for conversion to DateTime objects are DateTime::Format::Flexible and DateTimeX::Format::Excel.
Accepts: an Excel number format string
Returns: a Type::Tiny object with type coercions and pre-filters set for each input type from the formatting string
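As a rough illustration of the section handling described above, the sketch below splits a format string on semicolons and reports which section a value would use when the string has one or two sections. It is not the module's parser: split_format_sections and pick_section are hypothetical helpers, the split ignores semicolons inside quoted literals, and the three- and four-section cases and the special date handling are left out.

```perl
use strict;
use warnings;
use Scalar::Util qw( looks_like_number );

# Split a format string into its sections (naive: quoted or escaped
# semicolons are not recognised in this sketch).
sub split_format_sections {
    my ( $format_string ) = @_;
    my @sections = split /;/, $format_string, -1;
    die "Excel format strings have at most four sections\n" if @sections > 4;
    return @sections;
}

# Choose the section for a value. One section takes everything;
# two sections split the input between numbers and text.
sub pick_section {
    my ( $value, @sections ) = @_;
    return $sections[0] if @sections == 1;
    return looks_like_number( $value ) ? $sections[0] : $sections[1]
        if @sections == 2;
    die "Only one- and two-section strings are handled in this sketch\n";
}

my @sections = split_format_sections( '[$-409]dddd, mmmm dd, yyyy;@' );
print scalar( @sections ), " sections\n";
print "number uses: ", pick_section( 0.112311, @sections ), "\n";
print "text uses:   ", pick_section( 'hello',  @sections ), "\n";
```

In the real role the chosen section is compiled into a coercion rather than returned as a string, and date formats additionally accept text input.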
Attributes
Data passed to new when creating the Styles instance. For modification of these attributes see the listed 'attribute methods'. For more information on attributes see Moose::Manual::Attributes. Most of these are not exposed to the top level of Spreadsheet::XLSX::Reader::LibXML.
epoch_year
Definition: This is the epoch year in the Excel sheet. It differentiates between Windows and Apple Excel implementations. For more information see DateTimeX::Format::Excel. It is generally set by the workbook when the workbook first opens.
Default: 1900
attribute methods Methods provided to adjust this attribute
get_epoch_year
Definition: returns the value of the attribute
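The effect of the epoch year can be seen with some plain date arithmetic. The sketch below is only an illustration in core Perl and is not how the module performs the conversion (the module relies on DateTimeX::Format::Excel); excel_serial_to_ymd is a hypothetical helper. The two offsets are the serial numbers that correspond to 1970-01-01 in each system, and 1900-system serials below 61 are rejected because of Excel's fictitious 29 February 1900.

```perl
use strict;
use warnings;
use POSIX qw( floor strftime );

# Serial number of 1970-01-01 under each Excel epoch
my %unix_epoch_serial = ( 1900 => 25569, 1904 => 24107 );

sub excel_serial_to_ymd {
    my ( $serial, $epoch_year ) = @_;
    my $offset = $unix_epoch_serial{$epoch_year}
        or die "epoch_year must be 1900 or 1904\n";
    die "1900-system serials below 61 are not handled in this sketch\n"
        if $epoch_year == 1900 and $serial < 61;
    # Whole days relative to 1970-01-01; the time-of-day fraction is dropped.
    # Dates before 1970 need a gmtime that accepts negative epochs.
    my $days = floor( $serial ) - $offset;
    return strftime( '%Y-%m-%d', gmtime( $days * 86400 ) );
}

print excel_serial_to_ymd( 0.112311, 1904 ), "\n";
print excel_serial_to_ymd( 42000, 1900 ), "\n";
print excel_serial_to_ymd( 42000, 1904 ), "\n";
```

The same serial number lands 1462 days later under the 1904 epoch, which is why the workbook normally sets this attribute itself when it opens.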
datetime_dates
Definition: It may be that you desire the full DateTime object as output rather than the finalized date string when converting unformatted data to formatted date data. This attribute sets whether data coercions are built to do the full conversion or to stop at the DateTime object level.
Default: 0 = unformatted values are coerced completely to date strings (1 = stop at DateTime)
attribute methods Methods provided to adjust this attribute. They are both delegated all the way up to the workbook instance.
get_date_behavior
Definition: returns the value of the attribute
set_date_behavior( $Bool )
Definition: sets the attribute value (only new coercions are affected)
Accepts: Boolean values
cache_formats
Definition: In order to save re-building the coercion each time it is required, built coercions can be cached with the format string as the key. This attribute sets whether caching is turned on or not.
Default: 1 = caching is on
attribute methods Methods provided to adjust this attribute
get_cache_behavior
Definition: returns the value of the attribute
set_cache_behavior
Definition: sets the value of the attribute
Range: Boolean 1 = cache formats, 0 = Don't cache formats
format_cash
Definition: This is the format cache described in cache_formats. It stores pre-built formats for re-use.
Default: {}
attribute methods Methods provided to adjust this attribute
has_cached_format( $format_string )
Definition: returns true if the $format_string has a pre-built coercion already stored
set_cached_format( $format_string => $coercion )
Definition: sets the coercion object for the $format_string key. Consider working with "custom_formats" in Spreadsheet::XLSX::Reader::LibXML::Worksheet to manipulate format assignment instead of this method.
get_cached_format( $format_string )
Definition: gets the coercion object stored against the $format_string key
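The caching idea can be sketched with a plain hash keyed by the format string. This is an illustration built on assumptions rather than the module's implementation: build_coercion is a hypothetical stand-in for the real (and more expensive) coercion builder, coercion_for is a hypothetical wrapper, and the free-standing subs only mirror the names of the attribute methods listed above.

```perl
use strict;
use warnings;

my %format_cash;         # format string => built coercion
my $build_count = 0;     # how many times the builder actually ran

# Hypothetical stand-in for the expensive coercion builder
sub build_coercion {
    my ( $format_string ) = @_;
    $build_count++;
    return sub { "[$format_string] applied to $_[0]" };
}

sub has_cached_format { return exists $format_cash{ $_[0] } }
sub get_cached_format { return $format_cash{ $_[0] } }
sub set_cached_format {
    my ( $format_string, $coercion ) = @_;
    $format_cash{$format_string} = $coercion;
    return $coercion;
}

# Build once per format string when caching is on, every time when off
sub coercion_for {
    my ( $format_string, $cache_formats ) = @_;
    return build_coercion( $format_string ) unless $cache_formats;
    set_cached_format( $format_string => build_coercion( $format_string ) )
        unless has_cached_format( $format_string );
    return get_cached_format( $format_string );
}

coercion_for( '0.00', 1 ) for 1 .. 3;
print "Builds with caching on: $build_count\n";
```

With caching switched off the builder runs on every request, which is the cost the cache_formats attribute is meant to avoid.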
SUPPORT
TODO
1. Attempt to merge _split_decimal_integer and _integer_and_decimal
AUTHOR
COPYRIGHT
This program is free software; you can redistribute it and/or modify it under the same terms as Perl itself.
The full text of the license can be found in the LICENSE file included with this module.
This software is copyrighted (c) 2014, 2015 by Jed Lund
DEPENDENCIES
version 0.77
Carp - confess
Type::Tiny - 1.000
DateTimeX::Format::Excel - 0.012
Clone - clone
Spreadsheet::XLSX::Reader::LibXML::Types
requires;
get_excel_region
SEE ALSO
Spreadsheet::ParseExcel - Excel 2003 and earlier
Spreadsheet::XLSX - 2007+
Spreadsheet::ParseXLSX - 2007+
All lines in this package that use Log::Shiras are commented out