NAME

Spreadsheet::XLSX::Reader::LibXML::ParseExcelFormatStrings - Convert Excel format strings to code

SYNOPSYS

See the "SYNOPSYS" in Spreadsheet::XLSX::Reader::LibXML::FormatInterface

DESCRIPTION

This is the parser that converts Excel custom format strings into code that can be used to transform values into output matching the form defined by the format string. The goal of this code is to support as much as possible the definition of excel custom format strings . If you find cases where this parser and the Excel definition differ please log a case in github.

This parser converts the format strings to Type::Tiny objects that have the appropriate built in coercions. Any replacement of this engine for use with Spreadsheet::XLSX::Reader::LibXML must output code that has the methods 'display_name' and 'assert_coerce'. Display name is used by the overall package to determine the cell type and should return a name containing the work date or number for date and number types othewise the cell type is assumed to be text. The package uses 'assert_coerce' as the method to transform the raw value to the formatted value. Excel defines the format strings as number conversions only (They do not act on text). Excel format strings can have up to four parts separated by semi-colons. The four parts are positive, zero, negative, and text. In Excel the text section is just a pass through. This is how excel handles dates earlier than 1900sh. This parser deviates from that for dates. Since this parser provides code that parses Excel date numbers into a DateTime object (and then potentially back to a differently formatted string) it also attempts to parse strings to DateTime objects if the cell has a date format applied. All other types of Excel number conversions still treat strings as a pass through.

To replace this module just build a Moose::Role that delivers the method parse_excel_format_string and get_defined_conversion. See the documentation for the format interface to integrate it in the package formatter .

For an explanation of functionality for a fully built Formatter class see the documentation for Spreadsheet::XLSX::Reader::LibXML::FormatInterface. This will include the class Spreadsheet::XLSX::Reader::LibXML::FmtDefault.

Caveat Utilitor

The decimal (real number) to fraction conversion implementation here is processing intensive. I erred on the side of accuracy over speed. While I tried my best to provide equivalent accuracy to the Excel output I was unable to duplicate the results in all cases. In those cases this package provides a more precise result than Excel. If you are experiencing delays when reading fraction formatted values then this package is a place to investigate. In order to get the most accurate answer this parser initially uses the continued fraction algorythm to calculate a possible fraction for the pased $decimal value with the setting of 20 max iterations and a maximum denominator width defined by the format string. If that does not resolve satisfactorily it then calculates -all- over/under numerators with decreasing denominators from the maximum denominator (based on the format string) all the way to the denominator of 2 and takes the most accurate result. There is no early-out available in this computation so if you reach this point for multi digit denominators it is computationally intensive. (Not that continued fractions are computationally so cheap.) However, dual staging the calculation this way yields either the same result as Excel or a more accurate result while providing a possible early out in the continued fraction portion. I was unable to even come close to Excel output otherwise. If you have a faster conversion or just want to opt out for specific cells without replacing this whole parser then use the worksheet method "set_custom_formats( $key => $format_object_or_string )" in Spreadsheet::XLSX::Reader::LibXML::Worksheet. $format_object_or_string = '@' is a pass through.

requires

These are method(s) used by this role but not provided by the role. Any class consuming this role will not build without first providing this(ese) methods prior to loading this role.

get_defined_excel_format

Methods

These are the methods provided by this role to whatever class or instance inherits this role. For additional ParseExcelFormatStrings options see the Attributes section. See the documentation for "format_inst" in Spreadsheet::XLSX::Reader::LibXML to see which methods are delegated all the way to the workbook level.

parse_excel_format_string( $string, $name )

    Definition: This is the method to convert Excel format strings into Type::Tiny objects with built in coercions. The type coercion objects are then used to convert unformatted values into formatted values using the assert_coerce method. Coercions built by this module allow for the format string to have up to four parts separated by semi-colons. These four parts correlate to four different data input ranges. The four parts are positive, zero, negative, and text. If three substrings are sent then the data input is split to (positive and zero), negative, and text. If two input types are sent the data input is split between numbers and text. One input type is a take all comers type with the exception of dates. When dates are built by this module it always adds a possible from-text conversion to process Excel pre-1900ish dates. This is because Excel does not record dates prior to 1900ish as numbers. All date unformatted values are then processed into and then potentially back out of DateTime objects. This requires "Chained Coercions" in Type::Tiny::Manual::Coercions. The two packages used for conversion to DateTime objects are DateTime::Format::Flexible and DateTimeX::Format::Excel.

    Accepts: an Excel number format string and a conversion name stored in the Type::Tiny object. This package will auto-generate a name if none is given

    Returns: a Type::Tiny object with type coercions and pre-filters set for each input type from the formatting string

get_defined_conversion( $position )

Attributes

Data passed to new when creating a class or instance containing this role. For modification of these attributes see the listed 'attribute methods'. For more information on attributes see Moose::Manual::Attributes. See the documentation for "format_inst" in Spreadsheet::XLSX::Reader::LibXML to see which attribute methods are delegated all the way to the workbook level.

workbook_inst

cache_formats

    Definition: In order to save re-building the coercion each time they are requested, the built coercions can be cached with the format string as the key. This attribute sets whether caching is turned on or not.

    Rang: Boolean

    Default: 1 = caching is on

    attribute methods Methods provided to adjust this attribute

      get_cache_behavior

        Definition: returns the state of the attribute

      set_cache_behavior( $Bool )

        Definition: sets the value of the attribute to $Bool

        Range: Boolean 1 = cache formats, 0 = Don't cache formats

datetime_dates

    Definition: It may be that you desire the full DateTime object as output rather than the finalized datestring when converting unformatted date data to formatted date data. This attribute sets whether data coersions are built to do the full conversion or just to a DateTime object in return.

    Default: 0 = unformatted values are coerced completely to date strings (1 = stop at DateTime)

    attribute methods Methods provided to adjust this attribute.

      get_date_behavior

        Definition: returns the value of the attribute

      set_date_behavior( $Bool )

        Definition: sets the attribute value (only new coercions are affected)

        Accepts: Boolean values

        Delegated to the workbook class: yes

european_first

    Definition: This is a way to check for DD-MM-YY formatting of string dates prior to checking for MM-DD-YY. Since this checks both ways the goal is to catch ambiguous data where the substring for DD < 13 and assign it correctly.

    Default: 0 = MM-DD-YY[YY] is tested first

    attribute methods Methods provided to adjust this attribute

      get_european_first

        Definition: returns the value of the attribute

      set_european_first( $Bool )

        Definition: sets the value of the attribute

        Range: Boolean 0 = MM-DD-YY is tested first, 1 = DD-MM-YY is tested first

SUPPORT

TODO

    1. Attempt to merge _split_decimal_integer and _integer_and_decimal

AUTHOR

Jed Lund
jandrew@cpan.org

COPYRIGHT

This program is free software; you can redistribute it and/or modify it under the same terms as Perl itself.

The full text of the license can be found in the LICENSE file included with this module.

This software is copyrighted (c) 2014, 2015 by Jed Lund

DEPENDENCIES

SEE ALSO