NAME

Spreadsheet::XLSX::Reader::LibXML::XMLReader::WorksheetToRow - Pull rows out of worksheet xml files

SYNOPSIS

See t\Spreadsheet\XLSX\Reader\LibXML02-worksheet_to_row.t

DESCRIPTION

This documentation explains how to use this module when writing your own Excel parser. To use the general package for Excel parsing out of the box, please review the documentation for Workbooks, Worksheets, and Cells.

This module provides the basic connection to individual worksheet files (not chartsheets) for parsing xlsx workbooks and collating shared strings data with cell data. It does not provide the final view of a given cell. The final view of the cell is collated by the role (Interface) Spreadsheet::XLSX::Reader::LibXML::Worksheet. This reader extends the base reader class Spreadsheet::XLSX::Reader::LibXML::XMLReader. The functionality provided by those modules is not explained here.

For now this module reads each full row (with values) into a Spreadsheet::XLSX::Reader::LibXML::Row instance. It stores only the currently read row and the previously read row. The exceptions are the start of read and the end of read. At the start of read only the current row is available, with the assumption that all prior implied rows are empty. When a position past the end of the sheet is requested, both the current and prior rows are cleared and an 'EOF' or undef value is returned. See "file_boundary_flags" in Spreadsheet::XLSX::Reader::LibXML for more details. Keeping both rows allows general row formats to be stored by row and, when a requested cell falls in a row without values, allows the empty state to be determined without rescanning the file.
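The two-row window described above can be illustrated with a minimal sketch. This is not the module's code: the function name classify_row and the returned labels are hypothetical, and the 'read forward' and 'rewind' outcomes are assumptions about what a caller would do next. It assumes a current row has already been read.

```perl
use strict;
use warnings;

# Hypothetical helper: decide how a requested row relates to the two stored
# rows. All row numbers count from one. $previous is undef at start of read.
sub classify_row {
    my ( $requested, $previous, $current ) = @_;
    return 'current'  if $requested == $current;
    return 'previous' if defined $previous and $requested == $previous;

    # Rows between the stored rows hold no values. At start of read all
    # rows before the current row are assumed empty.
    my $floor = defined $previous ? $previous : 0;
    return 'empty' if $requested > $floor and $requested < $current;

    # Anything else is outside the stored window (assumed outcomes)
    return $requested > $current ? 'read forward' : 'rewind';
}

print classify_row( 3, 2, 5 ), "\n";        # row 3 sits between rows 2 and 5
print classify_row( 1, undef, 4 ), "\n";    # start of read
```

The point of the sketch is that a request falling between the two stored rows can be answered as empty from the stored row numbers alone.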

All positions (row and column places and integers) at this level are stored and returned in 'count from one' mode!

Modifying this module probably means extending a different reader or using other roles to implement the class. To replace the base reader, search for

extends	'Spreadsheet::XLSX::Reader::LibXML::XMLReader';

To replace this whole module, search for the method 'worksheet' and the variable '$parser_modules' in Spreadsheet::XLSX::Reader::LibXML.

Attributes

Data passed to new when creating an instance. For access to the values in these attributes see the listed 'attribute methods'. For general information on attributes see Moose::Manual::Attributes. For ways to manage the instance when opened see the Public Methods.

is_hidden

    Definition: This is set from the sheet metadata when the sheet is read and indicates whether the sheet is hidden

    Default: none

    Range: (1|0)

    attribute methods Methods provided to adjust this attribute

      is_sheet_hidden

        Definition: return the attribute value

workbook_instance

_sheet_min_col

    Definition: This is the minimum column in the sheet with data or formatting. For this module it is pulled from the xml file at worksheet/dimension:ref = "upperleft:lowerright"

    Range: an integer

    attribute methods Methods provided to adjust this attribute

      _set_min_col

        Definition: sets the attribute value

      _min_col

        Definition: returns the attribute value

      has_min_col

        Definition: attribute predicate

_sheet_min_row

    Definition: This is the minimum row in the sheet with data or formatting. For this module it is pulled from the xml file at worksheet/dimension:ref = "upperleft:lowerright"

    Range: an integer

    attribute methods Methods provided to adjust this attribute

      _set_min_row

        Definition: sets the attribute value

      _min_row

        Definition: returns the attribute value

      has_min_row

        Definition: attribute predicate

_sheet_max_col

    Definition: This is the maximum column in the sheet with data or formatting. For this module it is pulled from the xml file at worksheet/dimension:ref = "upperleft:lowerright"

    Range: an integer

    attribute methods Methods provided to adjust this attribute

      _set_max_col

        Definition: sets the attribute value

      _max_col

        Definition: returns the attribute value

      has_max_col

        Definition: attribute predicate

_sheet_max_row

    Definition: This is the maximum row in the sheet with data or formatting. For this module it is pulled from the xml file at worksheet/dimension:ref = "upperleft:lowerright"

    Range: an integer

    attribute methods Methods provided to adjust this attribute

      _set_max_row

        Definition: sets the attribute value

      _max_row

        Definition: returns the attribute value

      has_max_row

        Definition: attribute predicate
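The four boundary attributes above are all filled from one dimension reference. As a minimal sketch of how a ref such as "B2:D20" could be split into 'count from one' integers, consider the code below. The helper names column_letters_to_int and parse_dimension_ref are hypothetical and are not part of this module, and treating a ref with no colon as a single cell is an assumption.

```perl
use strict;
use warnings;

# Hypothetical helper: convert Excel column letters to a count from one
# integer ( A => 1, Z => 26, AA => 27 )
sub column_letters_to_int {
    my ( $letters ) = @_;
    my $int = 0;
    for my $char ( split //, uc $letters ) {
        $int = $int * 26 + ( ord( $char ) - ord( 'A' ) + 1 );
    }
    return $int;
}

# Hypothetical helper: split "upperleft:lowerright" into the four boundaries
sub parse_dimension_ref {
    my ( $ref ) = @_;
    my ( $start, $end ) = split /:/, $ref;
    $end = $start if !defined $end;    # assumption: no colon means one cell
    my ( $min_letters, $min_row ) = $start =~ /^([A-Za-z]+)(\d+)$/
        or die "Unrecognized cell ID: $start";
    my ( $max_letters, $max_row ) = $end =~ /^([A-Za-z]+)(\d+)$/
        or die "Unrecognized cell ID: $end";
    return (
        min_col => column_letters_to_int( $min_letters ),
        min_row => $min_row,
        max_col => column_letters_to_int( $max_letters ),
        max_row => $max_row,
    );
}

my %bounds = parse_dimension_ref( 'B2:D20' );
print join( ', ', map { "$_ => $bounds{$_}" } sort keys %bounds ), "\n";
```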

_merge_map

    Definition: This is an array ref of array refs where the first level represents rows and the second level represents cells. If a cell is merged then the merge span is stored at that cell's position in the row sub array, which means the same span is stored in multiple positions. The data is stored in the Excel convention of count from 1, so the first position in both levels of the array is essentially a placeholder. The data is extracted from the merge section of the worksheet at worksheet/mergeCells. That section is read and converted into this format when the module first opens the worksheet.

    Range: an array ref

    attribute methods Methods provided to adjust this attribute

      _set_merge_map

        Definition: sets the attribute value

      _get_merge_map

        Definition: returns the attribute array of arrays

    delegated methods This attribute uses the native trait 'Array'

      _get_row_merge_map( $int ) delegated from 'Array' 'get'

        Definition: returns the sub array ref representing any merges for that row. If no merges are available for that row it returns undef.
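The merge map layout described above can be sketched as follows. This is an illustration only: build_merge_map and column_letters_to_int are hypothetical names, and the input is assumed to be a plain list of merge span strings such as those listed under worksheet/mergeCells.

```perl
use strict;
use warnings;

# Hypothetical helper: Excel column letters to a count from one integer
sub column_letters_to_int {
    my ( $letters ) = @_;
    my $int = 0;
    $int = $int * 26 + ( ord( $_ ) - ord( 'A' ) + 1 ) for split //, uc $letters;
    return $int;
}

# Hypothetical builder: store each span at every [row][column] it covers
sub build_merge_map {
    my ( @merge_spans ) = @_;
    my $map = [];    # position 0 in both levels stays an unused placeholder
    for my $span ( @merge_spans ) {
        my ( $c1, $r1, $c2, $r2 ) = $span =~ /^([A-Z]+)(\d+):([A-Z]+)(\d+)$/
            or die "Unrecognized merge span: $span";
        my ( $first_col, $last_col ) =
            ( column_letters_to_int( $c1 ), column_letters_to_int( $c2 ) );
        for my $row ( $r1 .. $r2 ) {
            $map->[$row][$_] = $span for $first_col .. $last_col;
        }
    }
    return $map;
}

my $merge_map = build_merge_map( 'B2:C3' );
# Row 2 holds the span in columns 2 and 3, row 1 has no merges (undef)
print defined $merge_map->[1] ? "row 1 merged\n" : "row 1 has no merges\n";
print "row 2, column 3: $merge_map->[2][3]\n";
```

Reading `$merge_map->[$row]` here plays the part that _get_row_merge_map( $int ) plays for the attribute: a sub array ref when the row has merges and undef when it has none.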

_column_formats

    Definition: In order to (eventually) show all column formats that also affect individual cells, the column based formats are read from the metadata when the worksheet is opened. They are stored here for use, although for now they are mostly used to determine the hidden state of the column. The formats are stored in the array by count from 1 column position.

    Range: an array ref

    attribute methods Methods provided to adjust this attribute

      _set_set_column_formats

        Definition: sets the attribute value

      _get_get_column_formats

        Definition: returns the attribute array

    delegated methods This attribute uses the native trait 'Array'

      _get_custom_column_data( $int ) delegated from 'Array' 'get'

        Definition: returns the sub hash ref representing any formatting for that column. If no custom formatting is available it returns undef.

_new_row_inst

_row_hidden_states

    Definition: As the worksheet is parsed, the hidden state of each row is stored in this attribute when that row is read. This is the only worksheet level caching done. Accessing this data does not test whether the requested row's hidden state has been read yet. If a method asks for a row past the current max parsed row it will return 0 (unhidden).

    Range: an array ref of Boolean values

    delegated methods This attribute uses the native trait 'Array'

      _set_row_hidden( $int ) delegated from 'Array' 'set'

        Definition: sets the hidden state for that $int (row) counting from 1.

      _get_row_hidden( $int ) delegated from 'Array' 'get'

        Definition: returns the known hidden state of the row.
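As a rough sketch of the caching behavior described for this attribute, the code below keeps row hidden states in a count from 1 array and reports unparsed rows as 0. The function names set_row_hidden and get_row_hidden are hypothetical stand-ins, not the delegated methods themselves, and passing the state as a second argument is an assumption.

```perl
use strict;
use warnings;

my @row_hidden_states;    # position 0 is an unused placeholder

# Hypothetical stand-in: record the state when a row is parsed
sub set_row_hidden {
    my ( $row, $state ) = @_;
    $row_hidden_states[$row] = $state ? 1 : 0;
    return;
}

# Hypothetical stand-in: rows that have not been parsed yet report 0
sub get_row_hidden {
    my ( $row ) = @_;
    return $row_hidden_states[$row] // 0;
}

set_row_hidden( 1, 0 );
set_row_hidden( 2, 1 );
print "row $_ hidden state: ", get_row_hidden( $_ ), "\n" for 1 .. 4;
```

Note the consequence shown by rows 3 and 4: a 0 can mean either "known unhidden" or "not parsed yet", so callers that need certainty have to read the row first.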

Methods

These are the methods provided by this class for use within the package; they are not intended to be used by the end user. Other private methods not listed here are used in the module but not by the rest of the package. If a private method is listed here, then replacing this module requires either replacing that method or rewriting all the associated connecting roles and classes.

_load_unique_bits

_get_next_value_cell

    Definition: This returns the worksheet file hash ref representation of the xml stored for the 'next' value cell. Whether a cell has value is determined by the attribute "values_only" in Spreadsheet::XLSX::Reader::LibXML. Which cell is 'next' is affected by the attribute "empty_is_end" in Spreadsheet::XLSX::Reader::LibXML. This method never returns an 'EOR' flag; it just wraps automatically. The returned data has values from the shared strings file integrated but not values from the Styles file.

    Accepts: nothing

    Returns: a hashref of key value pairs

_get_col_row( $col, $row )

    Definition: This is the way to return the information about a specific position in the worksheet. Since this is a private method it requires its inputs to be in the 'count from one' index.

    Accepts: ( $column, $row ) - both required in that order

    Returns: whatever is in that worksheet position as a hashref

_get_row_all( $row )

    Definition: This returns an array ref of each of the values in the row placed in their 'count from one' position. If the row is empty but it is not the end of the sheet then this will return an empty array ref.

    Accepts: ( $row ) - required

    Returns: an array ref

_is_column_hidden( @query_list )

    Definition: This returns a list of hidden states, one for each column integer in the @query_list.

    Accepts: ( @query_list ) - integers in count from 1 representing requested columns

    Returns (when wantarray): a list of hidden states as follows; 1 => hidden, 0 => known to be unhidden, undef => unknown state (usually this represents columns before min_col or after max_col or at least past the last stored value in the column)
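The three-valued result described above can be sketched like this. It is an illustration under stated assumptions rather than the module's implementation: the function name is_column_hidden, the sample data, and the hash key 'hidden' are all hypothetical, and only list context is covered.

```perl
use strict;
use warnings;

# Assumed sample data: column formats stored by count from 1 position
# (position 0 is a placeholder), plus the sheet's column boundaries
my @column_formats = ( undef, { hidden => 0 }, { hidden => 1 }, { hidden => 0 } );
my ( $min_col, $max_col ) = ( 1, 3 );

# Hypothetical stand-in: one state per requested column
sub is_column_hidden {
    my ( @query_list ) = @_;
    my @states;
    for my $col ( @query_list ) {
        if (   $col < $min_col
            or $col > $max_col
            or !defined $column_formats[$col] ) {
            push @states, undef;    # unknown state
        }
        else {
            push @states, $column_formats[$col]->{hidden} ? 1 : 0;
        }
    }
    return @states;
}

my @states = is_column_hidden( 1, 2, 5 );
print join( ', ', map { defined $_ ? $_ : 'undef' } @states ), "\n";
```

Keeping undef distinct from 0 lets the caller tell "known to be unhidden" apart from "no information for that column".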

SUPPORT

TODO

    1. Nothing yet

AUTHOR

Jed Lund
jandrew@cpan.org

COPYRIGHT

This program is free software; you can redistribute it and/or modify it under the same terms as Perl itself.

The full text of the license can be found in the LICENSE file included with this module.

This software is copyrighted (c) 2014, 2015 by Jed Lund

DEPENDENCIES

SEE ALSO

    Log::Shiras

      All lines in this package that use Log::Shiras are commented out