NAME
Spreadsheet::XLSX::Reader::LibXML::XMLReader::WorksheetToRow - Pull rows out of worksheet xml files
SYNOPSIS
See t\Spreadsheet\XLSX\Reader\LibXML02-worksheet_to_row.t
DESCRIPTION
This documentation is written to explain ways to use this module when writing your own Excel parser. To use the general package for Excel parsing out of the box, please review the documentation for Workbooks, Worksheets, and Cells.
This module provides the basic connection to individual worksheet files for parsing xlsx workbooks and collating shared strings data with cell data. It does not provide a way to connect to chartsheets, and it does not provide the final view of a given cell. The final view of the cell is collated with the role (Interface) Spreadsheet::XLSX::Reader::LibXML::Worksheet. This reader extends the base reader class Spreadsheet::XLSX::Reader::LibXML::XMLReader. The functionality provided by those modules is not explained here.
For now this module reads each full row (with values) into a Spreadsheet::XLSX::Reader::LibXML::Row instance. It stores only the currently read row and the previously read row. The exceptions are the start and the end of the read. At the start of the read only the current row is available, on the assumption that all prior implied rows are empty. When a position past the end of the sheet is requested, both the current and prior rows are cleared and an 'EOF' or undef value is returned. See "file_boundary_flags" in Spreadsheet::XLSX::Reader::LibXML for more details. Keeping both rows allows general row formats to be stored by row, and when a requested cell falls in a row without values, the empty state can be determined without rescanning the file.
All positions (row and column places and integers) at this level are stored and returned in count from one mode!
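The two-row window described above can be pictured with a minimal sketch. Everything here (the ToyRowWindow package, its methods, and the row hashes) is a hypothetical stand-in in plain Perl and not the module's own implementation; it only illustrates holding a current and a prior row, clearing both past the end, and recognizing empty rows that fall between them.

```perl
use strict;
use warnings;

# Hypothetical toy reader: holds only the current and the previous row.
package ToyRowWindow;

sub new {
    my ( $class, @rows ) = @_;    # each row: { number => $int, cells => [...] }
    return bless { pending => [@rows], old => undef, new => undef }, $class;
}

# Advance to the next stored row; returns that row or 'EOF' past the end.
sub advance {
    my ($self) = @_;
    my $next = shift @{ $self->{pending} };
    if ( !defined $next ) {
        # Past the end of the sheet: both cached rows are cleared.
        @{$self}{qw( old new )} = ( undef, undef );
        return 'EOF';
    }
    @{$self}{qw( old new )} = ( $self->{new}, $next );
    return $next;
}

sub old_row_number { my $r = $_[0]{old}; return $r ? $r->{number} : undef }
sub new_row_number { my $r = $_[0]{new}; return $r ? $r->{number} : undef }

# A row strictly between the previous and current rows holds no values.
# Before any prior row exists, all rows ahead of the current one are empty.
sub is_known_empty {
    my ( $self, $row ) = @_;
    my $low  = $self->old_row_number() // 0;
    my $high = $self->new_row_number();
    return ( defined($high) && $row > $low && $row < $high ) ? 1 : 0;
}

package main;

my $window = ToyRowWindow->new(
    { number => 2, cells => [ 'a', 'b' ] },
    { number => 5, cells => ['c'] },
);
$window->advance;    # current row is 2, no previous row yet
$window->advance;    # current row is 5, previous row is 2
```

With rows 2 and 5 held, rows 3 and 4 can be reported as empty without going back to the file, which is the point of keeping the prior row around.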
Modification of this module probably means extending a different reader or using other roles for implementation of the class. To replace the base reader, search for
extends 'Spreadsheet::XLSX::Reader::LibXML::XMLReader';
To replace this whole module, search for the method 'worksheet' in Spreadsheet::XLSX::Reader::LibXML and the variable '$parser_modules'.
Attributes
Data passed to new when creating an instance. For access to the values in these attributes see the listed 'attribute methods'. For general information on attributes see Moose::Manual::Attributes. For ways to manage the instance when opened see the Public Methods.
is_hidden
Definition: This is set from the sheet-level metadata when the sheet is read and indicates whether the sheet is hidden
Default: none
Range: (1|0)
attribute methods Methods provided to adjust this attribute
is_sheet_hidden
Definition: return the attribute value
workbook_instance
Definition: This attribute holds a reference back to the workbook instance so that the worksheet has access to the global settings managed there. As a consequence many of the workbook methods are exposed here. This includes some setter methods for workbook attributes. Beware that setting or adjusting the workbook level attributes with methods here will be universal and affect other worksheets. So don't forget to restore the old value if you want the old behavior after you are done. If that doesn't make sense then don't use these methods. (Nothing to see here! Move along.)
Default: a Spreadsheet::XLSX::Reader::LibXML instance
attribute methods Methods of the workbook exposed here by the delegation of the instance to this class through this attribute
counting_from_zero
Definition: returns the "count_from_zero" in Spreadsheet::XLSX::Reader::LibXML instance state
boundary_flag_setting
Definition: returns the "file_boundary_flags" in Spreadsheet::XLSX::Reader::LibXML instance state
change_boundary_flag( $Bool )
Definition: sets the "file_boundary_flags" in Spreadsheet::XLSX::Reader::LibXML instance state (For the whole workbook!)
get_shared_string_position( $int )
Definition: returns the shared string data stored in the sharedStrings file at position $int. For more information review Spreadsheet::XLSX::Reader::LibXML::SharedStrings. This is a delegation of a delegation!
get_format_position( $int, [$header] )
Definition: returns the format data stored in the styles file at position $int. If the optional $header is passed only the data for that header is returned. Otherwise all styles for that position are returned. For more information review Spreadsheet::XLSX::Reader::LibXML::Styles. This is a delegation of a delegation!
set_empty_is_end( $Bool )
Definition: sets the "empty_is_end" in Spreadsheet::XLSX::Reader::LibXML instance state (For the whole workbook!)
is_empty_the_end
Definition: returns the "empty_is_end" in Spreadsheet::XLSX::Reader::LibXML instance state.
get_group_return_type
Definition: returns the "group_return_type" in Spreadsheet::XLSX::Reader::LibXML instance state.
set_group_return_type( (instance|unformatted|value) )
Definition: sets the "group_return_type" in Spreadsheet::XLSX::Reader::LibXML instance state (For the whole workbook!)
get_epoch_year
Definition: uses the "get_epoch_year" in Spreadsheet::XLSX::Reader::LibXML method.
get_date_behavior
Definition: This is a delegated method from the styles class (stored as a private instance in the workbook). It is held (and documented) in the Spreadsheet::XLSX::Reader::LibXML::ParseExcelFormatStrings role. It will indicate how far unformatted transformation is carried for date coercions when returning formatted values.
set_date_behavior
Definition: This is a delegated method from the styles class (stored as a private instance in the workbook). It is held (and documented) in the Spreadsheet::XLSX::Reader::LibXML::ParseExcelFormatStrings role. It will set how far unformatted transformation is carried for date coercions when returning formatted values.
get_values_only
Definition: gets the "values_only" in Spreadsheet::XLSX::Reader::LibXML instance state.
set_values_only
Definition: sets the "values_only" in Spreadsheet::XLSX::Reader::LibXML instance state (For the whole workbook!)
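Because the setters above act on the whole workbook, one way to limit their reach is to save the old value, change it, and restore it afterwards. The sketch below assumes only the get_values_only and set_values_only method names from this list; the ToyWorksheet package, the shared settings hash, and the with_values_only helper are hypothetical stand-ins so that the example runs without the real package.

```perl
use strict;
use warnings;

# Hypothetical stand-ins: two worksheets sharing one workbook settings hash.
package ToyWorksheet;

sub new {
    my ( $class, $workbook ) = @_;
    return bless { workbook => $workbook }, $class;
}
sub get_values_only { return $_[0]{workbook}{values_only} }
sub set_values_only { $_[0]{workbook}{values_only} = $_[1]; return }

package main;

my %workbook = ( values_only => 0 );
my $sheet_a  = ToyWorksheet->new( \%workbook );
my $sheet_b  = ToyWorksheet->new( \%workbook );

# Run a callback with a temporary setting, then restore the old value.
sub with_values_only {
    my ( $sheet, $temporary, $code ) = @_;
    my $saved = $sheet->get_values_only;
    $sheet->set_values_only($temporary);
    my @result = eval { $code->() };
    my $error  = $@;
    $sheet->set_values_only($saved);    # restore even if the callback died
    die $error if $error;
    return @result;
}

# While the callback runs, the other worksheet sees the changed setting too.
my ($seen_by_b) = with_values_only( $sheet_a, 1, sub { $sheet_b->get_values_only } );
```

The callback shows the side effect the warning is about: a change made through one worksheet is visible from the other until the old value is put back.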
_sheet_min_col
Definition: This is the minimum column in the sheet with data or formatting. For this module it is pulled from the xml file at worksheet/dimension:ref = "upperleft:lowerright"
Range: an integer
attribute methods Methods provided to adjust this attribute
_set_min_col
Definition: sets the attribute value
_min_col
Definition: returns the attribute value
has_min_col
Definition: attribute predicate
_sheet_min_row
Definition: This is the minimum row in the sheet with data or formatting. For this module it is pulled from the xml file at worksheet/dimension:ref = "upperleft:lowerright"
Range: an integer
attribute methods Methods provided to adjust this attribute
_set_min_row
Definition: sets the attribute value
_min_row
Definition: returns the attribute value
has_min_row
Definition: attribute predicate
_sheet_max_col
Definition: This is the maximum column in the sheet with data or formatting. For this module it is pulled from the xml file at worksheet/dimension:ref = "upperleft:lowerright"
Range: an integer
attribute methods Methods provided to adjust this attribute
_set_max_col
Definition: sets the attribute value
_max_col
Definition: returns the attribute value
has_max_col
Definition: attribute predicate
_sheet_max_row
Definition: This is the maximum row in the sheet with data or formatting. For this module it is pulled from the xml file at worksheet/dimension:ref = "upperleft:lowerright"
Range: an integer
attribute methods Methods provided to adjust this attribute
_set_max_row
Definition: sets the attribute value
_max_row
Definition: returns the attribute value
has_max_row
Definition: attribute predicate
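The four boundary attributes above all come from one "upperleft:lowerright" string. As a rough illustration of how such a string maps to count-from-one integers, here is a minimal sketch; the function names and the returned hash keys are hypothetical, and the module's own parsing may differ.

```perl
use strict;
use warnings;

# Convert Excel column letters to a count-from-one integer (A => 1, AA => 27).
sub column_letters_to_number {
    my ($letters) = @_;
    my $number = 0;
    for my $letter ( split //, uc $letters ) {
        $number = $number * 26 + ord($letter) - ord('A') + 1;
    }
    return $number;
}

# Split a dimension ref such as "B2:D10" (or a single cell such as "A1").
sub parse_dimension_ref {
    my ($ref) = @_;
    my ( $start, $end ) = split /:/, $ref;
    $end = $start if !defined $end;
    my ( $min_col, $min_row ) = $start =~ /\A([A-Za-z]+)(\d+)\z/
        or die "Unrecognized cell ID: $start";
    my ( $max_col, $max_row ) = $end =~ /\A([A-Za-z]+)(\d+)\z/
        or die "Unrecognized cell ID: $end";
    return {
        min_col => column_letters_to_number($min_col),
        min_row => $min_row + 0,
        max_col => column_letters_to_number($max_col),
        max_row => $max_row + 0,
    };
}

my $bounds = parse_dimension_ref('B2:AA10');
```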
_merge_map
Definition: This is an array ref of array refs where the first level represents rows and the second level represents cells. If a cell is merged then the merge span is stored in the row sub array position. This means the same span is stored in multiple positions. The data is stored in the Excel convention of count from 1, so the first position in both levels of the array is essentially a placeholder. The data is extracted from the merge section of the worksheet at worksheet/mergeCells. That array is read and converted into this format for reading by this module when it first opens the worksheet.
Range: an array ref
attribute methods Methods provided to adjust this attribute
_set_merge_map
Definition: sets the attribute value
_get_merge_map
Definition: returns the attribute array of arrays
delegated methods This attribute uses the native trait 'Array'
_get_row_merge_map( $int ) delegated from 'Array' 'get'
Definition: returns the sub array ref representing any merges for that row. If no merges are available for that row it returns undef.
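The shape of the merge map may be easier to see in a small sketch. It assumes the merge spans have already been converted to count-from-one integers and that the span is stored as its original ref string; the build_merge_map function and the span hash keys are hypothetical and only illustrate the array-of-arrays layout described for _merge_map.

```perl
use strict;
use warnings;

# Each hypothetical span hash carries the original ref plus its corner positions.
sub build_merge_map {
    my (@spans) = @_;
    my @map;    # position 0 at both levels stays an unused placeholder
    for my $span (@spans) {
        for my $row ( $span->{min_row} .. $span->{max_row} ) {
            for my $col ( $span->{min_col} .. $span->{max_col} ) {
                $map[$row][$col] = $span->{ref};    # same span in every covered cell
            }
        }
    }
    return \@map;
}

my $merge_map = build_merge_map(
    { ref => 'A1:B2', min_row => 1, max_row => 2, min_col => 1, max_col => 2 },
    { ref => 'C4:D4', min_row => 4, max_row => 4, min_col => 3, max_col => 4 },
);

my $row_two   = $merge_map->[2];    # sub array with 'A1:B2' at positions 1 and 2
my $row_three = $merge_map->[3];    # undef: no merges stored for row 3
```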
_column_formats
Definition: In order to (eventually) show all column formats that also affect individual cells, the column based formats are read from the metadata when the worksheet is opened. They are stored here for use, although for now they are mostly used to determine the hidden state of the column. The formats are stored in the array by count from 1 column position.
Range: an array ref
attribute methods Methods provided to adjust this attribute
_set_set_column_formats
Definition: sets the attribute value
_get_get_column_formats
Definition: returns the attribute array
delegated methods This attribute uses the native trait 'Array'
_get_custom_column_data( $int ) delegated from 'Array' 'get'
Definition: returns the sub hash ref representing any formatting for that column. If no custom formatting is available it returns undef.
_old_row_inst
Definition: This is the prior read row instance or undef for the beginning or end of the sheet read.
Range: isa => InstanceOf[ Spreadsheet::XLSX::Reader::LibXML::Row ]
attribute methods Methods provided to adjust this attribute
_set_old_row_inst
Definition: sets the attribute value
_get_old_row_inst
Definition: returns the attribute
_clear_old_row_inst
Definition: clears the attribute
_has_old_row_inst
Definition: predicate for the attribute
delegated methods from Spreadsheet::XLSX::Reader::LibXML::Row
_get_old_row_number = "get_row_number" in Spreadsheet::XLSX::Reader::LibXML::Row
_is_old_row_hidden = "is_row_hidden" in Spreadsheet::XLSX::Reader::LibXML::Row
_get_old_row_formats = "get_row_format" in Spreadsheet::XLSX::Reader::LibXML::Row
pass the desired format key
_get_old_column = "get_the_column( $column )" in Spreadsheet::XLSX::Reader::LibXML::Row
pass a column number (no next default) returns (cell|undef|EOR)
_get_old_last_value_col = "get_last_value_column" in Spreadsheet::XLSX::Reader::LibXML::Row
_get_old_row_list = "get_row_all" in Spreadsheet::XLSX::Reader::LibXML::Row
_get_old_row_end = "get_row_endl" in Spreadsheet::XLSX::Reader::LibXML::Row
_new_row_inst
Definition: This is the current read row instance or undef for the end of the sheet read.
Range: isa => InstanceOf[ Spreadsheet::XLSX::Reader::LibXML::Row ]
attribute methods Methods provided to adjust this attribute
_set_new_row_inst
Definition: sets the attribute value
_get_new_row_inst
Definition: returns the attribute
_clear_new_row_inst
Definition: clears the attribute
_has_new_row_inst
Definition: predicate for the attribute
delegated methods from Spreadsheet::XLSX::Reader::LibXML::Row
_get_new_row_number = "get_row_number" in Spreadsheet::XLSX::Reader::LibXML::Row
_is_new_row_hidden = "is_row_hidden" in Spreadsheet::XLSX::Reader::LibXML::Row
_get_new_row_formats = "get_row_format" in Spreadsheet::XLSX::Reader::LibXML::Row
pass the desired format key
_get_new_column = "get_the_column( $column )" in Spreadsheet::XLSX::Reader::LibXML::Row
pass a column number (no next default) returns (cell|undef|EOR)
_get_new_next_value = "get_the_next_value_position" in Spreadsheet::XLSX::Reader::LibXML::Row
pass nothing returns next (cell|EOR)
_get_new_last_value_col = "get_last_value_column" in Spreadsheet::XLSX::Reader::LibXML::Row
_get_new_row_list = "get_row_all" in Spreadsheet::XLSX::Reader::LibXML::Row
_get_new_row_end = "get_row_endl" in Spreadsheet::XLSX::Reader::LibXML::Row
_row_hidden_states
Definition: As the worksheet is parsed it will store the hidden state for each row in this attribute when that row is read. This is the only worksheet level caching done. It will not test whether the requested row hidden state has been read when accessing this data. If a method requests a row past the current max parsed row it will return 0 (unhidden).
Range: an array ref of Boolean values
delegated methods This attribute uses the native trait 'Array'
_set_row_hidden( $int ) delegated from 'Array' 'set'
Definition: sets the hidden state for that $int (row) counting from 1.
_get_row_hidden( $int ) delegated from 'Array' 'get'
Definition: returns the known hidden state of the row.
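The caveat about rows that have not been parsed yet can be shown with a minimal sketch. The array and the two helper functions are hypothetical stand-ins, not the delegated methods themselves; they only assume the count-from-one indexing and the "unread rows report 0" behavior described for _row_hidden_states.

```perl
use strict;
use warnings;

my @row_hidden_states;    # index 0 is an unused placeholder (count from one)

sub record_row_hidden {
    my ( $row, $hidden ) = @_;
    $row_hidden_states[$row] = $hidden ? 1 : 0;
    return;
}

# Unparsed rows are indistinguishable from rows known to be visible.
sub row_hidden {
    my ($row) = @_;
    return $row_hidden_states[$row] ? 1 : 0;
}

record_row_hidden( 1, 0 );
record_row_hidden( 2, 1 );
my @answers = map { row_hidden($_) } 1, 2, 50;    # row 50 has not been parsed
```

A caller that needs a trustworthy answer for a given row would have to make sure the parse has already reached that row.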
Methods
These are the methods provided by this class for use within the package but are not intended to be used by the end user. Other private methods not listed here are used in the module but not used by the package. If the private method is listed here then replacement of this module either requires replacing them or rewriting all the associated connecting roles and classes.
_load_unique_bits
Definition: This is called by Spreadsheet::XLSX::Reader::LibXML::XMLReader when the file is loaded for the first time so that file specific metadata can be collected.
Accepts: nothing
Returns: nothing
_get_next_value_cell
Definition: This returns the worksheet file hash ref representation of the xml stored for the 'next' value cell. A cell is determined to have value based on the attribute "values_only" in Spreadsheet::XLSX::Reader::LibXML. Next is affected by the attribute "empty_is_end" in Spreadsheet::XLSX::Reader::LibXML. This method never returns an 'EOR' flag. It just wraps automatically to the next row. The returned data has values from the shared strings file integrated but not values from the Styles file.
Accepts: nothing
Returns: a hashref of key value pairs
_get_col_row( $col, $row )
Definition: This is the way to return the information about a specific position in the worksheet. Since this is a private method it requires its inputs to be in the 'count from one' index.
Accepts: ( $column, $row ) - both required in that order
Returns: whatever is in that worksheet position as a hashref
_get_row_all( $row )
Definition: This returns an array ref of each of the values in the row placed in their 'count from one' position. If the row is empty but it is not the end of the sheet then this will return an empty array ref.
Accepts: ( $row ) - required
Returns: an array ref
_is_column_hidden( @query_list )
Definition: This returns a list of hidden states, one for each column integer in the @query_list.
Accepts: ( @query_list ) - integers in count from 1 representing requested columns
Returns (when wantarray): a list of hidden states as follows; 1 => hidden, 0 => known to be unhidden, undef => unknown state (usually this represents columns before min_col or after max_col or at least past the last stored value in the column)
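The three possible states can be illustrated with a minimal sketch. The @column_formats data, the 'hidden' key, and the is_column_hidden function below are hypothetical stand-ins for the stored column formats. The array ref returned outside list context is also an assumption, since only the list-context return is documented here.

```perl
use strict;
use warnings;

# Hypothetical column format store, indexed by count-from-one column number.
my @column_formats = ( undef, { hidden => 0 }, { hidden => 1 }, undef, { hidden => 0 } );

# 1 => hidden, 0 => known to be unhidden, undef => nothing stored for the column.
sub is_column_hidden {
    my (@query_list) = @_;
    my @states;
    for my $column (@query_list) {
        my $format = $column_formats[$column];
        push @states, defined($format) ? ( $format->{hidden} ? 1 : 0 ) : undef;
    }
    # Assumption: an array ref is handed back outside list context.
    return wantarray ? @states : \@states;
}

my @hidden = is_column_hidden( 1, 2, 3, 9 );
```

Callers should test the result with defined before treating it as a boolean, since undef (unknown) and 0 (known to be unhidden) are both false.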
SUPPORT
TODO
1. Nothing yet
AUTHOR
COPYRIGHT
This program is free software; you can redistribute it and/or modify it under the same terms as Perl itself.
The full text of the license can be found in the LICENSE file included with this module.
This software is copyrighted (c) 2014, 2015 by Jed Lund
DEPENDENCIES
Clone - clone
Carp - confess
Type::Tiny - 1.000
MooseX::ShortCut::BuildInstance - build_instance should_re_use_classes
Spreadsheet::XLSX::Reader::LibXML
Spreadsheet::XLSX::Reader::LibXML::XMLReader
Spreadsheet::XLSX::Reader::LibXML::Row
SEE ALSO
All lines in this package that use Log::Shiras are commented out