NAME
Spreadsheet::XLSX::Reader::LibXML::GetCell - Top level xlsx Worksheet interface
DESCRIPTION
This documentation is written to explain ways to extend this package. To use the data extraction of Excel workbooks, worksheets, and cells please review the documentation for Spreadsheet::XLSX::Reader::LibXML, Spreadsheet::XLSX::Reader::LibXML::Worksheet, and Spreadsheet::XLSX::Reader::LibXML::Cell
This is the extracted Role to be used as a top level worksheet interface. This is the place where all the various details in each sub XML sheet are coallated into a set of data representing all the necessary information for a requested cell. Since this is the center of data coallation all elements that may be customized should reside outside of this role. This includes any specific elements that would be different between each of the sheet parser types and any element of Excel data presentation that may lend itself to customization. For instance all the XML parser methods, (Reader, DOM, and possibly SAX) can plug in outside of this role.
This role is also where a layer of abstraction is maintained to manage user defined count-from-one or count-from-zero mode. The layer of abstraction is use with the Moose around modifier. The behaviour is managed with the workbook attribute counting_from_zero .
requires
These are method(s) used by this Role but not provided by the role. Any class consuming this role will not build without first providing these methods prior to loading this role. Since this is the center of data coallation the list is long.
set_error( $error_string )
Definition: Used to set errors that occur in code from this role. See Spreadsheet::XLSX::Reader::LibXML::Error for the default implementation of this functionality.
min_row
Definition: Used to get the minimum data row set for the worksheet.
max_row
Definition: Used to get the maximum data row set for the worksheet.
min_col
Definition: Used to get the minimum data column set for the worksheet.
max_col
Definition: Used to get the maximum data column set for the worksheet.
row_range
Definition: Used to return a list of the $minimum_row and $maximum_row
col_range
Definition: Used to return a list of the $minimum_column and $maximum_column
change_output_encoding
Definition: This should be using the localization role. The role should be reachable by delegation through a workbook parser attribute (trait) in the class consuming this role.
get_group_return_type
Definition: This is where the group_return_type attribute is used as a consequence this role needs to understand the current setting. It should be reachable by delegation through a workbook parser attribute (trait) in the class consuming this role.
set_group_return_type
Definition: In order to speed up header reading and setting the full cell is not needed, just the value. As a consequence this role changes the group_return_type attribute as needed for header retrieval.
_get_next_value_cell
Definition: This should return the next cell data from the worksheet file that contains unique formatting or information. The data is expected in a perl hash ref. This method should collect data left to right and top to bottom. The styles.xml, sharedStrings.xml, and calcChain.xml etc. sheet data are coallated into the cell later in this role. An 'EOF' string should be returned when the file has reached the end and then the method should wrap back to the beginning.
Example
{
'r' => 'A6', # The cell ID
'cell_merge' => 'A6:B6', # The merge range
'row' => 6, # Already converted to count by 1
'col' => 1, # Already converted to count by 1
't' => 's' # Cell data type (string)
'v' => { # Cell data (since this cell is string
'raw_text' => '15' # data this actually points to position
} # 15 in the sharedStrings.xml file )
's' => '11', # Styles type (position 11 in the styles sheet)
}
_get_next_cell
Definition: Like _get_next_value_cell this method should return the next cell. The difference is it should return undef for empty cells rather than skipping them. This method should collect data left to right and top to bottom. The styles.xml, sharedStrings.xml, and calcChain.xml etc. sheet data are coallated into the cell later in this role. An 'EOF' string should be returned when the file has reached the end and then the method should wrap back to the beginning.
_get_col_row
Definition: This method should provide a targeted way to return the worksheet file information on a cell. It should only accept count-from-one column and row numbers and the column should be required before the row. If the request is made for an out of row bounds position the method should provide an 'EOR' string. An 'EOF' string should be returned when the file has reached the end and then the method should wrap back to the beginning.
_get_error_inst
Definition: This method is used to access the shared "error_inst" in Spreadsheet::XLSX::Reader::LibXML
counting_from_zero
Definition: This role implements some of the count_from_zero attribute behaviors so it needs to be able to read the current state.
boundary_flag_setting
Definition: This role implements some of the file_boundary_flags attribute behaviors so it needs to be able to read the current state.
_has_shared_strings_file
Definition: This should indicate if the sharedStrings.xml file is available
get_shared_string_position( $position )
Definition: This should return a hashref of data for the indicated $position
Example: from the example shared strings position 15
{
'rich_text' => [ # Rich text definition
2, # Position from 0 to start this element
{ # Element definition
'color' => {
'rgb' => 'FFFF0000'
},
'sz' => '11',
'b' => 1,
'scheme' => 'minor',
'rFont' => 'Calibri',
'family' => '2'
},
6, # Position from 0 to start this element
{ # Element definition
'color' => {
'rgb' => 'FF0070C0'
},
'sz' => '20',
'b' => 1,
'scheme' => 'minor',
'rFont' => 'Calibri',
'family' => '2'
}
],
'raw_text' => 'Hello World', # The raw text the format is applied to
}
_has_styles_file
Definition: This should indicate if the styles.xml file is available
get_format_position( $position )
Definition: This should return a hashref of data for the indicated $position in the styles.xml file. This will include any general cell formatting as well as any references to the subroutines for number conversions either defined by Excel or any custom (user defined) conversion subroutines
Example: from the example styles position 11
{
'fontId' => '0',
'fonts' => {
'color' => {
'theme' => '1'
},
'sz' => '11',
'name' => 'Calibri',
'scheme' => 'minor',
'family' => '2'
},
'numFmtId' => '0',
'fillId' => '0',
'xfId' => '0',
'applyAlignment' => '1',
'borders' => {
'left' => 1,
'right' => 1,
'top' => 1,
'diagonal' => 1,
'bottom' => 1
},
'borderId' => '0',
'alignment' => {
'horizontal' => 'left'
},
'cellStyleXfs' => {
'fillId' => '0',
'fontId' => '0',
'borderId' => '0',
'numFmtId' => '0'
},
'fills' => {
'patternFill' => {
'patternType' => 'none'
}
},
'numFmts' => bless( { # This is the package build Type::Tiny object
'name' => 'Excel_number_0', # to be use for coercion
'coercion' => bless( {
~~ Type::Coercion instance here ~~
}, 'Type::Coercion' ),
'display_name' => 'Excel_number_0',
'uniq' => 94
}, 'Type::Tiny' )
}
Primary Methods
These are the various methods provided by this role.
get_cell( $row, $column )
Definition: Used to return the cell or information from the cell at the specified $row and $column. Both $row and $column are required.
Accepts: the list ( $row, $column ) both required
Returns: see group_return_type for details on what is returned
get_next_value
Definition: Reading left to right and top to bottom this will return the next cell with a value. This actually includes cells with no value but some unique formatting such as cells that have been merged with other cells.
Accepts: nothing
Returns: see group_return_type for details on what is returned
fetchrow_arrayref( $row )
Definition: In an homage to DBI I included this function to return an array ref of the cells or values in the requested $row. If no row is requested this returns the 'next' row. In the array ref any empty and non unique cell will show as 'undef'.
Accepts: undef = next|$row = a row integer indicating the desired row
Returns: an array ref of all possible column positions in that row with data filled in as appropriate.
fetchrow_array( $row )
Definition: This function is just like fetchrow_arrayref except it returns an array instead of an array ref
Accepts: undef = next|$row = a row integer indicating the desired row
Returns: an array of all possible column positions in that row with data filled in as appropriate.
set_headers( @header_row_list )
Definition: This function is used to set headers used in the function fetchrow_hashref. It accepts a list of row numbers that will be collated into a set of headers used to build the hashref for each row. The header rows are coallated in sequence with the first number taking precedence. The list is also used to set the lowest row of the headers in the table. All rows at that level and higher will be considered out of the table and will return undef while setting the error instance. If some of the columns do not have values then the instance will auto generate unique headers for each empty header column to fill out the header ref.
Accepts: a list of row numbers
Returns: an array ref of the built headers for review
fetchrow_hashref( $row )
Definition: This function is used to return a hashref representing the data in the specified row. If no $row value is passed it will return the 'next' row of data. A call to this function without setting the headers first will return undef and set the error instance.
Accepts: a target $row number for return values or undef meaning 'next'
Returns: a hash ref of the values for that row
Attributes
Data passed to new when creating the Styles instance. For modification of these attributes see the listed 'attribute methods'. For more information on attributes see Moose::Manual::Attributes. Most of these are not exposed to the top level of the workbook parser. As a consequence these attribute methods which are available at the worksheet instance level are the best way to manipulate the attribute settings.
last_header_row
Definition: This is generally set by the method set_headers and is the largest row number of the @header_row_list even if the list is out of sequence.
Default: undef
attribute methods Methods provided to adjust this attribute
get_last_header_row
Definition: returns the value of the attribute
has_last_header_row
Definition: predicate for the attribute
min_header_col
Definition: When the method fetchrow_array(ref) or fetchrow_hashref are called it is possible to only collect a set of information between two defined columns. This is the attribute that defines the start point.
Default: undef
attribute methods Methods provided to adjust this attribute
set_min_header_col
Definition: returns the value of the attribute
get_min_header_col
Definition: returns the value of the attribute
has_min_header_col
Definition: predicate for the attribute
clear_min_header_col
Definition: sets min_header_col to undef
max_header_col
Definition: When the method fetchrow_array(ref) or fetchrow_hashref are called it is possible to only collect a set of information between two defined columns. This is the attribute that defines the end point.
Default: undef
attribute methods Methods provided to adjust this attribute
set_max_header_col
Definition: returns the value of the attribute
get_max_header_col
Definition: returns the value of the attribute
has_max_header_col
Definition: predicate for the attribute
clear_max_header_col
Definition: sets min_header_col to undef
custom_formats
Definition: When this role is coallating data about a cell it will check this attribute before it checks the styles sheet to see if there is a format defined by the user for converting the unformatted data. The formats stored must have two methods 'assert_coerce' and 'display_name'. The collater first checks the cellID as a key, then it checks for just the column letter(s) as a key, and finally it checks the row number as a key.
Default: undef
attribute methods Methods provided to adjust this attribute
set_custom_formats( { $key = $conversion } )>
Definition: a way to set all $key => $conversion pairs at once
Accepts: a hashref of $key => $conversion pairs
has_custom_format( $key )
Definition: checks if the specific $key for a format is registered
get_custom_format( $key )
Definition: get the custom format for the requested $key
Returns: the $conversion registered to the $key
set_custom_format( $key = $conversion )>
Definition: set the custom format $conversion for the identified $key
SUPPORT
TODO
1. Eliminate the min / max row / col calls from this role (and requireds) if possible.
AUTHOR
COPYRIGHT
This program is free software; you can redistribute it and/or modify it under the same terms as Perl itself.
The full text of the license can be found in the LICENSE file included with this module.
This software is copyrighted (c) 2014 by Jed Lund
DEPENDENCIES
SEE ALSO
Spreadsheet::ParseExcel - Excel 2003 and earlier
Spreadsheet::XLSX - 2007+
Spreadsheet::ParseXLSX - 2007+
All lines in this package that use Log::Shiras are commented out