NAME
Spreadsheet::XLSX::Reader::LibXML::XMLReader::Worksheet - A LibXML::XMLReader worksheet base class
SYNOPSIS
use Data::Dumper;
use MooseX::ShortCut::BuildInstance qw( build_instance );
use Types::Standard qw( Bool HasMethods );
use Spreadsheet::XLSX::Reader::LibXML::Error;
use Spreadsheet::XLSX::Reader::LibXML::XMLReader::Worksheet;
my $error_instance = Spreadsheet::XLSX::Reader::LibXML::Error->new( should_warn => 0 );
my $workbook_instance = build_instance(
package => 'WorkbookInstance',
add_methods =>{
counting_from_zero => sub{ return 0 },
boundary_flag_setting => sub{},
change_boundary_flag => sub{},
_has_shared_strings_file => sub{ return 1 },
get_shared_string_position => sub{},
_has_styles_file => sub{},
get_format_position => sub{},
get_group_return_type => sub{},
set_group_return_type => sub{},
get_epoch_year => sub{ return '1904' },
change_output_encoding => sub{ $_[0] },
get_date_behavior => sub{},
set_date_behavior => sub{},
get_empty_return_type => sub{ return 'undef_string' },
get_values_only => sub{},
set_values_only => sub{},
},
add_attributes =>{
error_inst =>{
isa => HasMethods[qw(
error set_error clear_error set_warnings if_warn
) ],
clearer => '_clear_error_inst',
reader => 'get_error_inst',
required => 1,
handles =>[ qw(
error set_error clear_error set_warnings if_warn
) ],
},
empty_is_end =>{
isa => Bool,
writer => 'set_empty_is_end',
reader => 'is_empty_the_end',
default => 0,
},
from_the_edge =>{
isa => Bool,
reader => '_starts_at_the_edge',
writer => 'set_from_the_edge',
default => 1,
},
},
error_inst => $error_instance,
);
my $test_instance = Spreadsheet::XLSX::Reader::LibXML::XMLReader::Worksheet->new(
file => 'xl/worksheets/sheet3.xml',
error_inst => $error_instance,
sheet_name => 'Sheet3',
workbook_instance => $workbook_instance,
);
my $x = 0;
my $result;
while( $x < 20 and (!$result or $result ne 'EOF') ){
$result = $test_instance->_get_next_value_cell;
print "Collecting data from position: $x" . Dumper( $result );
$x++;
}
###############################################
# SYNOPSIS Screen Output
# 01: Collecting data from position: 0
# 02: $VAR1 = {
# 03: 'r' => 'A2',
# 04: 'row' => 2,
# 05: 'col' => 1,
# 06: 'v' => {
# 07: 'raw_text' => '0'
# 08: },
# 09: 't' => 's'
# 10: };
# 11:
# 12: Collecting data from position: 1
# 13: $VAR1 = {
# 14: 'r' => 'D2',
# 15: 'row' => 2,
# 16: 'col' => 4,
# 17: 'v' => {
# 18: 'raw_text' => '2'
# 19: },
# 20: 't' => 's'
# 21: };
# ~~ Continuing ~~
###############################################
DESCRIPTION
This documentation is written to explain ways to use this module when writing your own excel parser. To use the general package for excel parsing out of the box please review the documentation for Workbooks, Worksheets, and Cells
This module provides the basic connection to individual worksheet files for parsing xlsx workbooks. It does not provide a way to connect to chartsheets. It does not provide the final view of a given cell. The final view of the cell is collated with the role Spreadsheet::XLSX::Reader::LibXML::GetCell. This reader extends the base reader class Spreadsheet::XLSX::Reader::LibXML::XMLReader. The functionality provided by those modules is not covered here.
Modification of this module probably means extending a different reader or using other roles for implementation of the class. See lines 18 and on in the code here for the location to change and See line 54 in the code Spreadsheet::XLSX::Reader::LibXML for the way to repoint the package at a new module.
Attributes
Data passed to new when creating an instance. For access to the values in these attributes see the listed 'attribute methods'. For general information on attributes see Moose::Manual::Attributes. For ways to manage the instance when opened see the Public Methods. The remaining undocumented attributes are used internally for tracking state.
sheet_type
Definition: This will always be 'worksheet' for this module. It is provided as a simple introspection method for distinguishing between worksheets and chartsheets in case the circumstances are ambiguous.
Default: 'worksheet'
Range: 'worksheet'
attribute methods Methods provided to adjust this attribute
get_sheet_type
Definition: return the attribute value
sheet_rel_id
Definition: To coordinate information accross the various sub-files excel maintains a relId for sheets. This is the value that excel assigned to this sheet.
Range: a string
attribute methods Methods provided to adjust this attribute
rel_id
Definition: return the attribute value
sheet_id
Definition: When writing vbScript the sheet can be identified by a number instead of a name. This is that number.
Range: an integer
attribute methods Methods provided to adjust this attribute
sheet_id
Definition: return the attribute value
sheet_position
Definition: Even if there are chartsheets in the workbook you will only get a list of worksheets when using 'worksheet' methods. However, this position is the visible position of the worksheet in the workbook including chartsheets. This can be different than sheet_id
Range: an integer
attribute methods Methods provided to adjust this attribute
position
Definition: return the attribute value
sheet_name
Definition: This is the visible string expressed on the tab of the worksheet in the workbook.
Range: a String
attribute methods Methods provided to adjust this attribute
get_name
Definition: return the attribute value
_sheet_min_col
Definition: This is the minimum column in the sheet with data or formatting. For this module it is pulled from the xml file at worksheet/dimension:ref = "upperleft:lowerright"
Range: an integer
attribute methods Methods provided to adjust this attribute
_set_min_col
Definition: sets the attribute value
_min_col
Definition: returns the attribute value
has_min_col
Definition: attribute predicate
_sheet_min_row
Definition: This is the minimum row in the sheet with data or formatting. For this module it is pulled from the xml file at worksheet/dimension:ref = "upperleft:lowerright"
Range: an integer
attribute methods Methods provided to adjust this attribute
_set_min_row
Definition: sets the attribute value
_min_row
Definition: returns the attribute value
has_min_row
Definition: attribute predicate
_sheet_max_col
Definition: This is the maximum column in the sheet with data or formatting. For this module it is pulled from the xml file at worksheet/dimension:ref = "upperleft:lowerright"
Range: an integer
attribute methods Methods provided to adjust this attribute
_set_max_col
Definition: sets the attribute value
_max_col
Definition: returns the attribute value
has_max_col
Definition: attribute predicate
_sheet_max_row
Definition: This is the maximum row in the sheet with data or formatting. For this module it is pulled from the xml file at worksheet/dimension:ref = "upperleft:lowerright"
Range: an integer
attribute methods Methods provided to adjust this attribute
_set_max_row
Definition: sets the attribute value
_max_row
Definition: returns the attribute value
has_max_row
Definition: attribute predicate
_merge_map
Definition: This is an array ref of array refs where the first level represents rows and the second level of array represents cells. If a cell is merged then the merge span is stored in the row sub array position. This means the same span is stored in multiple positions. The data is stored in the Excel convention of count from 1 so the first position in both levels of the array are essentially placeholders. The data is extracted from the merge section of the worksheet at worksheet/mergeCells. That array is read and converted into this format for reading by this module when it first opens the worksheet..
Range: an array ref
attribute methods Methods provided to adjust this attribute
_set_merge_map
Definition: sets the attribute value
delegated methods This attribute uses the native trait 'Array'
_get_row_merge_map( $int ) => 'get'
Definition: returns the sub array ref representing any merges for that row. If no merges are available for that row it returns undef.
Public Methods
These are the methods provided by this class for use by the end user.
min_row
Definition: This returns the minimum row with data or formatting in the worksheet. It is separated from _min_row so that the package can modify the output between the functions to match the attribute "count_from_zero" in Spreadsheet::XLSX::Reader::LibXML
Accepts: nothing
Returns: the minimum row integer
max_row
Definition: This returns the maximum row with data or formatting in the worksheet. It is separated from _max_row so that the package can modify the output between the functions to match the attribute "count_from_zero" in Spreadsheet::XLSX::Reader::LibXML
Accepts: nothing
Returns: the maximum row integer
min_col
Definition: This returns the minimum column with data or formatting in the worksheet. It is separated from _min_col so that the package can modify the output between the functions to match the attribute "count_from_zero" in Spreadsheet::XLSX::Reader::LibXML
Accepts: nothing
Returns: the minimum column integer
max_col
Definition: This returns the maximum column with data or formatting in the worksheet. It is separated from _max_col so that the package can modify the output between the functions to match the attribute "count_from_zero" in Spreadsheet::XLSX::Reader::LibXML
Accepts: nothing
Returns: the maximum column integer
row_range
Definition: This returns the first and last row with data or formatting in the worksheet. It is separated from _min_row and _max_row so that the package can modify the output between the functions to match the attribute "count_from_zero" in Spreadsheet::XLSX::Reader::LibXML
Accepts: nothing
Returns: ( $min_row, $max_row ) as a list
col_range
Definition: This returns the first and last column with data or formatting in the worksheet. It is separated from _min_col and _max_col so that the package can modify the output between the functions to match the attribute "count_from_zero" in Spreadsheet::XLSX::Reader::LibXML
Accepts: nothing
Returns: ( $min_col, $max_col ) as a list
get_merged_areas
Definition: This method returns an array ref of cells that are merged. This method does respond to the attribute "count_from_zero" in Spreadsheet::XLSX::Reader::LibXML
Accepts: nothing
Returns: An arrayref of arrayrefs of merged areas or undef if no merged areas
[ [ $start_row_1, $start_col_1, $end_row_1, $end_col_1], etc.. ]
is_sheet_hidden
Definition: Method indicates if the excel program would hide the sheet or show it if the file were opened in the Microsoft Excel application
Accepts: nothing
Returns: a boolean value indicating if the sheet is hidden or not 1 = hidden
is_column_hidden
Definition: Method indicates if the excel program would hide the identified column(s) or show it|them if the file were opened in the Microsoft Excel application. If more than one column is passed then it returns true if any of the columns are hidden in scalar context and a list of 1 and 0 values for each of the requested positions in array (list) context. This method (input) does respond to the attribute "count_from_zero" in Spreadsheet::XLSX::Reader::LibXML. Unlike the method 'is_row_hidden' this method will always 'know' the correct answer since the information is stored outside of the dataa table in the xml file.
Accepts: integer values or column letter values selecting the columns in question
Returns: in scalar context it returns a boolean value indicating if any of the requested columns would be hidden by Excel. In array/list context it returns a list of boolean values for each requested column indicating it's hidden state for Excel. (1 = hidden)
is_row_hidden
Definition: Method indicates if the excel program would hide the identified row(s) or show it|them if the file were opened in the Microsoft Excel application. If more than one row is passed then it returns true if any of the rows are hidden in scalar context and a list of 1 and 0 values for each of the requested positions in array (list) context. This method (input) does respond to the attribute "count_from_zero" in Spreadsheet::XLSX::Reader::LibXML. Warning: THIS METHOD WILL ONLY BE ACCURATE AFTER THE SHEET HAS READ AT LEAST ONE CELL FROM THE ROW NUMBER REQUESTED. THIS ALLOWS THE SHEET TO AVOID READING ALL THE WAY THROUGH ONCE BEFORE STARTING THE CELL PARSING.
Accepts: integer values selecting the rows in question
Returns: in scalar context it returns a boolean value indicating if any of the requested rows would be hidden by Excel. In array/list context it returns a list of boolean values for each requested row indicating it's hidden state for Excel. (1 = hidden)
Private Methods
These are the methods provided by this class for use within the package but are not intended to be used by the end user. Other private methods not listed here are used in the module but not used by the package. If the private method is listed here then replacement of this module either requires replacing them or rewriting all the associated connecting roles and classes.
_load_unique_bits
Definition: This is called by Spreadsheet::XLSX::Reader::LibXML::XMLReader when the file is loaded for the first time so that file specific data can be collected. All the Attributes with a leading _ in the documentation are filled in at this point.
Accepts: nothing
Returns: nothing
_get_next_value_cell
Definition: This returns the worksheet file hash ref representation of the xml stored for the 'next' value cell. A cell is determined to have value based on the attribute "values_only" in Spreadsheet::XLSX::Reader::LibXML. Next is affected by the attribute "empty_is_end" in Spreadsheet::XLSX::Reader::LibXML. This method never returns an 'EOR' flag. It just wraps automatically.
Accepts: nothing
Returns: the cell (or value as requested)
_get_next_cell
Definition: This returns on every cell position whether there is data or not. For empty cells undef is returned. Next is affected by the attribute "empty_is_end" in Spreadsheet::XLSX::Reader::LibXML. This method never returns an 'EOR' flag. It just wraps automatically.
Accepts: nothing
Returns: undef, a value, or the cell
_get_col_row( $col, $row )
Definition: This is the way to return the information about a specific position in the worksheet. Since this is a private method it requires its inputs to be in the 'count from one' index.
Accepts: ( $column, $row ) - both required in that order
Returns: whatever is in that worksheet position as a hashref
_get_row_all( $row )
Definition: This is returns an array ref of each of the values in the row placed in their 'count from one' position. If the row is empty but it is not the end of the sheet then this will return an empty array ref.
Accepts: ( $row ) - required
Returns: an array ref
SUPPORT
TODO
1. Nothing yet
AUTHOR
COPYRIGHT
This program is free software; you can redistribute it and/or modify it under the same terms as Perl itself.
The full text of the license can be found in the LICENSE file included with this module.
This software is copyrighted (c) 2014, 2015 by Jed Lund
DEPENDENCIES
Carp - confess
Type::Tiny - 1.000
Spreadsheet::XLSX::Reader::LibXML::XMLReader
Spreadsheet::XLSX::Reader::LibXML::CellToColumnRow
SEE ALSO
Spreadsheet::ParseExcel - Excel 2003 and earlier
Spreadsheet::XLSX - 2007+
Spreadsheet::ParseXLSX - 2007+
All lines in this package that use Log::Shiras are commented out