NAME

Spreadsheet::XLSX::Reader::LibXML::GetCell - Top level xlsx Worksheet interface

DESCRIPTION

This documentation is written to explain ways to extend this package. To use the data extraction of Excel workbooks, worksheets, and cells please review the documentation for Spreadsheet::XLSX::Reader::LibXML, Spreadsheet::XLSX::Reader::LibXML::Worksheet, and Spreadsheet::XLSX::Reader::LibXML::Cell

This is the extracted Role to be used as a top level worksheet interface. This is the place where all the various details in each sub XML sheet are coallated into a set of data representing all the necessary information for a requested cell. Since this is the center of data coallation all elements that may be customized should reside outside of this role. This includes any specific elements that would be different between each of the sheet parser types and any element of Excel data presentation that may lend itself to customization. For instance all the XML parser methods, (Reader, DOM, and possibly SAX) can plug in outside of this role.

This role is also where a layer of abstraction is maintained to manage user defined count-from-one or count-from-zero mode. The layer of abstraction is use with the Moose around modifier. The behaviour is managed with the workbook attribute counting_from_zero .

requires

These are method(s) used by this Role but not provided by the role. Any class consuming this role will not build without first providing these methods prior to loading this role. Since this is the center of data coallation the list is long.

get_log_space

    Definition: Used to return the log space used by the code protected by ###LogSD. See Log::Shiras for more information.

set_error( $error_string )

min_row

    Definition: Used to get the minimum data row set for the worksheet.

max_row

    Definition: Used to get the maximum data row set for the worksheet.

min_col

    Definition: Used to get the minimum data column set for the worksheet.

max_col

    Definition: Used to get the maximum data column set for the worksheet.

row_range

    Definition: Used to return a list of the $minimum_row and $maximum_row

col_range

    Definition: Used to return a list of the $minimum_column and $maximum_column

change_output_encoding

    Definition: This should be using the localization role. The role should be reachable by delegation through a workbook parser attribute (trait) in the class consuming this role.

get_group_return_type

    Definition: This is where the group_return_type attribute is used as a consequence this role needs to understand the current setting. It should be reachable by delegation through a workbook parser attribute (trait) in the class consuming this role.

set_group_return_type

    Definition: In order to speed up header reading and setting the full cell is not needed, just the value. As a consequence this role changes the group_return_type attribute as needed for header retrieval.

_get_next_value_cell

    Definition: This should return the next cell data from the worksheet file that contains unique formatting or information. The data is expected in a perl hash ref. This method should collect data left to right and top to bottom. The styles.xml, sharedStrings.xml, and calcChain.xml etc. sheet data are coallated into the cell later in this role. An 'EOF' string should be returned when the file has reached the end and then the method should wrap back to the beginning.

    Example

    {
    	'r' => 'A6',				# The cell ID
    	'cell_merge' => 'A6:B6',	# The merge range
    	'row' => 6,					# Already converted to count by 1
    	'col' => 1,					# Already converted to count by 1
    	't' => 's'					# Cell data type (string)
    	'v' => {					# Cell data (since this cell is string 
    		'raw_text' => '15'		# 	data this actually points to position 
    	}							# 	15 in the sharedStrings.xml file )
    	's' => '11',				# Styles type (position 11 in the styles sheet)
    }

_get_next_cell

    Definition: Like _get_next_value_cell this method should return the next cell. The difference is it should return undef for empty cells rather than skipping them. This method should collect data left to right and top to bottom. The styles.xml, sharedStrings.xml, and calcChain.xml etc. sheet data are coallated into the cell later in this role. An 'EOF' string should be returned when the file has reached the end and then the method should wrap back to the beginning.

_get_col_row

    Definition: This method should provide a targeted way to return the worksheet file information on a cell. It should only accept count-from-one column and row numbers and the column should be required before the row. If the request is made for an out of row bounds position the method should provide an 'EOR' string. An 'EOF' string should be returned when the file has reached the end and then the method should wrap back to the beginning.

_get_error_inst

counting_from_zero

    Definition: This role implements some of the count_from_zero attribute behaviors so it needs to be able to read the current state.

boundary_flag_setting

    Definition: This role implements some of the file_boundary_flags attribute behaviors so it needs to be able to read the current state.

_has_shared_strings_file

    Definition: This should indicate if the sharedStrings.xml file is available

get_shared_string_position( $position )

    Definition: This should return a hashref of data for the indicated $position

    Example: from the example shared strings position 15

    {
    	'rich_text' => [				# Rich text definition
    		2,							# Position from 0 to start this element
    		{							# Element definition
    			'color' => {
    				'rgb' => 'FFFF0000'
    			},
    			'sz' => '11',
    			'b' => 1,
    			'scheme' => 'minor',
    			'rFont' => 'Calibri',
    			'family' => '2'
    		},
    		6,							# Position from 0 to start this element
    		{							# Element definition
    			'color' => {
    				'rgb' => 'FF0070C0'
    			},
    			'sz' => '20',
    			'b' => 1,
    			'scheme' => 'minor',
    			'rFont' => 'Calibri',
    			'family' => '2'
    		}
    		],
    	'raw_text' => 'Hello World',	# The raw text the format is applied to
    }

_has_styles_file

    Definition: This should indicate if the styles.xml file is available

get_format_position( $position )

    Definition: This should return a hashref of data for the indicated $position in the styles.xml file. This will include any general cell formatting as well as any references to the subroutines for number conversions either defined by Excel or any custom (user defined) conversion subroutines

    Example: from the example styles position 11

    {
    	'fontId' => '0',
    	'fonts' => {
    		'color' => {
    			'theme' => '1'
    		},
    		'sz' => '11',
    		'name' => 'Calibri',
    		'scheme' => 'minor',
    		'family' => '2'
    	},
    	'numFmtId' => '0',
    	'fillId' => '0',
    	'xfId' => '0',
    	'applyAlignment' => '1',
    	'borders' => {
    		'left' => 1,
    		'right' => 1,
    		'top' => 1,
    		'diagonal' => 1,
    		'bottom' => 1
    	},
    	'borderId' => '0',
    	'alignment' => {
    		'horizontal' => 'left'
    	},
    	'cellStyleXfs' => {
    		'fillId' => '0',
    		'fontId' => '0',
    		'borderId' => '0',
    		'numFmtId' => '0'
    	},
    	'fills' => {
    		'patternFill' => {
    			'patternType' => 'none'
    		}
    	},
    	'numFmts' => bless( {						# This is the package build Type::Tiny object
    		'name' => 'Excel_number_0',				#  to be use for coercion
    		'coercion' => bless( { 
    			~~ Type::Coercion instance here ~~
    		 }, 'Type::Coercion' ),
    	'display_name' => 'Excel_number_0',
    	'uniq' => 94
    	}, 'Type::Tiny' )
    }

Primary Methods

These are the various methods provided by this role.

get_cell( $row, $column )

    Definition: Used to return the cell or information from the cell at the specified $row and $column. Both $row and $column are required.

    Accepts: the list ( $row, $column ) both required

    Returns: see group_return_type for details on what is returned

get_next_value

    Definition: Reading left to right and top to bottom this will return the next cell with a value. This actually includes cells with no value but some unique formatting such as cells that have been merged with other cells.

    Accepts: nothing

    Returns: see group_return_type for details on what is returned

fetchrow_arrayref( $row )

    Definition: In an homage to DBI I included this function to return an array ref of the cells or values in the requested $row. If no row is requested this returns the 'next' row. In the array ref any empty and non unique cell will show as 'undef'.

    Accepts: undef = next|$row = a row integer indicating the desired row

    Returns: an array ref of all possible column positions in that row with data filled in as appropriate.

fetchrow_array( $row )

    Definition: This function is just like fetchrow_arrayref except it returns an array instead of an array ref

    Accepts: undef = next|$row = a row integer indicating the desired row

    Returns: an array of all possible column positions in that row with data filled in as appropriate.

set_headers( @header_row_list )

    Definition: This function is used to set headers used in the function fetchrow_hashref. It accepts a list of row numbers that will be collated into a set of headers used to build the hashref for each row. The header rows are coallated in sequence with the first number taking precedence. The list is also used to set the lowest row of the headers in the table. All rows at that level and higher will be considered out of the table and will return undef while setting the error instance. If some of the columns do not have values then the instance will auto generate unique headers for each empty header column to fill out the header ref.

    Accepts: a list of row numbers

    Returns: an array ref of the built headers for review

fetchrow_hashref( $row )

    Definition: This function is used to return a hashref representing the data in the specified row. If no $row value is passed it will return the 'next' row of data. A call to this function without setting the headers first will return undef and set the error instance.

    Accepts: a target $row number for return values or undef meaning 'next'

    Returns: a hash ref of the values for that row

Attributes

Data passed to new when creating the Styles instance. For modification of these attributes see the listed 'attribute methods'. For more information on attributes see Moose::Manual::Attributes. Most of these are not exposed to the top level of the workbook parser. As a consequence these attribute methods which are available at the worksheet instance level are the best way to manipulate the attribute settings.

last_header_row

    Definition: This is generally set by the method set_headers and is the largest row number of the @header_row_list even if the list is out of sequence.

    Default: undef

    attribute methods Methods provided to adjust this attribute

      get_last_header_row

        Definition: returns the value of the attribute

      has_last_header_row

        Definition: predicate for the attribute

min_header_col

    Definition: When the method fetchrow_array(ref) or fetchrow_hashref are called it is possible to only collect a set of information between two defined columns. This is the attribute that defines the start point.

    Default: undef

    attribute methods Methods provided to adjust this attribute

      set_min_header_col

        Definition: returns the value of the attribute

        get_min_header_col

          Definition: returns the value of the attribute

        has_min_header_col

          Definition: predicate for the attribute

        clear_min_header_col

          Definition: sets min_header_col to undef

max_header_col

    Definition: When the method fetchrow_array(ref) or fetchrow_hashref are called it is possible to only collect a set of information between two defined columns. This is the attribute that defines the end point.

    Default: undef

    attribute methods Methods provided to adjust this attribute

      set_max_header_col

        Definition: returns the value of the attribute

        get_max_header_col

          Definition: returns the value of the attribute

        has_max_header_col

          Definition: predicate for the attribute

        clear_max_header_col

          Definition: sets min_header_col to undef

custom_formats

    Definition: When this role is coallating data about a cell it will check this attribute before it checks the styles sheet to see if there is a format defined by the user for converting the unformatted data. The formats stored must have two methods 'assert_coerce' and 'display_name'. The collater first checks the cellID as a key, then it checks for just the column letter(s) as a key, and finally it checks the row number as a key.

    Default: undef

    attribute methods Methods provided to adjust this attribute

      set_custom_formats( { $key = $conversion } )>

        Definition: a way to set all $key => $conversion pairs at once

        Accepts: a hashref of $key => $conversion pairs

        has_custom_format( $key )

          Definition: checks if the specific $key for a format is registered

        get_custom_format( $key )

          Definition: get the custom format for the requested $key

          Returns: the $conversion registered to the $key

        set_custom_format( $key = $conversion )>

          Definition: set the custom format $conversion for the identified $key

SUPPORT

TODO

    1. Eliminate the min / max row / col calls from this role (and requireds) if possible.

AUTHOR

Jed Lund
jandrew@cpan.org

COPYRIGHT

This program is free software; you can redistribute it and/or modify it under the same terms as Perl itself.

The full text of the license can be found in the LICENSE file included with this module.

This software is copyrighted (c) 2014 by Jed Lund

DEPENDENCIES

SEE ALSO