NAME

Spreadsheet::Reader::ExcelXML::SharedStrings - The sharedStrings interface

SYNOPSIS

#!/usr/bin/env perl
$|=1;
use Data::Dumper;
use MooseX::ShortCut::BuildInstance qw( build_instance );
use Spreadsheet::Reader::ExcelXML::Workbook;
use Spreadsheet::Reader::ExcelXML::XMLReader;
use Spreadsheet::Reader::ExcelXML::SharedStrings;
use Spreadsheet::Reader::ExcelXML::XMLReader::PositionSharedStrings;

# This whole thing is performed under the hood of
#  Spreadsheet::Reader::ExcelXML
my $file_instance = build_instance(
		package      => 'SharedStringsInstance',
		file         => 'sharedStrings.xml',
		workbook_inst => Spreadsheet::Reader::ExcelXML::Workbook->new,
		superclasses =>[
			'Spreadsheet::Reader::ExcelXML::XMLReader'
		],
		add_roles_in_sequence =>[
			'Spreadsheet::Reader::ExcelXML::XMLReader::PositionSharedStrings',
			'Spreadsheet::Reader::ExcelXML::SharedStrings',
		],
	);

# Demonstrate output
print Dumper( $file_instance->get_shared_string( 3 ) );
print Dumper( $file_instance->get_shared_string( 12 ) );

#######################################
# SYNOPSIS Screen Output
# $VAR1 = {
#     'raw_text' => ' '
# };
# $VAR1 = {
#     'raw_text' => 'Superbowl Audibles'
# };
#######################################

DESCRIPTION

This documentation explains how to use this module when writing your own Excel parser or extending this package. To use the general package for Excel parsing out of the box please review the documentation for Workbooks, Worksheets, and Cells.

This class is the interface for reading the sharedStrings file in a standard xml based Excel file. The SYNOPSIS provides an example with a role added to implement position based reading (~PositionSharedStrings). The other role written for this interface is Spreadsheet::Reader::ExcelXML::NamedSharedStrings. This interface does not provide a connection to other file types, or even to the elements from other files that are related to this file. This POD documents all functionality required by this interface independent of where it is provided.

Methods

These are the primary ways to use this class. For additional SharedStrings options see the Attributes section.

get_shared_string( $positive_int|$name )

    Definition: This returns the data in the shared strings file identified by either the $positive_int position, for position based sharedStrings files, or $name, for name based sharedStrings files. Position based retrieval is implemented in Spreadsheet::Reader::ExcelXML::XMLReader::PositionSharedStrings (the role used in the SYNOPSIS). Named retrieval is implemented in Spreadsheet::Reader::ExcelXML::NamedSharedStrings.

    Accepts: $positive_int ( a positive integer ) or $name depending on the associated role

    Returns: a hash ref with the key 'raw_text' whose value is all the collated text for that xml node. If the node has associated rich text and "group_return_type" in Spreadsheet::Reader::ExcelXML is set to 'instance', the hash ref also has a 'rich_text' key. Its value is an array ref holding a flat list of pairs (not sub array refs): the first element of each pair is the position in raw_text, counting from zero, where the formatting starts, and the second element holds the settings for that format. Ex.

    {
    	raw_text => 'Hello World',
    	rich_text =>[
    		2,# Starting with the letter 'l' apply the format
    		{
    			'color' => {
    				'rgb' => 'FFFF0000'
    			},
    			'sz' => '11',
    			'b' => undef,
    			'scheme' => 'minor',
    			'rFont' => 'Calibri',
    			'family' => '2'
    		},
    		6,# Starting with the letter 'W' apply the format
    		{
    			'color' => {
    				'rgb' => 'FF0070C0'
    			},
    			'sz' => '20',
    			'b' => undef,
    			'scheme' => 'minor',
    			'rFont' => 'Calibri',
    			'family' => '2'
    		}
    	]
    }
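    The flat pair layout above can be walked two elements at a time. The following is a minimal sketch, not part of this package: rich_text_runs is a hypothetical helper name, and it assumes each format applies from its position up to the next listed position (or to the end of the text), which the example comments suggest but this interface does not state.

```perl
use strict;
use warnings;

# Hypothetical helper (not provided by this package): split raw_text
# into runs using the flat ( position, format, position, format, ... )
# list stored under rich_text.
# Assumption: a format applies until the next listed position.
sub rich_text_runs {
    my ( $shared_string ) = @_;
    my $text  = $shared_string->{raw_text};
    my @pairs = @{ $shared_string->{rich_text} // [] };
    my @runs;

    # Text before the first formatted position carries no format
    my $first = @pairs ? $pairs[0] : length( $text );
    push @runs, { text => substr( $text, 0, $first ), format => undef }
        if $first > 0;

    # Consume the flat list one ( position, format ) pair at a time
    while ( @pairs ) {
        my ( $start, $format ) = splice( @pairs, 0, 2 );
        my $end = @pairs ? $pairs[0] : length( $text );
        push @runs, {
            text   => substr( $text, $start, $end - $start ),
            format => $format,
        };
    }
    return \@runs;
}

my $runs = rich_text_runs( {
    raw_text  => 'Hello World',
    rich_text => [
        2, { sz => '11', b => undef, color => { rgb => 'FFFF0000' } },
        6, { sz => '20', b => undef, color => { rgb => 'FF0070C0' } },
    ],
} );

for my $run ( @$runs ) {
    my $size = $run->{format} ? $run->{format}{sz} : 'default';
    print "'$run->{text}' uses font size $size\n";
}
```

    Using splice keeps the sketch faithful to the documented layout: the list stays flat and is never regrouped into sub array refs.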

loaded_correctly

    Definition: When the file is opened this interface checks the sharedStrings file for the global count of shared strings and stores it. If that process was successful this method returns 1.

    Accepts: nothing

    Returns: (1|0) 1 if the file opened as a shared strings file, otherwise 0
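    One way to use this flag is to check it before asking for a string. The sketch below is illustrative only: MockSharedStrings is a stand-in class written so the example runs without a workbook, and safe_raw_text is a hypothetical helper. Only the method names loaded_correctly and get_shared_string come from this interface.

```perl
use strict;
use warnings;

# Stand-in for a real sharedStrings instance so the sketch runs alone.
# Only the two method names are taken from this documentation.
package MockSharedStrings {
    sub new {
        my ( $class, %args ) = @_;
        return bless { %args }, $class;
    }
    sub loaded_correctly  { return $_[0]{loaded} ? 1 : 0 }
    sub get_shared_string { return { raw_text => $_[0]{strings}[ $_[1] ] } }
}

# Hypothetical helper: return the raw text for a position, or undef
# when the file did not load as a shared strings file.
sub safe_raw_text {
    my ( $instance, $position ) = @_;
    return undef unless $instance->loaded_correctly;
    my $string = $instance->get_shared_string( $position );
    return ref( $string ) eq 'HASH' ? $string->{raw_text} : undef;
}

my $good = MockSharedStrings->new( loaded => 1, strings => [ 'Hello', 'World' ] );
my $bad  = MockSharedStrings->new( loaded => 0, strings => [] );

print safe_raw_text( $good, 1 ), "\n";
print defined( safe_raw_text( $bad, 0 ) )
    ? "string found\n"
    : "not a shared strings file\n";
```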

Attributes

Data passed to new when creating an instance with this interface. To modify an attribute see its listed 'attribute methods'. For more information on attributes see Moose::Manual::Attributes. The easiest way to set these attributes is during instance creation, before the instance is passed to the workbook or parser.

file

    Definition: This attribute holds the file handle for the file being read. If the full file name and path are passed to the attribute the class will coerce them into an IO::File file handle.

    Default: no default - this must be provided to read a file

    Required: yes

    Range: any unencrypted sharedStrings.xml file name and path or IO::File file handle with that content.

    attribute methods Methods provided to adjust this attribute

      set_file

        Definition: change the file value in the attribute (this will reboot the file instance and lock the file)

      get_file

        Definition: Returns the file handle of the file even if a file name was passed

      has_file

        Definition: this is used to see if the file loaded correctly.

      clear_file

        Definition: this clears (and unlocks) the file handle

cache_positions

    Definition: For sheets with a lot of stored text the parser can slow down considerably when accessing each position. The text is not always stored sequentially and the reader is a JIT linear parser, so to go back it must restart and index through each position until it reaches the right place. This is especially true for Excel sheets that have seen any significant level of manual intervention before being read.

    This attribute turns on caching for shared strings (the default) so the parser only has to read through the shared strings once. When the read reaches the end of the file the shared strings file is also released to free up some space (a small win in exchange for the space taken by the cache). The trade-off is that all intermediate shared strings are fully read before the target string is read, so early reads will be slower. For sheets that store only numbers, or very few strings, caching is unlikely to cause a noticeable initial hit (or speed improvement).

    To minimize the physical size of the cache, a position that holds only a text string is stored as the bare string (not as the value of a raw_text hash key). It is reconstituted into a hashref when requested.

    Default: 1 = caching is on

    Range: 1|0

    Attribute required: yes
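    The compact storage described in the Definition for this attribute can be illustrated with a small sketch. This is not the package's actual cache code: cache_store and cache_fetch are hypothetical names, and the array based cache is an assumption made for the example.

```perl
use strict;
use warnings;

my @cache;    # index = shared string position (assumed layout)

# Hypothetical helper: keep only the bare string when raw_text is the
# sole key, otherwise keep the whole hashref (ex. with rich_text).
sub cache_store {
    my ( $position, $node ) = @_;
    my $text_only = scalar( keys %$node ) == 1 && exists $node->{raw_text};
    $cache[$position] = $text_only ? $node->{raw_text} : $node;
    return;
}

# Hypothetical helper: rebuild the hashref form when a bare string
# was stored, so callers always receive the same shape.
sub cache_fetch {
    my ( $position ) = @_;
    my $stored = $cache[$position];
    return undef unless defined $stored;
    return ref( $stored ) ? $stored : { raw_text => $stored };
}

cache_store( 0, { raw_text => 'Superbowl Audibles' } );
cache_store( 1, { raw_text => 'Hello World', rich_text => [ 2, { sz => '11' } ] } );

print ref( $cache[0] ) ? "hashref stored\n" : "bare string stored\n";
print cache_fetch( 0 )->{raw_text}, "\n";
```

    Storing the bare scalar avoids one hash per plain text entry, which is where the space saving comes from when most shared strings have no rich text.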

    attribute methods Methods provided to adjust this attribute

SUPPORT

TODO

    1. Nothing yet

AUTHOR

    Jed Lund

    jandrew@cpan.org

COPYRIGHT

This program is free software; you can redistribute it and/or modify it under the same terms as Perl itself.

The full text of the license can be found in the LICENSE file included with this module.

This software is copyrighted (c) 2016 by Jed Lund

DEPENDENCIES

SEE ALSO