NAME
Spreadsheet::Reader::ExcelXML::SharedStrings - The sharedStrings interface
SYNOPSIS
#!/usr/bin/env perl
use strict;
use warnings;
$|=1;
use Data::Dumper;
use MooseX::ShortCut::BuildInstance qw( build_instance );
use Spreadsheet::Reader::ExcelXML::Workbook;
use Spreadsheet::Reader::ExcelXML::XMLReader;
use Spreadsheet::Reader::ExcelXML::SharedStrings;
use Spreadsheet::Reader::ExcelXML::XMLReader::PositionSharedStrings;

# This whole thing is performed under the hood of
# Spreadsheet::Reader::ExcelXML
my $file_instance = build_instance(
    package       => 'SharedStringsInstance',
    file          => 'sharedStrings.xml',
    workbook_inst => Spreadsheet::Reader::ExcelXML::Workbook->new,
    superclasses  => [
        'Spreadsheet::Reader::ExcelXML::XMLReader'
    ],
    add_roles_in_sequence => [
        # The position role supplies the reading; the interface role goes last
        'Spreadsheet::Reader::ExcelXML::XMLReader::PositionSharedStrings',
        'Spreadsheet::Reader::ExcelXML::SharedStrings',
    ],
);

# Demonstrate output
print Dumper( $file_instance->get_shared_string( 3 ) );
print Dumper( $file_instance->get_shared_string( 12 ) );

#######################################
# SYNOPSIS Screen Output
# $VAR1 = {
#   'raw_text' => ' '
# };
# $VAR1 = {
#   'raw_text' => 'Superbowl Audibles'
# };
#######################################
DESCRIPTION
This documentation explains how to use this module when writing your own Excel parser or extending this package. To use the general package for Excel parsing out of the box, please review the documentation for Workbooks, Worksheets, and Cells.
This class is the interface for reading the sharedStrings file in a standard XML-based Excel file. The SYNOPSIS shows it combined with Spreadsheet::Reader::ExcelXML::XMLReader::PositionSharedStrings, the role that implements position-based reading. The other role written for this interface is Spreadsheet::Reader::ExcelXML::XMLReader::NamedSharedStrings. The interface does not connect to other file types, or even to the elements in other files that are related to this file. This POD documents all functionality required by this interface, independent of where it is provided.
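For readers new to the file layout, the position-based idea can be sketched in plain Perl. This is not how the package reads the file (it uses its own linear XML reader); it is a minimal sketch assuming that each <si> element in sharedStrings.xml is one entry, that positions count from zero in file order, and that element names carry no namespace prefix. The helper shared_string_at and the sample XML are hypothetical, and XML entities are left undecoded.

```perl
use strict;
use warnings;

# A tiny, hand-written sharedStrings.xml body (illustrative data only).
my $xml = <<'XML';
<sst xmlns="http://schemas.openxmlformats.org/spreadsheetml/2006/main" count="3" uniqueCount="3">
<si><t>Hello</t></si>
<si><r><t>Hello </t></r><r><t>World</t></r></si>
<si><t xml:space="preserve"> </t></si>
</sst>
XML

# Hypothetical helper: return the $position-th <si> entry (counting from zero)
# as a hashref whose raw_text is all <t> text in that entry joined together.
# Assumes unprefixed element names; XML entities are not decoded here.
sub shared_string_at {
    my ( $xml_text, $position ) = @_;
    my @entries = $xml_text =~ m{<si>(.*?)</si>}gs;
    return if $position < 0 or $position > $#entries;
    my $raw_text = join '', $entries[$position] =~ m{<t(?:\s[^>]*)?>(.*?)</t>}gs;
    return { raw_text => $raw_text };
}

print shared_string_at( $xml, 1 )->{raw_text}, "\n";    # prints "Hello World"
```

The second entry shows why raw_text is described as collated: rich text splits one string across several <t> nodes, and the sketch simply concatenates them.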
Methods
These are the primary ways to use this class. For additional SharedStrings options see the Attributes section.
get_shared_string( $positive_int|$name )
Definition: This returns the data in the shared strings file identified either by the $positive_int position (for position-based sharedStrings files) or by $name (for name-based sharedStrings files). Position-based retrieval is implemented in Spreadsheet::Reader::ExcelXML::XMLReader::PositionSharedStrings; named retrieval is implemented in Spreadsheet::Reader::ExcelXML::XMLReader::NamedSharedStrings.
Accepts: $positive_int (a positive integer) or $name, depending on the associated role
Returns: a hash ref with the key 'raw_text', whose value is all of the collated text for that XML node. If the node has associated rich text and "group_return_type" in Spreadsheet::Reader::ExcelXML is set to 'instance', the hash ref also has a 'rich_text' key. Its value is a flat arrayref of pairs (not nested arrayrefs): the first value of each pair is the zero-based position in raw_text where the formatting starts, and the second is a hash ref of the settings for that format. Ex.
{
    raw_text => 'Hello World',
    rich_text => [
        2, # Starting with the letter 'l' apply the format
        {
            'color' => {
                'rgb' => 'FFFF0000'
            },
            'sz' => '11',
            'b' => undef,
            'scheme' => 'minor',
            'rFont' => 'Calibri',
            'family' => '2'
        },
        6, # Starting with the letter 'W' apply the format
        {
            'color' => {
                'rgb' => 'FF0070C0'
            },
            'sz' => '20',
            'b' => undef,
            'scheme' => 'minor',
            'rFont' => 'Calibri',
            'family' => '2'
        }
    ]
}
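The flat pair layout of rich_text can be walked two elements at a time. The sketch below is an illustration, not part of the package: the hypothetical helper rich_text_runs splits raw_text into [ text, format ] runs, assuming each format applies from its position up to the next position (or the end of the string) and that any text before the first position is unformatted.

```perl
use strict;
use warnings;

# The documented example return value (formats trimmed to a few keys).
my $string = {
    raw_text  => 'Hello World',
    rich_text => [
        2, { color => { rgb => 'FFFF0000' }, sz => '11', b => undef },
        6, { color => { rgb => 'FF0070C0' }, sz => '20', b => undef },
    ],
};

# Hypothetical helper: split raw_text into [ text, format ] runs. Text before
# the first position gets an undef format; each format applies from its
# position up to the next position (or the end of the string).
sub rich_text_runs {
    my ($cell) = @_;
    my $text  = $cell->{raw_text};
    my @pairs = @{ $cell->{rich_text} || [] };
    my @runs;
    my $first = @pairs ? $pairs[0] : length $text;
    push @runs, [ substr( $text, 0, $first ), undef ] if $first > 0;
    for ( my $i = 0; $i < @pairs; $i += 2 ) {
        my ( $start, $format ) = @pairs[ $i, $i + 1 ];
        my $end = $i + 2 < @pairs ? $pairs[ $i + 2 ] : length $text;
        push @runs, [ substr( $text, $start, $end - $start ), $format ];
    }
    return @runs;
}

for my $run ( rich_text_runs($string) ) {
    printf "%-8s size=%s\n", "'$run->[0]'", $run->[1] ? $run->[1]{sz} : '-';
}
```

For the example above this yields the runs 'He' (no format), 'llo ' (size 11) and 'World' (size 20). A string without a rich_text key comes back as a single unformatted run.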
loaded_correctly
Definition: When the file is opened, this interface checks the sharedStrings file for the global count of shared strings and stores it. If that process succeeded, this method returns 1.
Accepts: nothing
Returns: (1|0) depending on whether the file opened as a shared strings file
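As a rough illustration of the kind of check loaded_correctly reports on, the sketch below reads the count from the <sst> root element of a sharedStrings.xml file. In the Office Open XML format that element carries count and uniqueCount attributes; which one the package actually stores, and how it parses the file, are assumptions here. check_shared_strings_header is a hypothetical helper.

```perl
use strict;
use warnings;

# Hypothetical helper: look for the <sst> root element and its uniqueCount
# (or, failing that, count) attribute. Returns ( 1, $count ) when found and
# ( 0, undef ) otherwise, loosely mirroring the 1|0 result of loaded_correctly.
sub check_shared_strings_header {
    my ($xml_text) = @_;
    my ($sst_tag) = $xml_text =~ m{<sst\b([^>]*)>};
    return ( 0, undef ) unless defined $sst_tag;
    for my $attribute (qw( uniqueCount count )) {
        if ( $sst_tag =~ m{\b$attribute="(\d+)"} ) {
            return ( 1, $1 );
        }
    }
    return ( 0, undef );
}

my ( $ok, $count ) = check_shared_strings_header(
    '<sst xmlns="http://schemas.openxmlformats.org/spreadsheetml/2006/main" count="25" uniqueCount="14">'
);
print "loaded=$ok unique strings=$count\n";
```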
Attributes
Data passed to new when creating an instance with this interface. To modify these attributes, see the listed 'attribute methods'. For more information on attributes see Moose::Manual::Attributes. The easiest way to modify these attributes is during instance creation, before the instance is passed to the workbook or parser.
file
Definition: This attribute holds the file handle for the file being read. If the full file name and path is passed to the attribute, the class will coerce that into an IO::File file handle.
Default: no default - this must be provided to read a file
Required: yes
Range: any unencrypted sharedStrings.xml file name and path or IO::File file handle with that content.
attribute methods: Methods provided to adjust this attribute
set_file
Definition: Changes the file value in the attribute (this will reboot the file instance and lock the file)
get_file
Definition: Returns the file handle of the file, even if a file name was passed
has_file
Definition: Used to check whether the file loaded correctly
clear_file
Definition: Clears (and unlocks) the file handle
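The file coercion described above can be approximated in plain Perl. This hypothetical coerce_to_handle helper is an assumption for illustration, not the package's Moose type coercion: it passes an open handle through unchanged and opens anything else as a file name with IO::File.

```perl
use strict;
use warnings;
use IO::File;
use Scalar::Util qw( openhandle );

# Hypothetical helper: accept either an already-open file handle or a file
# name and path, and always hand back a readable file handle. This mirrors
# the documented coercion of a file name into an IO::File handle; it is not
# the package's own type constraint.
sub coerce_to_handle {
    my ($file) = @_;
    return $file if openhandle($file);    # already a usable handle
    my $handle = IO::File->new( $file, '<' )
        or die "Could not open '$file': $!";
    return $handle;
}
```

For example, coerce_to_handle('sharedStrings.xml') returns a read handle, while passing that handle back in returns the same handle.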
cache_positions
Definition: Especially for sheets with lots of stored text, the parser can slow way down when accessing each position. This is because the text is not always stored sequentially and the reader is a JIT linear parser: to go back it must restart and index through each position until it reaches the right place. This is especially true for Excel sheets that have seen any significant manual intervention before being read. This attribute turns on (the default) caching for shared strings, so the parser only has to read through the shared strings once. When the read reaches the end of the file, the shared strings file is also released to free up some space (a small win in exchange for the space taken by the cache). The trade-off is that all intermediate shared strings are fully read before the target string is read, so early reads will be slower. For sheets that store only numbers, or very few strings, this will likely be neither a noticeable initial hit nor a speed improvement. To minimize the size of the cache, if a shared strings position holds only a text string, then only the string is stored (not as the value of a raw_text hash key); it is reconstituted into a hashref when requested.
Default: 1 = caching is on
Range: 1|0
Attribute required: yes
attribute methods: Methods provided to adjust this attribute
none - (will be autoset by "cache_positions" in Spreadsheet::Reader::ExcelXML)
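The caching behavior described for cache_positions can be sketched with a closure. This is a minimal illustration under stated assumptions, not the package's code: make_cached_reader is hypothetical, and $next_entry stands in for the linear reader by returning the next shared string hashref in file order (or undef at the end). Every entry up to the requested position is read and cached once, entries holding only raw_text are stored as bare strings, and they are rebuilt into { raw_text => ... } hashrefs on request.

```perl
use strict;
use warnings;

# Hypothetical cache illustrating the cache_positions trade-off. $next_entry is
# any code ref that returns the next shared string hashref in file order, or
# undef at end of file (a stand-in for the package's linear XML reader).
sub make_cached_reader {
    my ($next_entry) = @_;
    my @cache;
    my $exhausted = 0;
    return sub {
        my ($position) = @_;
        # Read (and cache) every intermediate entry up to the requested one.
        while ( !$exhausted and $position > $#cache ) {
            my $entry = $next_entry->();
            if ( !defined $entry ) {
                $exhausted  = 1;
                $next_entry = undef;    # the real reader could release the file here
                last;
            }
            # Store plain strings as bare scalars to keep the cache small.
            my $plain = scalar( keys %$entry ) == 1 && exists $entry->{raw_text};
            push @cache, $plain ? $entry->{raw_text} : $entry;
        }
        return if $position > $#cache;
        my $cached = $cache[$position];
        # Reconstitute bare strings into the documented hashref shape.
        return ref $cached ? $cached : { raw_text => $cached };
    };
}
```

The design choice is the same one the attribute describes: the first request for a late position pays for reading every earlier entry, but no later request forces the reader to restart from the top of the file.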
SUPPORT
TODO
1. Nothing yet
AUTHOR
Jed Lund
jandrew@cpan.org
COPYRIGHT
This program is free software; you can redistribute it and/or modify it under the same terms as Perl itself.
The full text of the license can be found in the LICENSE file included with this module.
This software is copyrighted (c) 2016 by Jed Lund
DEPENDENCIES
Spreadsheet::Reader::ExcelXML - the package
SEE ALSO
Spreadsheet::Read - generic Spreadsheet reader
Spreadsheet::ParseExcel - Excel binary version 2003 and earlier (.xls files)
Spreadsheet::XLSX - Excel version 2007 and later
Spreadsheet::ParseXLSX - Excel version 2007 and later
All lines in this package that use Log::Shiras are commented out