NAME

Spreadsheet::Read::Ingester - ingest and save csv and spreadsheet data to a perl data structure to avoid reparsing

SYNOPSIS

use Spreadsheet::Read::Ingester;

# ingest raw file, store parsed data file, and return data object
my $data = Spreadsheet::Read::Ingester->new('/path/to/file');

# the returned data object has all the methods of a L<Spreadsheet::Read> object
my $num_cols = $data->sheet(1)->maxcol;

# delete old data files older than 30 days to save disk space
Spreadsheet::Read::Ingester->cleanup;

DESCRIPTION

This module is intended to be a drop-in replacement for <Spreadsheet::Read> and is a simple, unobtrusive wrapper for it.

Parsing spreadsheet and csv data files is time consuming, especially with large data sets. If a data file is ingested more than once, much time and processing power is wasted reparsing the same data. To avoid reparsing, this module uses <Storable> to save a parsed version of the data to disk when a new file is ingested. All subsequent ingestions are retrieved from the stored Perl data structure. Files are saved in the directory determined by File::UserConfig and is a function of the user's OS.

The stored data file names are the unique file signatures for the raw data file. The signature is used to detect if the original file changed, in which case the data is reingested from the raw file and a new parsed file is saved using an updated file signature. Parsed data files are kept indefinitely but can be deleted with the cleanup() method.

Consult the Spreadsheet::Read documentation for accessing the data object returned by this module.

METHODS

new( $path_to_file )

my $data = Spreadsheet::Read::Ingester->new('/path/to/file');

Takes same arguments as the new constructor in Spreadsheet::Read module. Returns an object identical to the object returned by the Spreadsheet::Read module along with its corresponding methods.

cleanup( $file_age_in_days )

cleanup()

Spreadsheet::Read::Ingester->cleanup(0);

Deletes all stored files from the user's application data directory. Takes an optional argument indicating the minimum number of days old the file must be before it is deleted. Defaults to 30 days. Passing a value of 0 deletes all files.

REQUIRES

SUPPORT

Perldoc

You can find documentation for this module with the perldoc command.

perldoc Spreadsheet::Read::Ingester

Websites

The following websites have more information about this module, and may be of help to you. As always, in addition to those websites please use your favorite search engine to discover more resources.

Source Code

The code is open to the world, and available for you to hack on. Please feel free to browse it and play with it, or whatever. If you want to contribute patches, please send me a diff or prod me to pull from your repository :)

https://github.com/sdondley/Spreadsheet-Read-Ingester

git clone git://github.com/sdondley/Spreadsheet-Read-Ingester.git

BUGS AND LIMITATIONS

You can make new bug reports, and view existing ones, through the web interface at https://github.com/sdondley/Spreadsheet-Read-Ingester/issues.

INSTALLATION

See perlmodinstall for information and options on installing Perl modules.

SEE ALSO

Spreadsheet::Read

AUTHOR

Steve Dondley <s@dondley.com>

COPYRIGHT AND LICENSE

This software is copyright (c) 2019 by Steve Dondley.

This is free software; you can redistribute it and/or modify it under the same terms as the Perl 5 programming language system itself.