NAME

Spreadsheet::Edit::IO - convert between spreadsheet and csv files

SYNOPSIS

use Spreadsheet::Edit::IO qw/
             convert_spreadsheet OpenAsCsv
             cx2let let2cx
             @sane_CSV_read_options @sane_CSV_write_options/;

# Open a CSV file or result of converting a sheet from a spreadsheet
my $hash = OpenAsCsv("/path/to/spreadsheet.odt!Sheet1");
my $hash = OpenAsCsv(inpath => "/path/to/spreadsheet.odt",
                     sheetname => "Sheet1");

print "Reading ",$hash->{csvpath}," with encoding ",$hash->{encoding},"\n";
while (<$hash->{fh}>) { ... }

# Convert CSV to spreadsheet in temp file (deleted at process exit)
$hash = convert_spreadsheet(inpath => "mycsv.csv", cvt_to => "xlsx");
print "Output is $hash->{outpath}\n"; # e.g. "/tmp/dwYT6qf/mycsv.xlsx"

# Convert *all* sheets to CSV files in a temp directory
$hash = convert_spreadsheet(inpath => "mywork.xls", cvt_to => "csv",
                            allsheets => 1);
opendir $dh, $hash->{outpath};
while (readrir($h)) { say "Output csv file is $_" }

# Transcode a CSV from windows-1252 to UTF-8
convert_spreadsheet(
    inpath  => "input.csv",  input_encoding   => 'windows-1252',
    outpath => "output.csv", output_binmode => ':raw:encoding(UTF-8):crlf'
);

# Translate between 0-based column index and letter code (A, B, etc.)
print cx2let(0);     # "A"
print let2cx("A");   # 0
print cx2let(26);    # "AA"
print let2cx("ABC"); # 730

# Extract components from "filepath!SHEETNAME" specifiers
my $path      = filepath_from_spec("/path/to/spreasheet.xls!Sheet1")
my $sheetname = sheetname_from_spec("/path/to/spreasheet.xls!Sheet1")

# Parse a csv file with sane options
my $csv = Text::CSV->new({ @sane_CSV_read_options, eol => $hash->{eol} })
  or die "ERROR: ".Text::CSV->error_diag ();
my @rows
while (my $F = $csv->getline( $infh )) {
  push @rows, $F;
}
close $infh or die "Error reading ", $hash->csvpath(), ": $!";

# Write a csv file with sane options
my $ocsv = Text::CSV->new({ @sane_CSV_write_options })
  or die "ERROR: ".Text::CSV->error_diag ();
open my $outfh, ">:encoding(utf8)", $outpath
  or die "$outpath: $!";
foreach (@rows) { $ocsv->print($outfh, $_) }
close $outfh or die "Error writing $outpath: $!";

DESCRIPTION

Convert between CSV and spreadsheet files using external tools, plus some utility functions

Currently this uses LibreOffice or OpenOffice (whatever is installed). An external tool is not needed when only CSV files are involved.

$hash = OpenAsCsv INPUT

$hash = OpenAsCsv inpath => INPUT, sheetname => SHEETNAME, ...

This is a thin wrapper for convert_spreadsheet followed by open

If a single argument is given it specifies INPUT; otherwise all arguments must be specified as key => value pairs, and may include any options supported by convert_spreadsheet.

INPUT may be a csv or spreadsheet workbook path; if a spreadsheet, then a single "sheet" is converted, specified by either a !SHEETNAME suffix in the INPUT path, a separate sheetname => SHEETNAME option, or if unspecified to extract the only sheet (croaks if there is more than one).

The resulting file handle refers to a guaranteed-seekable BOM-less CSV file. This will either be a temporary file (auto-removed at process exit), or the original INPUT if it was already a seekable csv file without a BOM.

RETURNS: A ref to a hash containing the following:

{
 fh        => the resulting open file handle
 csvpath   => the path {fh} refers to, which might be a temporary file
 sheetname => sheet name if the input was a spreadsheet
}

convert_spreadsheet INPUT, cvt_to=>suffix, OPTIONS

convert_spreadsheet INPUT, cvt_to=>"csv", allsheets => 1, OPTIONS

Convert CSV to spreadsheet or vice-versa, or transcode CSV to CSV.

RETURNS: A ref to a hash containing:

{
 outpath   => path to the output file (or directory with 'allsheets')
              (a temp file/dir if you did not specify outpath in OPTIONS).

 encoding  => the encoding used when writing .csv files
}

INPUT is the input file path; it may be a separate first argument as shown above, or else included in OPTIONS as inpath => INPUT.

If outpath => OUTPATH is specifed then results are always saved to that path. With allsheets this must be a directory, which will be created if necessary.

If outpath is NOT specified in OPTIONS then, with one exception, results are saved to a temporary file or directory and that path is returned as outpath in the result hash. The exception is if no conversion is necessary when the input file itself is returned as outpath (i.e. cvt_from is the same as cvt_to and, if 'csv', there was no BOM and an encoding change is not needed).

In all cases outpath in the result hash points to the results.

cvt_from or cvt_to are filename suffixes (sans dot) e.g. "csv", "xlsx", etc., and are only required if INPATH or outpath parameters do not contain a .suffix .

OPTIONS may also include:

sheetname => "sheet name"

The workbook 'sheet' name used when reading or writing a spreadsheet. An input sheet name may also be specified as "!sheetname" appended to the INPUT path.

allsheets => BOOL

All sheets in the input are converted to separate .csv files named "SHEETNAME.csv" in the 'outpath' directory. cvt_to => 'csv' is also requred.

input_encoding => ENCODING

Specifies the encoding of INPUT if it is a csv file.

ENCODING may be a comma-separated list of encoding names which will be tried in the order until one seems to work. If only one is specified it will be used without trying it first. The default is "UTF-8,windows-1252". If a BOM is present it overrides.

output_binmode => "..."

Used when writing csv file(s), defaults to ':raw:encoding(UTF-8):crlf'.

output_encoding => ENCODING

(Deprecated) Implies output_binmode => ':raw:encoding(ENCODIING):crlf'.

col_formats => [...]

This specifies how CSV data is imported into a spreadsheet. Each element of the array may contain:

undef, "standard" or ""  (LibreOffice will auto-detect)
"text"                   (imported as unmolested text)
"MM/DD/YY",
"DD/MM/YY",
"YY/MM/DD",
"ignore"                 (do not import this column)

Elements may also contain the numeric format codes defined by LibreOffice at https://wiki.documentfoundation.org/Documentation/DevGuide/Spreadsheet_Documents#Filter_Options_for_the_CSV_Filter

Automatic format detection: If col_formats is not specified, input CSV data is pre-scanned to auto-detect column formats as much as possible. This usually works well as long as dates are represented unambiguously, e.g. 2021-01-01 or "Jan 1, 2023".

Specifically, this reads columns containing leading zeroes as "text" (such as in U.S. Zip Codes); date forms MM/DD/YY or DD/MM/YY are read with corresponding format if a DD happens to be more than 12 (otherwise LibreOffice's default is used).

verbose => BOOL

'binmode' Argument For Reading result CSVs

The following incantation will correctly read either DOS/Windows (CR,LF) or *nix (LF) line endings properly, i.e. as a single \n:

open my $fh, "<", $resulthash->{outpath};
my $enc = $resulthash->{encoding};
binmode($fh, ":raw:encoding($enc):crlf");

@sane_CSV_read_options

@sane_CSV_write_options

These contain options you will always want to use with Text::CSV->new(). Specifically, quotes and embedded newlines are handled correctly.

Not exported by default.

cx2let COLUMNINDEX

let2cx LETTERCODE

Functions which translate between spreadsheet-column letter codes ("A", "B", etc.) and 0-based column indicies. Not exported by default.

filepath_from_spec EXPR

sheetname_from_spec EXPR

Functions which decompose strings containing a spreadsheet path and possibly sheetname suffix in any of these forms: "FILEPATH!SHEETNAME", "FILEPATH|||SHEETNAME", or "FILEPATH[SHEETNAME]". sheetname_from_spec returns undef if the input does not have a a sheetname suffix. Not exported by default.

form_spec_with_sheetname(PATH, SHEENAME)

Composes a combined string in a "preferred" format (currently "PATH!SHEETNAME"). Not exported by default.

Feature Test Functions

$bool = can_cvt_spreadsheets();

$bool = can_extract_allsheets();

$bool = can_extract_named_sheet();

These functions return false if the corresponding operations are not possible because LibreOffice (or, someday gnumeric) is not installed or is an older version which does not have needed capabilities.

$path = openlibreoffice_path();

Returns the detected path of soffice (Libre Orrice or Apache Open Office) or undef if not found.

These are not exported by default.

SEE ALSO

Spreadsheet::Edit and Text::CSV