NAME
Text::AutoCSV - helper module to automate the use of Text::CSV
VERSION
version 1.2.0
SYNOPSIS
By default, Text::AutoCSV will detect the following characteristics of the input:
- The separator, among ",", ";" and "\t" (tab)
- The escape character, among '"' (double-quote) and '\\' (backslash)
- Try UTF-8 and if it fails, fall back on latin1
- Read the header line and compute field names
- If asked to (see "fields_dates_auto"), detect any field that contains a DateTime value, trying 20 date formats, possibly followed by a time (6 time formats tested)
- If asked to (see "fields_dates"), detect DateTime format of certain fields, croak if no DateTime format can be worked out
- Fields identified as containing a DateTime value ("fields_dates_auto" or "fields_dates") are stored as DateTime objects by default
Text::AutoCSV also provides methods to search on fields (using cached hash tables) and it can populate the value of "remote" fields, made from joining 2 CSV files with a key-value search
General
use Text::AutoCSV;
# Read CSV from std input, write to std output
Text::AutoCSV->new()->write();
# Read CSV data from f.csv, write to std output
Text::AutoCSV->new(in_file => 'f.csv')->write();
# Read CSV data from f.csv, write to g.csv
Text::AutoCSV->new(in_file => 'f.csv', out_file => 'g.csv')->write();
# "Rewrite" CSV file by printing out records as a list (separated by
# line breaks) of field name followed by its value.
my $csv = Text::AutoCSV->new(in_file => 'in.csv', walker_hr => \&walk);
my @cols = $csv->get_fields_names();
$csv->read();
sub walk {
my %rec = %{$_[0]};
for (@cols) {
next if $_ eq '';
print("$_ => ", $rec{$_}, "\n");
}
print("\n");
}
OBJ-ish functions
# Identify column internal names with more flexibility as the default
# mechanism
my $csv = Text::AutoCSV->new(in_file => 'zips.csv',
fields_hr => {'CITY' => '^(city|town)', 'ZIPCODE' => '^zip(code)?$'});
# Get zipcode of Claix
my $z = $csv->vlookup('CITY', ' claix ', 'ZIPCODE');
my $csv = Text::AutoCSV->new(in_file => 'zips.csv');
# Get zipcode of Claix
my $z = $csv->vlookup('CITY', ' claix ', 'ZIPCODE');
# Same as above, but vlookup is strict for case and spaces around
my $csv = Text::AutoCSV->new(in_file => 'zips.csv', search_case => 1,
search_trim => 0);
my $z = $csv->vlookup('CITY', 'Claix', 'ZIPCODE');
# Create field 'MYCITY' made by taking pers.csv' ZIP column value, looking
# it up in the ZIPCODE columns of zips.csv, taking CITY colmun value and
# naming it 'MYCITY'. Output is written in std output. If a zipcode is
# ambiguous, say it.
Text::AutoCSV->new(in_file => 'pers.csv')
->field_add_link('MYCITY', 'ZIP->ZIPCODE->CITY', 'zips.csv',
{ ignore_ambiguous => 0,
value_if_ambiguous => '<duplicate zipcode found!>' }
)->write();
# Note the above can also be written using Text::AutoCSV level attributes:
Text::AutoCSV->new(in_file => 'pers.csv',
search_ignore_ambiguous => 0,
search_value_if_ambiguous => '<duplicate zipcode found!>')
->field_add_link('MYCITY', 'ZIP->ZIPCODE->CITY', 'zips.csv')->write();
# Create 'MYCITY' field as above, then display some statistics
my $nom_compose = 0;
my $zip_not_found = 0;
Text::AutoCSV->new(in_file => 'pers.csv', walker_hr => \&walk)
->field_add_link('MYCITY', 'ZIP->ZIPCODE->CITY', 'zips.csv')->read();
sub walk {
my $hr = shift;
$nom_compose++ if $hr->{'NAME'} =~ m/[- ]/;
$zip_not_found++ unless defined($hr->{'MYCITY'});
}
print("Number of persons with a multi-part name: $nom_compose\n");
print("Number of persons with unknown zipcode: $zip_not_found\n");
Updating
Text::AutoCSV->new(in_file => 'names.csv', out_file => 'ucnames.csv',
read_post_update_hr => \&updt)->write();
sub updt { $_[0]->{'LASTNAME'} =~ s/^.*$/\U&/; }
Text::AutoCSV->new(in_file => 'squares.csv',
out_file => 'checkedsquares.csv',
out_filter => \&wf)->write();
sub wf { return ($_[0]->{'X'} ** 2 == $_[0]->{'SQUAREOFX'}); }
# Add a field for the full name, made of the concatenation of the
# first name and the last name.
# Also display stats about empty full names.
Text::AutoCSV->new(in_file => 'dirpeople.csv', out_file => 'dirwithfn.csv',
verbose => 1)
->field_add_computed('FULLNAME', \&calc_fn)->write();
sub calc_fn {
my ($field, $hr, $stats) = @_;
my $fn = $hr->{'FIRSTNAME'} . ' ' . uc($hr->{'LASTNAME'});
$stats->{'empty full name'}++ if $fn eq ' ';
return $fn;
}
# Read a file with a lot of columns and keep only 2 columns in output
Text::AutoCSV->new(in_file => 'big.csv', out_file => 'addr.csv',
out_fields => ['NAME', 'ADDRESS'])
->out_header('ADDRESS', 'Postal Address')
->write();
Datetime management
# Detect any field containing a DateTime value and convert it to yyyy-mm-dd
# whatever the input format is.
Text::AutoCSV->new(in_file => 'in.csv', out_file => 'out.csv',
fields_dates_auto => 1, out_dates_format => '%F')->write();
# Detect any field containing a DateTime value and convert it to a US
# DateTime whatever the input format is.
Text::AutoCSV->new(in_file => 'in.csv', out_file => 'out.csv',
fields_dates_auto => 1,
out_dates_format => '%b %d, %Y, %I:%M:%S %p',
out_dates_locale => 'en')->write();
# Find dates of specific formats and convert it into yyyy-mm-dd
Text::AutoCSV->new(in_file => 'raw.csv', out_file => 'cooked.csv',
dates_formats_to_try => ['%d_%m_%Y', '%m_%d_%Y', '%Y_%m_%d'],
out_dates_format => '%F')->write();
# Take the dates on columns 'LASTLOGIN' and 'CREATIONDATE' and convert it
# into French dates (day/month/year).
# Text::AutoCSV will croak if LASTLOGIN or CREATIONDATE do not contain a
# DateTime format. By default, Text::AutoCSV will try 20 different formats.
Text::AutoCSV->new(in_file => 'in.csv', out_file => 'out.csv',
fields_dates => ['LASTLOGIN', 'CREATIONDATE'],
out_dates_format => '%d/%m/%Y')->write();
# Convert 2 DateTime fields into unix standard epoch
# Write -1 if DateTime is empty.
sub toepoch { return $_->epoch() if $_; -1; }
Text::AutoCSV->new(in_file => 'stats.csv', out_file => 'stats-epoch.csv',
fields_dates => ['ATIME', 'MTIME'])
->in_map('ATIME', \&toepoch)
->in_map('MTIME', \&toepoch)
->write();
# Do the other way round from above: convert 2 fields containing unix
# standard epoch into a string displaying a human-readable DateTime.
my $formatter = DateTime::Format::Strptime->new(
pattern => 'DATE=%F, TIME=%T'
);
sub fromepoch {
return $formatter->format_datetime(DateTime->from_epoch(epoch => $_))
if $_ >= 0;
'';
}
$csv = Text::AutoCSV->new(in_file => 'stats-epoch.csv',
out_file => 'stats2.csv'
)
->in_map('ATIME', \&fromepoch)
->in_map('MTIME', \&fromepoch)
->write();
Miscellaneous
use Text::AutoCSV 'remove_accents';
# Output 'Francais: etre elementaire, Tcheque: sluzba dum' followed by a new
# line.
print remove_accents("Français: être élémentaire, Tchèque: služba dům"),
"\n";
csvcopy.pl
As of version 1.2.0, csvcopy.pl is installed along with Text::AutoCSV. It is a command-line wrapper to Text::AutoCSV, run
csvcopy.pl --help
to get help.
NAME
Text::AutoCSV - helper module to automate the use of Text::CSV
METHODS
new
my $csv = Text::AutoCSV->new(%attr);
(Class method) Returns a new instance of Text::AutoCSV. The object attributes are described by the hash %attr
(can be empty).
Currently the following attributes are available:
- Preliminary note about "fields_hr", "fields_ar" and "fields_column_names" attributes
-
By default, Text::AutoCSV assumes the input has a header and will use the field values of this first line (the header) to work out the column internal names. These internal names are used everywhere in Text::AutoCSV to designate columns.
The values are transformed as follows:
- All accents are removed using the exportable function "remove_accents".
- Any non-alphanumeric character is removed (except underscore) and all letters are switched to upper case. The regex to do this is
s/[^[:alnum:]_]//gi; s/^.*$/\U$&/;
Thus a header line of
'Office Number 1,Office_2,Personal Number'
will produce the internal column names
'OFFICENUMBER1' (first column) 'OFFICE_2' (second column) 'PERSONALNUMBER' (third column).
The attribute "fields_hr", "fields_ar" or "fields_column_names" (only one of the three is useful at a time) allows to change this behavior.
NOTE
The removal of accents is *not* a conversion to us-ascii, see "remove_accents" for details.
- Preliminary note about fields reading
-
Functions that are given a field name ("get_cell", "vlookup", "field_add_copy", ...) raise an error if the field requested does not exist.
SO WILL THE HASHREFS GIVEN BY Text::AutoCSV: when a function returns a hashref ("search_1hr", "get_row_hr", ...), the hash is locked with the
lock_keys
function ofHash::Util
. Any attempt to read a non-existing key from the hash causes a croak. This feature is de-activated if you specifiedcroak_if_error => 0
when creating Text::AutoCSV object. - in_file
-
The name of the file to read CSV data from.
If not specified or empty, read standard input.
Example:
my $csv = Text::AutoCSV->new(in_file => 'in.csv');
- inh
-
File handle to read CSV data from. Normally you don't want to specify this attribute.
inh
is useful if you don't like the way Text::AutoCSV opens the input file for you.Example:
open my $inh, "producecsv.sh|"; my $csv = Text::AutoCSV->new(inh => $inh);
- encoding
-
Comma-separated list of encodings to try to read input.
Note that finding the correct encoding of any given input is overkill. This script just tries encodings one after the other, and selects the first one that does not trigger a warning during reading of input. If all produce warnings, select the first one.
The encoding chosen is used in output, unless attribute "out_encoding" is specified.
Value by default: 'UTF-8,latin1'
IMPORTANT
If one tries something like
encoding => 'latin1,UTF-8'
, it'll almost never detect UTF-8 because latin1 rarely triggers warnings during reading. It tends to be also true with encodings like UTF-16 that can remain happy with various inputs (sometimes resulting in Western languages turned into Chinese text).Ultimately this attribute should be used with a unique value. The result when using more than one value can produce weird results and should be considered experimental.
Example:
my $csv = Text::AutoCSV->new(in_file => 'w.csv', encoding => 'UTF-16');
- via
-
Adds a
via
to the file opening instruction performed by Text::AutoCSV. You don't want to use it under normal circumstances.The value should start with a ':' character (Text::AutoCSV won't add one for you).
Value by default: none
Example:
my $csv = Text::AutoCSV->new(in_file => 'in.csv', via => ':raw:perlio:UTF-32:crlf');
- dont_mess_with_encoding
-
If true, just ignore completely encoding and don't try to alter I/O operations with encoding considerations (using
binmode
instruction). Note that if inh attribute is specified, then Text::AutoCSV will consider the caller manages encoding for himself and dont_mess_with_encoding will be automatically set, too.IMPORTANT
This attribute does not mean perl will totally ignore encoding and would consider character strings as bytes for example. The meaning of "dont_mess_with_encoding" is that Text::AutoCSV itself will totally ignore encoding matters, and leave it entirely to Perl' default.
Value by default:
0 if inh attribute is not set 1 if inh attribute is set
Example:
my $csv = Text::AutoCSV->new(in_file => 'in.csv', dont_mess_with_encoding => 1);
- sep_char
-
Specify the CSV separator character. Turns off separator auto-detection. This attribute is passed as is to
Text::CSV->new()
.Example:
my $csv = Text::AutoCSV->new(in_file => 'in.csv', sep_char => ';');
- quote_char
-
Specify the field quote character. This attribute is passed as is to
Text::CSV->new()
.Value by default: double quote ('"')
Example:
my $csv = Text::AutoCSV->new(in_file => 'in.csv', quote_char => '\'');
- escape_char
-
Specify the escape character. Turns off escape character auto-detection. This attribute is passed as is to
Text::CSV->new()
.Value by default: backslash ('\\')
Example:
my $csv = Text::AutoCSV->new(in_file => 'in.csv', escape_char => '"');
- in_csvobj
-
Text::CSV object to use. Normally you don't want to specify this attribute.
By default, Text::AutoCSV will manage creating such an object and will work hard to detect the parameters it requires.
Defining
in_csvobj
attribute turns off separator character and escape character auto-detection.Using this attribute workarounds Text::AutoCSV philosophy a bit, but you may need it in case Text::AutoCSV behavior is not suitable for Text::CSV creation.
Example:
my $tcsv = Text::CSV->new(); my $acsv = Text::AutoCSV->new(in_file => 'in.csv', in_csvobj => $tcsv);
- has_headers
-
If true, Text::AutoCSV assumes the input has a header line.
Value by default: 1
Example:
my $csv = Text::AutoCSV->new(in_file => 'in.csv', has_headers => 0);
- fields_hr
-
(Only if input has a header line) Hash ref that contains column internal names along with a regular expression to find it in the header line. For example if you have:
my $csv = Text::AutoCSV->new(in_file => 'in.csv', fields_hr => {'PHONE OFFICE' => '^office phone nu', 'PHONE PERSONAL' => '^personal phone nu'});
And the header line is
'Personal Phone Number,Office Phone Number'
the column name 'PHONE OFFICE' will designate the second column and the column name 'PHONE PERSONAL' will designate the first column.
You can choose column names like 'Phone Office' and 'Phone Personal' as well.
The regex search is case insensitive.
- fields_ar
-
(Only if input has a header line) Array ref that contains column internal names. The array is used to create a hash ref of the same kind as "fields_hr", by wrapping the column name in a regex. The names are surrounded by a leading '^' and a trailing '$', meaning, the name must match the entire field name.
For example
fields_ar => ['OFFICENUMBER', 'PERSONALNUMBER']
is strictly equivalent to
fields_hr => {'OFFICENUMBER' => '^officenumber$', 'PERSONALNUMBER' = '^personalnumber$'}
The regex search is case insensitive.
fields_ar
is useful if the internal names are identical to the file column names. It avoids repeating the names over and over as would happen if using "fields_hr" attribute.NOTE
You might wonder why using fields_ar as opposed to Text::AutoCSV default' mechanism. There are two reasons for that:
1- Text::AutoCSV removes spaces from column names, and some people may want another behavior. A header name of 'Phone Number' will get an internal column name of 'PHONENUMBER' (default behavior, if none of fields_hr, fields_ar and fields_column_names attributes is specified), and one may prefer 'PHONE NUMBER' or 'phone number' or whatsoever.
2- By specifying a list of columns using either of fields_hr or fields_ar, you not only map column names as found in the header line to internal column names: you also request these columns to be available. If one of the requested columns cannot be found, Text::AutoCSV will croak (default) or print an error and return an undef object (if created with
croak_if_error => 0
). - fields_column_names
-
Array ref of column internal names, in the order of columns in file. This attribute works like the
column_names
attribute of Text::CSV. It'll just assign names to columns one by one, regardless of what the header line contains. It'll work also if the file has no header line.Example:
my $csv = Text::AutoCSV->new(in_file => 'in.csv', fields_column_names => ['My COL1', '', 'My COL3']);
- out_file
-
Output file when executing the "write" method.
If not specified or empty, write to standard output.
Example:
my $csv = Text::AutoCSV->new(in_file => 'in.csv', out_file => 'out.csv');
- outh
-
File handle to write CSV data to when executing the "write" method. Normally you don't want to specify this attribute.
outh
is useful if you don't like the way Text::AutoCSV opens the output file for you.Example:
my $outh = open "myin.csv', ">>"; my $csv = Text::AutoCSV->new(in_file => 'in.csv', has_headers => 0, outh => $outh);
- out_encoding
-
Enforce the encoding of output.
Value by default: input encoding
Example:
my $csv = Text::AutoCSV->new(in_file => 'in.csv', out_file => 'out.csv', out_encoding => 'UTF-16');
- out_utf8_bom
-
Enforce BOM (Byte-Order-Mark) on output, when it is UTF8. If output encoding is not UTF-8, this attribute is ignored.
NOTE
UTF-8 needs no BOM (there is no Byte-Order in UTF-8), and in practice, UTF8-encoded files rarely have a BOM.
Using this attribute is not recommended. It is provided for the sake of completeness, and also to produce Unicode files Microsoft EXCEL will be happy to read.
At first sight it would seem more logical to make EXCEL happy with something like this:
out_encoding => 'UTF-16'
But... While EXCEL will identify UTF-16 and read it as such, it will not take into account the BOM found at the beginning. In the end the first cell will have 2 useless characters prepended. The only solution the author knows to workaround this issue if to use UTF-8 as output encoding, and enforce a BOM. That is, use:
..., out_encoding => 'UTF-8', out_utf8_bom => 1, ...
- out_sep_char
-
Enforce the output CSV separator character.
Value by default: input separator
Example:
my $csv = Text::AutoCSV->new(in_file => 'in.csv', out_file => 'out.csv', out_sep_char => ',');
- out_quote_char
-
Enforce the output CSV quote character.
Value by default: input quote character
Example:
my $csv = Text::AutoCSV->new(in_file => 'in.csv', out_file => 'out.csv', out_quote_char => '"');
- out_escape_char
-
Enforce the output CSV escape character.
Value by default: input escape character
Example:
my $csv = Text::AutoCSV->new(in_file => 'in.csv', out_file => 'out.csv', out_escape_char_char => '\\');
- out_always_quote
-
If true, quote all fields of output (set always_quote of Text::CSV).
If false, don't quote all fields of output (don't set
always_quote
of Text::CSV).Value by default: same as what is found in input
While reading input, Text::AutoCSV works out whether or not all fields were quoted. If yes, then the output Text::CSV object has the always_quote attribute set, if no, then the output Text::CSV object does not have this attribute set.
Example:
my $csv = Text::AutoCSV->new(in_file => 'in.csv', out_file => 'out.csv', out_always_quote => 1);
- out_has_headers
-
If true, when writing output, write a header line on first line.
If false, when writing output, don't write a header line on first line.
Value by default: same as has_headers attribute
Example 1
Read standard input and write to standard output, removing the header line.
Text::AutoCSV->new(out_has_headers => 0)->write();
Example 2
Read standard input and write to standard output, adding a header line.
Text::AutoCSV->new(fields_column_names => ['MYCOL1', 'MYCOL2'], out_has_headers => 1)->write();
- no_undef
-
If true, non-existent column values are set to an empty string instead of undef. It is also done on extra fields that happen to have an undef value (for example when the target of a linked field is not found).
Note this attribute does not work on callback functions output set with "in_map": for example empty DateTime values (on fields identified as containing a date/time, see
dates_*
attributes below) are set toundef
, even whileno_undef
is set. Indeed setting it to an empty string while non-empty values would contain a Datetime object would not be clean. An empty value in a placeholder containing an object must be undef.Since version 1.1.5 of Text::AutoCSV,
no_undef
is examined when sending parameter ($_) to "in_map" callback: an undef value is now passed as is (as undef), unlessno_undef
is set. Ifno_undef
is set, and field value is undef, then $_ is set to the empty string ('') when calling callback defined by "in_map". This new behavior was put in place to be consistent with what is being done with DateTime values.Value by default: 0
Example:
my $csv = Text::AutoCSV->new(in_file => 'in.csv', no_undef => 1);
- read_post_update_hr
-
To be set to a ref sub. Each time a record is read from input, call
read_post_update_hr
to update the hash ref of the record. The sub is called with 2 arguments: the hash ref to the record value and the hash ref to stats.The stats allow to count events and are printed in the end of reading in case Text::AutoCSV is called in verbose mode (
verbose => 1
).For example, the
read_post_update_hr
below will turn column 'CITY' values in upper case and count occurences of empty cities in stat display:Text::AutoCSV->new(in_file => 'addresses.csv', read_post_update_hr => \&updt, verbose => 1) ->write(); sub updt { my ($hr, $stats) = @_; $hr->{'CITY'} =~ s/^.*$/\U$&/; $stats->{'empty city encountered'}++ if $hr->{'CITY'} eq ''; }
IMPORTANT
You cannot create a field this way. To create a field, you have to use the member functions "field_add_link", "field_add_copy" or "field_add_computed".
NOTE
If you wish to manage some updates at field level, consider registering update functions with "in_map" and "out_map" member functions. These functions register callbacks that work at field level and with $_ variable (thus the callback function invoked is AutoCSV-agnostic).
"in_map" updates a field after read, "out_map" updates the field content before writing it.
- walker_hr
-
To set to a sub ref that'll be executed each time a record is read from input. It is executed after "read_post_update_hr". The sub is called with 2 arguments: the hash ref to the record value and the hash ref to stats.
Note "read_post_update_hr" is meant for updating record fields just after reading, whereas "walker_hr" is read-only.
The stats allow to count events and are printed in the end of reading in case Text::AutoCSV is called in verbose mode (
verbose => 1
). If the "verbose" attribute is not set, the stats are not displayed, however you can get stats by calling the get_stats function.The example below will count in the stats the number of records where the 'CITY' field is empty. Thanks to
verbose => 1
attribute, at the end of reading the stats are displayed.my $csv = Text::AutoCSV->new(in_file => 'addresses.csv', walker_hr => \&walk1, verbose => 1)->read(); sub walk1 { my ($hr, $stats) = @_; $stats->{'empty city'}++ if $hr->{'CITY'} eq ''; }
- walker_ar
-
To set to a sub ref that'll be executed each time a record is read from input. It is executed after "read_post_update_hr". The sub is called with 2 arguments: the array ref to the record value and the hash ref to stats.
Note "read_post_update_hr" is meant for updating record fields just after reading, whereas
walker_hr
is read-only.The stats allow to count events and are printed in the end of reading in case Text::AutoCSV is called in verbose mode (
verbose => 1
). If the "verbose" attribute is not set, the stats are lost.The array ref contains values in their natural order in the CSV. To be used with the column names, you have to use "get_fields_names" member function.
The example below will count in the stats the number of records where the 'CITY' field is empty. Thanks to
verbose => 1
attribute, at the end of reading the stats are displayed. It produces the exact same result as the example in walker_hr attribute, but it uses walker_ar.use List::MoreUtils qw(first_index); my $csv = Text::AutoCSV->new(in_file => 'addresses.csv', walker_ar => \&walk2, verbose => 1); my @cols = $csv->get_fields_names(); my $idxCITY = first_index { /^city$/i } @cols; die "No city field!??" if $idxCITY < 0; $csv->read(); sub walk2 { my ($ar, $stats) = @_; $stats->{'empty city'}++ if $ar->[$idxCITY] eq ''; }
- write_filter_hr
-
Alias of "out_filter".
- out_filter
-
To be set to a ref sub. Before writing a record to output,
out_filter
is called and the record gets writen only ifout_filter
return value is true. The sub is called with 1 argument: the hash ref to the record value.For example, if you want to output only records where the 'CITY' column value is Grenoble:
Text::AutoCSV->new(in_file => 'addresses.csv', out_file => 'grenoble.csv', out_filter => \&filt)->write(); sub filt { my $hr = shift; return 1 if $hr->{'CITY'} =~ /^grenoble$/i; return 0; }
- write_fields
-
Alias of "out_fields".
- out_fields
-
Set to an array ref. List fields to write to output.
Fields are written in their order in the array ref, the first CSV column being the first element in the array, and so on. Fields not listed in out_fields are not written in output.
You can use empty field names to have empty columns in output.
Value by default: none, meaning, all fields are output in their natural order. What is natural order? It is the input order for fields that were read from input, and the order in which they got created for created fields.
Example:
Text::AutoCSV->new(in_file => 'allinfos.csv', out_file => 'only-addresses.csv', out_fields => [ 'NAME', 'ADDRESS' ] )->write();
- out_orderby
-
Array reference to a list of fields to sort output with.
At the moment this feature is a bit of a hack (no option to make sort descending or ascending, numeric or text, and it is not part of test plan).
Example:
Text::AutoCSV->new(in_file => 'names.csv', out_file => 'sortednames.csv', out_orderby => [ 'LASTNAME', 'FIRSTNAME']);
- search_case
-
If true, searches are case sensitive by default. Searches are done by the member functions "search", "search_1hr", "vlookup", and linked fields ("field_add_link").
The search functions can also be called with the option "case", that takes precedence over the object-level
search_case
attribute value. See "vlookup" help.Value by default: 0 (by default searches are case insensitive)
Example:
my $csv = Text::AutoCSV->new(in_file => 'in.csv', search_case => 1);
- search_trim
-
If true, searches ignore the presence of leading or trailing spaces in values.
The search functions can also be called with the option "trim", that takes precedence over the object-level
search_trim
attribute value. See "vlookup" help.Value by default: 1 (by default searches ignore leading and trailing spaces)
Example:
my $csv = Text::AutoCSV->new(in_file => 'in.csv', search_trim => 0);
- search_ignore_empty
-
If true, empty fields are not included in the search indexes.
The search functions can also be called with the option "ignore_empty", that takes precedence over the object-level
search_ignore_empty
attribute value. See "vlookup" help.Value by default: 1 (by default, search of the value '' will find nothing)
Example:
my $csv = Text::AutoCSV->new(in_file => 'in.csv', search_ignore_empty => 0);
- search_ignore_accents
-
If true, accents are ignored by search indexes.
The search functions can also be called with the option "ignore_accents", that takes precedence over the object-level
search_ignore_accents
attribute value. See "vlookup" help.Value by default: 1 (by default, accents are ignored by search functions)
Example:
my $csv = Text::AutoCSV->new(in_file => 'in.csv', search_ignore_accents => 0);
- search_value_if_not_found
-
When a search is done with a unique value to return (field_add_link member function behavior or return value of vlookup), default value of option "value_if_not_found". See "vlookup".
- search_value_if_found
-
When a search is done with a unique value to return (field_add_link member function behavior or return value of vlookup), default value of option "value_if_found". See "vlookup".
IMPORTANT
This attribute is extremly unusual. Once you've provided it, all vlookups and the target field value of fields created with field_add_link will all be populated with the value provided with this option.
Don't use it unless you know what you are doing.
- search_ignore_ambiguous
-
When a search is done with a unique value to return (field_add_link member function behavior or return value of search_1hr and vlookup), default value of option "ignore_ambiguous". See "vlookup".
- search_value_if_ambiguous
-
When a search is done with a unique value to return (field_add_link member function behavior or return value of vlookup), default value of option "value_if_ambiguous". See "vlookup".
- fields_dates
-
Array ref of field names that contain a date.
Once the formats of these fields is known (auto-detection by default), each of these fields will get a specific "in_map" sub that converts the text in a DateTime object and a "out_map" sub that converts back from DateTime to text.
NOTE
The "out_map" given to a DateTime field is "defensive code": normally, "in_map" converts text into a DateTime object and "out_map" does the opposite, it takes a DateTime object and converts it to text. If ever "out_map" encounters a value that is not a DateTime object, it'll just stringify it (evaluation in a string context), without calling its DateTime formatter.
If the format cannot be detected for a given field, output an error message and as always when an error occurs, croak (unless "croak_if_error" got set to 0).
Value by default: none
Example:
my $csv = Text::AutoCSV->new(in_file => 'logins.csv', fields_dates => ['LASTLOGIN', 'CREATIONDATE']);
- fields_dates_auto
-
Boolean value. If set to 1, will detect dates formats on all fields. Fields in which a DateTime format got detected are then managed as if they had been being listed in "fields_dates" attribute: they get an appropriate "in_map" sub and a "out_map" sub to convert to and from DateTime (see "fields_dates" attribute above).
fields_dates_auto
looks for DateTime on all fields, but it expects nothing: it won't raise an error if no field is found that contains DateTime.Value by default: 0
Example:
my $csv = Text::AutoCSV->new(in_file => 'logins.csv', fields_dates_auto => 1);
- fields_dates_auto_optimize
-
Relevant only if "fields_dates_auto" is set.
Normally when "fields_dates_auto" is set, the input is read completely to make sure auto-detection produces a reliable result. If
fields_dates_auto_optimize
is set, this reading pass will stop as soon as there is no ambiguity left. That is, for every fields in input, the date format (or the fact that no date format is suitable) is known.Using this option is a bit risky because it could trigger a date format detection that later in the input, would turn out to be wrong. Should that be the case, strange errors will occur, that are not easy to understand. Use it at your own risk.
Value by default: 0
Example:
my $csv = Text::AutoCSV->new(in_file => 'logins.csv', fields_dates_auto => 1, fields_dates_auto_optimize => 1);
- dates_formats_to_try
-
Array ref of string formats.
Text::AutoCSV has a default built-in list of 20 date formats to try and 6 time formats (also it'll combine any date format with any time format).
dates_formats_to_try
will replace Text::AutoCSV default format-list will the one you specify, in case the default would not produce the results you expect.The formats are written in Strptime format.
Value by default (see below about the role of the pseudo-format ''):
[ '', '%Y-%m-%d', '%Y.%m.%d', '%Y/%m/%d', '%m.%d.%y', '%m-%d-%Y', '%m.%d.%Y', '%m/%d/%Y', '%d-%m-%Y', '%d.%m.%Y', '%d/%m/%Y', '%m-%d-%y', '%m/%d/%y', '%d-%m-%y', '%d.%m.%y', '%d/%m/%y', '%Y%m%d%H%M%S', '%b %d, %Y', '%b %d %Y', '%b %d %T %Z %Y', '%d %b %Y', '%d %b, %Y' ]
IMPORTANT
The empty format (empty string) has a special meaning: when specified, Text::AutoCSV will be able to identify fields that contain only a time (not preceeded by a date).
Note
Format identification is over only when there is no more ambiguity. So the usual pitfall of US versus French dates (month-day versus day-month) gets resolved only when a date is encountered that disambiguates it (a date of 13th of the month or later).
Example with a weird format that uses underscores to separate elements, using either US (month, day, year), French (day, month, year), or international (year, month, day) order:
my $csv = Text::AutoCSV->new(in_file => 'logins.csv', dates_formats_to_try => ['%d_%m_%Y', '%m_%d_%Y', '%Y_%m_%d']);
- dates_formats_to_try_supp
-
Same as "dates_formats_to_try" but instead of replacing the default list of formats used during detection, it is added to this default list.
You want to use this attribute if you need a specific DateTime format while continuing to benefit from the default list.
IMPORTANT
Text::AutoCSV will identify a given Datetime format only when there is no ambiguity, meaning, one unique Datetime format matches (all other failed). Adding a format that already exists in the default list will prevent the format from being identified, as it'll always be ambiguous. See "dates_formats_to_try" for the default list of formats.
Example:
my $csv = Text::AutoCSV->new(in_file => 'logins.csv', dates_formats_to_try_supp => ['%d_%m_%Y', '%m_%d_%Y', '%Y_%m_%d']);
- dates_ignore_trailing_chars
-
If set to 1, DateTime auto-detection will ignore trailing text that may follow detected DateTime-like text.
Value by default: 1 (do ignore trailing chars)
my $csv = Text::AutoCSV->new(in_file => 'logins.csv', dates_ignore_trailing_chars => 0);
- dates_search_time
-
If set to 1, look for times when detecting DateTime format. That is, whenever a date format candidate is found, a longer candidate that also contains a time (after the date) is tested.
Value by default: 1 (do look for times when auto-detecting DateTime formats)
Example:
my $csv = Text::AutoCSV->new(in_file => 'logins.csv', dates_search_time => 0);
- dates_locales
-
Comma-separated string of locales to test when detecting DateTime formats. Ultimately, Text::AutoCSV will try all combinations of date formats, times and locales.
Value by default: none (use perl default locale)
Example:
my $csv = Text::AutoCSV->new(in_file => 'logins.csv', dates_locales => 'fr,de,en');
- dates_zeros_ok
-
Boolean. If true, a date made only of 0s is regarded as being empty.
For example if
dates_zeros_ok
is False, then a date like 0000-00-00 will be always incorrect (as the day and month are out of bounds), therefore a format like '%Y-%m-%d' will never match for the field.Conversely if
dates_zeros_ok
is true, then a date like 0000-00-00 will be processed as if being the empty string, thus the detection of format will work and when parsed, this "full of zeros" dates will be processed the same way as the empty string (= value will be undef).IMPORTANT
"0s dates" are evaluated to undef when parsed, thus when converted back to text (out_map), they are set to an empty string, not to the original value.
Value by default: 1
Example:
my $csv = Text::AutoCSV->new(in_file => 'in.csv', dates_zeros_ok => 0);
- out_dates_format
-
Enforce the format of dates in output, for all fields that contain a DateTime value.
The format is written in Strptime format.
Value by default: none (by default, use format detected on input)
Example:
# Detect any field containing a DateTime value and convert it to # yyyy-mm-dd whatever the input format is. Text::AutoCSV->new(in_file => 'in.csv', out_file => 'out.csv', fields_dates_auto => 1, out_dates_format => '%F')->write();
- out_dates_locale
-
Taken into account only if "out_dates_format" is used.
Sets the locale to apply on "out_dates_format".
Value by default: none (by default, use the locale detected on input)
Example:
# Detect any field containing a DateTime value and convert it to a US # DateTime whatever the input format is. Text::AutoCSV->new(in_file => 'in.csv', out_file => 'out.csv', fields_dates_auto => 1, out_dates_format => '%b %d, %Y, %I:%M:%S %p', out_dates_locale => 'en')->write();
- croak_if_error
-
If true, stops the program execution in case of error.
IMPORTANT
Value by default: 1
If set to zero (
croak_if_error => 0
), errors are displayed as warnings. This printing can then be affected by setting the "quiet" attribute. - verbose
-
If true, get Text::AutoCSV to be a bit talkative instead of speaking only when warnings and errors occur. Verbose output is printed to STDERR by default, this can be tuned with the "infoh" attribute.
Value by default: 0
Example:
my $csv = Text::AutoCSV->new(in_file => 'in.csv', verbose => 1);
- infoh
-
File handle to display program's verbose output. Has effect *mainly* with attribute
verbose => 1
.Note infoh is used to display extra information in case of error (if a field does not exist, Text::AutoCSV will display the list of existing fields). If you don't want such output, you can set
infoh
to undef.Value by default: \*STDERR
Example:
open my $infoh, ">", "log.txt"; my $csv = Text::AutoCSV->new(in_file => 'in.csv', infoh => $infoh);
- quiet
-
If true, don't display warnings and errors, unless croaking.
If "croak_if_error" attribute is set (as per default), still, Text::AutoCSV will produce output (on STDERR) when croaking miserably.
When using
croak_if_error => 0
, errors are processed as warnings and if "quiet" is set (in addition to "croak_if_error" being set to 0), there'll be no output. Note this way of working is not recommended, as things can go wrong without any notice to the caller.Example:
my $csv = Text::AutoCSV->new(in_file => 'in.csv', quiet => 1);
- one_pass
-
If true, Text::AutoCSV will perform one reading of the input. If other readings are triggered, it'll raise an error and no reading will be done. Should that be the case (you ask Text::AutoCSV to do something that'll trigger more than one reading of input), Text::AutoCSV will croak as is always the case if an error occurs.
Normally Text::AutoCSV will do multiple reads of input to work out certain characteristics of the CSV: guess of encoding and guess of escape character.
Also if member functions like "field_add_link", "field_add_copy", "field_add_computed", "read" or "write" are called after input has already been read, it'll trigger further reads as needed.
If one wishes a unique read of the input to occur, one_pass attribute is to be set.
When true, encoding will be assumed to be the first one in the provided list ("encoding" attribute), if no encoding attribute is provided, it'll be the first one in the default list, to date, it is UTF-8.
When true, and if attribute "escape_char" is not set, escape_char will be assumed to be '\\' (backslash).
By default, one_pass is set if inh attribute is set (caller provides the input file handle of input) or if input file is stdin (in_file attribute not set or set to an empty string).
Value by default:
0 if inh attribute is not set and in_file attribute is set to a non empty string 1 if inh attribute is set or in_file is not set or set to an empty string
Example:
my $csv = Text::AutoCSV->new(in_file => 'in.csv', one_pass => 1);
read
$csv->read();
Read input entirely.
Return value
Returns the object itself in case of success. Returns undef if error.
Callback functions (when defined) are invoked, in the following order:
"read_post_update_hr", intended to do updates on fields values after each record read
"walker_ar", called after each record read, with an array ref of fields values
"walker_hr", called after each record read, with a hash ref of fields values
Example:
# Do nothing - just check CSV can be read successfully
Text::AutoCSV->new(in_file => 'in.csv')->read();
read_all_in_mem
$csv->read_all_in_mem();
Created in version 1.1.5. Before, existed only as _read_all_in_mem, meaning, was private.
Read input entirely, as with "read" function, but enforcing content to be kept in-memory.
Having the content kept in-memory is implied by search functions ("vlookup" for example). With read_all_in_mem
you can enforce this behavior without doing a fake search.
reset_next_record_hr
$csv->reset_next_record_hr();
Reset the internal status to start from the beginning with "get_next_record_hr". Used in conjunction with "get_next_record_hr".
get_next_record_hr
my $hr = $csv->get_next_record_hr(\$opt_key);
Get the next record content as a hash ref. $hr
is undef when the end of records has been reached.
When specified, $opt_key
is set to the current (returned) record key.
NOTE
You do not need to call "reset_next_record_hr" once before using get_next_record_hr
.
Therefore "reset_next_record_hr" is useful only if you wish to restart from the beginning before you've reached the end of the records.
NOTE bis
"walker_hr" allows to execute some code each time a record is read, and it better fits with Text::AutoCSV philosophy. Using a loop with get_next_record_hr
is primarily meant for Text::AutoCSV internal usage. Also when using this mechanism, you get very close to original Text::CSV logic, that makes Text::AutoCSV less useful.
Return value
A hashref of the record, or undef once there's no more record to return.
Example:
while (my $hr = $csv->get_next_record_hr()) {
say Dumper($hr);
}
write
$csv->write();
Write input into output.
Return value
Returns the object itself in case of success. Returns undef if error.
- If the content is not in-memory at the time write() is called:
Each record is read (with call of "read_post_update_hr", "walker_ar" and "walker_hr") and then written. The read-and-write is done in sequence, each record is written to output before the next record is read from input.
- If the content is in-memory at the time write() is called:
No "read" operation is performed, instead, records are directly written to output.
If defined, "out_filter" is called for each record. If the return value of "out_filter" is false, the record is not written.
Example:
# Copy input to output.
# As CSV is parsed in-between, this copy also checks a number of
# characteristics about the input, as opposed to a plain file copy
# operation.
Text::AutoCSV->new(in_file => 'in.csv', out_file => 'out.csv')->write();
out_header
$csv->out_header($field, $header);
Set the header text of $field
to $header
.
By default, the input header value is rewritten as is to output. out_header
allows you to change it.
Return value
Returns the object itself.
Example:
Text::AutoCSV->new(in_file => 'in.csv', out_file => 'out.csv')
->out_header('LOGIN', 'Login')
->out_header('FULLNAME', 'Full Name')
->write();
print_id
$csv->print_id();
Print out a description of input. Write to \*STDERR by default or to "infoh" attribute if set.
The description consists in a list of a few characteristics (CSV separator and the like) followed by the list of columns with the details of each.
Example of output:
If you go to the utils
directory of this module and execute the following:
./csvcopy.pl -i f1.csv -l "1:,A->B,f2.csv" --id
You will get this output:
-- f1.csv:
sep_char: ,
escape_char: \
in_encoding: UTF-8
is_always_quoted: no
# FIELD HEADER EXT DATA DATETIME FORMAT DATETIME LOCALE
- ----- ------ -------- --------------- ---------------
0 TIMESTAMP timestamp %Y%m%d%H%M%S
1 A a
2 B b
3 C c
4 D d %d/%m/%Y
5 1:SITE 1:SITE link: f2.csv, chain: A->B->* (SITE)
6 1:B 1:B link: f2.csv, chain: A->B->* (B)
field_add_computed
$csv->field_add_computed($new_field, $subref);
$new_field
is the name of the created field.
$subref
is a reference to a sub that'll calculate the new field value.
Return value
Returns the object itself in case of success. Returns undef if error.
Add a field calculated from other fields values. The subref runs like this:
sub func {
# $new_field is the name of the field (allows to use one subref for
# more than one field calculation).
# $hr is a hash ref of fields values.
# $stats is a hash ref that gets printed (if Text::AutoCSV is
# created with verbose => 1) in the end.
my ($new_field, $hr, $stats) = @_;
my $field_value;
# ... compute $field_value
return $field_value;
}
Example:
# Add a field for the full name, made of the concatenation of the first
# name and the last name.
Text::AutoCSV->new(in_file => 'dirpeople.csv', out_file => 'dirwithfn.csv',
verbose => 1)
->field_add_computed('FULLNAME', \&calc_fn)->write();
sub calc_fn {
my ($new_field, $hr, $stats) = @_;
die "Man, you are in serious trouble!" unless $new_field eq 'FULLNAME';
my $fn = $hr->{'FIRSTNAME'} . ' ' . uc($hr->{'LASTNAME'});
$stats->{'empty full name'}++ if $fn eq ' ';
return $fn;
}
field_add_copy
$csv->field_add_copy($new_field, $src_field, $opt_subref);
$new_field
if the name of the new field.
$src_field
is the name of the field being copied.
$opt_subref
is optional. It is a reference to a sub that takes one string (the value of $src_field
) and returns a string (the value assigned to $new_field
).
Return value
Returns the object itself in case of success. Returns undef if error.
field_add_copy
is a special case of "field_add_computed". The advantage of field_add_copy
is that it relies on a sub that is Text::AutoCSV "unaware", just taking one string as input and returning another string as output.
IMPORTANT
The current field value is passed to field_add_copy
in $_.
A call to
$csv->field_add_copy($new_field, $src_field, $subref);
is equivalent to
$csv->field_add_computed($new_field, \&subref2);
sub subref2 {
my (undef, $hr) = @_;
local $_ = $hr->{$src_field};
return $subref->();
}
Example of a field copy + pass copied field in upper case and surround content with <<>>:
my $csv = Text::AutoCSV->new(in_file => 'dirpeople.csv',
out_file => 'd2.csv');
$csv->field_add_copy('UCLAST', 'LASTNAME', \&myfunc);
$csv->write();
sub myfunc { s/^.*$/<<\U$&>>/; $_; }
Note that the calls can be chained as most member functions return the object itself upon success. The example above is equivalent to:
Text::AutoCSV->new(in_file => 'dirpeople.csv', out_file => 'd2.csv')
->field_add_copy('UCLAST', 'LASTNAME', \&myfunc)
->write();
sub myfunc { s/^.*$/<<\U$&>>/; $_; }
field_add_link
$csv->field_add_link($new_field, $chain, $linked_file, \%opts);
$new_field
is the name of the new field.
$chain
is the CHAIN of the link, that is: 'LOCAL->REMOTE->PICK' where:
LOCAL
is the field name to read the value from.
REMOTE
is the linked field to find the value in. This field belongs to $linked_file.
PICK
is the field from which to read the value of, in the record found by the search. This field belongs to $linked_file.
If $new_field is undef, the new field name is the name of the third field of $chain (PICK).
$linked_file
is the name of the linked file, that gets read in a Text::AutoCSV object created on-the-fly to do the search on. $linked_file
can also be a Text::AutoCSV object that you created yourself, allowing for more flexibility. Example:
my $lcsv = Text::AutoCSV->new(in_file => 'logins.csv', case => 1);
$csv->field_add_link('NAME', 'ID->LOGIN->DISPLAYNAME', $lcsv);
\%opts
is a hash ref of optional attributes. The same values can be provided as with vlookup.
- trim
-
If set to 1, searches will ignore leading and trailing spaces. That is, a
LOCAL
value of ' x ' will match with aREMOTE
value of 'x'.If option is not present, use "search_value_if_not_found" attribute of object (default value: 1).
Example:
$csv->field_add_link('NAME', 'ID->LOGIN->DISPLAYNAME', 'logins.csv', { trim => 0 });
- case
-
If set to 1, searches will take the case into account. That is, a
LOCAL
value of 'X' will not match with aREMOTE
value of 'x'.If option is not present, use "search_case" attribute of object (default value: 0).
Example:
$csv->field_add_link('NAME', 'ID->LOGIN->DISPLAYNAME', 'logins.csv', { case => 1 });
- ignore_empty
-
If set to 1, empty values won't match. That is, a
LOCAL
value of '' will not match with aREMOTE
value of ''.If option is not present, use "search_ignore_empty" attribute of object (default value: 1).
Example:
$csv->field_add_link('NAME', 'ID->LOGIN->DISPLAYNAME', 'logins.csv', { ignore_empty => 0 });
- value_if_not_found
-
If the searched value is not found, the value of the field is undef, that produces an empty string at write time. Instead, you can specify the value.
If option is not present, use "search_value_if_not_found" attribute of object (default value: undef).
Example:
$csv->field_add_link('NAME', 'ID->LOGIN->DISPLAYNAME', 'logins.csv', { value_if_not_found => '<not found!>' });
- value_if_found
-
If the searched value is found, you can specify the value to return.
If option is not present, use "search_value_if_found" attribute of object (default value: none).
NOTE
Although the
PICK
field is ignored when using this option, you must specify it any way.Example:
$csv->field_add_link('NAME', 'ID->LOGIN->DISPLAYNAME', 'logins.csv', { value_if_not_found => '0', value_if_found => '1' });
- value_if_ambiguous
-
If the searched value is found in more than one record, the value of the field is undef, that produces an empty string at write time. Instead, you can specify the value.
If option is not present, use "search_value_if_ambiguous" attribute of object (default value: undef).
Example:
$csv->field_add_link('NAME', 'ID->LOGIN->DISPLAYNAME', 'logins.csv', { value_if_ambiguous => '<ambiguous!>' });
- ignore_ambiguous
-
Boolean value. If ignore_ambiguous is true and the searched value is found in more than one record, then, silently fall back on returning the value of the first record. Obviously if
ignore_ambiguous
is true, then the value of "value_if_ambiguous" is ignored.If option is not present, use "search_ignore_ambiguous" attribute of object (default value: 1).
Example:
$csv->field_add_link('NAME', 'ID->LOGIN->DISPLAYNAME', 'logins.csv', { ignore_ambiguous => 1 });
Example with multiple options:
$csv->field_add_link('NAME', 'ID->LOGIN->DISPLAYNAME', 'logins.csv', { value_if_not_found => '?', ignore_ambiguous => 1 });
Return value
Returns the object itself in case of success. Returns undef if error.
Example of field_add_link usage:
my $nom_compose = 0;
my $zip_not_found = 0;
Text::AutoCSV->new(in_file => 'pers.csv', walker_hr => \&walk)
->field_add_link('MYCITY', 'ZIP->ZIPCODE->CITY', 'zips.csv')->read();
sub walk {
my $hr = shift;
$nom_compose++ if $hr->{'NAME'} =~ m/[- ]/;
$zip_not_found++ unless defined($hr->{'MYCITY'});
}
print("Number of persons with a multi-part name: $nom_compose\n");
print("Number of persons with unknown zipcode: $zip_not_found\n");
links
$csv->links($prefix, $chain, $linked_file, \%opts);
$prefix
is the name to add to joined fields
$chain
is the JOINCHAIN of the link, that is: 'LOCAL->REMOTE' where:
LOCAL
is the field name to read the value from.
REMOTE
is the linked field to find the value in. This field belongs to $linked_file.
As opposed to "field_add_link", there is no PICK
part, as all fields of target are read.
As opposed to Text::AutoCSV habits of croaking whenever a field name is duplicate, here, the duplicates are resolved by appending _2 to the joined field name if it already exists. If _2 already exists, too, then _3 is appended instead, and so on, until a non-duplicate is found. This mechanism is executed given the difficulty to control all field names when joining CSVs.
$linked_file
and \%opts
work exactly the same way as for "field_add_link", see "field_add_link" for help.
Return value
Returns the object itself in case of success. Returns undef if error.
NOTE
This function used to be called join
but got renamed to avoid clash with perl' builtin join
.
Example:
Text::AutoCSV->new(in_file => 'pers.csv', out_file => 'pers_with_city.csv')
->links('Read from zips.csv:', 'ZIP->ZIPCODE', 'zips.csv')->write();
get_in_encoding
my $enc = $csv->get_in_encoding();
Return the string of input encoding, for example 'latin2' or 'UTF-8', etc.
get_in_file_disp
my $f = $csv->get_in_file_disp();
Return the printable name of in_file.
get_sep_char
my $s = $csv->get_sep_char();
Return the string of the input CSV separator character, for example ',' or ';'.
get_escape_char
my $e = $csv->get_escape_char();
Return the string of the input escape character, for example '"' or '\\'.
get_is_always_quoted
my $a = $csv->get_is_always_quoted();
Return 1 if all fields of input are always quoted, 0 otherwise.
get_coldata
my @cd = get_coldata();
Return an array that describes each column, from the first one (column 0) to the last.
Each element of the array is itself an array ref that contains 5 elements:
0: Name of the field (as accessed in *_hr functions)
1: Content of the field in the header line (if input has a header line)
2: Column content type, shows some meta-data of fields created with
field_add_* functions
3: Datetime format detected, if ever, in the format Strptime
4: Locale of DateTime format detected, if ever
5: Multiline field: '1' if not, 'm' if newlines encountered in the field
get_pass_count
my $n = $csv->get_pass_count();
Return the number of input readings done. Useful only if you're interested in Text::AutoCSV internals.
get_in_mem_record_count
my $m = $csv->get_in_mem_record_count();
Return the number of records currently stored in-memory. Useful only if you're interested in Text::AutoCSV internals.
get_max_in_mem_record_count
my $mm = $csv->get_max_in_mem_record_count();
Return the maximum number of records ever stored in-memory. Indeed this number can decrease: certain functions like field_add* member-functions discard in-memory content. Useful only if you're interested in Text::AutoCSV internals.
get_fields_names
my @f = $csv->get_fields_names();
Return an array of the internal names of the columns.
get_field_name
my $name = $csv->get_field_name($n);
Return the $n
-th column name, the first column being number 0.
Example:
# Get the field name of the third column
my $col = $csv->get_field_name(2);
get_stats
my %stats = $csv->get_stats();
Certain callback functions provide a parameter to record event count: "field_add_computed", "read_post_update_hr", "walker_ar" and "walker_hr". By default, these stats are displayed if Text::AutoCSV got created with attribute verbose => 1
. get_stats() returns the statistics hash of the object.
IMPORTANT
As opposed to most functions that trigger input reading automatically (search functions and other get_* functions), get_stats
just returns you the stats as it is, regardless of whether some execution already occured.
get_nb_rows
my $nb_rows = $csv->get_nb_rows();
Gives the number of rows of the input. Does not trigger any reading - just provides the number of rows as known at the moment of the call. If unknown, return undef. Typically, the number of rows is known after doing the initial detection of CSV options (escape character, etc.), or, after doing one complete reading.
The header line counts for one row.
IMPORTANT
As some fields can contain new lines, this number is not necessarily identical to the number of lines.
set_walker_ar
$csv->set_walker_ar($subref);
Normally one wants to define it at object creation time using "walker_ar" attribute. set_walker_ar
allows to assign the attribute walker_ar after object creation.
See attribute "walker_ar" for help about the way $subref
should work.
Return value
Returns the object itself in case of success. Returns undef if error.
Example:
# Calculate the total of the two first columns, the first column being
# money in and the second one being money out.
my ($actif, $passif) = (0, 0);
$csv->set_walker_ar(
sub {
my $ar = $_[0]; $actif += $ar->[0]; $passif += $ar->[1];
}
)
->read();
print("Actif = $actif\n");
print("Passif = $passif\n");
set_walker_hr
$csv->set_walker_hr($subref);
Normally one wants to define it at object creation time using "walker_hr" attribute. set_walker_hr
allows to assign the attribute "walker_hr" after object creation.
See attribute "walker_hr" for help about the way $subref
should work.
Return value
Returns the object itself in case of success. Returns undef if error.
Example:
my $csv = Text::AutoCSV->new(in_file => 'directory.csv', verbose => 1);
# ...
$csv->set_walker_hr(
sub {
my ($hr, $stat) = @_;
$stat{'not capital name'}++, return
if $hr->{'NAME'} ne uc($hr->{'NAME'});
$stat{'name is capital letters'}++;
}
)->read();
set_out_file
$csv->set_out_file($out_file);
Normally one wants to define it at object creation time using "out_file" attribute. set_out_file
allows to assign the attribute "out_file" after object creation. It is set to $out_file
value.
Return value
Returns the object itself in case of success. Returns undef if error.
Example:
$csv->set_out_file('mycopy.csv')->write();
get_keys
my @allkeys = $csv->get_keys();
Returns an array of all the record keys of input. A record key is a unique identifier that designates the record.
At the moment it is just an integer being the record number, the first one (that comes after the header line) being of number 0. For example if $csv input is made of one header line and 3 records (that is, a 4-line file typically, if no record contains a line break), $csv->get_keys() returns:
(0, 1, 2)
IMPORTANT
If not yet done, this function causes the input to be read entirely and stored in-memory.
get_hr_all
my @allin = $csv->get_hr_all();
Returns an array of all record contents of the input, each record being a hash ref.
IMPORTANT
If not yet done, this function causes the input to be read entirely and stored in-memory.
get_row_ar
my $row_ar = $csv->get_row_ar($record_key);
Returns an array ref of the record designated by $record_key
.
Example:
# Get content (as array ref) of last record
my @allkeys = $csv->get_keys();
my $lastk = $allkeys[-1];
my $lastrec_ar = $csv->get_row_ar($lastk);
IMPORTANT
If not yet done, this function causes the input to be read entirely and stored in-memory.
get_row_hr
my $row_hr = $csv->get_row_hr($record_key);
Returns a hash ref of the record designated by $record_key
.
Example:
# Get content (as hash ref) of first record
my @allkeys = $csv->get_keys();
my $firstk = $allkeys[0];
my $firstrec_hr = $csv->get_row_hr($firstk);
IMPORTANT
If not yet done, this function causes the input to be read entirely and stored in-memory.
get_cell
my $val = $csv->get_cell($record_key, $field_name);
Return the value of the cell designated by its record key ($record_key
) and field name ($field_name
).
Example:
my @allkeys = $csv->get_keys();
my $midk = $allkeys[int($#allkeys / 2)];
my $midname = $csv->get_cell($midk, 'NAME');
Note the above example is equivalent to:
my @allkeys = $csv->get_keys();
my $midk = $allkeys[int($#allkeys / 2)];
my $midrec_hr = $csv->get_row_hr($midk);
my $midname = $midrec_hr->{'NAME'};
IMPORTANT
If not yet done, this function causes the input to be read entirely and stored in-memory.
get_values
my @vals = $csv->get_values($field_name, $opt_filter_subref);
Return an array made of the values of the given field name ($field_name
), for every records, in the order of the records.
$opt_filter_subref
is an optional subref. If defined, it is called with every values in turn (one call per value) and only values for which $opt_filter_subref
returned True are included in the returned array. Call to $opt_filter_subref
is done with $_ to pass the value.
Example:
my @logins = $csv->get_values('LOGIN");
This is equivalent to:
my @allkeys = $csv->get_keys();
my @logins;
push @logins, $csv->get_cell($_, 'LOGIN') for (@allkeys);
Example bis
# @badlogins is the list of logins that contain non alphanumeric
# characters
my @badlogins = Text::AutoCSV->new(in_file => 'logins.csv')
->get_values('LOGIN', sub { m/[^a-z0-9]/ });
This is equivalent to:
# @badlogins is the list of logins that contain non alphanumeric
# characters This method leads to carrying all values of a given field
# across function calls...
my @badlogins = grep { m/[^a-z0-9]/ } (
Text::AutoCSV->new(in_file => 'logins.csv')->get_values('LOGIN')
);
IMPORTANT
If not yet done, this function causes the input to be read entirely and stored in-memory.
get_recnum
my $r = $csv->get_recnum();
Returns the current record identifier, if a reading is in progress. If no read is in progress, return undef.
in_map
See "read_update_after" below.
read_update_after
read_update_after
is an alias of in_map
.
$csv->in_map($field, $subref);
After reading a record from input, update $field
by calling $subref
. The value is put in $_
. Then the field value is set to the return value of $subref
.
This feature is originally meant to manage DateTime fields: the input and output CSVs carry text content, and in-between, the values dealt with are DateTime objects.
See "out_map" for an example.
out_map
See "write_update_before" below.
write_update_before
write_update_before
is an alias of out_map
.
$csv->out_map($field, $subref);
Before writing $field
field content into the output file, pass it through out_map
. The value is put in $_
. Then the return value of $subref
is written in the output.
Example:
Suppose you have a CSV file with the convention that a number surrounded by parenthesis is negative. You can register corresponding "in_map" and "out_map" functions. During the processing of data, the field content will be just a number (positive or negative), while in input and in output, it'll follow the "negative under parenthesis" convention.
In the below example, we rely on convention above and add a new field converted from the original one, that follows the same convention.
sub in_updt {
return 0 if !defined($_) or $_ eq '';
my $i;
return -$i if ($i) = $_ =~ m/^\((.*)\)$/;
$_;
}
sub out_updt {
return '' unless defined($_);
return '(' . (-$_) . ')' if $_ < 0;
$_;
}
sub convert {
return ;
}
Text::AutoCSV->new(in_file => 'trans-euros.csv',
out_file => 'trans-devises.csv')
->in_map('EUROS', \&in_updt)
->out_map('EUROS', \&out_updt)
->out_map('DEVISE', \&out_updt)
->field_add_copy('DEVISE', 'EUROS',
sub { sprintf("%.2f", $_ * 1.141593); } )
->write();
search
my $found_ar = $csv->search($field_name, $value, \%opts);
Returns an array ref of all records keys where the field $field_name
has the value $value
.
\%opts
is an optional hash ref of options for the search. See help of "vlookup" about options.
IMPORTANT
An unsuccessful search returns an empty array ref, that is, [ ]. Thus you cannot check for definedness of search
return value to know whether or not the search found something.
On the other hand, you can always examine the value search(...)->[0]
, as search is always an array ref. If the search found nothing, then, search(...)->[0]
is not defined.
IMPORTANT bis
If not yet done, this function causes the input to be read entirely and stored in-memory.
Example:
my $linux_os_keys_ar = $csv->search('OS', 'linux');
search_1hr
my $found_hr = $csv->search_1hr($field_name, $value, \%opts);
Returns a hash ref of the first record where the field $field_name
has the value $value
.
\%opts
is an optional hash ref of options for the search. See help of "vlookup" about options.
Note the options "value_if_not_found" and "value_if_ambiguous" are ignored. If not found, return undef. If the result is ambiguous (more than one record found) and ignore_ambiguous is set to a false value, return undef.
The other options are taken into account as for any search: "ignore_ambiguous", "trim", "case", "ignore_empty".
IMPORTANT
As opposed to "search", an unsuccessful search_1hr
will return undef
.
IMPORTANT bis
If not yet done, this function causes the input to be read entirely and stored in-memory.
Example:
my $hr = $csv->search_1hr('LOGIN', $login);
my $full_name = $hr->{'FIRSTNAME'} . ' ' . $hr->{'LASTNAME'};
vlookup
my $val = $csv->vlookup($searched_field, $value, $target_field, \%opts);
Find the first record where $searched_field
contains $value
and out of this record, returns the value of $target_field
.
\%opts
is optional. It is a hash of options for vlookup
:
- trim
-
If true, ignore spaces before and after the values to search.
If option is not present, use "search_trim" attribute of object (default value: 1).
- case
-
If true, do case sensitive searches.
If option is not present, use "search_case" attribute of object (default value: 0).
- ignore_empty
-
If true, ignore empty values in the search. The consequence is that you won't be able to find empty values by searching it.
If option is not present, use "search_ignore_empty" attribute of object (default value: 1).
- ignore_accents
-
If true, ignore accents in searches. For exampe, if
ignore_accents
is set, a string like "élémentaire" will match "elementaire".If option is not present, use "search_ignore_accents" attribute of object (default value: 1).
NOTE
This option uses the function "remove_accents" to build its internal hash tables. See "remove_accents" help for more details.
- value_if_not_found
-
Return value if vlookup finds nothing.
If option is not present, use "search_value_if_not_found" attribute of object (default value: undef).
- value_if_found
-
Return value if vlookup finds something.
If option is not present, use "search_value_if_found" attribute of object (default value: none).
This option is to just check whether a value exists, regardless of the target value found.
NOTE
Although the $target_field is ignored when using this option, you must specify it any way.
- value_if_ambiguous
-
Return value if vlookup find more than one result. Tune it only if ignore_ambiguous is unset.
If option is not present, use "search_value_if_ambiguous" attribute of object (default value: undef).
- ignore_ambiguous
-
If true, then if more than one result is found, silently return the first one.
If option is not present, use "search_ignore_ambiguous" attribute of object (default value: 1).
IMPORTANT
If not yet done, this function causes the input to be read entirely and stored in-memory.
Example:
my $name = $csv->vlookup('LOGIN', $id, 'NAME',
{ value_if_not_found => '<login not found>' });
remove_accents
my $t = $csv->remove_accents($s);
Take the string $s
as argument and return the string without accents. Uses a Unicode decomposition followed by removal of every characters that have the Unicode property Nonspacing_Mark
.
NOTE
Only accents are removed. It is not a whatever-encoding -> us-ascii
conversion. For example, the French œ character (o followed by e) or the German ß (eszett) are kept as is.
NOTE bis
Tested with some latin1 and latin2 characters.
NOTE ter
There is no language-level transformation during accents removal. For example Jürgen is returned as Jurgen, not Juergen.
This function is not exported by default.
Example:
use Text::AutoCSV qw(remove_accents);
my $s = remove_accents("Français: être élémentaire, Tchèque: služba dům");
die "This script will never die"
if $s ne 'Francais: etre elementaire, Tcheque: sluzba dum';
AUTHOR
Sébastien Millet <milletseb@laposte.net>
COPYRIGHT AND LICENSE
This software is copyright (c) 2016, 2017 by Sébastien Millet.
This is free software; you can redistribute it and/or modify it under the same terms as the Perl 5 programming language system itself.