NAME
csv - process CSV files from the command line
SYNOPSIS
# On the command line:
csv 1 2 -1 < report.csv
# Reads the first two fields, as well as the last one, from "report.csv".
# Data is cleaned up and emitted as CSV.
csv --input report.csv --to_tsv
# Converts the whole report to TSV (tab-separated values).
DESCRIPTION
CSV (comma-separated value) files are the lowest common denominator of structured data interchange formats. For such a humble file format, it is pretty difficult to get right: embedded quote marks and linebreaks, slipshod delimiters, and no One True Validity Test make CSV data found in the wild hard to parse correctly. Text::CSV_XS provides flexible and performant access to CSV files from Perl, but is cumbersome to use in one-liners and on the command line.
csv is intended to make command-line processing of CSV files as easy as plain text is meant to be on Unix. Internally, it holds two Text::CSV objects (one for input and one for output), which have reasonable defaults but which you can reconfigure to suit your needs. You can then extract just the fields you want, change the delimiter, clean up the data, and so on.
In the simplest usage, csv filters standard input to standard output and takes a list of integers. These are 1-based column numbers to select from the input CSV stream. Negative numbers count from the end of the line, so -1 is the last field. Without any column list, csv selects all columns (this is still useful to normalize quoting style etc.).
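The column-selection rule described above can be sketched in a few lines of Python. This is only an illustration of the semantics, not the code csv actually runs (csv is written in Perl on top of Text::CSV). The names select_columns and filter_csv are hypothetical, and rejecting column 0 is an assumption of this sketch, since the text does not say how csv treats it.

```python
import csv
import io


def select_columns(row, columns):
    """Pick fields from one parsed row using 1-based column numbers.

    Negative numbers count from the end of the row (-1 is the last field).
    An empty column list selects every field.
    """
    if not columns:
        return list(row)
    picked = []
    for n in columns:
        if n == 0:
            # Assumption of this sketch: 0 is not a valid column number.
            raise ValueError("column numbers are 1-based; 0 is not valid")
        # Positive numbers shift down by one; negative numbers already
        # match Python's from-the-end indexing.
        index = n - 1 if n > 0 else n
        picked.append(row[index])
    return picked


def filter_csv(text, columns):
    """Parse CSV text, keep the requested columns, and re-emit CSV."""
    reader = csv.reader(io.StringIO(text))
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    for row in reader:
        writer.writerow(select_columns(row, columns))
    return out.getvalue()
```

Calling filter_csv(text, [1, 2, -1]) mirrors the "csv 1 2 -1" invocation from the SYNOPSIS: the first two fields plus the last one. Passing an empty list re-emits every field, which only normalizes the quoting.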
Command line options
The following options are passed to Text::CSV. When an option name is preceded by the prefix "output_" (for example, --output_sep_char), only the output is affected. Otherwise these options affect both input and output.
- --quote_char
- --escape_char
- --sep_char
- --eol
- --always_quote
- --binary
- --keep_meta_info
- --allow_loose_quotes
- --allow_loose_escapes
- --allow_whitespace
- --verbatim
NOTE: binary is set to 1 by default in csv. The other options have their Text::CSV defaults.
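As a rough model of how these options are divided between the input and output objects, consider the Python sketch below. It is an illustration under stated assumptions, not csv's actual option handling: split_options is a hypothetical helper, and plain dictionaries stand in for the two Text::CSV configurations. The only facts taken from the text are the "output_" prefix rule and the binary default of 1.

```python
def split_options(options):
    """Split parsed options into (input_opts, output_opts).

    Unprefixed options apply to both sides. Options prefixed with
    "output_" apply to the output side only and override the shared value.
    """
    # binary defaults to 1 on both sides, as noted above.
    input_opts = {"binary": 1}
    output_opts = {"binary": 1}

    # First pass: shared options go to both sides.
    for name, value in options.items():
        if not name.startswith("output_"):
            input_opts[name] = value
            output_opts[name] = value

    # Second pass: output-only overrides, applied last so they win
    # regardless of the order in which the options were given.
    for name, value in options.items():
        if name.startswith("output_"):
            output_opts[name[len("output_"):]] = value

    return input_opts, output_opts
```

For example, split_options({"sep_char": ";", "output_sep_char": ","}) describes reading semicolon-separated input and writing comma-separated output.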
The following additional options are available:
- --input
- --output
  Filenames for input and output. "-" means stdio. Useful to trigger TSV mode (--from_tsv and --to_tsv).
- --from_tsv
- --to_tsv
  Use tabs instead of commas as the delimiter. When csv has the input or output filenames available, this is inferred when they end with .tsv. To disable this dwimmery, you may say --to_tsv=0 and --from_tsv=0.
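The delimiter inference can be pictured with the following Python sketch. It is a hedged illustration of the rule stated above, not csv's implementation. pick_delimiter is a hypothetical name, and tsv_flag stands in for an explicit --from_tsv or --to_tsv setting (None meaning the flag was not given).

```python
def pick_delimiter(filename, tsv_flag=None):
    """Choose the field delimiter for one side (input or output).

    tsv_flag: True or False when the TSV flag was given explicitly
              (for example --to_tsv or --to_tsv=0), None when absent.
    filename: the --input or --output value, or None / "-" for stdio.
    """
    # An explicit flag always wins, which is how the inference is disabled.
    if tsv_flag is not None:
        return "\t" if tsv_flag else ","
    # Otherwise infer TSV from a real filename ending in .tsv.
    if filename and filename != "-" and filename.endswith(".tsv"):
        return "\t"
    return ","
```

With this rule, an output file named report.tsv gets tabs without any flag, while passing tsv_flag=False (the analogue of --to_tsv=0) keeps commas.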
SEE ALSO
AUTHOR
Gaal Yahas <gaal@forum2.org>
THANKS
nothingmuch, gphat, and t0m
BUGS
Please report any bugs or feature requests to bug-app-csv at rt.cpan.org, or through the web interface at http://rt.cpan.org/NoAuth/ReportBug.html?Queue=App-CSV. I will be notified, and then you'll automatically be notified of progress on your bug as I make changes.
You're also invited to work on a patch. The source repo is at
git://github.com/gaal/app-csv.git
http://github.com/gaal/app-csv/tree/master
COPYRIGHT (The "MIT" License)
Copyright 2009 Gaal Yahas.
Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the "Software"), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions:
The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software.
THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.