NAME

parsepica - fetch, parse and transform PICA+ data

VERSION

version 0.585

SYNOPSIS

parsepica [options] [input file(s) or SRU-Server(s) and queries]

DESCRIPTION

This script provides a simple command line client to fetch and transform PICA+ records. You can parse and transform local files (compressed .gz files can be read directly) or query records from a server via various protocols. You can also specify a configuration file for PICA::Source that points to an SRU, Z39.50, PSI, or unAPI source.

The records can then be written to a file or STDOUT in PICA+ or PICA/XML format. Instead of writing full records you can select single PICA+ fields. Selecting fields with parsepica is around half as fast as using grep, but grep neither parses the records nor checks them for well-formedness.
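
The difference between grep and parsing can be illustrated with a minimal Python sketch. It is not how parsepica is implemented. It assumes the plain PICA+ serialization with one field per line, an optional occurrence after a slash, '$' as subfield indicator and '$$' as an escaped dollar sign; the names parse_field and select are hypothetical.

```python
import re

# Assumed layout of one field line: TAG[/OCCURRENCE] $cVALUE$cVALUE...
FIELD_LINE = re.compile(r"^(\d{3}[A-Z@])(?:/(\d{2,3}))? (\$.+)$")
# One subfield: '$', a one-character code, then a value in which '$$'
# stands for a literal dollar sign (assumption about the serialization).
SUBFIELD = re.compile(r"\$([A-Za-z0-9])((?:[^$]|\$\$)*)")


def parse_field(line):
    """Return (tag, occurrence, subfields), or None if the line is malformed."""
    match = FIELD_LINE.match(line.rstrip("\n"))
    if match is None:
        return None
    tag, occurrence, body = match.groups()
    subfields = []
    pos = 0
    while pos < len(body):
        sub = SUBFIELD.match(body, pos)
        if sub is None:
            return None  # stray text that is not a valid subfield
        subfields.append((sub.group(1), sub.group(2).replace("$$", "$")))
        pos = sub.end()
    return tag, occurrence, subfields


def select(lines, tag):
    """Yield the parsed fields whose tag equals the requested tag."""
    for line in lines:
        field = parse_field(line)
        if field is not None and field[0] == tag:
            yield field


sample = [
    "021A $aEin Titel$hEine Autorin",
    "045Q/01 $a021A is mentioned here",
    "021A broken line without subfields",
]
```

A plain text search for '021A' would match all three sample lines, while select(sample, "021A") yields only the first: the second line merely mentions the tag inside a value and the third is not a well-formed field.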

By default input is read from STDIN and output is written to STDOUT ('-') without logging. On request, logging information is printed to STDOUT or to a specified logfile. Records that cannot be parsed produce error messages on STDERR.
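
The input conventions described above ('-' for STDIN, .gz files read directly) could be mimicked in Python roughly as follows. This is only a sketch: the helper name open_input is hypothetical, and both the detection of compressed files by their '.gz' suffix and the UTF-8 encoding are assumptions, not documented behaviour of parsepica.

```python
import gzip
import sys


def open_input(name):
    """Open an input source as a text stream.

    '-' means STDIN; names ending in '.gz' are decompressed on the fly
    (suffix-based detection and UTF-8 are assumptions of this sketch).
    """
    if name == "-":
        return sys.stdin
    if name.endswith(".gz"):
        return gzip.open(name, "rt", encoding="utf-8")
    return open(name, "r", encoding="utf-8")
```

Callers can then iterate over the returned stream line by line without caring whether the data came from a pipe, a plain file or a compressed file.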

OPTIONS

-input FILE     file with input files on each line ('-': STDIN)
-files FILE     read input files from another file ('-': STDIN)
-output FILE    print all valid records to a given file ('-': STDOUT)
-xml [FILE]     print records in XML
-pxml [FILE]    print records in pretty XML (with linebreaks)
-pretty [FILE]  print records in pretty format
-null           suppress record output
-quiet          suppress logging
-select FIELD   select a specific field or subfield (not if XML output)
-count          print simple statistics
-stats 0|1|2    print full statistics (1: fields, 2: subfields)
-config FILE    read configuration from a file ('-': search default file)
-auto           use default config file $PICASOURCE or ./pica.conf
-log [FILE]     print logging to a given file ('-': STDOUT, default)
-help           brief help message
-limit N        limit the result set to N records (only for SRU)
-man            full documentation with examples

EXAMPLES

parsepica file1 -o file2

Read from 'file1' and print parseable records to 'file2'.

parsepica file1 -px file2.xml

Parse from 'file1' and pretty print XML format to 'file2.xml'.

parsepica http://gso.gbv.de/sru/DB=2.1/ pica.isb=3-423-31039-1

Get records with ISBN 3-423-31039-1 via SRU.

parsepica -c pica.isb=3-423-31039-1

Get records with ISBN 3-423-31039-1 via SRU if the default config file contains SRU = http://gso.gbv.de/sru/DB=2.1/.

parsepica -se 021A -o - -q picadata

Select all fields '021A' from 'picadata' and write to STDOUT.

parsepica -log -count -null file1

Parse from 'file1' and count fields.

parsepica -log -stat 2 file1

Parse from 'file1' and print detailed statistics.

LIMITATIONS

Error handling for broken records is not fully implemented. If you want to parse PICA+ records downloaded via WinIBW, you may need to first clean them with the script winibw2pica.

The limit parameter should also be implemented for sources other than SRU, and an offset parameter would be useful. Fetching records via protocols other than SRU has not been tested. The statistics method can be improved a lot.

AUTHOR

Jakob Voß <voss@gbv.de>

COPYRIGHT AND LICENSE

This software is copyright (c) 2014 by Verbundzentrale Goettingen (VZG) and Jakob Voss.

This is free software; you can redistribute it and/or modify it under the same terms as the Perl 5 programming language system itself.