NAME

picadata - parse and validate PICA+ data

SYNOPSIS

picadata [<command>] {path} {options} {files}

DESCRIPTION

Convert, analyze and validate PICA+ data from the command line.

COMMANDS

convert

Convert between PICA+ serialization formats (the default command).

get

Print subfield values.

levels

Split records into multiple records for each level. Implies -o.

join

Join multiple records into one and sort afterwards.

count

Count number of records, holdings, items, and fields.

filter

Keep only records that include any of the given (sub)fields.
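The selection rule of filter can be pictured with a small Python sketch. It is an illustration only and not picadata's implementation: the record layout (a list of (tag, subfields) tuples), the sample data, and the function names are all assumptions made for this example.

```python
def has_any_field(record, tags):
    """Return True if the record contains at least one of the given field tags."""
    return any(tag in tags for tag, _subfields in record)


def filter_records(records, tags):
    """Keep only the records that contain any of the given field tags."""
    wanted = set(tags)
    return [record for record in records if has_any_field(record, wanted)]


# Made-up sample records: each field is (tag, {subfield code: value})
records = [
    [("003@", {"0": "123"}), ("021A", {"a": "A title"})],
    [("003@", {"0": "456"})],
]

print(len(filter_records(records, ["021A", "045Q"])))
```

A record is kept as soon as one of the requested fields is present; it does not need to contain all of them.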

fields/subfields/sf

List distinct fields or subfields in the data. Provide an Avram schema (-s/--schema) to include documentation.

explain

Look up (sub)fields in an Avram schema given by option or from stdin. Markers indicate whether an element is optional (o/*) or mandatory (./+), and whether it is repeatable (+/*).

validate

Validate data against an Avram schema (-s/--schema).

diff

Compare PICA records from two inputs. Output is always annotated PICA Plain.

patch

Apply modifications given in annotated PICA Plain.

modify

Change subfield values and return the result, or a patch with option -a.

build

Build an Avram schema from input data, optionally based on an existing schema (-s/--schema). Add option -B/--abbrev to abbreviate.

OPTIONS

--from, -f

PICA serialization type (plain, plus/normalized, binary, import, XML, ppxml, pixml, patch) with Plain as default. Guessed from first input filename unless specified. See format documentation at http://format.gbv.de/pica.

--to, -t

PICA serialization type to enable writing parsed PICA data.

--number, -n

Stop parsing after n records. Can be abbreviated as -1, -2...
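The -1, -2... shorthand can be read as a compact spelling of -n. As a rough illustration only (this is not picadata's actual argument handling, and the function name is made up for this example), such abbreviations could be expanded like this in Python:

```python
import re


def expand_number_shorthand(args):
    """Rewrite arguments such as '-3' into ['-n', '3']; leave all others untouched."""
    expanded = []
    for arg in args:
        if re.fullmatch(r"-\d+", arg):
            # '-3' becomes the option '-n' followed by its value '3'
            expanded.extend(["-n", arg[1:]])
        else:
            expanded.append(arg)
    return expanded


print(expand_number_shorthand(["count", "-2", "pica.plain"]))
```

Under this reading, picadata -2 pica.plain and picadata -n 2 pica.plain would request the same limit.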

--order, -o

Sort record fields by field identifier and by occurrence at level 2.

--level, -l

Split record into selected level, includes higher level identifiers.

--annotate, -a, -A

Enforce annotated PICA as output format or prevent with -A. Combined with --schema this will set annotations ! and ? to mark validation errors.

--path, -p

Select fields or subfield values specified by PICA Path expressions. Multiple expressions can be separated by | or by repeating the option. Character positions within subfield values, such as /3-7, are also supported.
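The two ways of passing several expressions (separating them with | or repeating the option) amount to the same list of paths. The following Python sketch shows one way to flatten them. The function name is hypothetical, and trimming whitespace and skipping empty parts are assumptions of this example, not documented behaviour.

```python
def collect_paths(option_values):
    """Flatten repeated -p values, splitting each on '|' and dropping empty parts."""
    paths = []
    for value in option_values:
        for part in value.split("|"):
            part = part.strip()
            if part:
                paths.append(part)
    return paths


# Equivalent to: -p '003@$0|021A$a' -p 045Q
print(collect_paths(["003@$0|021A$a", "045Q"]))
```

On the command line, an expression containing | or $ needs quoting so that the shell does not interpret these characters.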

--schema, -s

Avram Schema given by file or URL. Default set via environment variable PICA_SCHEMA.
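Since PICA_SCHEMA only provides a default, an explicit -s/--schema value takes precedence over it. A minimal Python sketch of that lookup order follows. The function name and its parameters are assumptions for illustration; only the variable name PICA_SCHEMA comes from this page.

```python
import os


def resolve_schema(option_value=None, environ=os.environ):
    """Return the schema location: explicit option first, then PICA_SCHEMA, else None."""
    if option_value:
        return option_value
    return environ.get("PICA_SCHEMA")


# Passing a dict instead of os.environ keeps the example reproducible
print(resolve_schema(None, {"PICA_SCHEMA": "schema.json"}))
print(resolve_schema("other.json", {"PICA_SCHEMA": "schema.json"}))
```

With the variable exported in the shell, commands such as validate can then be called without repeating -s each time.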

--unknown, -u

Report unknown fields and subfields on validation (disabled by default).

--abbrev, -B

Abbreviate the Avram schema (with command build).

--color, -C

Colorize output. Only supported for the PICA plain and PICA plus formats.

--mono, -M

Monochrome (don't colorize output).

--version, -V

Print version number and exit.

EXAMPLES

picadata pica.dat -t xml                    # convert binary to XML
picadata count -f plain < pica.plain        # parse and count records
picadata 003@ pica.xml                      # extract field 003@
picadata validate pica.xml -s schema.json   # validate against Avram schema
picadata modify 021A.a "New Title" pica.pp  # modify subfield value

# document fields used in a record
picadata fields pica.xml -s 'https://format.k10plus.de/avram.pl?profile=k10plus'

SEE ALSO

See catmandu for a more elaborate command line tool for data processing (transformation, API access...), including PICA+ with Catmandu::PICA.