NAME
MARC::Moose::Lint::Checker::RulesFile - A class to 'lint' biblio records based on a rules file
VERSION
version 1.0.40
DESCRIPTION
A MARC biblio record, MARC21, UNIMARC, whatever, can be validated against rules. Rules check various conditions:
Unknown tag - If a field is present in a record but is not specified by its tag in a validation rule, a warning is emitted saying that this field has an Unknown tag. This way all tags which are not specifically defined in validation rules are identified.
Unknown letter - If a subfield is present in a field but is not specified by its letter in a validation rule, a warning is emitted saying that this subfield has an Unknown letter. This way all subfields which are not specifically defined in validation rules are identified.
Mandatory field - When a validation rule defines that a field is mandatory, if this field is not found in a record, a warning is emitted saying that this field is missing.
Mandatory subfield - When a validation rule defines that a subfield is mandatory, if this subfield is not found in a field, a warning is emitted saying that this subfield is missing.
Repeatable field - When a validation rule specifies that a field is not repeatable, if this field is repeated in a record, a warning is emitted saying that this field is "non repeatable".
Repeatable subfield - When a validation rule specifies that a subfield is not repeatable, if this subfield is repeated in a field, a warning is emitted saying that this subfield is "non repeatable".
Indicator values - Authorised values for indicators 1 and 2 are specified in the validation rule. When a field uses another value, a warning is emitted reporting an invalid indicator value.
Field content - The content of a field, control field value, or subfield value, can be tested against a regular expression. This way it's possible to check that a field complies with a specific format.
.{3}
will accept values that are 3 characters long.
[0-9]{8}
will accept a digit-only value with 8 digits. And this regular expression will validate the UNIMARC 100 coded field:
^[0-9]{8}[a-ku][0-9 ]{8}[abcdeklu ]{3}[a-huyz][01 ][a-z]{3}[a-cy][01|02|03|04|05|06|07|08|09|10|11|50]{2}
Validation tables - Validation tables can be specified, for example a table of ISO language codes. Field/subfield content can be validated against a table in order to identify unauthorised values. When such a value is found, a warning is emitted saying that this value is not in this table.
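The last two checks (field content and validation tables) can be sketched in plain Perl. This is only an illustration of the idea, not the module's code: the check_value helper, its arguments and the wording of its messages are all hypothetical.

```perl
use strict;
use warnings;

# Hypothetical helper, not part of MARC::Moose: test one value against an
# optional regular expression and an optional table of authorised codes.
sub check_value {
    my ( $value, $regexp, $table_name, $table ) = @_;
    my @found;
    push @found, "invalid value '$value', doesn't match /$regexp/"
        if defined $regexp && $value !~ /$regexp/;
    push @found, "value '$value' is not in table $table_name"
        if $table && !exists $table->{$value};
    return @found;
}

# A tiny stand-in for a LANG validation table
my %lang = map { $_ => 1 } qw( aar afh afr );

my @ok  = check_value( 'afr', '^[a-z]{3}$', 'LANG', \%lang );
my @bad = check_value( 'FRE', '^[a-z]{3}$', 'LANG', \%lang );
print "$_\n" for @bad;
```

Here 'afr' passes both checks, while 'FRE' fails the regular expression and is also absent from the table, so two messages are returned for it.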
ATTRIBUTES
file
Name of the file containing validation rules based on which a biblio record can be validated.
METHODS
check( record )
This method checks a biblio record, based on the current 'lint' object. The biblio record is a MARC::Moose::Record object. An array of validation errors/warnings is returned. Those errors are plain-text explanations of why the record doesn't comply with the validation rules.
SYNOPSIS
use feature 'say';
use MARC::Moose::Record;
use MARC::Moose::Reader::File::Iso2709;
use MARC::Moose::Lint::Checker::RulesFile;

# Read an ISO2709 file, and dump found errors
my $reader = MARC::Moose::Reader::File::Iso2709->new(
    file => 'biblio.mrc' );
my $lint = MARC::Moose::Lint::Checker::RulesFile->new(
    file => 'unimarc.rules' );
while ( my $record = $reader->read() ) {
    if ( my @result = $lint->check($record) ) {
        say "Biblio record #", $record->field('001')->value;
        say join("\n", @result), "\n";
    }
}
VALIDATION RULES
Validation rules are defined in a textual form. The file is composed of two parts: (1) field rules, (2) validation tables.
- (1) Field rules
Define validation rules for each tag. A blank line separates tags. For example:
102+
#
#
abc+i@CTRY ^[a-z]{3}$
2+
Line 1 contains the field tag. If a + is present, the field is repeatable. If a _ is present, the field is mandatory.
For control fields (tag under 010), an optional second line can contain a regular expression against which the field content is validated.
For standard fields, lines 2 and 3 contain regular expressions against which indicators 1 and 2 are validated. # means a blank indicator.
Line 4 and the following lines define rules for validating subfields. The first part contains the subfield letters, and + (repeatable) and/or _ (mandatory), followed by an optional validation table name beginning with @. A blank separates the first part from the second part. The second part contains a regular expression against which the subfield content is validated.
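The layout described above can be sketched as a small parser. This is a hedged illustration only: parse_field_rule and the shape of the hash it returns are hypothetical, and the sketch assumes that the + and _ flags on a subfield line apply to every letter of that line. The module's own parser may behave differently.

```perl
use strict;
use warnings;

# Hypothetical parser for one field-rule block (the lines found between two
# blank lines). It follows the layout described above, nothing more.
sub parse_field_rule {
    my @lines = @_;
    my $first = shift @lines;
    my ( $tag, $flags ) = $first =~ /^(\d{3})([+_]*)$/
        or die "Invalid tag line: $first\n";
    my %rule = (
        tag        => $tag,
        repeatable => ( $flags =~ /\+/ ? 1 : 0 ),
        mandatory  => ( $flags =~ /_/  ? 1 : 0 ),
        subfields  => {},
    );

    if ( $tag lt '010' ) {
        # Control field: optional regular expression on the field value
        $rule{regexp} = shift @lines if @lines;
        return \%rule;
    }

    # Standard field: lines 2 and 3 are the indicator regular expressions,
    # with '#' standing for a blank indicator
    for my $i ( 1, 2 ) {
        my $ind = shift @lines;
        next unless defined $ind;
        $ind =~ s/#/ /g;
        $rule{"ind$i"} = $ind;
    }

    # Line 4 and following: letters and flags, optional @TABLE, then an
    # optional regular expression after the first blank
    for my $line (@lines) {
        my ( $spec, $regexp ) = split /\s+/, $line, 2;
        my ( $letters, $table ) = $spec =~ /^([0-9a-z+_]+)(?:\@(\w+))?$/
            or die "Invalid subfield line: $line\n";
        for my $letter ( $letters =~ /[0-9a-z]/g ) {
            $rule{subfields}{$letter} = {
                repeatable => ( $letters =~ /\+/ ? 1 : 0 ),
                mandatory  => ( $letters =~ /_/  ? 1 : 0 ),
                table      => $table,
                regexp     => $regexp,
            };
        }
    }
    return \%rule;
}

my $rule = parse_field_rule( '101_', '0|1|2', '#', 'abcdfghij+@LANG ^[a-z]{3}$' );
printf "%s: mandatory=%d, \$a table=%s\n",
    $rule->{tag}, $rule->{mandatory}, $rule->{subfields}{a}{table};
```

Keeping the regular expression as the second half of a two-way split preserves any blanks it contains, which matters for patterns such as [0-9 ]{8}.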
- (2) Validation tables
This part of the file defines one or more validation tables. Each table begins with a header line:
==== TABLE NAME
where the table name is in uppercase. Each following line is a code in the validation table. The table name can be followed by the fields to check, for example:
==== LANG 100a22,3 101a
In this case, the table is used to validate coded values in coded fields. In this example, the language table checks subfield 100$a at position 22 for a length of 3, and subfield 101$a. A table must contain all possible values. It is not possible to use regular expressions. If a blank value is allowed, the table needs a line containing just a blank.
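As a rough sketch of how such a header line could be interpreted, the following Perl splits it into a table name and a list of targets, then extracts a coded value from a 100$a string. The parse_table_header function is hypothetical, and the sketch assumes that positions are counted from 0, as in the UNIMARC documentation; the sample 100$a value is made up for the example.

```perl
use strict;
use warnings;

# Hypothetical parser for a table header line such as
# "==== LANG 100a22,3 101a"; the real module may parse it differently.
sub parse_table_header {
    my $line = shift;
    my ( $name, $rest ) = $line =~ /^====\s+([A-Z0-9]+)\s*(.*)$/
        or die "Not a table header: $line\n";
    my @targets;
    for my $spec ( split ' ', $rest ) {
        # tag, subfield letter, then optional "position,length"
        my ( $tag, $letter, $pos, $len ) =
            $spec =~ /^(\d{3})([0-9a-z])(?:(\d+),(\d+))?$/
            or die "Invalid target: $spec\n";
        push @targets,
            { tag => $tag, letter => $letter, position => $pos, length => $len };
    }
    return ( $name, \@targets );
}

my ( $name, $targets ) = parse_table_header('==== LANG 100a22,3 101a');
my %lang = map { $_ => 1 } qw( aar afh afr );

# Made-up 100$a value, with the language code 'fre' at position 22 (0-based)
my $value = '20200101' . 'd' . '2020' . ' ' x 4 . 'k' . ' ' x 2 . 'y0'
          . 'fre' . 'y50' . ' ' x 6 . 'ba';
my $first = $targets->[0];
my $code  = substr $value, $first->{position}, $first->{length};
print "value '$code' is not in table $name\n" unless $lang{$code};
```

Because the stand-in table only holds aar, afh and afr, the extracted code is reported as missing from the table.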
Here is, for example, a simplified standard UNIMARC validation rules file:
000
.{5}[cdnop][abcdefgijklmr][aimsc][ 012]

001_

005
\d{14}\.\d

100_
#
#
a ^[0-9]{8}[a-ku][0-9 ]{8}[abcdeklu ]{3}[a-huyz][01 ][a-z]{3}[a-cy][01|02|03|04|05|06|07|08|09|10|11|50]{2}

101_
0|1|2
#
abcdfghij+@LANG ^[a-z]{3}$

200_
0|1
#
a_+
bcdefghi+
v
z5+
==== CTRY
AF
AL
DZ
GG
GN
GW
GY
HT
HM
VE
VN
VG
VI
ZM
ZW
==== LANG 100a22,3 101a
aar
afh
afr
afa
ain
aka
akk
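To make the two-part layout concrete, here is a minimal sketch that splits the text of a rules file into field-rule blocks and validation tables. The split_rules function is hypothetical and only mirrors the description given in this section (blank lines between tags, ==== header lines for tables); it is not the way the module itself reads the file.

```perl
use strict;
use warnings;

# Hypothetical splitter: returns the field-rule blocks and the tables found
# in the text of a rules file. An illustration of the layout only.
sub split_rules {
    my $text = shift;
    my ( @blocks, %tables, @current, $table );
    for my $line ( split /\n/, $text ) {
        if ( $line =~ /^====\s+(\S+)/ ) {
            # A table header ends the field rules part
            push @blocks, [@current] if @current;
            @current = ();
            $table = $1;
            $tables{$table} = [];
        }
        elsif ( defined $table ) {
            # Inside a table, every non-empty line is a code; a line
            # holding just a blank is kept as the blank value
            push @{ $tables{$table} }, $line if length $line;
        }
        elsif ( $line =~ /\S/ ) {
            push @current, $line;
        }
        elsif (@current) {
            # A blank line closes the current field block
            push @blocks, [@current];
            @current = ();
        }
    }
    push @blocks, [@current] if @current;
    return ( \@blocks, \%tables );
}

my $text = join "\n",
    '001_', '',
    '005', '\d{14}\.\d', '',
    '101_', '0|1|2', '#', 'abcdfghij+@LANG ^[a-z]{3}$',
    '==== LANG 100a22,3 101a', 'aar', 'afh', 'afr';

my ( $blocks, $tables ) = split_rules($text);
printf "%d field blocks, %d codes in LANG\n",
    scalar @$blocks, scalar @{ $tables->{LANG} };
```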
SEE ALSO
AUTHOR
Frédéric Demians <f.demians@tamil.fr>
COPYRIGHT AND LICENSE
This software is copyright (c) 2020 by Frédéric Demians.
This is free software; you can redistribute it and/or modify it under the same terms as the Perl 5 programming language system itself.