NAME

MARC.pm - Perl extension to manipulate MAchine Readable Cataloging records.

SYNOPSIS

use MARC 0.91;

$x=MARC->new("mymarcfile.mrc");
$x->output({file=>">my_text.txt",'format'=>"ascii"});
$x->output({file=>">my_marcmaker.mkr",'format'=>"marcmaker"});
$x->output({file=>">my_html.html",'format'=>"html"});
$x->output({file=>">my_xml.xml",'format'=>"xml"});
$x->output({file=>">my_urls.html",'format'=>"urls"});
print $x->length();

DESCRIPTION

MARC.pm is a Perl 5 module for reading in, manipulating, and outputting bibliographic records in the USMARC format. You will need Perl 5.004 or greater for MARC.pm to work properly. Since it is a Perl module, you use MARC.pm from within your own Perl scripts. To see what sorts of conversions are possible, you can try out a web interface to MARC.pm, which allows you to upload MARC files and retrieve the results (for details see the section below entitled "Web Interface").

However, to get the full functionality you will probably want to install MARC.pm on your server or PC. MARC.pm can handle both single records and batches of MARC records. The number of records in a batch is limited only by the memory of the machine you are running it on. If memory is an issue for you, MARC.pm will allow you to read in records from a batch gradually. MARC.pm also includes a variety of tools for searching, removing, and even creating records from scratch.

Types of Conversions:

  • MARC -> ASCII : puts each MARC field on its own line

  • MARC <-> MARCMaker : The MARCMaker format was developed by the Library of Congress for use with their DOS-based MARCMaker and MARCBreaker utilities. This format is particularly useful for making global changes (e.g. with a text editor's search and replace) and then converting back to MARC (MARC.pm will read properly formatted MARCMaker records). For more information about the MARCMaker format see http://lcweb.loc.gov/marc/marcsoft.html

  • MARC -> HTML : The MARC to HTML conversion creates an HTML file from the fields and field labels that you supply. You could possibly use this to create HTML bibliographies from a batch of MARC records.

  • MARC -> XML : The MARC to XML conversion creates an XML document that does not have a Document Type Definition (DTD). Since XML does not require a DTD, this is acceptable.

  • MARC -> URLS : This conversion extracts URLs from a batch of MARC records. The URLs are found in the 856 field, subfield u. The HTML page that is generated can then be used with link-checking software to determine which URLs need to be repaired. Hopefully library system vendors will soon support this activity and make this conversion unnecessary!
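
The URLS conversion amounts to collecting every subfield u from the 856 fields and listing the results as links. The following is a rough, self-contained illustration, not MARC.pm's own code. It assumes it is handed the raw data of 856 fields, in which each subfield begins with the USMARC subfield delimiter (hex 1F) followed by a one-character code. The helpers urls_from_856() and url_page() are hypothetical.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of MARC.pm): collect every subfield u
# from raw 856 field data. Each subfield starts with the hex 1F
# delimiter followed by a one-character subfield code.
sub urls_from_856 {
    my @urls;
    for my $field (@_) {
        for my $sub (split /\x1F/, $field) {
            push @urls, substr($sub, 1) if substr($sub, 0, 1) eq 'u';
        }
    }
    return @urls;
}

# Hypothetical helper: wrap the URLs in a minimal HTML page that
# link-checking software could crawl.
sub url_page {
    my $items = '';
    for my $url (@_) {
        my $h = $url;
        $h =~ s/&/&amp;/g;    # escape & first so later entities survive
        $h =~ s/</&lt;/g;
        $h =~ s/"/&quot;/g;
        $items .= qq{<li><a href="$h">$h</a></li>\n};
    }
    return "<html><body><ul>\n$items</ul></body></html>\n";
}

# Indicators "40", then subfields u and z (as in the Dow Jones example).
my @urls = urls_from_856("40\x1Fuhttp://www.djnr.com\x1FzConnect");
print url_page(@urls);
```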

Downloading and Installing

Download

Download the latest version from http://www.cpan.org/modules/by-module/MARC/ . The module is provided in standard CPAN distribution format. It will extract into a directory MARC-version with any necessary subdirectories. Change into the MARC top directory before installing.

Unix

    perl Makefile.PL
    make
    make test
    make install

Win9x/WinNT/Win2000

    perl Makefile.PL
    perl test.pl
    perl install.pl

Test

Once you have installed MARC.pm, you can check whether Perl can find it. Change to some other directory and execute from the command line:

perl -e "use MARC"

If you do not get any response, everything is OK! If you get an error like Can't locate MARC.pm in @INC, then Perl is not able to find MARC.pm. Double-check that the install copied the file into the right place.
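
You can also make the same check from inside a script. The sketch below uses a generic Perl technique rather than anything MARC.pm provides. It tries to require a module at run time and, if loading succeeds, reports the version. The helper name module_status() is made up for illustration.

```perl
use strict;
use warnings;

# Hypothetical helper: try to load a module at run time and report
# whether Perl found it in @INC, plus its $VERSION if it has one.
sub module_status {
    my ($module) = @_;
    (my $file = "$module.pm") =~ s{::}{/}g;    # MARC -> MARC.pm, A::B -> A/B.pm
    if (eval { require $file; 1 }) {
        my $version = $module->VERSION;
        return "$module " . (defined $version ? $version : 'unknown version') . " found";
    }
    return "$module not found in \@INC";
}

print module_status('MARC'), "\n";
```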

Todo

  • Support for other MARC formats.

  • Create a map and instructions for using and extending the MARC.pm data structure.

  • Develop better error catching mechanisms.

  • Support for character conversions from MARC to Unicode ??

  • Managing MARC records that exceed 99999 characters in length (not uncommon for MARC AMC records)

  • MARC <-> DC/RDF conversion ??

Web Interface

A web interface to MARC.pm is available at http://libstaff.lib.odu.edu/cgi-bin/marc.cgi where you can upload records and observe the results. If you'd like to see the CGI script itself, take a look at http://libstaff.lib.odu.edu/depts/systems/iii/scripts/MARCpm/marc-cgi.txt . However, to get the full functionality you will want to install MARC.pm on your server or PC.

Notes

Please let us know if you run into any difficulties using MARC.pm; we'd be happy to try to help. Also, please contact us if you notice any bugs, or if you would like to suggest an improvement or enhancement. Email addresses are listed at the bottom of this page.

METHODS

Here is a list of the methods in MARC.pm that are available to you for reading in, manipulating and outputting MARC data.

new()

Creates a new MARC object.

$x = new MARC;

You can also use the optional file and format parameters to create the object and populate it with data from a file. If a file is specified, new() will read in the entire file. If you wish to read in only portions of the file, see openmarc(), nextmarc(), and closemarc() below.

$x = MARC->new("mymarc.dat","usmarc");
$x = MARC->new("mymarcmaker.mkr","marcmaker");

openmarc()

Opens a specified file for reading data into a MARC object. If no format is specified, openmarc() will default to USMARC. The increment parameter defines how many records you would like to read from the file. If no increment is defined, the file will just be opened and no records will be read in. If increment is set to -1, the entire file will be read in.

$x = new MARC;
$x->openmarc({file=>"mymarc.dat",'format'=>"usmarc",increment=>"1"});
$x->openmarc({file=>"mymarcmaker.mkr",'format'=>"marcmaker",increment=>"5"});

note: openmarc() will return the number of records read in. If the file opens successfully, but no records are read, it returns "0 but true". For example:

$y=$x->openmarc({file=>"mymarc.dat",'format'=>"usmarc",increment=>"5"});
print "Read in $y records!";
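
"0 but true" is a long-standing Perl idiom. The string is true in boolean context, yet it evaluates to 0 as a number without triggering the "isn't numeric" warning. The stand-in function below is hypothetical and only mimics the return values described for openmarc(). Returning undef when the file cannot be opened is an assumption; the text above does not say what happens in that case.

```perl
use strict;
use warnings;

# Hypothetical stand-in for openmarc()'s return value: the number of
# records read, "0 but true" if the file opened but nothing was read,
# and (an assumption) undef if the file could not be opened.
sub fake_openmarc {
    my ($opened, $count) = @_;
    return undef unless $opened;
    return $count ? $count : "0 but true";
}

for my $case ([1, 5], [1, 0], [0, 0]) {
    my $y = fake_openmarc(@$case);
    if ($y) {
        # true even when no records were read; numifies cleanly to 0
        printf "opened OK, read %d records\n", $y;
    } else {
        print "could not open the file\n";
    }
}
```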

nextmarc()

Once a file is open, nextmarc() can be used to read in the next group of records. You can pass an increment to change the number of records read in. An increment of -1 will read in the rest of the file.

$x->nextmarc();
$x->nextmarc(10);
$x->nextmarc(-1);

note: Like openmarc(), nextmarc() returns the number of records read in.

$y=$x->nextmarc();
print "$y more records read in!";
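
The increment behaviour of openmarc() and nextmarc() can be pictured with a plain-Perl reader. This is a hypothetical sketch, not how MARC.pm is implemented. It relies on the fact that every USMARC record ends with the record terminator character (hex 1D), so setting $/ to that character makes each readline return one raw record. As described above, an increment of -1 reads the rest of the file and an increment of 0 reads nothing. read_batch() is an illustrative name.

```perl
use strict;
use warnings;

# Hypothetical reader (not MARC.pm's code): return up to $increment raw
# records from $fh, or all remaining records when $increment is -1.
sub read_batch {
    my ($fh, $increment) = @_;
    local $/ = "\x1D";                                 # USMARC record terminator
    my @batch;
    while ($increment < 0 || @batch < $increment) {
        my $rec = <$fh>;
        last unless defined $rec;                      # end of file
        push @batch, $rec;
    }
    return @batch;
}

# Three tiny fake "records" in an in-memory file.
my $data = join '', map { "record$_\x1D" } 1 .. 3;
open(my $fh, '<', \$data) or die $!;
my @first = read_batch($fh, 2);
my @rest  = read_batch($fh, -1);
print scalar(@first), " then ", scalar(@rest), "\n";   # 2 then 1
```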

closemarc()

If you are finished reading in records from a file you should close it immediately.

$x->closemarc();

length()

Returns the total number of records in a MARC object.

$length=$x->length();

getvalue()

This method retrieves MARC field data from a specific record in the MARC object. getvalue() takes four parameters: record, field, subfield, and delimiter. A single MARC record can contain several occurrences of a field or subfield, so the results are returned to you as an array.

If you pass only record and field, you get back the entire field without subfield delimiters. You can optionally use delimiter to specify which character to use as the subfield delimiter; the subfield delimiters are then included in the results. If you also specify subfield, your results are limited to just the contents of that subfield.

    #get the 650 field(s)
    @results = $x->getvalue({record=>'1',field=>'650'});
    #get the 650 field(s) with subfield delimiters (e.g. |x |v etc.)
    @results = $x->getvalue({record=>'1',field=>'650',delimiter=>'|'});
    #get all of the subfield u's from the 856 field
    @results = $x->getvalue({record=>'12',field=>'856',subfield=>'u'});
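
To make the three modes concrete, the sketch below applies them to the raw data of a single field. In USMARC, each subfield starts with the delimiter character hex 1F followed by its one-character code. field_value() is a hypothetical helper, not MARC.pm's code. Joining the subfields with single spaces in the plain mode is an assumption about what "the entire field" looks like.

```perl
use strict;
use warnings;

my $SF = "\x1F";    # USMARC subfield delimiter

# Hypothetical helper (not MARC.pm's code): return one field's subfield
# data in the three styles getvalue() describes.
sub field_value {
    my ($data, %opt) = @_;
    # "\x1FaUnited States\x1FxHistory" -> ("aUnited States", "xHistory")
    my @subs = grep { length } split /$SF/, $data;
    if (defined $opt{subfield}) {
        # only the contents of the requested subfield code (may repeat)
        return map  { substr($_, 1) }
               grep { substr($_, 0, 1) eq $opt{subfield} } @subs;
    }
    if (defined $opt{delimiter}) {
        # keep the subfield codes, marked with the chosen delimiter
        return join '', map { $opt{delimiter} . $_ } @subs;
    }
    # default: codes dropped; joining with single spaces is an assumption
    return join ' ', map { substr($_, 1) } @subs;
}

my $f650 = "${SF}aUnited States${SF}xHistory${SF}yCivil War, 1861-1865.";
print field_value($f650), "\n";
print field_value($f650, delimiter => '|'), "\n";
print join(', ', field_value($f650, subfield => 'x')), "\n";
```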

deletemarc()

This method will allow you to remove specific records, fields, or subfields from a MARC object. Accepted parameters include: record, field and subfield. Note: you can use Perl's .. range operator to delete a range of records. deletemarc() will return the number of items deleted (be they records, fields or subfields). The record parameter is optional; if it is omitted, the deletion applies to all records in the object.

    #delete all the records in the object
    $x->deletemarc();
    #delete records 1-5 and 7
    $x->deletemarc({record=>[1..5,7]});
    #delete all of the 650 fields from all of the records
    $x->deletemarc({field=>'650'});
    #delete the 110 field in record 2
    $x->deletemarc({record=>'2',field=>'110'});
    #delete all of the subfield h's in the 245 fields
    $x->deletemarc({field=>'245',subfield=>'h'});
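
The record list [1..5,7] is ordinary Perl. The .. range operator expands inside the anonymous array constructor before deletemarc() ever sees it, so the method simply receives a reference to a list of record numbers. This is a quick way to confirm what a given expression expands to:

```perl
use strict;
use warnings;

my $records = [1..5, 7];        # same expression as in the deletemarc() example
print "@$records\n";            # 1 2 3 4 5 7
print scalar(@$records), "\n";  # 6
```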

selectmarc()

This method selects specific records from a MARC object and deletes the rest. You can specify both individual records and ranges of records (such as '21-50'). selectmarc() returns the number of records deleted.

$x->selectmarc(['3']);
$y=$x->selectmarc(['4','21-50','60']);
print "$y records deleted!";
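
Since selectmarc() keeps the listed records and deletes the rest, its return value is the size of the complement. The sketch below is hypothetical: records_to_delete() is not part of MARC.pm, and it assumes ranges are written as 'start-end' strings, as in the example above. It shows which record numbers would be removed from a ten-record object.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of MARC.pm): given the number of records
# in an object and a selectmarc()-style list, return the record numbers
# that would be deleted. Ranges are written as strings like '21-50'.
sub records_to_delete {
    my ($total, @select) = @_;
    my %keep;
    for my $item (@select) {
        if ($item =~ /^(\d+)-(\d+)$/) {
            $keep{$_} = 1 for $1 .. $2;    # expand an inclusive range
        } else {
            $keep{$item} = 1;
        }
    }
    return grep { !$keep{$_} } 1 .. $total;
}

my @gone = records_to_delete(10, '2', '4-6');
print scalar(@gone), " records would be deleted: @gone\n";
```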

searchmarc()

This method will allow you to search through a MARC object and retrieve the record numbers of records that match your criteria. You can search for:

1) records that contain a particular field, or field and subfield;
2) records that have fields or subfields that match a regular expression; and
3) records that have fields or subfields that do not match a regular expression.

The record numbers are returned to you in an array, which you can then use with deletemarc(), selectmarc() and output() if you want.

  • 1) Field/Subfield Presence:

    @records=$x->searchmarc({field=>"245"});
    @records=$x->searchmarc({field=>"245",subfield=>"a"});
  • 2) Field/Subfield Match:

    @records=$x->searchmarc({field=>"245",regex=>"/huckleberry/i"});
    @records=$x->searchmarc({field=>"260",subfield=>"c",regex=>"/19../"});
  • 3) Field/Subfield NotMatch:

    @records=$x->searchmarc({field=>"245",notregex=>"/huckleberry/i"});
    @records=$x->searchmarc({field=>"260",subfield=>"c",notregex=>"/19../"});
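
The three search modes can be illustrated over a toy in-memory structure. Everything in the sketch below is an assumption made for illustration. The record layout is not MARC.pm's internal data structure, and to_regex() and search() are hypothetical helpers. The conversion of a string such as "/huckleberry/i" into a compiled pattern is one plausible approach, not a description of MARC.pm's internals. Subfield matching is left out to keep it short, and a field must be present for notregex to count as a hit (also an assumption).

```perl
use strict;
use warnings;

# Toy records (record number => field tag => list of field values).
# This layout is an assumption for illustration, not MARC.pm's internals.
my %records = (
    1 => { '245' => ['The adventures of Huckleberry Finn /'], '260' => ['1985'] },
    2 => { '245' => ['Moby Dick'],                            '260' => ['1851'] },
    3 => { '100' => ['Twain, Mark'] },
);

# Turn a searchmarc()-style string such as "/huckleberry/i" into a qr// object.
sub to_regex {
    my ($str) = @_;
    my ($pat, $flags) = $str =~ m{^/(.*)/([imsx]*)\z}s
        or die "not a /pattern/flags string: $str";
    return length $flags ? qr/(?$flags)$pat/ : qr/$pat/;
}

# Hypothetical search: field presence, regex match, or regex non-match.
sub search {
    my (%p) = @_;
    my @hits;
    for my $n (sort { $a <=> $b } keys %records) {
        my $values = $records{$n}{ $p{field} } or next;   # field must be present
        if (defined $p{regex}) {
            my $re = to_regex($p{regex});
            next unless grep { /$re/ } @$values;
        } elsif (defined $p{notregex}) {
            my $re = to_regex($p{notregex});
            next if grep { /$re/ } @$values;
        }
        push @hits, $n;
    }
    return @hits;
}

print join(',', search(field => '245', regex => '/huckleberry/i')), "\n";
```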

createrecord()

You can use this method to initialize a new record. It takes only one optional parameter, leader, which sets the 24 characters in the record leader; see http://lcweb.loc.gov/marc/bibliographic/ecbdhome.html for more details on the leader.

Note: you do not need to supply real values for character positions 00-04 or 12-16, since MARC.pm calculates these when outputting to MARC; you can simply put 0 in each of those positions. If no leader is passed, a default USMARC leader of "00000nam  2200000 a 4500" will be created.

createrecord() returns the number of the new record, which you will need later when adding fields with addfield().

use MARC;
my $x = new MARC;
$record_number = $x->createrecord();
$record_number = $x->createrecord({leader=>"00000nmm  2200000 a 4500"});
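
Positions 00-04 hold the total record length and positions 12-16 hold the base address of data (where the field data starts), both as five zero-padded digits. The sketch below shows that calculation. It is hypothetical and assumes the caller knows how many fields the record has and the total length of their data, including each field's terminator. The base address is the 24-character leader, plus 12 bytes per directory entry, plus the directory's own field terminator. The record length adds the field data and the final record terminator. The five-digit limit is also why records over 99999 characters, mentioned in the Todo list, are a problem.

```perl
use strict;
use warnings;

# Hypothetical helper: fill in leader positions 00-04 (record length)
# and 12-16 (base address of data) for a record with $n_fields directory
# entries and $data_len bytes of field data (field terminators included).
sub finish_leader {
    my ($leader, $n_fields, $data_len) = @_;
    die "leader must be 24 characters\n" unless length($leader) == 24;
    my $base   = 24 + 12 * $n_fields + 1;    # leader + directory + field terminator
    my $length = $base + $data_len + 1;      # + field data + record terminator
    die "record too long for a 5-digit length\n" if $length > 99999;
    substr($leader, 0, 5)  = sprintf '%05d', $length;
    substr($leader, 12, 5) = sprintf '%05d', $base;
    return $leader;
}

print finish_leader("00000nam  2200000 a 4500", 2, 100), "\n";   # 00150nam  2200049 a 4500
```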

addfield()

This method will allow you to add fields to a specified record. The syntax may look confusing at first, but once you understand it you will be able to add fields to records that you have read in, or to records that you have created with createrecord().

addfield() takes six parameters:

  • record : the record number to add the field to
  • field : the field you wish to create (e.g. 245)
  • i1 : one character for the first indicator
  • i2 : one character for the second indicator
  • value : the subfield data that you wish to add to the field
  • ordered : whether to insert the field in tag order

addfield() automatically tries to insert your new field in tag order (e.g. a 500 field before a 520 field). You can turn this off by setting ordered to "no", which adds the field to the end. Here are some examples:

$y = $x->createrecord(); # $y will store the record number created

$x->addfield({record=>"$y", field=>"100", i1=>"1", i2=>"0",value=>
             [a=>"Twain, Mark, ",
              d=>"1835-1910."]});

$x->addfield({record=>"$y", field=>"245", i1=>"1", i2=>"4", value=>
             [a=>"The adventures of Huckleberry Finn /",
              c=>"Mark Twain ; illustrated by E.W. Kemble."]});

This example initialized a new record and added a 100 field and a 245 field. For some more creative uses of the addfield() function, take a look at the EXAMPLES section.
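
Tag-ordered insertion can be sketched as "insert before the first field with a larger tag". This is a hypothetical illustration, not MARC.pm's code. The [tag, data] field layout and insert_field() are assumptions. Placing a new field after any existing fields with the same tag is also an assumption.

```perl
use strict;
use warnings;

# Hypothetical sketch of tag-ordered insertion; the field layout
# ([tag, data]) is an assumption, not MARC.pm's internal structure.
sub insert_field {
    my ($fields, $new, $ordered) = @_;
    if (defined $ordered && $ordered eq 'no') {
        push @$fields, $new;                       # just append
        return;
    }
    my $i = 0;
    # skip past every field whose tag sorts at or before the new tag
    $i++ while $i < @$fields && $fields->[$i][0] le $new->[0];
    splice @$fields, $i, 0, $new;
}

my @fields = (['100', 'Twain'], ['520', 'Summary'], ['650', 'Subject']);
insert_field(\@fields, ['500', 'Note']);
print join(' ', map { $_->[0] } @fields), "\n";   # 100 500 520 650
```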

output()

Output is a multifunctional method for creating formatted output from a MARC object. There are three parameters: file, format, and records. If file is specified, the output will be directed to that file. It is important to specify with > and >> whether you want to create or append to the file! If no file is specified, the results of the output will be returned to a variable (both variations are listed below).

Valid format values currently include usmarc, marcmaker, ascii, html, xml, urls, and isbd. The optional records parameter allows you to pass an array of record numbers which you wish to output. You must pass the array as a reference, hence the backslash in \@records below. If you do not include records, the output will default to all the records in the object.
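
The > and >> prefixes follow Perl's own open() conventions. The sketch below assumes that output() passes the file string through to a two-argument open(). That is an assumption; the text above only says the prefix must be given. With that assumption, the difference looks like this in plain Perl. write_to() and slurp() are illustrative helpers, not MARC.pm functions.

```perl
use strict;
use warnings;
use File::Spec;
use File::Temp qw(tempdir);

# Hypothetical writer: like output()'s file parameter, the mode prefix
# (">" or ">>") travels inside the string given to two-argument open().
sub write_to {
    my ($spec, $text) = @_;
    open(my $fh, $spec) or die "Can't open '$spec': $!";
    print $fh $text;
    close($fh) or die "Can't close '$spec': $!";
}

sub slurp {
    my ($path) = @_;
    open(my $fh, '<', $path) or die "Can't read '$path': $!";
    local $/;                          # read the whole file at once
    return scalar <$fh>;
}

my $path = File::Spec->catfile(tempdir(CLEANUP => 1), 'out.txt');
write_to(">$path",  "header\n");       # ">"  creates or truncates the file
write_to(">>$path", "body\n");         # ">>" appends to it
write_to(">>$path", "footer\n");
print slurp($path);                    # header, body, footer

write_to(">$path", "oops\n");          # a stray ">" wipes what was there
print slurp($path);                    # only "oops"
```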

  • MARC

    $x->output({file=>">mymarc.dat",'format'=>"usmarc"});
    $x->output({file=>">mymarc.dat",'format'=>"usmarc",records=>\@records});
    $y=$x->output({'format'=>"usmarc"}); #put the output into $y
  • MARCMaker

    $x->output({file=>">mymarcmaker.mkr",'format'=>"marcmaker"});
    $x->output({file=>">mymarcmaker.mkr",'format'=>"marcmaker",records=>\@records});
    $y=$x->output({'format'=>"marcmaker"}); #put the output into $y
  • ASCII

    $x->output({file=>">myascii.txt",'format'=>"ascii"});
    $x->output({file=>">myascii.txt",'format'=>"ascii",records=>\@records});
    $y=$x->output({'format'=>"ascii"}); #put the output into $y
  • HTML

    The HTML output method has some additional parameters. If fields is set to "all", all of the fields will be output. Alternatively, you can pass tag numbers, each paired with the label you want to use for that tag. The HTML output then contains only the specified tags, with each label used in place of the MARC tag.

        $x->output({file=>">myhtml.html",'format'=>"html",fields=>"all"});
        #this will only output the 245 and 100 fields, with the
        #labels "Title: " and "Author: " respectively
        $x->output({file=>">myhtml.html",'format'=>"html",
                    245=>"Title: ",100=>"Author: "});
        $y=$x->output({'format'=>"html"});

    If you want to build the HTML file in stages, there are three other format values available to you: 1) "html_header", 2) "html_body", and 3) "html_footer". Be careful to use the >> append when adding to a file though!

    $x->output({file=>">myhtml.html",'format'=>"html_header"}); 
    $x->output({file=>">>myhtml.html",'format'=>"html_body",fields=>"all"});
    $x->output({file=>">>myhtml.html",'format'=>"html_footer"});
  • XML

    $x->output({file=>">myxml.xml",'format'=>"xml"});
    $y=$x->output({'format'=>"xml"});

    Similar to the HTML output, the XML output has three different formats for creating the XML file in stages. Again, be careful to use the >> append where necessary.

    $x->output({file=>">myxml.xml",'format'=>"xml_header"});
    $x->output({file=>">>myxml.xml",'format'=>"xml_body"});
    $x->output({file=>">>myxml.xml",'format'=>"xml_footer"});
  • URLS

    $x->output({file=>">urls.html",'format'=>"urls"});
    $y=$x->output({'format'=>"urls"});
  • ISBD

    An experimental output format that attempts to mimic the ISBD.

    $x->output({file=>">isbd.txt",'format'=>"isbd"});
    $y=$x->output({'format'=>"isbd"});
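
The tag-to-label form of the HTML output can be pictured as a lookup from tag to label. The sketch below is hypothetical. record_html() is not a MARC.pm function, and the tag => [values] record layout is assumed. Emitting tags in ascending order and wrapping each value in a <p> element are choices made for illustration, not a description of MARC.pm's actual HTML.

```perl
use strict;
use warnings;

# Escape the characters that are special in HTML text.
sub html_escape {
    my ($s) = @_;
    $s =~ s/&/&amp;/g;
    $s =~ s/</&lt;/g;
    $s =~ s/>/&gt;/g;
    return $s;
}

# Hypothetical: render one record's fields using a tag => label map,
# in the spirit of output({'format'=>"html", 245=>"Title: ", 100=>"Author: "}).
sub record_html {
    my ($record, $labels) = @_;    # $record: tag => [values]
    my $html = '';
    for my $tag (sort keys %$labels) {
        for my $value (@{ $record->{$tag} || [] }) {
            $html .= '<p>' . html_escape($labels->{$tag} . $value) . "</p>\n";
        }
    }
    return $html;
}

print record_html(
    { 245 => ['The adventures of Huckleberry Finn /'], 100 => ['Twain, Mark, 1835-1910.'] },
    { 245 => 'Title: ', 100 => 'Author: ' },
);
```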

EXAMPLES

Here are a few examples to fire your imagination.

  • This example will read in the complete contents of a MARC file called "mymarc.dat" and then output it as XML to a file called "myxml.xml".

    #!/usr/bin/perl
    use MARC;
    $x = MARC->new("mymarc.dat","usmarc");
    $x->output({file=>">myxml.xml",'format'=>"xml"});
  • The MARC object occupies a fair amount of working memory, and you may want to do conversions on very large files. In this case you will want to use the openmarc(), nextmarc(), deletemarc(), and closemarc() methods to read in portions of the MARC file, do something with the record(s), remove them from the object, and then read in the next record(s). This example will read in one record at a time from a MARC file called "mymarc.dat" and convert it to XML. Note the use of formats "xml_header", "xml_body", and "xml_footer".

        #!/usr/bin/perl
        use MARC;
        $x = new MARC;
        $x->openmarc({file=>"mymarc.dat",'format'=>"usmarc"});
        $x->output({file=>">myxml.xml",'format'=>"xml_header"});
        while ($x->nextmarc(1)) {
            $x->output({file=>">>myxml.xml",'format'=>"xml_body"});
            $x->deletemarc(); #empty the object for reading in another
        }
        $x->closemarc();
        $x->output({file=>">>myxml.xml",'format'=>"xml_footer"});
  • Perhaps you have a tab delimited text file of data for online journals you have access to from Dow Jones Interactive, and you would like to create a batch of MARC records to load into your catalog. In this case you can use createrecord(), addfield() and output() to create records as you read in your delimited file. When you are done, you then output to a file in USMARC.

        #!/usr/bin/perl
        use MARC;
        $x = new MARC;
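        open (INPUT_FILE, "delimited_file") or die "Can't open delimited_file: $!";
        while ($line=<INPUT_FILE>) {
            chomp $line; #remove the newline so it doesn't end up in $issn
            ($journaltitle,$issn) = split /\t/,$line;
            $num=$x->createrecord();
            $x->addfield({record=>$num, 
                          field=>"022", 
                          i1=>" ", i2=>" ", 
                          value=>[a=>$issn]});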
            $x->addfield({record=>$num, 
                          field=>"245", 
                          i1=>"0", i2=>" ", 
                          value=>[a=>$journaltitle]});
            $x->addfield({record=>$num, 
                          field=>"260", 
                          i1=>" ", i2=>" ", 
                          value=>[a=>"New York (N.Y.) :",b=>"Dow Jones & Company"]});
            $x->addfield({record=>$num,
                          field=>"710",
                          i1=>"2", i2=>" ",
                          value=>[a=>"Dow Jones Interactive."]});
            $x->addfield({record=>$num,
                          field=>"856",
                          i1=>"4", i2=>" ",
                          value=>[u=>"http://www.djnr.com",z=>"Connect"]});
        }
        close INPUT_FILE;
        $x->output({file=>">dowjones.mrc",'format'=>"usmarc"});

AUTHORS

Chuck Bearden cbearden@rice.edu

Bill Birthisel wcbirthisel@alum.mit.edu

Charles McFadden chuck@vims.edu

Ed Summers esummers@odu.edu

SEE ALSO

perl(1), MARC http://lcweb.loc.gov/marc , XML http://www.w3.org/xml .

COPYRIGHT

Copyright (C) 1999, Bearden, Birthisel, McFadden, Summers. All rights reserved.

This module is free software; you can redistribute it and/or modify it under the same terms as Perl itself. 19 October 1999.