NAME

Tie::Hash::Abbrev::BibRefs - match bibliographic references to the original titles

SYNOPSIS

use Tie::Hash::Abbrev::BibRefs;

tie my %hash, 'Tie::Hash::Abbrev::BibRefs',
    preprocess => sub { s/\s+[[:upper:]]:.*// },
    stopwords  => [ qw( a and de del der des di
                        et for für i if in la las
                        of on part Part Pt. Sect.
                        the to und ) ],
    exceptions => { jpn => 'japan',
                    natl => 'national' };

$hash{'Physical Review B'} = '0163-1829';

print $hash{'Phys. Rev. B: Condens. Matter Mater. Phys.'};
  # will print '0163-1829'

DESCRIPTION

This module is an attempt to ease the mapping of often abbreviated bibliographical references to the original titles.

To achieve this, it simplifies the title according to parameterizable rules and stores it as a normalized key.

When accessing the hash, the key given is also normalized and compared to the normalized version of the original title. In addition, each word (words are separated by whitespace) may be abbreviated by specifying only the first few letters.
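The abbreviation rule can be illustrated with a small, self-contained sketch. The helper abbrev_matches is hypothetical and not part of the module; it works on keys that are already normalized, and it assumes that the stored title may contain further words after those given in the lookup key (as "Physical Review B" does in the SYNOPSIS). The module's real matching logic may differ.

```perl
use strict;
use warnings;

# Hypothetical helper, not part of the module: true if every word of
# $abbrev is a case-insensitive prefix of the corresponding word of $title.
# Assumption: $title may contain extra trailing words.
sub abbrev_matches {
    my ($abbrev, $title) = @_;
    my @short = split ' ', lc $abbrev;
    my @full  = split ' ', lc $title;
    return 0 if @short > @full;
    for my $i (0 .. $#short) {
        # index() == 0 means the abbreviated word starts the full word
        return 0 unless index($full[$i], $short[$i]) == 0;
    }
    return 1;
}

print abbrev_matches('phys rev', 'physical review b') ? "match\n" : "no match\n";
print abbrev_matches('phys rep', 'physical review b') ? "match\n" : "no match\n";
```

The first call reports a match, the second does not, because "rep" is not a prefix of "review".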

If more than one matching hash entry is found, the values of all matching entries are compared; as long as they are all equal (or all undef), the lookup is still considered to be successful.
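As a rough illustration of this rule, the following sketch decides whether a set of matching values still counts as a successful lookup. The helper resolve_matches is hypothetical, and comparing values with string equality (eq) is an assumption; the module may compare them differently.

```perl
use strict;
use warnings;

# Hypothetical helper, not part of the module: given the values of all
# matching entries, return (1, $value) if they agree (all equal or all
# undef) and (0, undef) otherwise.
sub resolve_matches {
    my @values = @_;
    return (0, undef) unless @values;
    my $first = shift @values;
    for my $v (@values) {
        # Two values agree if both are undef, or both are defined and equal.
        my $same = defined $first
            ? (defined $v && $v eq $first)
            : !defined $v;
        return (0, undef) unless $same;
    }
    return (1, $first);
}

my ($ok, $issn) = resolve_matches('0163-1829', '0163-1829');
print $ok ? "unambiguous: $issn\n" : "ambiguous\n";
```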

KEY NORMALIZATION

The process of normalization is implemented as follows:

  1. execute any preprocessing code (see the example in "SYNOPSIS" above), which is expected to operate on $_. You can use subroutine references or strings here; strings will be evaluated with eval().

  2. split the key into parts (at whitespace).

  3. remove any parts contained in the list of stopwords (see example above).

  4. replace any parts contained in the list of exceptions by their corresponding value. If the value is undef, the entire part will be removed. (In the example above, "Jpn" would be replaced by "japan".) This lookup is done case-insensitively.

  5. remove any non-word characters that appear at the end of a part or are followed by a dash.
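The five steps above can be sketched in plain Perl as follows. This is a minimal illustration, not the module's code: normalize_sketch is a hypothetical stand-in, and several details are assumptions, namely that stopwords are matched case-sensitively (the SYNOPSIS lists both "part" and "Part"), that the parts are joined with single spaces, and that the result is lowercased.

```perl
use strict;
use warnings;

# Hypothetical stand-in for the module's normalization; %opt mirrors the
# options shown in the SYNOPSIS (preprocess, stopwords, exceptions).
sub normalize_sketch {
    my ($key, %opt) = @_;

    # Step 1: run each preprocessing entry on $_ (strings are eval'ed).
    for my $code (@{ $opt{preprocess} || [] }) {
        local $_ = $key;
        if (ref($code) eq 'CODE') { $code->() }
        else                      { eval $code }
        $key = $_;
    }

    # Step 2: split at whitespace.
    my @parts = split ' ', $key;

    # Step 3: drop stopwords (assumption: exact, case-sensitive match).
    my %stop = map { $_ => 1 } @{ $opt{stopwords} || [] };
    @parts = grep { !$stop{$_} } @parts;

    # Step 4: apply exceptions, looked up case-insensitively.
    my $exc = $opt{exceptions} || {};
    my %exc = map { (lc($_) => $exc->{$_}) } keys %$exc;

    my @out;
    for my $part (@parts) {
        my $k = lc $part;
        if (exists $exc{$k}) {
            next unless defined $exc{$k};    # undef value: drop the part
            $part = $exc{$k};
        }
        # Step 5: strip non-word characters at the end or before a dash.
        $part =~ s/\W+(?=-|\z)//g;
        push @out, $part if length $part;
    }

    # Assumption: normalized keys are lowercase and space-separated.
    return lc(join ' ', @out);
}

print normalize_sketch(
    'Phys. Rev. B: Condens. Matter Mater. Phys.',
    preprocess => [ sub { s/\s+[[:upper:]]:.*// } ],
    stopwords  => [ qw( of the and ) ],
    exceptions => { jpn => 'japan' },
), "\n";
```

With the preprocessing rule from the SYNOPSIS, everything from " B:" onward is cut off before the remaining parts "Phys." and "Rev." lose their trailing dots.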

ADDITIONAL METHODS

debug

turn debug mode on (when given a true value as argument) or off (when given a false value). Returns the (possibly new) value.

In debug mode, the "find" method will print debug messages to STDERR.

delete_abbrev

my @deleted = tied(%hash)->delete_abbrev('foo','bar');

Deletes every element matched by an unambiguous abbreviation given as an argument and returns a (possibly empty) list of all deleted values.

exceptions

get or set the exceptions table for the hash. Expects hash references or undef, which clears the table. Returns a reference to the new exception table.

preprocess

set up the preprocessing code chain for the hash. Any code references or strings will be added to the chain; an undef will clear the chain.

stopwords

get or set the stopwords for the hash. Any arguments given will be added to the list of stopwords. An undef as argument will clear the list of stopwords. The method returns the new list of stopwords (in no particular order).

INTERNAL METHODS

The following methods should usually not be called "from the outside"; the main intention of documenting them is that the author still wants to understand his own module in case changes become necessary later. :o)

exact

expects a key as first and a position as second argument. Returns the position if the given key equals (case-insensitively) the real key stored at that position or undef if not.

find

This is the central method for lookups, used by exists() and FETCH.

It expects a key as its only argument.

On success, the method returns the array index at which the corresponding value can be found; otherwise it returns undef.

normalize

Given a key as its only argument, this method will return the normalized key in scalar context and a three-element list in list context, consisting of

  0. the "prefix",

  1. the "search pattern" and

  2. the "normalized key".

pos

expects a (usually normalized) key as its only argument and returns the position at which this key is stored (if it exists) or at which it should be inserted to keep the keys sorted (if it does not exist yet).
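A position with these properties can be computed by a lower-bound binary search over a sorted list of keys. The sketch below is only an illustration: pos_sketch is a hypothetical helper, and both the use of binary search and plain string comparison (lt) are assumptions rather than a description of the module's internals.

```perl
use strict;
use warnings;

# Hypothetical helper, not part of the module: lower-bound binary search.
# Returns the index of $key in the sorted array @$keys if it is present,
# otherwise the index at which it would have to be inserted.
sub pos_sketch {
    my ($keys, $key) = @_;
    my ($lo, $hi) = (0, scalar @$keys);
    while ($lo < $hi) {
        my $mid = int(($lo + $hi) / 2);
        if ($keys->[$mid] lt $key) { $lo = $mid + 1 }
        else                       { $hi = $mid }
    }
    return $lo;
}

my @keys = ('acta chem', 'journal japan society', 'physical review b');
print pos_sketch(\@keys, 'journal japan society'), "\n";   # existing key: 1
print pos_sketch(\@keys, 'nature'), "\n";                  # insertion point: 2
```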

startover

expects no arguments and simply resets the iterator for the hash, so that the next call to each() will return the first key/value pair again.

BUGS

None known so far.

AUTHOR

Martin H. Sluka
mailto:martin@sluka.de
http://martin.sluka.de/

THANKS TO

Dr. Hermann Schier from the Max Planck Institute for Solid State Research in Stuttgart/Germany for initiating and underwriting the development of this module and for contributing a lot of ideas.

COPYRIGHT

This program is free software; you can redistribute it and/or modify it under the same terms as Perl itself.

The full text of the license can be found in the LICENSE file included with this module.

SEE ALSO

Tie::Hash::Array
