NAME
Lingua::EN::BioLemmatizer - Perl interface to the University of Colorado's BioLemmatizer
SYNOPSIS
Procedural summary:
use Lingua::EN::BioLemmatizer qw(biolemma);
print biolemma("phyla"), "\n";
print biolemma("phyla", "NNS"), "\n";
use Lingua::EN::BioLemmatizer qw(biolemma parse_response);
my @triples = parse_response(biolemma("phyla"));
Object-Oriented summary:
use Lingua::EN::BioLemmatizer;
my $server = new Lingua::EN::BioLemmatizer;
my $answer = $server->get_biolemma("phyla");
$answer = $server->get_biolemma("phyla", "NNS");
use Lingua::EN::BioLemmatizer qw(parse_response);
my $server = new Lingua::EN::BioLemmatizer;
my @triples = parse_response( $server->get_biolemma("phyla") );
DESCRIPTION
Perl module to interface with the University of Colorado's BioLemmatizer code. Both a procedural and an OO interface are supported. Tested with Perl v5.10, v5.12, and v5.14. Will not work on earlier Perl versions, but should work with later ones.
To use this module, you must first download the BioLemmatizer jarfile from http://biolemmatizer.sourceforge.net, and then set the environment variable BIOLEMMATIZER to the path of that jarfile. You also need a working Java installation. See the SourceForge documentation for details about the BioLemmatizer itself.
Procedural Interface
The procedural interface is an easy front-end to the underlying object interface. Its advantage is simplicity. Its disadvantage is that the resources associated with the remote server, including filehandles and a lemma cache, will be held onto forever. Use the OO interface if you want normal destructor behavior to take care of that for you.
- $lemma = biolemma(STRING)
-
Returns the raw (unparsed) response from the BioLemmatizer server for the given string. Use parse_response to parse this.
- @triples = parse_response(STRING)
- $aref = parse_response(STRING)
-
Parses response into an array of triples as subarrays. In scalar context, returns array ref to this array.
For example, given an input of:
"name vvz NUPOS||name VBZ PennPOS||name NNS PennPOS||name n2 NUPOS"
the list-context return works like this:
@list_of_triples = ( ["name", "vvz", "NUPOS"], ["name", "VBZ", "PennPOS"], ["name", "NNS", "PennPOS"], ["name", "n2", "NUPOS"], );
and the scalar context-return works like this:
$ref_to_triples = [ ["name", "vvz", "NUPOS"], ["name", "VBZ", "PennPOS"], ["name", "NNS", "PennPOS"], ["name", "n2", "NUPOS"], ];
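The context-sensitive return described above can be sketched in a few lines. This is only an illustration built from the sample response shown here: sketch_parse_response is a hypothetical stand-in, and the module's own parse_response may be implemented differently. It assumes records are separated by "||" and fields by whitespace, as in the example input.

```perl
use strict;
use warnings;

# Hypothetical stand-in for parse_response(); the module's real code may differ.
sub sketch_parse_response {
    my ($response) = @_;
    # Assumption: records are separated by "||", fields by whitespace.
    my @triples = map { [ split ' ', $_ ] } split /\|\|/, $response;
    # List context returns the triples; scalar context returns a ref to them.
    return wantarray ? @triples : \@triples;
}

my $raw = "name vvz NUPOS||name VBZ PennPOS||name NNS PennPOS||name n2 NUPOS";

# List context: one array ref per triple.
my @triples = sketch_parse_response($raw);
for my $triple (@triples) {
    my ($lemma, $pos, $tagset) = @$triple;
    print "$lemma\t$pos\t$tagset\n";
}

# Scalar context: a single reference to the whole array of triples.
my $aref = sketch_parse_response($raw);
print scalar(@$aref), " triples\n";
```

Unpacking each triple into named variables, as in the loop, keeps calling code readable when only one tagset (for example PennPOS) is of interest.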
Object Interface
- new()
-
Class constructor; must be called as a class method. Takes no arguments. To configure the object to take non-default strings, first make class method calls to java_path, java_args, jar_path, or jar_args with the new strings as arguments.
- get_biolemma(STRING)
-
Returns response from BioLemmatizer server when given a request of STRING.
- command_args()
-
Returns all args used to start the server, either as a list in list context or else as one string in scalar context. Used as an object method, returns whatever value was extant when the object was constructed. Used as a class method, returns current defaults.
- java_path
-
Returns the current path to Java, which is "java" by default; can be reset by calling as a class method with a new path before a constructor is called. Used as an object method, returns whatever value was extant when object was constructed.
- jar_path
-
Returns the current path to the BioLemmatizer jar file, which is "BioLemmatizer_interactive.jar" by default; can be reset by calling as a class method with a new path before a constructor is called. Used as an object method, returns whatever value was extant when object was constructed.
- java_args
-
Returns any extra args passed to the Java program, either as a list in list context or as an array ref in scalar context. Default is ("-Xmx1G", "-Dfile.encoding=utf8") but this can be reset by calling as a class method with new arguments before a constructor is called. Used as an object method, returns whatever value was extant when object was constructed.
- jar_args
-
Returns any final args passed after the jar file, either as a list in list context or as an array ref in scalar context. Default is ("-t") but this can be reset by calling as a class method with new arguments before a constructor is called. Used as an object method, returns whatever value was extant when object was constructed.
- child_pid()
-
Returns the pid of the BioLemmatizer server. Could be used to inspect the process status.
- into_biolemmer()
-
(INTERNAL API) Returns the filehandle for writing to the BioLemmatizer server.
- from_biolemmer()
-
(INTERNAL API) Returns the filehandle for reading from the BioLemmatizer server.
- lemma_cache()
-
(INTERNAL API) Returns the hash ref used to cache the mapping of strings to lemmas.
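Several accessors above behave differently as class methods (read or reset the current default) and as object methods (report the value captured at construction). The following self-contained sketch illustrates that pattern only. Sketch::Server is a hypothetical class and the path "/opt/java/bin/java" is made up; this is not the module's actual implementation.

```perl
use strict;
use warnings;

package Sketch::Server;    # hypothetical class, not part of the module

my $Default_Java_Path = "java";

sub new {
    my ($class) = @_;
    # Copy the class-wide default so later resets do not affect this object.
    return bless { java_path => $Default_Java_Path }, $class;
}

sub java_path {
    my ($invocant, @new) = @_;
    # Object method: report the value captured by new().
    return $invocant->{java_path} if ref $invocant;
    # Class method: optionally reset, then report, the current default.
    $Default_Java_Path = $new[0] if @new;
    return $Default_Java_Path;
}

package main;

my $first = Sketch::Server->new;
Sketch::Server->java_path("/opt/java/bin/java");    # illustrative path
my $second = Sketch::Server->new;

print "first:  ", $first->java_path,  "\n";
print "second: ", $second->java_path, "\n";
print "class:  ", Sketch::Server->java_path, "\n";
```

Under this pattern, an object built before a reset keeps its original setting, which is why the text above says to make the class method calls before a constructor is called.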
EXAMPLES
Procedural example:
use v5.10;    # enables say
use Lingua::EN::BioLemmatizer qw(biolemma);
my @words = qw(these broken pieces are phyla grandchildren);
my @pairs = ("lives NNS", "lives VBZ");
for my $word (@words, @pairs) {
    say "$word => ", biolemma($word);
}
OO example:
use v5.10;    # enables say
use Lingua::EN::BioLemmatizer;
my @words = qw(these broken pieces are phyla grandchildren);
my @pairs = ("lives NNS", "lives VBZ");
# scope for private variable
{
    my $server = new Lingua::EN::BioLemmatizer;
    for my $word (@words, @pairs) {
        say "$word => ", $server->get_biolemma($word);
    }
}
# server goes out of scope, so gets destroyed
ENVIRONMENT
The following environment variables are used by this module:
- BIOLEMMATIZER
-
If set, holds the path to the BioLemmatizer jarfile. If unset, the jarfile used defaults to the file ./biolemmatizer-core-1.0-jar-with-dependencies.jar in the process's current working directory.
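The lookup described above can be mirrored in calling code, for instance to check that the jarfile exists before starting the server. This is a rough sketch: sketch_jar_path is a hypothetical helper, and it assumes "set" means "defined"; the module's own lookup may differ.

```perl
use strict;
use warnings;

# Hypothetical helper; the module's own lookup may differ.
sub sketch_jar_path {
    my $default = "./biolemmatizer-core-1.0-jar-with-dependencies.jar";
    # Assumption: "set" means "defined"; an empty value is passed through.
    return defined $ENV{BIOLEMMATIZER} ? $ENV{BIOLEMMATIZER} : $default;
}

my $jar = sketch_jar_path();
# Report whether a plain file exists at the resolved path.
print "jarfile: $jar", ( -f $jar ? "" : " (not found)" ), "\n";
```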
BUGS
None known.
RELEASE HISTORY
AUTHOR
Tom Christiansen <tchrist@perl.com>
COPYRIGHT AND LICENCE
Copyright 2012 Tom Christiansen.
This program is free software; you may redistribute it and/or modify it under the same terms as Perl itself.