NAME
Lingua::LinkParser - Perl module implementing the Link Grammar Parser by Sleator, Temperley and Lafferty at CMU.
SYNOPSIS
use Lingua::LinkParser;
our $parser = new Lingua::LinkParser;
my $sentence = $parser->create_sentence("This is the turning point.");
my @linkages = $sentence->linkages;
# If there are NO LINKAGES, set min_null_count to a positive number:
# $parser->opts('min_null_count' => 1);
# See scripts/parse.pl for examples.
foreach my $linkage (@linkages) {
    print $parser->get_diagram($linkage);
}
DESCRIPTION
To quote the Link Grammar documentation, "the Link Grammar Parser is a syntactic parser of English, based on link grammar, an original theory of English syntax. Given a sentence, the system assigns to it a syntactic structure, which consists of a set of labeled links connecting pairs of words."
This module provides access to the parser API using Perl objects to easily analyze linkages. The module organizes data returned from the parser API into an object hierarchy consisting of, in order, sentence, linkage, sublinkage, and link. If this is unclear to you, see the several examples in the 'eg/' directory for a jumpstart on using these objects. The current Lingua::LinkParser module is based on version 4.0 of the Link Grammar parser API.
The objects within this module should not be confused with the types familiar to users of the Link Parser API. The objects used in this module reorganize the API data in a way more usable and friendly to Perl users, and do not exactly represent the types used in the API. For example, an object of class "Lingua::LinkParser::Sentence" does not directly correspond to the struct type "Sentence" of the API; rather, it is a Perl object that provides methods to access the underlying API functions.
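The sentence, linkage, sublinkage, link hierarchy described above can be walked with nested loops. The following is a minimal sketch, not code from the distribution: the helper name describe_links is hypothetical, and it uses only the linkages, sublinkages, links, label, lword and rword methods documented below. The linkage and sublinkage numbers it prints are simple loop counters.

```perl
use strict;
use warnings;

# Walk sentence -> linkage -> sublinkage -> link and describe each link.
# $sentence is assumed to come from $parser->create_sentence(TEXT).
# describe_links is a hypothetical helper, not part of the module.
sub describe_links {
    my ($sentence) = @_;
    my @lines;
    my $linkage_num = 0;
    for my $linkage ($sentence->linkages) {
        $linkage_num++;
        my $sub_num = 0;
        for my $sublinkage ($linkage->sublinkages) {
            $sub_num++;
            for my $link ($sublinkage->links) {
                # lword and rword are reported exactly as the link returns them.
                push @lines, sprintf("linkage %d, sublinkage %d: %s (%s <-> %s)",
                    $linkage_num, $sub_num,
                    $link->label, $link->lword, $link->rword);
            }
        }
    }
    return @lines;
}

# Typical use (requires the module and its dictionaries to be installed):
#   use Lingua::LinkParser;
#   my $parser = Lingua::LinkParser->new;
#   print "$_\n" for describe_links(
#       $parser->create_sentence("This is the turning point."));
```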
This documentation should be supplemented with the extensive texts included with the Link Parser and on the Link Parser web site in order to understand its vernacular and general usage. A new module, Lingua::LinkParser::Definitions, stores the link type documentation and allows in-program retrieval of this information.
- $parser = new Lingua::LinkParser( Lang => "en" )
-
This returns a new Lingua::LinkParser object, loads dictionary files, and sets basic configuration. This constructor no longer takes a full path to the dictionary files; they are expected to exist in the locations standard to the 4.2 parser distribution.
- $parser->opts(OPTION_NAME => OPTION_VALUE, ...)
-
This sets the parser option OPTION_NAME to the value specified by OPTION_VALUE. A full list of these options is found at the end of this document, as well as in the Link Parser distribution documentation.
- $sentence = $parser->create_sentence(TEXT)
-
Creates and assigns a sentence object (Lingua::LinkParser::Sentence) using the supplied value. This object is used in subsequent creation and analysis of linkages.
- $sentence->length
-
Returns the number of words in the tokenized sentence, including the boundary words and punctuation.
- $sentence->num_linkages
-
Returns the number of linkages found for $sentence.
- $sentence->num_valid_linkages
-
Returns the number of valid linkages for $sentence.
- $sentence->num_linkages_post_processed
-
Returns the number of linkages that were post-processed.
- $sentence->null_count
-
Returns the number of null links used in parsing the sentence.
- $sentence->num_violations
-
Returns the number of post processing violations for $sentence.
- $sentence->get_word(NUM)
-
Returns the word (with original spelling) at position NUM.
- $linkage = $sentence->linkage(NUM)
-
Assigns a linkage object (Lingua::LinkParser::Linkage) for linkage NUM of sentence $sentence.
- @linkages = $sentence->linkages
-
Assigns a list of linkage objects for all linkages of $sentence.
- $linkage->num_words
-
Returns the number of words within $linkage.
- $linkage->get_words
-
Returns a list of words within $linkage.
- $linkage->words
-
Returns a list of ::Word objects for $linkage.
- $linkage->num_sublinkages
-
Returns the number of sublinkages for linkage $linkage.
- $linkage->compute_union
-
Combines the sublinkages for $linkage into one, possibly with crossing links.
- $linkage->violation_name
-
Returns the name of a rule violated by post-processing of the linkage.
- $sublinkage = $linkage->sublinkage(NUM)
-
Assigns a sublinkage object (Lingua::LinkParser::Linkage::Sublinkage) for sublinkage NUM of linkage $linkage.
- @sublinkages = $linkage->sublinkages
-
Assigns an array of sublinkage objects.
- $sublinkage->get_word(NUM)
-
Returns the word for the sublinkage at position NUM.
- $sublinkage->words
-
Returns a list of ::Word objects for $sublinkage.
- $sublinkage->num_links
-
Returns the number of links for sublinkage $sublinkage.
- $word->text
-
Returns the post-parse word text.
- $word->position
-
Returns the number for the word's position in a sentence.
- @links = $word->links
-
Returns a list of link objects for the word.
- $link = $sublinkage->link(NUM)
-
Assigns a link object (Lingua::LinkParser::Link) for link NUM of sublinkage $sublinkage.
- @links = $sublinkage->links
-
Assigns an array of link objects.
- $link->num_domains
-
Returns the number of domains for $link.
- $link->domain_names
-
Returns a list of the domain names for $link.
- $link->label
-
Returns the "intersection" label for $link.
- $link->llabel
-
Returns the left label for $link.
- $link->rlabel
-
Returns the right label for $link.
- $link->lword
-
Returns the number of the left word for $link.
- $link->rword
-
Returns the number of the right word for $link.
- $link->length
-
Returns the length of the link.
- $link->linklabel
-
Only for link objects created via a word object, this returns the label for the link from the word object that created it.
- $link->linkword
-
Only for link objects created via a word object, this returns the word text which the link points *to* from the object that created it.
- $link->linkposition
-
Only for link objects created via a word object, this returns the number of the word which the link points *to* from the object that created it.
- $parser->get_diagram($linkage)
-
Returns an ASCII pretty-printed diagram of the specified linkage or sublinkage.
- $parser->get_postscript($linkage, MODE)
-
Returns PostScript code for a diagram of the specified linkage or sublinkage.
- $parser->get_domains($linkage)
-
Returns formatted ASCII text showing the links and domains for the specified linkage or sublinkage.
- $parser->print_constituent_tree($linkage, MODE)
-
Returns an ASCII formatted tree displaying the constituent parse tree for $linkage. MODE is an integer with the following meanings: '1' will display the tree using a nested Lisp format, '2' specifies that a flat tree is displayed with brackets, and '0' results in no structure, a null string being returned.
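As an illustration of the word-centred methods above, the following sketch groups the links of a linkage by word. It assumes only the words, position, text, links, linklabel and linkword methods as documented in this list; the helper name links_by_word is hypothetical.

```perl
use strict;
use warnings;

# Build a hash mapping "POSITION:TEXT" to a list of "LABEL => TARGET" strings.
# $linkage is assumed to be a linkage object obtained from a sentence.
# links_by_word is a hypothetical helper, not part of the module.
sub links_by_word {
    my ($linkage) = @_;
    my %by_word;
    for my $word ($linkage->words) {
        my $key = $word->position . ':' . $word->text;
        # linklabel and linkword are only meaningful on link objects
        # created via a word object, which is the case here.
        $by_word{$key} = [
            map { $_->linklabel . ' => ' . $_->linkword } $word->links
        ];
    }
    return \%by_word;
}

# Typical use, given a $sentence from $parser->create_sentence(TEXT):
#   for my $linkage ($sentence->linkages) {
#       my $by_word = links_by_word($linkage);
#       for my $key (sort keys %$by_word) {
#           print "$key\n";
#           print "  $_\n" for @{ $by_word->{$key} };
#       }
#   }
```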
OTHER FUNCTIONS
A few high-level functions have also been provided.
- @bigstruct = $sentence->get_bigstruct
-
Assigns a potentially large data structure merging all linkages/sublinkages/links for $sentence. This structure is an array of hashes, with a single array entry for each word in the sentence. This function is only useful for high-level analysis of sentence grammar; most applications should be served by using the above functions.
This array has the following structure:
@bigstruct = (
    {
        'word'  => 'WORD',
        'links' => {
            'LINKTYPE_LINKAGENUM' => 'TARGETWORDNUM',
            ...
        },
    },
    ...
);
Where LINKAGENUM is the number of the linkage for $sentence, and LINKTYPE is the link type label. TARGETWORDNUM is the number of the word to which each link connects.
get_bigstruct() can be useful in finding, for example, all links for a given word in a given sentence:
$sentence = $parser->create_sentence(
    "Architecture is present in nearly every civilized society.");
@bigstruct = $sentence->get_bigstruct;
print "\nword 8: ", $bigstruct[8]->{word}, "\n";
while (($k,$v) = each %{$bigstruct[8]->{links}} ) {
    print " $k => ", $bigstruct[$v]->{word}, "\n";
}
This would output:
word 8: society.n
 Dsu => every.d
 Jp => in
 A => civilized.a
This signifies that the word "society" has links of type A (pre-noun adjective) with "civilized" (tagged 'a' for adjective), type Jp (preposition to object) with "in", and type Dsu (noun determiner, singular-mass) with "every" (tagged 'd' for determiner).
The following example adds the usage of a Lingua::LinkParser::Definitions object to display the link definitions along with the link types. Note that this is an optional module, and is only really useful for human-readable display:
use Lingua::LinkParser::Definitions qw(define);
$i = 8;   # position of the word to inspect, as in the previous example
$sentence = $parser->create_sentence(
    "Architecture is present in nearly every civilized society.");
@bigstruct = $sentence->get_bigstruct;
print "\nword $i: ", $bigstruct[$i]->{word}, "\n";
while (($k,$v) = each %{$bigstruct[$i]->{links}} ) {
    print " $k => ", $bigstruct[$v]->{word}, " (", define($k), ")\n";
}
Yielding:
word 8: society.n
 Dsu => every.d (D connects determiners to nouns: "THE DOG chased A CAT and SOME BIRDS". )
 Jp => in (J connects prepositions to their objects: "The man WITH the HAT is here". )
 A => civilized.a (A connects pre-noun ("attributive") adjectives to following nouns: "The BIG DOG chased me", "The BIG BLACK UGLY DOG chased me".)
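Because each() returns hash entries in no particular order, the lines in the examples above may appear in a different order from run to run. The sketch below wraps the same lookup in a hypothetical helper, links_of, that sorts the link keys first. The @bigstruct here is hand-built mock data shaped like the structure described above, not real parser output; only the entries for words 4, 6, 7 and 8 mirror the example, and the remaining entries are placeholders.

```perl
use strict;
use warnings;

# Return "LINKTYPE => WORD" strings for the word at $index, sorted by key.
# $bigstruct is a reference to an array shaped like get_bigstruct's result.
# links_of is a hypothetical helper, not part of the module.
sub links_of {
    my ($bigstruct, $index) = @_;
    my $entry = $bigstruct->[$index] or return;
    my $links = $entry->{links} || {};
    return map { "$_ => " . $bigstruct->[ $links->{$_} ]{word} }
           sort keys %$links;
}

# Hand-built mock data (NOT parser output), one entry per word position.
my @bigstruct = map { { word => "word$_", links => {} } } 0 .. 9;
$bigstruct[4]{word} = 'in';
$bigstruct[6]{word} = 'every.d';
$bigstruct[7]{word} = 'civilized.a';
$bigstruct[8] = {
    word  => 'society.n',
    links => { Dsu => 6, Jp => 4, A => 7 },
};

print "word 8: $bigstruct[8]{word}\n";
print " $_\n" for links_of(\@bigstruct, 8);
```

With real data, the array returned by $sentence->get_bigstruct would be passed by reference in place of the mock.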
LINK PARSER OPTIONS
The following options may be set or retrieved on a Lingua::LinkParser object with the function:
$parser->opts(OPTION, [VALUE])
Supplying no VALUE returns the current value for OPTION. Note that not all of the options are implemented by the API; some are instead intended for use by the program. A more complete list of these options may be found in the parser documentation.
verbosity
The level of detail reported during processing; 0 reports nothing.
linkage_limit
The maximum number of linkages to process for a sentence.
disjunct_cost
Determines the maximum disjunct cost used during parsing, where the cost of a disjunct is equal to the maximum cost of all of its connectors.
min_null_count
max_null_count
The range of null links to parse.
null_block
Sets the block count ratio for null linkages; a value of '4' causes a linkage of 1, 2, 3, or 4 null links to have a null cost of 1.
short_length
Limits the length of links to this value (the number of words a link can span).
islands_ok
Allows 'islands' of links (links not connected to the 'wall') when set.
max_parse_time
Determines the approximate maximum time permitted for parsing.
max_memory
Determines the maximum memory allowed during parsing.
timer_expired
memory_exhausted
resources_exhausted
reset_resources
These options tell whether the timer or memory constraints have been exceeded during parsing.
cost_model_type
screen_width
Sets the screen width for pretty-print functions.
allow_null
Allow or disallow null links in linkages.
display_walls
Toggles the display of linkage "walls".
all_short_connectors
If true, then all connectors have length restrictions imposed on them.
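As a sketch of how these options might be combined, the hypothetical helper below retries a parse with null links permitted when no linkage is found, following the hint in the SYNOPSIS about min_null_count. It assumes that options set with opts() apply to sentences created afterwards, and it raises max_null_count as well on the assumption that the range must remain valid; neither point is confirmed by this document.

```perl
use strict;
use warnings;

# Parse $text; if no linkage is found, allow null links and parse again.
# parse_allowing_nulls is a hypothetical helper, not part of the module.
sub parse_allowing_nulls {
    my ($parser, $text, $max_nulls) = @_;
    $max_nulls = 1 unless defined $max_nulls;

    my $sentence = $parser->create_sentence($text);
    return $sentence if $sentence->num_linkages > 0;

    # No complete linkage: widen the permitted range of null links.
    # Assumption: the new values affect the next create_sentence call.
    $parser->opts('min_null_count' => 1);
    $parser->opts('max_null_count' => $max_nulls);
    return $parser->create_sentence($text);
}

# Typical use (requires the module and its dictionaries to be installed):
#   use Lingua::LinkParser;
#   my $parser   = Lingua::LinkParser->new;
#   my $sentence = parse_allowing_nulls($parser, "This is the turning point.");
#   print "null links used: ", $sentence->null_count, "\n";
```

Note that the helper leaves the changed options in place on the parser, so later sentences are parsed with the same null-link range unless the options are reset.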
AUTHOR
Daniel Brian, dbrian@brians.org
SEE ALSO
perl(1). http://www.link.cs.cmu.edu/link/.