NAME

Lingua::LinkParser - Perl module implementing the Link Grammar Parser by Sleator, Temperley and Lafferty at CMU.

SYNOPSIS

 use Lingua::LinkParser;

 our $parser = new Lingua::LinkParser;
 my $sentence = $parser->create_sentence("This is the turning point.");
 my @linkages = $sentence->linkages;
 foreach my $linkage (@linkages) {
     print $parser->get_diagram($linkage);
 }

DESCRIPTION

To quote the Link Grammar documentation, "the Link Grammar Parser is a syntactic parser of English, based on link grammar, an original theory of English syntax. Given a sentence, the system assigns to it a syntactic structure, which consists of a set of labeled links connecting pairs of words."

This module provides access to the parser API using Perl objects to easily analyze linkages. The module organizes data returned from the parser API into an object hierarchy consisting of, in order, sentence, linkage, sublinkage, and link. If this is unclear to you, see the examples in the 'eg/' directory for a quick start on using these objects. The current Lingua::LinkParser module is based on version 4.0 of the Link Grammar parser API.

The objects within this module should not be confused with the types familiar to users of the Link Parser API. The objects used in this module reorganize the API data in a way that is more usable and friendly to Perl users, and do not exactly represent the types used in the API. For example, an object of class "Lingua::LinkParser::Sentence" does not directly correspond to the struct type "Sentence" of the API; rather, it is a Perl object that provides methods to access the underlying API functions.

This documentation should be supplemented with the extensive texts included with the Link Parser and on the Link Parser web site in order to understand its vernacular and general usage. A new module, Lingua::LinkParser::Definitions, stores the link type documentation and allows in-program retrieval of this information.

$parser = new Lingua::LinkParser(DictFile => 'PATH', KnowFile => 'PATH', ConstFile => 'PATH', AffixFile => 'PATH')

This returns a new Lingua::LinkParser object, loads the specified dictionary files, and sets basic configuration. If no dictionary files are specified, the parser will attempt to load the files using the path in global $DATA_DIR. This is a change from the Link Parser 3.0 implementation, where defaults were stored in the C API. The hash passed may also contain keys equivalent to the link parser options, in order to set these before a parser object is returned. These options are all lower case, and listed later in this document.
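As a minimal sketch of how the constructor arguments fit together, the helper below builds the four file arguments from a single data directory and appends any lower-case parser options. The helper name parser_args, the directory, and the file names (assumed here to follow the Link Grammar 4.0 data file naming) are illustrative assumptions, not part of the module; adjust them to your installation.

```perl
use strict;
use warnings;
use File::Spec;

# Hypothetical helper: build the constructor arguments for one data directory.
# The file names are assumptions based on the Link Grammar 4.0 data files.
sub parser_args {
    my ($dir, %options) = @_;
    return (
        DictFile  => File::Spec->catfile($dir, '4.0.dict'),
        KnowFile  => File::Spec->catfile($dir, '4.0.knowledge'),
        ConstFile => File::Spec->catfile($dir, '4.0.constituent-knowledge'),
        AffixFile => File::Spec->catfile($dir, '4.0.affix'),
        %options,    # lower-case parser options, e.g. verbosity => 0
    );
}

# Show what would be handed to the constructor (the directory is a placeholder).
my %args = parser_args('/usr/local/share/link-grammar', verbosity => 0);
print "$_ => $args{$_}\n" for sort keys %args;
```

The resulting list could then be passed straight to the constructor, for example: new Lingua::LinkParser(parser_args($dir, verbosity => 0)).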

$parser->opts(OPTION_NAME => OPTION_VALUE, ...)

This sets the parser option OPTION_NAME to the value specified by OPTION_VALUE. A full list of these options is found at the end of this document, as well as in the Link Parser distribution documentation.

$sentence = $parser->create_sentence(TEXT)

Creates and assigns a sentence object (Lingua::LinkParser::Sentence) using the supplied value. This object is used in subsequent creation and analysis of linkages.

$sentence->length

Returns the number of words in the tokenized sentence, including the boundary words and punctuation.

$sentence->num_linkages

Returns the number of linkages found for $sentence.

$sentence->num_valid_linkages

Returns the number of valid linkages for $sentence.

$sentence->num_linkages_post_processed

Returns the number of linkages that were post-processed.

$sentence->null_count

Returns the number of null links used in parsing the sentence.

$sentence->num_violations

Returns the number of post processing violations for $sentence.

$sentence->get_word(NUM)

Returns the word (with original spelling) at position NUM.
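The sentence accessors above can be combined into a small summary. The following is a sketch only: the helper name sentence_summary is hypothetical, and it assumes that positions passed to get_word start at 0.

```perl
use strict;
use warnings;

# Hypothetical helper: collect the basic counts for a sentence object.
# Assumes word positions passed to get_word() start at 0.
sub sentence_summary {
    my ($sentence) = @_;
    my @words = map { $sentence->get_word($_) } 0 .. $sentence->length - 1;
    return {
        words          => \@words,
        linkages       => $sentence->num_linkages,
        valid_linkages => $sentence->num_valid_linkages,
        null_count     => $sentence->null_count,
    };
}
```

A call such as sentence_summary($parser->create_sentence("This is the turning point.")) would return a hash reference holding the word list and the three counts.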

$linkage = $sentence->linkage(NUM)

Assigns a linkage object (Lingua::LinkParser::Linkage) for linkage NUM of sentence $sentence.

@linkages = $sentence->linkages

Assigns a list of linkage objects for all linkages of $sentence.

$linkage->num_words

Returns the number of words within $linkage.

$linkage->get_words

Returns a list of words within $linkage.

$linkage->words

Returns a list of ::Word objects for $linkage.

$linkage->num_sublinkages

Returns the number of sublinkages for linkage $linkage.

$linkage->compute_union

Combines the sublinkages for $linkage into one, possibly with crossing links.

$linkage->violation_name

Returns the name of a rule violated by post-processing of the linkage.

$sublinkage = $linkage->sublinkage(NUM)

Assigns a sublinkage object (Lingua::LinkParser::Linkage::Sublinkage) for sublinkage NUM of linkage $linkage.

@sublinkages = $linkage->sublinkages

Assigns an array of sublinkage objects.

$sublinkage->get_word(NUM)

Returns the word for the sublinkage at position NUM.

$sublinkage->words

Returns a list of ::Word objects for $sublinkage.

Returns the number of links for sublinkage $sublinkage.

$word->text

Returns the post-parse word text.

$word->position

Returns the number for the word's position in a sentence.
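To illustrate how the objects nest, here is a hedged sketch that walks from a sentence through its linkages and sublinkages down to the ::Word objects. The helper name linkage_outline is hypothetical, and the linkage and sublinkage numbers it prints are simply local counters in iteration order.

```perl
use strict;
use warnings;

# Hypothetical helper: one line of "position:text" pairs per sublinkage.
sub linkage_outline {
    my ($sentence) = @_;
    my @lines;
    my $linkage_num = 0;    # local counter, not supplied by the module
    for my $linkage ($sentence->linkages) {
        my $sublinkage_num = 0;
        for my $sublinkage ($linkage->sublinkages) {
            my $text = join ' ',
                map { $_->position . ':' . $_->text } $sublinkage->words;
            push @lines,
                "linkage $linkage_num, sublinkage $sublinkage_num: $text";
            $sublinkage_num++;
        }
        $linkage_num++;
    }
    return @lines;
}
```

Printing the result of linkage_outline($sentence), one element per line, gives a compact overview before drilling into individual objects.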

Returns a list of link objects for the word.

Assigns a link object (Lingua::LinkParser::Link) for link NUM of sublinkage $sublinkage.

Assigns an array of link objects.

Returns the number of domains for the sublinkage.

Returns a list of the domain names for $link.

Returns the "intersection" label for $link.

Returns the left label for $link.

Returns the right label for $link.

Returns the number of the left word for $link.

Returns the number of the right word for $link.

Returns the length of the link.

Only for link objects created via a word object, this returns the label for the link from the word object that created it.

Only for link objects created via a word object, this returns the word text which the link points *to* from the object that created it.

Only for link objects created via a word object, this returns the number of the word which the link points *to* from the object that created it.

$parser->get_diagram($linkage)

Returns an ASCII pretty-printed diagram of the specified linkage or sublinkage.

$parser->get_postscript($linkage, MODE)

Returns Postscript code for a diagram of the specified linkage or sublinkage.

$parser->get_domains($linkage)

Returns formatted ASCII text showing the links and domains for the specified linkage or sublinkage.

$parser->print_constituent_tree($linkage, MODE)

Returns an ASCII formatted tree displaying the constituent parse tree for $linkage. MODE is an integer with the following meanings: '1' will display the tree using a nested Lisp format, '2' specifies that a flat tree is displayed with brackets, and '0' results in no structure, a null string being returned.
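Because the MODE integers are easy to mix up, a thin wrapper with named styles may help. This is only a sketch: the helper name constituent_tree and the style names 'lisp' and 'flat' are made up for illustration and map to the MODE values 1 and 2 described above.

```perl
use strict;
use warnings;

# Hypothetical mapping from readable style names to the documented MODE values.
my %TREE_MODE = (lisp => 1, flat => 2);

# Hypothetical helper: request a constituent tree by style name.
# Dies on an unknown style rather than silently returning an empty string.
sub constituent_tree {
    my ($parser, $linkage, $style) = @_;
    my $mode = $TREE_MODE{ $style // '' }
        or die "unknown tree style (expected 'lisp' or 'flat')\n";
    return $parser->print_constituent_tree($linkage, $mode);
}
```

With a parser and linkage in hand, print constituent_tree($parser, $linkage, 'flat') would display the bracketed form.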

OTHER FUNCTIONS

A few high-level functions have also been provided.

@bigstruct = $sentence->get_bigstruct

Assigns a potentially large data structure merging all linkages/sublinkages/links for $sentence. This structure is an array of hashes, with a single array entry for each word in the sentence. This function is only useful for high-level analysis of sentence grammar; most applications should be served by using the above functions.

This array has the following structure:

@bigstruct ( %{ 'word'  => 'WORD',
                'links' => %{
                   'LINKTYPE_LINKAGENUM' => 'TARGETWORDNUM',...
                },
               }
          , ...);

Where LINKAGENUM is the number of the linkage for $sentence, and LINKTYPE is the link type label. TARGETWORDNUM is the number of the word to which each link connects.

get_bigstruct() can be useful in finding, for example, all links for a given word in a given sentence:

$sentence = $parser->create_sentence(
     "Architecture is present in nearly every civilized society.");
@bigstruct = $sentence->get_bigstruct;

print "\nword 8: ", $bigstruct[8]->{word}, "\n";

while (($k,$v) = each %{$bigstruct[8]->{links}} )
     { print " $k => ", $bigstruct[$v]->{word}, "\n"; }

This would output:

  word 8: society.n
   Dsu => every.d
   Jp => in
   A => civilized.a

This shows that the word "society" has links of type A (pre-noun adjective) with "civilized" (tagged 'a' for adjective), type Jp (preposition to object) with "in", and type Dsu (noun determiner, singular-mass) with "every", which is tagged 'd' for determiner.
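The same structure can also be regrouped by link type. The sketch below does not call the parser at all: it uses hand-built data in the documented shape, with only word 8 filled in from the output above, and a hypothetical helper links_by_type. Stripping a trailing _LINKAGENUM from the keys is an assumption based on the structure description.

```perl
use strict;
use warnings;

# Hand-built data in the documented shape; only word 8 has links filled in.
my @bigstruct = (
    { word => 'LEFT-WALL',    links => {} },
    { word => 'Architecture', links => {} },
    { word => 'is',           links => {} },
    { word => 'present',      links => {} },
    { word => 'in',           links => {} },
    { word => 'nearly',       links => {} },
    { word => 'every.d',      links => {} },
    { word => 'civilized.a',  links => {} },
    { word => 'society.n',    links => { Dsu => 6, Jp => 4, A => 7 } },
);

# Hypothetical helper: group "word -> target" pairs by link type.
sub links_by_type {
    my @struct = @_;
    my %by_type;
    for my $entry (@struct) {
        for my $key (sort keys %{ $entry->{links} }) {
            (my $type = $key) =~ s/_\d+\z//;    # assumed: drop a trailing _LINKAGENUM
            my $target = $struct[ $entry->{links}{$key} ]{word};
            push @{ $by_type{$type} }, "$entry->{word} -> $target";
        }
    }
    return %by_type;
}

my %by_type = links_by_type(@bigstruct);
printf "%s: %s\n", $_, join(', ', @{ $by_type{$_} }) for sort keys %by_type;
```

This prints one line per link type, which can be handy when looking for every occurrence of a particular relation in a sentence.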

The following example adds the usage of a Lingua::LinkParser::Definitions object to display the link definitions along with the link types. Note that this is an optional module, and is only really useful for human-readable display:

use Lingua::LinkParser::Definitions qw(define);

$sentence = $parser->create_sentence(
     "Architecture is present in nearly every civilized society.");
@bigstruct = $sentence->get_bigstruct;

my $i = 8;   # position of "society" in the sentence
print "\nword $i: ", $bigstruct[$i]->{word}, "\n";

while (($k,$v) = each %{$bigstruct[$i]->{links}} )
     { print " $k => ", $bigstruct[$v]->{word}, " (", define($k), ")\n"; }

Yielding:

word 8: society.n
 Dsu => every.d (D connects determiners to nouns: "THE DOG chased A CAT and SOME BIRDS".  )
 Jp => in (J connects prepositions to their objects: "The man WITH the HAT is here".  )
 A => civilized.a (A connects pre-noun ("attributive") adjectives to following nouns: "The BIG DOG chased me", "The BIG BLACK UGLY DOG chased me".)

LINK PARSER OPTIONS

The following options may be set or retrieved on a Lingua::LinkParser object with the function:

$parser->opts(OPTION, [VALUE])

Supplying no VALUE returns the current value for OPTION. Note that not all of the options are implemented by the API; some are instead intended for use by the calling program. A more complete list of these options may be found in the parser documentation.
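Since opts can both read and set a value, one possible pattern is to change an option temporarily and restore it afterwards. The following is a sketch under that assumption; the helper name with_option is hypothetical and not part of the module.

```perl
use strict;
use warnings;

# Hypothetical helper: run $code with one parser option temporarily changed,
# then restore the previous value even if $code dies.
sub with_option {
    my ($parser, $name, $value, $code) = @_;
    my $saved = $parser->opts($name);    # no VALUE: read the current setting
    $parser->opts($name => $value);      # with VALUE: change it
    my @result;
    my $ok  = eval { @result = $code->(); 1 };
    my $err = $@;
    $parser->opts($name => $saved);      # put the old value back
    die $err unless $ok;
    return @result;
}
```

For example, with_option($parser, max_parse_time => 5, sub { $parser->create_sentence($text) }) would parse one sentence under a tighter time limit and leave the parser's setting unchanged afterwards.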

verbosity
 The level of detail reported during processing; 0 reports nothing.

linkage_limit
 The maximum number of linkages to process for a sentence.

disjunct_cost
 Determines the maximum disjunct cost used during parsing, where the cost of a disjunct is equal to the maximum cost of all of its connectors.

min_null_count
max_null_count
 The range of null links to parse.

null_block
 Sets the block count ratio for null linkages; a value of '4' causes a linkage of 1, 2, 3, or 4 null links to have a null cost of 1.

short_length
 Limits the length of links to this value (the number of words a link can span).

islands_ok
 Allows 'islands' of links (links not connected to the 'wall') when set.

max_parse_time
 Determines the approximate maximum time permitted for parsing.

max_memory
 Determines the maximum memory allowed during parsing.

timer_expired
memory_exhausted
resources_exhausted
reset_resources
 These options tell whether the timer or memory constraints have been exceeded during parsing.

cost_model_type

screen_width
 Sets the screen width for pretty-print functions.

allow_null
 Allow or disallow null links in linkages.

display_walls
 Toggles the display of linkage "walls".

all_short_connectors
 If true, then all connectors have length restrictions imposed on them.

AUTHOR

Daniel Brian, dbrian@brians.org

SEE ALSO

perl(1). http://www.link.cs.cmu.edu/link/.