NAME

WordNet::QueryData - direct perl interface to WordNet database

SYNOPSIS

use WordNet::QueryData;

my $wn = WordNet::QueryData->new;

print "Synset: ", join(", ", $wn->querySense("cat#n#7", "syns")), "\n";
print "Hyponyms: ", join(", ", $wn->querySense("cat#n#1", "hypo")), "\n";
print "Parts of Speech: ", join(", ", $wn->querySense("run")), "\n";
print "Senses: ", join(", ", $wn->querySense("run#v")), "\n";
print "Forms: ", join(", ", $wn->validForms("lay down#v")), "\n";
print "Noun count: ", scalar($wn->listAllWords("noun")), "\n";
print "Antonyms: ", join(", ", $wn->queryWord("dark#n#1", "ants")), "\n";

DESCRIPTION

WordNet::QueryData provides a direct interface to the WordNet database files. It requires the WordNet package (http://www.cogsci.princeton.edu/~wn/) and gives the user direct access to the full WordNet semantic lexicon. All parts of speech are supported, and access is generally very efficient because the index and morphological exclusion tables are loaded at initialization. This initialization step is slow (approximately 10-15 seconds), but queries are very fast afterwards: thousands of queries can be completed every second.

USAGE

LOCATING THE WORDNET DATABASE

To use QueryData, you must tell it where your WordNet database is. There are two ways to do this: 1) by setting the appropriate environment variables, or 2) by passing the location to QueryData when you call "new".

QueryData knows about two environment variables, WNHOME and WNSEARCHDIR. By default, QueryData assumes that the WordNet data files are located in WNHOME/WNSEARCHDIR (WNHOME\WNSEARCHDIR on a PC), where WNHOME defaults to "/usr/local/WordNet-1.7.1" on Unix and "C:\Program Files\WordNet\1.7.1" on a PC, and WNSEARCHDIR defaults to "dict". Normally, all you need to do is set WNHOME to the location where you unpacked your WordNet distribution, since the database files are always unpacked to its "dict" subdirectory.
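The lookup rule above can be sketched in plain Perl. The helper name wordnet_dict_dir is hypothetical and not part of QueryData; it simply joins WNHOME and WNSEARCHDIR as described here, falling back to the documented defaults, and it treats Perl's "MSWin32" platform as "a PC". QueryData's own code may handle edge cases differently.

```perl
use strict;
use warnings;
use File::Spec;

# Hypothetical helper, not part of WordNet::QueryData. Joins WNHOME and
# WNSEARCHDIR as described above, using the documented defaults for any
# variable that is not set. Pass key/value pairs to override %ENV.
sub wordnet_dict_dir {
    my %env = @_ ? @_ : %ENV;
    # Assumption: MSWin32 stands in for "a PC".
    my $default_home = $^O eq 'MSWin32'
        ? 'C:\\Program Files\\WordNet\\1.7.1'
        : '/usr/local/WordNet-1.7.1';
    my $home   = defined $env{WNHOME}      ? $env{WNHOME}      : $default_home;
    my $search = defined $env{WNSEARCHDIR} ? $env{WNSEARCHDIR} : 'dict';
    return File::Spec->catdir($home, $search);
}

print wordnet_dict_dir(), "\n";
```

File::Spec is used so that the directory separator matches the platform, mirroring the "/" versus "\" distinction above.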

You can also pass the location of the database files directly to QueryData. To do this, pass the location to "new":

my $wn = WordNet::QueryData->new("/usr/local/wordnet/dict");

When calling "new" in this fashion, you can pass an optional second argument that controls verbosity; a true value makes QueryData print debugging information.

QUERYING THE DATABASE

There are two primary query functions, 'querySense' and 'queryWord'. querySense accesses relations between senses; queryWord accesses relations between words. Most relations (including hypernym, hyponym, meronym and holonym) hold between senses. Those that hold between words include "also see", antonym, pertainym and "participle of verb." The gloss (definition) of a sense and the words in its synset are also obtained via querySense.

Both functions take as their first argument a query string of one of three types:

(1) word (e.g. "dog")
(2) word#pos (e.g. "house#n")
(3) word#pos#sense (e.g. "ghostly#a#1")
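As a rough illustration of the three types, the hypothetical helper classify_query below splits a query string on "#" and reports which type it is. It is not part of QueryData, and it does not check whether the word, part of speech or sense number actually exists in WordNet.

```perl
use strict;
use warnings;

# Hypothetical helper, not part of WordNet::QueryData. Splits a query
# string on "#" and reports which of the three types it is. It does not
# check that the word, part of speech or sense exists in WordNet.
sub classify_query {
    my ($query) = @_;
    my ($word, $pos, $sense) = split /#/, $query, 3;
    return unless defined $word && length $word;      # empty query
    return if defined $sense && $sense !~ /^\d+$/;    # sense must be a number
    my $type = defined $sense ? 3 : defined $pos ? 2 : 1;
    return { type => $type, word => $word, pos => $pos, sense => $sense };
}

for my $q ('dog', 'house#n', 'ghostly#a#1') {
    printf "%-12s type (%d)\n", $q, classify_query($q)->{type};
}
```

Splitting with a limit of 3 keeps multi-word entries such as "lay down#v" intact, since only "#" separates the fields.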

Types (1) or (2) passed to querySense will return a list of possible query strings at the next level of specificity. Type (1) passed to queryWord will do the same. When type (3) is passed to querySense, it requires a second argument, a relation. Possible relations are:

syns - synset words
ants - antonyms
hype - hypernyms
hypo - hyponyms
mmem - member meronyms
msub - substance meronyms
mprt - part meronyms
mero - all meronyms
hmem - member holonyms
hsub - substance holonyms
hprt - part holonyms
holo - all holonyms
attr - attributes (?)
enta - entailment (verbs only)
caus - cause (verbs only)
also - also see
vgrp - verb group (verbs only)
sim - similar to (adjectives only)
part - participle of verb (adjectives only)
pert - pertainym (pertains to noun) (adjectives only)
glos - word definition
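The relation codes and their part-of-speech restrictions can be kept in a lookup table, for example to reject an invalid relation before passing it to querySense. This is a hypothetical sketch, not part of QueryData; the one-letter codes "v" and "a" are taken from the query-string examples in this document.

```perl
use strict;
use warnings;

# Hypothetical table, not part of WordNet::QueryData. Maps each relation
# code listed above to the single part of speech it is restricted to,
# or to undef when the list gives no restriction.
my %RELATION_POS = (
    (map { ($_ => undef) } qw(syns ants hype hypo mmem msub mprt mero
                               hmem hsub hprt holo attr also glos)),
    (map { ($_ => 'v') } qw(enta caus vgrp)),   # verbs only
    (map { ($_ => 'a') } qw(sim part pert)),    # adjectives only
);

# True if $rel is a known relation code usable with part of speech $pos.
sub relation_allowed {
    my ($rel, $pos) = @_;
    return 0 unless exists $RELATION_POS{$rel};
    my $only = $RELATION_POS{$rel};
    my $ok = !defined($only) || $only eq $pos;
    return $ok ? 1 : 0;
}

for my $case (['hype', 'n'], ['enta', 'v'], ['sim', 'n']) {
    my ($rel, $pos) = @$case;
    printf "%s with %s: %s\n", $rel, $pos,
        relation_allowed($rel, $pos) ? 'allowed' : 'not allowed';
}
```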

When querySense is called with a type (3) query string and a relation, it returns a list of related senses. When queryWord is called with a type (2) query string, it requires a relation and returns a list of related words (type (2) strings).
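Because type (1) and type (2) strings passed to querySense return query strings at the next level of specificity, a word can be expanded to all of its senses with two rounds of querySense, much as the SYNOPSIS does for "run". The sketch below assumes only that querySense behaves as described; StubWN is a hypothetical stand-in with made-up data so the example runs without a WordNet installation, and a real WordNet::QueryData object could be passed instead.

```perl
use strict;
use warnings;

# Expands a type (1) query string into type (3) strings: the first
# querySense call yields word#pos strings, and querySense on each of
# those yields word#pos#sense strings.
sub expand_to_senses {
    my ($wn, $word) = @_;
    return map { $wn->querySense($_) } $wn->querySense($word);
}

# Hypothetical stand-in for a WordNet::QueryData object so the sketch
# runs without WordNet. The data is made up, not real WordNet output.
package StubWN;
my %next_level = (
    'run'   => ['run#n', 'run#v'],
    'run#n' => ['run#n#1', 'run#n#2'],
    'run#v' => ['run#v#1'],
);
sub new { my ($class) = @_; return bless {}, $class }
sub querySense {
    my ($self, $query) = @_;
    return @{ $next_level{$query} || [] };
}

package main;

# With WordNet installed, WordNet::QueryData->new could be used instead.
my $wn = StubWN->new;
print join(', ', expand_to_senses($wn, 'run')), "\n";
```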

OTHER FUNCTIONS

"validForms" accepts a type (1) or (2) query string. It returns a list of all alternate forms (alternate spellings, conjugations, plural/singular forms, etc.) that WordNet recognizes. The type (1) query returns alternates for all parts of speech (noun, verb, adjective, adverb).

"listAllWords" accepts a part of speech and returns the full list of words in the WordNet database for that part of speech.

"level" accepts a type (3) query string and returns a distance (not necessarily the shortest or longest) to the root in the hypernym directed acyclic graph.

"offset" accepts a type (3) query string and returns the binary offset of that sense's location in the corresponding data file.

"tagSenseCnt" accepts a type (2) query string and returns the tagsense_cnt value for that lemma: "number of senses of lemma that are ranked according to their frequency of occurrence in semantic concordance texts."

See test.pl for additional example usage.

NOTES

Requires access to the WordNet database files (data.noun and index.noun on Unix, noun.dat and noun.idx on a PC, and so on for the other parts of speech).

COPYRIGHT

Copyright 2000, 2001, 2002 Jason Rennie. All rights reserved.

This module is free software; you can redistribute it and/or modify it under the same terms as Perl itself.

SEE ALSO

perl(1)

http://www.cogsci.princeton.edu/~wn/

http://www.ai.mit.edu/people/jrennie/WordNet/