NAME
WordNet::QueryData - direct perl interface to WordNet database
SYNOPSIS
use WordNet::QueryData;
# Load index, morphological exclusion files (time-consuming process)
my $wn = WordNet::QueryData->new;
# Synset of cat, sense #7
print "Cat#7-> ", join (", ", $wn->query ("cat#n#7", "syns")), "\n";
# Hyponyms of cat, sense #1 (house cat)
print "Cat#1-> ", join (", ", $wn->query ("cat#n#1", "hypo")), "\n";
# Senses of run as a verb
print "Run->", join (", ", $wn->query ("run#v")), "\n";
# Base form(s) of the verb 'lay down'
print "lay down-> ", join (", ", $wn->valid_forms ("lay down#v")), "\n";
# Print number of nouns in WordNet
print "Count of nouns: ", scalar($wn->list_all_words("noun")), "\n";
DESCRIPTION
WordNet::QueryData provides a direct interface to the WordNet database files. It requires the WordNet package (http://www.cogsci.princeton.edu/~wn/). It allows the user direct access to the full WordNet semantic lexicon. All parts of speech are supported, and access is generally very efficient because the index and morphological exclusion tables are loaded at initialization. This initialization step is slow (approximately 10-15 seconds), but queries are very fast thereafter: thousands of queries can be completed every second.
USAGE
To use the WordNet::QueryData module, incorporate the package with "use WordNet::QueryData;". Then, create an instance of the package with "my $wn = WordNet::QueryData->new;". Make sure that the environment variable WNHOME defines the location of your WordNet directory. You can also specify the WordNet directory directly: "my $wn = WordNet::QueryData->new("/usr/local/wn17/dict");". A second argument to new can be used to have QueryData print out progress and warning messages (e.g. "my $wn = WordNet::QueryData->new("/usr/local/wn17/dict", 1);").
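The constructor calls described above can be sketched as follows. The helper name constructor_args is illustrative only and is not part of the module. The module is loaded inside an eval so the script still runs, with a warning, on a machine where WordNet::QueryData or the WordNet files are missing.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of WordNet::QueryData): build the argument
# list for new() from an optional dictionary directory and a verbosity flag.
sub constructor_args {
    my ($dir, $verbose) = @_;
    return () unless defined $dir;          # no directory: rely on WNHOME
    return $verbose ? ($dir, 1) : ($dir);   # second argument enables messages
}

# Usage: script.pl [DICT_DIR [VERBOSE]]
# e.g.   script.pl /usr/local/wn17/dict 1
my $wn;
if (eval { require WordNet::QueryData; 1 }) {
    $wn = eval { WordNet::QueryData->new(constructor_args(@ARGV)) };
    warn "Could not initialize WordNet::QueryData: $@" unless $wn;
}
else {
    warn "WordNet::QueryData is not installed\n";
}
```

Because initialization is the slow step, a long-running program would normally build $wn once and reuse it for every later query.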
The WordNet::QueryData object has two main functions, 'valid_forms' and 'query'. 'query' gives you direct access to the large set of WordNet relations. It accepts a query string and a relation. The query string may be at one of three specification levels:
1) WORD (e.g. dog)
2) WORD#POS (e.g. house#n)
3) WORD#POS#SENSE (e.g. ghostly#a#1)
WORD is simply an English word. Spaces should be used to separate tokens (not underscores, as are used in the WordNet database). Case does not matter. In order to get a meaningful result, the word must exactly match one of the words in the WordNet database files. Use 'valid_forms' to determine the form in which WordNet stores the word.
POS is the part of speech. Use 'n' for noun, 'v' for verb, 'a' for adjective and 'r' for adverb. POS is optional for calls to 'query', but required for calls to 'valid_forms'. SENSE is a number that uniquely identifies the word sense. Call 'query' with a WORD string to get the list of possible WORD#POS strings for that WORD. Call 'query' with a WORD#POS string to get the list of possible WORD#POS#SENSE strings.
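To make the three specification levels concrete, here is a small sketch that splits a query string into its parts without touching WordNet at all. The function parse_query_string is a hypothetical helper, not part of the module, and the check that SENSE consists only of digits is an assumption based on the examples above.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of WordNet::QueryData): split a query string
# into WORD, POS and SENSE and report its specification level (1, 2 or 3).
sub parse_query_string {
    my ($string) = @_;
    my ($word, $pos, $sense) = split /#/, $string;
    die "empty query string\n" unless defined $word && length $word;
    die "unknown part of speech '$pos'\n"
        if defined $pos && $pos !~ /^[nvar]$/;      # n, v, a or r only
    die "sense must be a number\n"
        if defined $sense && $sense !~ /^\d+$/;     # assumption: digits only
    my $level = defined $sense ? 3 : defined $pos ? 2 : 1;
    return { word => $word, pos => $pos, sense => $sense, level => $level };
}

# The three example strings from the list above.
for my $q ("dog", "house#n", "ghostly#a#1") {
    my $parts = parse_query_string($q);
    printf "%-12s level %d\n", $q, $parts->{level};
}
```

A check like this can catch a malformed string before it is handed to 'query' or 'valid_forms'.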
When calling 'query' with a WORD#POS#SENSE string, a relation must also be supplied. The relation is the second argument to 'query' and may be any of the following 3- or 4-letter strings:
syns - synset words
ants - antonyms
hype - hypernyms
hypo - hyponyms
mmem - member meronyms
msub - substance meronyms
mprt - part meronyms
mero - all meronyms
hmem - member holonyms
hsub - substance holonyms
hprt - part holonyms
holo - all holonyms
attr - attributes (?)
enta - entailment (verbs only)
caus - cause (verbs only)
also - also see
vgrp - verb group (verbs only)
sim - similar to (adjectives only)
part - participle of verb (adjectives only)
pert - pertainym (pertains to noun) (adjectives only)
glos - word definition
The returned value is usually a list of WORD#POS#SENSE strings, but can vary according to the relation (e.g. 'glos' always returns a free-form string).
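The drill-down from WORD to WORD#POS to WORD#POS#SENSE, followed by a relation lookup, might look like the sketch below. The helper relation_for_all_senses is hypothetical; it relies only on 'query' behaving as described above and accepts any object that provides such a method.

```perl
use strict;
use warnings;

# Hypothetical helper: expand a bare WORD to every WORD#POS#SENSE string and
# collect the result of one relation for each sense.
sub relation_for_all_senses {
    my ($wn, $word, $relation) = @_;
    my %result;
    for my $word_pos ($wn->query($word)) {          # WORD     -> WORD#POS
        for my $sense ($wn->query($word_pos)) {     # WORD#POS -> WORD#POS#SENSE
            $result{$sense} = [ $wn->query($sense, $relation) ];
        }
    }
    return \%result;
}

# Run against the real database only when the module and its files load.
if (eval { require WordNet::QueryData; 1 }) {
    my $wn = eval { WordNet::QueryData->new };
    if ($wn) {
        my $hypernyms = relation_for_all_senses($wn, "cat", "hype");
        for my $sense (sort keys %$hypernyms) {
            print "$sense -> ", join(", ", @{ $hypernyms->{$sense} }), "\n";
        }
    }
}
```

Relations that apply to only some parts of speech (for example 'enta' for verbs) would simply be requested for every sense here; filtering by POS first is left to the caller.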
The 'valid_forms' function returns variants on a word: alternate spellings, conjugations, plural/singular forms, etc. It accepts a WORD#POS string and returns a list of WORD#POS strings that qualify as alternate forms of the same word. An empty list will be returned if the argument is a word that WordNet does not know about.
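Since 'query' needs the exact stored form, one plausible pattern is to pass a word through 'valid_forms' first and then ask for the senses of each form it returns. The helper senses_of_any_form below is a hypothetical sketch of that pattern, not a function of the module.

```perl
use strict;
use warnings;

# Hypothetical helper: resolve a WORD#POS string to the forms WordNet knows
# and return the WORD#POS#SENSE strings of every such form.
sub senses_of_any_form {
    my ($wn, $word_pos) = @_;
    my @senses;
    for my $form ($wn->valid_forms($word_pos)) {   # empty list if unknown word
        push @senses, $wn->query($form);           # WORD#POS -> WORD#POS#SENSE
    }
    return @senses;
}

# Run against the real database only when the module and its files load.
if (eval { require WordNet::QueryData; 1 }) {
    my $wn = eval { WordNet::QueryData->new };
    print join(", ", senses_of_any_form($wn, "lay down#v")), "\n" if $wn;
}
```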
QueryData also has functionality for retrieving WordNet datafile offsets (the unique number that identifies a word sense). The function 'offset' accepts a fully-qualified word sense in the form WORD#POS#SENSE and returns the corresponding numerical offset. These numbers correspond to locations in the data files (e.g. data.noun/noun.dat).
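As an illustration of 'offset', the sketch below maps every sense of a WORD#POS string to its datafile offset. The helper offsets_for is hypothetical and assumes only that 'query' and 'offset' behave as described above.

```perl
use strict;
use warnings;

# Hypothetical helper: return a hash reference mapping each WORD#POS#SENSE
# string of the given WORD#POS to its numerical datafile offset.
sub offsets_for {
    my ($wn, $word_pos) = @_;
    my %offset_of;
    for my $sense ($wn->query($word_pos)) {
        $offset_of{$sense} = $wn->offset($sense);
    }
    return \%offset_of;
}

# Run against the real database only when the module and its files load.
if (eval { require WordNet::QueryData; 1 }) {
    my $wn = eval { WordNet::QueryData->new };
    if ($wn) {
        my $offsets = offsets_for($wn, "cat#n");
        print "$_ => $offsets->{$_}\n" for sort keys %$offsets;
    }
}
```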
The function 'list_all_words' accepts a part of speech (e.g. "noun", as in the SYNOPSIS) and returns a list of all words with that part of speech.
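A short sketch of working with the word list follows: it counts the entries and filters them with a pattern. The helper words_matching is hypothetical, and the part-of-speech argument "noun" is taken from the SYNOPSIS.

```perl
use strict;
use warnings;

# Hypothetical helper: return the words of one part of speech that match
# a regular expression.
sub words_matching {
    my ($wn, $pos, $pattern) = @_;
    my @words = $wn->list_all_words($pos);
    return grep { /$pattern/ } @words;
}

# Run against the real database only when the module and its files load.
if (eval { require WordNet::QueryData; 1 }) {
    my $wn = eval { WordNet::QueryData->new };
    if ($wn) {
        # Copy the list into an array first, then count its elements.
        my @nouns = $wn->list_all_words("noun");
        print "Count of nouns: ", scalar(@nouns), "\n";
        print "Nouns starting with 'cat': ",
            join(", ", words_matching($wn, "noun", qr/^cat/)), "\n";
    }
}
```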
See test.pl for additional example uses of these functions.
NOTES
Requires access to WordNet database files (data.noun/noun.dat, index.noun/noun.idx, etc.)
COPYRIGHT
Copyright 2000, 2001, 2002 Jason Rennie <jrennie@ai.mit.edu> All rights reserved.
This module is free software; you can redistribute it and/or modify it under the same terms as Perl itself.
SEE ALSO
perl(1)
http://www.cogsci.princeton.edu/~wn/
http://www.ai.mit.edu/~jrennie/WordNet/