NAME
WordNet::QueryData - direct perl interface to WordNet database
SYNOPSIS
use WordNet::QueryData;
# Load index and morphological exclusion files (time-consuming process)
my $wn = WordNet::QueryData->new ("/usr/local/dict", 1);
# Synset of cat, sense #7
print "Cat#7-> ", join (", ", $wn->query ("cat#n#7", "syns")), "\n";
# Hyponyms of cat, sense #1 (house cat)
print "Cat#1-> ", join (", ", $wn->query ("cat#n#1", "hypo")), "\n";
# Senses of run as a verb
print "Run->", join (", ", $wn->query ("run#v")), "\n";
# Base form(s) of the verb 'lay down'
print "lay down-> ", join (", ", $wn->valid_forms ("lay down#v")), "\n";
# Print number of nouns in WordNet
print "Count of nouns: ", scalar($wn->list_all_words("noun")), "\n";
DESCRIPTION
WordNet::QueryData provides a direct interface to the WordNet database files. It requires the WordNet package (http://www.cogsci.princeton.edu/~wn/). It allows the user direct access to the full WordNet semantic lexicon. All parts of speech are supported, and access is generally very efficient because the index and morphological exclusion tables are loaded at initialization. Things are more or less optimized for long sessions of queries: the 'new' invocation loads the entire index table and all of the morphological exclusions. My PII/400 takes about 15 seconds to do this, and memory usage is on the order of 18 megabytes. If I get enough requests, I may work on making this a less demanding step. However, once the index and morphological exclusion files are loaded, queries are very fast.
USAGE
To use the WordNet::QueryData module, incorporate the package with "use WordNet::QueryData;". Then, establish an instance of the package with "my $wn = new WordNet::QueryData ("/usr/local/dict");". If the WordNet dict is not located in /usr/local/dict on your system, pass the correct directory as the first argument of the function call. You may pass a second argument of 1 if you wish the module to print out progress and verbose error messages.
WordNet::QueryData is object-oriented. You can establish multiple instances simply by calling 'new' multiple times; however, the only practical use I can see for this is comparing data from different WordNet versions. I did the module OO-style because I had never done an OO perl module, figured it was time to learn, and thought it might make the code a bit cleaner.
The WordNet::QueryData object has two main object functions that you will want to use, 'valid_forms' and 'query'. 'query' gives you direct access to the large set of WordNet relations. It accepts a query string and, for fully specified senses, a relation. The query string may be at one of three specification levels:
1) WORD (e.g. dog)
2) WORD#POS (e.g. house#noun)
3) WORD#POS#SENSE (e.g. ghostly#adj#1)
WORD is simply an English word. Spaces should be used to separate tokens (not underscores, as are used in the WordNet database). Case does not matter. In order to get a meaningful result, the word must exactly match one of the words in the WordNet database files. Use 'valid_forms' to determine the form in which WordNet stores the word.
POS is the part of speech. Use 'n' for noun, 'v' for verb, 'a' for adjective and 'r' for adverb. You may also use full names and some abbreviations (as above and in test.pl). POS is optional for calls to 'query', and required for calls to 'valid_forms'. SENSE is a number that uniquely identifies the word sense. You can 'query' using a WORD#POS form to get a list of the word's senses for that part of speech.
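To make the three specification levels concrete, here is a minimal sketch that splits a query string on '#' and reports which level it is. The helper name parse_query_string is hypothetical and is not part of WordNet::QueryData; the module does its own parsing internally, and this is only an illustration of the string format.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of WordNet::QueryData): split a query
# string into its WORD, POS and SENSE fields and report the level (1-3).
sub parse_query_string {
    my ($string) = @_;
    my ($word, $pos, $sense) = split /#/, $string, 3;
    my $level = defined $sense ? 3 : defined $pos ? 2 : 1;
    return { word => $word, pos => $pos, sense => $sense, level => $level };
}

# The three example strings from the list above.
for my $q ("dog", "house#noun", "ghostly#adj#1") {
    my $parsed = parse_query_string($q);
    print "$q is level $parsed->{level}\n";
}
```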
Executing 'query' with only a WORD will return a list of WORD#POS strings. Passing 'query' a WORD#POS will return a list of WORD#POS#SENSE strings. 'query' calls of these forms do not return any information about WordNet relations pertaining to WORD. The third format, WORD#POS#SENSE, requires a second argument, RELATION, which may be any of the strings used by WordNet to designate a relation. Here is a list of most (if not all) of them:
syns - synset words
ants - antonyms
hype - hypernyms
hypo - hyponyms
mmem - member meronyms
msub - substance meronyms
mprt - part meronyms
mero - all meronyms
hmem - member holonyms
hsub - substance holonyms
hprt - part holonyms
holo - all holonyms
attr - attributes (?)
enta - entailment (verbs only)
caus - cause (verbs only)
also - also see
vgrp - verb group (verbs only)
sim - similar to (adjectives only)
part - participle of verb (adjectives only)
pert - pertainym (pertains to noun) (adjectives only)
glos - word definition
Such queries return a list of corresponding strings in the WORD#POS#SENSE format. In the case of 'syns', one string is returned for each word that is part of the synset. For other relations, one WORD#POS#SENSE string is returned for each synset (you can map 'syns' onto the returned array to get a full list of the words for a relation). In the case of relations like 'hype' and 'hypo', 'query' returns only the immediate hypernyms or hyponyms. You can use 'query' recursively to get a full hyper/hyponym tree.
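The two techniques just mentioned (mapping 'syns' onto a result and calling 'query' recursively) might be sketched as follows. The helper names relation_words and all_hypernyms are hypothetical and not part of the module; both assume only that $wn has a 'query' method that behaves as described above.

```perl
use strict;
use warnings;

# Hypothetical helper: expand the one-string-per-synset result of a
# relation into every word of those synsets by applying 'syns' to each.
sub relation_words {
    my ($wn, $sense, $relation) = @_;
    return map { $wn->query($_, "syns") } $wn->query($sense, $relation);
}

# Hypothetical helper: follow 'hype' links upward from a WORD#POS#SENSE
# string and return every sense reached, nearest ancestors first.
sub all_hypernyms {
    my ($wn, $sense, $seen) = @_;
    $seen ||= {};
    my @found;
    for my $parent ($wn->query($sense, "hype")) {
        next if $seen->{$parent}++;    # do not revisit a sense
        push @found, $parent, all_hypernyms($wn, $parent, $seen);
    }
    return @found;
}

# Usage, assuming WordNet is installed in /usr/local/dict:
#   use WordNet::QueryData;
#   my $wn = WordNet::QueryData->new("/usr/local/dict");
#   print join(", ", all_hypernyms($wn, "cat#n#1")), "\n";
#   print join(", ", relation_words($wn, "cat#n#1", "hypo")), "\n";
```

The same recursion works for 'hypo' by changing the relation string, though hyponym trees can be far larger than hypernym chains.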
While 'query' requires that WORD exactly matches an entry in WordNet, QueryData has functionality for determining the WordNet base forms to which a particular WORD may correspond. This functionality is encapsulated in 'valid_forms'. Given QUERY, a string in the form WORD#POS, 'valid_forms' returns a list of WORD#POS strings that are existing WordNet entries and base forms of QUERY. Normally, one string will be returned. An empty list will be returned if QUERY is not a word that WordNet knows about.
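Because 'query' needs an exact match, one plausible pattern is to pass user input through 'valid_forms' first and then look up the senses of each base form. The sketch below assumes the return formats described above; the helper name senses_of_any_form is hypothetical and not part of the module.

```perl
use strict;
use warnings;

# Hypothetical helper: resolve a possibly inflected WORD#POS string to its
# base form(s) with valid_forms, then list the senses of each base form.
# Returns an empty list if WordNet does not know the word.
sub senses_of_any_form {
    my ($wn, $word_pos) = @_;
    return map { $wn->query($_) } $wn->valid_forms($word_pos);
}

# Usage, assuming WordNet is installed in /usr/local/dict:
#   use WordNet::QueryData;
#   my $wn = WordNet::QueryData->new("/usr/local/dict");
#   print join(", ", senses_of_any_form($wn, "lay down#v")), "\n";
```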
QueryData also has functionality for retrieving WordNet datafile offsets (the unique number that identifies a word sense). The function 'offset' accepts a fully-qualified word sense in the form WORD#POS#SENSE and returns the corresponding numerical offset. See WordNet documentation for more information about this quantity.
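As a small illustration, the following sketch pairs each sense of a WORD#POS string with its offset. The helper name sense_offsets is hypothetical and not part of the module; it assumes 'query' and 'offset' behave as described above.

```perl
use strict;
use warnings;

# Hypothetical helper: return a flat (sense => offset) list for every
# sense of a WORD#POS string, suitable for assigning to a hash.
sub sense_offsets {
    my ($wn, $word_pos) = @_;
    return map { ($_, $wn->offset($_)) } $wn->query($word_pos);
}

# Usage, assuming WordNet is installed in /usr/local/dict:
#   use WordNet::QueryData;
#   my $wn = WordNet::QueryData->new("/usr/local/dict");
#   my %offset_of = sense_offsets($wn, "cat#n");
#   print "$_ => $offset_of{$_}\n" for sort keys %offset_of;
```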
The function 'list_all_words' will return an array of all words of a particular part-of-speech given that part-of-speech as its only argument.
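For example, the returned list can be filtered like any other Perl list. The sketch below selects the words that start with a given prefix; the helper name words_with_prefix is hypothetical, and the code assumes 'list_all_words' returns a plain list of words as described above.

```perl
use strict;
use warnings;

# Hypothetical helper: return the words of one part of speech that begin
# with $prefix, compared case-insensitively.
sub words_with_prefix {
    my ($wn, $pos, $prefix) = @_;
    return grep { index(lc($_), lc($prefix)) == 0 } $wn->list_all_words($pos);
}

# Usage, assuming WordNet is installed in /usr/local/dict:
#   use WordNet::QueryData;
#   my $wn = WordNet::QueryData->new("/usr/local/dict");
#   print join(", ", words_with_prefix($wn, "noun", "cat")), "\n";
```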
NOTES
Requires access to WordNet database files (data.noun, index.noun, etc.)
COPYRIGHT
Copyright 2000 Jason Rennie <jrennie@ai.mit.edu> All rights reserved.
This module is free software; you can redistribute it and/or modify it under the same terms as Perl itself.
SEE ALSO
perl(1)
http://www.cogsci.princeton.edu/~wn/
http://www.ai.mit.edu/~jrennie/WordNet/