NAME

WordNet::QueryData - direct perl interface to WordNet database

SYNOPSIS

use WordNet::QueryData;

# Load index and morphological exclusion files (slow process)
my $wn = WordNet::QueryData->new ("/usr/local.foo/dict", 1);

# Forms under which 'ghostliest' might appear in WordNet
print "Ghostliest-> ", join (", ", $wn->forms ("ghostliest", 3)), "\n";

# Synset of cat, sense #7
print "Cat#7-> ", join (", ", $wn->query ("cat#n#7", "synset")), "\n";

# Hyponyms of cat, sense #1 (house cat)
print "Cat#1-> ", join (", ", $wn->query ("cat#n#1", "hyponym")), "\n";

# Senses of run as a verb
print "Run->", join (", ", $wn->query ("run#verb")), "\n";

DESCRIPTION

WordNet::QueryData provides a direct interface to the WordNet database files. It requires the WordNet package (http://www.cogsci.princeton.edu/~wn/) and gives the user direct access to the full WordNet semantic lexicon. All parts of speech are supported, and access is generally very efficient because the index and morphological exclusion tables are loaded at initialization.

The module is more or less optimized for long sessions of queries: the 'new' invocation loads the entire index table and all of the morphological exclusions. My PII/400 takes about 15 seconds to do this, and memory usage is on the order of 18 Megs. If I get enough requests, I may work on making this a less demanding step. Once the index and morphological exclusion files are loaded, queries are very fast.

USAGE

To use the WordNet::QueryData module, incorporate the package with "use WordNet::QueryData;". Then, establish an instance of the package with "my $wn = new WordNet::QueryData ("/usr/local/dict");". If the WordNet dict directory is not /usr/local/dict on your system, pass the correct directory as the first argument of the function call. You may pass a second argument of 1 if you wish the module to print out progress and verbose error messages.

WordNet::QueryData is object-oriented. You can establish multiple instances simply by calling 'new' multiple times; however, the only practical use I can see for this is comparing data from different WordNet versions. I wrote the module OO-style because I had never written an OO perl module, figured it was time to learn, and thought it might make the code a bit cleaner.

The WordNet::QueryData object has two methods that you might want to use, 'forms' and 'query'. 'query' is the method that gives you access to all of the WordNet relations. It accepts a query string and a relation. The query string may be at one of three specification levels:

1) WORD (e.g. dog)
2) WORD#POS (e.g. house#noun)
3) WORD#POS#SENSE (e.g. ghostly#adj#1)

WORD is simply an English word. Use spaces to separate tokens (not underscores, as are used in the WordNet database). Case does not matter. At this time, the word must exactly match one of the words in the WordNet database files. You can use 'forms' to search for the form of a word that WordNet contains. Eventually, I'll integrate this with 'query' so that no manual search is necessary.

POS is the part of speech. Use 'n' for noun, 'v' for verb, 'a' for adjective and 'r' for adverb. You may also use full names and some abbreviations (as above and in test.pl).

SENSE is the sense number that uniquely identifies the sense of the word.
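The three specification levels can be told apart by counting the '#' separators in the query string. The following is a minimal sketch of that idea; parse_query is a hypothetical helper written for illustration and is not part of WordNet::QueryData, which may parse its query strings differently.

```perl
use strict;
use warnings;

# Split a query string into WORD, POS and SENSE and report its
# specification level (1, 2 or 3).
# NOTE: parse_query is a hypothetical helper, not part of the module.
sub parse_query {
    my ($string) = @_;
    my ($word, $pos, $sense) = split /#/, $string, 3;
    my $level = defined $sense ? 3 : defined $pos ? 2 : 1;
    return { word => $word, pos => $pos, sense => $sense, level => $level };
}

# The three example strings from the list of specification levels.
for my $q ("dog", "house#noun", "ghostly#adj#1") {
    my $p = parse_query($q);
    print "$q is a level-$p->{level} query\n";
}
```

A check like this can be handy before calling 'query', since only a level 3 string takes a relation as its second argument.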

Query #1 will return a list of WORD#POS strings, one for each part of speech that the word is used as. Query #2 will return a list of WORD#POS#SENSE strings. Query #1 and #2 are essentially used to search for the sense for which you are looking. When making such a query, no relation (2nd argument) should be passed to 'query'. Query #3 is the interesting one and allows you to make use of all of the WordNet relations. It requires a second argument, a relation, which may be one of the following:

syns - synset words
ants - antonyms
hype - hypernyms
hypo - hyponyms
mmem - member meronyms
msub - substance meronyms
mprt - part meronyms
mero - all meronyms
hmem - member holonyms
hsub - substance holonyms
hprt - part holonyms
holo - all holonyms
attr - attributes (?)
enta - entailment (verbs only)
caus - cause (verbs only)
also - also see
vgrp - verb group (verbs only)
sim  - similar to (adjectives only)
part - participle of verb (adjectives only)
pert - pertainym (pertains to noun) (adjectives only)

Longer names are also allowed. Each relation returns a list of strings. Each string is in WORD#POS#SENSE form and corresponds to a specific sense. In the case of 'syns', one string is returned for each word that is part of the synset. For other relations, a single string is returned for each synset (you can map 'syns' on to the returned array to get the words for a relation). In the case of relations like 'hype' and 'hypo', query returns only the immediate hypernyms or hyponyms. You can use 'query' recursively to get a full hyper/hyponym tree.
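The patterns described above (drilling down from a word to its senses, mapping 'syns' over another relation, and calling 'query' recursively) can be sketched as follows. This is only an illustration under stated assumptions: all_senses, relation_words and hypernym_closure are hypothetical helper names, not part of the module, and $wn is assumed to be any object whose 'query' method behaves as documented here.

```perl
use strict;
use warnings;

# NOTE: all three subs are hypothetical helpers, not part of the module.

# Drill down from a bare WORD to every WORD#POS#SENSE string by chaining
# a level 1 query into level 2 queries (no relation argument is passed).
sub all_senses {
    my ($wn, $word) = @_;
    return map { $wn->query($_) } $wn->query($word);
}

# Expand each sense returned by a relation into the words of its synset
# by mapping 'syns' over the result.
sub relation_words {
    my ($wn, $sense, $relation) = @_;
    return map { $wn->query($_, "syns") } $wn->query($sense, $relation);
}

# Collect every hypernym reachable from a WORD#POS#SENSE string by
# calling 'query' recursively with the 'hype' relation.
sub hypernym_closure {
    my ($wn, $sense, $seen) = @_;
    $seen ||= {};
    my @found;
    for my $parent ($wn->query($sense, "hype")) {
        next if $seen->{$parent}++;    # list each sense only once
        push @found, $parent, hypernym_closure($wn, $parent, $seen);
    }
    return @found;
}

# Usage, assuming $wn was created with 'new' as shown in the SYNOPSIS:
#   print join (", ", hypernym_closure ($wn, "cat#n#1")), "\n";
```

The $seen hash is a design choice rather than a requirement: it keeps a sense that is reachable along more than one path from being listed twice.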

NOTES

Requires existence of WordNet database files (stored in 'dict' directory).
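Because 'new' takes a while to run, it can be worth checking the dict directory first. The sketch below is one way to do that. The file names are an assumption based on the Unix layout of WordNet 1.6 (index.POS, data.POS and POS.exc), which this document does not list, and missing_dict_files is a hypothetical helper that is not part of the module.

```perl
use strict;
use warnings;
use File::Spec;

# ASSUMPTION: file names follow the Unix layout of WordNet 1.6;
# adjust the list if your installation names its files differently.
my @POS = qw(noun verb adj adv);

# Return the expected files that are absent from $dir.
# NOTE: missing_dict_files is a hypothetical helper, not part of the module.
sub missing_dict_files {
    my ($dir) = @_;
    my @expected = map { ("index.$_", "data.$_", "$_.exc") } @POS;
    return grep { !-f File::Spec->catfile($dir, $_) } @expected;
}

my $dir     = "/usr/local/dict";
my @missing = missing_dict_files($dir);
my $report  = @missing
    ? "Missing from $dir: @missing\n"
    : "All expected WordNet files found in $dir\n";
print $report;
```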

COPYRIGHT

Copyright 1999 Jason Rennie <jrennie@ai.mit.edu> All rights reserved.

This module is free software; you can redistribute it and/or modify it under the same terms as Perl itself.

SEE ALSO

perl(1)

http://www.cogsci.princeton.edu/~wn/

http://www.ai.mit.edu/~jrennie/WordNet/

LOG

$Log: QueryData.pm,v $
Revision 1.2 1999/09/15 19:53:50 jrennie
add url

Revision 1.1 1999/09/15 13:27:44 jrennie
new QueryData directory

Revision 1.3 1999/09/15 12:16:59 jrennie
(get_all_words): fix allow long relation names; allow long POS names; check for illegal POS

Revision 1.2 1999/09/14 22:23:35 jrennie
first draft of direct access to WordNet data files; 'new'ing is slow; about 15 seconds on my PII/400. Memory consumption using WordNet 1.6 is appx. 16M. Still need to integrate forms into query. query requires the word form to be exactly like that in WordNet (although capitalization may differ)

Revision 1.1 1999/09/13 14:59:35 jrennie
access data files directly; use a more OO style of coding; initialization (new) code is pretty much done; forms is done