NAME

TODO - things to do for WordNet::SenseRelate::AllWords

SYNOPSIS

Plans for future versions of AllWords, including AllWords.pm, the wsd.pl driver, and the web interface.

DESCRIPTION

For version 0.10

  • Version 0.08 had a number of reported test failures caused by WordNet version issues. Add version checks to the test cases to avoid this. These tests should use WordNet::Tools, not the deprecated version() method in QueryData.

  • Do compoundify using the WordNet::Tools module.

  • Investigate further how tagged text behaves with WordNet::Similarity::jcn and, more generally, with all options. When run via the web interface of wsd.pl, pos tagged text and jcn together produced output of the form word#pos, with no sense indicated. The problem lies in AllWords.pm.

  • Add an option in the web interface to upload the context as a file. The file should be one line per sentence, one sentence per line.

  • Add an option to the web interface for uploading a config file for the WordNet-Similarity relatedness measure. This will need to be passed through AllWords.pm to WordNet-Similarity.

  • The --compound option is not working with wsd.pl. It should be repaired, as a user might want to use a subset of the compounds in WordNet.

  • Investigate stoplist handling for tagged text. (What is the problem here?)

  • Change from raw and parsed to a single plain format. The plain format will assume that the text has already been tokenized, and that each whitespace-separated string is a token (word or punctuation mark). The format will further assume that sentence boundary detection has already been performed.

  • Move the sentence splitter from wsd.pl to a utility program. All input formats will assume that input is one sentence per line, one line per sentence.

  • Expand the set of POS tags that can be used, either by modifying AllWords.pm or by allowing the user to submit a config file of some kind defining the tag set. At present this is limited to the Penn Treebank set of 47 tags.

  • Return codes from AllWords.pm to indicate whether no relatedness was found, the word is a stopword, or the word is not in WordNet.

  • Return codes that identify the trace level (to enable color coding).

  • Graceful shutdown and restart of web server. Allow for a stop or restart command from the command line, rather than having to kill the process.

  • Add proper logging for the server. (What does this mean?)

  • Make a design decision about whether the web interface should communicate directly with AllWords.pm via the disambiguate method, or should use the wsd.pl command.

  • Develop methods for testing the web interface, or at least for directly testing the disambiguate method as used by the web interface. We should have test cases that can be run to demonstrate the problems listed here, and that also make sure that once they are fixed they stay fixed.

  • Expand the testing in /t for AllWords.pm and wsd.pl. Right now it is quite minimal and has very limited coverage. We should have multiple .t files, organized in some way to indicate what kind of testing we are doing, maybe based on format and then on the options being used, as in tagged.t for testing pos tagged data, wntagged.t for wordnet tagged data, raw.t or plain.t for that format, and so on. There is no reason for the testing to be confined to one file per module or program as it is now.
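
The proposed plain format described in the list above (one sentence per line, each whitespace-separated string a token) could be read roughly as follows. This is only an illustrative sketch, written in Python for brevity rather than in Perl; it is not code from AllWords.pm or wsd.pl. The function name read_plain is hypothetical, and skipping blank lines is an assumption that the list above does not state.

```python
from typing import Iterable, List


def read_plain(lines: Iterable[str]) -> List[List[str]]:
    """Read 'plain' input: one sentence per line, tokens split on whitespace.

    Hypothetical helper for illustration only. No tokenization or
    sentence boundary detection is done here, because the plain format
    assumes both have already been performed.
    """
    sentences = []
    for line in lines:
        tokens = line.split()  # each whitespace-separated string is a token
        if tokens:  # assumption: blank lines carry no sentence and are skipped
            sentences.append(tokens)
    return sentences


# Example input: punctuation marks are already separate tokens.
sample = [
    "The red car stopped .",
    "",
    "It was late , but nobody minded .",
]
sentences = read_plain(sample)
```

With input in this shape, a plain.t test file of the kind suggested above would only need to compare token lists, since no splitting decisions are left to the module.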

AUTHORS

 Varada Kolhatkar, University of Minnesota, Duluth
 kolha002 at d.umn.edu

 Ted Pedersen, University of Minnesota, Duluth
 tpederse at d.umn.edu

This document last modified by : $Id: TODO.pod,v 1.4 2008/04/09 22:48:44 tpederse Exp $

SEE ALSO

COPYRIGHT AND LICENSE

Copyright (c) 2008, Varada Kolhatkar, Ted Pedersen, Jason Michelizzi

Permission is granted to copy, distribute and/or modify this document under the terms of the GNU Free Documentation License, Version 1.2 or any later version published by the Free Software Foundation; with no Invariant Sections, no Front-Cover Texts, and no Back-Cover Texts.

Note: a copy of the GNU Free Documentation License is available on the web at http://www.gnu.org/copyleft/fdl.html and is included in this distribution as FDL.txt.