NAME
windower.pl
SYNOPSIS
Limits the contexts of given instances to W tokens around the target word.
USAGE
windower.pl [OPTIONS] SVAL2 W
INPUT
Required Arguments:
SVAL2
SVAL2 must be a tokenized and preprocessed instance file in the Senseval-2 format.
W
W must be a positive integer specifying the window size. windower.pl displays only the tokens that fall within the window [-W, +W] centered on the target word.
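The window arithmetic can be illustrated with a short sketch. It is written in Python rather than Perl and is not taken from windower.pl; the function name window and the sample tokens are hypothetical. It assumes the window keeps the target token plus up to W tokens on each side, and is simply cut short where the context ends.

```python
def window(tokens, target_index, w):
    """Return the tokens at offsets -w..+w from the target, clipped to the context.

    Assumption: the target itself is kept, with up to w tokens on each side.
    """
    if w <= 0:
        raise ValueError("W must be a positive integer")
    start = max(0, target_index - w)                # clip at the left edge
    end = min(len(tokens), target_index + w + 1)    # clip at the right edge
    return tokens[start:end]


# Hypothetical tokenized context with the target at index 3.
tokens = ["he", "cast", "the", "<head>line</head>", "into", "the", "river"]
print(" ".join(window(tokens, 3, 2)))
# prints: cast the <head>line</head> into the
```

When the target sits near the start or end of a context, fewer than W tokens are available on that side, so the result is shorter than 2W + 1 tokens.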
Optional Arguments:
--plain
Output is displayed in plain text format, with the context of each instance on its own line, i.e. the i'th line on stdout shows the context of the i'th instance in the given SVAL2 file. By default, output is created in Senseval-2 format.
--token TOKENREGEX
TOKENREGEX should be a file containing Perl regular expressions that define the tokenization scheme in SVAL2. windower.pl recognizes only those character sequences in SVAL2 that match the specified token regex(es); everything else is ignored. If --token is not specified, windower.pl looks for the default token.regex file in the current directory.
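The "match or ignore" behaviour of a token regex file can be sketched as follows. This is an illustration in Python, not the code used by windower.pl. The helper names load_regexes and tokenize, the two sample regexes, and the sample text are all hypothetical, and the sketch assumes that regexes are tried in file order at each position.

```python
import re


def load_regexes(lines):
    """Strip the surrounding /.../ delimiters from each non-empty line."""
    patterns = []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if len(line) >= 2 and line.startswith("/") and line.endswith("/"):
            line = line[1:-1]
        patterns.append(line)
    return patterns


def tokenize(text, patterns):
    """Keep only the character sequences that match one of the token regexes.

    Assumption: earlier regexes take precedence over later ones.
    """
    combined = re.compile("|".join("(?:%s)" % p for p in patterns))
    return [m.group(0) for m in combined.finditer(text)]


# Hypothetical token regex file contents, one regex per line.
token_regex_lines = [r"/<head>\w+<\/head>/", r"/\w+/"]
text = "He cast the <head>line</head> into the river, twice!"
print(tokenize(text, load_regexes(token_regex_lines)))
# prints: ['He', 'cast', 'the', '<head>line</head>', 'into', 'the', 'river', 'twice']
```

Punctuation such as "," and "!" disappears from the output because no regex in this hypothetical file matches it.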
--target TARGETREGEX
Specify a file containing Perl regular expressions that define the target word(s). Target words must be valid tokens recognizable by the specified tokenization scheme (via --token or token.regex).
The following are some examples of TARGETREGEX files:
/<head>[Ll]ines?<\/head>/
which specifies that the target word can be
line, Line, lines or Lines
delimited by <head> and </head> tags.
The same regex can also be written as multiple regexes in the TARGETREGEX file:
/<head>line<\/head>/
/<head>lines<\/head>/
/<head>Line<\/head>/
/<head>Lines<\/head>/
with a single regex per line.
The regex
/<head>\w+<\/head>/
is a more general regex for target words marked in <head> tags.
The regex
/<head.*>\w+<\/head>/
matches target words as marked in the original Senseval-2 data.
The regex
/[Ll]ines?/
specifies that any occurrence of the words Line, line, Lines or lines is a target word, without requiring any delimiting tags.
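The equivalence between the single regex and the four one-per-line regexes can be checked with a short sketch. It uses Python's re module instead of Perl, drops the /.../ delimiters (so the slash in </head> needs no backslash), and assumes that a target regex must match a whole token. The helper names and the candidate tokens are hypothetical.

```python
import re

# The single-regex form of the target definition.
single = re.compile(r"<head>[Ll]ines?</head>")

# The same definition written as one regex per line.
multiple = [
    re.compile(p)
    for p in (
        r"<head>line</head>",
        r"<head>lines</head>",
        r"<head>Line</head>",
        r"<head>Lines</head>",
    )
]


def matches_single(token):
    return single.fullmatch(token) is not None


def matches_multiple(token):
    return any(p.fullmatch(token) is not None for p in multiple)


# Hypothetical tokens: the first four are targets, the rest are not.
candidates = [
    "<head>line</head>", "<head>Line</head>", "<head>lines</head>",
    "<head>Lines</head>", "<head>LINE</head>", "<head>liner</head>", "line",
]
for token in candidates:
    assert matches_single(token) == matches_multiple(token)
    print(token, matches_single(token))
```

Both forms accept the same four tokens; untagged "line" is rejected by both, and would only be accepted by a tag-free regex such as /[Ll]ines?/.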
Other Options:
--help
Displays this help message.
--version
Displays the version information.
OUTPUT
When --plain is not selected, OUTPUT is in Senseval-2 format and looks the same as the input SVAL2 file, except that the context of each instance shows at most W words around the target word.
When --plain is selected, OUTPUT shows each context on a single line, i.e. the context of the i'th instance in the given SVAL2 file is shown on the i'th line on stdout.
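The line-per-instance layout of --plain output can be pictured with the following sketch. It is a Python illustration, not the windower.pl implementation. The function plain_lines and the sample instances are hypothetical; the sketch assumes that each context is already tokenized, that the target is the first token fully matching the target regex, and that the window holds up to W tokens on each side of it. What windower.pl does when an instance has no target is not covered here, so the sketch raises an error in that case.

```python
import re


def plain_lines(instances, target_pattern, w):
    """Return one output line per instance, in input order."""
    target = re.compile(target_pattern)
    lines = []
    for tokens in instances:
        # Assumption: the target is the first token matching the regex.
        idx = next(
            (i for i, tok in enumerate(tokens) if target.fullmatch(tok)), None
        )
        if idx is None:
            raise ValueError("no target word found in instance")
        lines.append(" ".join(tokens[max(0, idx - w): idx + w + 1]))
    return lines


# Two hypothetical tokenized contexts.
instances = [
    ["he", "cast", "the", "<head>line</head>", "into", "the", "river"],
    ["<head>Lines</head>", "of", "code", "were", "removed"],
]
for line in plain_lines(instances, r"<head>[Ll]ines?</head>", 1):
    print(line)
# prints:
# the <head>line</head> into
# <head>Lines</head> of
```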
AUTHOR
Amruta Purandare, Ted Pedersen. University of Minnesota, Duluth.
COPYRIGHT
Copyright (c) 2002-2005,
Amruta Purandare, University of Pittsburgh. amruta@cs.pitt.edu
Ted Pedersen, University of Minnesota, Duluth. tpederse@umn.edu
This program is free software; you can redistribute it and/or modify it under the terms of the GNU General Public License as published by the Free Software Foundation; either version 2 of the License, or (at your option) any later version.
This program is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for more details.
You should have received a copy of the GNU General Public License along with this program; if not, write to
The Free Software Foundation, Inc., 59 Temple Place - Suite 330, Boston, MA 02111-1307, USA.