NAME

windower.pl

SYNOPSIS

Limits the context of each instance to at most W tokens on either side of the target word.

USAGE

windower.pl [OPTIONS] SVAL2 W

INPUT

Required Arguments:

SVAL2

SVAL2 must be a tokenized and preprocessed instance file in the Senseval-2 format.

W

W must be a positive integer specifying the window size. windower displays only the tokens that fall within the window [-W, +W] centered on the target word, i.e., the target word itself and at most W tokens on each side of it.
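The [-W, +W] window can be illustrated with a short Python sketch. It is not windower.pl's code: the helper name window_tokens is hypothetical, and it assumes the context has already been split into a list of tokens with a known index for the target word. Near either end of the context, the window is simply clipped, so fewer than W tokens may appear on that side.

```python
def window_tokens(tokens, target_index, w):
    """Return the target token plus at most w tokens on each side of it.

    Hypothetical helper illustrating the [-W, +W] window; not windower.pl's code.
    """
    if w <= 0:
        raise ValueError("W must be a positive integer")
    if not 0 <= target_index < len(tokens):
        raise IndexError("target_index is outside the token list")
    start = max(0, target_index - w)              # clip at the start of the context
    end = min(len(tokens), target_index + w + 1)  # +1 keeps the target itself
    return tokens[start:end]


tokens = ["he", "cast", "the", "fishing", "<head>line</head>", "into", "the", "lake", "."]
print(window_tokens(tokens, 4, 2))   # ['the', 'fishing', '<head>line</head>', 'into', 'the']
print(window_tokens(tokens, 4, 10))  # window larger than the context: all 9 tokens
```

With W = 2 the output has at most 2 * 2 + 1 = 5 tokens; a W larger than the context leaves it unchanged.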

Optional Arguments:

--plain

Displays the output in plain text, with the context of each instance on a separate line, i.e., the i-th line on stdout shows the context of the i-th instance in the given SVAL2 file. By default, output is created in Senseval-2 format.

--token TOKENREGEX

TOKENREGEX should be a file containing Perl regular expressions that define the tokenization scheme used in SVAL2. windower recognizes only those character sequences in SVAL2 that match the specified token regexes; everything else is ignored. If --token is not specified, windower looks for the default token.regex file in the current directory.
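To make "everything else is ignored" concrete, here is a minimal Python sketch of regex-driven tokenization. The scanning rule is an assumption, not taken from windower.pl: at each position the first regex (in file order) that matches there yields a token, and if none matches, one character is skipped. The function name tokenize and the TOKEN_REGEXES list are hypothetical, and the patterns are written in Python syntax rather than as the /.../-delimited Perl regexes a token.regex file holds.

```python
import re


def tokenize(text, token_regexes):
    """Split text into tokens using an ordered list of token regexes.

    Assumed scheme (not taken from windower.pl): at each position, the first
    regex that matches there produces a token; if none matches, one character
    is skipped, so text not covered by any regex is ignored.
    """
    compiled = [re.compile(rx) for rx in token_regexes]
    tokens = []
    pos = 0
    while pos < len(text):
        for rx in compiled:
            m = rx.match(text, pos)      # match anchored at the current position
            if m and m.end() > pos:      # skip empty matches so the scan always advances
                tokens.append(m.group())
                pos = m.end()
                break
        else:
            pos += 1                     # no token starts here: ignore this character
    return tokens


# Hypothetical token.regex contents, written as Python patterns.
TOKEN_REGEXES = [r"<head>\w+</head>", r"\w+", r"[.,;:!?]"]

print(tokenize("He cast the <head>line</head> into the lake.", TOKEN_REGEXES))
# ['He', 'cast', 'the', '<head>line</head>', 'into', 'the', 'lake', '.']
```

Under this scheme, dropping the <head> regex and keeping only \w+ would turn "<head>line</head>" into the three tokens head, line and head, which is why target words must themselves be recognizable as tokens.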

--target TARGETREGEX

Specifies a file containing Perl regular expressions that define the target word(s). Target words must be valid tokens recognizable by the specified tokenization scheme (via --token or the default token.regex).

The following are some examples of TARGETREGEX files:

  1. /<head>[Ll]ines?<\/head>/

    which specifies that the target word can be

    line, Line, lines or Lines

    enclosed in <head> and </head> tags.

  2. The above regex can also be specified as multiple regexes in TARGETREGEX:

    /<head>line<\/head>/

    /<head>lines<\/head>/

    /<head>Line<\/head>/

    /<head>Lines<\/head>/

    with a single regex per line

  3. Regex

    /<head>\w+<\/head>/

    is a more general regex that matches any target word marked with <head> tags.

  4. Regex

    /<head.*>\w+<\/head>/

    matches target words as they are marked in the original Senseval-2 data.

  5. /[Ll]ines?/

    specifies that any occurrence of Line, line, Lines or lines is a target word (not enclosed in any special tags).
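The claim that examples 1 and 2 are equivalent can be checked with a small Python sketch. The reader below is hypothetical, not windower.pl's parser: it assumes one /.../-delimited regex per line, skips blank lines, strips the Perl delimiters, and compiles the rest with Python's re module (the Perl escape \/ is accepted there as a literal slash). It also assumes a token counts as a target only when a regex matches the whole token.

```python
import re


def load_target_regexes(lines):
    """Read TARGETREGEX-style lines: one /.../-delimited regex per line.

    Hypothetical reader; blank lines are skipped and the surrounding Perl
    match delimiters are removed before compiling with Python's re module.
    """
    patterns = []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if len(line) >= 2 and line.startswith("/") and line.endswith("/"):
            line = line[1:-1]  # drop the Perl match delimiters
        patterns.append(re.compile(line))
    return patterns


def is_target(token, patterns):
    # Assumption: a token is a target only if some regex matches the whole token.
    return any(p.fullmatch(token) for p in patterns)


single = load_target_regexes([r"/<head>[Ll]ines?<\/head>/"])  # example 1
multi = load_target_regexes([                                  # example 2
    r"/<head>line<\/head>/", "",
    r"/<head>lines<\/head>/", "",
    r"/<head>Line<\/head>/", "",
    r"/<head>Lines<\/head>/",
])

for tok in ["<head>line</head>", "<head>Lines</head>", "<head>liner</head>", "line"]:
    print(tok, is_target(tok, single), is_target(tok, multi))
```

Both files accept the tagged forms of line, Line, lines and Lines and reject "<head>liner</head>" as well as an untagged "line"; only a file like example 5 would accept the untagged word.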

Other Options:

--help

Displays this message.

--version

Displays the version information.

OUTPUT

When --plain is not selected, the output is in Senseval-2 format and looks the same as the input SVAL2 file, except that the context of each instance shows at most W tokens on either side of the target word.

When --plain is selected, the output shows each context on a single line, i.e., the context of the i-th instance in the given SVAL2 file is shown on the i-th line on stdout.
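The --plain layout can be sketched end to end in Python. This is an illustration under simplifying assumptions, not windower.pl's implementation: tokens are taken to be whitespace-separated (windower itself uses the token regexes), the target is matched with the general <head>\w+</head> regex, each context is assumed to contain exactly one target, and the instance ids and senseids in the made-up SAMPLE file are hypothetical.

```python
import re

CONTEXT_RX = re.compile(r"<context>(.*?)</context>", re.DOTALL)
TARGET_RX = re.compile(r"<head>\w+</head>")  # general target regex from the examples


def plain_output(sval2_text, w):
    """Return one windowed context per instance, in file order (--plain layout).

    Sketch only: assumes whitespace-separated tokens and exactly one target
    per context; windower.pl's own tokenization and edge cases may differ.
    """
    lines = []
    for m in CONTEXT_RX.finditer(sval2_text):
        tokens = m.group(1).split()
        hits = [i for i, t in enumerate(tokens) if TARGET_RX.fullmatch(t)]
        if not hits:
            raise ValueError("no target word found in context")
        i = hits[0]
        # keep the target plus at most w tokens on each side
        lines.append(" ".join(tokens[max(0, i - w): i + w + 1]))
    return lines


# Made-up minimal Senseval-2 instance file (ids and senseids are hypothetical).
SAMPLE = """<corpus lang="english">
<lexelt item="line-n">
<instance id="line-n.1">
<answer instance="line-n.1" senseid="cord"/>
<context>
he cast the <head>line</head> into the lake .
</context>
</instance>
<instance id="line-n.2">
<answer instance="line-n.2" senseid="text"/>
<context>
she read the first <head>line</head> of the poem aloud .
</context>
</instance>
</lexelt>
</corpus>"""

print("\n".join(plain_output(SAMPLE, 2)))
# cast the <head>line</head> into the
# the first <head>line</head> of the
```

Line i of the result corresponds to instance i of the input, matching the --plain description above.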

AUTHOR

Amruta Purandare, Ted Pedersen. University of Minnesota, Duluth.

COPYRIGHT

Copyright (c) 2002-2005,

Amruta Purandare, University of Pittsburgh. amruta@cs.pitt.edu

Ted Pedersen, University of Minnesota, Duluth. tpederse@umn.edu

This program is free software; you can redistribute it and/or modify it under the terms of the GNU General Public License as published by the Free Software Foundation; either version 2 of the License, or (at your option) any later version.

This program is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for more details.

You should have received a copy of the GNU General Public License along with this program; if not, write to

The Free Software Foundation, Inc., 59 Temple Place - Suite 330, Boston, MA 02111-1307, USA.