
NAME

prepare_sval2.pl

SYNOPSIS

Prepares Senseval-2 Data for SenseClusters experiments.

USAGE

prepare_sval2.pl [Options] SOURCE

Type 'prepare_sval2.pl --help' for a quick summary of the options.

INPUT

Required Arguments:

SOURCE

A Senseval-2 formatted data file to be prepared for SenseClusters experiments.

Optional Arguments:

--key KEY

Sense Tagging mechanism in prepare_sval2.pl -

prepare_sval2.pl makes sure that every SOURCE instance carries at least one answer tag (a "NOTAG" at minimum). If sense tags are found in the SOURCE file itself, they are retained. If the SOURCE instances are not tagged, each instance is given either the sense tags listed for it in the separate KEY file or a "NOTAG".

A KEY file containing the true answer keys of the SOURCE instances can be provided via the --key option. If the SOURCE instances are not sense tagged, they will be tagged with the sense tags given in the KEY file.

The KEY file should be in SenseClusters format, showing

                <instance id="I"/>  [<sense id="S"/>]+

on each line, that is, an instance id followed by all of its true sense ids.
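As an illustration of the KEY format described above, the following is a minimal sketch of how such lines could be read into a lookup table. It is not taken from prepare_sval2.pl itself: the function name parse_key_lines and the second sample line (ids I2, S1, S2) are hypothetical, chosen only for the example.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of prepare_sval2.pl): parse KEY lines into
# a hash that maps each instance id to an array reference of its sense ids.
sub parse_key_lines {
    my @lines = @_;
    my %key;
    for my $line (@lines) {
        # Skip lines that do not start an entry with an instance id.
        next unless $line =~ /<instance\s+id="([^"]+)"\s*\/>/;
        my $id = $1;
        # Collect every sense id that appears on the same line.
        my @senses = $line =~ /<sense\s+id="([^"]+)"\s*\/>/g;
        $key{$id} = \@senses if @senses;
    }
    return \%key;
}

my $key = parse_key_lines(
    '<instance id="art.40012"/> <sense id="arts%1:09:00::"/>',
    '<instance id="I2"/> <sense id="S1"/> <sense id="S2"/>',   # made-up ids
);
print join(' ', @{ $key->{'I2'} }), "\n";   # prints "S1 S2"
```

Lines without an instance id, or with an instance id but no sense ids, are ignored in this sketch; how the real script treats such lines is not stated here.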

prepare_sval2 takes into account the following anomalies in SOURCE/KEY -

  1. If the 1st SOURCE instance is sense tagged, it assumes that SOURCE is sense tagged and disables the KEY file option. Any SOURCE instances that are not tagged are given "NOTAG"s, regardless of whether they have keys in the KEY file.

  2. If the 1st SOURCE instance is not sense tagged, it assumes that SOURCE is untagged and reports an error if any later instance in the SOURCE file is found to be sense tagged.

  3. If the 1st SOURCE instance is not sense tagged and has an entry in the KEY file, it enables the KEY file and attaches to each instance the answer keys given for it in the KEY file. Any instance that doesn't have an answer key in the KEY file is given a "NOTAG".

  4. If the 1st SOURCE instance is not sense tagged and doesn't have an entry in the KEY file, the KEY file is disabled and no instance is given a tag from the KEY file. All instances are given "NOTAG"s.
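The four cases above can be sketched as a small decision procedure. This is only an illustration under stated assumptions, not the script's actual code: the function names tagging_mode and tags_for, the mode strings, and the error message are all hypothetical.

```perl
use strict;
use warnings;

# Hypothetical helper: choose a tagging mode from the first SOURCE instance.
#   $first_tagged : true if the first SOURCE instance carries sense tags
#   $first_in_key : true if the first SOURCE instance id has a KEY entry
sub tagging_mode {
    my ($first_tagged, $first_in_key) = @_;
    return 'source' if $first_tagged;    # case 1: KEY is disabled
    return 'key'    if $first_in_key;    # case 3: KEY is enabled
    return 'notag';                      # case 4: KEY is disabled
}

# Hypothetical helper: pick the tags for one instance under a given mode.
#   $source_tags : array ref of sense tags found in SOURCE for this instance
#   $key         : hash ref mapping instance ids to array refs of sense ids
sub tags_for {
    my ($mode, $id, $source_tags, $key) = @_;
    if ($mode eq 'source') {
        return @$source_tags ? @$source_tags : ('NOTAG');
    }
    # case 2: an untagged SOURCE must not contain tagged instances
    die "Instance $id is sense tagged in an untagged SOURCE\n" if @$source_tags;
    return @{ $key->{$id} } if $mode eq 'key' && exists $key->{$id};
    return ('NOTAG');
}

my $mode = tagging_mode(0, 1);   # first instance untagged, present in KEY
print join(' ', tags_for($mode, 'art.40012', [], { 'art.40012' => ['arts%1:09:00::'] })), "\n";
print join(' ', tags_for($mode, 'art.40013', [], { 'art.40012' => ['arts%1:09:00::'] })), "\n";
```

The first call prints the sense id from the KEY table and the second prints NOTAG, matching case 3.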

--attachP

P tag handling mechanism in prepare_sval2.pl -

By default, prepare_sval2.pl removes sense tags that have the value P. According to the Senseval-2 standard, these are not true sense tags but indicate that the target word is a proper noun.

The --attachP option instead attaches the P tag to the sense tag that immediately follows it for the same instance.

e.g. If --attachP is selected,

 <instance id="art.40012" docsrc="bnc_A0E_130">
 <answer instance="art.40012" senseid="P"/>
 <answer instance="art.40012" senseid="arts%1:09:00::"/>

will be modified to

 <instance id="art.40012" docsrc="bnc_A0E_130">
 <answer instance="art.40012" senseid="P_arts%1:09:00::"/>

and if --attachP is not selected, the P tag will be removed by default, giving

 <instance id="art.40012" docsrc="bnc_A0E_130">
 <answer instance="art.40012" senseid="arts%1:09:00::"/>
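The two behaviours shown above can be sketched on a plain list of sense ids. This is a minimal illustration, not the script's own implementation: the function name handle_p_tags is hypothetical, and the choice to drop a P tag that has no following sense tag is an assumption, since that situation is not described here.

```perl
use strict;
use warnings;

# Hypothetical helper: given the sense ids of one instance in file order,
# either drop P tags (default) or, when $attach_p is true, prefix "P_" to
# the sense id that immediately follows a P tag.
sub handle_p_tags {
    my ($attach_p, @senses) = @_;
    my @out;
    my $pending_p = 0;
    for my $s (@senses) {
        if ($s eq 'P') { $pending_p = 1; next; }
        push @out, ($attach_p && $pending_p) ? "P_$s" : $s;
        $pending_p = 0;
    }
    # Assumption: a trailing P with no following sense tag is dropped.
    return @out;
}

print join(' ', handle_p_tags(1, 'P', 'arts%1:09:00::')), "\n";  # P_arts%1:09:00::
print join(' ', handle_p_tags(0, 'P', 'arts%1:09:00::')), "\n";  # arts%1:09:00::
```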
 

--modifysat

This switch, if selected, removes the satellite ids from <head sats="ID"> and <sat id="ID"> tags, retaining the basic <head> and <sat> tag information.

e.g. by selecting --modifysat,

 ------------------------------------------------------------------------
 Perhaps he 'd have <head sats="call_for.018:0">called</head> <sat
 id="call_for.018:0">for</sat> a decentralized political and economic
 system
 ------------------------------------------------------------------------

will be transformed to

 ------------------------------------------------------------------------
 perhaps he 'd have <head> called </head> <sat> for </sat> a 
 decentralized political and economic system
 ------------------------------------------------------------------------

If --modifysat is not selected, the satellite ids are retained.
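The id-stripping step can be approximated with two substitutions. This is a hedged sketch, not the code used by prepare_sval2.pl: the function name strip_sat_ids is hypothetical, and the sketch only removes the attributes. It does not lowercase the text or pad the tag contents with spaces as the transformed example above shows.

```perl
use strict;
use warnings;

# Hypothetical helper: remove the sats="..." and id="..." attributes from
# <head> and <sat> opening tags, leaving the bare tags in place.
sub strip_sat_ids {
    my ($text) = @_;
    $text =~ s/<head\s+sats="[^"]*"\s*>/<head>/g;
    # \s+ also matches a newline, so "<sat" and "id=" may sit on two lines.
    $text =~ s/<sat\s+id="[^"]*"\s*>/<sat>/g;
    return $text;
}

my $line = q{have <head sats="call_for.018:0">called</head> <sat id="call_for.018:0">for</sat> a};
print strip_sat_ids($line), "\n";
# prints: have <head>called</head> <sat>for</sat> a
```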

--nolc

prepare_sval2 converts everything to lowercase by default. Select this switch to disable case conversion.

--help

Displays this message.

--version

Displays the version information.

OUTPUT

Output is a Senseval-2 formatted file written to stdout.

AUTHOR

Amruta Purandare, Ted Pedersen. University of Minnesota, Duluth.

COPYRIGHT

Copyright (c) 2002-2005,

Amruta Purandare, University of Pittsburgh. amruta@cs.pitt.edu

Ted Pedersen, University of Minnesota, Duluth. tpederse@umn.edu

This program is free software; you can redistribute it and/or modify it under the terms of the GNU General Public License as published by the Free Software Foundation; either version 2 of the License, or (at your option) any later version.

This program is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for more details.

You should have received a copy of the GNU General Public License along with this program; if not, write to

The Free Software Foundation, Inc., 59 Temple Place - Suite 330, Boston, MA 02111-1307, USA.