NAME

text2sval.pl - Convert plain text data into Senseval-2 format

SYNOPSIS

Converts a plain text instance data file into a Senseval-2 formatted XML file.

USAGE

text2sval.pl [OPTIONS] TEXT

INPUT

Required Arguments:

TEXT

Should be a plain text data file containing the context of a single instance on each line. In other words, the contexts of different instances should be separated by a newline character, and there should be no newline characters within the context of a single instance.

e.g.

--------------------------------------------------------------------

market capitalization draws a <head>line</head> between big and small stocks

volunteers using a dozen telephone <head>lines</head> at the group's washington headquarters this week will be urging members in alabama arizona

he proceeded briskly through a reception <head>line</head> of party officials and old friends

--------------------------------------------------------------------

The example above shows 3 instances, with the context of each instance on its own line.
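The one-instance-per-line layout can be illustrated with a short Python sketch. This is not code from text2sval.pl; the function names read_instances and head_words are hypothetical, and skipping blank lines is an assumption made only for this illustration.

```python
import re

# Matches the target word marked up as <head>...</head> in a context.
HEAD_RE = re.compile(r"<head>(.*?)</head>")


def read_instances(text):
    """Return one context per non-empty line of a TEXT-style string.

    Assumption: blank lines are ignored. The documentation only says
    that each instance occupies exactly one line.
    """
    instances = []
    for line in text.splitlines():
        line = line.strip()
        if line:
            instances.append(line)
    return instances


def head_words(context):
    """Return the words enclosed in <head> tags within one context."""
    return HEAD_RE.findall(context)


sample = (
    "market capitalization draws a <head>line</head> between big and small stocks\n"
    "he proceeded briskly through a reception <head>line</head> of party officials and old friends\n"
)
contexts = read_instances(sample)
print(len(contexts))            # number of instances found
print(head_words(contexts[0]))  # target words in the first instance
```

Because an instance ends at the newline, any line break inside a context would be read as the start of a new instance.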

Optional Arguments:

--lexelt LEX

Specifies the value of the <lexelt> item tag to be used in the output Senseval-2 file.

--key KEYFILE

Specifies a KEYFILE that lists the instance ids and optional sense tags of the instances in the TEXT file. These are used as the instance ids and sense ids in the <instance> and <answer> tags of the output Senseval-2 file.

Each line in KEYFILE should show the instance id and optional sense tags of the instance displayed on the corresponding line in the TEXT file, in the format -

<instance id="IID"/> [<sense id="SID"/>]*

where an <instance> tag is followed by zero or more <sense> tags.
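As an illustration of this key format, the following Python sketch extracts the instance id and the sense ids from one KEYFILE line. It is a sketch under assumptions, not the parser used by text2sval.pl: the name parse_key_line and the regular expressions are hypothetical, and raising an error for a line without an <instance> tag is a choice made here.

```python
import re

# Patterns for the two tag kinds allowed on a key line.
INSTANCE_RE = re.compile(r'<instance\s+id="([^"]*)"\s*/>')
SENSE_RE = re.compile(r'<sense\s+id="([^"]*)"\s*/>')


def parse_key_line(line):
    """Return (instance_id, [sense_id, ...]) for one KEYFILE line."""
    match = INSTANCE_RE.search(line)
    if match is None:
        # Assumption: a key line without an <instance> tag is an error.
        raise ValueError(f"no <instance> tag in key line: {line!r}")
    # Zero or more <sense> tags may follow the <instance> tag.
    return match.group(1), SENSE_RE.findall(line)


print(parse_key_line('<instance id="serve-v.w7_022806_525"/> <sense id="SERVE2"/>'))
print(parse_key_line('<instance id="serve-v.aphb_34700303_2142"/>'))
```

A line that carries only an <instance> tag yields an empty list of sense ids.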

Other Options :

--help

Displays this message.

--version

Displays the version information.

OUTPUT

The given TEXT input file is converted to a Senseval-2 formatted XML file, which is written to stdout.

Sample Outputs

1 No options specified

Input TEXT file => input.text

------------------------------------------------------------------

the maiden seated herself upon the golden chair and offered the silver one to her companion they were <head>served</head> by maidens dressed in white whose feet made no sound as they moved about and not a word was spoken during the meal

why leftover beef should ever be a problem i cannot understand there is nothing better than cold roast sliced paper thin and <head>served</head> with mustard chutney or pickled walnuts these can be found in almost any food specialty shop meat to be served cold should be removed from the refrigerator an hour or so before eating to allow it to return to room temperature

continue cooking for hours remove the ribs to a hot platter and <head>serve</head> the pan juices separately

an agency spokesman al heier said it granted the exceptions because these crops are grown by few farmers in small areas that can be closely monitored dinoseb is a herbicide that also <head>serves</head> as a fungicide and an insecticide 

------------------------------------------------------------------

Command => text2sval.pl input.text

STDOUT will display =>

---------------------------------------------------------------
<corpus lang="english">
<lexelt item="LEXELT">
<instance id="0">
<answer instance="0" senseid="NOTAG"/>
<context>
the maiden seated herself upon the golden chair and offered the silver one to her companion they were <head>served</head> by maidens dressed in white whose feet made no sound as they moved about and not a word was spoken during the meal
</context>
</instance>
<instance id="1">
<answer instance="1" senseid="NOTAG"/>
<context>
why leftover beef should ever be a problem i cannot understand there is nothing better than cold roast sliced paper thin and <head>served</head> with mustard chutney or pickled walnuts these can be found in almost any food specialty shop meat to be served cold should be removed from the refrigerator an hour or so before eating to allow it to return to room temperature
</context>
</instance>
<instance id="2">
<answer instance="2" senseid="NOTAG"/>
<context>
continue cooking for hours remove the ribs to a hot platter and <head>serve</head> the pan juices separately
</context>
</instance>
<instance id="3">
<answer instance="3" senseid="NOTAG"/>
<context>
an agency spokesman al heier said it granted the exceptions because these crops are grown by few farmers in small areas that can be closely monitored dinoseb is a herbicide that also <head>serves</head> as a fungicide and an insecticide
</context>
</instance>
</lexelt>
</corpus>
---------------------------------------------------------------

Notice that -

1. Since the instance ids are not provided (via --key KEYFILE), text2sval.pl uses the ordinal numbers 0, 1, 2, ... as instance ids, in the order in which the instances appear, i.e. the instance at position i in the TEXT file is assigned the instance id (i-1).

2. Since the sense tags are not provided, all instances are assigned tag 'NOTAG'

3. Since --lexelt is not provided, the value of the <lexelt> tag shows LEXELT, i.e. <lexelt item="LEXELT">
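The three defaults noted above can be reproduced with a minimal Python sketch. It mirrors the sample output only; the function name to_senseval2 and its lexelt parameter are hypothetical and do not describe how text2sval.pl is implemented.

```python
def to_senseval2(contexts, lexelt="LEXELT"):
    """Render contexts in the layout of the sample output above.

    Defaults follow the notes: ordinal instance ids starting at 0,
    sense id NOTAG, and lexelt item LEXELT.
    """
    lines = ['<corpus lang="english">', f'<lexelt item="{lexelt}">']
    for index, context in enumerate(contexts):
        lines.append(f'<instance id="{index}">')
        lines.append(f'<answer instance="{index}" senseid="NOTAG"/>')
        lines.append("<context>")
        # The context is emitted verbatim so its <head> tags survive.
        lines.append(context)
        lines.append("</context>")
        lines.append("</instance>")
    lines.append("</lexelt>")
    lines.append("</corpus>")
    return "\n".join(lines)


print(to_senseval2([
    "continue cooking for hours remove the ribs to a hot platter and "
    "<head>serve</head> the pan juices separately",
]))
```

Since enumerate counts from 0, the instance on line i of the input receives the id (i-1).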

2 --key KEY is provided and KEY shows only the instance ids

In this case, text2sval.pl uses the instance ids from the KEY file as the instance ids in the <instance> and <answer> tags, while all sense ids have the value NOTAG.

For TEXT file in example (1),

if the KEY file is => serve.key

<instance id="serve-v.aphb_34700303_2142"/> 
<instance id="serve-v.aphb_51903174_3841"/> 
<instance id="serve-v.aphb_51903399_3856"/> 
<instance id="serve-v.w7_022806_525"/> 

Command => text2sval.pl --key serve.key --lexelt serve-v input.text

will display on stdout =>

--------------------------------------------------------------------
<corpus lang="english">
<lexelt item="serve-v">
<instance id="serve-v.aphb_34700303_2142">
<answer instance="serve-v.aphb_34700303_2142" senseid="NOTAG"/>
<context>
the maiden seated herself upon the golden chair and offered the silver one to her companion they were <head>served</head> by maidens dressed in white whose feet made no sound as they moved about and not a word was spoken during the meal
</context>
</instance>
<instance id="serve-v.aphb_51903174_3841">
<answer instance="serve-v.aphb_51903174_3841" senseid="NOTAG"/>
<context>
why leftover beef should ever be a problem i cannot understand there is nothing better than cold roast sliced paper thin and <head>served</head> with mustard chutney or pickled walnuts these can be found in almost any food specialty shop meat to be served cold should be removed from the refrigerator an hour or so before eating to allow it to return to room temperature
</context>
</instance>
<instance id="serve-v.aphb_51903399_3856">
<answer instance="serve-v.aphb_51903399_3856" senseid="NOTAG"/>
<context>
continue cooking for hours remove the ribs to a hot platter and <head>serve</head> the pan juices separately
</context>
</instance>
<instance id="serve-v.w7_022806_525">
<answer instance="serve-v.w7_022806_525" senseid="NOTAG"/>
<context>
an agency spokesman al heier said it granted the exceptions because these crops are grown by few farmers in small areas that can be closely monitored dinoseb is a herbicide that also <head>serves</head> as a fungicide and an insecticide
</context>
</instance>
</lexelt>
</corpus>

--------------------------------------------------------------------

Note that the instance ids are taken from the KEY file while sense ids have NOTAGs.

3 KEY file contains both the instance and sense tags

For TEXT file in example (1),

if the KEY file is => serve.key

<instance id="serve-v.aphb_34700303_2142"/> <sense id="SERVE10"/>
<instance id="serve-v.aphb_51903174_3841"/> <sense id="SERVE10"/>
<instance id="serve-v.aphb_51903399_3856"/> <sense id="SERVE10"/>
<instance id="serve-v.w7_022806_525"/> <sense id="SERVE2"/>

Command => text2sval.pl --key serve.key --lexelt serve-v input.text

will display on STDOUT =>

--------------------------------------------------------------------
<corpus lang="english">
<lexelt item="serve-v">
<instance id="serve-v.aphb_34700303_2142">
<answer instance="serve-v.aphb_34700303_2142" senseid="SERVE10"/>
<context>
the maiden seated herself upon the golden chair and offered the silver one to her companion they were <head>served</head> by maidens dressed in white whose feet made no sound as they moved about and not a word was spoken during the meal
</context>
</instance>
<instance id="serve-v.aphb_51903174_3841">
<answer instance="serve-v.aphb_51903174_3841" senseid="SERVE10"/>
<context>
why leftover beef should ever be a problem i cannot understand there is nothing better than cold roast sliced paper thin and <head>served</head> with mustard chutney or pickled walnuts these can be found in almost any food specialty shop meat to be served cold should be removed from the refrigerator an hour or so before eating to allow it to return to room temperature
</context>
</instance>
<instance id="serve-v.aphb_51903399_3856">
<answer instance="serve-v.aphb_51903399_3856" senseid="SERVE10"/>
<context>
continue cooking for hours remove the ribs to a hot platter and <head>serve</head> the pan juices separately
</context>
</instance>
<instance id="serve-v.w7_022806_525">
<answer instance="serve-v.w7_022806_525" senseid="SERVE2"/>
<context>
an agency spokesman al heier said it granted the exceptions because these crops are grown by few farmers in small areas that can be closely monitored dinoseb is a herbicide that also <head>serves</head> as a fungicide and an insecticide
</context>
</instance>
</lexelt>
</corpus>

--------------------------------------------------------------------

where instance ids and sense tags are both extracted from the KEY file.
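The difference between examples (2) and (3) lies only in the <answer> lines. The Python sketch below shows one way to derive them from a key entry. The function name answer_lines is hypothetical, and emitting one <answer> line per sense tag when a key line carries several is an assumption; the samples above only show instances with zero or one sense tag.

```python
def answer_lines(instance_id, sense_ids):
    """Build the <answer> lines for one instance.

    With no sense tags, the sense id falls back to NOTAG, as in
    example (2). Assumption: several sense tags give one <answer>
    line each.
    """
    if not sense_ids:
        sense_ids = ["NOTAG"]
    return [
        f'<answer instance="{instance_id}" senseid="{sense_id}"/>'
        for sense_id in sense_ids
    ]


print(answer_lines("serve-v.w7_022806_525", ["SERVE2"]))
print(answer_lines("serve-v.w7_022806_525", []))
```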

SYSTEM REQUIREMENTS

No special requirements ...

AUTHOR

Amruta Purandare, Ted Pedersen.
University of Minnesota at Duluth.

COPYRIGHT

Copyright (c) 2002-2005,

Amruta Purandare, University of Pittsburgh. amruta@cs.pitt.edu

Ted Pedersen, University of Minnesota, Duluth. tpederse@umn.edu

This program is free software; you can redistribute it and/or modify it under the terms of the GNU General Public License as published by the Free Software Foundation; either version 2 of the License, or (at your option) any later version.

This program is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for more details.

You should have received a copy of the GNU General Public License along with this program; if not, write to

The Free Software Foundation, Inc., 59 Temple Place - Suite 330, Boston, MA 02111-1307, USA.