
NAME

frequency.pl - Display the distribution of senses in a Senseval-2 file

SYNOPSIS

Displays distribution of senses in a given Senseval-2 file.

USAGE

Usage: frequency.pl [OPTIONS] SOURCE

Type 'frequency.pl --help' for a quick summary of the options.

INPUT

Required Arguments:

SOURCE

SOURCE should be a Senseval-2 formatted file. Sense ids are found by matching the regex /sense\s*id="S"/.

An instance that has multiple sense ids should appear only once, with one <answer> tag per sense id. For example, if an instance IID has 2 sense ids, SID1 and SID2, then instance IID should be formatted in the SOURCE file as:

 <instance id="IID"> 
 <answer instance="IID" senseid="SID1"/>
 <answer instance="IID" senseid="SID2"/>
 <context>
        Context Data comes here ....
 </context>
 </instance>
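
As an illustration only, the following Python sketch (not the Perl code of frequency.pl) shows how the sense ids of such an instance could be collected with the regexes quoted in this document. The function name sense_ids_by_instance is hypothetical, and the sketch assumes that each tag sits on its own line, as in the example above.

```python
import re
from collections import defaultdict

# Patterns modelled on the regexes quoted in this document.
INSTANCE_RE = re.compile(r'instance id="([^"]+)"')
SENSE_RE = re.compile(r'sense\s*id="([^"]+)"')


def sense_ids_by_instance(text):
    """Map each instance id to the sense ids found in its <answer> tags.

    Hypothetical helper; assumes one tag per line.
    """
    senses = defaultdict(list)
    current = None
    for line in text.splitlines():
        inst = INSTANCE_RE.search(line)
        if inst:
            current = inst.group(1)
            senses[current]  # register the instance even without answers
        if current is not None:
            senses[current].extend(SENSE_RE.findall(line))
        if "</instance>" in line:
            current = None
    return dict(senses)


sample = '''
 <instance id="IID">
 <answer instance="IID" senseid="SID1"/>
 <answer instance="IID" senseid="SID2"/>
 <context>
        Context Data comes here ....
 </context>
 </instance>
'''

print(sense_ids_by_instance(sample))  # {'IID': ['SID1', 'SID2']}
```

Note that the instance pattern does not match the instance="IID" attribute of the <answer> tags, so each instance is registered only once.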

Optional Arguments:

--help

Displays this message.

--version

Displays the version information.

OUTPUT

Output displays

1. Total number of instances in SOURCE

Instances are counted by matching the regex /instance id=\"ID\"/ and counting each unique instance id once.

2. Total number of distinct sense tags found in SOURCE

These are searched by matching a regex /sense\s*id="S"/.

3. Sense Distribution

Output shows

<sense id="S" percent="P"/>

for each sense id found in SOURCE. P is the percentage frequency of the sense S.

4. % of Majority sense

This will be the highest sense percentage found in SOURCE.
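
The four quantities above could be computed roughly as in the following Python sketch. It is not the Perl code of frequency.pl. The function name sense_distribution and the toy input are hypothetical, and the choice of denominator (the total number of sense tags rather than the number of instances) is an assumption, since this document does not state it.

```python
import re
from collections import Counter

# Patterns modelled on the regexes quoted in this document.
INSTANCE_RE = re.compile(r'instance id="([^"]+)"')
SENSE_RE = re.compile(r'sense\s*id="([^"]+)"')


def sense_distribution(text):
    """Return (instances, distinct senses, percent per sense, majority %)."""
    instance_ids = set(INSTANCE_RE.findall(text))  # unique instance ids
    counts = Counter(SENSE_RE.findall(text))       # occurrences per sense id
    total_tags = sum(counts.values())
    # ASSUMPTION: each percentage is a share of all sense tags found.
    percents = {
        sense: 100.0 * n / total_tags for sense, n in counts.items()
    }
    majority = max(percents.values(), default=0.0)
    return len(instance_ids), len(counts), percents, majority


# Hypothetical toy input: four instances, three tagged A and one tagged B.
toy = "\n".join(
    f'<instance id="w.{i}">\n<answer instance="w.{i}" senseid="{s}"/>\n</instance>'
    for i, s in enumerate(["A", "A", "A", "B"], start=1)
)

n_instances, n_senses, percents, majority = sense_distribution(toy)
for sense in sorted(percents):
    print(f'<sense id="{sense}" percent="{percents[sense]:.2f}"/>')
print(f"Total Instances = {n_instances}")
print(f"Total Distinct Senses={n_senses}")
print(f"% of Majority Sense = {majority:.2f}")
```

With this toy input, sense A accounts for 75.00% and sense B for 25.00%, so the majority sense percentage is 75.00.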

Sample Output

 <sense id="begin%2:30:00::" percent="59.49"/>
 <sense id="begin%2:30:01::" percent="13.38"/>
 <sense id="begin%2:42:00::" percent="4.70"/>
 <sense id="begin%2:42:03::" percent="3.44"/>
 <sense id="begin%2:42:04::" percent="18.99"/>
 Total Instances = 548
 Total Distinct Senses=5
 Distribution={59.49,18.99,13.38,4.70,3.44}
 % of Majority Sense = 59.49

This output shows that SOURCE contains a total of 548 instances and 5 distinct senses.

The senses are distributed with the percentage frequencies

{59.49,18.99,13.38,4.70,3.44}

where the majority sense has a frequency of 59.49%.

The <sense> tags show the percentage frequency of each individual sense.
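
To make the relationship between the lines of the sample output concrete, the Python sketch below reads the <sense> lines back in and rebuilds the Distribution and majority values from them. It is an illustration only, not part of frequency.pl, and the pattern SENSE_LINE_RE is a hypothetical helper written for the output format shown above.

```python
import re

# Hypothetical pattern for the <sense .../> lines of the sample output.
SENSE_LINE_RE = re.compile(r'<sense id="([^"]+)" percent="([^"]+)"/>')

report = '''
 <sense id="begin%2:30:00::" percent="59.49"/>
 <sense id="begin%2:30:01::" percent="13.38"/>
 <sense id="begin%2:42:00::" percent="4.70"/>
 <sense id="begin%2:42:03::" percent="3.44"/>
 <sense id="begin%2:42:04::" percent="18.99"/>
'''

pairs = [(sid, float(pct)) for sid, pct in SENSE_LINE_RE.findall(report)]

# The Distribution line lists the same percentages in descending order.
distribution = sorted((pct for _, pct in pairs), reverse=True)

# The majority sense is the one with the highest percentage.
majority_sense, majority_percent = max(pairs, key=lambda pair: pair[1])

print("Distribution={" + ",".join(f"{p:.2f}" for p in distribution) + "}")
print(f"% of Majority Sense = {majority_percent:.2f} ({majority_sense})")
```

In this sample the five percentages add up to 100.00.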

AUTHOR

Amruta Purandare, Ted Pedersen. University of Minnesota, Duluth.

COPYRIGHT

Copyright (c) 2002-2005, Amruta Purandare, University of Pittsburgh. amruta@cs.pitt.edu Ted Pedersen, University of Minnesota, Duluth. tpederse@umn.edu

This program is free software; you can redistribute it and/or modify it under the terms of the GNU General Public License as published by the Free Software Foundation; either version 2 of the License, or (at your option) any later version.

This program is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for more details.

You should have received a copy of the GNU General Public License along with this program; if not, write to

The Free Software Foundation, Inc., 59 Temple Place - Suite 330, Boston, MA 02111-1307, USA.