NAME

label.pl - Assign labels to clusters in a confusion matrix to maximize agreement

SYNOPSIS

label.pl [OPTIONS] PRELABEL

Type label.pl --help for a quick summary of options.

DESCRIPTION

Labels the discovered clusters with sense tags so that the number of correctly assigned contexts is maximized.

INPUT

Required Arguments:

PRELABEL

Should be the output of cluto2label.pl.

Sample CLUTO2LABEL format

2
//	cord  phone   text   div
C0:	 4       3       0       0
C1:	 2       2       2       2
C2:	 1       3       3       2

where the 1st line shows the number of unclustered instances (here, 2).

The 2nd line shows a space-separated list of sense classes, starting with the // marker.

Each line thereafter shows the sense distribution of the instances belonging to each discovered cluster in the form of a cluster by sense distribution matrix. A cell value at (i,j) in the matrix shows the number of instances belonging to cluster Ci that have the sense tag Sj.

Note that each row begins with the cluster id followed by a colon (:). Also, the number of sense classes on the 2nd line should be the same as the number of columns in the cluster by sense distribution matrix.
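The format described above can be read with a few lines of code. The following is a minimal sketch in Python, not part of label.pl itself; the function name parse_prelabel and the variable SAMPLE are hypothetical, and the sketch assumes blank lines can be ignored and that the // marker is separated from the first sense name by whitespace.

```python
def parse_prelabel(text):
    """Parse PRELABEL text into (unclustered, senses, clusters, matrix).

    Hypothetical helper for illustration; it is not taken from label.pl.
    """
    # Assumption: blank lines carry no meaning and can be dropped.
    lines = [ln for ln in text.splitlines() if ln.strip()]

    # 1st line: number of unclustered instances.
    unclustered = int(lines[0])

    # 2nd line: the // marker followed by the sense classes.
    header = lines[1].split()
    if not header or header[0] != "//":
        raise ValueError("2nd line must start with the // marker")
    senses = header[1:]

    # Remaining lines: "<cluster id>: <count> <count> ..."
    clusters, matrix = [], []
    for ln in lines[2:]:
        cluster_id, colon, counts = ln.partition(":")
        if not colon:
            raise ValueError("row lacks a colon after the cluster id: " + ln)
        row = [int(tok) for tok in counts.split()]
        # The number of columns must match the number of sense classes.
        if len(row) != len(senses):
            raise ValueError("row %s has %d columns, expected %d"
                             % (cluster_id.strip(), len(row), len(senses)))
        clusters.append(cluster_id.strip())
        matrix.append(row)
    return unclustered, senses, clusters, matrix


# The sample shown above, with spaces in place of tabs.
SAMPLE = """\
2
//  cord  phone   text   div
C0:  4       3       0       0
C1:  2       2       2       2
C2:  1       3       3       2
"""

unclustered, senses, clusters, matrix = parse_prelabel(SAMPLE)
print(unclustered, senses, clusters)
```

Here matrix[i][j] holds the number of instances in cluster clusters[i] that carry the sense tag senses[j], matching the (i,j) cell convention described above.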

Optional Arguments:

--help

Displays this message.

--version

Displays the version information.

OUTPUT

The output shows the sense label attached to each of the discovered clusters, along with a score. The score is the percentage of the total number of instances that are correctly clustered if the clusters are tagged with the suggested sense labels.

Example :

Prelabel file =>

0
//      cord    divi    form    phon    prod    text
C0:     35      26      44      18      23      43
C1:     64      34      50      43      57      52
C2:     0       3       1       2       0       3
C3:     0       0       2       31      0       0
C4:     1       28      0       4       6       0
C5:     0       9       3       2       14      2

Label Output =>

ClusterID -> SenseID
C0 -> form
C1 -> cord
C2 -> text
C3 -> phon
C4 -> divi
C5 -> prod
Score = 30.67

shows that

cluster C0 represents the 'form' sense
cluster C1 represents the 'cord' sense
cluster C2 represents the 'text' sense
cluster C3 represents the 'phon' sense
cluster C4 represents the 'divi' sense
and cluster C5 represents the 'prod' sense

Also, 30.67% of the total instances (44 + 64 + 3 + 31 + 28 + 14 = 184 of 600) are in their correct sense classes if the clusters are tagged with this labeling scheme.
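The labeling in this example can be checked independently. The sketch below, in Python, tries every one-to-one assignment of sense labels to clusters and keeps the one with the most agreeing instances. It is an exhaustive search chosen for clarity, not a description of how label.pl computes its answer; the function name best_labeling is hypothetical. It assumes there are no more clusters than senses, and it takes the score denominator to be the sum of the matrix cells. How unclustered instances enter the score is not stated here, and this example has none.

```python
from itertools import permutations


def best_labeling(senses, clusters, matrix):
    """Return (labels, score) for the one-to-one labeling with most agreement.

    Hypothetical helper: brute force over all assignments, which is only
    practical for a small number of clusters and senses.
    """
    if len(clusters) > len(senses):
        raise ValueError("this sketch assumes no more clusters than senses")

    best_hits, best_cols = -1, ()
    # Each candidate assigns a distinct sense column to every cluster row.
    for cols in permutations(range(len(senses)), len(clusters)):
        hits = sum(matrix[i][j] for i, j in enumerate(cols))
        if hits > best_hits:
            best_hits, best_cols = hits, cols

    labels = {clusters[i]: senses[j] for i, j in enumerate(best_cols)}

    # Assumption: the denominator is the sum of all matrix cells.
    total = sum(sum(row) for row in matrix)
    score = 100.0 * best_hits / total if total else 0.0
    return labels, score


# Prelabel data from the example above.
SENSES = ["cord", "divi", "form", "phon", "prod", "text"]
CLUSTERS = ["C0", "C1", "C2", "C3", "C4", "C5"]
MATRIX = [
    [35, 26, 44, 18, 23, 43],
    [64, 34, 50, 43, 57, 52],
    [0, 3, 1, 2, 0, 3],
    [0, 0, 2, 31, 0, 0],
    [1, 28, 0, 4, 6, 0],
    [0, 9, 3, 2, 14, 2],
]

labels, score = best_labeling(SENSES, CLUSTERS, MATRIX)
for cluster in CLUSTERS:
    print("%s -> %s" % (cluster, labels[cluster]))
print("Score = %.2f" % score)
```

For this matrix the search gives each cluster the sense with the largest count in its row, and those senses happen to be distinct, so the result matches the Label Output listed above. With many clusters an exhaustive search becomes impractical, and an assignment-problem solver would be the usual substitute.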

AUTHORS

Ted Pedersen, University of Minnesota, Duluth
tpederse at d.umn.edu

Amruta Purandare, University of Pittsburgh

Anagha Kulkarni, Carnegie Mellon University

COPYRIGHT

Copyright (c) 2002-2008, Ted Pedersen, Amruta Purandare, Anagha Kulkarni

This program is free software; you can redistribute it and/or modify it under the terms of the GNU General Public License as published by the Free Software Foundation; either version 2 of the License, or (at your option) any later version.

This program is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for more details.

You should have received a copy of the GNU General Public License along with this program; if not, write to

The Free Software Foundation, Inc.,
59 Temple Place - Suite 330,
Boston, MA  02111-1307, USA.