NAME
label.pl - Assign labels to clusters in a confusion matrix to maximize agreement
SYNOPSIS
label.pl [OPTIONS] PRELABEL
Type label.pl --help for a quick summary of options.
DESCRIPTION
Labels the discovered clusters with sense tags such that the maximum number of contexts is correctly assigned.
INPUT
Required Arguments:
PRELABEL
Should be the output of cluto2label.pl.
Sample CLUTO2LABEL format
2
// cord phone text div
C0: 4 3 0 0
C1: 2 2 2 2
C2: 1 3 3 2
where the 1st line shows the number of unclustered instances (here, 2).
The 2nd line shows a space-separated list of sense classes, starting with the // marker.
Each line thereafter shows the sense distribution of the instances belonging to one discovered cluster, so that the rows together form a cluster-by-sense distribution matrix. The cell value at (i,j) in the matrix shows the number of instances belonging to cluster Ci that have the sense tag Sj.
Note that each row begins with the cluster id followed by a colon (:). Also, the number of sense classes on the 2nd line should be the same as the number of columns in the cluster-by-sense distribution matrix.
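The format rules above can be illustrated with a short reader. This is a minimal Python sketch, not the code used by label.pl; the function name parse_prelabel and the returned tuple layout are hypothetical and exist only for illustration.

```python
def parse_prelabel(text):
    """Parse prelabel text into (unclustered, senses, cluster_ids, matrix).

    Hypothetical helper: it follows the format described above and is
    not taken from label.pl itself.
    """
    lines = [ln.strip() for ln in text.splitlines() if ln.strip()]

    # 1st line: number of unclustered instances.
    unclustered = int(lines[0])

    # 2nd line: sense classes, introduced by the // marker.
    if not lines[1].startswith("//"):
        raise ValueError("2nd line must start with //")
    senses = lines[1][2:].split()

    # Remaining lines: "<cluster id>: <one count per sense class>".
    cluster_ids, matrix = [], []
    for ln in lines[2:]:
        cid, sep, counts = ln.partition(":")
        if not sep:
            raise ValueError("row has no colon after the cluster id: " + ln)
        row = [int(x) for x in counts.split()]
        if len(row) != len(senses):
            raise ValueError("row " + cid.strip() + " does not have one column per sense")
        cluster_ids.append(cid.strip())
        matrix.append(row)
    return unclustered, senses, cluster_ids, matrix


# The sample shown above.
SAMPLE = """2
// cord phone text div
C0: 4 3 0 0
C1: 2 2 2 2
C2: 1 3 3 2
"""

unclustered, senses, cluster_ids, matrix = parse_prelabel(SAMPLE)
print(unclustered, senses, cluster_ids, matrix)
```

The column-count check mirrors the requirement that the number of sense classes on the 2nd line match the number of columns in every row.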
Optional Arguments:
--help
Displays this message.
--version
Displays the version information.
OUTPUT
Output shows the sense label attached to each of the discovered clusters, along with a score. The score is the percentage of the total number of instances that are correctly clustered if the clusters are tagged with the suggested sense labels.
Example:
Prelabel file =>
0
// cord divi form phon prod text
C0: 35 26 44 18 23 43
C1: 64 34 50 43 57 52
C2: 0 3 1 2 0 3
C3: 0 0 2 31 0 0
C4: 1 28 0 4 6 0
C5: 0 9 3 2 14 2
Label Output =>
ClusterID -> SenseID
C0 -> form
C1 -> cord
C2 -> text
C3 -> phon
C4 -> divi
C5 -> prod
Score = 30.67
shows that
cluster C0 represents the 'form' sense
cluster C1 represents the 'cord' sense
cluster C2 represents the 'text' sense
cluster C3 represents the 'phon' sense
cluster C4 represents the 'divi' sense
and cluster C5 represents the 'prod' sense
Also, 30.67% of the total instances (184 of 600) are in their correct sense classes if the clusters are tagged with this labeling scheme.
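The labeling and score in this example can be reproduced with an exhaustive search over one-to-one cluster-to-sense assignments. The Python sketch below is only an illustration of that idea and is not necessarily how label.pl computes its result. It assumes there are no more clusters than senses, and that the score is taken over the instances in the matrix; the unclustered count is 0 in this example, so its effect on the denominator is not exercised here. The name best_labeling is hypothetical.

```python
from itertools import permutations


def best_labeling(matrix):
    """Return (sense index chosen for each cluster, score in percent).

    Brute-force sketch: tries every one-to-one mapping of clusters to
    senses and keeps the one with the most agreeing instances.
    Assumes len(matrix) <= number of sense columns.
    """
    n_clusters = len(matrix)
    n_senses = len(matrix[0])
    best_hits, best_perm = -1, None
    for perm in permutations(range(n_senses), n_clusters):
        # perm[i] is the sense column assigned to cluster i.
        hits = sum(matrix[i][j] for i, j in enumerate(perm))
        if hits > best_hits:
            best_hits, best_perm = hits, perm
    total = sum(sum(row) for row in matrix)
    score = 100.0 * best_hits / total if total else 0.0
    return list(best_perm), score


# Data copied from the prelabel file in the example above.
SENSES = ["cord", "divi", "form", "phon", "prod", "text"]
MATRIX = [
    [35, 26, 44, 18, 23, 43],  # C0
    [64, 34, 50, 43, 57, 52],  # C1
    [0, 3, 1, 2, 0, 3],        # C2
    [0, 0, 2, 31, 0, 0],       # C3
    [1, 28, 0, 4, 6, 0],       # C4
    [0, 9, 3, 2, 14, 2],       # C5
]

labels, score = best_labeling(MATRIX)
for i, j in enumerate(labels):
    print("C%d -> %s" % (i, SENSES[j]))
print("Score = %.2f" % score)  # prints "Score = 30.67"
```

Exhaustive search is practical only for a small number of clusters, since the number of assignments grows factorially; an assignment-problem solver such as the Hungarian algorithm would be the usual choice for larger matrices.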
AUTHORS
Ted Pedersen, University of Minnesota, Duluth
tpederse at d.umn.edu
Amruta Purandare, University of Pittsburgh
Anagha Kulkarni, Carnegie Mellon University
COPYRIGHT
Copyright (c) 2002-2008, Ted Pedersen, Amruta Purandare, Anagha Kulkarni
This program is free software; you can redistribute it and/or modify it under the terms of the GNU General Public License as published by the Free Software Foundation; either version 2 of the License, or (at your option) any later version.
This program is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for more details.
You should have received a copy of the GNU General Public License along with this program; if not, write to
The Free Software Foundation, Inc.,
59 Temple Place - Suite 330,
Boston, MA 02111-1307, USA.