NAME
report.pl - Summarize SenseClusters results with precision, recall, and confusion matrix
SYNOPSIS
report.pl [OPTIONS] LABEL PRELABEL
Type report.pl --help for a quick summary of options.
DESCRIPTION
Reports the performance of discrimination in terms of precision, recall, and a confusion table.
INPUT
Required Arguments:
LABEL
Output created by label.pl, showing the sense labels attached to the discovered clusters.
Sample LABEL files =>
- 1. report.pl minimally expects LABEL in this format:

    C0 -> fine%5:00:00:elegant:00
    C1 -> fine%3:00:00::
    C2 -> fine%5:00:00:superior:02
    C3 -> fine%5:00:00:satisfactory:00
    C4 -> fine%5:00:00:thin:01

report.pl reads only those lines of the LABEL file that contain a right arrow (->); all other lines are ignored.
Lines containing '->' should show the cluster id on the left of the arrow and a sense tag on the right.
- 2. The actual output of label.pl, which contains a descriptive header on the first line and the score of the mapping scheme on the last line:

    ClusterID -> SenseID
    0 -> fine%5:00:00:elegant:00
    1 -> fine%3:00:00::
    2 -> fine%5:00:00:superior:02
    3 -> fine%5:00:00:satisfactory:00
    4 -> fine%5:00:00:thin:01
    Score = 60.00

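The line-filtering rule described above can be sketched in a few lines of Python. This is only an illustration, not the code used by report.pl: the helper name parse_label is hypothetical, and skipping the "ClusterID -> SenseID" header is an assumption about how the header should be treated.

```python
import re

# Hypothetical helper, not part of SenseClusters.
# A usable line has a cluster id left of '->' and a sense tag right of it.
ARROW_LINE = re.compile(r"^\s*(\S+)\s*->\s*(\S+)\s*$")


def parse_label(text):
    """Return a {cluster_id: sense_tag} dict from LABEL-style text."""
    mapping = {}
    for line in text.splitlines():
        match = ARROW_LINE.match(line)
        if match is None:
            continue  # no '->' on this line (e.g. 'Score = 60.00'): ignored
        cluster, sense = match.groups()
        if cluster == "ClusterID":
            continue  # assumption: the descriptive header is not a mapping
        mapping[cluster] = sense
    return mapping


sample = """ClusterID -> SenseID
0 -> fine%5:00:00:elegant:00
1 -> fine%3:00:00::
Score = 60.00
"""
labels = parse_label(sample)
```

Cluster ids are kept as strings so that both the "C0" and the "0" styles shown above pass through unchanged.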
PRELABEL
Output created by the cluto2label.pl program, showing the distribution of instances from each sense class in each of the clusters.
This distribution should be shown as a cluster-by-sense matrix in which the rows represent the clusters and the columns represent the senses. The cell entry at CS[i][j] shows the number of instances belonging to cluster Ci that have the true sense tag Sj.
e.g.
0
//phone cord txt div form
0 2 1 0 0
0 1 2 4 0
5 2 2 25 4
0 1 9 0 0
0 1 0 1 0
Note that -
- 1. 1st line shows the number of instances unclustered.
- 2. 2nd line starts with // and shows the sense labels of corresponding columns.
- 3. 3rd line and onwards show the cluster by sense distribution matrix.
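The three-part layout listed above can be read with a short Python sketch. It is a minimal illustration under the stated format only; the function name parse_prelabel and its error handling are hypothetical and are not taken from report.pl.

```python
def parse_prelabel(text):
    """Split PRELABEL-style text into (unclustered, senses, matrix).

    Hypothetical helper: line 1 is the unclustered count, line 2 starts
    with '//' and lists the sense labels, remaining lines are matrix rows.
    """
    lines = [ln.strip() for ln in text.splitlines() if ln.strip()]
    unclustered = int(lines[0])
    if not lines[1].startswith("//"):
        raise ValueError("second line must start with '//'")
    senses = lines[1][2:].split()
    matrix = [[int(cell) for cell in row.split()] for row in lines[2:]]
    for row in matrix:
        if len(row) != len(senses):
            raise ValueError("row width does not match the number of senses")
    return unclustered, senses, matrix


example = """0
//phone cord txt div form
0 2 1 0 0
0 1 2 4 0
5 2 2 25 4
0 1 9 0 0
0 1 0 1 0
"""
unclustered, senses, matrix = parse_prelabel(example)
```

With the example above, matrix[2][3] is 25: cluster C2 holds 25 instances whose true sense is div.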
Optional Arguments:
Other Options :
--help
Displays this message.
--version
Displays the version information.
OUTPUT
The output displays a confusion table whose rows represent the discovered clusters and whose columns represent the actual sense classes. The cell value at (i,j) is the number of instances belonging to cluster Ci that have the true sense id Sk, where Sk is the label at the top of the jth column. Columns are reordered so that the sense in the rth column is the one that most accurately represents the rth cluster Cr; the diagonal value at (r,r) then shows the number of instances in the rth cluster that belong to their correct sense class.
When #clusters > #senses, clusters that are not assigned a sense tag are marked with a star (*). When #senses > #clusters, senses that are not assigned to any cluster are marked with a hash (#).
The sum of the diagonal entries is the total number of instances that are correctly discriminated (#hits). From this number, report.pl computes precision and recall, where
precision = #hits / #clustered
#clustered = number of instances clustered = total #instances - #instances that belong to the unlabelled clusters - #thrown as shown in the PRELABEL input file.
recall = #hits / #total instances
Sample Output :
           S1      S2      S0      S3   TOTAL
C0:       221      11       3      15     250   (5.71)
C1:       295     395     448     144    1282  (29.28)
C6:       430     233     441      68    1172  (26.77)
C9:       145      44     149     105     443  (10.12)
C2:*        0       1     135       2     138   (3.15)
C3:*      138       4       4       2     148   (3.38)
C4:*        0       0     182       0     182   (4.16)
C5:*        2       6     150       6     164   (3.75)
C7:*       41     159      99      97     396   (9.05)
C8:*        0       0     203       0     203   (4.64)
         1272     853    1814     439    4378
      (29.05) (19.48) (41.43) (10.03)
Precision = 36.92(1162/3147)
Recall = 26.54(1162/4378+0)
Legend of Sense Tags
S0 = SERVE10
S1 = SERVE12
S2 = SERVE2
S3 = SERVE6
The sample output above shows:
- 1. 10 clusters (C0-C9) and 4 senses (S0-S3).
- 2. Cluster C0 represents sense S1, which stands for the actual sense SERVE12; C1 represents S2 (SERVE2), C6 represents S0 (SERVE10), and C9 represents S3 (SERVE6).
- 3. This maximal mapping gives a precision of 36.92% and a recall of 26.54%: 1162 of the total 4378 instances are correctly discriminated.
- 4. The last two columns show the total number and percentage of instances in each cluster (row marginal totals), while the last two rows show the total number and percentage of instances in each sense class (column marginal totals).
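The figures in the sample output can be rechecked with a small Python sketch that applies the precision and recall definitions to the table. The function name score is hypothetical, and reading the "+0" on the Recall line as the number of thrown (unclustered) instances is an assumption.

```python
# Rows copied from the sample output; columns are in the order S1 S2 S0 S3.
labelled = [
    [221, 11, 3, 15],      # C0 -> S1
    [295, 395, 448, 144],  # C1 -> S2
    [430, 233, 441, 68],   # C6 -> S0
    [145, 44, 149, 105],   # C9 -> S3
]
unlabelled = [             # star-marked clusters with no sense assigned
    [0, 1, 135, 2],        # C2
    [138, 4, 4, 2],        # C3
    [0, 0, 182, 0],        # C4
    [2, 6, 150, 6],        # C5
    [41, 159, 99, 97],     # C7
    [0, 0, 203, 0],        # C8
]
thrown = 0                 # assumption: the "+0" on the Recall line


def score(labelled, unlabelled, thrown):
    """Hypothetical helper: hits, clustered, total, precision %, recall %."""
    hits = sum(labelled[r][r] for r in range(len(labelled)))  # diagonal sum
    clustered = sum(sum(row) for row in labelled)
    total = clustered + sum(sum(row) for row in unlabelled) + thrown
    return hits, clustered, total, 100.0 * hits / clustered, 100.0 * hits / total


hits, clustered, total, precision, recall = score(labelled, unlabelled, thrown)
print(f"Precision = {precision:.2f}({hits}/{clustered})")  # Precision = 36.92(1162/3147)
print(f"Recall = {recall:.2f}({hits}/{total})")            # Recall = 26.54(1162/4378)
```

Only the four labelled rows count towards #clustered, which is why the precision denominator (3147) is smaller than the recall denominator (4378).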
AUTHORS
Ted Pedersen, University of Minnesota, Duluth
Amruta Purandare, University of Pittsburgh
Anagha Kulkarni, Carnegie-Mellon University
COPYRIGHT
Copyright (c) 2002-2008, Ted Pedersen, Amruta Purandare, Anagha Kulkarni
This program is free software; you can redistribute it and/or modify it under the terms of the GNU General Public License as published by the Free Software Foundation; either version 2 of the License, or (at your option) any later version.
This program is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for more details.
You should have received a copy of the GNU General Public License along with this program; if not, write to
The Free Software Foundation, Inc.,
59 Temple Place - Suite 330,
Boston, MA 02111-1307, USA.