NAME
Text::SenseClusters::LabelEvaluation::ConfusionMatrixTotalCalc - Module responsible for processing the decision matrix.
DESCRIPTION
This module provides two functions. The first function calculates the probability decision matrix from the scores of the original decision matrix. The second function then uses the new decision matrix to decide whether the labels are appropriately assigned.
function: printCalculatedScoreMatrix
The following function is responsible for printing the calculated score
matrix from the decision matrix.
@argument1 : outputFileHandle: DataType(File Handler)
This is the file handle that determines where the
output messages/statements of this module are printed.
Its default value is: STDERR.
@argument2 : clusterNameArrayRef: DataType(Reference_Of_Array)
Reference to an array containing the cluster names.
@argument3 : standardTermsArrayRef: DataType(Reference_Of_Array)
Reference to Array containing Standard terms.
@argument4 : hashForClusterTopicScoreRef: DataType(Reference_Of_Hash)
Reference to hash containing Cluster Name, corresponding
StandardTopic and its score.
@argument5 : topicTotalSumHashRef: DataType(Reference_Of_Hash)
Hash which will contain the total score for a topic
against each cluster.
@argument6 : clusterTotalSumHashRef: DataType(Reference_Of_Hash)
Hash which will contain the total score for a cluster
against each topic.
@argument7 : $isDecisionMatrixDebugOn: DataType(number 0 or 1)
Verbose:: This decides whether to print detailed output or not.
@return : SimilarityScore
This indicates the similarity score between the labels and
the actual topics that were correctly identified by
SenseClusters or a similar application.
@description :
This function is responsible for the calculated decision matrix, which looks like:
Calculated Decision MATRIX:
=========================================================
              |      Cluster0      |      Cluster1      |
---------------------------------------------------------
Bill Clinton: |       0.478        |       0.522        |
---------------------------------------------------------
---------------------------------------------------------
Tony Blair:   |       0.625        |       0.375        |
---------------------------------------------------------
=========================================================
Where, 1) Cluster0, Cluster1 are Cluster Names, (Column Header).
2) Bill Clinton, Tony Blair are Standard Topics, (Row Header).
3) Cell content is the probability measure which indicates
likelihood of a cluster's label against a Topic.
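The layout above can be approximated with a few sprintf calls. The following is only a minimal sketch: the helper name formatDecisionMatrix, the column widths, and the cluster-then-topic nesting of the probability hash are assumptions made for illustration and are not taken from the module's source.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of the module): renders a matrix with
# topics as rows and clusters as columns, returning a list of text lines.
# Assumes $probRef->{cluster}{topic} holds the probability for each cell.
sub formatDecisionMatrix {
    my ($clusterNamesRef, $topicsRef, $probRef) = @_;

    # One 16-character label column plus 12 characters per cluster column.
    my $rule = '-' x (16 + 12 * @$clusterNamesRef);

    my @lines = (
        sprintf('%-15s|', '')
            . join('', map { sprintf(' %-10s|', $_) } @$clusterNamesRef),
        $rule,
    );

    for my $topic (@$topicsRef) {
        push @lines,
            sprintf('%-15s|', "$topic:")
            . join('', map { sprintf(' %-10.3f|', $probRef->{$_}{$topic}) }
                       @$clusterNamesRef),
            $rule;
    }
    return @lines;
}

# Values copied from the example matrix above.
my %probability = (
    Cluster0 => { 'Bill Clinton' => 0.478, 'Tony Blair' => 0.625 },
    Cluster1 => { 'Bill Clinton' => 0.522, 'Tony Blair' => 0.375 },
);

# STDERR is used here because it is the documented default output handle.
my $outputFileHandle = \*STDERR;
print {$outputFileHandle} "$_\n"
    for formatDecisionMatrix(
        [ 'Cluster0', 'Cluster1' ],
        [ 'Bill Clinton', 'Tony Blair' ],
        \%probability,
    );
```

Returning the lines instead of printing them directly keeps the formatting testable and lets the caller choose the file handle.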
Steps:
1. First, it will iterate through the hash, '%hashForClusterTopicScore'.
2. It will divide each cluster-topic overlapping score by the total
count value of the decision matrix.
3. This will give the normalized score.
4. Based on user input on Verbose, it will display the normalized
decision matrix.
5. It will then call the function 'concludingFromDecisionMatrix'
which will use the normalized decision matrix to conclude
a) which cluster's labels match which Gold-Standard-topic's
data.
b) which Gold-Standard-topic's data labels match which
cluster's labels.
6. Finally, it will compare the Clusterwise results with Topicwise
results to conclude final cluster-topic match results along with
their matching score.
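Steps 1 to 3 can be sketched as follows. This is an illustrative sketch, not the module's code: the helper name normalizeByGrandTotal, the raw scores, and the assumption that the score hash is nested as cluster name, then topic, then score are all hypothetical.

```perl
use strict;
use warnings;

# Hypothetical raw overlap scores, assumed layout: $score{cluster}{topic}.
my %hashForClusterTopicScore = (
    Cluster0 => { 'Bill Clinton' => 11, 'Tony Blair' => 15 },
    Cluster1 => { 'Bill Clinton' => 12, 'Tony Blair' => 9 },
);

# Hypothetical helper: divides every cell by the sum of all cells.
sub normalizeByGrandTotal {
    my ($scoreRef) = @_;

    # Step 1: iterate through the hash to get the total of the matrix.
    my $total = 0;
    for my $cluster (keys %$scoreRef) {
        $total += $_ for values %{ $scoreRef->{$cluster} };
    }

    # Steps 2 and 3: divide each score by that total (0 if the matrix is empty).
    my %normalized;
    for my $cluster (keys %$scoreRef) {
        for my $topic (keys %{ $scoreRef->{$cluster} }) {
            $normalized{$cluster}{$topic} =
                $total ? $scoreRef->{$cluster}{$topic} / $total : 0;
        }
    }
    return \%normalized;
}

my $normalizedRef = normalizeByGrandTotal(\%hashForClusterTopicScore);
for my $cluster (sort keys %$normalizedRef) {
    for my $topic (sort keys %{ $normalizedRef->{$cluster} }) {
        printf "%-10s %-14s %.3f\n",
            $cluster, $topic, $normalizedRef->{$cluster}{$topic};
    }
}
```

With this normalization all cells of the matrix sum to 1, rather than each row or each column.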
function: concludingFromDecisionMatrix
The following function is responsible for drawing conclusions from the
calculated score matrix.
@argument1 : hashForClusterTopicScoreRef: DataType(Reference_Of_Hash)
Reference to hash containing Cluster Name, corresponding
StandardTopic and its score.
@argument2 : topicTotalSumHashRef: DataType(Reference_Of_Hash)
Hash which will contain the total score for a topic
against each cluster.
@argument3 : clusterTotalSumHashRef: DataType(Reference_Of_Hash)
Hash which will contain the total score for a cluster
against each topic.
@argument4 : directClusterTopicHashRef: DataType(Reference_Of_Hash)
HashOfHash to store the conclusion of the direct calculation,
row-wise, i.e., a topic's (OuterKey) score against each
cluster (InnerKey).
@argument5 : directTopicClusterHashRef: DataType(Reference_Of_Hash)
HashOfHash to store the conclusion of the direct calculation,
column-wise, i.e., a cluster's (OuterKey) scores against
each topic (InnerKey).
@return1 : directClusterTopicHashRef: DataType(Reference_Of_Hash)
HashOfHash which stores the conclusion of the calculation,
row-wise, i.e., a topic's (OuterKey) score against each
cluster (InnerKey).
@return2 : directTopicClusterHashRef: DataType(Reference_Of_Hash)
HashOfHash which stores the conclusion of the calculation,
column-wise, i.e., a cluster's (OuterKey) scores against
each topic (InnerKey).
@description :
The following block of code is responsible for
1. Calculating the probabilities (normalized values) of all the
topics against a cluster.
2. Choosing the topic which has the maximum probability
(normalized value) for the given cluster.
3. In the current approach, for calculating the probability
(normalized value) we divide the similarity score of a
topic against a cluster by the total similarity score of all
the topics against all the clusters.
Future enhancement::
4. The above approach can be done in two ways, i.e. using the
direct way as well as the inverse way.
5. In direct approach, for calculating the probability we
will divide the similarity score of a topic against a
cluster with total similarity score of all the topics
against that cluster.
6. In inverse approach, for calculating the probability we
will divide the similarity score of a topic against a
cluster with total similarity score of all the clusters
against that topic.
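The direct and inverse approaches, together with the choice of the maximum-probability topic for a cluster, could look roughly like this. The helper names (normalizeDirect, normalizeInverse, bestTopicPerCluster), the sample scores, the cluster-then-topic hash layout, and the alphabetical tie-breaking are assumptions for illustration only; they do not describe the module's actual implementation.

```perl
use strict;
use warnings;

# Hypothetical similarity scores, assumed layout: $score{cluster}{topic}.
my %score = (
    Cluster0 => { 'Bill Clinton' => 11, 'Tony Blair' => 15 },
    Cluster1 => { 'Bill Clinton' => 12, 'Tony Blair' => 9 },
);

# Direct approach: divide each cell by the total of its cluster.
sub normalizeDirect {
    my ($scoreRef) = @_;
    my %prob;
    for my $cluster (keys %$scoreRef) {
        my $clusterTotal = 0;
        $clusterTotal += $_ for values %{ $scoreRef->{$cluster} };
        for my $topic (keys %{ $scoreRef->{$cluster} }) {
            $prob{$cluster}{$topic} = $clusterTotal
                ? $scoreRef->{$cluster}{$topic} / $clusterTotal
                : 0;
        }
    }
    return \%prob;
}

# Inverse approach: divide each cell by the total of its topic.
sub normalizeInverse {
    my ($scoreRef) = @_;
    my %topicTotal;
    for my $cluster (keys %$scoreRef) {
        for my $topic (keys %{ $scoreRef->{$cluster} }) {
            $topicTotal{$topic} += $scoreRef->{$cluster}{$topic};
        }
    }
    my %prob;
    for my $cluster (keys %$scoreRef) {
        for my $topic (keys %{ $scoreRef->{$cluster} }) {
            $prob{$cluster}{$topic} = $topicTotal{$topic}
                ? $scoreRef->{$cluster}{$topic} / $topicTotal{$topic}
                : 0;
        }
    }
    return \%prob;
}

# For each cluster, pick the topic with the highest probability.
# Ties are broken alphabetically (an arbitrary choice for this sketch).
sub bestTopicPerCluster {
    my ($probRef) = @_;
    my %best;
    for my $cluster (keys %$probRef) {
        my $row = $probRef->{$cluster};
        my ($top) = sort { $row->{$b} <=> $row->{$a} || $a cmp $b } keys %$row;
        $best{$cluster} = $top;
    }
    return \%best;
}

my $bestRef = bestTopicPerCluster(normalizeDirect(\%score));
print "$_ => $bestRef->{$_}\n" for sort keys %$bestRef;
```

The made-up scores were picked so that the inverse normalization gives 11/23, 12/23, 15/24 and 9/24, which round to the figures in the example matrix of printCalculatedScoreMatrix. That is only a property of these sample numbers, not a statement about which normalization the module applies.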
SEE ALSO
http://senseclusters.cvs.sourceforge.net/viewvc/senseclusters/LabelEvaluation/
@Last modified by : Anand Jha
@Last_Modified_Date : 24th Dec. 2012
@Modified Version : 1.6
AUTHORS
Ted Pedersen, University of Minnesota, Duluth
tpederse at d.umn.edu
Anand Jha, University of Minnesota, Duluth
jhaxx030 at d.umn.edu
COPYRIGHT AND LICENSE
Copyright (C) 2012 Ted Pedersen, Anand Jha
See http://dev.perl.org/licenses/ for more information.
This program is free software; you can redistribute it and/or modify it under the terms of the GNU General Public License as published by the Free Software Foundation; either version 2 of the License, or (at your option) any later version.
This program is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for more details.
You should have received a copy of the GNU General Public License along with this program; if not, write to:
The Free Software Foundation, Inc., 59 Temple Place, Suite 330,
Boston, MA 02111-1307 USA