NAME

Text::SenseClusters::LabelEvaluation::ConfusionMatrixTotalCalc - Module responsible for processing the decision matrix.

DESCRIPTION

This module provides two functions. The first function calculates the probability decision matrix from the scores of the original decision matrix. The second function then uses the new decision matrix to decide whether labels are appropriately assigned or not.

function: printCalculatedScoreMatrix

The following function is responsible for printing the calculated score 
matrix from the decision matrix.

@argument1	: outputFileHandle:  	DataType(File Handle)
				This is the file handle that defines where the output
				messages/statements of this module are printed.
				Its default value is: STDERR.
				 
@argument2	: clusterNameArrayRef:  	DataType(Reference_Of_Array)
				Reference to Array containing Cluster Name.
				
@argument3	: standardTermsArrayRef:  	DataType(Reference_Of_Array)  
				Reference to Array containing Standard terms.
				 
@argument4	: hashForClusterTopicScoreRef:  DataType(Reference_Of_Hash)
				Reference to hash containing Cluster Name, corresponding 
				StandardTopic and its score.
				
@argument5	: topicTotalSumHashRef:  DataType(Reference_Of_Hash)
				Reference to hash which will contain the total score for a
				topic against each cluster.
				
@argument6	: clusterTotalSumHashRef:  DataType(Reference_Of_Hash)
				Reference to hash which will contain the total score for a
				cluster against each topic.

@argument7	: isDecisionMatrixDebugOn:  DataType(number 0 or 1)
				Verbose: This decides whether to print detailed output or not.


@return		: SimilarityScore
			  This indicates the similarity score of the labels and actual
			  topics which are correctly identified by SenseClusters 
			  or a similar application.

@description	:

This function is responsible for the calculated decision matrix, which has the following form:

Calculated Decision MATRIX:

	=========================================================
	                |    Cluster0     |    Cluster1     |
	---------------------------------------------------------
	Bill Clinton:   |     0.478       |     0.522       |
	---------------------------------------------------------
	Tony Blair:     |     0.625       |     0.375       |
	=========================================================


 Where, 1) Cluster0, Cluster1 are Cluster Names (Column Header).
		 2) Bill Clinton, Tony Blair are Standard Topics (Row Header).
		 3) Cell content is the probability measure which indicates the
		    likelihood of a cluster's label against a Topic.
		    

 Steps:
 		1. First, it iterates through the hash '%hashForClusterTopicScore'.
 		2. It divides each cluster-topic overlapping score by the total 
 		   count value of the decision matrix. 
 		3. This gives the normalized score.
 		4. Based on the user input for Verbose, it displays the normalized 
 		   decision matrix.
 		5. It then calls the function 'concludingFromDecisionMatrix', 
 		   which uses the normalized decision matrix to conclude 
 		   		a) which cluster's labels match which Gold-Standard
 		   		   topic's data.
 		   		b) which Gold-Standard topic's data match which
 		   		   cluster's labels.
 		6. Finally, it compares the cluster-wise results with the topic-wise 
 		   results to conclude the final cluster-topic matches along with
 		   their matching scores.
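
Steps 1 to 3 above can be sketched roughly as follows. This is only an
illustration, not the module's actual code: the helper name
'normalize_by_grand_total', the sample scores, and the assumed hash layout
($score{cluster}{topic}) are all hypothetical.

```perl
use strict;
use warnings;

# Hypothetical raw overlap scores, keyed as $score{cluster}{topic}.
# The key layout is an assumption made for this sketch.
my %hashForClusterTopicScore = (
    Cluster0 => { 'Bill Clinton' => 2, 'Tony Blair' => 3 },
    Cluster1 => { 'Bill Clinton' => 4, 'Tony Blair' => 1 },
);

# Divide every cell by the grand total of the matrix (steps 2 and 3).
sub normalize_by_grand_total {
    my ($scoreRef) = @_;

    my $total = 0;
    for my $cluster (keys %$scoreRef) {
        $total += $_ for values %{ $scoreRef->{$cluster} };
    }

    my %normalized;
    return \%normalized if $total == 0;    # nothing to normalize

    for my $cluster (keys %$scoreRef) {
        for my $topic (keys %{ $scoreRef->{$cluster} }) {
            $normalized{$cluster}{$topic} =
                $scoreRef->{$cluster}{$topic} / $total;
        }
    }
    return \%normalized;
}

my $normalizedRef = normalize_by_grand_total(\%hashForClusterTopicScore);

# Verbose-style listing of the normalized cells (step 4).
for my $cluster (sort keys %$normalizedRef) {
    for my $topic (sort keys %{ $normalizedRef->{$cluster} }) {
        printf STDERR "%-10s %-14s %.3f\n",
            $cluster, $topic, $normalizedRef->{$cluster}{$topic};
    }
}
```

With this kind of normalization all cells of the matrix together sum to 1,
rather than each row or each column summing to 1.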

function: concludingFromDecisionMatrix

The following function is responsible for drawing conclusions from the 
normalized decision matrix, i.e. deciding which cluster's labels match 
which Gold-Standard topic.

@argument1	: hashForClusterTopicScoreRef:  DataType(Reference_Of_Hash)
				Reference to hash containing Cluster Name, corresponding 
				StandardTopic and its score.
@argument2	: topicTotalSumHashRef:  DataType(Reference_Of_Hash)
				Reference to hash which will contain the total score for a
				topic against each cluster.
@argument3	: clusterTotalSumHashRef:  DataType(Reference_Of_Hash)
				Reference to hash which will contain the total score for a
				cluster against each topic.
@argument4	: directClusterTopicHashRef:  DataType(Reference_Of_Hash)
				HashOfHash to store the conclusion of the Direct calculation, 
				row-wise, i.e. a topic (OuterKey) score against each 
				cluster (InnerKey).
@argument5	: directTopicClusterHashRef:  DataType(Reference_Of_Hash)
				HashOfHash to store the conclusion of the Direct calculation, 
				column-wise, i.e. a cluster (OuterKey) score against 
				each topic (InnerKey).


@return1	: directClusterTopicHashRef:  DataType(Reference_Of_Hash)
				HashOfHash which stores the conclusion of the calculation, 
				row-wise, i.e. a topic (OuterKey) score against each 
				cluster (InnerKey).
@return2	: directTopicClusterHashRef:  DataType(Reference_Of_Hash)
				HashOfHash which stores the conclusion of the calculation, 
				column-wise, i.e. a cluster (OuterKey) score against 
				each topic (InnerKey).

@description :

				The following block of code is responsible for 
				1. Calculating the probabilities (normalized values) of all the
					topics against a cluster. 
				2. Choosing the topic which has the maximum probability
					(normalized value) for the given cluster.
				3. In the current approach, the probability (normalized
					value) is calculated by dividing the similarity score of a
					topic against a cluster by the total similarity score of all
					the topics against all the clusters.
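
Item 2 above amounts to taking, for each cluster, the topic with the
largest normalized value. A minimal sketch, assuming the normalized matrix
is stored as $matrix{cluster}{topic}; the helper name
'best_topic_per_cluster' and the sample values are hypothetical and not
taken from the module:

```perl
use strict;
use warnings;

# Hypothetical normalized matrix, keyed as $matrix{cluster}{topic}.
my %normalized = (
    Cluster0 => { 'Bill Clinton' => 0.2, 'Tony Blair' => 0.3 },
    Cluster1 => { 'Bill Clinton' => 0.4, 'Tony Blair' => 0.1 },
);

# For each cluster, pick the topic with the highest normalized score.
sub best_topic_per_cluster {
    my ($matrixRef) = @_;
    my %best;
    for my $cluster (keys %$matrixRef) {
        my ($bestTopic, $bestScore);
        # Topic names are sorted so that ties resolve deterministically
        # (the first name in sort order wins).
        for my $topic (sort keys %{ $matrixRef->{$cluster} }) {
            my $score = $matrixRef->{$cluster}{$topic};
            if (!defined $bestScore || $score > $bestScore) {
                ($bestTopic, $bestScore) = ($topic, $score);
            }
        }
        $best{$cluster} = { topic => $bestTopic, score => $bestScore };
    }
    return \%best;
}

my $bestRef = best_topic_per_cluster(\%normalized);
for my $cluster (sort keys %$bestRef) {
    printf STDERR "%s -> %s (%.3f)\n",
        $cluster, $bestRef->{$cluster}{topic}, $bestRef->{$cluster}{score};
}
```

The same loop with the two key levels swapped would give the best cluster
for each topic.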

 
				 Future enhancement:
				 4. The above approach can be done in two ways, i.e. using the
				 	direct way as well as the inverse way.
				 5. In the direct approach, the probability is calculated by
				 	dividing the similarity score of a topic against a
				 	cluster by the total similarity score of all the topics
				 	against that cluster.
				 6. In the inverse approach, the probability is calculated by
				 	dividing the similarity score of a topic against a
				 	cluster by the total similarity score of all the clusters
				 	against that topic.
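
The direct and inverse approaches described in items 5 and 6 could look
roughly like this. Since they are listed as a future enhancement, this is
purely an illustrative sketch: the helper name 'direct_and_inverse', the
sample scores, and the hash layouts are assumptions.

```perl
use strict;
use warnings;

# Hypothetical raw scores, keyed as $score{cluster}{topic}.
my %score = (
    Cluster0 => { 'Bill Clinton' => 2, 'Tony Blair' => 3 },
    Cluster1 => { 'Bill Clinton' => 4, 'Tony Blair' => 1 },
);

sub direct_and_inverse {
    my ($scoreRef) = @_;

    # Per-cluster and per-topic totals.
    my (%clusterTotal, %topicTotal);
    for my $cluster (keys %$scoreRef) {
        for my $topic (keys %{ $scoreRef->{$cluster} }) {
            my $s = $scoreRef->{$cluster}{$topic};
            $clusterTotal{$cluster} += $s;
            $topicTotal{$topic}     += $s;
        }
    }

    my (%direct, %inverse);
    for my $cluster (keys %$scoreRef) {
        for my $topic (keys %{ $scoreRef->{$cluster} }) {
            my $s = $scoreRef->{$cluster}{$topic};
            # Direct: divide by the total of all topics for this cluster.
            $direct{$cluster}{$topic} =
                $clusterTotal{$cluster} ? $s / $clusterTotal{$cluster} : 0;
            # Inverse: divide by the total of all clusters for this topic.
            $inverse{$topic}{$cluster} =
                $topicTotal{$topic} ? $s / $topicTotal{$topic} : 0;
        }
    }
    return (\%direct, \%inverse);
}

my ($directRef, $inverseRef) = direct_and_inverse(\%score);
```

In the direct result the values for each cluster sum to 1, while in the
inverse result the values for each topic sum to 1.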

SEE ALSO

http://senseclusters.cvs.sourceforge.net/viewvc/senseclusters/LabelEvaluation/

@Last modified by : Anand Jha @Last_Modified_Date : 24th Dec. 2012 @Modified Version : 1.6

AUTHORS

Ted Pedersen, University of Minnesota, Duluth
tpederse at d.umn.edu

Anand Jha, University of Minnesota, Duluth
jhaxx030 at d.umn.edu

COPYRIGHT AND LICENSE

Copyright (C) 2012 Ted Pedersen, Anand Jha

See http://dev.perl.org/licenses/ for more information.

This program is free software; you can redistribute it and/or modify it under the terms of the GNU General Public License as published by the Free Software Foundation; either version 2 of the License, or (at your option) any later version.

This program is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for more details.

You should have received a copy of the GNU General Public License along with this program; if not, write to:

The Free Software Foundation, Inc., 59 Temple Place, Suite 330, 
Boston, MA  02111-1307  USA