NAME
comparePartitions.pl - Script to compare set partitions.
SYNOPSIS
comparePartitions.pl [-f fileP fileQ -t ',' -h -c]
DESCRIPTION
The script comparePartitions.pl computes the accuracy and precision of the set partitions stored in the files fileP and fileQ.
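As a rough illustration of what an accuracy score between two partitions looks like, the sketch below computes the Rand index: the fraction of element pairs that are either grouped together in both partitions or separated in both. It assumes that the accuracy reported by the script corresponds to the Rand index, which the reference to Rand's report under SEE ALSO suggests but this page does not state outright. The helper name rand_index is hypothetical and is not part of the script, and precision is left out because its exact definition is not given here.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of comparePartitions.pl): Rand index of two
# partitions over the same elements. Each partition is an array ref of
# array refs, one inner array per subset.
sub rand_index {
    my ($p, $q) = @_;
    my (%p_of, %q_of);

    # Map every element to the index of the subset that contains it.
    for my $i (0 .. $#$p) { $p_of{$_} = $i for @{ $p->[$i] } }
    for my $i (0 .. $#$q) { $q_of{$_} = $i for @{ $q->[$i] } }

    my @elements = sort keys %p_of;
    my ($agree, $pairs) = (0, 0);
    for my $i (0 .. $#elements - 1) {
        for my $j ($i + 1 .. $#elements) {
            my ($x, $y) = @elements[$i, $j];
            my $same_p = $p_of{$x} == $p_of{$y} ? 1 : 0;
            my $same_q = $q_of{$x} == $q_of{$y} ? 1 : 0;
            $agree++ if $same_p == $same_q;   # both together or both apart
            $pairs++;
        }
    }
    return $pairs ? $agree / $pairs : 1;
}

# The completed partitions P and Q from the -f example in OPTIONS.
my $p = [ [qw(a b c)], [qw(d e f)], ['g'], ['h'] ];
my $q = [ [qw(a b)], [qw(c d)], ['e'], ['f'], [qw(g h)] ];
print rand_index($p, $q), "\n";   # prints 0.75 (21 of 28 pairs agree)
```

The double loop visits every pair of elements, so the sketch is quadratic in the number of elements. That is fine for small inputs but is not meant to reflect how the script itself computes the measure.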
OPTIONS
-f fileP fileQ
The option -f specifies the two files containing the partitions to be compared. Each line of a file is treated as one subset of the partition, with its elements stored as comma-separated values. The module Text::CSV is used to parse each line. The files must be in UTF-8 format.
The two partitions must cover the same set of elements to be compared properly. Elements missing from either partition are added to it as singleton subsets. For example, if fileP and fileQ contained the lines shown below
        fileP  fileQ
        -----  -----
line 1  a,b,c  a,b
line 2  d,e,f  c,d
line 3         g,h
then the singleton sets {g} and {h} are added to partition P, making it equal {{a,b,c}, {d,e,f}, {g}, {h}}, and similarly, the sets {e} and {f} are added to Q, making it equal {{a,b}, {c,d}, {e}, {f}, {g,h}}.
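The singleton completion described above can be sketched in a few lines of Perl. This is only an illustration of the rule, not the script's actual code; the helper name complete_partitions and the array-of-arrays representation are assumptions made for the example.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of comparePartitions.pl): add to each
# partition, as singleton subsets, the elements found only in the other.
sub complete_partitions {
    my ($p, $q) = @_;    # each is an array ref of array refs

    # Record which elements each partition already contains.
    my %in_p = map { $_ => 1 } map { @$_ } @$p;
    my %in_q = map { $_ => 1 } map { @$_ } @$q;

    # Append the missing elements, sorted, one singleton subset each.
    my @new_p = (@$p, map { [$_] } sort grep { !$in_p{$_} } keys %in_q);
    my @new_q = (@$q, map { [$_] } sort grep { !$in_q{$_} } keys %in_p);
    return (\@new_p, \@new_q);
}

# The fileP and fileQ contents from the example above.
my ($p, $q) = complete_partitions(
    [ [qw(a b c)], [qw(d e f)] ],
    [ [qw(a b)], [qw(c d)], [qw(g h)] ],
);
print join(' ', map { '{' . join(',', @$_) . '}' } @$p), "\n";
# prints {a,b,c} {d,e,f} {g} {h}
print join(' ', map { '{' . join(',', @$_) . '}' } @$q), "\n";
# prints {a,b} {c,d} {g,h} {e} {f}
```

The new singletons are appended after the existing subsets, so the order of subsets in Q differs from the listing above; since a partition is a set of subsets, the order does not matter.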
-t ','
Use option -t to set the delimiter used in the CSV files fileP and fileQ; the default delimiter is a comma.
-c
If option -c is present, the subsets in each partition are checked to ensure they are disjoint. If they are not, an exception is thrown.
-h
Causes this documentation to be printed.
OUTPUT
If there are no errors, the output is the comma-separated line accuracy,precision,fileP,fileQ.
ERRORS
If option -c is present, the subsets in each partition are checked to ensure they are disjoint. If they are not, an exception is thrown.
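A disjointness check of this kind might look like the following sketch. It is not the script's own implementation: the helper name assert_disjoint and the wording of the error message are assumptions made for illustration.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of comparePartitions.pl): die if any
# element occurs in more than one subset of the given partition.
sub assert_disjoint {
    my ($partition, $name) = @_;    # array ref of array refs, label for messages
    my %subset_of;                  # element => index of the subset holding it
    for my $i (0 .. $#$partition) {
        for my $element (@{ $partition->[$i] }) {
            if (exists $subset_of{$element} && $subset_of{$element} != $i) {
                die "element '$element' occurs in more than one subset of $name\n";
            }
            $subset_of{$element} = $i;
        }
    }
    return 1;
}

assert_disjoint([ [qw(a b)], [qw(c d)] ], 'P');            # passes silently
eval { assert_disjoint([ [qw(a b)], [qw(b c)] ], 'Q') };   # b is in two subsets
print "caught: $@" if $@;
```

An element repeated within a single subset is not reported by this sketch, since it does not make two subsets overlap.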
INSTALLATION
To install the module run the following commands:
perl Makefile.PL
make
make test
make install
If you are on Windows, use 'nmake' rather than 'make'.
BUGS
Please email bug reports or feature requests to bug-set-partitions-similarities@rt.cpan.org, or submit them through the web interface at http://rt.cpan.org/NoAuth/ReportBug.html?Queue=Set-Partitions-Similarity. The author will be notified, and you can be automatically notified of progress on the bug fix or feature request.
AUTHOR
Jeff Kubina <jeff.kubina@gmail.com>
COPYRIGHT
Copyright (c) 2009 Jeff Kubina. All rights reserved. This program is free software; you can redistribute it and/or modify it under the same terms as Perl itself.
The full text of the license can be found in the LICENSE file included with this module.
KEYWORDS
accuracy, clustering, measure, metric, partitions, precision, set, similarity
SEE ALSO
Concise explanations of many cluster validity measures (including set partition measures) are available on the Cluster validity algorithms page of the Machaon Clustering and Validation Environment web site by Nadia Bolshakova.
The Wikipedia article Accuracy and precision has a good explanation of the accuracy and precision measures when applied to binary classifications.
The report Objective Criteria for the Evaluation of Clustering Methods (1971) by W.M. Rand in the Journal of the American Statistical Association provides an excellent analysis of the accuracy measure of partitions.