NAME

comparePartitions.pl - Script to compare set partitions.

SYNOPSIS

comparePartitions.pl [-f fileP fileQ -t ',' -h -c]

DESCRIPTION

The script comparePartitions.pl computes the accuracy and precision of the set partitions stored in the files fileP and fileQ.
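As a rough illustration of the accuracy measure, the sketch below assumes that accuracy means Rand's index (the reference listed under SEE ALSO): the fraction of element pairs that the two partitions treat the same way, either together in both or apart in both. The function name rand_accuracy is hypothetical and this is not the script's Perl code. Precision is not defined in this document, so it is not sketched.

```python
from itertools import combinations

def rand_accuracy(p, q):
    """Fraction of element pairs on which partitions p and q agree (Rand's index)."""
    # Map each element to the index of the subset that contains it.
    label_p = {x: i for i, block in enumerate(p) for x in block}
    label_q = {x: i for i, block in enumerate(q) for x in block}
    if label_p.keys() != label_q.keys():
        raise ValueError("partitions must cover the same elements")
    pairs = list(combinations(sorted(label_p), 2))
    if not pairs:
        # Fewer than two elements: returning 1.0 is an assumed convention.
        return 1.0
    # A pair agrees when it is together in both partitions or apart in both.
    agree = sum(
        (label_p[a] == label_p[b]) == (label_q[a] == label_q[b])
        for a, b in pairs
    )
    return agree / len(pairs)
```

Both arguments are lists of sets that cover the same elements; identical partitions score 1.0.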

OPTIONS

-f fileP fileQ

The option -f specifies the two files containing the partitions to be compared. Each line of a file is treated as one subset of the partition, with its elements stored as comma-separated values. The module Text::CSV is used to parse each line. The files must be in UTF-8 format.

The two partitions must cover the same set of elements to be compared properly. Elements that appear in only one partition are added to the other partition as singleton subsets. For example, if fileP and fileQ contained the lines indicated below

           fileP      fileQ
           -----      -----
line 1     a,b,c      a,b
line 2     d,e,f      c,d
line 3                g,h

then the singleton sets {g} and {h} are added to partition P making it equal {{a,b,c}, {d,e,f}, {g}, {h}} and similarly, the sets {e} and {f} are added to Q making it equal {{a,b}, {c,d}, {e}, {f}, {g,h}}.
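The padding step in this example could be sketched as follows. This is a minimal Python illustration, not the script's Perl code; the function name pad_with_singletons is hypothetical, and partitions are assumed to be held as lists of sets.

```python
def pad_with_singletons(p, q):
    """Return copies of p and q extended so that both cover the same elements."""
    elems_p = set().union(*p)
    elems_q = set().union(*q)
    # Elements seen only in the other partition become singleton subsets.
    padded_p = [set(b) for b in p] + [{x} for x in sorted(elems_q - elems_p)]
    padded_q = [set(b) for b in q] + [{x} for x in sorted(elems_p - elems_q)]
    return padded_p, padded_q

# The fileP and fileQ contents from the example above.
p = [{"a", "b", "c"}, {"d", "e", "f"}]
q = [{"a", "b"}, {"c", "d"}, {"g", "h"}]
padded_p, padded_q = pad_with_singletons(p, q)
```

Here padded_p gains {g} and {h}, and padded_q gains {e} and {f}; the order of subsets within a partition carries no meaning.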

-t ','

Use option -t to set the delimiter to use in the CSV files fileP and fileQ; the default delimiter is a comma.
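To make the effect of the delimiter concrete, here is a small Python sketch that reads one subset per line using a configurable delimiter. It uses Python's standard csv module as a stand-in for Text::CSV; the function name read_partition and the handling of blank lines are assumptions, not behavior documented for the script.

```python
import csv
import io

def read_partition(stream, delimiter=","):
    """Read one subset per line from a delimited text stream."""
    reader = csv.reader(stream, delimiter=delimiter)
    # Skip blank lines; treating them as absent subsets is an assumption.
    return [set(row) for row in reader if row]

# Two subsets stored with ';' as the delimiter instead of the default comma.
partition = read_partition(io.StringIO("a;b;c\nd;e;f\n"), delimiter=";")
```

For a file on disk, the stream could be opened with open(path, encoding="utf-8", newline="") to match the UTF-8 requirement.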

-c

If option -c is present, the subsets in each partition are checked to ensure they are disjoint. If they are not, an exception is thrown.
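A disjointness check of this kind can be sketched in a few lines of Python. The function name check_disjoint and the choice of ValueError are hypothetical; the script's actual exception and message are not specified here.

```python
def check_disjoint(partition):
    """Raise ValueError if any element appears in more than one subset."""
    seen = set()
    for block in partition:
        overlap = seen & set(block)
        if overlap:
            raise ValueError(
                f"subsets are not disjoint; repeated elements: {sorted(overlap)}"
            )
        seen |= set(block)
```

The check makes a single pass over the subsets and remembers every element seen so far, so it reports the first subset that repeats an earlier element.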

-h

Causes this documentation to be printed.

OUTPUT

If there are no errors, the output is a single comma-separated line of the form accuracy,precision,fileP,fileQ.

ERRORS

With option -c, an exception is thrown if the subsets of either partition are not disjoint.

INSTALLATION

To install the module, run the following commands:

perl Makefile.PL
make
make test
make install

On Windows, use 'nmake' rather than 'make'.

BUGS

Please email bug reports or feature requests to bug-set-partitions-similarities@rt.cpan.org, or submit them through the web interface at http://rt.cpan.org/NoAuth/ReportBug.html?Queue=Set-Partitions-Similarity. The author will be notified, and you can be notified automatically of progress on the bug fix or feature request.

AUTHOR

Jeff Kubina <jeff.kubina@gmail.com>

COPYRIGHT

Copyright (c) 2009 Jeff Kubina. All rights reserved. This program is free software; you can redistribute it and/or modify it under the same terms as Perl itself.

The full text of the license can be found in the LICENSE file included with this module.

KEYWORDS

accuracy, clustering, measure, metric, partitions, precision, set, similarity

SEE ALSO

Concise explanations of many cluster validity measures (including set partition measures) are available on the Cluster validity algorithms page of the Machaon Clustering and Validation Environment web site by Nadia Bolshakova.

The Wikipedia article Accuracy and precision has a good explanation of the accuracy and precision measures when applied to binary classifications.

The report Objective Criteria for the Evaluation of Clustering Methods (1971) by W.M. Rand in the Journal of the American Statistical Association provides an excellent analysis of the accuracy measure of partitions.

Math::Set::Partitions::Similarity, Text::CSV