NAME

README.Toolkit - Description of SenseClusters Toolkit directory structure

Toolkit Organization

This document describes the structure of the Toolkit directory and gives a brief idea of what each program does. Directory names end with a / (for example, preprocess/), while program names end with the .pl suffix. Everything described here is contained in the Toolkit/ directory. The entries are listed roughly in the order in which SenseClusters uses them.

Please review the flowcharts found in doc/Flowcharts for additional information.

preprocess/ (text preprocessing programs)

plain/ (processes input in plain text format)
- text2sval.pl - Convert simple plain text into Senseval-2 format
sval2/ (processes input in Senseval-2 format)
- balance.pl - Balances sense distribution in a Senseval-2 input file by removing some instances
- filter.pl - Removes instances associated with low frequency sense tags from Senseval-2 input
- frequency.pl - Displays frequency distribution of senses
- keyconvert.pl - Convert KEY file from Senseval-2 format to SenseClusters format
- maketarget.pl - Create a Perl regex for the target word by spotting all <head> tags in the given file
- prepare_sval2.pl - Prepare Senseval-2 data for experiments
- preprocess.pl - Tokenize and optionally split Senseval-2 input into training and test portions
- sval2plain.pl - Convert a Senseval-2 input file to plain text format
- windower.pl - Cut a window of context W words big around a target word in a given Senseval-2 input file
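
As a rough illustration of the windowing idea behind windower.pl, the sketch below keeps a target word plus W tokens on either side. It is written in Python for brevity and is not the toolkit's Perl code; the function name cut_window and the reading of W as "W words on each side" are assumptions made for this example.

```python
def cut_window(tokens, target_index, w):
    """Return the target token plus up to w tokens on each side of it.

    Hypothetical helper: assumes W counts tokens on EACH side of the target.
    """
    start = max(0, target_index - w)                # clamp at the start of the context
    end = min(len(tokens), target_index + w + 1)    # clamp at the end of the context
    return tokens[start:end]


context = "the river bank was flooded after heavy rain".split()
print(cut_window(context, context.index("bank"), 2))
```

When the target sits near the start or end of a context, the window is simply truncated rather than padded.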

count/ (Modify count.pl output from Text-NSP)

reduce-count.pl - Reduce the size of the Text-NSP output created with huge training data

matrix/ (Similarity matrix constructors)

bitsimat.pl - Create a similarity matrix for given bit vectors
simat.pl - Create a similarity matrix for given non-binary (integer or real) vectors
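
To make the notion of a similarity matrix concrete, the sketch below builds one from real-valued vectors. It assumes cosine similarity as the measure, which this document does not specify, and the function names are hypothetical. It is a Python illustration, not the simat.pl implementation.

```python
import math


def cosine(u, v):
    """Cosine of the angle between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def similarity_matrix(vectors):
    """Square matrix whose entry [i][j] is the similarity of vector i and vector j."""
    return [[cosine(u, v) for v in vectors] for u in vectors]


for row in similarity_matrix([[1, 0], [0, 1], [2, 0]]):
    print(row)
```

The resulting matrix is symmetric, and vectors pointing in the same direction score 1 regardless of their length.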

vector/ (Represent contexts as vectors to be clustered)

nsp2regex.pl - Creates regular expressions from Text-NSP output to represent features
order1vec.pl - Creates first order context vectors
order2vec.pl - Creates second order context vectors
wordvec.pl - Creates word vectors from Text-NSP output

svd/ (SVDPACKC interface)

mat2harbo.pl - Convert matrices from SenseClusters format to Harwell-Boeing format
svdpackout.pl - Reconstruct a matrix from its singular vectors as found by SVDPACKC

clusterstopping/ (Cluster Stopping program)

clusterstopping.pl - Predicts the number of clusters into which a given data set should be divided. Provides three cluster stopping measures.

evaluate/ (Evaluate the results of SenseClusters by comparing to gold standard data)

cluto2label.pl - Convert clustering output of Cluto to a cluster by sense confusion matrix for evaluation
format_clusters.pl - Display contexts that were clustered with assigned sense id, or display Senseval-2 format with assigned sense id
label.pl - Assign sense tags to the discovered clusters for evaluation
report.pl - Report performance in terms of the precision, recall, and F-Measure, and show a confusion matrix
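
For readers unfamiliar with the measures that report.pl reports, the sketch below shows how precision, recall, and F-Measure relate. The F-Measure formula (the harmonic mean of precision and recall) is standard; the definitions of precision as correct/attempted and recall as correct/total are assumptions made for this example and may differ from how report.pl counts instances. The function names are hypothetical.

```python
def f_measure(precision, recall):
    """Harmonic mean of precision and recall (0.0 when both are zero)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def score(correct, attempted, total):
    """Return (precision, recall, F-Measure) from raw instance counts.

    Assumed definitions: precision = correct / attempted,
    recall = correct / total.
    """
    precision = correct / attempted if attempted else 0.0
    recall = correct / total if total else 0.0
    return precision, recall, f_measure(precision, recall)


p, r, f = score(correct=60, attempted=80, total=100)
print(f"precision={p:.2f} recall={r:.2f} f-measure={f:.2f}")
```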

clusterlabel/ (Cluster Labeling programs)

clusterlabeling.pl - Selects significant word pairs from the contents/instances of the clusters and assigns them as labels to the clusters. Also creates a separate file for each cluster.

Acknowledgements

This work has been partially supported by a National Science Foundation Faculty Early CAREER Development award (#0092784).

COPYRIGHT

Permission is granted to copy, distribute and/or modify this document under the terms of the GNU Free Documentation License, Version 1.2 or any later version published by the Free Software Foundation; with no Invariant Sections, no Front-Cover Texts, and no Back-Cover Texts.

Note: a copy of the GNU Free Documentation License is available on the web at http://www.gnu.org/copyleft/fdl.html and is included in this distribution as FDL.txt.
