NAME

README.Toolkit - SenseClusters Toolkit directory structure with links to all program documentation

DIRECTORY STRUCTURE

This document briefly describes the structure of the Toolkit directory and what each program does. Directory names end with a / (for example, preprocess/), while program names end with the .pl suffix. Everything listed here is contained in the Toolkit/ directory. The entries are organized roughly in the order in which SenseClusters uses them.

Please review the flowcharts found in doc/Flowcharts for additional information.

preprocess/ (text preprocessing programs)

plain/ (processes input in plain text format)
- text2sval.pl - Convert simple plain text into Senseval-2 format
sval2/ (processes input in Senseval-2 format)
- balance.pl - Balances sense distribution in a Senseval-2 input file by removing some instances
- filter.pl - Removes instances associated with low frequency sense tags from Senseval-2 input
- frequency.pl - Displays frequency distribution of senses
- keyconvert.pl - Convert KEY file from Senseval-2 format to SenseClusters format
- maketarget.pl - Create a Perl regex for the target word by spotting all <head> tags in the given file
- prepare_sval2.pl - Prepare Senseval-2 data for experiments
- preprocess.pl - Tokenize and optionally split Senseval-2 input into training and test portions
- sval2plain.pl - Convert a Senseval-2 input file to plain text format
- windower.pl - Cut a context window of W words around a target word in a given Senseval-2 input file

count/ (Modify count.pl output from Text-NSP)

reduce-count.pl - Reduce the size of the Text-NSP output created with huge training data

matrix/ (Similarity matrix constructors)

bitsimat.pl - Create a similarity matrix for given bit vectors
simat.pl - Create a similarity matrix for given non-binary (integer or real) vectors

vector/ (Represent contexts as vectors to be clustered)

nsp2regex.pl - Creates regular expressions from Text-NSP output to represent features
order1vec.pl - Creates first order context vectors
order2vec.pl - Creates second order context vectors
wordvec.pl - Creates word vectors from Text-NSP output

svd/ (SVDPACKC interface)

mat2harbo.pl - Convert matrices from SenseClusters format to Harwell-Boeing format
svdpackout.pl - Reconstruct a matrix from its singular vectors as found by SVDPACKC

clusterstopping/ (Cluster Stopping program)

clusterstopping.pl - Predicts the number of clusters into which a given data set should be divided. Provides three cluster stopping measures.

evaluate/ (Evaluate the results of SenseClusters by comparing to gold standard data)

cluto2label.pl - Convert clustering output of Cluto to a cluster by sense confusion matrix for evaluation
format_clusters.pl - Display the clustered contexts with their assigned sense ids, or display Senseval-2 format output with assigned sense ids
label.pl - Assign sense tags to the discovered clusters for evaluation
report.pl - Report performance in terms of the precision, recall, and F-Measure, and show a confusion matrix

clusterlabel/ (Cluster Labeling programs)

clusterlabeling.pl - Selects significant word pairs from the contents (instances) of each cluster and assigns them as labels to that cluster. Also creates a separate file for each cluster.

AUTHOR

Ted Pedersen, University of Minnesota, Duluth
tpederse at d.umn.edu

COPYRIGHT

Permission is granted to copy, distribute and/or modify this document under the terms of the GNU Free Documentation License, Version 1.2 or any later version published by the Free Software Foundation; with no Invariant Sections, no Front-Cover Texts, and no Back-Cover Texts.

Note: a copy of the GNU Free Documentation License is available on the web at http://www.gnu.org/copyleft/fdl.html and is included in this distribution as FDL.txt.
