Documentation
Provide a "fuzzy" diff command for comparing SVD output to our key
Label discovered clusters based on their content
Predict the optimal number of clusters in a data set
Reduce size of feature space by removing words not in evaluation data
Convert Cluto output to a confusion matrix
Map Cluto output to Senseval-2 format input file
Assign labels to clusters in a confusion matrix to maximize agreement
Summarize SenseClusters results with precision, recall, and confusion matrix
Build a similarity matrix from binary context vectors
Build a similarity matrix from real-valued context vectors
Convert a plain text file with one context per line into Senseval-2 format
Create a balanced Senseval-2 data file with the same number of instances for each possible sense
Remove the instances of low frequency sense tags from a Senseval-2 data file
Compute the distribution of senses in a Senseval-2 data file
Convert Senseval-2 answer key to SenseClusters format
Create a target.regex file for a given Senseval-2 data file that shows all the forms of the target word
Make sure Senseval-2 data is clean and has sense tags before SenseClusters is invoked
Split Senseval-2 data file into one file per lexical item (lexelt), and carry out various tokenization and formatting tasks
Convert a Senseval-2 data file into plain text format
Limit window of context around a target word specified in a Senseval-2 input file
Convert a matrix in SenseClusters sparse format to Harwell-Boeing (HB) format and set the input parameters (lap2) for SVDPACKC
Reconstruct post-SVD form of matrix from singular values output by SVDPACKC
Convert Text-NSP output into regular expressions to be used for feature matching
Convert Senseval-2 format contexts into first order feature vectors in Cluto format
Convert Senseval-2 contexts into second order context vectors in Cluto format
Construct word vectors from bigram or co-occurrence matrices
[Web Interface] How to install the SenseClusters Web interface
[Web Interface] Description of cgi files used in SenseClusters web interface
[Web Interface] Check user input and create command to run on Web interface
[Web Interface] Create gnuplot file (*.gp file) for Web interface user
[Web Interface] Create gnuplot output for Web interface user
[Web Interface] Create .tex file output for Web interface user
[Web Interface] Check XML data to see if well-formed
[Web Interface] Description of user_data directory in Web interface
[Web Interface] Description of the htdocs directory in the Web interface
Revision history of SenseClusters
Word and Context Clustering Flowcharts
Skeleton for creating new SenseClusters programs
Modules
Cluster similar contexts using co-occurrence matrices and Latent Semantic Analysis
Examples
- samples/Data/begin.v-test.xml
- samples/Data/eng-global-train.txt.gz
- samples/Data/eng-lex-sample.evaluation.xml.gz
- samples/Data/eng-lex-sample.key.gz
- samples/Data/eng-lex-sample.training.xml.gz
- samples/README.samples.pod
- samples/Regexs/nontoken.regex
- samples/Regexs/stoplist-nsp.regex
- samples/Regexs/target.regex
- samples/Regexs/token.regex
- samples/makedata.sh
- samples/sc-toolkit.sh
- samples/setdirs.sh
- samples/setup.pl
- samples/target-wrapper.sh
- samples/word-wrapper.sh