Ted Pedersen
and 1 contributor

Documentation

Explains structure and use of /Testing
Provide a "fuzzy" diff command for comparing SVD output to our key
SenseClusters Toolkit directory structure with links to all program documentation
Label discovered clusters based on their content
Predict the optimal number of clusters in a data set
Reduce size of feature space by removing words not in evaluation data
Convert Cluto output to a confusion matrix
Map Cluto output to Senseval-2 format input file
Assign labels to clusters in a confusion matrix to maximize agreement
Summarize SenseClusters results with precision, recall, and confusion matrix
Build a similarity matrix from binary context vectors
Build a similarity matrix from real-valued context vectors
Convert a plain text file with one context per line into Senseval-2 format
Create a balanced Senseval-2 data file that has the same number of instances for each possible sense
Remove the instances of low frequency sense tags from a Senseval-2 data file
Compute the distribution of senses in a Senseval-2 data file
Convert Senseval-2 answer key to SenseClusters format
Create target.regex file for a given Senseval-2 data file that shows all the forms of the target word
Make sure Senseval-2 data is cleaned and has sense tags prior to invocation of SenseClusters
Split Senseval-2 data file into one file per lexical item (lexelt), and carry out various tokenization and formatting tasks
Convert a Senseval-2 data file into plain text format
Limit window of context around a target word specified in a Senseval-2 input file
Convert matrix in SenseClusters sparse format to Harwell-Boeing (HB) format and set input parameters (lap2) for input to SVDPACKC
Reconstruct post-SVD form of matrix from singular values output by SVDPACKC
Convert Text-NSP output into regular expressions to be used for feature matching
Convert Senseval-2 format contexts into first order feature vectors in Cluto format
Convert Senseval-2 contexts into second order context vectors in Cluto format
Construct word vectors from bigram or co-occurrence matrices
[Web Interface] How to install SenseClusters Web interface
[Web Interface] Description of cgi files used in SenseClusters web interface
[Web Interface] Check user input to Web interface and create discriminate.pl command to run on Web server
[Web Interface] Create gnuplot file (*.gp file) for Web user
[Web Interface] Create gnuplot output for Web interface user
[Web Interface] Create .tex file output for Web interface user
[Web Interface] Check XML data to see if well-formed
[Web Interface] Description of user_data directory in Web interface
[Web Interface] Description of the htdocs directory in the Web interface
Revision history of SenseClusters
FAQ
Frequently Asked Questions about SenseClusters
Word and Context Clustering Flowcharts
List of things TODO for SenseClusters
Skeleton for creating new SenseClusters programs

Modules

Cluster similar contexts using co-occurrence matrices and Latent Semantic Analysis