NAME

INSTALL Installation instructions for SenseClusters

SYNOPSIS

If you have su or sudo access, you should be able to install and test the installation of SenseClusters via automatic download from CPAN as follows:

# install SenseClusters and all dependent CPAN modules
perl -MCPAN -e 'install Bundle::Text::SenseClusters';

# install cluto and SVDPACKC (included in SenseClusters)
cd ~/.cpan/build/Text-SenseClusters-[insert_version]
cd External
csh ./install.sh /usr/local/bin
cd ~

# run SC test cases (note that the location of the cpan build
# directory might vary on your system)

cd ~/.cpan/build/Text-SenseClusters-[insert_version]
cd Testing
csh ./ALL-TESTS.sh
cd ~

This assumes that /usr/local/bin is in your PATH and is your preferred location for user-installed executables. If it is not, substitute your preferred directory.
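
To confirm that the chosen directory really is in your PATH before running install.sh, something like the following can be used. This is only a sketch in Bourne shell (sh) syntax rather than csh, and the function name dir_in_path is made up for this example.

```shell
#!/bin/sh
# dir_in_path DIR [PATH_VALUE]
# Succeed if DIR is one of the colon-separated entries of PATH_VALUE
# (PATH_VALUE defaults to the current $PATH).
dir_in_path() {
    case ":${2-$PATH}:" in
        *":$1:"*) return 0 ;;
        *)        return 1 ;;
    esac
}

if dir_in_path /usr/local/bin; then
    echo "/usr/local/bin is in PATH"
else
    echo "/usr/local/bin is NOT in PATH; choose another install directory"
fi
```

Wrapping the PATH value in colons lets a single pattern match the first, middle, and last entries alike.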

INSTALLATION OVERVIEW

SenseClusters consists of the core SenseClusters programs (primarily .pl programs found in this distribution), Perl modules available from CPAN, and two external programs (SVDPACKC and Cluto).

You can download the core and external programs from CPAN http://search.cpan.org/dist/Text-SenseClusters or Sourceforge http://senseclusters.sourceforge.net.

You may be able to download and install the SenseClusters core, dependent CPAN modules, and External programs via a single command using the CPAN module.

If you have sudo or su access, then installation of the CPAN modules and the core SenseClusters programs can be achieved as follows:

perl -MCPAN -e 'install Bundle::Text::SenseClusters';

If you do not have su or sudo access, then you will need to install manually, as described below.

You may be able to install the external programs Cluto and SVDPACKC via the following script we provide :

cd Text-SenseClusters-[insert_version]
cd External
csh ./install.sh INSTALLDIR
cd ~

If you have sudo or su access, INSTALLDIR should be a directory in your PATH, such as /usr/local/bin. If you do not, you will need to install into a directory you have read and write access to, and then add that directory to your PATH. If install.sh fails for some reason, you will need to install Cluto and SVDPACKC manually, as described below.

At present SenseClusters does not utilize 'make test', so testing must be done via scripts found in the /Testing directory. Please make certain to run these tests after the installation of the External programs, CPAN modules, and SenseClusters core has concluded.

cd ~/.cpan/build/Text-SenseClusters-[insert_version]
cd Testing
csh ./ALL-TESTS.sh
cd ~
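
Because the location of the CPAN build directory can vary, a small helper can look for it rather than typing the version by hand. This is a sketch in Bourne shell (sh) syntax. The function name find_sc_build_dir is hypothetical, and ~/.cpan/build is only the default location assumed in the commands above.

```shell
#!/bin/sh
# find_sc_build_dir [BASE]
# Print the first directory under BASE (default: ~/.cpan/build) whose name
# starts with "Text-SenseClusters-". Fail if there is none.
find_sc_build_dir() {
    base=${1:-$HOME/.cpan/build}
    for d in "$base"/Text-SenseClusters-*/; do
        if [ -d "$d" ]; then
            printf '%s\n' "${d%/}"
            return 0
        fi
    done
    return 1
}

if build_dir=$(find_sc_build_dir); then
    echo "SenseClusters build directory: $build_dir"
else
    echo "no Text-SenseClusters-* directory found under ~/.cpan/build"
fi
```

If several versions have been built, this returns the first one in the shell's sort order, so check the printed directory before running the tests from its Testing subdirectory.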

DESCRIPTION OF PACKAGE COMPONENTS

SenseClusters has been developed and tested on Linux and Solaris, primarily using Perl and the C shell (csh). It is based primarily on Perl command line programs distributed with the package and Perl modules that are available from CPAN. However, it does include two external programs, one distributed as C source code (SVDPACKC) and one distributed in binary form (CLUTO) for Linux and Solaris. This dependence on CLUTO limits SenseClusters to running on Linux or Solaris. There is a Windows version of CLUTO available, but we have not tested how well this integrates into SenseClusters.

DEPENDENCIES

SenseClusters requires version 5.6.0 (or better) of Perl. You can check which version of Perl you have via :

perl -v

If you need a more recent version, you can find that at http://perl.org.
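
The version check can also be scripted. The following is a minimal sketch in Bourne shell (sh) syntax. The helper name version_at_least is invented for this example, and it assumes plain dotted numeric versions such as 5.8.8.

```shell
#!/bin/sh
# version_at_least HAVE WANT
# Succeed if dotted version HAVE is greater than or equal to WANT.
# Components are compared numerically, left to right; missing ones count as 0.
version_at_least() {
    va=$1
    vb=$2
    while [ -n "$va" ] || [ -n "$vb" ]; do
        a=${va%%.*}
        b=${vb%%.*}
        if [ "${a:-0}" -gt "${b:-0}" ]; then return 0; fi
        if [ "${a:-0}" -lt "${b:-0}" ]; then return 1; fi
        # drop the component just compared
        case $va in *.*) va=${va#*.} ;; *) va= ;; esac
        case $vb in *.*) vb=${vb#*.} ;; *) vb= ;; esac
    done
    return 0
}

if command -v perl >/dev/null 2>&1; then
    perl_version=$(perl -e 'printf "%vd", $^V')
    if version_at_least "$perl_version" 5.6.0; then
        echo "perl $perl_version is recent enough"
    else
        echo "perl $perl_version is older than 5.6.0"
    fi
else
    echo "perl was not found in PATH"
fi
```

Comparing component by component avoids the pitfall of a plain string comparison, which would rank 5.10 below 5.6.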

CPAN Modules

SenseClusters uses a number of different CPAN modules. These are all included in the Bundle above, and can be installed via the Bundle (recommended) or individually (described below).

  • PDL (Perl Data Language, version 2.4.1 or better)

  • Algorithm::Munkres (version 0.07 or better)

  • Algorithm::RandomMatrixGeneration (version 0.06 or better)

  • Bit::Vector (version 6.3 or better)

  • Math::SparseMatrix (version 0.02 or better)

  • Math::SparseVector (version 0.04 or better)

  • Set::Scalar (version 1.19 or better)

  • Text::NSP (version 1.09 or better)

External Packages (/External)

The following packages are not written in Perl and are developed outside of the SenseClusters project. SVDPACKC is distributed as C source code, and Cluto is distributed as pre-compiled binaries for Linux and Solaris.

Please note that SVDPACKC is optional: SenseClusters will run without it (just don't use the --svd option). However, Cluto is mandatory; SenseClusters will not be able to perform clustering without it.

  • CLUTO (version 2.1.1 or better)

  • SVDPACKC (Feb 2004 version or better, compiled with gcc 3.2.2, 3.2.3, or 3.3.0)

MANUAL INSTALLATION OF CPAN MODULES

You can install all of the modules described below and the Core of SenseClusters by using the Bundle described above. However, if you have problems with that or prefer to install modules individually, you can proceed as follows.

You may want to check to see which Perl modules are installed on your system already via :

perldoc perllocal
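
As an alternative to reading through perllocal, each required module can be probed directly by asking perl to load it. This is a sketch in Bourne shell (sh) syntax. The function name module_installed is made up here, and the check only tests whether a module loads, not whether its version is recent enough.

```shell
#!/bin/sh
# module_installed MODULE
# Succeed if perl can load MODULE (runs: perl -MMODULE -e 1).
module_installed() {
    perl -M"$1" -e 1 >/dev/null 2>&1
}

# Modules listed under DEPENDENCIES in this document.
for m in PDL Algorithm::Munkres Algorithm::RandomMatrixGeneration \
         Bit::Vector Math::SparseMatrix Math::SparseVector \
         Set::Scalar Text::NSP; do
    if module_installed "$m"; then
        echo "ok       $m"
    else
        echo "MISSING  $m"
    fi
done
```

Modules installed into a local LIB directory are only found if that directory is in PERL5LIB when the check is run.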

    The following modules must be installed for proper use of SenseClusters:

    Perl Data Language (version 2.4.1 or better)

    SenseClusters uses the Perl Data Language (PDL) for efficient computations and storage of high dimensional data structures.

    It is available at: http://search.cpan.org/dist/PDL/

    Note that if you have superuser access on your machine, and have the CPAN Perl module available, you can install PDL automatically via:

    perl -MCPAN -e 'install PDL';

    If you do not have superuser access, you will need to install this module locally. Note that you can configure the CPAN module to install locally by setting the PREFIX and LIB options to directories you have read and write access to.

    Note that PDL has quite a few dependencies, and can be tricky to install. You may want to check with your system administrator and see if they can install on your behalf before you tackle the local install of PDL. All the other code mentioned here can be locally installed quite routinely.

    This is a good description of how to do local installs of Perl modules: http://www.perl.com/pub/a/2002/04/10/mod_perl.html

    Bit::Vector (version 6.3 or better)

    The Bit::Vector module is used with binary context vectors (via --binary option in wrappers or program bitsimat.pl). This can be downloaded from:

    http://search.cpan.org/dist/Bit-Vector/

    Note that the following installation instructions apply to all of the
    CPAN modules, and will not be repeated in detail for each module.

    If you have superuser access, or have configured the CPAN module for local installs, you can install via:

    perl -MCPAN -e 'install Bit::Vector';

    If not, you can install "manually" by downloading the *.tar.gz file, unpacking it, and executing the following commands.

    perl Makefile.PL PREFIX=/space/kulka020/Bit-Vector LIB=/space/kulka020/MyPerlLib
    make
    make test
    make install

    Note that the PREFIX and LIB settings are just examples to help you create a local install if you do not have superuser (su) access.

    You must include /space/kulka020/MyPerlLib in your PERL5LIB environment variable to access this module when running.

    Ngram Statistics Package (Text::NSP, version 1.09 or better)

    SenseClusters uses Text-NSP to select a variety of lexical features. Text-NSP is freely available at http://search.cpan.org/dist/Text-NSP/

    perl -MCPAN -e 'install Text::NSP';

    or manual installation.

    Set::Scalar (version 1.19 or better)

    The Set::Scalar module is used by the program bitsimat.pl.

    It is available at: http://search.cpan.org/dist/Set-Scalar/

    perl -MCPAN -e 'install Set::Scalar';

    or manual installation.

    Math::SparseVector (version 0.04 or better)

    This is a Perl module that implements sparse vector operations.

    It is available at: http://search.cpan.org/dist/Math-SparseVector/

    perl -MCPAN -e 'install Math::SparseVector';

    or manual installation.

    Math::SparseMatrix (version 0.02 or better)

    This is a Perl module that implements sparse matrix operations, in particular the sparse matrix transpose operation.

    It is available at: http://search.cpan.org/dist/Math-SparseMatrix/

    perl -MCPAN -e 'install Math::SparseMatrix';

    or manual installation.

    Algorithm::Munkres (version 0.07 or better)

    This is a Perl module that implements Munkres' solution to the classical assignment problem. It is used when evaluating discovered clusters against a provided gold standard.

    It is available at: http://search.cpan.org/dist/Algorithm-Munkres/

    perl -MCPAN -e 'install Algorithm::Munkres';

    or manual installation.

    Algorithm::RandomMatrixGeneration (version 0.06 or better)

    This is a Perl module that generates a random matrix given the row and column marginals. It is required for SenseClusters to run the Adapted Gap Statistic in clusterstopping.pl.

    It is available at: http://search.cpan.org/dist/Algorithm-RandomMatrixGeneration/

    perl -MCPAN -e 'install Algorithm::RandomMatrixGeneration';

    or manual installation.

External Packages

Please note that we provide a modified version of SVDPACKC in /External/SVDPACKC that includes all the changes described below. You should be able to compile and install this code via the External/install.sh script. If that fails, you can follow the steps in the install script manually. If for some reason you would prefer to start with a fresh copy of SVDPACKC, you can follow the directions below (which also explain the changes we have made and included in SenseClusters /External/SVDPACKC).

SVDPACKC (Feb 2004 version or better)

SVDPACKC is a C program that performs SVD. It is available for download from http://www.netlib.org/svdpack. SVDPACKC does not have a version number associated with it, but check the files in your download to make sure they are dated from at least Feb 2004. The version we include and modify in /External/SVDPACKC is the Feb 2004 version.

Please note that you should use version 3.2.2, 3.2.3, or 3.3.0 of the gcc compiler. Segmentation faults result if you use version 4 or better. We are currently investigating the use of SVDLIBC as an alternative for our SVD processing.

While installing SVDPACKC, make sure to -

   1. 	In las2.c, uncomment the following line 

	/*	#define  UNIX_CREAT	*/

	if you are running on a Unix or Linux platform.

   2. 	In las2.h, modify the default values of constants LMTNW, NMAX and  
	NZMAX to some larger numbers such that -

	NMAX 	= Maximum size of the feature space before reduction 
		  (we set this to 30,000)
	NZMAX 	= Maximum possible number of Non-zero entries 
		  (we assume our 30,000 x 30,000 matrix is at most 1% dense
		  and hence NZMAX = 30,000 x 30,000 / 100 = 9,000,000)
	LMTNW 	= Maximum Work Space Area required 
		= 6*NMAX + 4*NMAX + 1 + NMAX*NMAX
		  (we set LMTNW = 900300001 for a 
		  1% dense 30,000 x 30,000 matrix)

   3. 	Modify the file makefile so that ANSI C is used. 

	CC = gcc -ansi

        [Please use gcc version 3.2.2, 3.2.3, or 3.3.0 when compiling SVDPACKC.
	gcc versions 4.0.0 and above appear to result in segmentation faults.] 

   4.	Run 'make las2' after the above modifications are done in las2.h,
	las2.c, and makefile.
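
The values in step 2 follow from the formulas given there and can be recomputed for a different feature-space size. This is a minimal sketch in Bourne shell (sh) arithmetic. The function name las2_limits and its density argument (percentage of non-zero entries) are assumptions made for this example, and the output still has to be copied into las2.h by hand.

```shell
#!/bin/sh
# las2_limits NMAX DENSITY_PERCENT
# Print NMAX, NZMAX and LMTNW using the formulas from step 2:
#   NZMAX = NMAX * NMAX * DENSITY_PERCENT / 100
#   LMTNW = 6*NMAX + 4*NMAX + 1 + NMAX*NMAX
las2_limits() {
    nmax=$1
    density=$2
    nzmax=$(( nmax * nmax * density / 100 ))
    lmtnw=$(( 6 * nmax + 4 * nmax + 1 + nmax * nmax ))
    printf 'NMAX=%s NZMAX=%s LMTNW=%s\n' "$nmax" "$nzmax" "$lmtnw"
}

# The settings described above: a 30,000 x 30,000 matrix that is 1% dense.
las2_limits 30000 1
```

For the values above this prints NMAX=30000 NZMAX=9000000 LMTNW=900300001, matching the numbers quoted in step 2.
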

QUICK TEST of SVDPACKC

The following will help you check that SVDPACKC is installed correctly.

# unzip the sample belladit.gz data file that comes with SVDPACKC
gunzip belladit.gz

# copy this as the input matrix to SVDPACKC
cp belladit matrix

# run las2 to test if everything is ok
las2

# this will not produce any output to STDOUT, but it should create 2  
# output files - lav2 (binary) and lao2 (text)
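
Since las2 prints nothing to STDOUT, it can help to check for the two output files explicitly. This is a sketch in Bourne shell (sh) syntax. The function name las2_outputs_ok is hypothetical, and it only checks that the files exist, not that their contents are correct.

```shell
#!/bin/sh
# las2_outputs_ok [DIR]
# Succeed if DIR (default: current directory) contains both lav2 and lao2,
# the two files las2 is expected to create.
las2_outputs_ok() {
    dir=${1:-.}
    for f in lav2 lao2; do
        if [ ! -f "$dir/$f" ]; then
            echo "missing output file: $dir/$f" >&2
            return 1
        fi
    done
    return 0
}
```

After running las2 in the directory that holds the matrix file, call las2_outputs_ok there; a non-zero exit status means at least one of the two files was not created.
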

CLUTO (version 2.1.1 or better)

The script External/install.sh will attempt to retrieve and install Cluto automatically. If that fails, you can follow the steps outlined in the install script, or the instructions below.

SenseClusters uses CLUTO to support extensive clustering options, analysis and visualization. CLUTO is freely available from http://www-users.cs.umn.edu/~karypis/cluto/

If you run on both Linux and Solaris platforms, you will need to set your path slightly differently each time to allow SenseClusters to run. The following code in your .cshrc file will take care of this.

set OSNAME=`uname -s`

if ($OSNAME == "SunOS") then
       set path = (PATH_2_CLUTO/Sun $path)
else if ($OSNAME == "Linux") then
       set path = (PATH_2_CLUTO/Linux $path)
else
       echo "lost"
endif

where, PATH_2_CLUTO is a complete path to the directory where CLUTO is downloaded and unpacked. If you only run on Solaris or Linux, then of course you can just set the path with the appropriate statement from above.

GCLUTO [optional]

Users interested in graphical visualization of clusters are encouraged to try GCLUTO, which is also freely downloadable from http://www-users.cs.umn.edu/~karypis/cluto/gcluto/index.html

To use GCLUTO, you will need the libglut.so.3 library installed on your system. It can be downloaded from http://at.rpmfind.net/opsys/linux/RPM/libglut.so.3.html

CORE SENSECLUSTERS INSTALLATION

Note that SenseClusters can be installed via the Bundle command described in the SYNOPSIS. If for some reason that fails, you can proceed as follows:

To install the core of SenseClusters, if you have su or sudo access (root user), then you can install via :

perl -MCPAN -e 'install Text::SenseClusters';

Or you can install manually as follows:

perl Makefile.PL
make
cd Testing
csh testall.sh
cd ..
make install 

Note that the testall.sh scripts will not be run via automatic installation. If you do not install manually, you should go back and run the test scripts just to verify that everything is working as expected.

The exact location where SenseClusters will be installed depends on your system configuration. A message will be printed out after 'make install' telling you exactly where it was installed.

Local Installation of Core SenseClusters programs

If you do not have su or sudo access (that is, you cannot act as the root user), then you may need to install SenseClusters in a local directory that you own and have permission to read and write. You can proceed as above, except that you will need to provide PREFIX and LIB options to your Makefile.PL command, as in:

perl Makefile.PL PREFIX=/YOUR/DIR LIB=/YOUR/DIR/lib

This will set up a Makefile that will install the core SenseClusters programs (*.pl) into :

/YOUR/DIR/bin/

The SenseClusters.pm module will be installed into :

/YOUR/DIR/lib

man pages will be installed into :

/YOUR/DIR/share/man/ (Linux)
/YOUR/DIR/man/       (Solaris)

You will have to explicitly set your $PATH to include :

/YOUR/DIR/bin/

You will have to explicitly set your $PERL5LIB to include :

/YOUR/DIR/lib/

and your $MANPATH to include:

/YOUR/DIR/share/man/ (Linux)
/YOUR/DIR/man/       (Solaris)

Note that the exact locations will be shown after executing the 'make install' command. Please double check the recommended settings for PATH and MANPATH there, as those will be tailored to your system.

C SHELL (csh) SETUP

If you install without root or superuser access, you will need to set the paths of the dependent packages mentioned previously. The following is an example of how you might set your paths before using SenseClusters if you are using the C shell (csh). If you use another shell then you will need to modify this accordingly.

This assumes that Perl and PDL have been installed by your system administrator and you do not need to set paths to find them. In general we would recommend that Perl and PDL be installed with root access, as it is simpler that way.

This example assumes that all of the external C packages (SVDPACKC, Cluto) have been installed in directories beneath /space/kulka020 (our home directory for this example). It also assumes that all of the CPAN modules have been installed in /space/kulka020/MyPerlLib. In other words, it is assumed that all CPAN modules were installed via the following command:

perl Makefile.PL PREFIX=/space/kulka020/Text-SenseClusters-1.00 LIB=/space/kulka020/MyPerlLib

#######################################################################
#    insert the following into ~/.cshrc and modify HOMEDIR and LIBDIR
#######################################################################

# local directory where we are installing everything

setenv HOMEDIR /space/kulka020

# local directory where the CPAN modules have been installed

setenv LIBDIR /space/kulka020/MyPerlLib

# UMD developed code, we need to set their /bin directories in the PATH

setenv SENSECLUSTERS $HOMEDIR/Text-SenseClusters-1.00
setenv NSP $HOMEDIR/Text-NSP-1.09

# externally developed C code, directories contain executable code so must 
# be included in PATH

setenv SVDPACK $HOMEDIR/SVDPACKC
setenv CLUTO $HOMEDIR/cluto-2.1.1

# pick the right version of Cluto (Solaris or Linux)

set OSNAME=`uname -s`

if ($OSNAME == "SunOS") then
       setenv MYCLUTO $CLUTO/Sun
else if ($OSNAME == "Linux") then
       setenv MYCLUTO $CLUTO/Linux
else
       echo "lost"
endif

# set the path that Perl searches for CPAN modules

setenv PERL5LIB $LIBDIR

# set the path that is searched for executables

set AKPATH = ($SVDPACK $NSP/bin $MYCLUTO $SENSECLUSTERS/bin .)

set path = ($AKPATH $path)
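
For users of sh or bash, the csh settings above might be translated roughly as follows. This is only a sketch under the same assumptions (everything installed beneath /space/kulka020). The function name cluto_dir is made up for this example, and which startup file to put it in (for example ~/.profile) depends on your shell.

```shell
#!/bin/sh
# Bourne-shell counterpart of the csh settings above.
# All directory names are the example values used in this document.

HOMEDIR=/space/kulka020
LIBDIR=$HOMEDIR/MyPerlLib

SENSECLUSTERS=$HOMEDIR/Text-SenseClusters-1.00
NSP=$HOMEDIR/Text-NSP-1.09
SVDPACK=$HOMEDIR/SVDPACKC
CLUTO=$HOMEDIR/cluto-2.1.1

# cluto_dir ROOT OSNAME
# Print the platform-specific Cluto directory under ROOT, or fail for a
# platform other than Solaris (SunOS) or Linux.
cluto_dir() {
    case $2 in
        SunOS) printf '%s\n' "$1/Sun" ;;
        Linux) printf '%s\n' "$1/Linux" ;;
        *)     echo "lost" >&2; return 1 ;;
    esac
}

# Only touch PERL5LIB and PATH when a matching Cluto directory was found.
if MYCLUTO=$(cluto_dir "$CLUTO" "$(uname -s)"); then
    PERL5LIB=$LIBDIR
    PATH=$SVDPACK:$NSP/bin:$MYCLUTO:$SENSECLUSTERS/bin:.:$PATH
    export PERL5LIB PATH
fi
```

Unlike the csh version, this leaves PATH unchanged on an unsupported platform instead of continuing without a Cluto directory.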

INSTALLING SenseClusters' Web-interface:

If you would like to set up the SenseClusters web interface locally, please refer to README.Web.pod for installation instructions.

AUTHORS

Ted Pedersen, University of Minnesota, Duluth
tpederse at d.umn.edu

Anagha Kulkarni, Carnegie Mellon University

Amruta Purandare, University of Pittsburgh

This document last modified by : $Id: INSTALL.pod,v 1.13 2008/03/29 15:16:30 tpederse Exp $

SEE ALSO

The main SenseClusters web page provides links to downloads, the web interface, documentation, and CVS directories:

http://senseclusters.sourceforge.net

We have three mailing lists available for SenseClusters:

http://lists.sourceforge.net/lists/listinfo/senseclusters-news
http://lists.sourceforge.net/lists/listinfo/senseclusters-users
http://lists.sourceforge.net/lists/listinfo/senseclusters-developers

senseclusters-news provides announcements of new versions, senseclusters-users is where users can post questions or bug reports, and senseclusters-developers is where detailed discussion of ongoing coding efforts takes place. You are welcome to subscribe to any of these!

If you have any trouble installing and using SenseClusters, please contact us via the users mailing list :

https://lists.sourceforge.net/lists/listinfo/senseclusters-users

COPYRIGHT AND LICENSE

Copyright (c) 2004-2008, Ted Pedersen

Permission is granted to copy, distribute and/or modify this document under the terms of the GNU Free Documentation License, Version 1.2 or any later version published by the Free Software Foundation; with no Invariant Sections, no Front-Cover Texts, and no Back-Cover Texts.

Note: a copy of the GNU Free Documentation License is available on the web at http://www.gnu.org/copyleft/fdl.html and is included in this distribution as FDL.txt.
