NAME

getIC.pl - This program returns the information content of a concept or a term.

SYNOPSIS

This program takes in a CUI or a term and returns its information content.

USAGE

Usage: getIC.pl [OPTION] IC | FREQUENCY FILE [CUI|TERM]

INPUT

Required Arguments:

[CUI|TERM]

Concept Unique Identifier (CUI) or a term from the Unified Medical Language System (UMLS).

IC | FREQUENCY FILE

File containing the information content or the frequency counts of CUIs in the following format:

CUI<>freq
CUI<>freq

See the example files called icpropagation and icfrequency in the samples/ directory.
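As an illustration of the file layout above, the following is a minimal Python sketch that reads CUI<>freq lines into a dictionary. It is not part of getIC.pl; the function name parse_counts and the sample CUIs and counts are made up for the example, and it assumes the file contains only lines in the layout shown above.

```python
def parse_counts(lines):
    """Parse 'CUI<>value' lines into a dict mapping CUI -> float.

    Hypothetical helper for illustration only; it assumes every
    non-blank line has the 'CUI<>value' layout and nothing else.
    """
    counts = {}
    for raw in lines:
        line = raw.strip()
        if not line:
            continue  # skip blank lines
        cui, sep, value = line.partition("<>")
        if not sep:
            raise ValueError(f"expected 'CUI<>value', got: {line!r}")
        counts[cui.strip()] = float(value)
    return counts


# Made-up sample data in the documented layout.
sample = ["C0000005<>12", "C0000039<>3", ""]
counts = parse_counts(sample)
print(counts)
```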

Note: If you are using a frequency file, you must specify --icfrequency on the command line, because the propagation counts are then computed on the fly.

Optional Arguments:

--icfrequency

Flag to indicate that the FILE specified on the command line is a frequency file.

--icpropagation

Flag to indicate that the FILE specified on the command line is a propagation file. This is the default.

--config FILE

This is the configuration file. The format of the configuration file is as follows:

SAB :: <include|exclude> <source1, source2, ... sourceN>

REL :: <include|exclude> <relation1, relation2, ... relationN>

RELA :: <include|exclude> <rela1, rela2, ... relaN> (optional)

For example, if we wanted to use the MSH vocabulary with only the RB/RN relations, the configuration file would be:

SAB :: include MSH
REL :: include RB, RN
RELA :: include inverse_isa, isa

or

SAB :: include MSH
REL :: exclude PAR, CHD

If you go to the configuration file directory, there will be example configuration files for the different runs that you have performed.
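To make the configuration format concrete, here is a small Python sketch that reads lines of the form KEY :: include|exclude item1, item2 into a dictionary. It is an illustration only and not how getIC.pl itself reads the file; the function name parse_config is hypothetical, and the sketch assumes one setting per line.

```python
def parse_config(text):
    """Parse 'KEY :: include|exclude a, b, c' lines into a dict.

    Hypothetical helper for illustration; returns
    {KEY: (mode, [items])} and assumes one setting per line.
    """
    config = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue  # skip blank lines
        key, sep, rest = line.partition("::")
        if not sep:
            raise ValueError(f"missing '::' in line: {line!r}")
        parts = rest.split(None, 1)  # mode, then the comma-separated list
        if len(parts) != 2 or parts[0] not in ("include", "exclude"):
            raise ValueError(f"expected include/exclude and a list: {line!r}")
        mode, items = parts
        names = [item.strip() for item in items.split(",") if item.strip()]
        config[key.strip()] = (mode, names)
    return config


# The MSH example from the text, one setting per line.
example = """\
SAB :: include MSH
REL :: include RB, RN
RELA :: include inverse_isa, isa
"""
config = parse_config(example)
for key, (mode, names) in config.items():
    print(key, mode, names)
```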

--smooth

Incorporate Laplace smoothing, where the frequency count of each concept in the taxonomy is incremented by one. The advantage is that it avoids having a concept with a probability of zero. The disadvantage is that it can shift the overall probability mass of the concepts away from what is actually seen in the corpus.
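The trade-off described above can be seen in a small Python sketch. It assumes the usual definition of information content, IC(c) = -log P(c), with P(c) estimated as a concept's count divided by the total count; this document does not state the exact formula getIC.pl uses, and the sketch ignores propagation through the taxonomy. The function name and the concepts A, B and C are hypothetical.

```python
import math


def information_content(counts, smooth=False):
    """Return {concept: -log(p)} from raw counts.

    Illustrative sketch only. With smooth=True every count is
    incremented by one (Laplace smoothing). A concept whose
    probability is zero has no defined IC and maps to None.
    """
    adjusted = {c: (n + 1 if smooth else n) for c, n in counts.items()}
    total = sum(adjusted.values())
    ic = {}
    for concept, n in adjusted.items():
        ic[concept] = -math.log(n / total) if n > 0 else None
    return ic


# Made-up counts: concept C never occurs in the corpus.
counts = {"A": 8, "B": 2, "C": 0}
raw = information_content(counts)
smoothed = information_content(counts, smooth=True)
print(raw)
print(smoothed)
```

Without smoothing, C has no defined information content. With smoothing, C receives a finite value, but the value for A also changes, because part of the probability mass has moved to the unseen concept.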

--infile

Takes a file of CUIs (one per line) and returns their information content.

--debug

Sets the debug flag for testing.

--username STRING

Username required to access the UMLS database on MySQL, unless it was specified in the my.cnf file at installation.

--password STRING

Password required to access the UMLS database on MySQL, unless it was specified in the my.cnf file at installation.

--hostname STRING

Hostname where MySQL is located. DEFAULT: localhost

--socket STRING

The socket your MySQL server is using. DEFAULT: /tmp/mysql.sock

--database STRING

Database containing the UMLS. DEFAULT: umls

--help

Displays a quick summary of the program options.

--version

Displays the version information.

OUTPUT

The information content of the input CUI, or of each CUI associated with the input term.

SYSTEM REQUIREMENTS

  • Perl (version 5.8.5 or better) - http://www.perl.org

AUTHOR

Bridget T. McInnes, University of Minnesota

COPYRIGHT

Copyright (c) 2007-2009,

Bridget T. McInnes, University of Minnesota
bthomson at cs.umn.edu
   
Ted Pedersen, University of Minnesota Duluth
tpederse at d.umn.edu

Siddharth Patwardhan, University of Utah, Salt Lake City
sidd@cs.utah.edu

Serguei Pakhomov, University of Minnesota Twin Cities
pakh0002@umn.edu

This program is free software; you can redistribute it and/or modify it under the terms of the GNU General Public License as published by the Free Software Foundation; either version 2 of the License, or (at your option) any later version.

This program is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for more details.

You should have received a copy of the GNU General Public License along with this program; if not, write to:

The Free Software Foundation, Inc.,
59 Temple Place - Suite 330,
Boston, MA  02111-1307, USA.