NAME

create-icpropagation.pl - This program determines the probability of the CUIs in a specified set of sources and relations.

SYNOPSIS

This program determines the probability of the CUIs in a specified set of sources and relations.

USAGE

Usage: create-icpropagation.pl [OPTIONS] OUTPUTFILE ICFREQUENCY_FILE

OUTPUTFILE

File in which the probability of the CUIs will be stored.

The output file containing the probability of the CUIs has the following format:

SMOOTH :: <0|1>
SAB :: <sources>
REL :: <relations>
RELA :: <relas>  <- if any are specified in the config
CUI<>probability
CUI<>probability
...

ICFREQUENCY_FILE

File containing the icfrequency counts.

The input file contains frequency counts for CUIs in the following format:

SAB :: <sources>
CUI<>freq
CUI<>freq
...
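As an illustration of the input format above, the following Python sketch reads an icfrequency file into its SAB header and a CUI-to-count map. It is not part of the distribution: `read_frequency_file` is a hypothetical helper, and it assumes the counts are integers and that `<>` appears only on CUI lines.

```python
def read_frequency_file(lines):
    """Parse an icfrequency file (hypothetical helper, for illustration).

    `lines` is any iterable of text lines, such as an open file handle.
    Returns (sources, counts): `sources` is the raw value of the
    "SAB ::" header line (None if absent) and `counts` maps CUI -> int.
    """
    sources = None
    counts = {}
    for raw in lines:
        line = raw.strip()
        if not line:
            continue  # ignore blank lines
        if line.startswith("SAB ::"):
            sources = line.split("::", 1)[1].strip()
        elif "<>" in line:
            # Assumes every CUI line is "CUI<>freq" with an integer count.
            cui, freq = line.split("<>", 1)
            counts[cui.strip()] = int(freq)
    return sources, counts
```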

Optional Arguments:

--smooth

Incorporate Laplace smoothing, where the frequency count of each concept in the taxonomy is incremented by one. The advantage is that no concept ends up with a probability of zero. The disadvantage is that it shifts some of the probability mass away from what is actually observed in the corpus.
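A minimal Python sketch of the add-one step described above. `laplace_smooth` is a hypothetical helper, and the sketch assumes the full set of taxonomy CUIs is already known; CUIs outside that set are simply dropped here, which may differ from what the program does.

```python
def laplace_smooth(counts, concepts):
    """Add-one (Laplace) smoothing over every concept in the taxonomy.

    counts:   dict mapping CUI -> observed frequency; absent CUIs count as 0
    concepts: iterable of all CUIs in the taxonomy being propagated over
    Returns a new dict; CUIs in `counts` but not in `concepts` are dropped.
    """
    return {cui: counts.get(cui, 0) + 1 for cui in concepts}
```

For instance, with observed counts {C1: 3} over the concepts C1 and C2, the unsmoothed probability of C2 is 0. After smoothing the counts are {C1: 4, C2: 1}, so C2 receives 1/5 of the mass while C1 drops from 1 to 4/5, which is the shift in probability mass mentioned above.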

--config FILE

Configuration file specifying the sources and relations to use. Its format is as follows:

SAB :: <include|exclude> <source1, source2, ... sourceN>

REL :: <include|exclude> <relation1, relation2, ... relationN>

For example, if we wanted to use the MSH vocabulary with only the RB/RN relations, the configuration file would be:

SAB :: include MSH
REL :: include RB, RN

or

SAB :: include MSH
REL :: exclude PAR, CHD
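To make the include/exclude semantics concrete, here is a small Python sketch that parses lines in the format above (one KEY per line) and checks whether a given source or relation is selected. The helpers `parse_config` and `is_selected` are hypothetical; this is not the parser used by create-icpropagation.pl.

```python
def parse_config(lines):
    """Parse lines of the form "KEY :: <include|exclude> item1, item2, ...".

    Returns a dict mapping each KEY (e.g. "SAB", "REL") to a tuple
    (mode, items), where mode is "include" or "exclude".
    """
    config = {}
    for raw in lines:
        line = raw.strip()
        if "::" not in line:
            continue  # skip blank or unrecognized lines
        key, value = line.split("::", 1)
        mode, _, rest = value.strip().partition(" ")
        if mode not in ("include", "exclude"):
            raise ValueError(f"expected include or exclude in {line!r}")
        items = [item.strip() for item in rest.split(",") if item.strip()]
        config[key.strip()] = (mode, items)
    return config


def is_selected(config, key, item):
    """True if `item` (a source or relation) passes the KEY setting."""
    mode, items = config[key]
    return (item in items) if mode == "include" else (item not in items)
```

With `include`, only the listed items are used; with `exclude`, everything except the listed items is used, so the two example files are not equivalent in general when a source has relations besides PAR/CHD and RB/RN.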

Example configuration files for the different runs that you have performed can be found in the configuration file directory.

Note: You can use relations other than PAR/CHD and RB/RN for propagation, but we do not recommend it. The PAR/CHD and RB/RN relations are considered the hierarchical relations in the UMLS, and hierarchical relations are required for propagation to work correctly.

--disregard

Ignores the SAB configuration with which the icfrequency file was created.

--precision N

Displays values up to N decimal places.
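The sketch below shows how the OUTPUTFILE format could be produced with values rounded to N decimal places. `format_output` is a hypothetical helper; sorting the CUIs and using fixed-point formatting are assumptions, and the program's actual ordering and number formatting may differ.

```python
def format_output(probs, sab, rel, precision, smooth=False, rela=None):
    """Render probabilities in the OUTPUTFILE format (hypothetical helper).

    probs:     dict mapping CUI -> probability
    sab, rel:  strings written verbatim to the SAB and REL header lines
    precision: number of decimal places, as with --precision N
    rela:      optional RELA header value, written only when given
    """
    lines = [
        f"SMOOTH :: {1 if smooth else 0}",
        f"SAB :: {sab}",
        f"REL :: {rel}",
    ]
    if rela:
        lines.append(f"RELA :: {rela}")
    # Sorted by CUI for reproducible output; the program's order may differ.
    for cui in sorted(probs):
        lines.append(f"{cui}<>{probs[cui]:.{precision}f}")
    return "\n".join(lines) + "\n"
```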

--username STRING

Username required to access the UMLS database on MySQL.

--password STRING

Password required to access the UMLS database on MySQL.

--hostname STRING

Hostname where MySQL is located. DEFAULT: localhost

--database STRING

Database containing the UMLS. DEFAULT: umls

--debug

Turns on the UMLS-Interface debug flag for testing.

--help

Displays the quick summary of program options.

--version

Displays the version information.

PROPAGATION

The Information Content (IC) of a concept is defined as the negative log of its probability. The probability of a concept, c, is determined by summing the probability of the concept occurring in some text and the probability of each of its descendants occurring in some text.
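Written as formulas, the definition above reads as follows, where $p(x)$ is the probability of concept $x$ occurring in the text and $D(c)$ is the set of descendants of $c$. Normalizing by $N$, taken here as the total frequency of all concepts in the taxonomy, is an assumption chosen so that the root concept receives probability 1; see the README for the exact computation.

$$
\mathrm{IC}(c) = -\log P(c), \qquad
P(c) = p(c) + \sum_{d \in D(c)} p(d), \qquad
p(x) = \frac{\mathrm{freq}(x)}{N}
$$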

For more information on how this is calculated, please see the README file.
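The following Python sketch applies the propagation described in this section to a toy taxonomy. It is illustrative only: `propagate` is a hypothetical helper, the `children` map stands in for the PAR/CHD or RB/RN relations, each descendant is counted once even when reachable by several paths, probabilities are normalized by the total frequency, and the natural log is used for IC. The README describes the normalization and log base the program actually uses.

```python
import math


def propagate(freq, children):
    """Propagate frequencies up a taxonomy and compute P(c) and IC(c).

    freq:     dict mapping CUI -> frequency (smoothed or not)
    children: dict mapping CUI -> list of child CUIs (a stand-in for the
              PAR/CHD or RB/RN relations); the taxonomy is assumed acyclic
    Returns (prob, ic), two dicts keyed by CUI.
    """
    concepts = set(freq) | set(children)
    for kids in children.values():
        concepts.update(kids)
    total = sum(freq.get(c, 0) for c in concepts)  # assumed non-zero

    def descendants(cui):
        # Depth-first walk; `seen` ensures each descendant is counted once
        # even when it is reachable through several parents.
        seen, stack = set(), list(children.get(cui, []))
        while stack:
            node = stack.pop()
            if node not in seen:
                seen.add(node)
                stack.extend(children.get(node, []))
        seen.discard(cui)  # never count a concept as its own descendant
        return seen

    prob, ic = {}, {}
    for cui in concepts:
        count = freq.get(cui, 0) + sum(freq.get(d, 0) for d in descendants(cui))
        prob[cui] = count / total
        # Natural log assumed; a zero probability gives infinite IC.
        ic[cui] = -math.log(prob[cui]) if prob[cui] > 0 else math.inf
    return prob, ic
```

For example, with children {R: [A, B], A: [C], B: [C]} and frequencies {R: 0, A: 1, B: 1, C: 2}, C is counted only once under R, giving P(R) = 4/4 = 1, P(A) = P(B) = 3/4 and P(C) = 2/4.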

SYSTEM REQUIREMENTS

  • Perl (version 5.8.5 or better) - http://www.perl.org

  • UMLS::Interface - http://search.cpan.org/dist/UMLS-Interface

  • UMLS::Similarity - http://search.cpan.org/dist/UMLS-Similarity

  • Text::NSP - http://search.cpan.org/dist/Text-NSP

  • MetaMap - http://mmtx.nlm.nih.gov/

CONTACT US

If you have any trouble installing and using create-icpropagation.pl,
please contact us via the users mailing list:
  
    umls-similarity@yahoogroups.com
   
You can join this group by going to:
  
    http://tech.groups.yahoo.com/group/umls-similarity/
   
You may also contact us directly if you prefer:
  
    Bridget T. McInnes: bthomson at cs.umn.edu 

    Ted Pedersen : tpederse at d.umn.edu

AUTHOR

Bridget T. McInnes, University of Minnesota

COPYRIGHT

Copyright (c) 2007-2009,

Bridget T. McInnes, University of Minnesota
bthomson at cs.umn.edu
   
Ted Pedersen, University of Minnesota Duluth
tpederse at d.umn.edu


Siddharth Patwardhan, University of Utah, Salt Lake City
sidd@cs.utah.edu

Serguei Pakhomov, University of Minnesota Twin Cities
pakh0002@umn.edu

This program is free software; you can redistribute it and/or modify it under the terms of the GNU General Public License as published by the Free Software Foundation; either version 2 of the License, or (at your option) any later version.

This program is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for more details.

You should have received a copy of the GNU General Public License along with this program; if not, write to:

The Free Software Foundation, Inc.,
59 Temple Place - Suite 330,
Boston, MA  02111-1307, USA.