NAME
create-icpropagation.pl - This program determines the probability of the CUIs in a specified set of sources and relations.
SYNOPSIS
This program determines the probability of the CUIs in a specified set of sources and relations.
USAGE
Usage: create-icpropagation.pl [OPTIONS] OUTPUTFILE ICFREQUENCY_FILE
OUTPUTFILE
File in which the probability of the CUIs will be stored.
The output file containing the probability of the CUIs has the following format:
SMOOTH :: <0|1>
SAB :: <sources>
REL :: <relations>
RELA :: <relas>   (present only if RELAs are specified in the configuration file)
CUI<>probability
CUI<>probability
...
ICFREQUENCY_FILE
File containing the icfrequency counts.
The input file contains frequency counts for CUIs in the following format:
SAB :: <sources>
CUI<>freq
CUI<>freq
...
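Both the input and output files follow the same two line conventions: header lines of the form KEY :: value (SAB, REL, RELA, SMOOTH) and data lines of the form CUI<>number. The following is a minimal Python sketch for reading either file, assuming exactly these conventions; the function name read_ic_file is hypothetical and is not part of UMLS-Similarity.

```python
def read_ic_file(lines):
    """Parse an icfrequency or propagation file (hypothetical helper).

    Header lines 'KEY :: value' (e.g. SAB, REL, RELA, SMOOTH) go into `header`;
    data lines 'CUI<>number' go into `values` with the number as a float.
    """
    header = {}
    values = {}
    for raw in lines:
        line = raw.strip()
        if not line:
            continue  # skip blank lines
        if "<>" in line:
            # Data line: CUI<>freq in the input file, CUI<>probability in the output file
            cui, number = line.split("<>", 1)
            values[cui.strip()] = float(number)
        elif "::" in line:
            # Header line: KEY :: value
            key, value = line.split("::", 1)
            header[key.strip()] = value.strip()
    return header, values
```

Because it accepts any iterable of lines, it can be called on an open file object, e.g. read_ic_file(open(path)), as well as on a list of strings.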
Optional Arguments:
--smooth
Incorporate Laplace smoothing, where the frequency count of each of the concepts in the taxonomy is incremented by one. The advantage of doing this is that it avoids assigning any concept a probability of zero. The disadvantage is that it can shift the overall probability mass of the concepts away from what is actually observed in the corpus.
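The trade-off described above can be seen on a toy example. This Python sketch applies add-one smoothing to raw counts and normalizes them, assuming the taxonomy is given as a plain list of CUIs; the helpers laplace_smooth and to_probabilities are hypothetical, and the sketch ignores propagation, so it only illustrates the effect of smoothing, not the program's exact normalization.

```python
def laplace_smooth(freqs, concepts):
    """Return add-one smoothed counts for every concept in the taxonomy.

    freqs: observed counts (CUI -> count); CUIs never seen are simply absent.
    concepts: every CUI in the taxonomy (assumed to come from the SAB/REL config).
    CUIs in freqs that are not in concepts are dropped.
    """
    return {cui: freqs.get(cui, 0) + 1 for cui in concepts}


def to_probabilities(counts):
    """Normalize counts into relative frequencies that sum to 1."""
    total = sum(counts.values())
    return {cui: count / total for cui, count in counts.items()}


smoothed = laplace_smooth({"C1": 3}, ["C1", "C2", "C3"])
probs = to_probabilities(smoothed)
```

Without smoothing, C2 and C3 would have probability zero; with it they each get 1/6, while C1 drops from 1.0 to 4/6, which is the shift in probability mass mentioned above.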
--config FILE
Specifies the configuration file. The format of the configuration file is as follows:
SAB :: <include|exclude> <source1, source2, ... sourceN>
REL :: <include|exclude> <relation1, relation2, ... relationN>
For example, if we wanted to use the MSH vocabulary with only the RB/RN relations, the configuration file would be:
SAB :: include MSH
REL :: include RB, RN
or
SAB :: include MSH
REL :: exclude PAR, CHD
Example configuration files for a number of different runs can be found in the configuration file directory.
Note: You can use relations other than PAR/CHD and RB/RN for propagation, but we do not recommend it. The PAR/CHD and RB/RN relations are considered the hierarchical relations in the UMLS, which are required for propagation to work correctly.
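To make the include/exclude semantics concrete, here is a small Python sketch that parses configuration lines in the format shown above and tests whether a given source or relation passes the rule. It assumes one KEY :: include|exclude item1, item2 rule per line; parse_config and allowed are hypothetical names, and the program itself may handle additional keys or edge cases differently.

```python
def parse_config(text):
    """Parse 'KEY :: include|exclude item1, item2, ...' lines into a dict.

    Returns {KEY: (mode, [items])}, e.g. {"REL": ("exclude", ["PAR", "CHD"])}.
    """
    config = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or "::" not in line:
            continue  # ignore blank or unrelated lines
        key, rest = line.split("::", 1)
        # First word is the mode; the remainder is a comma-separated list
        mode, _, items = rest.strip().partition(" ")
        if mode not in ("include", "exclude"):
            raise ValueError(f"expected include or exclude, got {mode!r}")
        names = [item.strip() for item in items.split(",") if item.strip()]
        config[key.strip()] = (mode, names)
    return config


def allowed(value, mode, names):
    """True if value passes an include/exclude rule."""
    return value in names if mode == "include" else value not in names


cfg = parse_config("SAB :: include MSH\nREL :: exclude PAR, CHD\n")
```

With this configuration, allowed("RB", *cfg["REL"]) is True, while allowed("PAR", *cfg["REL"]) is False.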
--disregard
Ignores the SAB configuration with which the icfrequency file was created.
--precision N
Displays values up to N decimal places.
--username STRING
Username required to access the UMLS database in MySQL.
--password STRING
Password required to access the UMLS database in MySQL.
--hostname STRING
Hostname where MySQL is located. DEFAULT: localhost
--database STRING
Database containing the UMLS. DEFAULT: umls
--debug
Sets the UMLS-Interface debug flag on for testing
--help
Displays a quick summary of the program options.
--version
Displays the version information.
PROPAGATION
The Information Content (IC) of a concept is defined as the negative log of its probability. The probability of a concept, c, is determined by summing the probability of the concept occurring in some text and the probabilities of its descendants occurring in some text.
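Written as formulas, the definition above reads as follows, where p(x) denotes the probability of x occurring in the text and D(c) the set of descendants of c under the configured hierarchical relations. This notation is ours, not the program's; the README remains the authoritative description.

```latex
\begin{aligned}
P(c)  &= p(c) + \sum_{d \in D(c)} p(d) \\
IC(c) &= -\log P(c)
\end{aligned}
```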
For more information on how this is calculated please see the README file.
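As a rough illustration of propagation, the Python sketch below computes P(c) and IC(c) over a toy taxonomy. It assumes that p(x) is estimated as f(x)/N, where f(x) is the count from the icfrequency file and N is the sum of all counts, and that a descendant reachable along several paths is counted only once. Both are assumptions; see the README for how create-icpropagation.pl actually computes the values. The functions descendants and propagate are hypothetical.

```python
import math


def descendants(cui, children):
    """All CUIs reachable from `cui` via child links, each counted once."""
    seen = set()
    stack = list(children.get(cui, ()))
    while stack:
        node = stack.pop()
        if node not in seen:
            seen.add(node)
            stack.extend(children.get(node, ()))
    return seen


def propagate(freqs, children):
    """Return {CUI: (probability, IC)} for every CUI in `freqs`.

    P(c) = (f(c) + sum of f(d) over descendants d) / N, N = total count.
    Only CUIs present in `freqs` get an entry; pass smoothed counts (or
    explicit zeros) to cover the whole taxonomy.
    """
    total = sum(freqs.values())
    result = {}
    for cui in freqs:
        count = freqs[cui] + sum(freqs.get(d, 0) for d in descendants(cui, children))
        prob = count / total
        # A zero probability would give infinite IC; smoothing avoids this
        ic = -math.log(prob) if prob > 0 else math.inf
        result[cui] = (prob, ic)
    return result


# Toy hierarchy: C0 -> C1, C2; both C1 and C2 -> C3
children = {"C0": ["C1", "C2"], "C1": ["C3"], "C2": ["C3"]}
freqs = {"C0": 1, "C1": 2, "C2": 3, "C3": 4}
table = propagate(freqs, children)
```

Here the root C0 collects all ten counts, so P(C0) = 1.0 and IC(C0) = 0, while C1 gets (2 + 4) / 10 = 0.6. Concepts deeper in the hierarchy receive higher IC.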
SYSTEM REQUIREMENTS
Perl (version 5.8.5 or better) - http://www.perl.org
UMLS::Interface - http://search.cpan.org/dist/UMLS-Interface
UMLS::Similarity - http://search.cpan.org/dist/UMLS-Similarity
Text::NSP - http://search.cpan.org/dist/Text-NSP
MetaMap - http://mmtx.nlm.nih.gov/
CONTACT US
If you have any trouble installing and using create-icpropagation.pl,
please contact us via the users mailing list:
umls-similarity@yahoogroups.com
You can join this group by going to:
http://tech.groups.yahoo.com/group/umls-similarity/
You may also contact us directly if you prefer:
Bridget T. McInnes: bthomson at cs.umn.edu
Ted Pedersen: tpederse at d.umn.edu
AUTHOR
Bridget T. McInnes, University of Minnesota
COPYRIGHT
Copyright (c) 2007-2009,
Bridget T. McInnes, University of Minnesota
bthomson at cs.umn.edu
Ted Pedersen, University of Minnesota Duluth
tpederse at d.umn.edu
Siddharth Patwardhan, University of Utah, Salt Lake City
sidd@cs.utah.edu
Serguei Pakhomov, University of Minnesota Twin Cities
pakh0002@umn.edu
This program is free software; you can redistribute it and/or modify it under the terms of the GNU General Public License as published by the Free Software Foundation; either version 2 of the License, or (at your option) any later version.
This program is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for more details.
You should have received a copy of the GNU General Public License along with this program; if not, write to:
The Free Software Foundation, Inc.,
59 Temple Place - Suite 330,
Boston, MA 02111-1307, USA.