NAME

Bio::ExtractNE - Extract biological named entities from PUBMED abstracts

SYNOPSIS

use Bio::ExtractNE;

SET DICTIONARY

The dictionary file defaults to /usr/local/Bio-ExtractNE/sprot.dict or dict/sprot.dict.

Usually the dictionary object is built for you automatically, unless your dictionary files are stored elsewhere.

set_dictionary();
set_dictionary('sprot.dict');
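The dictionary is what lets the module recognize protein and gene names derived from SWISSPROT. The format of sprot.dict and the module's own matching code are not documented here, so the following is only a hypothetical sketch: it assumes a small in-memory hash of names (invented for illustration, not loaded from sprot.dict) and a hypothetical dict_match() helper that performs a greedy longest-match scan over whitespace-separated tokens.

```perl
use strict;
use warnings;

# Hypothetical in-memory dictionary: lower-cased name => canonical name.
# This is NOT the sprot.dict format; the entries are illustrative only.
my %dict = map { lc($_) => $_ } (
    'tumor necrosis factor',
    'TNF',
    'epidermal growth factor receptor',
    'EGFR',
);

# Length (in words) of the longest dictionary name bounds the search window.
my $max_len = 0;
for my $name (keys %dict) {
    my @w = split ' ', $name;
    $max_len = @w if @w > $max_len;
}

# Greedy longest-match lookup (hypothetical helper, not part of Bio::ExtractNE).
sub dict_match {
    my ($text) = @_;
    my @tokens = split ' ', $text;
    my @found;
    my $i = 0;
    while ($i < @tokens) {
        my $matched = 0;
        # Try the longest window first so multi-word names win over their parts.
        for my $len (reverse 1 .. $max_len) {
            next if $i + $len > @tokens;
            my $phrase = join ' ', @tokens[$i .. $i + $len - 1];
            $phrase =~ s/[.,;:]+$//;    # drop trailing punctuation
            if (exists $dict{lc $phrase}) {
                push @found, $dict{lc $phrase};
                $i += $len;
                $matched = 1;
                last;
            }
        }
        $i++ unless $matched;
    }
    return @found;
}

my @names = dict_match('Levels of tumor necrosis factor and EGFR were measured.');
print "$_\n" for @names;
# prints:
# tumor necrosis factor
# EGFR
```

Trying the longest window first keeps "tumor necrosis factor" from being split into shorter, partial matches.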

GET ABSTRACTS

Fetch an abstract by its PMID:

$abst = get_abstract('15043991');

EXTRACT NAMES

use Data::Dumper;

Extract named entities from the abstract's body:

print Dumper extractNE($abst->{text});

Extract named entities from the abstract's title and body:

print Dumper extractNE(join q/ /, @{$abst}{qw/title text/});
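The argument above uses a hash slice on the abstract's hash reference. The snippet below shows what that expression produces, using a stand-in hash in place of a real get_abstract() result; only the title and text keys from the SYNOPSIS are assumed, and the values are placeholders, not PUBMED data.

```perl
use strict;
use warnings;

# Hypothetical stand-in for the hash reference returned by get_abstract().
my $abst = {
    title => 'Placeholder title.',
    text  => 'Placeholder body text.',
};

# @{$abst}{qw/title text/} is a hash slice yielding the title, then the body;
# join q/ / concatenates them with a single space.
my $input = join q/ /, @{$abst}{qw/title text/};

print "$input\n";    # prints "Placeholder title. Placeholder body text."
```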

OPTIONS

Disable the dictionary (the dictionary is used by default):

$USE_DICTIONARY = 0;

Use the GAPSCORE RPC service (off by default):

$USE_GAPSCORE_RPC = 1;

Use the self-implemented GAPSCORE (off by default):

$USE_GAPSCORE = 1;

DESCRIPTION

This module extracts named entities from PUBMED online abstracts. It recognizes protein and gene names drawn from the SWISSPROT database. It also tries to resolve abbreviations defined on the spot: an abbreviation enclosed in parentheses is extracted together with its full name, using a naive look-back method that is admittedly still somewhat crude.
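To illustrate the general idea of look-back abbreviation resolution, here is a minimal sketch. It is a hypothetical find_abbreviations() helper, not the module's actual code: for each "long form (ABBR)" pattern it looks back as many words as ABBR has letters and accepts those words if their initials spell the abbreviation. The sample sentence is invented for illustration.

```perl
use strict;
use warnings;

# Hypothetical naive look-back resolver; NOT Bio::ExtractNE's implementation.
sub find_abbreviations {
    my ($text) = @_;
    my %found;
    # Capture the text before each parenthesised token, and the token itself.
    while ($text =~ /([^()]*?)\s*\(\s*([A-Za-z][A-Za-z0-9-]*)\s*\)/g) {
        my ($before, $abbr) = ($1, $2);
        my @letters = grep { /[A-Za-z]/ } split //, $abbr;
        my @words   = split ' ', $before;    # awk-style whitespace split
        next if @words < @letters;
        # Look back exactly as many words as the abbreviation has letters.
        my @window   = @words[ -@letters .. -1 ];
        my $initials = join '', map { substr($_, 0, 1) } @window;
        if (lc($initials) eq lc(join '', @letters)) {
            $found{$abbr} = join ' ', @window;
        }
    }
    return \%found;
}

my $sample  = 'We measured tumor necrosis factor (TNF) and '
            . 'epidermal growth factor receptor (EGFR) levels.';
my $abbrevs = find_abbreviations($sample);
print "$_ => $abbrevs->{$_}\n" for sort keys %$abbrevs;
# prints:
# EGFR => epidermal growth factor receptor
# TNF => tumor necrosis factor
```

Requiring every initial to match keeps false positives low, but it misses forms such as "interleukin 6 (IL6)", which shows why a pure look-back heuristic remains crude.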

For now, this package uses GAPSCORE as its name-guessing module. Both an RPC interface and a self-implemented version are provided.

You can pass a PMID, the URL of an abstract, an abstract passage, or an abstract file to extractNE(). It returns a list of tokens, the recognized named entities, abbreviations with their full names, and the numbers of the sentences that may describe protein-protein interactions. No category or synonym-grouping information is provided for the named entities. Other classes of words, such as disease and virus names, will be considered in future versions.
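How the module selects candidate protein-protein interaction sentences is not described here. As a rough, hypothetical illustration only, a simple co-occurrence heuristic could flag a sentence when it mentions at least two known entities and one interaction verb. The ppi_sentences() helper, the verb list, the 1-based sentence numbering, and the sample text are all assumptions, not the module's API or behavior.

```perl
use strict;
use warnings;

# Hypothetical list of interaction verbs (an assumption for this sketch).
my @INTERACTION_WORDS = qw(binds interacts activates inhibits phosphorylates);

# Hypothetical co-occurrence heuristic; NOT Bio::ExtractNE's algorithm.
sub ppi_sentences {
    my ($text, @entities) = @_;
    # Naive sentence split after '.', '!' or '?' followed by whitespace.
    my @sentences = split /(?<=[.!?])\s+/, $text;
    my @hits;
    for my $i (0 .. $#sentences) {
        my $s = $sentences[$i];
        # Count entities appearing as whole words (scalar grep gives a count).
        my $n_entities = grep { $s =~ /\b\Q$_\E\b/ } @entities;
        my $has_verb   = grep { $s =~ /\b\Q$_\E\b/i } @INTERACTION_WORDS;
        push @hits, $i + 1 if $n_entities >= 2 && $has_verb;    # 1-based
    }
    return @hits;
}

my $text = 'TNF binds TNFR1. TNF and EGFR levels were measured. '
         . 'EGFR activates downstream kinases.';
my @hits = ppi_sentences($text, 'TNF', 'TNFR1', 'EGFR');
print "@hits\n";    # prints "1"
```

The second sentence has two entities but no interaction verb, and the third has a verb but only one entity, so only the first sentence is flagged.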

Suggestions and comments are always welcome, as are contributions and patches.

Email me if you have questions.

SEE ALSO

bene.pl, a command-line tool for Bio::ExtractNE (not done yet)

Lingua::EN::NamedEntity provides similar functionality, but it is written for general English text.

PUBMED, http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?db=PubMed

SwissProt, http://expasy.org/

See also the related modules in this distribution:

Bio::ExtractNE::MakeDict

Bio::ExtractNE::Dict

Bio::ExtractNE::GetAbst

Bio::ExtractNE::GetSprot

COPYRIGHT AND LICENSE

Copyright (C) 2004 Yung-chung Lin (a.k.a. xern) <xern@cpan.org> and Chin-lin Peng

This package is free software; you can redistribute it and/or modify it under the same terms as Perl itself.