NAME

Bio::DB::NextProt - Object interface to neXtProt REST API.

SYNOPSIS

my $np = Bio::DB::NextProt->new();

my @result_1 = $np->search_cv(-query => "kinase#");

my @result_2 = $np->get_isoform_info(-id => "NX_O00142-2");

my @result_3 = $np->get_protein_cv_info(-id => "PTM-0205", -format => "html");

DESCRIPTION

The module allows the dynamic retrieval of information from the neXtProt Database through its API service. All the information below was extracted from the API webpage. For the moment the results obtained from the API are in pure HTML, XML or JSON.

Search functionalities

Search Protein

Search proteins matching the query or search proteins for which the filter is true. Available filter values are: structure, disease, expression, mutagenesis or proteomics. Note: only one filter parameter at a time is possible for the moment.

@result = $np->search_protein(-query => "kinase");
@result = $np->search_protein(-query => "kinase", -filter => "disease");

Control Vocabulary Terms

Search control vocabulary terms matching the query or search control vocabulary terms in the category specified by the filter. Available filter values are: enzyme, go, mesh, cell, domain, family, tissue, metal, pathway, disease, keyword, ptm, subcell. Note: only one category at a time is possible.

@result = $np->search_cv(-query => "colon");
@result = $np->search_cv(-query => "colon", -filter => "keyword");

Format:

Output format maybe in JSON (default), HTML or XML.

@result = $np->search_protein(-query => "kinase", -filter => "disease", -format => "html");

Find information by protein entry

Protein ID

ID is neXtProt identifier. Retrieve gene name as well as main identifier and isoform sequences

@result = $np->get_protein_info(-query => "NX_P13051");

Post-translational modifications

For each isoform of the specified entry, retrieve all post-translational modifications.

@result = $np->get_protein_info(-query => "NX_P13051", -retrieve => "ptm");

Variant

For each isoform of the specified entry, retrieve all variants.

@result = $np->get_protein_info(-query => "NX_P13051", -retrieve => "variant");

Localisation

For each isoform of the specified entry, retrieve all subcellular location.

@result = $np->get_protein_info(-query => "NX_P13051", -retrieve => "localisation");

Expression

Retrieve all expression information by tissue for the specified entry.

@result = $np->get_protein_info(-query => "NX_P13051", -retrieve => "expression");

Format:

Output format maybe in JSON (default), HTML or XML.

@result = $np->get_protein_info(-query => "NX_P13051", -retrieve => "expression", -format => "html");

Find information by isoform

Protein ID

ID is neXtProt identifier. Retrieve gene name as well as main identifier and isoform sequences

@result = $np->get_isoform_info(-query => "NX_O00142-2");

Post-translational modifications

For each isoform of the specified entry, retrieve all post-translational modifications.

@result = $np->get_isoform_info(-query => "NX_P01116-2", -retrieve => "ptm");

Variant

For each isoform of the specified entry, retrieve all variants.

@result = $np->get_isoform_info(-query => "NX_P01116-2", -retrieve => "variant");

Localisation

For each isoform of the specified entry, retrieve all subcellular location.

@result = $np->get_isoform_info(-query => "NX_P01116-2", -retrieve => "localisation");

Find information by controlled vocabulary term

Protein ID

ID is neXtProt identifier. Retrieve the accession, the name and the category of the CV term.

@result = $np->get_protein_cv_info(-query => "PTM-0205");

Protein List

List all the proteins associated with the term in neXtProt.

@result = $np->get_protein_cv_info(-query => "PTM-0205", -retrieve => "proteins");

Format:

Output format maybe in JSON (default), HTML or XML.

@result = $np->get_protein_cv_info(-query => "PTM-0205", -retrieve => "proteins", -format => "html");

Retrieving Accession Lists

Allows the retrieval of all accession codes from individual chromossomes or from the entire NextProt database.

@result = $np->get_accession_list(-chromosome => "10");

@result = $np->get_accession_list(-chromosome => "all");

entries with a protein existence "at protein level" (PE 1)

@result = $np->get_accession_list(-evidence => "protein_level");

entries with a protein existence "at transcript level" (PE 2)

@result = $np->get_accession_list(-evidence => "transcript_level");

entries with a protein existence "by homology" (PE 3)

@result = $np->get_accession_list(-evidence => "homology");

entries with a protein existence "predicted" (PE 4)

@result = $np->get_accession_list(-evidence => "predicted");

entries with a protein existence "uncertain" (PE 5)

@result = $np->get_accession_list(-evidence => "uncertain");

Customized Report Files for the HUPO Human Proteome Project (HPP)

Individual report files for each chromosomes (1 to 22, X, Y and MT)

my @list = $np->get_hpp_report(-chromosome => 10);

Annotated phosphorylated residues per chromosome

my @list = $np->get_hpp_report(-phospho => "true");

Annotated N-Acetyl residues per chromosome

my @list = $np->get_hpp_report(-nacetyl => "true");

NextProt Mapping

Mapping of neXtProt accession numbers to external resources.

Ensembl gene identifiers

@list = $np->get_mapping(-map => "ensembl_gene");

Ensembl protein identifiers

@list = $np->get_mapping(-map => "ensembl_protein");

protein ids that cannot be mapped to any isoform in neXtProt

@list = $np->get_mapping(-map => "ensembl_unmapped");

Ensembl transcript identifiers

@list = $np->get_mapping(-map => "ensembl_transcript");

transcript is considered as coding by Ensembl that cannot be mapped to any isoform in neXtProt

@list = $np->get_mapping(-map => "ensembl_transcript_unmapped");

NCBI GeneID gene accession numbers

@list = $np->get_mapping(-map => "geneid");

HGNC gene accession numbers

@list = $np->get_mapping(-map => "hgnc");

MGI mouse gene accession numbers

@list = $np->get_mapping(-map => "mgi");

NCBI RefSeq gene accession numbers

@list = $np->get_mapping(-map => "refseq");

Chromosome Report

The module also allows the programatic access to chromosome information by accessing and formatting the chr_report tables from the nextprot ftp server. The retrieved structure is a hash of hashes, being the first key the NextProt Accession Number. The internal hashes have the following values:

* Gene Name * Chromosomal position * Start position * Stop position * Protein existence * Proteomics * Antibody * 3D * Disease * Isoforms * Variants * PTMs * Description

This is how the data is representes in the hashes:

NX_A7E2V4                   {
    antibody         "yes",
    description      "Zinc finger SWIM domain-containing protein 8",
    disease          "no",
    existence        "protein level",
    has_3d           "no",
    isoforms         5,
    gene_name        "ZSWIM8",
    position         "10q22.2",
    proteomics       "yes",
    ptms             6,
    start_position   75545340,
    stop_position    75561551,
    variants         67
}

Loading the Chromosome table.

Loas all the information from tha table.

my %data = $np->get_chromosome(-chromosome => 10);

Accessing Protein information:

say $data{NX_A7E2V4}->{isoforms};
say $data{NX_A7E2V4}->{proteomics};
say $data{NX_A7E2V4}->{description};

Counting the number of Proteins in the Chromosome

$sum = (keys %data);
say $sum;		 

Retrieve all Gene Names from a giving Chromosome

for my $prot (keys %data) {
	say $prot;
}

Mapping

The API also provides a series of mapping files between neXtProt and different databases. The following databases are available:

* ensg (Ensembl gene) * ensp (Ensembl protein) * ensp_unmapped (Ensembl unmapped) * enst (Ensembl transcript) * enst_unmapped (Ensembl transcript unmapped) * geneid (NCBI Gene ID) * hgnc (HUGO Gene Nomenclature) * IPI (International Protein Index) * mgi (Mouse Genome Index) * refseq (NCBI Refseq)

mapping

my %map = $np->mapping("refseq");

FEEDBACK

Mailing Lists

User feedback is an integral part of the evolution of this and other Bioperl modules. Send your comments and suggestions preferably to one of the Bioperl mailing lists. Your participation is much appreciated.

bioperl-l@bioperl.org                  - General discussion
http://bioperl.org/wiki/Mailing_lists  - About the mailing lists

Support

Please direct usage questions or support issues to the author:

leprevost@cpan.org

rather than to the module maintainer directly. Many experienced and reponsive experts will be able look at the problem and quickly address it. Please include a thorough description of the problem with code and data examples if at all possible.

Reporting Bugs

Report bugs to the Bioperl bug tracking system to help us keep track the bugs and their resolution. Bug reports can be submitted via the web:

https://github.com/bioperl/bioperl-live

AUTHOR - Felipe da Veiga Leprevost

Email leprevost@cpan.org