NAME

Bio::EBI::RNAseqAPI - A Perl interface to the EMBL-EBI RNA-seq analysis API.

DESCRIPTION

This module provides a Perl-based interface to the EMBL-EBI RNA-seq analysis API.

A function is provided for each endpoint of the API. Each function returns the analysis information for the runs it finds, based on the arguments passed to it.

Most functions take named arguments in the form of a hash, as shown in the examples below; a few take a single accession or organism name instead. The named arguments usually consist of one or more study or run accessions, plus a value for "minimum_mapped_reads". This value is the minimum percentage of mapped reads to allow for each run in the results: only runs whose percentage of mapped reads is greater than or equal to this value are returned. To get all available information, set "minimum_mapped_reads" to zero.

Analysis information for each run is returned in a hash reference. Some functions return array references with one hash reference per run found. See below for examples and more information about the results.

For more information about the API, see its documentation.

SYNOPSIS

use 5.10.0;
use Bio::EBI::RNAseqAPI;

my $rnaseqAPI = Bio::EBI::RNAseqAPI->new;

my $runInfo = $rnaseqAPI->get_runs_by_study(
   study => "E-MTAB-513",
   minimum_mapped_reads => 0
);

METHODS

Analysis results per sequencing run

get_run

Accesses the API's "getRun" JSON endpoint and returns analysis information for a single run, passed in the arguments.

Arguments should be passed as a hash containing values for "run" and "minimum_mapped_reads", e.g.:

my $runInfo = $rnaseqAPI->get_run(
   run => "ERR030885",
   minimum_mapped_reads => 0
);

Run analysis information is returned in a hash reference. Returns undef (and logs errors) if errors are encountered.

An example of the hash returned is as follows:

{
   STUDY_ID => "SRP001371",
   SAMPLE_IDS => "SAMN00113441",
   BIOREP_ID => "SRR453391",
   RUN_IDS => "SRR453391",
   ORGANISM => "homo_sapiens",
   REFERENCE_ORGANISM => "homo_sapiens",
   STATUS => "Complete",
   ASSEMBLY_USED => "GRCh38",
   ENA_LAST_UPDATED => "Fri Jun 19 2015 20:33:07",
   LAST_PROCESSED_DATE => "Wed Jul 15 2015 08:35:58",
   CRAM_LOCATION => "ftp://ftp.ebi.ac.uk/pub/databases/arrayexpress/data/atlas/rnaseq/SRR453/SRR453391/SRR453391.cram",
   BEDGRAPH_LOCATION => "ftp://ftp.ebi.ac.uk/pub/databases/arrayexpress/data/atlas/rnaseq/SRR453/SRR453391/SRR453391.bedgraph",
   BIGWIG_LOCATION => "ftp://ftp.ebi.ac.uk/pub/databases/arrayexpress/data/atlas/rnaseq/SRR453/SRR453391/SRR453391.bw",
   MAPPING_QUALITY => 97
}

get_runs_by_list

This function takes an array reference of run accessions and sequentially accesses the API's "getRun" JSON endpoint to collect the analysis information for each run in the list provided.

my $runInfo = $rnaseqAPI->get_runs_by_list(
   runs => [ "ERR030885", "ERR030886" ],
   minimum_mapped_reads => 0
);

Run analysis information is returned as an array reference containing one hash reference per run (see get_run documentation for an example of what the hash reference looks like). Returns undef (and logs errors) if errors are encountered.
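
Because the result is either an array reference or undef, it can help to guard against undef before dereferencing. The following is a minimal sketch rather than part of the module: index_runs_by_id is a hypothetical helper, it assumes each run hash carries the RUN_IDS key shown in the get_run example, and the sample data is a placeholder in the documented shape (no API call is made).

```perl
use strict;
use warnings;

# Hypothetical helper (not part of Bio::EBI::RNAseqAPI): build a lookup
# table from run accession to run hash. Assumes each hash has a RUN_IDS
# key, as in the get_run example.
sub index_runs_by_id {
    my ( $runs ) = @_;

    # The API functions return undef on error, so fall back to an empty list.
    my %byId = map { ( $_->{ RUN_IDS } => $_ ) } @{ $runs || [] };

    return \%byId;
}

# Placeholder data in the documented shape; a real call would be
# $rnaseqAPI->get_runs_by_list( runs => [ ... ], minimum_mapped_reads => 0 ).
my $runs = [
    { RUN_IDS => "ERR030885" },
    { RUN_IDS => "ERR030886" },
];

my $byId = index_runs_by_id( $runs );
print join( ", ", sort keys %{ $byId } ), "\n";
```

Passing undef to the helper yields an empty hash reference, so calling code can loop over the result without a separate error branch.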

get_runs_by_study

Accesses the API's getRunsByStudy JSON endpoint, and returns an array reference containing a hash reference for each run found (see get_run docs for an example).

my $runInfo = $rnaseqAPI->get_runs_by_study(
   study => "E-MTAB-513",
   minimum_mapped_reads => 0
);

The study accession can be either an ArrayExpress experiment accession or an ENA/SRA/DDBJ study accession. The example above uses an ArrayExpress experiment accession. Examples of ENA/SRA/DDBJ study accessions are "ERP000546", "SRP013533" and "DRP000391".
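
To illustrate the two kinds of accession, here is a rough sketch that guesses the kind from the shape of the string. The helper name study_accession_type is hypothetical, and the patterns are assumptions inferred only from the example accessions in this section; they are not an official specification of either accession format.

```perl
use strict;
use warnings;

# Hypothetical helper: guess the kind of study accession from its shape.
# Both patterns are assumptions based on the examples in this document.
sub study_accession_type {
    my ( $accession ) = @_;

    # e.g. ERP000546, SRP013533, DRP000391
    return "ENA/SRA/DDBJ" if $accession =~ /^[EDS]RP\d+$/;

    # e.g. E-MTAB-513
    return "ArrayExpress" if $accession =~ /^E-[A-Z]{4}-\d+$/;

    return "unknown";
}

for my $accession ( "E-MTAB-513", "ERP000546", "SRP013533", "DRP000391" ) {
    printf "%s: %s\n", $accession, study_accession_type( $accession );
}
```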

Returns undef (and logs errors) if errors are encountered.

get_runs_by_organism

Accesses the API's getRunsByOrganism JSON endpoint, and returns an array reference containing a hash reference for each run found.

my $runInfo = $rnaseqAPI->get_runs_by_organism(
   organism => "homo_sapiens",
   minimum_mapped_reads => 70
);

The value for the "organism" argument is a species' scientific name, in lower case, with underscores instead of spaces, e.g. "homo_sapiens", "canis_lupus_familiaris" or "oryza_sativa_japonica_group". To ensure your organism name is allowed, check it against the organism_list attribute:

my $organism = "oryctolagus_cuniculus";
my $organismList = $rnaseqAPI->get_organism_list;
if( $organismList->{ $organism } ) {
    say "Found $organism!";
}
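
The naming convention described above (lower case, underscores instead of spaces) can be applied to a free-text species name before the lookup. This is only a sketch: normalise_organism_name is a hypothetical helper, not a function of this module, and a normalised name still needs to be checked against the organism list.

```perl
use strict;
use warnings;

# Hypothetical helper: convert a free-text scientific name into the
# lower-case, underscore-separated form described above.
sub normalise_organism_name {
    my ( $name ) = @_;

    $name =~ s/^\s+|\s+$//g;    # trim leading and trailing whitespace
    $name = lc $name;           # lower case
    $name =~ s/\s+/_/g;         # runs of whitespace become one underscore

    return $name;
}

print normalise_organism_name( "Oryza sativa Japonica Group" ), "\n";
```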

Results are returned as an array reference containing a hash reference for each run found. Returns undef (and logs errors) if errors are encountered.

get_runs_by_organism_condition

Accesses the API's getRunsByOrganismCondition JSON endpoint, and returns an array reference containing a hash reference for each run found. An organism name and a "condition" -- meaning a sample attribute -- are passed in the arguments. The condition must exist in EFO (http://www.ebi.ac.uk/efo); this can be checked via the EFO website or via the Ontology Lookup Service (OLS) API: http://www.ebi.ac.uk/ols/docs/api

my $runInfo = $rnaseqAPI->get_runs_by_organism_condition(
   organism => "homo_sapiens",
   condition => "central nervous system",
   minimum_mapped_reads => 70
);

See get_runs_by_organism docs for how to check organism name format and availability.

Returns undef (and logs errors) if errors are encountered.

Analysis results per study

get_study

Accesses the API's getStudy JSON endpoint. Single argument is a study accession (ENA, SRA, DDBJ, or ArrayExpress). Returns a hash reference containing the results for the matching study. Returns undef (and logs errors) if errors are encountered.

my $studyInfo = $rnaseqAPI->get_study( "SRP033494" );

An example of the hash reference returned is as follows:

{
   STUDY_ID => "SRP033494",
   ORGANISM => "arabidopsis_thaliana",
   REFERENCE_ORGANISM => "arabidopsis_thaliana",
   ASSEMBLY_USED => "TAIR10",
   GTF_USED => "Arabidopsis_thaliana.TAIR10.26.gtf.gz",
   STATUS => "Complete",
   GENES_FPKM_COUNTS_FTP_LOCATION => "ftp://ftp.ebi.ac.uk/pub/databases/arrayexpress/data/atlas/rnaseq/studies/ena/SRP033494/genes.rpkm.tsv",
   GENES_TPM_COUNTS_FTP_LOCATION => "ftp://ftp.ebi.ac.uk/pub/databases/arrayexpress/data/atlas/rnaseq/studies/ena/SRP033494/genes.tpm.tsv",
   GENES_RAW_COUNTS_FTP_LOCATION => "ftp://ftp.ebi.ac.uk/pub/databases/arrayexpress/data/atlas/rnaseq/studies/ena/SRP033494/genes.raw.tsv",
   EXONS_FPKM_COUNTS_FTP_LOCATION => "ftp://ftp.ebi.ac.uk/pub/databases/arrayexpress/data/atlas/rnaseq/studies/ena/SRP033494/exons.rpkm.tsv",
   EXONS_TPM_COUNTS_FTP_LOCATION => "ftp://ftp.ebi.ac.uk/pub/databases/arrayexpress/data/atlas/rnaseq/studies/ena/SRP033494/exons.tpm.tsv",
   EXONS_RAW_COUNTS_FTP_LOCATION => "ftp://ftp.ebi.ac.uk/pub/databases/arrayexpress/data/atlas/rnaseq/studies/ena/SRP033494/exons.raw.tsv",
   SOFTWARE_VERSIONS_FTP_LOCATION => "ftp://ftp.ebi.ac.uk/pub/databases/arrayexpress/data/atlas/rnaseq/studies/ena/SRP033494/irap.versions.tsv",
   LAST_PROCESSED_DATE => "Thu Jun 30 2016 19:55:56"
}

get_studies_by_organism

Accesses the API's getStudiesByOrganism JSON endpoint. Single argument is the name of an organism (see the organism_list attribute for allowed names). Returns an array reference containing one hash reference per study found. See get_study docs for an example of a hash reference.

my $studies = $rnaseqAPI->get_studies_by_organism( "arabidopsis_thaliana" );
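
Since each element of the result has the same shape as the get_study hash, one field can be pulled out of every study with a short loop. The sketch below is illustrative only: gene_tpm_locations is a hypothetical helper, the sample data is a cut-down copy of the get_study example rather than a live result, and tolerating undef is a precaution, not documented behaviour of this function.

```perl
use strict;
use warnings;

# Hypothetical helper: map each study accession to the FTP location of
# its gene-level TPM file, skipping studies that lack one.
sub gene_tpm_locations {
    my ( $studies ) = @_;

    my %locations;
    for my $study ( @{ $studies || [] } ) {
        my $location = $study->{ GENES_TPM_COUNTS_FTP_LOCATION };
        next unless defined $location;
        $locations{ $study->{ STUDY_ID } } = $location;
    }

    return \%locations;
}

# Cut-down copy of the get_study example; a real call would be
# $rnaseqAPI->get_studies_by_organism( "arabidopsis_thaliana" ).
my $studies = [
    {
        STUDY_ID => "SRP033494",
        GENES_TPM_COUNTS_FTP_LOCATION => "ftp://ftp.ebi.ac.uk/pub/databases/arrayexpress/data/atlas/rnaseq/studies/ena/SRP033494/genes.tpm.tsv",
    },
];

my $locations = gene_tpm_locations( $studies );
print "$_\t$locations->{ $_ }\n" for sort keys %{ $locations };
```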

Sample attributes per run

get_sample_attributes_by_run

Accesses the API's getSampleAttributesByRun JSON endpoint. Single argument is the accession of the run. Returns an array reference containing one hash reference per sample attribute found.

my $sampleAttributes = $rnaseqAPI->get_sample_attributes_by_run( "SRR805786" );

An example of the results returned is as follows:

[
   {
       STUDY_ID => "SRP020492",
       RUN_ID => "SRR805786",
       TYPE => "cell type",
       VALUE => "peripheral blood mononuclear cells (PBMCs)",
       EFO_URL => "NA"
   },
   {
       STUDY_ID => "SRP020492",
       RUN_ID => "SRR805786",
       TYPE => "organism",
       VALUE => "Homo sapiens",
       EFO_URL => "http://purl.obolibrary.org/obo/NCBITaxon_9606"
   }
]

An EFO URL is not available for every sample attribute; when there is none, EFO_URL is set to "NA".

Returns undef (and logs errors) if errors are encountered.
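
The array of attributes can be reshaped into a hash keyed by attribute type, which is often more convenient for lookups. This is a sketch under stated assumptions: attributes_by_type is a hypothetical helper, it assumes each TYPE occurs at most once per run (a later duplicate would overwrite an earlier one), and it turns the "NA" placeholder into undef.

```perl
use strict;
use warnings;

# Hypothetical helper: key the sample attributes of one run by TYPE.
# Assumes each TYPE appears at most once for the run.
sub attributes_by_type {
    my ( $attributes ) = @_;

    my %byType;
    for my $attribute ( @{ $attributes || [] } ) {
        my $efoUrl = $attribute->{ EFO_URL };
        $byType{ $attribute->{ TYPE } } = {
            value   => $attribute->{ VALUE },
            # "NA" means no EFO URL is available, so store undef instead.
            efo_url => ( defined $efoUrl && $efoUrl ne "NA" ) ? $efoUrl : undef,
        };
    }

    return \%byType;
}

# Data copied from the example above; no API call is made here.
my $sampleAttributes = [
    {
        STUDY_ID => "SRP020492",
        RUN_ID   => "SRR805786",
        TYPE     => "cell type",
        VALUE    => "peripheral blood mononuclear cells (PBMCs)",
        EFO_URL  => "NA",
    },
    {
        STUDY_ID => "SRP020492",
        RUN_ID   => "SRR805786",
        TYPE     => "organism",
        VALUE    => "Homo sapiens",
        EFO_URL  => "http://purl.obolibrary.org/obo/NCBITaxon_9606",
    },
];

my $byType = attributes_by_type( $sampleAttributes );
print "organism: $byType->{ organism }{ value }\n";
```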

get_sample_attributes_per_run_by_study

Accesses the API's getSampleAttributesPerRunByStudy JSON endpoint. Single argument is a study accession. Returns an array reference containing one hash reference per sample attribute. See get_sample_attributes_by_run docs for an example. Returns undef (and logs errors) if errors are encountered.

my $sampleAttributes = $rnaseqAPI->get_sample_attributes_per_run_by_study( "DRP000391" );

get_sample_attributes_coverage_by_study

Accesses the API's getSampleAttributesCoverageByStudy endpoint. Single argument is a study accession. Returns an array reference containing one hash reference per sample attribute. Returns undef (and logs errors) if errors are encountered.

my $sampleAttributeCoverage = $rnaseqAPI->get_sample_attributes_coverage_by_study( "DRP000391" );

An example of the results is as follows:

[
   {
       'VALUE' => 'Nipponbare',
       'STUDY_ID' => 'DRP000391',
       'PCT_OF_ALL_RUNS' => 100,
       'NUM_OF_RUNS' => 28,
       'TYPE' => 'cultivar'
   },
   {
       'VALUE' => 'Oryza sativa Japonica Group',
       'STUDY_ID' => 'DRP000391',
       'PCT_OF_ALL_RUNS' => 100,
       'NUM_OF_RUNS' => 28,
       'TYPE' => 'organism'
   },
   {
       'VALUE' => '7 days after germination',
       'STUDY_ID' => 'DRP000391',
       'PCT_OF_ALL_RUNS' => 29,
       'NUM_OF_RUNS' => 8,
       'TYPE' => 'developmental stage'
   }
]
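
One possible use of these results is to find the attributes that differ between runs in a study. The sketch below is illustrative: varying_attributes is a hypothetical helper, and it assumes PCT_OF_ALL_RUNS is the percentage of the study's runs carrying that value, which is an inference from the example (8 of 28 runs is roughly 29%) rather than something stated by the API.

```perl
use strict;
use warnings;

# Hypothetical helper: keep only attribute values that are not shared by
# every run in the study (PCT_OF_ALL_RUNS below 100).
sub varying_attributes {
    my ( $coverage ) = @_;

    return [ grep { $_->{ PCT_OF_ALL_RUNS } < 100 } @{ $coverage || [] } ];
}

# Data copied from the example above; no API call is made here.
my $sampleAttributeCoverage = [
    { VALUE => 'Nipponbare', STUDY_ID => 'DRP000391', PCT_OF_ALL_RUNS => 100, NUM_OF_RUNS => 28, TYPE => 'cultivar' },
    { VALUE => 'Oryza sativa Japonica Group', STUDY_ID => 'DRP000391', PCT_OF_ALL_RUNS => 100, NUM_OF_RUNS => 28, TYPE => 'organism' },
    { VALUE => '7 days after germination', STUDY_ID => 'DRP000391', PCT_OF_ALL_RUNS => 29, NUM_OF_RUNS => 8, TYPE => 'developmental stage' },
];

my $varying = varying_attributes( $sampleAttributeCoverage );
print "$_->{ TYPE }: $_->{ VALUE }\n" for @{ $varying };
```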

AUTHOR

Maria Keays <maria.keays@gmail.com>

The above email should be used for feedback about the Perl module only. All mail regarding the RNA-seq analysis API itself should be directed to <rnaseq@ebi.ac.uk>.