NAME
Bio::GMOD::StandardURLs - Discover and fetch Standard URLs from MODs
SYNOPSIS
use Bio::GMOD::StandardURLs;
my $mod = Bio::GMOD::StandardURLs->new(-mod => 'WormBase');
my @species = $mod->available_species;
DESCRIPTION
This module provides a programmatic interface to the Standard URLs provided by Model Organism Databases (MODs). These URLs simplify the retrieval of common datasets. The full specification is described at the end of this document.
PUBLIC METHODS
- $mod->available_species();
Fetch a list of the species available through the Standard URL mechanism at the current MOD. Called in array context, returns a list of species in the form "G_species" (e.g. C_elegans). These abbreviated binomial names conform to the specification and can be used in subsequent requests. Called in scalar context, this method returns the number of species available. If passed the optional "-expanded" parameter, this method returns a hash reference of full binomial names pointing to their abbreviated names.
This method is a programmatic equivalent to accessing the standard URL:
http://your.site/genome
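The relationship between full and abbreviated names can be sketched as follows. This is only an illustration, not the module's own code: abbreviate_binomial is a hypothetical helper, and it assumes the abbreviated form is simply the genus initial joined to the species epithet with an underscore.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of Bio::GMOD::StandardURLs): turn a full
# binomial name into the abbreviated "G_species" form.
sub abbreviate_binomial {
    my ($name) = @_;
    # Accept either a space or an underscore between genus and species.
    my ($genus, $species) = split /[\s_]+/, $name, 2;
    die "not a binomial name: $name\n"
        unless defined $species && length $species;
    return uc(substr($genus, 0, 1)) . '_' . lc($species);
}

# Build the kind of mapping that "-expanded" is described as returning:
# full binomial name => abbreviated name.
my %expanded = map { $_ => abbreviate_binomial($_) }
    ('Caenorhabditis elegans', 'Caenorhabditis briggsae');

print "$_ => $expanded{$_}\n" for sort keys %expanded;
```

Names that are already abbreviated pass through unchanged, so the helper can be applied to either form.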
- $mod->releases(-species=>'Caenorhabditis elegans',-status=>'available');
Fetch the releases for a given species. Called in array context, releases() returns an array of all available releases for the given species. Species can be either the full binomial name (e.g. "Caenorhabditis elegans") or the abbreviated short form (e.g. "C_elegans").
If passed the optional '-expanded' parameter, this method returns an array of arrays containing the version, release date, and availability of each release. The optional '-status' parameter filters the returned releases: 'available' returns only those that are currently available, and 'unavailable' returns those that are no longer available. If '-status' is not supplied, all known releases are returned.
This method is a programmatic equivalent to accessing the standard URL:
http://your.site/genome/Binomial_name
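To illustrate the '-status' filter, the sketch below filters an array of arrays laid out as version, release date, and availability. The release names and dates are invented placeholders, filter_releases is a hypothetical helper, and storing availability as the strings 'available' and 'unavailable' is an assumption made for this sketch rather than something the module guarantees.

```perl
use strict;
use warnings;

# Placeholder data in the [version, date released, availability] layout.
# These values are invented for illustration only.
my @releases = (
    [ 'R1', '2001-01-01', 'unavailable' ],
    [ 'R2', '2002-01-01', 'available'   ],
    [ 'R3', '2003-01-01', 'available'   ],
);

# Hypothetical helper: keep only the releases whose availability matches
# $status; with no status, return every release.
sub filter_releases {
    my ($status, @all) = @_;
    return @all unless defined $status;
    return grep { $_->[2] eq $status } @all;
}

my @available = filter_releases('available', @releases);
print join(', ', map { $_->[0] } @available), "\n";
```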
- $mod->data_sets(-species=>$species,-release=>$release);
Fetch all of the available URLs for a given species and data release. If no release is provided, the current release is used (you may also explicitly request "current"). Returns a hash reference whose keys are the symbolic names of datasets and whose values are the URLs of those datasets.
This method is a programmatic equivalent to accessing the standard URL:
http://your.site/genome/Binomial_name/release_name
- $mod->fetch(@options);
Fetch the specified dataset. Note: this could be a very large file!
Options:
  -url      The full URL to the dataset
OR specify a dataset with species, release, and dataset:
  -species  The binomial name or abbreviated form of the species
  -release  The version to fetch
  -dataset  The symbolic name of the dataset (dna, mrna, etc.)
This method is a programmatic equivalent to accessing the standard URL:
http://your.site/genome/Binomial_name/release_name/[dataset]
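The way these options map onto a Standard URL can be sketched as follows. dataset_url is a hypothetical helper, not a method of this module. It assumes that a missing release falls back to "current" (as documented for data_sets()), and it does not expand abbreviated names such as C_elegans into the full binomial name.

```perl
use strict;
use warnings;

# Hypothetical helper: work out which URL a set of fetch()-style options
# refers to. The host is passed in explicitly; "your.site" is the
# placeholder host used throughout this document.
sub dataset_url {
    my ($host, %opt) = @_;

    # A full URL takes precedence over the individual components.
    return $opt{-url} if defined $opt{-url};

    my $species = $opt{-species} or die "need -url or -species\n";
    my $dataset = $opt{-dataset} or die "need -dataset\n";
    # Assumption: default to the symbolic "current" release.
    my $release = defined $opt{-release} ? $opt{-release} : 'current';

    # URL path components use underscores rather than spaces.
    $species =~ tr/ /_/;

    return "http://$host/genome/$species/$release/$dataset";
}

print dataset_url('your.site',
    -species => 'Caenorhabditis elegans',
    -dataset => 'dna'), "\n";
```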
- $mod->supported_datasets();
Fetch a list of symbolic names of supported datasets. This typically will be a list like "dna", "mrna", "ncrna", "protein", and "feature".
Standard URL Specification
PHASE I
Substitutions:
  your.site      Host address, e.g. www.wormbase.org
  Binomial_name  NCBI Taxonomy scientific name, e.g. Caenorhabditis_elegans
  release_name   Data release, in whatever is the local format
                 (e.g. release date, release number)
- http://your.site/genome/
Leads to index page for species. This should be an HTML-format page that contains links to each of the species whose genomes are available for download.
- http://your.site/genome/Binomial_name/
Leads to index for releases for species Binomial_name. This will be an HTML-format page containing links to each of the genome releases.
- http://your.site/genome/Binomial_name/release_name/
Leads to index for the named release. It should be an HTML-format page containing links to each of the data sets described below.
- http://your.site/genome/Binomial_name/current/
Leads to the index for the most recent release, symbolic link style.
- http://your.site/genome/Binomial_name/current/dna
Returns a FASTA file containing big DNA fragments (e.g. chromosomes). MIME type is application/x-fasta.
- http://your.site/genome/Binomial_name/current/mrna
Returns a FASTA file containing spliced mRNA transcript sequences. MIME type is application/x-fasta.
- http://your.site/genome/Binomial_name/current/ncrna
Returns a FASTA file containing non-coding RNA sequences. MIME type is application/x-fasta.
- http://your.site/genome/Binomial_name/current/protein
Returns a FASTA file containing all the protein sequences known to be encoded by the genome. MIME type is application/x-fasta.
- http://your.site/genome/Binomial_name/current/feature
Returns a GFF3 file describing genome annotations. MIME type is application/x-gff3.
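The dataset names and MIME types listed above can be collected into a lookup table. The sketch below does that and adds a hypothetical check, content_type_ok, that a client might apply to a response's Content-Type header. The check and its handling of header parameters are assumptions for illustration and are not part of the specification.

```perl
use strict;
use warnings;

# Dataset names and MIME types as listed in the Phase I specification.
my %mime_type_for = (
    dna     => 'application/x-fasta',
    mrna    => 'application/x-fasta',
    ncrna   => 'application/x-fasta',
    protein => 'application/x-fasta',
    feature => 'application/x-gff3',
);

# Hypothetical check: does a Content-Type header value match what the
# specification promises for this dataset? Parameters such as
# "; charset=..." are ignored, and the comparison is case-insensitive.
sub content_type_ok {
    my ($dataset, $content_type) = @_;
    my $expected = $mime_type_for{$dataset} or return 0;
    my ($type) = split /\s*;\s*/, $content_type;
    return 0 unless defined $type;
    return lc($type) eq $expected ? 1 : 0;
}

print content_type_ok('feature', 'application/x-gff3')
    ? "ok\n"
    : "mismatch\n";
```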
PHASE II
In the Phase II URL scheme, a ?format=XXX parameter can be attached to each of the URLs:
- http://your.site/genome/?format=HTML
Same as default for phase I.
- http://your.site/genome/?format=RSS
Return RSS feed indicating what species are available.
- http://your.site/genome/Binomial_name/?format=RSS
Return RSS feed indicating what releases are available.
- http://your.site/genome/Binomial_name/release_name/?format=RSS
Return RSS feed indicating what data sets are available.
- http://your.site/genome/Binomial_name/current/protein?format=XXX
Alternative formats for sequence data. For example, XXX could be FASTA or RAW; the set of supported formats is left for further discussion.
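Attaching the format parameter amounts to a small string operation, sketched below. with_format is a hypothetical helper. It assumes format names are simple tokens such as RSS or HTML that need no URL escaping.

```perl
use strict;
use warnings;

# Hypothetical helper: append a Phase II "format" parameter to a
# Standard URL. Returns the URL unchanged when no format is given.
sub with_format {
    my ($url, $format) = @_;
    return $url unless defined $format && length $format;
    # Use "&" if the URL already carries a query string.
    my $sep = $url =~ /\?/ ? '&' : '?';
    return $url . $sep . 'format=' . $format;
}

print with_format('http://your.site/genome/', 'RSS'), "\n";
```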