NAME
Bio::Draw::FeatureStack - BioPerl module to generate GD images of stacked gene models
SYNOPSIS
use Bio::DB::SeqFeature::Store;
use Bio::Draw::FeatureStack;
# load GFF3-compliant features from GFF file
# features could be obtained from/with any other source/methods as well...
#---
my @features;
my $store = Bio::DB::SeqFeature::Store->new
(
-adaptor => 'memory',
-dsn => 'my_gff_file.gff3'
);
push(@features, $store->features(-name => 'gene1', -aliases => 1));
push(@features, $store->features(-name => 'gene2', -aliases => 1));
# create FeatureStack, passing features as array-ref
#---
my $feature_stack = Bio::Draw::FeatureStack->new
(
-features => \@features, # array-ref of features to be rendered
-glyph => 'gene', # features will be rendered using this BioPerl glyph
-flip_minus => 1, # flip features on reverse strand (default is on)
-ignore_utr => 1, # do not show UTRs (default is off)
-panel_params => { # Bio::Graphics::Panel parameters
-width => 1024,
-pad_left => 80,
-pad_right => 20,
-grid => 1
},
-glyph_params => { # glyph-specific parameters (Bio::Graphics::Glyph::gene in this case)
-utr_color => 'white',
-label_position => 'left',
-label_transcripts => 1,
-description => 1
}
);
# output SVG, including HTML image map
#---
my ($svg, $map) = $feature_stack->svg(-image_map => 1);
# output PNG
#---
my $png = $feature_stack->png;
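The synopsis stops once the image data is in memory. The following is a minimal sketch of writing it to disk, assuming that png() and svg() return the image data as plain scalars. save_image is a hypothetical helper and not part of FeatureStack.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of FeatureStack): write image data to a file
# and return the resulting file size in bytes.
sub save_image {
    my ($path, $data, $is_binary) = @_;
    open(my $fh, '>', $path) or die "Cannot write $path: $!";
    binmode($fh) if $is_binary;    # PNG data must not be newline-translated
    print {$fh} $data;
    close($fh) or die "Cannot close $path: $!";
    return -s $path;
}

# With the variables from the synopsis one might then call:
#   save_image('stack.png', $png, 1);
#   save_image('stack.svg', $svg, 0);
```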
DESCRIPTION
FeatureStack creates GD images of vertically stacked gene models to facilitate visual comparison of gene structures. The compared genes can be clusters of orthologous genes, gene family members, or any other genes of interest. FeatureStack takes an array of BioPerl feature objects as input, projects them onto a common coordinate space, flips features from the negative strand (optional), left-aligns them by start coordinates (optional), sets a fixed intron size (optional), removes unwanted transcripts (optional), and then draws the transformed features with a user-specified glyph. Internally, this transformation is achieved by cloning all input features into Bio::Graphics::Feature objects before they are rendered by the specified glyph. Output images can be generated in SVG (scalable vector graphics) or PNG (raster image) format.
FeatureStack was designed to leave the user in full control of the rendering process. The user can not only control how FeatureStack behaves through the FeatureStack parameters described below, but can also provide panel- and glyph-specific parameters to fine-tune all aspects of the rendered image.
Although FeatureStack can be used with any glyph, it is particularly useful in combination with the Bio::Graphics::Glyph::decorated_gene glyph. This glyph is currently not distributed with BioPerl, but should be installed together with FeatureStack. Bio::Graphics::Glyph::decorated_gene can also be obtained from CPAN and used independently of FeatureStack. The decorated_gene glyph can highlight protein motifs such as signal peptides, transmembrane domains, or protein domains on top of gene models, which greatly facilitates the comparison of gene structures. Please refer to the documentation of Bio::Graphics::Glyph::decorated_gene for more details. If protein decorations are associated with gene features in the input data, FeatureStack can also automatically align gene models by a user-defined decoration type, so that, for example, gene models are aligned by a particularly well-conserved protein motif.
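To make the coordinate transformation more concrete, here is a minimal, self-contained sketch of two of the steps named above: flipping a minus-strand gene so that its 5' end is on the left, and redrawing introns at a fixed size. It only illustrates the idea on plain [start, end] pairs. The helpers project_exons and fix_intron_size are hypothetical and do not reproduce FeatureStack's actual implementation.

```perl
use strict;
use warnings;
use List::Util qw(min max);

# Hypothetical helper: project exons ([start, end] pairs) of one gene onto a
# 1-based coordinate space, optionally mirroring minus-strand genes.
sub project_exons {
    my ($strand, $exons, $flip_minus) = @_;
    my $min = min(map { $_->[0] } @$exons);
    my $max = max(map { $_->[1] } @$exons);

    my @projected;
    for my $exon (@$exons) {
        my ($start, $end) = @$exon;
        if ($strand < 0 && $flip_minus) {
            # mirror around the gene so that the 5' end lands on the left
            push @projected, [ $max - $end + 1, $max - $start + 1 ];
        }
        else {
            push @projected, [ $start - $min + 1, $end - $min + 1 ];
        }
    }
    return [ sort { $a->[0] <=> $b->[0] } @projected ];
}

# Hypothetical helper: keep exon lengths, but give every intron the same size.
# Expects exons sorted by start and not overlapping.
sub fix_intron_size {
    my ($exons, $intron_size) = @_;
    my @out;
    my $cursor = $exons->[0][0];
    for my $exon (@$exons) {
        my $length = $exon->[1] - $exon->[0] + 1;
        push @out, [ $cursor, $cursor + $length - 1 ];
        $cursor += $length + $intron_size;
    }
    return \@out;
}

my $plus  = project_exons(+1, [ [1596486, 1596554], [1596747, 1597604] ], 1);
my $minus = project_exons(-1, [ [1000, 1099], [1500, 1999] ], 1);
my $fixed = fix_intron_size($plus, 50);

printf "plus:  %d-%d\n", @$_ for @$plus;     # 1-69 and 262-1119
printf "minus: %d-%d\n", @$_ for @$minus;    # 1-500 and 901-1000
printf "fixed: %d-%d\n", @$_ for @$fixed;    # 1-69 and 120-977
```

After projection both genes start at coordinate 1, which is what makes stacked models directly comparable regardless of chromosome position or strand.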
FeatureStack requires GFF3-compliant features. That is, features provided to FeatureStack need to have either a two-tier 'mRNA'->'CDS' or a three-tier 'gene'->'mRNA'->'CDS' structure. Here is an example gene structure in GFF3 format that is compatible with FeatureStack:
MAL10 test gene 1596486 1597604 . + . ID=PF10_0392;Name=PF10_0392
MAL10 test mRNA 1596486 1597604 . + . ID=rna_PF10_0392-1;Name=PF10_0392-1;Parent=PF10_0392
MAL10 test CDS 1596486 1596554 . + . ID=cds_PF10_0392-1;Parent=rna_PF10_0392-1
MAL10 test CDS 1596747 1597604 . + . ID=cds_PF10_0392-2;Parent=rna_PF10_0392-1
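Before passing data to FeatureStack it can help to check that every CDS really sits in one of the two supported hierarchies. The following sketch does this on raw, tab-separated GFF3 lines without BioPerl. cds_lineages is a hypothetical helper. It assumes every feature carries an ID attribute and at most one Parent.

```perl
use strict;
use warnings;

# Hypothetical helper: for every CDS, return its chain of feature types from
# the top-level feature down, e.g. 'gene->mRNA->CDS'.
# Assumes tab-separated GFF3 lines, an ID on every feature, a single Parent.
sub cds_lineages {
    my @lines = @_;
    my (%type_of, %parent_of, @cds_ids);
    for my $line (@lines) {
        next if $line =~ /^#/ or $line !~ /\S/;    # skip comments and blanks
        my @col = split /\t/, $line;
        my %attr;
        for my $pair (split /;/, $col[8]) {
            my ($key, $value) = split /=/, $pair, 2;
            $attr{$key} = $value;
        }
        my $id = $attr{ID};
        next unless defined $id;
        $type_of{$id}   = $col[2];
        $parent_of{$id} = $attr{Parent};
        push @cds_ids, $id if $col[2] eq 'CDS';
    }

    my %lineage;
    for my $id (@cds_ids) {
        my @types;
        # walk up the Parent links until a feature without known parent
        for (my $cur = $id; defined $cur && exists $type_of{$cur}; $cur = $parent_of{$cur}) {
            unshift @types, $type_of{$cur};
        }
        $lineage{$id} = join '->', @types;
    }
    return \%lineage;
}

my @gff = (
    join("\t", qw(MAL10 test gene 1596486 1597604 . + . ID=PF10_0392;Name=PF10_0392)),
    join("\t", qw(MAL10 test mRNA 1596486 1597604 . + . ID=rna_PF10_0392-1;Name=PF10_0392-1;Parent=PF10_0392)),
    join("\t", qw(MAL10 test CDS 1596486 1596554 . + . ID=cds_PF10_0392-1;Parent=rna_PF10_0392-1)),
    join("\t", qw(MAL10 test CDS 1596747 1597604 . + . ID=cds_PF10_0392-2;Parent=rna_PF10_0392-1)),
);

my $lineage = cds_lineages(@gff);
print "$_: $lineage->{$_}\n" for sort keys %$lineage;
# cds_PF10_0392-1: gene->mRNA->CDS
# cds_PF10_0392-2: gene->mRNA->CDS
```

Any CDS whose lineage is neither 'mRNA->CDS' nor 'gene->mRNA->CDS' would point to input that does not match the structure described above.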
FeatureStack can display multiple transcripts (isoforms) per gene if the specified glyph supports this as well (for example the 'gene' or the 'decorated_gene' glyph).
In addition to drawing a set of gene models on top of each other, FeatureStack can intermingle gene models with alternative tracks that display additional features associated with these genes. This can be used, for example, to display regulatory elements or sequence variants (SNPs, indels) alongside gene models. There is currently no restriction on how these alternative features are displayed, and any BioPerl glyph can be used for this purpose. In the input data, alternative features must be specified one level below the gene or transcript feature that is passed to FeatureStack. Here is an example GFF that shows how a regulatory motif (associated with the gene) and a SNP (associated with a transcript) can be specified:
CHR_I test gene 5100769 5101677 . + . ID=Gene:Y110A7A.20;Name=ift-20
CHR_I test promoter 5100709 5100722 . + . ID=Promoter:Y110A7A.20;Note=GTCTCTATAGCAAC;Parent=Gene:Y110A7A.20
CHR_I test mRNA 5100769 5101677 . + . ID=Transcript:Y110A7A.20;Parent=Gene:Y110A7A.20
CHR_I test SNP 5100888 5100888 . + . ID=SNP123456;Parent=Transcript:Y110A7A.20;Note=C>T
CHR_I test CDS 5100769 5101423 . + . ID=CDS:Y110A7A.20:1;Parent=Transcript:Y110A7A.20
CHR_I test CDS 5101468 5101677 . + . ID=CDS:Y110A7A.20:2;Parent=Transcript:Y110A7A.20
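For alternative features, FeatureStack records the distance to the start of the associated main feature in a 'start_dist' tag (see -alt_feature_type under OPTIONS). The exact definition of that distance is not spelled out in this document, so the sketch below shows only one plausible reading: the offset of the alternative feature from the 5' end of its parent, negative when upstream. The start_dist function here is a hypothetical stand-in, not the module's code.

```perl
use strict;
use warnings;

# Hypothetical stand-in for the 'start_dist' computation. Assumption: the
# distance is measured from the parent's 5' end and is negative upstream.
sub start_dist {
    my ($main, $alt) = @_;    # hash refs with start, end, strand
    return $main->{strand} < 0
        ? $main->{end}  - $alt->{end}        # minus strand: 5' end is 'end'
        : $alt->{start} - $main->{start};    # plus strand: 5' end is 'start'
}

# Coordinates taken from the example GFF above
my $gene     = { start => 5100769, end => 5101677, strand => +1 };
my $promoter = { start => 5100709, end => 5100722, strand => +1 };
my $snp      = { start => 5100888, end => 5100888, strand => +1 };

printf "promoter: %d\n", start_dist($gene, $promoter);    # -60
printf "SNP:      %d\n", start_dist($gene, $snp);         # 119
```

Under this reading the promoter motif lies 60 bp upstream of the gene start, and the SNP 119 bp downstream of the transcript start, which has the same coordinate as the gene start in this example.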
OPTIONS
Option Description Default
------ ----------- -------
-features none
Array reference (mandatory). BioPerl features to be
displayed. Currently, features can be either of type
'mRNA' or 'gene'.
-glyph 'generic'
String (optional). Name of glyph to be used to render
features. The glyph specified here should be suitable
for rendering the provided features (e.g., use
'processed_transcript' glyph for features of type 'mRNA'
and 'gene' glyph for features of type 'gene'). The
'decorated_gene' or 'decorated_transcript' glyph
can also be used for highlighting protein features on
top of gene models (see description above).
If no glyph is specified, the 'generic' glyph will
be used.
-glyph_params none
Hash reference (optional). Glyph-specific parameters.
Will be passed unmodified to the glyph. Parameters
can include callback functions for fine-grained control
of the rendering process. Please refer to the
documentation of the glyph for a description of which
glyph parameters are available.
-panel_params none
Hash reference (optional). Panel parameters. Will be
passed unmodified to the Bio::Graphics::Panel instance
that is internally created by FeatureStack.
Typical parameters here include -width, -pad_left,
-pad_right, or -grid (see Bio::Graphics::Panel for
more information).
-ignore_utr false
Boolean (optional). If true, gene models will be drawn
without untranslated regions (UTRs).
-flip_minus true
Boolean (optional). By default, features on the negative
(reverse) strand are drawn flipped, such that the
5' end of features is always on the left side. This
behaviour can be turned off by setting this parameter to
0 (false).
-intron_size undef
Integer (optional). Intron size in base-pairs. If specified,
introns of gene models will be transformed to have
this specified size. This is useful when comparing gene
models of vastly different sizes due to very large
introns (for example, when comparing protist genes with human
genes). By default, gene models are drawn to scale with
original intron sizes. This parameter does not affect
the length of exons, which are always drawn to scale.
-feature_offsets undef
Hash reference or string (optional). This parameter allows
you to control the horizontal alignment of features. By
default, all features are left-aligned by their start
coordinate.
If a hash reference is specified here, it is assumed that
keys correspond to feature IDs and values to offsets in bp.
This way the alignment of individual features can be
manually fine-controlled.
If 'start_codon' is specified, features will be aligned
by their smallest CDS coordinate, assuming that this
will be the translation start site.
Any other value here will be interpreted as the name of
a protein decoration. In this case, FeatureStack will
attempt to use Bio::Graphics::Glyph::decorated_transcript
to map this protein decoration to nucleotide space and
will then left-align the feature by this mapped
coordinate. This way, features can for example be
automatically aligned by their most conserved protein
domain. If no protein decoration with this name is found
for a feature, then this feature will not be aligned.
Please refer to the documentation of the
decorated_transcript glyph to see how protein decorations
can be specified for transcripts.
-transcripts_to_skip none
Array reference (optional). Contains transcript IDs not to
be included in the output image. This parameter can be used
if a gene feature passed to FeatureStack has multiple
isoforms but only a subset of these isoforms should appear
in the output.
-alt_feature_type none
String (optional). Type and source of alternative features
(e.g., 'SNP:mpileup') to be drawn alongside gene models.
FeatureStack looks for features of this type/source one
level below the specified gene/transcript feature. If found,
alternative features are drawn in a separate track above
the gene track. The appearance of alternative features
can be controlled using the -alt_glyph and -alt_glyph_params
parameters.
FeatureStack will automatically compute the distance of
alternative features (in bp) to the associated main feature's
start coordinate and add this distance as a feature tag
(tag name 'start_dist'). This tag can later be read
by the glyph that displays alternative features.
This can e.g. be useful for labeling regulatory features
with their distance from the transcription start site
(UTRs visible) or from the translation start site
(UTRs ignored).
-alt_glyph none
String (optional). Name of glyph to be used to draw
alternative features specified with -alt_feature_type.
-alt_glyph_params none
Hash reference (optional). Glyph-specific parameters for
glyph specified with -alt_glyph. Parameters will be passed
unmodified to the glyph. Parameters can include callback
functions for fine-grained control of the rendering process.
-ruler true
Boolean (optional). If true, a ruler indicating distances
in base-pairs will be drawn on top of the image. The ruler
will automatically adjust to feature offsets; that is,
the origin of the ruler will be placed at the
point where features are aligned, showing negative
coordinates left of this point and positive coordinates
right of this point.
-span [auto]
Integer (optional). Span of the output image in bp. By
default, the span is the length of the longest feature.
If one wants to generate an image that shows only the
5' portion of features (for example to visualize only
the first exon of genes and their associated promoters),
one can set a smaller, fixed value here, effectively
clipping the right part of the image at this coordinate.
-separator false
Boolean (optional). If true, draw horizontal line between
gene models. This might be useful if alternative tracks
are visible to know which alternative track belongs to
which gene model track.
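As an illustration of how several of these options might be combined, here is a hedged sketch. The option names are the ones documented above. The chosen values, the 'diamond' glyph with its -fgcolor parameter (assumed to be available from a stock Bio::Graphics installation), the transcript ID, and the draw_stack wrapper are illustrative assumptions only.

```perl
use strict;
use warnings;

# Option names as documented above; all values are illustrative assumptions.
my %stack_params = (
    -glyph               => 'gene',
    -intron_size         => 50,                # draw every intron as 50 bp
    -feature_offsets     => 'start_codon',     # align by smallest CDS coordinate
    -span                => 2000,              # show only the first 2000 bp
    -ruler               => 1,
    -separator           => 1,                 # line between gene models
    -transcripts_to_skip => ['transcript_2'],  # hypothetical transcript ID
    -alt_feature_type    => 'SNP:mpileup',
    -alt_glyph           => 'diamond',         # assumed stock Bio::Graphics glyph
    -alt_glyph_params    => { -fgcolor => 'red' },
);

# Hypothetical wrapper: loads FeatureStack only when called, so the option
# hash above can be inspected even where the module is not installed.
sub draw_stack {
    my ($features, %params) = @_;
    require Bio::Draw::FeatureStack;
    my $stack = Bio::Draw::FeatureStack->new(-features => $features, %params);
    return $stack->png;
}

print join(', ', sort keys %stack_params), "\n";

# With features loaded as in the synopsis one might then call:
#   my $png = draw_stack(\@features, %stack_params);
```

Keeping the options in a plain hash makes it easy to render the same feature set with different alignments or intron sizes by overriding single keys.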
EXPORT
None by default.
BUGS
Please report all errors.
SEE ALSO
Bio::Graphics::Panel, Bio::Graphics::Glyph, Bio::Graphics::Glyph::gene, Bio::Graphics::Glyph::processed_transcript, Bio::Graphics::Glyph::decorated_gene, Bio::Graphics::Glyph::decorated_transcript, Bio::DB::SeqFeature::Store
It is recommended to study the test cases shipped with this module for additional information on how to use it.
AUTHOR
Christian Frech <frech.christian@gmail.com>
COPYRIGHT AND LICENSE
Copyright (C) 2012 by Christian Frech
This library is free software; you can redistribute it and/or modify it under the same terms as Perl itself, either Perl version 5.8.8 or, at your option, any later version of Perl 5 you may have available.