NAME
Bio::ViennaNGS::Expression - An object-oriented interface for read-count based gene expression
SYNOPSIS
use Bio::ViennaNGS::Expression;
my $expression = Bio::ViennaNGS::Expression->new();
# parse read counts from an extended BED12 file
$expression->parse_readcounts_bed12($bed12);
# compute normalized expression of ith sample in Transcripts per Million (TPM)
$expression->computeTPM($i, $readlength);
# write extended BED12 file with TPM for each condition past
# the 12th column
$expression->write_expression_bed12("TPM", $dest, $basename);
DESCRIPTION
This module provides a Moose interface for computation of gene / transcript expression from read counts.
METHODS
- parse_readcounts_bed12
Title : parse_readcounts_bed12
Usage : $obj->parse_readcounts_bed12($file)
Function : Parses a bedtools multicov (multiBamCov) file, i.e. an extended BED12 file, into an Array of Hash of Hashes data structure (@{$self->data}).
Args : $file is the input file, i.e. an extended BED12 file where each column past the 12th lists read counts for this bedline's feature(s) for a specific sample/condition.
Returns :
Notes : This method evaluates the number of samples/conditions present in the input, i.e. the number of columns extending the canonical BED12 columns in the input multicov file, and populates $self->conds. It also populates $self->nr_features with the number of genes/features present in the input (evidently, this should be the same for each sample/condition in the input).
- computeTPM
Title : computeTPM
Usage : $obj->computeTPM($sample, $readlength)
Function : Computes expression of each gene/feature present in $self->data in Transcripts per Million (TPM) [Wagner et al. Theory Biosci. (2012)]. $featCount_sample is a reference to a Hash of Hashes data structure where keys are feature names and values hold a hash that must at least contain length and raw read counts. Practically, $featCount_sample is represented by _one_ element of @featCount, which is populated from a multicov file by parse_multicov().
Args : $sample is the sample index of @{$self->data}. This is especially handy if one is only interested in computing normalized expression values for a specific sample, rather than all samples in the multicov BED12 file. $readlength is the read length of the RNA-seq sequencing experiment.
Returns : Returns the mean TPM of the processed sample, which is invariant among samples. (TPM models relative molar concentration and thus fulfills the invariant average criterion.)
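The TPM calculation described above can be sketched as a stand-alone function. This is a minimal illustration and not the module's own code: the function name tpm_sketch, the hash keys length, count and TPM, and the example features are all assumptions made for this sketch. It also assumes every feature has a length greater than zero.

```perl
use strict;
use warnings;

# Hypothetical stand-alone helper, not part of Bio::ViennaNGS::Expression.
# $feat is a hash of hashes: { name => { length => bp, count => raw reads } }
# (key names are assumptions). Adds a 'TPM' key to every feature and
# returns the mean TPM over all features.
sub tpm_sketch {
    my ($feat, $readlength) = @_;

    # Per-feature rate: read count * read length / feature length
    my %rate;
    my $total = 0;
    for my $name (keys %$feat) {
        my $f = $feat->{$name};
        $rate{$name} = $f->{count} * $readlength / $f->{length};
        $total += $rate{$name};
    }
    return 0 unless $total > 0;    # no reads at all: nothing to normalize

    # Scale each rate so that all TPM values sum to one million
    my $sum = 0;
    for my $name (keys %$feat) {
        $feat->{$name}{TPM} = $rate{$name} / $total * 1e6;
        $sum += $feat->{$name}{TPM};
    }
    return $sum / scalar(keys %$feat);
}

# Made-up example data for three features of one sample
my %sample = (
    geneA => { length => 1000, count => 100 },
    geneB => { length => 2000, count => 100 },
    geneC => { length =>  500, count =>  50 },
);
my $mean = tpm_sketch(\%sample, 50);
printf "%s\t%.1f\n", $_, $sample{$_}{TPM} for sort keys %sample;
printf "mean TPM: %.1f\n", $mean;
```

Because the TPM values of a sample sum to one million, the mean in this sketch is always 1e6 divided by the number of features, whatever the read counts are. In this formulation the read length appears in every rate and in their sum, so it cancels in the final ratio.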
- write_expression_bed12
Title : write_expression_bed12
Usage : $obj->write_expression_bed12($measure, $dest, $basename)
Function : Writes normalized expression data to a bedtools multicov (multiBamCov)-type BED12 file.
Args : $measure specifies the type in which normalized expression data from @{$self->data} is dumped, i.e. TPM or RPKM. These values must have been computed and inserted into @{$self->data} beforehand by e.g. $self->computeTPM(). $dest and $basename give path and base name of the output file, respectively.
Returns : None. The output is a position-sorted extended BED12 file.
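To illustrate what an extended BED12 line with expression values past the 12th column looks like, here is a minimal sketch. The helper name bed12_line_with_values, the two-decimal number formatting and the example record are assumptions for illustration only; the module's actual output formatting and file naming may differ.

```perl
use strict;
use warnings;

# Hypothetical helper, not part of Bio::ViennaNGS::Expression.
# Builds one tab-separated extended BED12 line from the 12 canonical
# BED12 fields plus one expression value per sample/condition.
sub bed12_line_with_values {
    my ($bed12_fields, $values) = @_;
    die "expected 12 BED fields\n" unless @$bed12_fields == 12;
    # Two decimal places is an arbitrary choice made for this sketch
    return join("\t", @$bed12_fields, map { sprintf "%.2f", $_ } @$values);
}

# Made-up single-exon feature with values for two samples
my @bed = ('chr1', 1000, 2000, 'geneA', 0, '+', 1000, 2000, 0, 1, '1000,', '0,');
my $line = bed12_line_with_values(\@bed, [400000, 123456.789]);
print $line, "\n";
```

Each sample/condition contributes exactly one extra column, so a file with n samples has 12 + n columns per line.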
DEPENDENCIES
SEE ALSO
AUTHOR
Michael T. Wolfinger, <michael@wolfinger.eu>
COPYRIGHT AND LICENSE
Copyright (C) 2015 by Michael T. Wolfinger
This library is free software; you can redistribute it and/or modify it under the same terms as Perl itself, either Perl version 5.16.3 or, at your option, any later version of Perl 5 you may have available.
This software is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.