NAME
Bio::CUA::CUB::Calculator -- A module to calculate codon usage bias (CUB) indices for protein-coding sequences
SYNOPSIS
use Bio::CUA::CUB::Calculator;
use Bio::CUA::SeqIO; # sequence reader used below
my $calc = Bio::CUA::CUB::Calculator->new(
-codon_table => 1,
-tAI_values => 'tai.out' # from Bio::CUA::CUB::Builder
);
# calculate tAI for each sequence
my $io = Bio::CUA::SeqIO->new(-file => "seqs.fa");
# or, with the format given explicitly:
# my $io = Bio::CUA::SeqIO->new(-file => "seqs.fa", -format => 'fasta');
while(my $seq = $io->next_seq)
{
my $tai = $calc->tai($seq);
printf("%10s: %.7f\n", $seq->id, $tai);
}
DESCRIPTION
Codon usage bias (CUB) can be represented at two levels: codon and sequence. The sequence-level value is often computed as the geometric mean of the values of the sequence's codons. This module calculates CUB metrics at the sequence level.
Supported CUB metrics include CAI (codon adaptation index), tAI (tRNA adaptation index), Fop (Frequency of optimal codons), ENC (Effective Number of Codons) and their variants. See the methods below for details.
METHODS
new
Title : new
Usage : my $calc=Bio::CUA::CUB::Calculator->new(@args);
Function: initialize the calculator
Returns : an object of this class
Args : a hash with following acceptable keys:
B<Mandatory options>:
-codon_table
-
the genetic code table used in all subsequent sequence analyses. It can be specified by an integer (genetic code table id), an object of L<Bio::CUA::CodonTable>, or a map-file. See the method L<Bio::CUA::Summarizer/new> for details.
B<options needed by FOP method>
-optimal_codons
-
a file containing all the optimal codons, one codon per line, or a hashref whose keys are the optimal codons.
B<options needed by CAI method>
-CAI_values
-
a file containing CAI values for each codon, excluding the 3 stop codons, so 61 lines, each containing a codon and its value separated by a space or tab. Alternatively, a hashref in which each key is a codon and each value is the CAI value of that codon.
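B<options needed by tAI method>
-tAI_values
-
a file containing tAI values for each codon, for example the output of L<Bio::CUA::CUB::Builder> as used in the SYNOPSIS.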
B<options needed by ENC method>
-base_background
-
optional. An arrayref containing the frequencies of the 4 bases (in the order A, T, C, and G) derived from background data such as introns. Alternatively, one of the values 'seq' or 'seq3', which estimate base frequencies from each analyzed sequence as a whole or from its 3rd codon positions, respectively. The background can also be specified for each analyzed sequence with the methods L</encp> and L</encp_r>.
sequence input
All the methods below accept any one of the following formats as sequence input:
-
string of nucleotide sequence with length of 3N,
-
sequence object which has a method I<seq> to get the sequence string,
-
a sequence file in fasta format
-
reference to a codon count hash, like $codons = { AGC => 50, GTC => 124, ... ... }.
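The codon count hash form can be built from a plain sequence string. The following is a minimal sketch, not the module's own code; the helper name count_codons is hypothetical and it assumes the sequence is already in frame.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of Bio::CUA): tally the codons of an
# in-frame coding sequence into a { CODON => count } hash reference.
sub count_codons {
    my ($seq) = @_;
    $seq = uc $seq;
    $seq =~ s/\s+//g;                      # drop whitespace and line breaks
    die "sequence length must be a multiple of 3\n" if length($seq) % 3;
    my %count;
    $count{$1}++ while $seq =~ /(...)/g;   # walk the sequence codon by codon
    return \%count;
}

my $codons = count_codons('ATGGCTGCTAAA');
printf "%s => %d\n", $_, $codons->{$_} for sort keys %$codons;
# prints:
# AAA => 1
# ATG => 1
# GCT => 2
```

A hash built this way has the same shape as the $codons example above.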
cai
Title : cai
Usage : $caiValue = $self->cai($seq);
Function: calculate the CAI value for the sequence
Returns : a number, or undef if failed
Args : see L</"sequence input">
Note: codons without synonymous competitors are excluded from the
calculation.
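As described in the DESCRIPTION, a sequence-level value is often the geometric mean of per-codon values. The sketch below illustrates that idea for CAI under stated assumptions: the name cai_sketch, the explicit skip list of non-degenerate codons, and the handling of codons with no weight are all hypothetical and may differ from what the module does internally.

```perl
use strict;
use warnings;

# Hypothetical sketch: count-weighted geometric mean of per-codon weights.
#   $counts : { codon => count }
#   $w      : { codon => weight in (0, 1] }
#   $skip   : { codon => 1 } for codons without synonymous competitors
sub cai_sketch {
    my ($counts, $w, $skip) = @_;
    my ($log_sum, $n) = (0, 0);
    for my $codon (keys %$counts) {
        next if $skip->{$codon};                       # e.g. ATG, TGG
        next unless defined $w->{$codon} && $w->{$codon} > 0;  # assumption
        $log_sum += $counts->{$codon} * log($w->{$codon});
        $n       += $counts->{$codon};
    }
    return $n ? exp($log_sum / $n) : undef;
}

my $cai = cai_sketch(
    { GCT => 2, GCC => 2, ATG => 1 },   # codon counts
    { GCT => 1, GCC => 0.25 },          # illustrative weights, not real data
    { ATG => 1, TGG => 1 },
);
printf "%.3f\n", $cai;   # prints 0.500
```

Summing logarithms rather than multiplying the weights directly avoids underflow on long sequences.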
fop
Title : fop
Usage : $fopValue = $self->fop($seq[,$withNonDegenerate]);
Function: calculate the fraction of optimal codons in the sequence
Returns : a number, or undef if failed
Args : for sequence see L</"sequence input">.
If the optional argument '$withNonDegenerate' is true, then
non-degenerate codons (those that do not have synonymous partners) are
included in the calculation. By default these codons are excluded.
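The effect of '$withNonDegenerate' can be illustrated with a small sketch. The helper fop_sketch and its explicit set of non-degenerate codons are hypothetical; in particular, whether the module counts an included non-degenerate codon as optimal is not stated here, so this sketch only counts codons listed in the optimal set.

```perl
use strict;
use warnings;

# Hypothetical sketch: fraction of optimal codons among the codons considered.
#   $counts          : { codon => count }
#   $optimal         : { codon => 1 } for optimal codons
#   $non_degenerate  : { codon => 1 } for codons without synonymous partners
#   $with_non_degen  : true to keep non-degenerate codons in the total
sub fop_sketch {
    my ($counts, $optimal, $non_degenerate, $with_non_degen) = @_;
    my ($opt, $total) = (0, 0);
    for my $codon (keys %$counts) {
        next if !$with_non_degen && $non_degenerate->{$codon};
        $total += $counts->{$codon};
        $opt   += $counts->{$codon} if $optimal->{$codon};
    }
    return $total ? $opt / $total : undef;
}

my %counts = (GCT => 3, GCC => 1, ATG => 1);
my %nd     = (ATG => 1, TGG => 1);
printf "%.2f\n", fop_sketch(\%counts, { GCT => 1 }, \%nd);      # prints 0.75
printf "%.2f\n", fop_sketch(\%counts, { GCT => 1 }, \%nd, 1);   # prints 0.60
```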
tai
Title : tai
Usage : $taiValue = $self->tai($seq);
Function: calculate the tAI value for the sequence
Returns : a number, or undef if failed
Args : for sequence see L</"sequence input">.
Note: codons in the input sequence that do not have tAI values are
ignored.
enc
Title : enc
Usage : $encValue = $self->enc($seq,[$minTotal]);
Function: calculate ENC for the sequence using the original method
I<Wright, 1990, Gene>
Returns : a number, or undef if failed
Args : for sequence see L</"sequence input">.
Optional argument I<minTotal> specifies minimal count
for an amino acid; if observed count is smaller than this count, this
amino acid's F will not be calculated but inferred. Default is 5.
Note: when the F of a redundancy group is unavailable due to lack of
sufficient data, it will be estimated from other groups following Wright's
method, that is, F3=(F2+F4)/2, and for others, F=1/r where r is the
degeneracy degree of that group.
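The calculation can be sketched as follows, assuming the standard genetic code (9 two-fold, 1 three-fold, 5 four-fold and 3 six-fold amino acids, plus 2 non-degenerate ones) and Wright's homozygosity F = (n*sum(p^2) - 1)/(n - 1). The helper names are hypothetical, and capping the result at 61 follows Wright's convention rather than anything stated in this document.

```perl
use strict;
use warnings;
use List::Util qw(sum);

# Hypothetical sketch: homozygosity F of one amino acid from the counts of
# its synonymous codons; undef when fewer than $min_total codons are observed.
sub wright_f {
    my ($counts, $min_total) = @_;
    $min_total = 5 unless defined $min_total;
    my $n = sum(0, @$counts);
    return undef if $n < $min_total || $n < 2;
    my $sum_p2 = sum(map { ($_ / $n) ** 2 } @$counts);
    return ($n * $sum_p2 - 1) / ($n - 1);
}

# Combine the mean F of each degeneracy class (keys 2, 3, 4, 6) into ENC.
# Missing classes are filled in as described in the note above.
sub wright_enc {
    my (%f) = @_;
    for my $r (2, 4, 6) {
        $f{$r} = 1 / $r unless defined $f{$r};          # F = 1/r
    }
    $f{3} = ($f{2} + $f{4}) / 2 unless defined $f{3};   # F3 = (F2 + F4)/2
    my $enc = 2 + 9 / $f{2} + 1 / $f{3} + 5 / $f{4} + 3 / $f{6};
    return $enc > 61 ? 61 : $enc;                       # assumption: cap at 61
}

printf "%.4f\n", wright_f([5, 5]);                                   # prints 0.4444
printf "%.1f\n", wright_enc(2 => 1, 3 => 1, 4 => 1, 6 => 1);         # prints 20.0
printf "%.1f\n", wright_enc(2 => 1/2, 3 => 1/3, 4 => 1/4, 6 => 1/6); # prints 61.0
```

The two extremes bracket the index: 20 when every amino acid uses a single codon, 61 when synonymous codons are used equally.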
enc_r
Title : enc_r
Usage : $encValue = $self->enc_r($seq,[$minTotal]);
Function: similar to the method L</enc>, except that missing F values
are estimated in a different way.
Returns : a number, or undef if failed
Args : for sequence see L</"sequence input">.
Optional argument I<minTotal> specifies minimal count
for an amino acid; if observed count is smaller than this count, this
amino acid's F will not be calculated but inferred. Default is 5.
Note: for a missing Fx of degeneracy class 'x', we first estimate the
ratio (1/Fx-1)/(x-1) by averaging the ratios of other degeneracy
classes with known F values. Then Fx is obtained by solving the simple
equation.
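Solving the equation for Fx gives Fx = 1/(1 + k*(x-1)), where k is the averaged ratio. A minimal sketch of that step follows; the helper name infer_missing_f is hypothetical, and it assumes a plain unweighted average over the known classes, which may not match the module's weighting.

```perl
use strict;
use warnings;
use List::Util qw(sum);

# Hypothetical sketch: infer F for degeneracy class $x from the classes with
# known F, via the ratio (1/F - 1)/(r - 1) averaged over those classes.
#   $known : { degeneracy => F }
sub infer_missing_f {
    my ($known, $x) = @_;
    my @ratios = map  { (1 / $known->{$_} - 1) / ($_ - 1) }
                 grep { $_ > 1 && $known->{$_} > 0 } keys %$known;
    return undef unless @ratios;
    my $k = sum(@ratios) / @ratios;       # assumption: unweighted mean
    return 1 / (1 + $k * ($x - 1));       # solve (1/Fx - 1)/(x - 1) = k
}

# With uniform usage in the known classes (F = 1/r) the ratio is 1,
# so the inferred F3 is 1/3.
printf "%.4f\n", infer_missing_f({ 2 => 0.5, 4 => 0.25 }, 3);   # prints 0.3333
```

The ratio is 0 when a class uses a single codon per amino acid and 1 when synonymous codons are used equally, so it puts classes of different degeneracy on a common scale.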
encp
Title : encp
Usage : $encpValue = $self->encp($seq,[$minTotal,[$A,$T,$C,$G]]);
Function: calculate ENC for the sequence using the updated method
by I<Novembre, 2002, MBE>, which corrects for the background nucleotide
composition.
Returns : a number, or undef if failed
Args : for sequence see L</"sequence input">.
Optional argument I<minTotal> specifies minimal count
for an amino acid; if observed count is smaller than this count, this
amino acid's F will not be calculated but inferred. Default is 5.
Another optional argument gives the background nucleotide composition
as an array in the order A,T,C,G. If it is not provided, the default
given when calling the method L</new> is used. If that is also
unavailable, an error occurs.
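The background correction can be sketched with Novembre's per-amino-acid statistic F' = (X2 + n - k)/(k*(n - 1)), where X2 = sum over codons of n*(p - e)^2/e, p is the observed codon fraction, e is the expected fraction under the background composition, and k is the number of synonymous codons. The helper name novembre_f is hypothetical and this is an illustration of the published formula, not the module's code.

```perl
use strict;
use warnings;
use List::Util qw(sum);

# Hypothetical sketch: background-corrected F' for one amino acid.
#   $counts   : arrayref of observed counts of its synonymous codons
#   $expected : arrayref of expected fractions (same order, nonzero, sum to 1)
sub novembre_f {
    my ($counts, $expected) = @_;
    my $n = sum(0, @$counts);
    my $k = @$counts;
    return undef if $n < 2;
    my $chi2 = 0;
    for my $i (0 .. $k - 1) {
        my $p = $counts->[$i] / $n;
        $chi2 += $n * ($p - $expected->[$i]) ** 2 / $expected->[$i];
    }
    return ($chi2 + $n - $k) / ($k * ($n - 1));
}

# With a uniform expectation the value reduces to Wright's F.
printf "%.4f\n", novembre_f([5, 5],  [0.5, 0.5]);   # prints 0.4444
printf "%.4f\n", novembre_f([10, 0], [0.5, 0.5]);   # prints 1.0000
```

Usage that matches the background exactly, such as counts of 8 and 2 against expectations of 0.8 and 0.2, gives the same F' as equal usage under a uniform background, which is the point of the correction.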
encp_r
Title : encp_r
Usage : $encpValue =
$self->encp_r($seq,[$minTotal,[$A,$T,$C,$G]]);
Function: similar to the method L</encp>, except that missing F values
are estimated in a different way.
Returns : a number, or undef if failed
Args : for sequence see L</"sequence input">.
Optional argument I<minTotal> specifies minimal count
for an amino acid; if observed count is smaller than this count, this
amino acid's F will not be calculated but inferred. Default is 5.
Another optional argument gives the background nucleotide composition
as an array in the order A,T,C,G. If it is not provided, the default
given when calling the method L</new> is used. If that is also
unavailable, an error occurs.
Note: for a missing Fx of degeneracy class 'x', we first estimate the
ratio (1/Fx-1)/(x-1) by averaging the ratios of other degeneracy
classes with known F values. Then Fx is obtained by solving the simple
equation.
estimate_base_composition
Title : estimate_base_composition
Usage : @baseComp = $self->estimate_base_composition($seq,[$pos])
Function: estimate base compositions in the sequence
Returns : an array of numbers in the order of A,T,C,G, or its
reference if in the scalar context
Args : a sequence string or a reference to a hash containing codons
and their counts (e.g., AGG => 30), and optionally an integer; the integer
specifies which codon position's nucleotides (1, 2, or 3) will be used
instead of all three codon positions.
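For the codon count hash form, the estimate amounts to tallying bases weighted by codon counts. The sketch below is hypothetical (the name base_composition is not the module's), and it assumes the result is returned as fractions rather than raw counts.

```perl
use strict;
use warnings;

# Hypothetical sketch: base composition (A, T, C, G) from a codon count hash,
# optionally restricted to one codon position (1, 2, or 3).
sub base_composition {
    my ($counts, $pos) = @_;
    my %n = map { $_ => 0 } qw(A T C G);
    for my $codon (keys %$counts) {
        my @bases = split //, uc $codon;
        @bases = ($bases[$pos - 1]) if $pos;
        # ambiguous bases such as N are skipped
        $n{$_} += $counts->{$codon} for grep { exists $n{$_} } @bases;
    }
    my $total = $n{A} + $n{T} + $n{C} + $n{G};
    return unless $total;
    my @comp = map { $n{$_} / $total } qw(A T C G);
    return wantarray ? @comp : \@comp;    # list, or reference in scalar context
}

my %codons = (AGG => 30, ATC => 10);
printf "%.3f %.3f %.3f %.3f\n", base_composition(\%codons);      # prints 0.333 0.083 0.083 0.500
printf "%.3f %.3f %.3f %.3f\n", base_composition(\%codons, 3);   # prints 0.000 0.000 0.250 0.750
```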
gc_fraction
Title : gc_fraction
Usage : $frac = $self->gc_fraction($seq,[$pos])
Function: get fraction of GC content in the sequence
Returns : a floating number between 0 and 1.
Args : a sequence string or a reference to a hash containing codons
and their counts (e.g., AGG => 30), and optionally an integer; the integer
specifies which codon position's nucleotides will be used for the
calculation (i.e., 1, 2, or 3), instead of all three positions.
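For the sequence string form, the position argument can be pictured as first reducing each codon to one base. A minimal sketch, with the hypothetical name gc_fraction_sketch and assuming an in-frame sequence:

```perl
use strict;
use warnings;

# Hypothetical sketch: GC fraction of a sequence string, optionally using
# only one codon position (1, 2, or 3).
sub gc_fraction_sketch {
    my ($seq, $pos) = @_;
    $seq = uc $seq;
    if ($pos) {
        my @codons = $seq =~ /(...)/g;                        # split into codons
        $seq = join '', map { substr($_, $pos - 1, 1) } @codons;
    }
    my $len = length $seq;
    return undef unless $len;
    my $gc = ($seq =~ tr/GC//);                               # count G and C
    return $gc / $len;
}

printf "%.3f\n", gc_fraction_sketch('ATGGCC');      # prints 0.667
printf "%.3f\n", gc_fraction_sketch('ATGGCC', 3);   # prints 1.000
```

The third-position value is the one usually reported as GC3.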
expect_codon_freq
Title : expect_codon_freq
Usage : $codonFreq = $self->expect_codon_freq($base_composition)
Function: return the expected frequency of codons
Returns : reference to a hash in which codon is hash key, and
fraction is hash value
Args : reference to an array of base compositions in the order of
[A, T, C, G], represented as either counts or fractions
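One way to picture the expected frequencies is as products of the base fractions at the three codon positions. The sketch below is hypothetical: it covers all 64 triplets, and whether the module drops stop codons and renormalizes over the remaining codons is not stated here.

```perl
use strict;
use warnings;
use List::Util qw(sum);

# Hypothetical sketch: expected codon frequencies as products of base
# fractions. Input is [A, T, C, G] as counts or fractions.
sub expected_codon_freq {
    my ($comp) = @_;
    my $total = sum(0, @$comp) or die "base composition sums to zero\n";
    my %p;
    @p{qw(A T C G)} = map { $_ / $total } @$comp;   # normalize counts to fractions
    my %freq;
    for my $b1 (qw(A T C G)) {
        for my $b2 (qw(A T C G)) {
            for my $b3 (qw(A T C G)) {
                $freq{"$b1$b2$b3"} = $p{$b1} * $p{$b2} * $p{$b3};
            }
        }
    }
    return \%freq;
}

my $freq = expected_codon_freq([2, 1, 1, 0]);   # counts: A=2, T=1, C=1, G=0
printf "%.4f\n", $freq->{AAA};   # prints 0.1250
printf "%.4f\n", $freq->{ATG};   # prints 0.0000
```

Dividing by the total first is what lets the same function accept either counts or fractions.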
AUTHOR
Zhenguo Zhang, <zhangz.sci at gmail.com>
BUGS
Please report any bugs or feature requests to bug-bio-cua at rt.cpan.org, or through the web interface at http://rt.cpan.org/NoAuth/ReportBug.html?Queue=Bio-CUA. I will be notified, and then you'll automatically be notified of progress on your bug as I make changes.
SUPPORT
You can find documentation for this module with the perldoc command.
perldoc Bio::CUA::CUB::Calculator
You can also look for information at:
RT: CPAN's request tracker (report bugs here)
AnnoCPAN: Annotated CPAN documentation
CPAN Ratings
Search CPAN
ACKNOWLEDGEMENTS
LICENSE AND COPYRIGHT
Copyright 2015 Zhenguo Zhang.
This program is free software: you can redistribute it and/or modify it under the terms of the GNU General Public License as published by the Free Software Foundation, either version 3 of the License, or (at your option) any later version.
This program is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for more details.
You should have received a copy of the GNU General Public License along with this program. If not, see http://www.gnu.org/licenses/.