NAME

Bio::CUA::CUB::Calculator -- A module to calculate codon usage bias (CUB) indices for protein-coding sequences

SYNOPSIS

use Bio::CUA::CUB::Calculator;
use Bio::CUA::SeqIO;

my $calc = Bio::CUA::CUB::Calculator->new(
    -codon_table => 1,
    -tAI_values  => 'tai.out', # from Bio::CUA::CUB::Builder
);

# calculate tAI for each sequence
my $io = Bio::CUA::SeqIO->new(-file => "seqs.fa");
# or, specifying the format explicitly:
# my $io = Bio::CUA::SeqIO->new(-file => "seqs.fa", -format => 'fasta');

while(my $seq = $io->next_seq)
{
	my $tai = $calc->tai($seq);
	printf("%10s: %.7f\n", $seq->id, $tai);
}

DESCRIPTION

Codon usage bias (CUB) can be represented at two levels: codon and sequence. The sequence-level value is often computed as the geometric mean of the values of the sequence's codons. This module calculates CUB metrics at the sequence level.

Supported CUB metrics include CAI (codon adaptation index), tAI (tRNA adaptation index), Fop (Frequency of optimal codons), ENC (Effective Number of Codons) and their variants. See the methods below for details.

METHODS

new

Title   : new
Usage   : my $calc=Bio::CUA::CUB::Calculator->new(@args);
Function: initialize the calculator
Returns : an object of this class
Args    : a hash with following acceptable keys:

B<Mandatory options>:

-codon_table
    the genetic code table applied in the following sequence analyses.
    It can be specified by an integer (genetic code table id), an object
    of L<Bio::CUA::CodonTable>, or a map-file. See the method
    L<Bio::CUA::Summarizer/new> for details.

B<Options needed by the FOP method>:

-optimal_codons
    a file containing all the optimal codons, one codon per line, or a
    hashref whose keys are the optimal codons.

B<Options needed by the CAI method>:

-CAI_values
    a file containing the CAI value of each codon, excluding the 3 stop
    codons, so 61 lines, with each line containing a codon and its value
    separated by a space or tab; or a hashref in which each key is a
    codon and each value is the CAI value of that codon.

B<Options needed by the tAI method>:

-tAI_values
    similar to C<-CAI_values>: a file or a hashref containing the tAI
    value of each codon.

B<Options needed by the ENC methods>:

-base_background
    optional. An arrayref containing the frequencies of the 4 bases (in
    the order A, T, C, and G) derived from background data such as
    introns, or one of the values 'seq' and 'seq3', which cause the base
    frequencies to be estimated from each analyzed sequence as a whole
    or from its 3rd codon positions, respectively.

    The background can also be specified for each analyzed sequence
    with the methods L</encp> and L</encp_r>.

sequence input

All the following methods accept one of the following formats as sequence input:

  1. a string of nucleotide sequence with a length of 3N,
  2. a sequence object which has a method I<seq> to get the sequence string,
  3. a sequence file in fasta format,
  4. a reference to a codon count hash, like
       $codons = {
           AGC => 50,
           GTC => 124,
           ...    ...
       }.
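
As an illustration of how formats 1 and 4 relate, the following is a minimal sketch that turns an in-frame sequence string into a codon count hash. The helper name C<count_codons> is hypothetical and not part of this module; how the module itself handles ambiguous bases or incomplete codons is not shown here.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of Bio::CUA): split an in-frame
# nucleotide string into codons and count each one.
sub count_codons {
    my ($seq) = @_;
    $seq = uc $seq;
    $seq =~ s/\s+//g;                        # drop whitespace and line breaks
    die "sequence length is not a multiple of 3\n" if length($seq) % 3;
    my %count;
    $count{$_}++ for $seq =~ /(...)/g;       # consecutive, non-overlapping triplets
    return \%count;
}

my $codons = count_codons('ATGGCTGCTAGCTAA');
printf "%s => %d\n", $_, $codons->{$_} for sort keys %$codons;
```

A hash built this way has the same shape as the codon count hash shown in format 4.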

cai

Title   : cai
Usage   : $caiValue = $self->cai($seq);
Function: calculate the CAI value for the sequence
Returns : a number, or undef if failed
Args    : see L</"sequence input">
Note: codons without synonymous competitors are excluded from the
calculation.
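
The sequence-level CAI is conventionally the geometric mean of the per-codon values, weighted by how often each codon occurs. The sketch below shows that averaging under stated assumptions: the helper name C<geometric_mean_index>, the example values, and the explicit skip list for codons without synonymous competitors are all illustrative, and the module's own implementation may differ in detail.

```perl
use strict;
use warnings;

# Hypothetical helper: count-weighted geometric mean of per-codon values.
# $counts: { codon => count }, $values: { codon => value greater than 0 },
# $skip:   { codon => 1 } for codons to leave out of the calculation.
sub geometric_mean_index {
    my ($counts, $values, $skip) = @_;
    $skip ||= {};
    my ($log_sum, $n) = (0, 0);
    for my $codon (keys %$counts) {
        next if $skip->{$codon};                 # e.g. ATG and TGG in the standard code
        my $v = $values->{$codon};
        next unless defined $v && $v > 0;        # no usable value for this codon
        $log_sum += $counts->{$codon} * log($v);
        $n       += $counts->{$codon};
    }
    return $n ? exp($log_sum / $n) : undef;      # undef when nothing could be scored
}

my %counts = (GCT => 3, GCC => 1, ATG => 2);
my %values = (GCT => 1.0, GCC => 0.5, ATG => 1.0);   # illustrative values only
my $cai = geometric_mean_index(\%counts, \%values, { ATG => 1, TGG => 1 });
printf "CAI-like value: %.4f\n", $cai;
```

Working in log space avoids underflow when many small values are multiplied. The same averaging would apply to a tAI-style index, given tAI values instead of CAI values.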

fop

Title   : fop
Usage   : $fopValue = $self->fop($seq[,$withNonDegenerate]);
Function: calculate the fraction of optimal codons in the sequence
Returns : a number, or undef if failed
Args    : for sequence see L</"sequence input">.
If the optional argument '$withNonDegenerate' is true, then
non-degenerate codons (those that do not have synonymous partners) are
included in the calculation. The default is to exclude these codons.
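
To make the effect of '$withNonDegenerate' concrete, here is a minimal sketch of the fraction being computed. The helper name C<fop_sketch> and the explicit list of non-degenerate codons are assumptions for illustration; in particular, whether the module treats non-degenerate codons as optimal when they are included is not stated above, so the sketch counts a codon as optimal only if it is listed.

```perl
use strict;
use warnings;

# Hypothetical helper: fraction of optimal codons among the counted codons.
# $counts:         { codon => count }
# $optimal:        { codon => 1 } for optimal codons
# $non_degenerate: { codon => 1 } for codons without synonymous partners
sub fop_sketch {
    my ($counts, $optimal, $non_degenerate, $with_non_degenerate) = @_;
    my ($opt, $total) = (0, 0);
    for my $codon (keys %$counts) {
        next if $non_degenerate->{$codon} && !$with_non_degenerate;
        $total += $counts->{$codon};
        $opt   += $counts->{$codon} if $optimal->{$codon};
    }
    return $total ? $opt / $total : undef;
}

my %counts  = (GCT => 3, GCC => 1, ATG => 2);
my %optimal = (GCT => 1);
my %single  = (ATG => 1, TGG => 1);              # standard genetic code
printf "excluding non-degenerate: %.2f\n", fop_sketch(\%counts, \%optimal, \%single);
printf "including non-degenerate: %.2f\n", fop_sketch(\%counts, \%optimal, \%single, 1);
```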

tai

Title   : tai
Usage   : $taiValue = $self->tai($seq);
Function: calculate the tAI value for the sequence
Returns : a number, or undef if failed
Args    : for sequence see L</"sequence input">.

Note: codons in the input sequence which do not have tAI values are
ignored.

enc

Title   : enc
Usage   : $encValue = $self->enc($seq,[$minTotal]);
Function: calculate ENC for the sequence using the original method 
I<Wright, 1990, Gene>
Returns : a number, or undef if failed
Args    : for sequence see L</"sequence input">.
The optional argument I<minTotal> specifies the minimal count
for an amino acid; if the observed count is smaller than this, the
amino acid's F will not be calculated but inferred. Default is 5.

Note: when the F of a redundancy group is unavailable due to lack of
sufficient data, it will be estimated from other groups following Wright's
method, that is, F3=(F2+F4)/2, and for others, F=1/r where r is the
degeneracy degree of that group.
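
For readers unfamiliar with Wright's formulation, the sketch below computes the homozygosity F of one amino acid as F = (n * sum(p_i^2) - 1) / (n - 1) and combines the class means as ENC = 2 + 9/F2 + 1/F3 + 5/F4 + 3/F6 for the standard genetic code, with the fallbacks described in the note above. Both helper names are hypothetical, and the averaging of F within each class and the I<minTotal> filtering are left out.

```perl
use strict;
use warnings;
use List::Util qw(sum);

# Hypothetical helper: Wright's homozygosity F for one amino acid,
# given the counts of its synonymous codons.
sub homozygosity {
    my @counts = @_;
    my $n = sum(@counts) || 0;
    return undef if $n < 2;                       # F is undefined for n < 2
    my $sum_p2 = sum(map { ($_ / $n) ** 2 } @counts);
    return ($n * $sum_p2 - 1) / ($n - 1);
}

# Hypothetical helper: combine the mean F of each degeneracy class into
# ENC (standard code: 2 single-codon amino acids, 9 twofold, 1 threefold,
# 5 fourfold, 3 sixfold).
sub enc_from_class_means {
    my %f = %{ $_[0] };                           # { 2 => F2, 3 => F3, 4 => F4, 6 => F6 }
    $f{$_} //= 1 / $_ for (2, 4, 6);              # fallback F = 1/r
    $f{3}  //= ($f{2} + $f{4}) / 2;               # fallback F3 = (F2 + F4) / 2
    return undef if grep { $_ <= 0 } @f{2, 3, 4, 6};
    my $enc = 2 + 9 / $f{2} + 1 / $f{3} + 5 / $f{4} + 3 / $f{6};
    return $enc > 61 ? 61 : $enc;                 # Wright caps ENC at 61
}

printf "F for counts (5,5): %.4f\n", homozygosity(5, 5);
printf "ENC, extreme bias:  %.1f\n",
    enc_from_class_means({ 2 => 1, 3 => 1, 4 => 1, 6 => 1 });
```

With every F equal to 1 (one codon used per amino acid) the result is 20, and with uniform usage (F = 1/r in every class) it reaches the cap of 61.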

enc_r

Title   : enc_r
Usage   : $encValue = $self->enc_r($seq,[$minTotal]);
Function: similar to the method L</enc>, except that missing F values
are estimated in a different way.
Returns : a number, or undef if failed
Args    : for sequence see L</"sequence input">.
The optional argument I<minTotal> specifies the minimal count
for an amino acid; if the observed count is smaller than this, the
amino acid's F will not be calculated but inferred. Default is 5.

Note: for a missing Fx of degeneracy class 'x', the ratio
(1/Fx-1)/(x-1) is first estimated by averaging the ratios of the other
degeneracy classes with known F values. Fx is then obtained by solving
this simple equation.
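
The estimation in the note can be written out as follows: with r the mean of (1/Fy-1)/(y-1) over the classes y with known F, solving (1/Fx-1)/(x-1) = r gives Fx = 1/(1 + r*(x-1)). The sketch below implements just this step; the helper name C<estimate_missing_F> is hypothetical and the unweighted mean is an assumption.

```perl
use strict;
use warnings;
use List::Util qw(sum);

# Hypothetical helper: infer F for degeneracy class $x from the classes
# with known F. $known: { degeneracy => F }, e.g. { 2 => 0.6, 4 => 0.3 }.
sub estimate_missing_F {
    my ($known, $x) = @_;
    my @ratios = map  { (1 / $known->{$_} - 1) / ($_ - 1) }
                 grep { $_ > 1 && $known->{$_} > 0 } keys %$known;
    return undef unless @ratios;                  # nothing to average
    my $mean = sum(@ratios) / @ratios;            # unweighted mean (assumption)
    return 1 / (1 + $mean * ($x - 1));            # solve the ratio equation for Fx
}

# Uniform codon usage in classes 2, 4 and 6 implies uniform usage in class 3.
my $f3 = estimate_missing_F({ 2 => 1 / 2, 4 => 1 / 4, 6 => 1 / 6 }, 3);
printf "estimated F3: %.4f\n", $f3;
```

Since 1/F is the effective number of codons used within a class, the ratio can be read as the fraction of the x-1 "extra" codons that are effectively in use.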

encp

Title   : encp
Usage   : $encpValue = $self->encp($seq,[$minTotal,[$A,$T,$C,$G]]);
Function: calculate ENC for the sequence using the updated method
of I<Novembre, 2002, MBE>, which corrects for the background nucleotide
composition.
Returns : a number, or undef if failed
Args    : for sequence see L</"sequence input">.

The optional argument I<minTotal> specifies the minimal count
for an amino acid; if the observed count is smaller than this, the
amino acid's F will not be calculated but inferred. Default is 5.

Another optional argument gives the background nucleotide composition
as an array in the order A,T,C,G. If it is not provided, the default
given when calling the method L</new> is used. If that is also
unavailable, an error occurs.
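
As a sketch of how the background enters, Novembre's corrected homozygosity for one amino acid with k synonymous codons and n observations is F' = (X2 + n - k) / (k * (n - 1)), where X2 = sum over codons of n * (p_i - e_i)^2 / e_i and e_i is the frequency of codon i expected from the background composition. The helper name C<homozygosity_prime> is hypothetical, and deriving e_i from the base frequencies is not shown.

```perl
use strict;
use warnings;
use List::Util qw(sum);

# Hypothetical helper: Novembre's F' for one amino acid.
# $counts:   arrayref of observed counts of its synonymous codons
# $expected: arrayref of expected codon fractions (same order, summing to 1)
sub homozygosity_prime {
    my ($counts, $expected) = @_;
    my $n = sum(@$counts) || 0;
    my $k = scalar @$counts;
    return undef if $n < 2;
    my $chi2 = 0;
    for my $i (0 .. $k - 1) {
        my $p = $counts->[$i] / $n;               # observed fraction
        my $e = $expected->[$i];                  # expected fraction
        $chi2 += $n * ($p - $e) ** 2 / $e;
    }
    return ($chi2 + $n - $k) / ($k * ($n - 1));
}

printf "uniform expectation: %.4f\n", homozygosity_prime([8, 2], [0.5, 0.5]);
printf "matching background: %.4f\n", homozygosity_prime([8, 2], [0.8, 0.2]);
```

With a uniform expectation (e_i = 1/k) the formula reduces to Wright's F, and when the observed usage matches the background the apparent bias shrinks.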

encp_r

Title   : encp_r
Usage   : $encpValue = $self->encp_r($seq,[$minTotal,[$A,$T,$C,$G]]);
Function: similar to the method L</encp>, except that missing F values
are estimated in a different way.
Returns : a number, or undef if failed
Args    : for sequence see L</"sequence input">.

The optional argument I<minTotal> specifies the minimal count
for an amino acid; if the observed count is smaller than this, the
amino acid's F will not be calculated but inferred. Default is 5.

Another optional argument gives the background nucleotide composition
as an array in the order A,T,C,G. If it is not provided, the default
given when calling the method L</new> is used. If that is also
unavailable, an error occurs.

Note: for a missing Fx of degeneracy class 'x', the ratio
(1/Fx-1)/(x-1) is first estimated by averaging the ratios of the other
degeneracy classes with known F values. Fx is then obtained by solving
this simple equation.

estimate_base_composition

Title   : estimate_base_composition
Usage   : @baseComp = $self->estimate_base_composition($seq,[$pos])
Function: estimate base compositions in the sequence
Returns : an array of numbers in the order A,T,C,G, or a reference to
it in scalar context
Args    : a sequence string or a reference to a hash containing codons
and their counts (e.g., AGG => 30), and optionally an integer; the
integer specifies which codon position (1, 2, or 3) is used instead of
all three codon positions.
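
The following sketch shows one way such an estimate could be made from a sequence string. The helper name C<base_composition> is hypothetical, and returning fractions rather than raw counts is an assumption of the sketch, not a statement about the module's return values.

```perl
use strict;
use warnings;

# Hypothetical helper: base fractions in the order A, T, C, G, taken from
# all codon positions or only from position $pos (1, 2 or 3).
sub base_composition {
    my ($seq, $pos) = @_;
    my %n = (A => 0, T => 0, C => 0, G => 0);
    my @codons = uc($seq) =~ /(...)/g;            # in-frame triplets
    for my $codon (@codons) {
        my @bases = $pos ? (substr($codon, $pos - 1, 1)) : split(//, $codon);
        for my $base (@bases) {
            $n{$base}++ if exists $n{$base};      # ignore ambiguous bases
        }
    }
    my $total = $n{A} + $n{T} + $n{C} + $n{G};
    return unless $total;
    my @frac = map { $n{$_} / $total } qw(A T C G);
    return wantarray ? @frac : \@frac;            # list, or reference in scalar context
}

my @all   = base_composition('ATGGCC');
my $third = base_composition('ATGGCC', 3);
printf "all positions: %s\n", join ' ', map { sprintf '%.3f', $_ } @all;
printf "3rd positions: %s\n", join ' ', map { sprintf '%.3f', $_ } @$third;
```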

gc_fraction

Title   : gc_fraction
Usage   : $frac = $self->gc_fraction($seq,[$pos])
Function: get fraction of GC content in the sequence
Returns : a floating number between 0 and 1.
Args    : a sequence string or a reference to a hash containing codons
and their counts (e.g., AGG => 30), and optionally an integer; the
integer specifies which codon position (1, 2, or 3) is used for the
calculation instead of all three positions.
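
For the codon count hash form of the input, the GC fraction can be sketched as below. The helper name C<gc_fraction_from_counts> is hypothetical and the code only illustrates the arithmetic, weighting each codon by its count.

```perl
use strict;
use warnings;

# Hypothetical helper: GC fraction from a codon count hash, over all
# codon positions or only position $pos (1, 2 or 3).
sub gc_fraction_from_counts {
    my ($counts, $pos) = @_;
    my ($gc, $total) = (0, 0);
    for my $codon (keys %$counts) {
        my $n     = $counts->{$codon};
        my $bases = uc($pos ? substr($codon, $pos - 1, 1) : $codon);
        my $gc_here = ($bases =~ tr/GC//);        # tr counts matches without modifying
        $gc    += $n * $gc_here;
        $total += $n * length($bases);
    }
    return $total ? $gc / $total : undef;
}

my %codons = (AGG => 30, GCC => 10);
printf "GC, all positions: %.2f\n", gc_fraction_from_counts(\%codons);
printf "GC, 3rd position:  %.2f\n", gc_fraction_from_counts(\%codons, 3);
```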

expect_codon_freq

Title   : expect_codon_freq
Usage   : $codonFreq = $self->expect_codon_freq($base_composition)
Function: return the expected frequency of codons
Returns : reference to a hash in which codon is hash key, and
fraction is hash value
Args    : reference to an array of base compositions in the order of
[A, T, C, G], represented as either counts or fractions
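
Under the simplest model, the expected frequency of a codon is the product of the background fractions of its three bases. The sketch below assumes exactly that model over all 64 triplets; the helper name C<expected_codon_freq> is hypothetical, and whether the module excludes stop codons or renormalizes the result is not covered.

```perl
use strict;
use warnings;
use List::Util qw(sum);

# Hypothetical helper: expected codon frequencies as products of base
# fractions. $comp: arrayref in the order [A, T, C, G], counts or fractions.
sub expected_codon_freq {
    my ($comp) = @_;
    my $total = sum(@$comp);
    my @bases = qw(A T C G);
    my %f;
    @f{@bases} = map { $_ / $total } @$comp;      # normalize counts to fractions
    my %freq;
    for my $b1 (@bases) {
        for my $b2 (@bases) {
            for my $b3 (@bases) {
                $freq{"$b1$b2$b3"} = $f{$b1} * $f{$b2} * $f{$b3};
            }
        }
    }
    return \%freq;
}

my $freq = expected_codon_freq([30, 30, 20, 20]); # AT-rich background, as counts
printf "AAA: %.4f  GGG: %.4f\n", $freq->{AAA}, $freq->{GGG};
```

Because the input is normalized first, counts and fractions give the same result.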

AUTHOR

Zhenguo Zhang, <zhangz.sci at gmail.com>

BUGS

Please report any bugs or feature requests to bug-bio-cua at rt.cpan.org, or through the web interface at http://rt.cpan.org/NoAuth/ReportBug.html?Queue=Bio-CUA. I will be notified, and then you'll automatically be notified of progress on your bug as I make changes.

SUPPORT

You can find documentation for this module with the perldoc command.

perldoc Bio::CUA::CUB::Calculator

ACKNOWLEDGEMENTS

LICENSE AND COPYRIGHT

Copyright 2015 Zhenguo Zhang.

This program is free software: you can redistribute it and/or modify it under the terms of the GNU General Public License as published by the Free Software Foundation, either version 3 of the License, or (at your option) any later version.

This program is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for more details.

You should have received a copy of the GNU General Public License along with this program. If not, see http://www.gnu.org/licenses/.