NAME
calculate_CUB.pl - a program to calculate codon usage bias (CUB) metrics and other sequence parameters for each sequence.
VERSION
VERSION: 0.11
SYNOPSIS
This program computes CUB indices for each sequence; which indices are computed depends on the options provided (see OPTIONS).
In addition to CUB indices, the program also computes other features, such as amino acid counts and the GC content of the whole sequence and of the third codon positions.
# compute ENC, ENC_r, CAI, and tAI for each sequence in file cds.fa
summarize_cds_stat.pl --cai CAI_param.top_200 --tai tAI_param \
--enc enc,enc_r --seq cds.fa -o CUB_indice.tsv
# same as above, but without GC content, amino acid counts, and
# protein lengths in the output
summarize_cds_stat.pl --cai CAI_param.top_200 --tai tAI_param \
--enc enc,enc_r --seq cds.fa -o CUB_indice.tsv --lite
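The GC content of the whole sequence and of the third codon positions, mentioned above, can be illustrated with a minimal sketch. It is written in Python for illustration only and is not the program's own Perl code; the function name gc_stats is hypothetical, and the sketch assumes that the sequence starts in frame and that trailing bases not forming a complete codon are dropped.

```python
def gc_stats(cds):
    """Return (GC fraction of the whole sequence, GC fraction at 3rd codon positions)."""
    seq = cds.upper()
    # Assumption: the sequence starts in frame; drop any incomplete trailing codon.
    seq = seq[: len(seq) - len(seq) % 3]
    if not seq:
        raise ValueError("sequence is shorter than one codon")
    third = seq[2::3]  # every third base, i.e. codon position 3
    gc = sum(base in "GC" for base in seq) / len(seq)
    gc3 = sum(base in "GC" for base in third) / len(third)
    return gc, gc3
```

How the program itself treats ambiguous bases or partial codons is not stated here, so the sketch simply counts G and C among all retained bases.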
OPTIONS
Mandatory options
- -s/--seq-file
File containing the sequences in FASTA format; CUB indices are computed for each sequence.
Auxiliary options
- -g/--gc-id
ID of the genetic code table used to identify the amino acid encoded by each codon. Default is 1, i.e., the standard code. See NCBI Genetic Code for valid IDs.
- -t/--tai-param
File containing the tAI value of each codon in the format 'codon<tab>tAI_value'; it can be produced by build_tai_param.pl. If not given, tAI values are not computed.
- -c/--cai-param
Similar to --tai-param, except that the file provides CAI values, in the same format. This file can be produced by build_cai_param.pl. If not given, CAI values are not computed.
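The 'codon<tab>value' layout described for --tai-param and --cai-param can be checked before running the program. The following is a minimal sketch in Python, for illustration only and not part of the program; the function name read_codon_param is hypothetical, the values in the example are made up, and skipping blank lines is an assumption.

```python
import io

def read_codon_param(handle):
    """Parse 'codon<tab>value' lines into a dict mapping codon -> float."""
    weights = {}
    for line_no, line in enumerate(handle, start=1):
        line = line.rstrip("\r\n")
        if not line.strip():
            continue  # assumption: blank lines are ignored
        fields = line.split("\t")
        if len(fields) != 2:
            raise ValueError(f"line {line_no}: expected 'codon<tab>value'")
        codon, value = fields
        weights[codon.upper()] = float(value)
    return weights

# Example with made-up values, not real tAI or CAI parameters
example = io.StringIO("GCT\t0.52\nGCC\t1.00\n")
print(read_codon_param(example))
```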
- -f/--fop-param
File containing predefined optimal codons, one codon per line, used to compute Fop (frequency of optimal codons). Optimal codons can be chosen in different ways, for example by selecting codons with high tAI values or codons preferred in highly expressed genes.
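To show how a list of optimal codons is typically used, here is a minimal Python sketch of the frequency of optimal codons (Fop) in its common form: the number of optimal codons divided by the number of codons that have synonymous alternatives. It is for illustration only and may differ from what the program does; the function name fop is hypothetical, and excluding ATG, TGG, and the stop codons of the standard code is an assumption.

```python
# Assumption: under the standard genetic code, codons without synonymous
# alternatives (Met, Trp) and stop codons are excluded from the calculation.
EXCLUDED = {"ATG", "TGG", "TAA", "TAG", "TGA"}

def fop(cds, optimal_codons):
    """Fraction of optimal codons among codons that have synonymous alternatives."""
    seq = cds.upper()
    # Split into complete codons, ignoring any incomplete trailing codon.
    codons = [seq[i:i + 3] for i in range(0, len(seq) - 2, 3)]
    informative = [c for c in codons if c not in EXCLUDED]
    if not informative:
        return None  # Fop is undefined for this sequence
    optimal = {c.upper() for c in optimal_codons}
    return sum(c in optimal for c in informative) / len(informative)
```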
- -e/--enc-methods
Methods for ENC calculation. Available values are enc, enc_r, encp, and encp_r. The encp* versions correct for background GC content in the calculation. The *_r versions use a new method to estimate missing F values. See module Bio::CUA::CUB::Calculator for details of these methods. Default is enc. Multiple methods can be specified as a comma-separated string, such as 'enc,encp,enc_r'.
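The comma-separated form of --enc-methods can be sketched as follows. This Python snippet is for illustration only and is not the program's own option handling; the function names are hypothetical, and rejecting unknown method names is an assumption.

```python
VALID_METHODS = ("enc", "enc_r", "encp", "encp_r")

def parse_enc_methods(value="enc"):
    """Split a comma-separated method string and check each name."""
    methods = [m.strip() for m in value.split(",") if m.strip()]
    for method in methods:
        if method not in VALID_METHODS:  # assumption: unknown names are errors
            raise ValueError(f"unknown ENC method: {method}")
    return methods

def needs_base_comp(methods):
    """True if any encp* method is requested, i.e. background base composition is used."""
    return any(m.startswith("encp") for m in methods)
```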
- -b/--base-comp
This option is required when computing the encp* versions of ENC. The given base composition serves as the background composition for calculating the expected codon frequencies in the sequences, which can help exclude the effect of mutational bias on codon usage. This option has no effect unless an encp* method is specified in --enc-methods.
The value is either a file name or four comma-separated numbers. If a file is given, it is assumed to provide sequence-specific background base compositions in a format like:
seq_id1 #A #T #C #G
seq_id2 #A #T #C #G
... ...
where #A/#T/#C/#G are the counts or fractions of each base type in the background data (e.g., introns) for each sequence. For sequences without background base composition information, the encp* methods return 'NA'.
If numbers are given, they should look like
0.2,0.3,0.3,0.2
giving the frequencies of A, T, C, and G, in that order.
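The two accepted forms of --base-comp can be illustrated with a minimal Python sketch. It is for illustration only and is not the program's own parser; the function names are hypothetical, and treating the file columns as whitespace-delimited and converting counts to fractions are assumptions.

```python
import os

BASES = ("A", "T", "C", "G")  # column order used by --base-comp

def normalize(values):
    """Convert four counts or fractions into fractions that sum to 1."""
    numbers = [float(v) for v in values]
    if len(numbers) != 4 or any(n < 0 for n in numbers) or sum(numbers) <= 0:
        raise ValueError("expected four non-negative numbers, not all zero")
    total = sum(numbers)
    return {base: n / total for base, n in zip(BASES, numbers)}

def parse_base_comp(value):
    """Return one composition for all sequences, or a dict keyed by sequence ID."""
    if os.path.isfile(value):
        per_seq = {}
        with open(value) as handle:
            for line in handle:
                fields = line.split()  # assumption: whitespace-delimited columns
                if fields:
                    per_seq[fields[0]] = normalize(fields[1:5])
        return per_seq
    return normalize(value.split(","))
```

For example, parse_base_comp("0.2,0.3,0.3,0.2") yields a single composition applied to all sequences, whereas a file path yields one composition per sequence ID.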
- -l/--lite
A switch. By default, the program outputs amino acid counts, GC content, and protein lengths; if this option is set, these parameters are omitted from the output.
- -o/--out-file
File in which to store the results. Default is standard output, usually the screen.
- -h/--help
Show a brief help message.
AUTHOR
Zhenguo Zhang, <zhangz.sci at gmail.com>
BUGS
Please report any bugs or feature requests to bug-bio-cua at rt.cpan.org
or through the web interface at http://rt.cpan.org/NoAuth/ReportBug.html?Queue=Bio-CUA. I will be notified, and then you'll automatically be notified of progress on your bug as I make changes.
SUPPORT
You can find documentation for this class with the perldoc command.
perldoc Bio::CUA
You can also look for information at:
RT: CPAN's request tracker (report bugs here)
AnnoCPAN: Annotated CPAN documentation
CPAN Ratings
Search CPAN
ACKNOWLEDGEMENTS
UPDATES
0.11 - Thu May 21 16:00:28 EDT 2015
1. Modified/added options --base-comp and --lite.
LICENSE AND COPYRIGHT
Copyright 2015 Zhenguo Zhang.
This program is free software: you can redistribute it and/or modify it under the terms of the GNU General Public License as published by the Free Software Foundation, either version 3 of the License, or (at your option) any later version.
This program is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for more details.
You should have received a copy of the GNU General Public License along with this program. If not, see http://www.gnu.org/licenses/.