NAME
calculate_CUB.pl - a program to calculate sequence codon usage bias indices and other sequence parameters.
VERSION
VERSION: 0.01
SYNOPSIS
This program computes CUB indices for each sequence; which indices are computed depends on the options provided (see below).
In addition to CUB indices, the program also computes other sequence features, such as amino acid counts and the GC content of the whole sequence and of the third codon positions.
# compute ENC, ENC_r, CAI, and tAI for each sequence in file cds.fa
calculate_CUB.pl --cai CAI_param.top_200 --tai tAI_param \
--enc enc,enc_r --seq cds.fa -o CUB_indice.tsv
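The GC-content features mentioned above can be illustrated with a short sketch. It is written in Python purely for illustration and is not the program's own Perl code; the function names are hypothetical, and it assumes the coding sequence starts in frame and that ambiguous bases are ignored.

```python
def gc_fraction(seq):
    """Fraction of G and C among the unambiguous bases A, C, G, T."""
    seq = seq.upper()
    gc = seq.count("G") + seq.count("C")
    total = gc + seq.count("A") + seq.count("T")
    return gc / total if total else float("nan")


def gc3_fraction(cds):
    """GC fraction at third codon positions (assumes the CDS starts in frame)."""
    usable = len(cds) - len(cds) % 3   # drop a trailing partial codon
    third_positions = cds[2:usable:3]  # every third base, starting at index 2
    return gc_fraction(third_positions)


if __name__ == "__main__":
    cds = "ATGGCCAAGTGA"
    print("GC:", gc_fraction(cds), "GC3:", gc3_fraction(cds))
```

Whether the program itself counts the stop codon or ambiguous bases is not stated here, so treat those choices as assumptions of the sketch.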
OPTIONS
Mandatory options
- -s/--seq-file
File containing sequences in FASTA format; CUB indices are computed for each sequence in this file.
Auxiliary options
- -g/--gc-id
ID of the genetic code table used to identify the amino acid encoded by each codon. Default is 1, i.e., the standard code. See NCBI Genetic Code for valid IDs.
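As an illustration of what a genetic code table provides, the sketch below builds the standard code (NCBI table 1) from its compact amino acid string and counts the amino acids encoded by a sequence. It is a Python sketch, not the program's implementation; `count_amino_acids` is a hypothetical helper, and codons containing ambiguous bases are simply skipped.

```python
from itertools import product

BASES = "TCAG"
# NCBI translation table 1 (standard code); amino acids are listed in
# TCAG codon order (TTT, TTC, TTA, TTG, TCT, ...). '*' marks stop codons.
STANDARD_AAS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"

CODON_TO_AA = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), STANDARD_AAS)
}


def count_amino_acids(cds):
    """Count encoded amino acids, skipping codons not in the table."""
    counts = {}
    for i in range(0, len(cds) - len(cds) % 3, 3):
        aa = CODON_TO_AA.get(cds[i:i + 3].upper())
        if aa is not None:
            counts[aa] = counts.get(aa, 0) + 1
    return counts


if __name__ == "__main__":
    print(count_amino_acids("ATGGCCAAGTGA"))
```

Other table IDs differ only in a few codon assignments, so the same lookup approach applies with a different amino acid string.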
- -t/--tai-param
File containing the tAI value of each codon in the format 'codon<tab>tAI_value'; it can be produced by build_tai_param.pl. If not given, tAI values are not computed.
- -c/--cai-param
Similar to --tai-param, except that the file provides a CAI value for each codon in the same format. This file may be produced by build_cai_param.pl. If not given, CAI values are not computed.
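To show how a 'codon<tab>value' parameter file can be used, here is a minimal Python sketch that reads such a file and scores a sequence as the geometric mean of its per-codon values, which is the general definition of both CAI and tAI. The function names are hypothetical, skipping blank and '#' lines is an assumption about the file, and the program's actual handling of special codons (e.g., stop codons or codons without a value) may differ.

```python
import io
import math


def read_codon_weights(handle):
    """Parse 'codon<tab>value' lines into a dict (assumes no header line)."""
    weights = {}
    for line in handle:
        line = line.strip()
        if not line or line.startswith("#"):  # assumption: skip blanks/comments
            continue
        codon, value = line.split("\t")[:2]
        weights[codon.upper()] = float(value)
    return weights


def geometric_mean_index(cds, weights):
    """Geometric mean of codon weights; codons without a positive weight are skipped."""
    logs = []
    for i in range(0, len(cds) - len(cds) % 3, 3):
        w = weights.get(cds[i:i + 3].upper())
        if w is not None and w > 0:
            logs.append(math.log(w))
    return math.exp(sum(logs) / len(logs)) if logs else float("nan")


if __name__ == "__main__":
    params = io.StringIO("GCC\t1.0\nGCT\t0.25\n")  # toy values, not real parameters
    print(geometric_mean_index("GCCGCT", read_codon_weights(params)))
```

Working in log space avoids numerical underflow when many small weights are multiplied over a long sequence.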
- -e/--enc-methods
Methods for ENC calculations. Available values are enc, enc_r, encp, and encp_r. The encp* versions correct for background GC content in the calculations. The *_r versions use a new method to estimate missing F values. See the module Bio::CUA::CUB::Calculator for details of these methods. Default is enc. Multiple methods can be specified as a comma-separated string such as 'enc,encp,enc_r'.
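For orientation, the classic ENC formulation (Wright, 1990) for the standard genetic code can be sketched as follows. This Python sketch is only an assumption about what the plain enc method is based on; it does not reproduce the GC correction of encp* or the F-value estimation of the *_r variants, and both function names are hypothetical.

```python
def homozygosity(codon_counts):
    """Codon homozygosity F for one amino acid from its synonymous codon counts.

    Returns None when fewer than two codons are observed, in which case
    F is undefined (a 'missing F value').
    """
    n = sum(codon_counts)
    if n <= 1:
        return None
    sum_p2 = sum((c / n) ** 2 for c in codon_counts)
    return (n * sum_p2 - 1) / (n - 1)


def enc_from_mean_f(f2, f3, f4, f6):
    """Combine the mean F of each degeneracy class into ENC (standard code).

    The standard code has 2 single-codon amino acids, 9 twofold, 1 threefold,
    5 fourfold, and 3 sixfold degenerate amino acids.
    """
    return 2 + 9 / f2 + 1 / f3 + 5 / f4 + 3 / f6


if __name__ == "__main__":
    print(homozygosity([10, 10]))                   # nearly even use of two codons
    print(enc_from_mean_f(0.5, 1 / 3, 0.25, 1 / 6))  # uniform usage limit
```

With perfectly uniform codon usage the result approaches 61, and with a single codon used per amino acid it is 20, which are the usual bounds of ENC.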
- -b/--base-comp
Background base compositions used to correct for GC content in ENC calculations. This option has no effect unless encp* methods are specified in --enc-methods.
The format is as follows:
seq_id1 #A #T #C #G
seq_id2 #A #T #C #G
... ...
where #A/#T/#C/#G are counts or fractions of each base type in background data (e.g., introns) for each sequence. For sequences without background base composition information, 'NA' will be returned for encp* methods.
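The file format above can be read with a few lines of code. The Python sketch below assumes whitespace-separated fields in the order shown (sequence ID, then A, T, C, G) and normalizes counts to fractions; the function names are hypothetical, and the 'NA' return only mirrors the behavior described above rather than the program's actual code.

```python
import io


def read_base_composition(handle):
    """Map seq_id -> base fractions from 'seq_id #A #T #C #G' lines."""
    comp = {}
    for line in handle:
        fields = line.split()
        if len(fields) != 5:  # skip blank or malformed lines
            continue
        seq_id = fields[0]
        a, t, c, g = (float(x) for x in fields[1:])
        total = a + t + c + g
        if total > 0:
            # counts and fractions are both accepted: normalize to fractions
            comp[seq_id] = {"A": a / total, "T": t / total,
                            "C": c / total, "G": g / total}
    return comp


def background_gc(comp, seq_id):
    """Background GC fraction, or 'NA' when the sequence has no entry."""
    entry = comp.get(seq_id)
    return "NA" if entry is None else entry["C"] + entry["G"]


if __name__ == "__main__":
    data = io.StringIO("gene1 30 30 20 20\ngene2 0.3 0.3 0.2 0.2\n")
    comp = read_base_composition(data)
    for seq_id in ("gene1", "gene2", "gene3"):
        print(seq_id, background_gc(comp, seq_id))
```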
- -o/--out-file
File in which to store the results. Default is standard output, usually the screen.
- -h/--help
Show a brief help message.
AUTHOR
Zhenguo Zhang, <zhangz.sci at gmail.com>
BUGS
Please report any bugs or feature requests to bug-bio-cua at rt.cpan.org
or through the web interface at http://rt.cpan.org/NoAuth/ReportBug.html?Queue=Bio-CUA. I will be notified, and then you'll automatically be notified of progress on your bug as I make changes.
SUPPORT
You can find documentation for this class with the perldoc command.
perldoc Bio::CUA
You can also look for information at:
RT: CPAN's request tracker (report bugs here)
AnnoCPAN: Annotated CPAN documentation
CPAN Ratings
Search CPAN
ACKNOWLEDGEMENTS
LICENSE AND COPYRIGHT
Copyright 2015 Zhenguo Zhang.
This program is free software: you can redistribute it and/or modify it under the terms of the GNU General Public License as published by the Free Software Foundation, either version 3 of the License, or (at your option) any later version.
This program is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for more details.
You should have received a copy of the GNU General Public License along with this program. If not, see http://www.gnu.org/licenses/.