NAME
BioX::Seq::Utils - miscellaneous sequence-related functions
SYNOPSIS
    if ( is_nucleic($seq) ) {
        $seq = rev_com( $seq );
    }

    my @orfs = all_orfs(
        $seq,
        3,   # ORF mode
        200, # min length
    );

    my $re = build_ORF_regex(
        0,   # ORF mode
        300, # min length
    );
DESCRIPTION
BioX::Seq::Utils contains a number of sequence-related functions. These are general functions that are used often enough to warrant inclusion in a library but not often enough to warrant addition to the core BioX::Seq class. They may also include commonly used functions that do not make sense as BioX::Seq methods, as well as functions that mirror BioX::Seq methods but can be used on raw strings. They act on simple scalars and arrays rather than objects.
NOTE: Use of this module is considered deprecated. It is retained within the BioX::Seq package because a number of existing software tools rely on it, but at some point in the future these functions will likely find a new home elsewhere.
FUNCTIONS
rev_com
    my $rc = rev_com($seq);
Takes a single scalar argument and returns a scalar containing the reverse complement. Throws an exception if the input value doesn't look like a nucleic acid sequence.
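The behavior described above can be illustrated with a minimal sketch in plain Perl. This is not the module's implementation: the helper name sketch_rev_com is hypothetical, and the set of accepted characters (upper- and lower-case IUPAC DNA codes, no gaps, no U) is an assumption made for the example.

```perl
use strict;
use warnings;

# Hypothetical helper, NOT the implementation used by BioX::Seq::Utils.
# Assumption: only IUPAC DNA codes (either case) are accepted.
sub sketch_rev_com {
    my ($seq) = @_;

    # Refuse anything that does not look like a nucleic acid string
    die "input does not look like a nucleic acid sequence\n"
        if $seq =~ /[^ACGTRYSWKMBDHVN]/i;

    # Reverse the string, then complement each code in place
    my $rc = reverse $seq;
    $rc =~ tr/ACGTRYSWKMBDHVNacgtryswkmbdhvn/TGCAYRSWMKVHDBNtgcayrswmkvhdbn/;
    return $rc;
}

print sketch_rev_com('ATGCCN'), "\n";   # prints NGGCAT
```

Ambiguity codes map to the code for the complementary base set (for example R to Y and K to M), so the result stays a valid IUPAC string.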
is_nucleic
    if ( is_nucleic($seq) ) {
        # do something
    }
Takes a single scalar argument and returns a boolean value indicating whether the scalar "looks like" a nucleic acid string (i.e. contains no characters but valid IUPAC nucleic acid codes).
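As a rough illustration of this kind of check, the sketch below tests a string against a character class of IUPAC nucleotide codes. The helper name looks_nucleic is hypothetical, and details such as case-insensitivity, acceptance of U, and rejection of empty strings and gap characters are assumptions, not documented behavior of is_nucleic().

```perl
use strict;
use warnings;

# Hypothetical helper, NOT the implementation used by BioX::Seq::Utils.
# Assumptions: case-insensitive, U allowed, empty strings and gaps rejected.
sub looks_nucleic {
    my ($seq) = @_;
    return $seq =~ /\A[ACGTURYSWKMBDHVN]+\z/i ? 1 : 0;
}

for my $candidate ( 'ATGNNRY', 'acgu', 'MKVL' ) {
    printf "%-8s %s\n", $candidate,
        looks_nucleic($candidate) ? 'nucleic' : 'not nucleic';
}
```

Note that a short protein string made up only of letters that are also IUPAC nucleotide codes (such as "MKV") would pass a test like this, which is why the check is described as "looks like" rather than a guarantee.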
all_orfs
    my @orfs = all_orfs(
        $seq,
        2,   # ORF mode
        100, # min length
    );
    for my $orf (@orfs) {
        my ($seq, $start, $end) = @{$orf};
    }
Takes one required argument (a sequence string) and two optional arguments (ORF mode and minimum length) and returns an array of array references representing all ORFs in all reading frames of the sequence. Each reference contains three values: the sequence, the start position, and the stop position. The strand can be determined by comparing the start and stop positions (ORFs on the reverse strand will have start > stop). See build_ORF_regex() for an explanation of the possible values for ORF mode.
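The strand rule above can be sketched as follows. The records are made up for illustration and are not real all_orfs() output; the helper name orf_strand is hypothetical.

```perl
use strict;
use warnings;

# Hypothetical helper: report the strand of one ORF record, where a record
# is [ sequence, start, stop ] as described above.
sub orf_strand {
    my ($orf) = @_;
    my (undef, $start, $end) = @{$orf};
    return $start > $end ? '-' : '+';
}

# Made-up records for illustration only (not real all_orfs() output)
my @orfs = (
    [ 'ATGAAATAA', 10, 18 ],   # start < stop: forward strand
    [ 'ATGCCCTGA', 42, 34 ],   # start > stop: reverse strand
);

for my $orf (@orfs) {
    my ($orf_seq, $start, $end) = @{$orf};
    printf "%d\t%d\t%s\t%d nt\n",
        $start, $end, orf_strand($orf), length $orf_seq;
}
```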
build_ORF_regex
    my $re = build_ORF_regex(
        3,
        300,
    );
Builds a regular expression for matching open reading frames in a nucleic acid sequence string. Takes two required arguments that are used for building the regular expression:
mode - an integer from 0-3 defining the type of open reading frame detected.
0 - any set of codons not containing a stop codon
1 - must end with stop codon
2 - must begin with start codon
3 - must begin with start codon and end with stop codon
min_len - an integer representing the minimum number of nucleotides an open reading frame must contain to be returned (not including the stop codon)
The return value is a compiled regular expression that can be used to search a sequence string. The pos() function should be used on the string to set the frame to be searched (0-2) prior to applying the regex.
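The following sketch shows the general idea of selecting a frame with pos() before matching. The pattern is a hand-written stand-in, assumed for illustration, and is not the regex that build_ORF_regex() returns: it requires ATG, at least two non-stop codons, and a stop codon, and it relies on a leading \G so that matching begins at the position set with pos(). The helper name first_orf_in_frame is hypothetical.

```perl
use strict;
use warnings;

# Hand-written stand-in for a "start codon ... stop codon" pattern.
# This is NOT the regex returned by build_ORF_regex().
my $orf_re = qr{
    \G (?: [ACGT]{3} )*?                          # skip whole codons, staying in frame
    (
        ATG                                       # start codon
        (?: (?! TAA | TAG | TGA ) [ACGT]{3} ){2,} # two or more non-stop codons
        (?: TAA | TAG | TGA )                     # stop codon
    )
}x;

# Hypothetical helper: return the first ORF found in the given frame (0-2),
# or undef if that frame contains none.
sub first_orf_in_frame {
    my ($seq, $frame) = @_;
    pos($seq) = $frame;                  # choose the reading frame
    return $seq =~ /$orf_re/g ? $1 : undef;
}

my $seq = 'CATGAAACCCTAAGG';
for my $frame ( 0 .. 2 ) {
    my $orf = first_orf_in_frame( $seq, $frame );
    printf "frame %d: %s\n", $frame, defined $orf ? $orf : '(none)';
}
```

In this example only frame 1 yields a match (ATGAAACCCTAA), because the ATG at offset 1 is out of frame for the other two starting positions.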
CAVEATS AND BUGS
Please report bugs to the author.
AUTHOR
Jeremy Volkening <jeremy *at* base2bio.com>
COPYRIGHT AND LICENSE
Copyright 2014 Jeremy Volkening
This program is free software: you can redistribute it and/or modify it under the terms of the GNU General Public License as published by the Free Software Foundation, either version 3 of the License, or (at your option) any later version.
This program is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for more details.
You should have received a copy of the GNU General Public License along with this program. If not, see <http://www.gnu.org/licenses/>.