NAME
BioX::Seq::Fetch - Fetch records from indexed FASTA non-sequentially
SYNOPSIS
use BioX::Seq::Fetch;
my $parser = BioX::Seq::Fetch->new($filename);
my $seq = $parser->fetch_seq('seq_ABC');
my $sub = $parser->fetch_seq('seq_XYZ', 8 => 15);
DESCRIPTION
BioX::Seq::Fetch provides non-sequential (random) access to records in indexed sequence files. Currently, only FASTA files indexed with samtools faidx or another compatible method are supported. If the index file is missing, the module creates a samtools-compatible one automatically.
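To illustrate why an index makes non-sequential access cheap, here is a minimal sketch of the arithmetic a faidx-style index allows. It assumes the standard five tab-separated .fai columns (name, length, offset, residues per line, bytes per line). The helper names parse_fai_line and byte_offset are hypothetical and are not part of BioX::Seq::Fetch, whose internals may differ.

```perl
use strict;
use warnings;

# Parse one line of a .fai index into a hash reference.
# Columns: name, sequence length, byte offset of the first residue,
# residues per line, bytes per line (including the line terminator).
sub parse_fai_line {
    my ($line) = @_;
    chomp $line;
    my ($name, $length, $offset, $linebases, $linewidth) = split /\t/, $line;
    return {
        name      => $name,
        length    => $length,
        offset    => $offset,
        linebases => $linebases,
        linewidth => $linewidth,
    };
}

# Byte position in the FASTA file of a 1-based residue coordinate.
sub byte_offset {
    my ($rec, $pos) = @_;
    die "position $pos out of range\n"
        if $pos < 1 || $pos > $rec->{length};
    my $zero       = $pos - 1;                          # 0-based coordinate
    my $full_lines = int( $zero / $rec->{linebases} );  # complete lines skipped
    return $rec->{offset}
         + $full_lines * $rec->{linewidth}
         + $zero % $rec->{linebases};
}

# Illustrative record: a 9-byte defline, 60 residues per 61-byte line.
my $rec = parse_fai_line("seq_XYZ\t150\t9\t60\t61\n");
printf "residue 61 starts at byte %d\n", byte_offset($rec, 61);
```

With the byte position known, a reader can seek() directly to the first requested residue instead of scanning the file from the beginning.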
CONSTRUCTOR
new
my $parser = BioX::Seq::Fetch->new(
$filename,
with_descriptions => 1,
);
Creates a new BioX::Seq::Fetch parser. Requires an input filename; STDIN and open filehandles are not supported, because a filename is needed to locate the corresponding index file and to ensure that seek() is supported. Takes one optional boolean argument, 'with_descriptions', which enables backtracking to find and include any sequence description present. Descriptions are normally omitted because the FASTA index stores the offset of the sequence itself, not of the defline. This option is currently experimental and may slow down sequence fetches, so it is turned off by default.
METHODS
fetch_seq
my $seq = $parser->fetch_seq(
$name,
$start,
$end,
);
Returns the requested sequence as a BioX::Seq object, or undef if no matching sequence is found. Requires a valid sequence identifier; optional 1-based start and end coordinates retrieve a substring (the entire sequence is returned by default). A fatal error is thrown if the provided coordinates fall outside the range [1, length(sequence)].
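Because fetch_seq() returns undef for an unknown identifier but dies on bad coordinates, callers may want to handle both outcomes in one place. The following is a minimal sketch of one way to do that. The helper fetch_region is hypothetical (not part of the module), it expects both coordinates to be supplied, and it assumes that BioX::Seq objects expose their residues through a seq() accessor.

```perl
use strict;
use warnings;

# Hypothetical helper: return the requested region as a plain string,
# or undef if the ID is unknown or the coordinates are rejected.
sub fetch_region {
    my ($parser, $id, $start, $end) = @_;

    # fetch_seq() is documented to throw a fatal error on out-of-range
    # coordinates, so trap the exception instead of ending the program.
    my $seq = eval { $parser->fetch_seq($id, $start, $end) };
    if (my $err = $@) {
        warn "Could not fetch $id:$start-$end: $err";
        return;
    }

    # fetch_seq() is documented to return undef for an unknown ID.
    return if ! defined $seq;

    # Assumption: the BioX::Seq object provides seq() for the residues.
    return $seq->seq;
}

# Typical use with a real parser (the file name is illustrative):
#   use BioX::Seq::Fetch;
#   my $parser = BioX::Seq::Fetch->new('genome.fa');
#   my $region = fetch_region($parser, 'seq_XYZ', 8, 15);
```

Returning undef in both failure cases keeps the calling code simple; a caller that needs to tell the two apart could rethrow the trapped error instead.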
write_index
$parser->write_index();
$parser->write_index( 'path/to/file.fa.fai' );
Writes a samtools-compatible index file for the underlying sequence file. Accepts one optional argument specifying the path of the file to create (the default, which should usually not be changed, is the same as the underlying sequence file with a '.fai' extension added).
This method is now called automatically if a FASTA file is opened with no index file present.
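For readers curious about what a samtools-compatible index contains, here is a rough sketch of how such records could be derived from a FASTA file. It is not the module's implementation: build_fai_records is a hypothetical name, and the sketch takes the line geometry from the first sequence line of each record without checking that later lines are consistent, which a real indexer would need to verify.

```perl
use strict;
use warnings;
use File::Temp qw/tempfile/;

# Build faidx-style index records for a FASTA file.
# Returns a list of tab-delimited lines (without trailing newlines).
sub build_fai_records {
    my ($fasta) = @_;
    open my $in, '<:raw', $fasta or die "Cannot open $fasta: $!\n";

    my @records;
    my $cur;        # record currently being accumulated
    my $pos = 0;    # byte offset of the start of the current line

    while (my $line = <$in>) {
        my $bytes = length $line;
        if ($line =~ /^>(\S+)/) {
            push @records, $cur if $cur;
            $cur = {
                name      => $1,
                length    => 0,
                offset    => $pos + $bytes,  # first byte after the defline
                linebases => 0,
                linewidth => 0,
            };
        }
        elsif ($cur) {
            (my $bases = $line) =~ s/[\r\n]+\z//;
            my $n = length $bases;
            if ($n > 0) {
                $cur->{length} += $n;
                # Line geometry comes from the first sequence line.
                if (! $cur->{linebases}) {
                    $cur->{linebases} = $n;
                    $cur->{linewidth} = $bytes;
                }
            }
        }
        $pos += $bytes;
    }
    push @records, $cur if $cur;
    close $in;

    return map {
        join "\t", @{$_}{qw/name length offset linebases linewidth/}
    } @records;
}

# Demonstration on a small temporary file.
my ($fh, $path) = tempfile(SUFFIX => '.fa', UNLINK => 1);
binmode $fh;
print {$fh} ">a desc\nACGT\nACGT\nAC\n>b\nGG\n";
close $fh;
print "$_\n" for build_fai_records($path);
```

In normal use none of this is needed, since write_index() produces the index file directly.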
ids
my @seq_ids = $parser->ids;
Returns an array of sequence IDs, ordered by their occurrence in the underlying file.
length
my $len = $parser->length( $seq_id );
Returns the length of the sequence given by $seq_id. May be marginally faster than fetching the sequence object and then finding the length.
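The ids and length methods combine naturally to summarize a file without fetching any sequence data. A small sketch follows; length_table is a hypothetical helper, not something the module provides.

```perl
use strict;
use warnings;

# Hypothetical helper: one "id<TAB>length" string per record,
# in the order the records appear in the underlying file.
sub length_table {
    my ($parser) = @_;
    return map { join "\t", $_, $parser->length($_) } $parser->ids;
}

# Typical use with a real parser (the file name is illustrative):
#   use BioX::Seq::Fetch;
#   my $parser = BioX::Seq::Fetch->new('genome.fa');
#   print "$_\n" for length_table($parser);
```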
COMPRESSION
BioX::Seq::Fetch supports files compressed with blocked gzip (BGZF), typically created with the bgzip utility. This allows for pseudo-random access without the need for full file decompression. The Compress::BGZF module is required for this functionality.
CAVEATS AND BUGS
Please report any bugs or feature requests to the issue tracker at https://github.com/jvolkening/p5-BioX-Seq.
AUTHOR
Jeremy Volkening <jeremy *at* base2bio.com>
COPYRIGHT AND LICENSE
Copyright 2014-2017 Jeremy Volkening
This program is free software: you can redistribute it and/or modify it under the terms of the GNU General Public License as published by the Free Software Foundation, either version 3 of the License, or (at your option) any later version.
This program is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for more details.
You should have received a copy of the GNU General Public License along with this program. If not, see <http://www.gnu.org/licenses/>.