NAME

Bio::Gonzales::Seq::IO - fast utility functions for sequence IO

SYNOPSIS

use Bio::Gonzales::Seq::IO qw( faslurp faspew fahash fasubseq faiterate );

DESCRIPTION

SUBROUTINES

@seqs = faslurp(@filenames)
$seqsref = faslurp(@filenames)

faslurp reads all sequences from @filenames and returns them as a list in list context or as an array reference in scalar context. The sequences are stored as FAlite2::Entry objects.

$iterator = faiterate($filename)

Creates an iterator for the FASTA file $filename. The iterator can be used to loop over the sequence file without reading all of its content into memory at once. Iterator usage:

while(my $sequence_object = $iterator->()) {
    #do something with the sequence object
}
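
The calling convention above (call the iterator until it returns a false value) can be illustrated with a small closure. This is only a sketch of the pattern, not the module's implementation: make_fasta_iterator is a hypothetical helper, it returns plain hash references rather than FAlite2::Entry objects, and it reads from an in-memory string instead of a real file.

```perl
use strict;
use warnings;

# Hypothetical helper, not part of Bio::Gonzales::Seq::IO: returns a closure
# that yields one { id => ..., seq => ... } hashref per call and undef at EOF.
sub make_fasta_iterator {
    my ($fh) = @_;
    my $pending_header;    # header read ahead while finishing the previous record

    return sub {
        my ($header, $seq) = ($pending_header, '');
        undef $pending_header;

        while (defined(my $line = <$fh>)) {
            chomp $line;
            next if $line eq '';
            if ($line =~ /^>(.*)/) {
                if (defined $header) {
                    $pending_header = $1;    # belongs to the next record
                    last;
                }
                $header = $1;
            }
            elsif (defined $header) {
                $seq .= $line;               # sequence may span several lines
            }
        }
        return unless defined $header;       # nothing left to read
        my ($id) = split ' ', $header;       # id is the first word of the header
        return { id => $id, seq => $seq };
    };
}

# In-memory FASTA data stands in for a real file.
my $fasta = ">seq1 first\nACGT\nACGT\n>seq2\nGGCC\n";
open my $fh, '<', \$fasta or die "cannot open in-memory file: $!";

my $iterator = make_fasta_iterator($fh);
my @seen;
while (my $record = $iterator->()) {
    push @seen, "$record->{id}:$record->{seq}";
}
close $fh;
print "$_\n" for @seen;
```

Only one record is held in memory at a time, which is the point of iterating instead of slurping.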

$seqs = fasubseq($file, \@ids_with_locations, \%c)
$seqs = fasubseq($file, \@id_list, \%c)

#ARRAY OF ARRAYS
@ids_with_locations = (
    [ $id, $begin, $end, $strand ],
    ...
);

Config options can be:

%c = (
    keep_id => 1, # keeps the original id of the sequence
    wrap => 1, # see further down
    relaxed_range => 1, # treat a $begin of 0 or undef as '^' and an $end of 0 or undef as '$'
);

There are several possibilities for $begin and $end:

GGCAAAGGA ATGATGGTGT GCAGGCTTGG CATGGGAGAC
^..........^                                (1,11) OR ('^', 11)
   ^.....................................^  (4,'$')
                      ^..............^      (21,35) { with wrap on: OR (-19,35) OR (-19, -5) }
                      ^..................^  (21,'$') { with wrap on: OR (-19,'$') }

wrap: By default, all negative values are limited to the sequence boundaries, so a negative $begin is treated as 1 (or '^') and a negative $end is treated as '$'. With wrap on, negative values count back from the end of the sequence, as in the examples above.
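
The rules for $begin and $end can be written down as a small coordinate resolver. The following is a sketch based on the examples above, not the module's implementation: resolve_range is a hypothetical helper, and the convention that -1 refers to the last position is inferred from the diagram (the example sequence has 39 residues, and -19 and -5 correspond to 21 and 35).

```perl
use strict;
use warnings;

# Hypothetical helper illustrating the rules described above; it is not the
# module's implementation. Returns 1-based, inclusive ($begin, $end).
sub resolve_range {
    my ($len, $begin, $end, %c) = @_;

    if ($c{relaxed_range}) {
        $begin = '^' if !$begin;    # 0 or undef
        $end   = '$' if !$end;
    }
    $begin = 1    if $begin eq '^';
    $end   = $len if $end eq '$';

    if ($c{wrap}) {
        # negative values count back from the end (assumed: -1 is the last position)
        $begin = $len + $begin + 1 if $begin < 0;
        $end   = $len + $end + 1   if $end < 0;
    }
    else {
        # default: limit negative values to the sequence boundaries
        $begin = 1    if $begin < 0;
        $end   = $len if $end < 0;
    }
    return ($begin, $end);
}

my $len = 39;    # length of the example sequence
printf "(%d,%d)\n", resolve_range($len, '^', 11);                        # (1,11)
printf "(%d,%d)\n", resolve_range($len, 4, '$');                         # (4,39)
printf "(%d,%d)\n", resolve_range($len, -19, -5, wrap => 1);             # (21,35)
printf "(%d,%d)\n", resolve_range($len, -19, -5);                        # (1,39)
printf "(%d,%d)\n", resolve_range($len, 0, undef, relaxed_range => 1);   # (1,39)
```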

$sref = fahash(@filenames)
%seqs = fahash(@filenames)

Does the same as faslurp, but returns a hash (in list context) or a hash reference (in scalar context) with the sequence ids as keys and the sequence objects as values.

faspew($file, $seq1, $seq2, ...)

Writes ("spews") the given sequences to a file. Every $seqN argument can be a hash reference with FAlite2::Entry objects as values, an array reference of FAlite2::Entry objects, or a plain FAlite2::Entry object.

$iterator = faspew_iterate($filename)
$iterator = faspew_iterate($fh)

Creates an iterator that writes the sequences to the given $filename or $fh.

for my $sequence_object (@sequences) {
    $iterator->($sequence_object);
}
#DO NOT FORGET THIS, THIS CALL WILL CLOSE THE FILEHANDLE
$iterator->();

#this is equal to:

$iterator->(@sequences);
$iterator->();
#or
$iterator->(\@sequences);
$iterator->();


#DO NOT DO THIS:

$iterator->();

If you supply a filehandle $fh instead of a $filename, the filehandle will not be closed.
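
The behaviour described above (write on every call with arguments, finish on an empty call, and leave caller-supplied filehandles open) can be sketched with a closure. This is an illustration of the calling convention only, not the module's code: make_writer is a hypothetical helper, and the records are plain hash references rather than FAlite2::Entry objects.

```perl
use strict;
use warnings;

# Hypothetical sketch of the calling convention, not the module's code.
# $target is either a filename or an already opened filehandle.
sub make_writer {
    my ($target) = @_;
    my ($fh, $owns_fh);
    if (ref $target) {    # assume a filehandle was passed in
        ($fh, $owns_fh) = ($target, 0);
    }
    else {
        open $fh, '>', $target or die "cannot open $target: $!";
        $owns_fh = 1;
    }

    return sub {
        # an empty call finishes writing; close only handles opened here
        unless (@_) {
            close $fh if $owns_fh;
            return;
        }
        # accept a list of records or array references of records
        for my $record (map { ref($_) eq 'ARRAY' ? @$_ : $_ } @_) {
            print {$fh} ">$record->{id}\n$record->{seq}\n";
        }
        return;
    };
}

my $out = '';
open my $fh, '>', \$out or die "cannot open in-memory file: $!";

my $iterator = make_writer($fh);
$iterator->({ id => 'a', seq => 'ACGT' });
$iterator->([ { id => 'b', seq => 'GG' }, { id => 'c', seq => 'TT' } ]);
$iterator->();    # finish; $fh stays open because the caller supplied it

close $fh;        # so the caller closes it
print $out;
```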

ADVANCED

Change the output format:

$Bio::Gonzales::Seq::IO::WIDTH = 60; # sequence line width in FASTA output

# WIDTH only takes effect if SEQ_FORMAT is set to 'all_pretty' ('all' is the default)
$Bio::Gonzales::Seq::IO::SEQ_FORMAT = 'all_pretty';
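
To show what a sequence width of 60 means for the output, here is a minimal sketch of line wrapping. wrap_seq is a hypothetical helper written for this illustration; how the module wraps sequences internally is not documented here.

```perl
use strict;
use warnings;

# Hypothetical helper: break a sequence string into lines of at most $width
# characters. It only illustrates the effect of a line width setting.
sub wrap_seq {
    my ($seq, $width) = @_;
    # "(A60)*" splits the string into consecutive chunks of up to 60 characters
    return join "\n", unpack("(A$width)*", $seq);
}

my $seq = 'ACGT' x 40;    # 160 residues
print ">example\n", wrap_seq($seq, 60), "\n";
```

With a width of 60, the 160 residues are written as two lines of 60 characters followed by one line of 40.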

SEE ALSO

AUTHOR

jw bargsten, <joachim.bargsten at wur.nl>