
NAME

Bio::SeqAlignment::Components::Sundry::IOHelpers - Helper functions for reading and writing (simple) sequence files

VERSION

version 0.01

SYNOPSIS

use Bio::SeqAlignment::Components::Sundry::IOHelpers qw(read_fastx_sequences write_fastx_sequences);

my $bioseq_objects = read_fastx_sequences($fastafile);
write_fastx_sequences($outfile, $bioseq_objects);

DESCRIPTION

This module provides helper functions for reading and writing simple sequence files, that is, files that contain only sequence data, such as FASTA files (FASTQ support may be added in the future). Sequences are parsed with BioX::Seq, a simple module that provides a lightweight object-oriented interface to sequence data and is also very fast.

EXPORTS

read_fastx_sequences

my $bioseq_objects = read_fastx_sequences($fastafile);

Reads the sequences in a FASTA file into an array of BioX::Seq objects and returns a reference to that array.
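To illustrate the shape of the returned data, here is a minimal sketch in core Perl of reading FASTA records into an array reference. It is not this module's implementation: parse_fasta_records is a hypothetical name, plain hashes stand in for BioX::Seq objects, and the input comes from an in-memory string so that the example runs without any files or CPAN modules.

```perl
use strict;
use warnings;

# Hypothetical stand-in for read_fastx_sequences: parses FASTA text from a
# filehandle and returns a reference to an array of records. Plain hashes
# are used here in place of BioX::Seq objects.
sub parse_fasta_records {
    my ($fh) = @_;
    my @records;
    while ( my $line = <$fh> ) {
        chomp $line;
        next if $line eq '';
        if ( $line =~ /^>(\S+)\s*(.*)$/ ) {
            # Header line: start a new record
            push @records, { id => $1, desc => $2, seq => '' };
        }
        elsif (@records) {
            # Sequence line: append to the most recent record
            $records[-1]{seq} .= $line;
        }
    }
    return \@records;
}

# Assumed sample data, held in memory instead of a file on disk
my $fasta = ">seq1 first example\nACGT\nTTGA\n>seq2\nGGCC\n";
open my $fh, '<', \$fasta or die "Cannot open in-memory file: $!";
my $records = parse_fasta_records($fh);
close $fh;

# Walk the array reference, as one would with the module's return value
for my $rec (@$records) {
    printf "%s\t%d\n", $rec->{id}, length( $rec->{seq} );
}
```

With the real function, each element would be a BioX::Seq object rather than a hash, so fields would be read through that class's accessors instead of hash keys.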

split_fastx_sequences

split_fastx_sequences($fastafiles_ref, $max_sequences_per_file);

Splits a set of FASTA files into smaller files, each containing at most a given number of sequences. The function takes a reference to an array of FASTA file names and the maximum number of sequences per file, writes the split files to the current directory, and returns nothing.
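The batching step behind such a split can be sketched in core Perl as follows. This is an illustration under stated assumptions, not the module's code: chunk_records is a hypothetical helper, the records are plain strings, and the naming of the output files is left out because it is not specified here.

```perl
use strict;
use warnings;

# Hypothetical helper: partition a list of records into batches that hold
# at most $max records each. Returns a reference to an array of array refs.
sub chunk_records {
    my ( $records, $max ) = @_;
    die "max must be a positive integer\n" unless $max && $max > 0;
    my @copy = @$records;    # work on a copy so the caller's array is untouched
    my @chunks;
    push @chunks, [ splice @copy, 0, $max ] while @copy;
    return \@chunks;
}

# Seven placeholder records split into batches of at most three
my @ids    = map { "seq$_" } 1 .. 7;
my $chunks = chunk_records( \@ids, 3 );

for my $i ( 0 .. $#$chunks ) {
    printf "batch %d: %s\n", $i + 1, join( ', ', @{ $chunks->[$i] } );
}
```

Each batch would then be written to its own file; the last batch is simply smaller when the total is not a multiple of the limit.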

write_fastx_sequences

write_fastx_sequences($fastafile, $bioseq_objects);

Writes the sequences in an array of BioX::Seq objects to a FASTA file. The function takes the output file name and a reference to the array. Nothing too fancy, but it saves one from writing the same boilerplate code over and over again.
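The boilerplate in question looks roughly like the sketch below, written in core Perl. It makes several assumptions: format_fasta is a hypothetical helper, records are plain hashes rather than BioX::Seq objects, the 60-column wrap width is an arbitrary choice, and output goes to an in-memory buffer instead of a file. The layout the module itself produces may differ.

```perl
use strict;
use warnings;

# Hypothetical helper: render one record as FASTA text, wrapping the
# sequence at $width characters per line (60 by default, an assumption).
sub format_fasta {
    my ( $id, $seq, $width ) = @_;
    $width //= 60;
    my $out = ">$id\n";
    $out .= "$1\n" while $seq =~ /(.{1,$width})/g;
    return $out;
}

# Placeholder records; plain hashes stand in for BioX::Seq objects
my @records = (
    { id => 's1', seq => 'ACGTACGTAC' },
    { id => 's2', seq => 'GG' },
);

# Write to an in-memory buffer so the example needs no files on disk
my $buffer = '';
open my $out_fh, '>', \$buffer or die "Cannot open in-memory file: $!";
print {$out_fh} format_fasta( $_->{id}, $_->{seq}, 4 ) for @records;
close $out_fh;

print $buffer;
```

Replacing the in-memory buffer with a file name in the open call gives the file-writing loop that the function wraps.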

SEE ALSO

  • BioX::Seq

    BioX::Seq is a simple sequence class for representing biological sequences. It was designed as a compromise between holding sequences in plain strings or hashes and using the rather heavyweight objects of BioPerl. Benchmarking by the author of the present module shows that, in fast mode, its sequence IO is nearly 2x as fast as the BioPerl SeqIO modules and 1.5x as fast as the FAST modules, and roughly comparable in speed to the Biopython SeqIO module.

  • FAST

    FAST is a collection of modules that provide a simple, lightweight interface to sequence data. It is somewhat faster than BioPerl itself.

AUTHOR

Christos Argyropoulos <chrisarg *at* cpan.org>

COPYRIGHT AND LICENSE

This software is copyright (c) 2024 by Christos Argyropoulos.

This is free software; you can redistribute it and/or modify it under the same terms as the Perl 5 programming language system itself.