NAME

Bio::MUST::Core::Ali::Temporary - Thin wrapper for a temporary mapped Ali written on disk

VERSION

version 0.242020

SYNOPSIS

#!/usr/bin/env perl

use Modern::Perl '2011';
# same as:
# use strict;
# use warnings;
# use feature qw(say);

use Bio::MUST::Core;
use aliased 'Bio::MUST::Core::Ali::Temporary';

# build Ali::Temporary object from existing ALI file
my $temp_db = Temporary->new( seqs => 'database.ali' );

# get properties
my $db = $temp_db->filename;
my $dbtype = $temp_db->type;

# pass it to external program
system("makeblastdb -in $db -dbtype $dbtype");

# alternative constructor call
# build Ali::Temporary object from existing Ali object
use aliased 'Bio::MUST::Core::Ali';
my $ali = Ali->load('queries.ali');
my $temp_qu = Temporary->new( seqs => $ali );

# pass it to external program
use File::Temp;
my $query = $temp_qu->filename;
my $out = File::Temp->new( UNLINK => 0, SUFFIX => '.blastp' );
system("blastp -query $query -db $db -out $out");
say "report: $out";

# later... when parsing the BLAST report
# let's say $id is a BLAST hit in database.ali
my $id = 'seq2';
my $long_id = $temp_db->long_id_for($id);
say "hit id: $long_id";
# ...

# more alternative constructor calls
# build Ali::Temporary object from list of Seq objects
my @seqs = $ali->filter_seqs( sub { $_->seq_len >= 500 } );
my $temp_ls = Temporary->new( seqs => \@seqs );

# build Ali::Temporary object preserving gaps in Seq objects
# (and keeping the associated FASTA file on disk)
my $temp_gp = Temporary->new(
    seqs => \@seqs,
    args => { degap => 0, persistent => 1 }
);
my $filename = $temp_gp->filename;
# later...
unlink $filename;
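Note that system returns the exit status of the child process rather than dying on failure, so the calls above silently continue if makeblastdb or blastp fails. The sketch below wraps system in a small helper. run_or_die is a hypothetical name and not part of Bio::MUST::Core. Passing the command as a list rather than as a single string also avoids going through the shell.

```perl
#!/usr/bin/env perl
use strict;
use warnings;

# Hypothetical helper (not part of Bio::MUST::Core): run an external
# command and die with a useful message if it cannot be started, is
# killed by a signal or exits with a non-zero status. With more than one
# argument, system() executes the program directly, bypassing the shell.
sub run_or_die {
    my @cmd = @_;
    my $status = system(@cmd);
    die "could not start '$cmd[0]': $!\n" if $status == -1;
    die sprintf( "'%s' died with signal %d\n", $cmd[0], $status & 127 )
        if $status & 127;
    die sprintf( "'%s' exited with status %d\n", $cmd[0], $status >> 8 )
        if $status >> 8;
    return 1;
}

# e.g. run_or_die( 'makeblastdb', '-in', $db, '-dbtype', $dbtype );
```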

DESCRIPTION

This module implements a class representing a temporary FASTA file where sequence ids are automatically abbreviated (seq1, seq2...) for maximum compatibility with external programs. To this end, it combines an internal Bio::MUST::Core::Ali object and a Bio::MUST::Core::IdMapper object.
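As a rough illustration of the idea, and not of the module's actual implementation, the following plain-Perl sketch writes a few sequences to a temporary FASTA file under abbreviated ids and keeps a hash to map them back to their long ids. The long ids are made up, and only the core File::Temp module is used.

```perl
#!/usr/bin/env perl
use strict;
use warnings;
use File::Temp ();

# Minimal sketch (not the module's actual code): write sequences to a
# temporary FASTA file under abbreviated ids seq1, seq2... and remember
# how to map them back to the original (made-up) long ids.
my @seqs = (
    [ 'Homo sapiens@123456' => 'MKVLAAGIVG' ],
    [ 'Mus musculus@654321' => 'MKVLSAGIVA' ],
);

my $fh = File::Temp->new( SUFFIX => '.fasta' );    # removed on destruction
my %long_id_for;                                    # abbr id => long id
my $n = 0;
for my $pair (@seqs) {
    my ( $long_id, $seq ) = @$pair;
    my $abbr_id = 'seq' . ++$n;
    $long_id_for{$abbr_id} = $long_id;
    print {$fh} ">$abbr_id\n$seq\n";
}
close $fh or die "cannot close temporary file: $!";

my $filename = $fh->filename;    # path to pass to an external program
```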

An Ali::Temporary can be built from an existing ALI (or FASTA) file, from an existing Bio::MUST::Core::Ali object or on the fly from a list (ArrayRef) of Bio::MUST::Core::Seq objects (see the SYNOPSIS for examples).

Its sequences may or may not be aligned, but by default they are degapped before the associated temporary FASTA file is written. To preserve gaps, pass degap => 0 through the optional args attribute.

ATTRIBUTES

seqs

Bio::MUST::Core::Ali object (required)

This required attribute contains the Bio::MUST::Core::Seq objects that are written in the associated temporary FASTA file. It can be specified either as a path to an ALI/FASTA file or as an Ali object or as an ArrayRef of Seq objects (see the SYNOPSIS for examples).
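The seqs attribute thus accepts three kinds of values. Bio::MUST::Core is built on Moose and likely handles this with type coercions; the plain-Perl dispatch below only sketches how the three forms can be told apart. seqs_kind is a hypothetical helper, not part of the module.

```perl
#!/usr/bin/env perl
use strict;
use warnings;
use Scalar::Util qw(blessed);

# Hypothetical sketch: classify a constructor argument given either as a
# file path, as an (Ali-like) object or as an ArrayRef of (Seq-like)
# objects. This is not the module's actual coercion code.
sub seqs_kind {
    my $arg = shift;
    return 'arrayref' if ref $arg eq 'ARRAY';    # plain ArrayRef of Seqs
    return 'object'   if blessed $arg;           # e.g. an Ali object
    return 'path'     if defined $arg && !ref $arg;    # ALI/FASTA file
    die "unsupported seqs argument\n";
}
```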

For now, it provides the following methods: count_comments, all_comments, get_comment, guessing, all_seq_ids, has_uniq_ids, is_protein, is_aligned, get_seq, get_seq_with_id, first_seq, all_seqs, filter_seqs and count_seqs (see Bio::MUST::Core::Ali).

args

HashRef (optional)

When specified, this optional attribute is passed to the temp_fasta method of the internal Ali object. Its purpose is to allow fine-tuning the format of the associated temporary FASTA file.

By default, its contents are clean => 1 and degap => 1, so as to generate a FASTA file of degapped sequences in which ambiguous and missing states are replaced by X.
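The sketch below illustrates the combined effect of degapping and cleaning on a protein sequence. It does not reproduce the exact rules of Bio::MUST::Core: the gap characters and the set of unambiguous residues are assumptions stated in the comments. Degapping is done before cleaning so that gaps are removed rather than turned into X.

```perl
#!/usr/bin/env perl
use strict;
use warnings;

# Illustration only: this sketch ASSUMES that '-', '*' and '.' are gaps
# and that, for protein sequences, anything outside the 20 standard amino
# acids (e.g. B, Z, X or '?') counts as an ambiguous or missing state.
sub degap {
    my ($seq) = @_;
    ( my $degapped = $seq ) =~ tr/-*.//d;    # delete assumed gap characters
    return $degapped;
}

sub clean_protein {
    my $seq = uc $_[0];
    $seq =~ s/[^ACDEFGHIKLMNPQRSTVWY]/X/g;   # replace everything else by X
    return $seq;
}

my $raw = 'MK-VL?B*AG';
my $out = clean_protein( degap($raw) );     # degap first, then clean
```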

Additionally, if you want to keep your temporary files around for debugging purposes, you can pass the option persistent => 1. This disables the automatic removal of the file on object destruction.
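By analogy only (this is not the module's code), the same distinction can be observed with File::Temp directly: with UNLINK => 1 the file is deleted when the object is destroyed, whereas with UNLINK => 0 it survives and must be unlinked by hand. make_temp is a hypothetical helper.

```perl
#!/usr/bin/env perl
use strict;
use warnings;
use File::Temp ();

# Hypothetical helper: create a small temporary FASTA file and return its
# name; the File::Temp object itself goes out of scope on return.
sub make_temp {
    my ($persistent) = @_;
    my $tmp = File::Temp->new(
        UNLINK => ( $persistent ? 0 : 1 ),
        SUFFIX => '.fasta',
    );
    print {$tmp} ">seq1\nACGT\n";
    return $tmp->filename;    # $tmp is destroyed when the sub returns
}

my $gone = make_temp(0);                 # removed on destruction
my $kept = make_temp(1);                 # left on disk
my $kept_exists = -e $kept ? 1 : 0;      # inspect it for debugging...
unlink $kept or warn "could not remove $kept: $!";    # ...then clean up
```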

file

Path::Class::File object (auto)

This attribute is automatically initialized with the path of the associated temporary FASTA file. Thus, it cannot be user-specified.

It provides the following methods: remove and filename (see below).

mapper

Bio::MUST::Core::IdMapper object (auto)

This attribute is automatically initialized with the mapper associating the long ids of the internal Ali object to the abbreviated ids used in the associated temporary FASTA file. Thus, it cannot be user-specified.

It provides the following methods: all_long_ids, all_abbr_ids, long_id_for and abbr_id_for (see Bio::MUST::Core::IdMapper).
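Conceptually, long_id_for and abbr_id_for are lookups in opposite directions. The plain-hash sketch below mimics them to restore long ids in a made-up line of tabular BLAST output; the actual IdMapper class is not reproduced here, and all ids and values are invented for illustration.

```perl
#!/usr/bin/env perl
use strict;
use warnings;

# Plain-hash stand-in for an id mapper (hypothetical, for illustration):
# two hashes kept in sync give constant-time lookups in both directions.
my @long_ids = ( 'Homo sapiens@123456', 'Mus musculus@654321' );
my @abbr_ids = map { "seq$_" } 1 .. @long_ids;

my %long_for; @long_for{@abbr_ids} = @long_ids;    # abbr => long
my %abbr_for; @abbr_for{@long_ids} = @abbr_ids;    # long => abbr

# Restore long ids in a (made-up) tabular BLAST line: query, subject, ...
my $hit = "seq1\tseq2\t98.50\t120";
my @fields = split /\t/, $hit;
$fields[$_] = $long_for{ $fields[$_] } // $fields[$_] for 0, 1;
my $restored = join "\t", @fields;
```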

ACCESSORS

filename

Returns the stringified filename of the associated temporary FASTA file.

This method does not accept any arguments.

type

Returns the type of the sequences in the internal Ali object using the BLAST denomination (prot or nucl), i.e., the values expected by the -dbtype option of makeblastdb. See Bio::MUST::Core::Seq::is_protein for the exact test performed.
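The mapping from a protein test to the BLAST denomination can be sketched as follows. The composition heuristic used here is a hypothetical stand-in and is not the criterion implemented by Bio::MUST::Core::Seq::is_protein; only the final prot/nucl mapping mirrors the accessor described above.

```perl
#!/usr/bin/env perl
use strict;
use warnings;

# Hypothetical stand-in for the protein test: a sequence is ASSUMED to be
# nucleotidic when at least 90% of its non-gap characters are A, C, G, T,
# U or N. The real criterion may differ.
sub looks_like_protein {
    my $seq = uc $_[0];
    $seq =~ tr/-*.//d;                       # ignore assumed gap chars
    return 0 unless length $seq;
    my $nt = ( $seq =~ tr/ACGTUN// );        # count nucleotide letters
    return $nt / length($seq) < 0.9 ? 1 : 0;
}

# Map the boolean onto the BLAST denomination (prot or nucl).
sub blast_type { return looks_like_protein(shift) ? 'prot' : 'nucl' }
```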

This method does not accept any arguments.

MISC METHODS

remove

Removes (unlinks) the associated temporary FASTA file.

Since this method is in principle invoked automatically on object destruction, users should not need to call it directly. Note that persistent temporary files (see the args attribute) must be removed manually, which requires retrieving and storing their filename before the object is destroyed.

AUTHOR

Denis BAURAIN <denis.baurain@uliege.be>

COPYRIGHT AND LICENSE

This software is copyright (c) 2013 by University of Liege / Unit of Eukaryotic Phylogenomics / Denis BAURAIN.

This is free software; you can redistribute it and/or modify it under the same terms as the Perl 5 programming language system itself.