NAME
Bio::SeqAlignment::Components::SeqMapping::Mapper::Generic - Generic Sequence Mapper
VERSION
version 0.03
SYNOPSIS
To use the module one must do *at least* the following:
use Module::Find;
use Moose::Util qw( apply_all_roles );
use Bio::SeqAlignment::Components::SeqMapping::Mapper::Generic;
The rest depends on the dataflow role applied to the mapper. The example under USAGE uses the LinearLinearGeneric role, because that is the role that requires the user to implement the most code.
DESCRIPTION
This module loads all the components that can actually map sequences to a reference database. If you don't want to clutter your namespace with all the components, you can load them as needed by using the specific component name, e.g.:
use Bio::SeqAlignment::Components::SeqMapping::Mapper::ComponentName;
where ComponentName is the name of the component you need. If you choose violence, you can load all the components at once by using:
use Bio::SeqAlignment::Components::SeqMapping::Mapper;
USAGE
use Module::Find;
use Moose::Util qw( apply_all_roles );
use Bio::SeqAlignment::Components::SeqMapping::Mapper::Generic;
## load the LinearLinearGeneric dataflow role (it is applied to the mapper further down)
use Bio::SeqAlignment::Components::SeqMapping::Dataflow::LinearLinearGeneric;
my $mapper = Bio::SeqAlignment::Components::SeqMapping::Mapper::Generic->new(
create_refDB => \&create_db,
use_refDB => \&use_refDB,
init_sim_search => \&init_sim_search,
seq_align => \&seq_align,
extract_sim_metric => \&extract_sim_metric,
reduce_sim_metric => \&reduce_sim_metric
);
## db_location : where to create the database
## dbname : name of the database
## dbfiles : array ref of the files that hold the reference sequences
$mapper->create_refDB->( $db_location, $dbname, \@dbfiles );
my $ref_DB = $mapper->use_refDB->( $db_location, 'Hsapiens.cds.sample' );
$mapper->sim_search_params( { ... } );
## apply the LinearLinearGeneric Dataflow role to the mapper
apply_all_roles( $mapper,
'Bio::SeqAlignment::Components::SeqMapping::Dataflow::LinearLinearGeneric'
);
## workload : array ref of the sequences to be mapped
## max_workers : number of workers to use for process level parallelism through MCE
my @workload = ... ;
my $results = $mapper->sim_seq_search( \@workload, max_workers => 4 );
## a combination of the following subroutines must be implemented by the user,
## depending on the dataflow role applied to the mapper
sub init_sim_search {
my ( $self, %params ) = @_;
...
}
sub reduce_sim_metric {
my ( $self, $sim_metric, %params ) = @_;
...
}
sub seq_align {
my ( $self, $query_fname ) = @_;
...
}
sub extract_sim_metric {
my ( $self, $seq_align ) = @_;
...
}
sub create_db {
my ( $self, $dbloc, $dbname, $files_aref ) = @_;
...
}
sub use_refDB {
my $self = shift;
...
}
ATTRIBUTES
cleanup
A code reference that cleans up the data after the mapping process. It is meant to be implemented by the user; if it is not supplied, it defaults to a reference to an empty subroutine. If you leave it at the default, you probably don't need any cleanup.
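As a rough sketch only, a cleanup callback might delete the temporary files produced during mapping. The sub name cleanup_tmp_files and the argument list (the mapper followed by a list of file names) are assumptions made for this illustration; the module does not document how the callback is invoked.

```perl
use strict;
use warnings;

## Hypothetical cleanup callback: delete the temporary files it is given
## and return how many were actually removed.
sub cleanup_tmp_files {
    my ( $self, @files ) = @_;
    my $removed = 0;
    for my $file (@files) {
        next unless -e $file;    # nothing to do for missing files
        unlink $file or warn "Could not remove $file: $!";
        $removed++ unless -e $file;
    }
    return $removed;
}
```

Such a sub would be passed to the constructor as cleanup => \&cleanup_tmp_files, in the same way the other code references are passed under USAGE.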
create_refDB
A code reference that creates the reference database. It is meant to be implemented by the user; if it is not supplied, it defaults to a reference to an empty subroutine. If you leave it at the default, you probably don't need this, e.g. you have already created your database of reference sequences somehow. Perhaps the database was created by someone else, or you are using a pre-existing database, or you created the database through this interface at some time in the past. If none of the above holds true, then you probably DO need to provide the code for this method, through the create_refDB attribute.
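For illustration, a minimal create_refDB callback for a flat-file reference could simply concatenate the input FASTA files into a single file. The argument order follows the create_db stub under USAGE; the sub name create_flat_db, the one-FASTA-file "database" layout and the .fasta suffix are assumptions of this sketch, not something the module prescribes.

```perl
use strict;
use warnings;
use File::Path qw( make_path );
use File::Spec;

## Hypothetical callback: builds a "database" that is just one FASTA file
## holding every reference sequence found in the input files.
sub create_flat_db {
    my ( $self, $dbloc, $dbname, $files_aref ) = @_;

    make_path($dbloc) unless -d $dbloc;
    my $db_file = File::Spec->catfile( $dbloc, "$dbname.fasta" );

    open my $out, '>', $db_file or die "Cannot write $db_file: $!";
    for my $file ( @{$files_aref} ) {
        open my $in, '<', $file or die "Cannot read $file: $!";
        while ( my $line = <$in> ) {
            chomp $line;
            next unless length $line;    # skip blank lines
            print {$out} "$line\n";
        }
        close $in;
    }
    close $out or die "Cannot close $db_file: $!";
    return $db_file;                     # path of the file that was written
}
```

A real aligner would usually need an indexing step here instead (or in addition); the point is only that the callback receives a location, a name and a list of files, and leaves a usable database behind.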
extract_sim_metric
A code reference that extracts the similarity metric from the search. It is meant to be implemented by the user; if it is not supplied, it defaults to a reference to an empty subroutine. If you leave it at the default, you probably don't need this, e.g. you are NOT using the LinearLinearGeneric dataflow role.
extract_sim_metric_params
A hash reference that holds the parameters for extracting the similarity metric. It is meant to be set by the user; if it is not set, it defaults to an empty hash reference. If you leave it unset, you probably don't need this, e.g. your similarity extraction is not parameterized, and the extraction is hardwired into the extract_sim_metric method. However, DO consider scenarios in which the extraction is parameterized, e.g. you have to filter out some sequences from the search results based on criteria that the user can control.
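The filtering scenario described for extract_sim_metric_params might be sketched as follows. Everything specific here is an assumption made for illustration: the min_score key, the hit layout (hash refs with ref_id and score), the sub name, and the idea that the callback reads the parameters through an accessor on its first argument. A small stand-in class replaces the real mapper so that the sketch runs on its own.

```perl
use strict;
use warnings;

## Stand-in for the mapper so the sketch runs without the module; it mimics
## only an extract_sim_metric_params accessor (an assumption of this sketch).
package My::StandInMapper {
    sub new { my ( $class, %args ) = @_; return bless {%args}, $class }
    sub extract_sim_metric_params { return $_[0]{extract_sim_metric_params} }
}

## Hypothetical callback: keep only hits whose score reaches min_score.
## The hit layout ({ ref_id => ..., score => ... }) is assumed.
sub extract_filtered_hits {
    my ( $self, $seq_align ) = @_;
    my $params    = $self->extract_sim_metric_params // {};
    my $min_score = $params->{min_score} // 0;    # no filtering by default
    return [ grep { $_->{score} >= $min_score } @{$seq_align} ];
}

my $mapper = My::StandInMapper->new(
    extract_sim_metric_params => { min_score => 50 } );
my $hits = extract_filtered_hits(
    $mapper,
    [
        { ref_id => 'refA', score => 80 },
        { ref_id => 'refB', score => 30 },
    ]
);
print "$_->{ref_id}\t$_->{score}\n" for @{$hits};
```

Keeping the threshold in the parameter hash rather than in the sub body lets the same callback be reused with different cutoffs.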
init_sim_search
A code reference that initializes the similarity search. It is meant to be implemented by the user; if it is not supplied, it defaults to a reference to an empty subroutine. If you leave it at the default, you probably don't need this.
reduce_sim_metric
A code reference that reduces the similarity metric to a single value. It is meant to be implemented by the user; if it is not supplied, it defaults to a reference to an empty subroutine. If you leave it at the default, you probably don't need this, e.g. you are NOT using the LinearLinearGeneric dataflow role.
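One plausible reduction is to keep only the best hit for a query. The sketch below follows the argument order of the reduce_sim_metric stub under USAGE; the sub name, the hit layout (hash refs with ref_id and edit_distance) and the "smaller is better" rule are assumptions chosen for illustration.

```perl
use strict;
use warnings;
use List::Util qw( reduce );

## Hypothetical callback: collapse a list of hits to the single best one.
## Assumes each hit is { ref_id => ..., edit_distance => ... } and that a
## smaller edit distance means a better match. Ties keep the earlier hit.
sub reduce_to_best_hit {
    my ( $self, $sim_metric, %params ) = @_;
    return unless @{$sim_metric};    # no hits: nothing to reduce
    my $best =
      reduce { $a->{edit_distance} <= $b->{edit_distance} ? $a : $b }
      @{$sim_metric};
    return $best;
}
```

For a score where larger is better, the comparison would simply be reversed.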
refDB_access_params
A hash reference that holds the parameters for accessing the reference database. It is meant to be set by the user; if it is not set, it defaults to an empty hash reference. If you leave it unset, you probably don't need this, e.g. you will not be accessing the database over the network.
seq_align
A code reference that performs the sequence alignment. It is meant to be implemented by the user; if it is not supplied, it defaults to a reference to an empty subroutine. If you leave it at the default, you probably don't mind having non-working code.
seqmap_params
A hash reference that holds the parameters for the sequence mapping. It is meant to be set by the user; if it is not set, it defaults to an empty hash reference. If you leave it unset, you probably don't need this, e.g. you have hardwired the mapping parameters into the seq_align method. However, DO consider scenarios in which the mapping is parameterized, e.g. you have to filter out some sequences before the search based on criteria that the user can control. Another scenario in which you may (or rather SHOULD) use this attribute is to control the similarity search parameters, e.g. to provide the match/mismatch scores, gap penalties, etc. used by the similarity search algorithm.
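A common way to handle scoring parameters such as those mentioned for seqmap_params is to merge the user-supplied hash over a set of defaults inside the alignment code. The sketch below shows only that merge; the key names (match, mismatch, gap_open), their values and the helper name resolve_scoring are hypothetical and not part of the module.

```perl
use strict;
use warnings;

## Hypothetical scoring defaults; the key names are assumptions.
my %default_scoring = ( match => 1, mismatch => -1, gap_open => -2 );

## Merge user-supplied values over the defaults; user keys win.
sub resolve_scoring {
    my ($seqmap_params) = @_;
    return { %default_scoring, %{ $seqmap_params // {} } };
}

## e.g. the hash the user would have stored in seqmap_params
my $scoring = resolve_scoring( { mismatch => -3 } );
printf "%s=%d\n", $_, $scoring->{$_} for sort keys %{$scoring};
```

With this pattern an empty or missing parameter hash, which is the attribute's default, still yields a complete set of scoring values.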
sim_search_params
A hash reference that holds the parameters for the similarity search. It is meant to be set by the user; if it is not set, it defaults to an empty hash reference. If you leave it unset, you probably don't need this, e.g. you are performing a similarity search with defaults that you have hardwired into the similarity search code (that would be the init_sim_search or seq_align methods).
sim_seq_search
This is a method rather than an attribute: it is provided by whichever dataflow role is composed into the generic mapper, e.g. LinearLinearGeneric, LinearGeneric, etc., and it performs the sequence mapping. The user only needs to apply the role to the generic mapper and then call sim_seq_search with the appropriate arguments. See USAGE for how to do this in general, and the EnhancingEdlib example for a specific case.
use_refDB
A code reference that accesses the reference database. It is meant to be implemented by the user; if it is not supplied, it defaults to a reference to an empty subroutine. If you leave it at the default, you probably don't mind having code that does not work.
METHODS
_nondefault_set
A method that sets the _has_nondefault_value attribute. This method is used internally to keep track of what has been explicitly set by the user.
_code_for
A method that sets the _code_for attribute. This method is used internally to keep track of the code for each external function (or program) that one may want to interface with.
SEE ALSO
Bio::SeqAlignment::Components::SeqMapping::Dataflow::LinearLinearGeneric
LinearLinear Generic Dataflow role that can be composed into the Generic Mapper.
Bio::SeqAlignment::Components::SeqMapping::Dataflow::LinearGeneric
Linear Generic Dataflow role that can be composed into the Generic Mapper.
Bio::SeqAlignment::Examples::EnhancingEdlib
Example of how to use the Generic Mapper with the LinearLinearGeneric and the LinearGeneric Dataflow roles, along with the Edlib alignment library.
AUTHOR
Christos Argyropoulos <chrisarg@cpan.org>
COPYRIGHT AND LICENSE
This software is copyright (c) 2024 by Christos Argyropoulos.
This is free software; you can redistribute it and/or modify it under the same terms as the Perl 5 programming language system itself.