The London Perl and Raku Workshop takes place on 26th Oct 2024. If your company depends on Perl, please consider sponsoring and/or attending.

LICENSE

Copyright [1999-2015] Wellcome Trust Sanger Institute and the EMBL-European Bioinformatics Institute Copyright [2016-2024] EMBL-European Bioinformatics Institute

Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with the License. You may obtain a copy of the License at

     http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the specific language governing permissions and limitations under the License.

CONTACT

  Please email comments or questions to the public Ensembl
  developers list at <http://lists.ensembl.org/mailman/listinfo/dev>.

  Questions may also be sent to the Ensembl help desk at
  <http://www.ensembl.org/Help/Contact>.

NAME

Bio::EnsEMBL::DBSQL::AssemblyMapperAdaptor

SYNOPSIS

  use Bio::EnsEMBL::Registry;

  Bio::EnsEMBL::Registry->load_registry_from_db(
    -host => 'ensembldb.ensembl.org',
    -user => 'anonymous'
  );

  $asma = Bio::EnsEMBL::Registry->get_adaptor( "human", "core",
    "assemblymapper" );

  $csa = Bio::EnsEMBL::Registry->get_adaptor( "human", "core",
    "coordsystem" );

  my $chr33_cs = $csa->fetch_by_name( 'chromosome', 'NCBI33' );
  my $chr34_cs = $csa->fetch_by_name( 'chromosome', 'NCBI34' );
  my $ctg_cs   = $csa->fetch_by_name('contig');
  my $clone_cs = $csa->fetch_by_name('clone');

  my $chr_ctg_mapper =
    $asma->fetch_by_CoordSystems( $chr33_cs, $ctg_cs );

  my $ncbi33_ncbi34_mapper =
    $asm_adptr->fetch_by_CoordSystems( $chr33, $chr34 );

  my $ctg_clone_mapper =
    $asm_adptr->fetch_by_CoordSystems( $ctg_cs, $clone_cs );

DESCRIPTION

Adaptor for handling Assembly mappers. This is a Singleton class. ie: There is only one per database (DBAdaptor).

This is used to retrieve mappers between any two coordinate systems whose makeup is described by the assembly table. Currently one step (explicit) and two step (implicit) pairwise mapping is supported. In one-step mapping an explicit relationship between the coordinate systems is defined in the assembly table. In two-step 'chained' mapping no explicit mapping is present but the coordinate systems must share a common mapping to an intermediate coordinate system.

METHODS

new

  Arg [1]    : Bio::EnsEMBL::DBAdaptor $dbadaptor the adaptor for
               the database this assembly mapper is using.
  Example    : my $asma = new Bio::EnsEMBL::AssemblyMapperAdaptor($dbadaptor);
  Description: Creates a new AssemblyMapperAdaptor object
  Returntype : Bio::EnsEMBL::DBSQL::AssemblyMapperAdaptor
  Exceptions : none
  Caller     : Bio::EnsEMBL::DBSQL::DBAdaptor
  Status     : Stable

cache_seq_ids_with_mult_assemblys

  Example    : $self->adaptor->cache_seq_ids_with_mult_assemblys();
  Description: Creates a hash of the component seq region ids that
               map to more than one assembly from the assembly table.
  Retruntype : none
  Exceptions : none
  Caller     : AssemblyMapper, ChainedAssemblyMapper
  Status     : At Risk

fetch_by_CoordSystems

  Arg [1]    : Bio::EnsEMBL::CoordSystem $cs1
               One of the coordinate systems to retrieve the mapper
               between
  Arg [2]    : Bio::EnsEMBL::CoordSystem $cs2
               The other coordinate system to map between
  Description: Retrieves an Assembly mapper for two coordinate
               systems whose relationship is described in the
               assembly table.

               The ordering of the coodinate systems is arbitrary.
               The following two statements are equivalent:
               $mapper = $asma->fetch_by_CoordSystems($cs1,$cs2);
               $mapper = $asma->fetch_by_CoordSystems($cs2,$cs1);
  Returntype : Bio::EnsEMBL::AssemblyMapper
  Exceptions : wrong argument types
  Caller     : general
  Status     : Stable

register_assembled

  Arg [1]    : Bio::EnsEMBL::AssemblyMapper $asm_mapper
               A valid AssemblyMapper object
  Arg [2]    : integer $asm_seq_region
               The dbID of the seq_region to be registered
  Arg [3]    : int $asm_start
               The start of the region to be registered
  Arg [4]    : int $asm_end
               The end of the region to be registered
  Description: Declares an assembled region to the AssemblyMapper.
               This extracts the relevant data from the assembly
               table and stores it in Mapper internal to the $asm_mapper.
               It therefore must be called before any mapping is
               attempted on that region. Otherwise only gaps will
               be returned.  Note that the AssemblyMapper automatically
               calls this method when the need arises.
  Returntype : none
  Exceptions : throw if the seq_region to be registered does not exist
               or if it associated with multiple assembled pieces (bad data
               in assembly table)
  Caller     : Bio::EnsEMBL::AssemblyMapper
  Status     : Stable

register_component

  Arg [1]    : Bio::EnsEMBL::AssemblyMapper $asm_mapper
               A valid AssemblyMapper object
  Arg [2]    : integer $cmp_seq_region
               The dbID of the seq_region to be registered
  Description: Declares a component region to the AssemblyMapper.
               This extracts the relevant data from the assembly
               table and stores it in Mapper internal to the $asm_mapper.
               It therefore must be called before any mapping is
               attempted on that region. Otherwise only gaps will
               be returned.  Note that the AssemblyMapper automatically
               calls this method when the need arises.
  Returntype : none
  Exceptions : throw if the seq_region to be registered does not exist
               or if it associated with multiple assembled pieces (bad data
               in assembly table)
  Caller     : Bio::EnsEMBL::AssemblyMapper
  Status     : Stable

register_chained

  Arg [1]    : Bio::EnsEMBL::ChainedAssemblyMapper $casm_mapper
               The chained assembly mapper to register regions on
  Arg [2]    : string $from ('first' or 'last')
               The direction we are registering from, and the name of the
               internal mapper.
  Arg [3]    : string $seq_region_name
               The name of the seqregion we are registering on
  Arg [4]    : listref $ranges
               A list  of ranges to register (in [$start,$end] tuples).
  Arg [5]    : (optional) $to_slice
               Only register those on this Slice.
  Description: Registers a set of ranges on a chained assembly mapper.
               This function is at the heart of the chained mapping process.
               It retrieves information from the assembly table and
               dynamically constructs the mappings between two coordinate
               systems which are 2 mapping steps apart. It does this by using
               two internal mappers to load up a third mapper which is
               actually used by the ChainedAssemblyMapper to perform the
               mapping.

               This method must be called before any mapping is
               attempted on regions of interest, otherwise only gaps will
               be returned.  Note that the ChainedAssemblyMapper automatically
               calls this method when the need arises.
  Returntype : none
  Exceptions : throw if the seq_region to be registered does not exist
               or if it associated with multiple assembled pieces (bad data
               in assembly table)

               throw if the mapping between the coordinate systems cannot
               be performed in two steps, which means there is an internal
               error in the data in the meta table or in the code that creates
               the mapping paths.
  Caller     : Bio::EnsEMBL::AssemblyMapper
  Status     : Stable

_register_chained_special

  Arg [1]    : Bio::EnsEMBL::ChainedAssemblyMapper $casm_mapper
               The chained assembly mapper to register regions on
  Arg [2]    : string $from ('first' or 'last')
               The direction we are registering from, and the name of the
               internal mapper.
  Arg [3]    : string $seq_region_name
               The name of the seqregion we are registering on
  Arg [4]    : listref $ranges
               A list  of ranges to register (in [$start,$end] tuples).
  Arg [5]    : (optional) $to_slice
               Only register those on this Slice.
  Description: Registers a set of ranges on a chained assembly mapper.
               This function is at the heart of the chained mapping process.
               It retrieves information from the assembly table and
               dynamically constructs the mappings between two coordinate
               systems which are 2 mapping steps apart. It does this by using
               two internal mappers to load up a third mapper which is
               actually used by the ChainedAssemblyMapper to perform the
               mapping.

               This method must be called before any mapping is
               attempted on regions of interest, otherwise only gaps will
               be returned.  Note that the ChainedAssemblyMapper automatically
               calls this method when the need arises.
  Returntype : none
  Exceptions : throw if the seq_region to be registered does not exist
               or if it associated with multiple assembled pieces (bad data
               in assembly table)

               throw if the mapping between the coordinate systems cannot
               be performed in two steps, which means there is an internal
               error in the data in the meta table or in the code that creates
               the mapping paths.
  Caller     : Bio::EnsEMBL::AssemblyMapper
  Status     : Stable

register_all

  Arg [1]    : Bio::EnsEMBL::AssemblyMapper $mapper
  Example    : $mapper = $asm_mapper_adaptor->fetch_by_CoordSystems($cs1,$cs2);

               # make cache large enough to hold all of the mappings
               $mapper->max_pair_count(10e6);
               $asm_mapper_adaptor->register_all($mapper);

               # perform mappings as normal
               $mapper->map($slice->seq_region_name(), $sr_start, $sr_end,
                            $sr_strand, $cs1);
               ...
  Description: This function registers the entire set of mappings between
               two coordinate systems in an assembly mapper.
               This will use a lot of memory but will be much more efficient
               when doing a lot of mapping which is spread over the entire
               genome.
  Returntype : none
  Exceptions : none
  Caller     : specialised prograhsm
  Status     : Stable

register_all_chained

  Arg [1]    : Bio::EnsEMBL::ChainedAssemblyMapper $casm_mapper
  Example    : $mapper = $asm_mapper_adaptor->fetch_by_CoordSystems($cs1,$cs2);

               # make the cache large enough to hold all of the mappings
               $mapper->max_pair_count(10e6);
               # load all of the mapping data
               $asm_mapper_adaptor->register_all_chained($mapper);

               # perform mappings as normal
               $mapper->map($slice->seq_region_name(), $sr_start, $sr_end,
                            $sr_strand, $cs1);
               ...
  Description: This function registers the entire set of mappings between
               two coordinate systems in a chained mapper.  This will use a lot
               of memory but will be much more efficient when doing a lot of
               mapping which is spread over the entire genome.
  Returntype : none
  Exceptions : throw if mapper is between coord systems with unexpected
               mapping paths
  Caller     : specialised programs doing a lot of genome-wide mapping
  Status     : Stable

seq_regions_to_ids

  Arg [1]    : Bio::EnsEMBL::CoordSystem $coord_system
  Arg [2]    : listref of strings $seq_regions
  Example    : my @ids = @{$asma->seq_regions_to_ids($coord_sys, \@seq_regs)};
  Description: Converts a list of seq_region names to internal identifiers
               using the internal cache that has accumulated while registering
               regions for AssemblyMappers. If any requested regions are
               not  found in the cache an attempt is made to retrieve them
               from the database.
  Returntype : listref of ints
  Exceptions : throw if a non-existant seqregion is provided
  Caller     : general
  Status     : Stable

seq_ids_to_regions

  Arg [1]    : listref of   seq_region ids
  Example    : my @ids = @{$asma->ids_to_seq_regions(\@seq_ids)};
  Description: Converts a list of seq_region ids to seq region names
               using the internal cache that has accumulated while registering
               regions for AssemblyMappers. If any requested regions are
               not  found in the cache an attempt is made to retrieve them
               from the database.
  Returntype : listref of strings
  Exceptions : throw if a non-existant seq_region_id is provided
  Caller     : general
  Status     : Stable

delete_cache

 Description: Delete all the caches for the mappings/seq_regions
 Returntype : none
 Exceptions : none
 Caller     : General
 Status     : At risk