NAME

ReadAnnotation - A tool for easy annotation of reads using the Ensembl API.

SYNOPSIS

    annoteTagsOnEnsembl.pl -g genome -i mapping_file -k gene_point -o output -l factor_length -m max_query_simultaneous [--config file.cfg] [--registry PATH_TO_EnsemblRegistry.pm] [--verbose] [--version]
    -g genome : genome which is associated with tags.
             'Homo Sapiens' for Human (default),
             'Pan troglodytes' for Chimpanzee,
             'Mus musculus' for Mouse,
             'Macaca mulatta' for Macaque,
             'Pongo pygmaeus' for Orangutan,
              etc (cf http://www.ensembl.org/info/about/species.html)
    -i mapping_file : a file with mapping information for each tag,
                      one tag per line and one location per tag (format: tag|chr,strand  position).
    -l factor_length : the tag length. 
    -k gene_point : the maximum distance at which to search for the first gene in the 5' and 3' neighborhoods of the tag (DISTANCE_MAX by default).
    -o output : output file.
    -m max_query_simultaneous : number of simultaneous queries to run on Ensembl GB database

    --config file.cfg : use file.cfg instead of default config file.
    --registry PATH_TO_EnsemblRegistry.pm
    --verbose (optional) : print more details of the procedure to the log file.
    --version : Ensembl API version
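
The mapping_file format described above (tag|chr,strand  position) can be read one line at a time. The following is a minimal sketch, not the parser used by annoteTagsOnEnsembl.pl: the helper name parse_mapping_line is hypothetical, and it assumes the strand is written as 1 or -1 and that strand and position are separated by whitespace.

```perl
use strict;
use warnings;

# Hypothetical helper: split one mapping line of the form
# "tag|chr,strand  position" into its four fields.
# Assumption: the strand is written as 1 or -1; adjust the pattern
# if your mapping file uses another convention.
sub parse_mapping_line {
    my ($line) = @_;
    chomp $line;
    my ($tag, $chr, $strand, $position) =
        $line =~ /^([^|]+)\|([^,]+),(-?1)\s+(\d+)\s*$/
        or return;    # malformed line: return undef

    return {
        tag      => $tag,
        chr      => $chr,
        strand   => $strand,
        position => $position,
    };
}

# Made-up example line, for illustration only.
my $location = parse_mapping_line("ACGTACGTACGT|12,-1  1234567\n");
if ($location) {
    printf "%s maps to chromosome %s, strand %s, position %d\n",
        @{$location}{qw(tag chr strand position)};
}
```

Returning undef for malformed lines lets the caller decide whether to skip or report them.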

OPTIONS

    output :
              output                  : a CSV file in tab-separated format containing the annotation features for each tag.

           Each line contains two levels of information:
             
              1) If a tag overlaps a protein_coding gene or a pseudogene, a relative annotation is saved for the tag:
                 3PRIM_UTR_sense      : it is in a 3PRIM_UTR of a transcript.  
                 3PRIM_UTR_antisense  : it is antisense of a 3PRIM_UTR of a transcript.  
                 CDS_sense            : it is in a CDS of a transcript.  
                 CDS_antisense        : it is antisense of a CDS of a transcript.  
                 5PRIM_UTR_sense      : it is in a 5PRIM_UTR of a transcript.  
                 5PRIM_UTR_antisense  : it is antisense of a 5PRIM_UTR of a transcript.  
                 INXON_sense          : it overlaps both an exon and an intron.
                 INXON_antisense      : it is antisense of an overlap between an exon and an intron.
                 INTRON_sense         : it is in an INTRON of a transcript.    
                 INTRON_antisense     : it is antisense of an INTRON of a transcript.  
                 INTER_PROXIMAL       : it is in an intergenic region but near a 5' gene
                                        (closer than the gene_point value) and it can be considered as a 3'UTR variant.
                 INTER_DISTAL         : it is in an intergenic region and it can be considered as a new transcript.
                 INTER_DISTAL_EST     : it is intergenic distal but at least one EST overlaps the tag.
    
              2) If a tag overlaps a non_coding gene, a relative annotation is saved for the tag:
                 small_ncRNA          : it is included inside a small non_coding RNA (miRNA, snRNA, snoRNA, rRNA, Mt_rRNA, Mt_tRNA,
                                        tRNA, scRNA, ncRNA, 3prime_overlapping_ncrna)
                 lincRNA              : it is included inside a long intergenic non_coding RNA
                 other_lncRNA         : it is included inside another long non_coding RNA (antisense, sense_intronic, processed_transcript)
                 other_noncodingRNA   : it is included inside another non_coding RNA (misc_RNA, ncRNA_host, sense_overlapping,
                                        retained_intron, processed_pseudogene, unprocessed_pseudogene,
                                        transcribed_processed_pseudogene, transcribed_unprocessed_pseudogene,
                                        retrotransposed, unitary_pseudogene)
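
The category names above follow two simple conventions: coding features carry a _sense or _antisense suffix, and non-coding genes are grouped by biotype. The sketch below illustrates both with lookup tables built from the lists in this section. It is not the module's internal code: the helper names strand_label and non_coding_category are hypothetical, and using 'lincRNA' as the biotype string for long intergenic non_coding RNAs is an assumption.

```perl
use strict;
use warnings;

# Biotype lists copied from the category descriptions above.
# Assumption: the lincRNA category corresponds to the biotype 'lincRNA'.
my %BIOTYPES_BY_CATEGORY = (
    small_ncRNA => [qw(miRNA snRNA snoRNA rRNA Mt_rRNA Mt_tRNA tRNA
                       scRNA ncRNA 3prime_overlapping_ncrna)],
    lincRNA      => [qw(lincRNA)],
    other_lncRNA => [qw(antisense sense_intronic processed_transcript)],
    other_noncodingRNA => [qw(misc_RNA ncRNA_host sense_overlapping
                              retained_intron processed_pseudogene
                              unprocessed_pseudogene
                              transcribed_processed_pseudogene
                              transcribed_unprocessed_pseudogene
                              retrotransposed unitary_pseudogene)],
);

# Invert the table into biotype => category for direct lookup.
my %CATEGORY_OF;
for my $category (keys %BIOTYPES_BY_CATEGORY) {
    $CATEGORY_OF{$_} = $category for @{ $BIOTYPES_BY_CATEGORY{$category} };
}

# Hypothetical helper: category of a non-coding biotype, undef if unlisted.
sub non_coding_category {
    my ($biotype) = @_;
    return $CATEGORY_OF{$biotype};
}

# Hypothetical helper: append _sense when the tag and the gene are on the
# same strand, _antisense otherwise (strands given as 1 or -1).
sub strand_label {
    my ($feature, $tag_strand, $gene_strand) = @_;
    return $feature . ($tag_strand == $gene_strand ? '_sense' : '_antisense');
}

print non_coding_category('snoRNA'), "\n";    # small_ncRNA
print strand_label('CDS', 1, -1), "\n";       # CDS_antisense
```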

DESCRIPTION

An easy tool to annotate reads using the Ensembl API.

PUBLIC METHODS

new

  Arg [TAG_LENGTH]:   Integer - Length of the tags to annotate (same as k in CRAC).
  Arg [SPECIES]:      (Optional) String - Genome to use for querying Ensembl
                      Default : 'Human'
  Arg [DISTANCE_MAX]: (Optional) String - Maximum distance constant.
  Arg [INTERGENIC_THRESHOLD]:   (Optional) String - Gene point constant.

  Example     : $annotator = ReadAnnotation->new();
  Description : Create a new ReadAnnotation object
  ReturnType  : ReadAnnotation
  Exceptions  : none

getAnnotation

  Arg [1]     : String - Chromosome
  Arg [2]     : Integer - Strand
  Arg [3]     : Integer - position
  Arg [4]     : (Optional) String ['before' | 'after'] - Sense (are we looking for a tag before this position or after)
                Default : 'after'
  Arg [5]     : (Optional) String - Tag (sequence of the tag, used for unit tests)

  Example     : my $annotation = $annotator->getAnnotation($chr, $strand, $position);
  Description : Create an annotation hash for the given position.
  ReturnType  : Annotation hash of a tag : 
                %annotation = ( tag       => 'value',
                                priority  => 'value',
                                annot     => 'value',
                                hugo      => 'value',
                                id        => 'value',
                                desc      => 'value',
                                hugo_non_noding => 'value',
                                id_non_coding   => 'value',
                                desc_non_coding => 'value',
                                hugo_3prim => 'value',
                                id_3prim   => 'value',
                                desc_3prim => 'value',
                                hugo_5prim => 'value',
                                id_5prim   => 'value',
                                desc_5prim => 'value',
                              );
  Exceptions  : none
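
Since the output file holds one tab-separated line per tag, an annotation hash with the keys listed above can be flattened into such a line. This is only a sketch under assumptions: the helper name annotation_to_line, the column order, and the 'NA' placeholder for missing fields are all hypothetical, and the example values are made up. The key names are spelled exactly as in the ReturnType above.

```perl
use strict;
use warnings;

# Keys as listed in the ReturnType above (spelling kept as documented).
# Assumption: the column order shown here is illustrative only.
my @COLUMNS = qw(
    tag priority annot hugo id desc
    hugo_non_noding id_non_coding desc_non_coding
    hugo_3prim id_3prim desc_3prim
    hugo_5prim id_5prim desc_5prim
);

# Hypothetical helper: join the annotation fields with tabs,
# writing 'NA' for any field that is missing or undefined.
sub annotation_to_line {
    my (%annotation) = @_;
    return join "\t",
        map { defined $annotation{$_} ? $annotation{$_} : 'NA' } @COLUMNS;
}

# Made-up annotation, for illustration only.
my %example = (
    tag      => 'ACGTACGTACGT',
    priority => 1,
    annot    => 'CDS_sense',
    hugo     => 'TP53',
);

print annotation_to_line(%example), "\n";
```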

PRIVATE METHODS

Please do not use these methods outside this package.

annoteTagOnGenome

  Arg [1]     : Bio::EnsEMBL::Slice - Slice object used for annotation
  Arg [2]     : String - Chromosome
  Arg [3]     : String - Strand
  Arg [4]     : String - Position

  Example     : my $annotation = ReadAnnotation->annoteTagOnGenome();
  Description : Create an annotation hash for the given tag.
  ReturnType  : Annotation hash of a tag : 
                %annotation = ( tag       => 'value',
                                priority  => 'value',
                                annot     => 'value',
                                hugo      => 'value',
                                id        => 'value',
                                desc      => 'value',
                                hugo_non_noding => 'value',
                                id_non_coding   => 'value',
                                desc_non_coding => 'value',
                                hugo_3prim => 'value',
                                id_3prim   => 'value',
                                desc_3prim => 'value',
                                distance_of_3prim_gene => 'value',
                                hugo_5prim => 'value',
                                id_5prim   => 'value',
                                desc_5prim => 'value',
                                distance_of_5prim_gene => 'value',
                              );
  Exceptions  : none