NAME

ReadAnnotation - A tool for easy annotation of reads using the Ensembl API.

SYNOPSIS

annoteTagsOnEnsembl.pl -g genome -i mapping_file -k gene_point -o output -l factor_length -m max_query_simultaneous [--config file.cfg] [--registry PATH_TO_EnsemblRegistry.pm] [--verbose] [--version]
-g genome : genome which is associated with tags.
         'Homo Sapiens' for Human (default),
         'Pan troglodytes' for Chimpanzee,
         'Mus musculus' for Mouse,
         'Macaca mulatta' for Macaque,
         'Pongo pygmaeus' for Orangutan,
          etc. (see http://www.ensembl.org/info/about/species.html)
-i mapping_file : a file with mapping information for each tag,
                  one tag per line and one location per tag (format tag|chr,strand  position).
-l factor_length : the tag length. 
-k gene_point : the maximum distance within which to search for the first gene in the 5' and 3' neighborhoods of the tag (DISTANCE_MAX by default).
-o output : output file.
-m max_query_simultaneous : number of simultaneous queries to run on Ensembl GB database

--config file.cfg : use file.cfg instead of default config file.
--registry PATH_TO_EnsemblRegistry.pm
--verbose (optional) : print more details of the procedure to the log file.
--version : Ensembl API version
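
The mapping_file format given for -i could be read with a small parser like the one below. This is a minimal sketch, not the script's own code: it assumes that the strand and the position are separated by whitespace, and the helper name parse_mapping_line and the sample line are hypothetical.

```perl
use strict;
use warnings;

# Hypothetical helper: split one mapping line of the assumed form
#   tag|chr,strand<whitespace>position
# Returns a hash reference, or undef when the line does not match.
sub parse_mapping_line {
    my ($line) = @_;
    chomp $line;
    my ($tag, $chr, $strand, $pos) =
        $line =~ /^([^|]+)\|([^,]+),(\S+)\s+(\d+)\s*$/
        or return;
    return { tag => $tag, chr => $chr, strand => $strand, position => $pos };
}

# Illustrative line only; real files come from the mapping step.
my $rec = parse_mapping_line("ACGTACGTACGT|12,-1  1045678\n");
printf "%s on chr %s, strand %s, position %d\n",
    @{$rec}{qw(tag chr strand position)} if $rec;
```

Returning undef for a malformed line lets the caller decide whether to skip it or stop.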

OPTIONS

output :
          output                  : a CSV file in tabulated format containing the annotation features for each tag.

       Each line contains two levels of information:

          1) When a tag overlaps a protein_coding gene or a pseudogene, a relative annotation is saved for the tag:
             3PRIM_UTR_sense      : it is in a 3PRIM_UTR of a transcript.
             3PRIM_UTR_antisense  : it is antisense to a 3PRIM_UTR of a transcript.
             CDS_sense            : it is in a CDS of a transcript.
             CDS_antisense        : it is antisense to a CDS of a transcript.
             5PRIM_UTR_sense      : it is in a 5PRIM_UTR of a transcript.
             5PRIM_UTR_antisense  : it is antisense to a 5PRIM_UTR of a transcript.
             INXON_sense          : it overlaps both an exon and an intron.
             INXON_antisense      : it is antisense to an overlap between an exon and an intron.
             INTRON_sense         : it is in an INTRON of a transcript.
             INTRON_antisense     : it is antisense to an INTRON of a transcript.
             INTER_PROXIMAL       : it is in an intergenic region but near a 5' gene
                                    (closer than the gene_point value), so it can be considered a 3'UTR variant.
             INTER_DISTAL         : it is in an intergenic region and can be considered a new transcript.
             INTER_DISTAL_EST     : it is intergenic distal, but at least one EST overlaps the tag.
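
The three intergenic labels above can be sketched as a simple decision on the distance to the nearest 5' gene. This is an illustration under stated assumptions, not the module's implementation: the function name intergenic_label is hypothetical, the comparison with gene_point is assumed to be strict, and proximity is assumed to take precedence over EST overlap.

```perl
use strict;
use warnings;

# Hypothetical sketch of the intergenic labels.
#   $dist_5prim : distance from the tag to the nearest 5' gene (undef if none found)
#   $gene_point : threshold given with -k
#   $has_est    : true when at least one EST overlaps the tag
sub intergenic_label {
    my ($dist_5prim, $gene_point, $has_est) = @_;

    # Assumption: "near" means strictly closer than gene_point.
    return 'INTER_PROXIMAL'
        if defined $dist_5prim && $dist_5prim < $gene_point;

    return $has_est ? 'INTER_DISTAL_EST' : 'INTER_DISTAL';
}

# Illustrative values only.
print intergenic_label(1_200,  5_000, 0), "\n";
print intergenic_label(80_000, 5_000, 1), "\n";
```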

          2) When a tag overlaps a non_coding gene, a relative annotation is saved for the tag:
             small_ncRNA          : it is included in a small non_coding RNA (miRNA, snRNA, snoRNA, rRNA, Mt_rRNA, Mt_tRNA,
                                    tRNA, scRNA, ncRNA, 3prime_overlapping_ncrna).
             lincRNA              : it is included in a long intergenic non_coding RNA.
             other_lncRNA         : it is included in another long non_coding RNA (antisense, sense_intronic, processed_transcript).
             other_noncodingRNA   : it is included in another non_coding RNA (misc_RNA, ncRNA_host, sense_overlapping,
                                    retained_intron, processed_pseudogene, unprocessed_pseudogene,
                                    transcribed_processed_pseudogene, transcribed_unprocessed_pseudogene,
                                    retrotransposed, unitary_pseudogene).
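
The grouping of biotypes into the four non-coding categories can be expressed as a lookup table. The sketch below only restates the lists given above; the names %category_for and category_of are hypothetical, and using 'lincRNA' as the biotype key for the lincRNA category is an assumption.

```perl
use strict;
use warnings;

# Lookup table built from the biotype lists documented above.
my %category_for = (
    (map { ($_ => 'small_ncRNA') }
        qw(miRNA snRNA snoRNA rRNA Mt_rRNA Mt_tRNA tRNA scRNA ncRNA
           3prime_overlapping_ncrna)),
    lincRNA => 'lincRNA',    # assumption: the biotype is itself named 'lincRNA'
    (map { ($_ => 'other_lncRNA') }
        qw(antisense sense_intronic processed_transcript)),
    (map { ($_ => 'other_noncodingRNA') }
        qw(misc_RNA ncRNA_host sense_overlapping retained_intron
           processed_pseudogene unprocessed_pseudogene
           transcribed_processed_pseudogene transcribed_unprocessed_pseudogene
           retrotransposed unitary_pseudogene)),
);

# Hypothetical helper: returns the category, or undef for an unlisted biotype.
sub category_of {
    my ($biotype) = @_;
    return $category_for{$biotype};
}

print category_of('snoRNA'), "\n";
print category_of('retained_intron'), "\n";
```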

DESCRIPTION

An easy tool to annotate reads using the Ensembl API.

PUBLIC METHODS

new

Arg [TAG_LENGTH]:   Integer - Length of the tags to annotate (same as k in CRAC).
Arg [SPECIES]:      (Optional) String - Genome to use for querying Ensembl
                    Default : 'Human'
Arg [DISTANCE_MAX]: (Optional) String - Maximum distance constant.
Arg [INTERGENIC_THRESHOLD]:   (Optional) String - Gene point constant.

Example     : $annotator = ReadAnnotation->new();
Description : Create a new ReadAnnotation object
ReturnType  : ReadAnnotation
Exceptions  : none

getAnnotation

Arg [1]     : String - Chromosome
Arg [2]     : Integer - Strand
Arg [3]     : Integer - position
Arg [4]     : (Optional) String ['before' | 'after'] - Sense (whether to look for a tag before or after this position)
              Default : 'after'
Arg [5]     : (Optional) String - Tag (sequence of the tag, used for unit tests)

Example     : my $annotation = $annotator->getAnnotation($chr, $strand, $position);
Description : Create an annotation hash for the given position.
ReturnType  : Annotation hash of a tag : 
              %annotation = ( tag       => 'value',
                              priority  => 'value',
                              annot     => 'value',
                              hugo      => 'value',
                              id        => 'value',
                              desc      => 'value',
                              hugo_non_coding => 'value',
                              id_non_coding   => 'value',
                              desc_non_coding => 'value',
                              hugo_3prim => 'value',
                              id_3prim   => 'value',
                              desc_3prim => 'value',
                              hugo_5prim => 'value',
                              id_5prim   => 'value',
                              desc_5prim => 'value',
                            );
Exceptions  : none
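
As an illustration of how such an annotation hash might be turned into one tabulated output line, here is a minimal sketch. It does not call the module: the hash is filled with placeholder values, and the helper name annotation_to_line, the column order, and the 'NA' filler for missing keys are all assumptions.

```perl
use strict;
use warnings;

# Placeholder values only; a real hash would come from getAnnotation().
my %annotation = (
    tag      => 'ACGTACGTACGT',
    priority => '1',
    annot    => 'CDS_sense',
    hugo     => 'GENE1',
    id       => 'ID1',
    desc     => 'example description',
);

# Hypothetical column order; the order used by the tool is not stated here.
my @columns = qw(tag priority annot hugo id desc);

# Join the selected fields with tabs, writing 'NA' for missing values.
sub annotation_to_line {
    my ($annot, @cols) = @_;
    return join "\t", map { defined $annot->{$_} ? $annot->{$_} : 'NA' } @cols;
}

print annotation_to_line(\%annotation, @columns), "\n";
```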

PRIVATE METHODS

Please do not use these methods outside this package.

annoteTagOnGenome

Arg [1]     : Bio::EnsEMBL::Slice - Slice object used for annotation
Arg [2]     : String - Chromosome
Arg [3]     : String - Strand
Arg [4]     : String - Position

Example     : my $annotation = ReadAnnotation->annoteTagOnGenome();
Description : Create an annotation hash for the given tag.
ReturnType  : Annotation hash of a tag : 
              %annotation = ( tag       => 'value',
                              priority  => 'value',
                              annot     => 'value',
                              hugo      => 'value',
                              id        => 'value',
                              desc      => 'value',
                              hugo_non_coding => 'value',
                              id_non_coding   => 'value',
                              desc_non_coding => 'value',
                              hugo_3prim => 'value',
                              id_3prim   => 'value',
                              desc_3prim => 'value',
                              distance_of_3prim_gene => 'value',
                              hugo_5prim => 'value',
                              id_5prim   => 'value',
                              desc_5prim => 'value',
                              distance_of_5prim_gene => 'value',
                            );
Exceptions  : none