NAME
ReadAnnotation - A tool for easy annotation of reads using the Ensembl API.
SYNOPSIS
annoteTagsOnEnsembl.pl -g genome -i mapping_file -k gene_point -o output -l factor_length -m max_query_simultaneous [--config file.cfg] [--registry PATH_TO_EnsemblRegistry.pm] [--verbose] [--version]
-g genome : genome which is associated with tags.
'Homo Sapiens' for Human (default),
'Pan troglodytes' for Chimpanzee,
'Mus musculus' for Mouse,
'Macaca mulatta' for Macaque,
'Pongo pygmaeus' for Orangutan,
etc. (see http://www.ensembl.org/info/about/species.html)
-i mapping_file : a file with mapping information for each tag,
one tag per line and one location per tag (format: tag|chr,strand position).
-l factor_length : the tag length.
-k gene_point : the maximum distance within which to look for the first gene in the 5' and 3' neighborhoods of the tag (DISTANCE_MAX by default).
-o output : output file.
-m max_query_simultaneous : number of simultaneous queries to run on Ensembl GB database
--config file.cfg : use file.cfg instead of default config file.
--registry PATH_TO_EnsemblRegistry.pm
--verbose (optional) : print more details about the procedure to the log file.
--version : Ensembl API version
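As an illustration of the mapping_file format given for -i (tag|chr,strand position), here is a minimal sketch of a line parser. It is not taken from annoteTagsOnEnsembl.pl: the helper name parse_mapping_line is hypothetical, and the sketch assumes that the strand is written as 1 or -1 and that a single run of whitespace separates the strand from the position.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of ReadAnnotation): split one mapping line
# of the form "tag|chr,strand position" into its four fields.
# Assumptions: the strand is written as 1 or -1, and the position is a
# non-negative integer.
sub parse_mapping_line {
    my ($line) = @_;
    chomp $line;

    # tag | chromosome , strand <whitespace> position
    if ($line =~ /^([^|]+)\|([^,]+),(-?1)\s+(\d+)$/) {
        return { tag => $1, chr => $2, strand => $3 + 0, pos => $4 + 0 };
    }
    return;    # malformed line: nothing is returned
}

# Made-up example line, for illustration only.
my $rec = parse_mapping_line("ACGTACGTACGTACGTACGT|12,-1 1234567\n");
printf "%s maps to chromosome %s, strand %d, position %d\n",
    @{$rec}{qw(tag chr strand pos)};
```

Returning nothing for a malformed line lets the caller skip or log it instead of passing incomplete coordinates on to the annotation step.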
OPTIONS
output :
output : a tab-delimited CSV file containing the annotation features of each tag.
Each line contains two levels of information:
1) When a tag overlaps a protein_coding gene or a pseudogene, an annotation relative to the transcript is saved for the tag:
3PRIM_UTR_sense : it is in a 3PRIM_UTR of a transcript.
3PRIM_UTR_antisense : it is antisense of a 3PRIM_UTR of a transcript.
CDS_sense : it is in a CDS of a transcript.
CDS_antisense : it is antisense of a CDS of a transcript.
5PRIM_UTR_sense : it is in a 5PRIM_UTR of a transcript.
5PRIM_UTR_antisense : it is antisense of a 5PRIM_UTR of a transcript.
INXON_sense : it overlaps both an exon and an intron.
INXON_antisense : it is antisense of an exon-intron overlap.
INTRON_sense : it is in an INTRON of a transcript.
INTRON_antisense : it is antisense of an INTRON of a transcript.
INTER_PROXIMAL : it is in an intergenic region but near a 5' gene
(closer than the gene_point value), and it can be considered a 3'UTR variant.
INTER_DISTAL : it is in an intergenic region and it can be considered as a new transcript.
INTER_DISTAL_EST : it is intergenic distal, but some ESTs overlap the tag.
2) When a tag overlaps a non_coding gene, an annotation relative to the gene type is saved for the tag:
small_ncRNA : it is included inside a small non_coding RNA (miRNA, snRNA, snoRNA, rRNA, Mt_rRNA, Mt_tRNA
, tRNA, scRNA, ncRNA, 3prime_overlapping_ncrna)
lincRNA : it is included inside a long intergenic non_coding RNA
other_lncRNA : it is included inside another long non_coding RNA (antisense, sense_intronic, processed_transcript)
other_noncodingRNA : it is included inside another non_coding RNA (misc_RNA, ncRNA_host, sense_overlapping
, retained_intron, processed_pseudogene
, unprocessed_pseudogene
, transcribed_processed_pseudogene
, transcribed_unprocessed_pseudogene
, retrotransposed, unitary_pseudogene)
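The non_coding categories listed above amount to a lookup from gene type to category. The following is a minimal sketch of such a lookup, built only from the names in that list. The helper name non_coding_category is hypothetical, and whether ReadAnnotation implements the grouping this way is an assumption.

```perl
use strict;
use warnings;

# Category => gene types, copied from the list in this section.
my %members = (
    small_ncRNA => [qw(miRNA snRNA snoRNA rRNA Mt_rRNA Mt_tRNA tRNA scRNA
                       ncRNA 3prime_overlapping_ncrna)],
    lincRNA      => [qw(lincRNA)],
    other_lncRNA => [qw(antisense sense_intronic processed_transcript)],
    other_noncodingRNA => [qw(misc_RNA ncRNA_host sense_overlapping
                              retained_intron processed_pseudogene
                              unprocessed_pseudogene
                              transcribed_processed_pseudogene
                              transcribed_unprocessed_pseudogene
                              retrotransposed unitary_pseudogene)],
);

# Invert the table once so that each lookup is a single hash access.
my %category_of;
for my $category (keys %members) {
    $category_of{$_} = $category for @{ $members{$category} };
}

# Hypothetical helper: returns the category name, or undef when the
# gene type is not in the list (for example protein_coding).
sub non_coding_category {
    my ($biotype) = @_;
    return $category_of{$biotype};
}

for my $biotype (qw(miRNA antisense retained_intron)) {
    printf "%s => %s\n", $biotype, non_coding_category($biotype);
}
```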
DESCRIPTION
An easy tool to annotate reads using the Ensembl API.
PUBLIC METHODS
new
Arg [TAG_LENGTH]: Integer - Tag_length to annotate (same as k in CRAC).
Arg [SPECIES]: (Optional) String - Genome to use for querying Ensembl
Default : 'Human'
Arg [DISTANCE_MAX]: (Optional) String - Maximum distance constant.
Arg [INTERGENIC_THRESHOLD]: (Optional) String - Gene point constant.
Example : $annotator = ReadAnnotation->new();
Description : Create a new ReadAnnotation object
ReturnType : ReadAnnotation
Exceptions : none
getAnnotation
Arg [1] : String - Chromosome
Arg [2] : Integer - Strand
Arg [3] : Integer - position
Arg [4] : (Optional) String ['before' | 'after'] - Sense (are we looking for a tag before this position or after it)
Default : 'after'
Arg [5] : (Optional) String - Tag (sequence of the tag, used for unit tests)
Example : my $annotation = $annotator->getAnnotation($chr, $strand, $position);
Description : Create an annotation hash for the given position.
ReturnType : Annotation hash of a tag :
%annotation = ( tag => 'value',
priority => 'value',
annot => 'value',
hugo => 'value',
id => 'value',
desc => 'value',
hugo_non_coding => 'value',
id_non_coding => 'value',
desc_non_coding => 'value',
hugo_3prim => 'value',
id_3prim => 'value',
desc_3prim => 'value',
hugo_5prim => 'value',
id_5prim => 'value',
desc_5prim => 'value',
);
Exceptions : none
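To show how an annotation hash of this shape could be turned into one tab-delimited output line, here is a minimal sketch. The helper annotation_to_line, the column order, the 'NA' placeholder for missing values and the sample values are all assumptions made for illustration. Only a subset of the keys listed above is used.

```perl
use strict;
use warnings;

# Hypothetical helper: join the requested keys of an annotation hash
# into one tab-delimited line. Missing values are written as 'NA'
# (an assumption, not a documented convention of the tool).
sub annotation_to_line {
    my ($annotation, @columns) = @_;
    return join "\t",
        map { defined $annotation->{$_} ? $annotation->{$_} : 'NA' } @columns;
}

# Made-up values, for illustration only.
my %annotation = (
    tag      => 'ACGTACGTACGTACGTACGT',
    priority => 1,
    annot    => 'CDS_sense',
    hugo     => 'GENE_NAME',
    id       => 'GENE_ID',
);

my @columns = qw(tag priority annot hugo id desc);
print annotation_to_line(\%annotation, @columns), "\n";
```

Passing the column list explicitly keeps the output order stable, since Perl hashes do not preserve insertion order.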
PRIVATE METHODS
Please, do not try to use these methods outside this package.
annoteTagOnGenome
Arg [1] : Bio::EnsEMBL::Slice - Slice object used for annotation
Arg [2] : String - Chromosome
Arg [3] : String - Strand
Arg [4] : String - Position
Example : my $annotation = ReadAnnotation->annoteTagOnGenome();
Description : Create an annotation hash for the given tag.
ReturnType : Annotation hash of a tag :
%annotation = ( tag => 'value',
priority => 'value',
annot => 'value',
hugo => 'value',
id => 'value',
desc => 'value',
hugo_non_coding => 'value',
id_non_coding => 'value',
desc_non_coding => 'value',
hugo_3prim => 'value',
id_3prim => 'value',
desc_3prim => 'value',
distance_of_3prim_gene => 'value',
hugo_5prim => 'value',
id_5prim => 'value',
desc_5prim => 'value',
distance_of_5prim_gene => 'value',
);
Exceptions : none