NAME

GO::AnnotationProvider::AnnotationParser

SYNOPSIS

GO::AnnotationProvider::AnnotationParser - reads a Gene Ontology gene associations file, and provides methods by which to retrieve the GO annotations for an annotated entity.

    my $annotationParser = GO::AnnotationProvider::AnnotationParser->new(annotationFile => "data/gene_association.sgd");

    my $geneName = "AAT2";

    # goIdsByName returns a reference to an array, so dereference it before joining
    print "GO associations for gene: ",
        join(" ", @{ $annotationParser->goIdsByName(name   => $geneName,
                                                    aspect => 'P') }), "\n";

    print "Database ID for gene: ", $annotationParser->databaseIdByName($geneName), "\n";

    print "Database name: ", $annotationParser->databaseName(), "\n";

    print "Standard name for gene: ", $annotationParser->standardNameByName($geneName), "\n";

    my @geneNames = $annotationParser->allStandardNames();

    # print the first few standard names (at most 11)
    foreach my $i (0..10) {

        last if $i > $#geneNames;

        print "$geneNames[$i]\n";

    }

DESCRIPTION

GO::AnnotationProvider::AnnotationParser is a concrete subclass of GO::AnnotationProvider, and creates a data structure mapping gene names to GO annotations by parsing a file of annotations provided by the Gene Ontology Consortium.

This package provides object methods for retrieving GO annotations that have been parsed from a 'gene associations' file provided by the Gene Ontology Consortium. The format for the file is:

Lines beginning with a '!' character are comment lines.

Column  Cardinality   Contents
------  -----------   -------------------------------------------------------------
    0       1         Database abbreviation for the source of annotation (e.g. SGD)
    1       1         Database identifier of the annotated entity
    2       1         Standard name of the annotated entity
    3       0,1       NOT (if a gene is specifically NOT annotated to the term)
    4       1         GOID of the annotation
    5       1,n       Reference(s) for the annotation
    6       1         Evidence code for the annotation
    7       0,n       With or From (a bit mysterious)
    8       1         Aspect of the annotation (C, F, P)
    9       0,1       Name of the product being annotated
   10       0,n       Alias(es) of the annotated product
   11       1         Type of annotated entity (one of gene, transcript, protein)
   12       1,2       Taxonomic id of the organism encoding and/or using the product
   13       1         Date of annotation (YYYYMMDD)
   14       1         Assigned_by: the database which made the annotation

Columns are separated by tabs. For those columns with a cardinality greater than 1, multiple entries are delimited by the pipe character, |.
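As a rough illustration of the layout described above, the following sketch splits one annotation line into its 15 columns and expands the pipe-delimited ones. The sub name parse_association_line, the field names, and the sample line are all assumptions made for this example; they are not part of the module, and this is not how the module itself parses the file.

```perl
use strict;
use warnings;

# Hypothetical field names for columns 0..14, following the table above.
my @FIELDS = qw(database databaseId standardName not goid references
                evidence with aspect productName aliases type taxon
                date assignedBy);

# Columns whose cardinality can exceed 1; their entries are pipe-delimited.
my %MULTI = map { $_ => 1 } qw(references with aliases taxon);

sub parse_association_line {
    my ($line) = @_;
    $line =~ s/\r?\n\z//;                      # strip a trailing newline
    return if $line =~ /^!/ || $line eq '';    # comment or blank line

    # A limit of -1 keeps trailing empty columns.
    my @columns = split /\t/, $line, -1;

    my %record;
    for my $i (0 .. $#FIELDS) {
        my $value = defined $columns[$i] ? $columns[$i] : '';
        $record{ $FIELDS[$i] } = $MULTI{ $FIELDS[$i] }
            ? [ split /\|/, $value ]
            : $value;
    }
    return \%record;
}

# Sample line made up for illustration only.
my $sample = join "\t", 'SGD', 'S0000001', 'GENE1', '', 'GO:0000001',
    'REF:1|REF:2', 'IDA', '', 'P', 'example product', 'ALIAS1|ALIAS2',
    'gene', 'taxon:4932', '20030101', 'SGD';

my $record = parse_association_line($sample);
print "$record->{standardName} ($record->{databaseId}) -> $record->{goid}\n";
print "aliases: @{ $record->{aliases} }\n";
```

Multi-valued columns come back as array references (possibly empty), so callers can treat them uniformly whether a column holds zero, one, or several entries.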

Further details can be found at:

http://www.geneontology.org/doc/GO.annotation.html#file

The following assumptions about the file are made (and should be true):

1.  All aliases appear for all entries of a given annotated product
2.  The database identifiers are unique, in that two different
    entities cannot have the same database id.

TODO

Also see the TODO list in the parent, GO::AnnotationProvider.

1.  Add in methods that will allow retrieval of evidence codes with
    the annotations for a particular entity.

2.  Add in methods that return all the annotated entities for a
    particular GOID.

3.  Add in the ability to request only annotations either including
    or excluding particular evidence codes.  Such evidence codes
    could be provided as an anonymous array as the value of a named
    argument.

4.  Same as number 3, except allow the retrieval of annotated
    entities for a particular GOID, based on inclusion or exclusion
    of certain evidence codes.

These first four items will require a reworking of how data are
stored on the backend, and thus the parsing code itself, though it
should not affect any of the already existing API.

5.  Instead of 'use'ing Storable, 'require' it instead, only at the
    point of use, which will mean that AnnotationParser can be
    happily used in the absence of Storable, just without those
    functions that need it.

6.  Extend the ValidateFile class method to check that no entity is
    annotated to the same node twice with the same evidence and the
    same reference.

7.  An additional checker, that uses an AnnotationProvider in
    conjunction with an OntologyProvider, would be useful.  It would
    check that the annotations themselves are valid, i.e. that no
    entity is annotated to the 'unknown' node in a particular aspect
    and also to another node within that same aspect.  Can
    annotations be redundant?  I.e., if an entity is annotated to a
    node and to an ancestor of that node, is that annotation
    redundant?  Does it depend on the evidence codes and references?
    Or are such annotations reinforcing?  These things are useful to
    consider when formulating the confidence which can be attributed
    to an annotation.
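Item 5 above could be approached roughly as follows. This is only a sketch of the general "require at point of use" technique; the sub names _require_storable, freeze_state and thaw_state are hypothetical and do not exist in the module.

```perl
use strict;
use warnings;

# Load Storable only when a function that needs it is first called, so that
# code which never serializes anything can run without Storable installed.
sub _require_storable {
    eval { require Storable; 1 }
        or die "Storable is needed for this operation: $@";
    return 1;
}

# Hypothetical function that needs Storable: serialize a data structure.
sub freeze_state {
    my ($state) = @_;
    _require_storable();
    return Storable::nfreeze($state);   # network order, as serializeToDisk uses
}

# Hypothetical counterpart: rebuild the data structure.
sub thaw_state {
    my ($frozen) = @_;
    _require_storable();
    return Storable::thaw($frozen);
}

my $copy = thaw_state(freeze_state({ name => 'example', ids => [1, 2, 3] }));
print "round trip kept ", scalar @{ $copy->{ids} }, " ids\n";
```

Because require is a no-op once a module has been loaded, calling the helper repeatedly costs very little.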

Class Methods

Usage

This class method simply prints out a usage statement, along with an error message, if one was passed in.

Usage:

GO::AnnotationProvider::AnnotationParser->Usage();

ValidateFile

This class method reads an annotation file, and returns a reference to an array of errors that are present within the file. The errors are simply strings, each beginning with "Line $lineNo : " where $lineNo is the number of the line in the file where the error was found.

Usage:

my $errorsRef = GO::AnnotationProvider::AnnotationParser->ValidateFile(annotationFile => $file);
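Since every error string begins with "Line $lineNo : ", a client can group the errors by line. The following is a minimal sketch of that idea; the sub name errors_by_line is hypothetical, and the sample messages are made up, as the actual wording of the errors is not described here.

```perl
use strict;
use warnings;

# Group error strings by the line number in their "Line $lineNo : " prefix.
# Accepts the kind of array reference that ValidateFile is documented to return.
# Strings without the expected prefix are skipped.
sub errors_by_line {
    my ($errorsRef) = @_;
    my %byLine;
    for my $error (@{$errorsRef}) {
        if ($error =~ /^Line (\d+) : (.*)/s) {
            push @{ $byLine{$1} }, $2;
        }
    }
    return \%byLine;
}

# Made-up messages; the real text produced by ValidateFile may differ.
my $errorsRef = [
    'Line 12 : example problem A',
    'Line 12 : example problem B',
    'Line 40 : example problem C',
];

my $byLine = errors_by_line($errorsRef);
for my $lineNo (sort { $a <=> $b } keys %{$byLine}) {
    printf "line %d: %d error(s)\n", $lineNo, scalar @{ $byLine->{$lineNo} };
}
```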

Constructor

new

This is the constructor for an AnnotationParser object.

The constructor expects one of two arguments: either an 'annotationFile' argument or an 'objectFile' argument. When instantiated with an annotationFile argument, it expects the file to be an annotation file created by one of the GO consortium members, according to their file format. When instantiated with an objectFile argument, it expects to open a previously created AnnotationParser object that has been serialized to disk (see the serializeToDisk method).

Usage:

my $annotationParser = GO::AnnotationProvider::AnnotationParser->new(annotationFile => $file);

my $annotationParser = GO::AnnotationProvider::AnnotationParser->new(objectFile => $file);
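The two constructor forms can be combined so that the annotation file is parsed only once. The sketch below assumes only the documented new and serializeToDisk methods; the helper name load_parser is hypothetical, and it does not check whether the annotation file is newer than the object file.

```perl
use strict;
use warnings;

# Hypothetical helper: prefer a previously serialized object file, and fall
# back to parsing the annotation file, serializing the result for next time.
# $class is assumed to be 'GO::AnnotationProvider::AnnotationParser', which
# the caller must already have loaded.
sub load_parser {
    my ($class, $annotationFile, $objectFile) = @_;

    # Reuse the serialized object if it exists.
    return $class->new(objectFile => $objectFile) if -e $objectFile;

    # Otherwise parse the annotation file and save the object to disk.
    my $parser = $class->new(annotationFile => $annotationFile);
    $parser->serializeToDisk(filename => $objectFile);
    return $parser;
}
```

A caller might then write: my $annotationParser = load_parser('GO::AnnotationProvider::AnnotationParser', $file, "$file.obj"); where the ".obj" suffix is simply a naming choice for this example.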

Public instance methods

Some methods dealing with ambiguous names

Because an annotated entity may be referred to by many names, some of which are not unique, there exists a set of methods for determining whether a name is ambiguous, and to which database identifiers such an ambiguous name may refer.

nameIsAmbiguous

This public method returns a boolean to indicate whether a name is ambiguous, i.e. whether the name might map to more than one entity (and therefore more than one databaseId).

Usage:

if ($annotationParser->nameIsAmbiguous($name)){

    # handle the ambiguous name, e.g. with databaseIdsForAmbiguousName()

}

databaseIdsForAmbiguousName

This public method returns an array of database identifiers for an ambiguous name. If the name is not ambiguous, an empty list will be returned.

Usage:

my @databaseIds = $annotationParser->databaseIdsForAmbiguousName($name);

ambiguousNames

This method returns an array of the names that have been deemed ambiguous, based on the contents of the annotation file.

Usage:

my @ambiguousNames = $annotationParser->ambiguousNames();
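To make the notion of ambiguity concrete, here is a small self-contained sketch in which an alias shared by two entities maps to two database identifiers. The data, the %idsForName index, and the sub names are assumptions for illustration; the module's internal representation and its exact rules for deciding ambiguity may differ.

```perl
use strict;
use warnings;

# Made-up entities: database id, standard name, and aliases.
my @entities = (
    { databaseId => 'ID0001', standardName => 'ABC1', aliases => ['XYZ1'] },
    { databaseId => 'ID0002', standardName => 'DEF2', aliases => ['XYZ1'] },
);

# Map every name (id, standard name, alias) to the set of ids it can denote.
my %idsForName;
for my $entity (@entities) {
    my $id = $entity->{databaseId};
    for my $name ($id, $entity->{standardName}, @{ $entity->{aliases} }) {
        $idsForName{$name}{$id} = 1;
    }
}

# In this sketch, a name is ambiguous when it maps to more than one id.
sub name_is_ambiguous {
    my ($name) = @_;
    return 0 unless exists $idsForName{$name};
    return scalar(keys %{ $idsForName{$name} }) > 1 ? 1 : 0;
}

# Returns the candidate ids, or an empty list for an unambiguous name.
sub database_ids_for_ambiguous_name {
    my ($name) = @_;
    return () unless name_is_ambiguous($name);
    return sort keys %{ $idsForName{$name} };
}

for my $name (qw(ABC1 XYZ1 DEF2)) {
    printf "%s: %s\n", $name,
        name_is_ambiguous($name)
            ? join(', ', database_ids_for_ambiguous_name($name))
            : 'unambiguous';
}
```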

Methods for retrieving GO annotations for entities

goIdsByDatabaseId

This public method returns a reference to an array of GOIDs that are associated with the supplied databaseId for a specific aspect. If no annotations are associated with that databaseId in that aspect, then a reference to an empty array will be returned. If the databaseId is not recognized, then undef will be returned.

Usage:

    my $goidsRef = $annotationParser->goIdsByDatabaseId(databaseId => $databaseId,
                                                        aspect     => $aspect);  # $aspect is 'P', 'F' or 'C'

goIdsByStandardName

This public method returns a reference to an array of GOIDs that are associated with the supplied standardName for a specific aspect. If no annotations are associated with the entity with that standard name in that aspect, then a reference to an empty array will be returned. If the supplied name is not used as a standard name, then undef will be returned.

Usage:

    my $goidsRef = $annotationParser->goIdsByStandardName(standardName => $standardName,
                                                          aspect       => $aspect);  # $aspect is 'P', 'F' or 'C'

goIdsByName

This public method returns a reference to an array of GOIDs that are associated with the supplied name for a specific aspect. If there are no GO associations for the entity corresponding to the supplied name in the provided aspect, then a reference to an empty array will be returned. If the supplied name does not correspond to any entity, then undef will be returned. Because the name can be any of the databaseId, the standard name, or any of the aliases, it is possible that the name might be ambiguous. Clients of this object should first test whether the name they are using is ambiguous, using the nameIsAmbiguous() method, and handle it accordingly. If an ambiguous name is supplied, then the method will die.

Usage:

    my $goidsRef = $annotationParser->goIdsByName(name   => $name,
                                                  aspect => $aspect);  # $aspect is 'P', 'F' or 'C'
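One way a client might guard against the die is to wrap the lookup in a helper that reports which of the documented outcomes occurred. This is a sketch only; the helper name lookup_go_ids and its status strings are hypothetical, and $parser is assumed to be any object offering the methods documented here.

```perl
use strict;
use warnings;

# Hypothetical wrapper around goIdsByName that never dies on an ambiguous
# name.  Returns a status string and an array reference:
#   'ambiguous' => the candidate database ids for the name
#   'unknown'   => an empty array (goIdsByName returned undef)
#   'ok'        => the GOIDs (possibly none) for the name in that aspect
sub lookup_go_ids {
    my ($parser, $name, $aspect) = @_;

    return ('ambiguous', [ $parser->databaseIdsForAmbiguousName($name) ])
        if $parser->nameIsAmbiguous($name);

    my $goidsRef = $parser->goIdsByName(name => $name, aspect => $aspect);

    return ('unknown', []) unless defined $goidsRef;
    return ('ok', $goidsRef);
}
```

A caller could then branch on the status, for example resolving an ambiguous name by asking the user to pick one of the returned database ids and calling goIdsByDatabaseId instead.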

Methods for mapping different types of name to each other

standardNameByDatabaseId

This method returns the standard name for a database id.

Usage:

my $standardName = $annotationParser->standardNameByDatabaseId($databaseId);

databaseIdByStandardName

This method returns the database id for a standard name.

Usage:

my $databaseId = $annotationParser->databaseIdByStandardName($standardName);

databaseIdByName

This method returns the database id for any identifier for a gene (e.g. the databaseId itself, the standard name, or an alias). If the supplied name is ambiguous, then the program will die. Thus clients should call the nameIsAmbiguous() method prior to using this method. If the name does not map to any databaseId, then undef will be returned.

Usage:

my $databaseId = $annotationParser->databaseIdByName($name);

standardNameByName

This public method returns the standard name for the gene specified by the given name. Because a name may be ambiguous, the nameIsAmbiguous() method should be called first. If an ambiguous name is supplied, then the method will die with an appropriate error message. If the name does not map to a standard name, then undef will be returned.

Usage:

my $standardName = $annotationParser->standardNameByName($name);

Other methods relating to names

nameIsStandardName

This method returns a boolean to indicate whether the supplied name is used as a standard name.

Usage:

    if ($annotationParser->nameIsStandardName($name)){

	# do something

    }

nameIsDatabaseId

This method returns a boolean to indicate whether the supplied name is used as a database id.

Usage:

    if ($annotationParser->nameIsDatabaseId($name)){

	# do something

    }

nameIsAnnotated

This method returns a boolean to indicate whether the supplied name has any annotations, either when considered as a databaseId, a standardName, or an alias. If an aspect is also supplied, then it indicates whether that name has any annotations in that aspect only.

Usage:

    if ($annotationParser->nameIsAnnotated(name => $name)){

        # do something

    }

or:

    if ($annotationParser->nameIsAnnotated(name   => $name,
                                           aspect => $aspect)){

        # do something

    }

Other public methods

databaseName

This method returns the name of the annotating authority from the file that was supplied to the constructor.

Usage:

my $databaseName = $annotationParser->databaseName();

numAnnotatedGenes

This method returns the number of entities in the annotation file that have annotations in the supplied aspect. If no aspect is provided, then it will return the number of genes with an annotation in at least one aspect of GO.

Usage:

my $numAnnotatedGenes = $annotationParser->numAnnotatedGenes();

my $numAnnotatedGenes = $annotationParser->numAnnotatedGenes($aspect);

allDatabaseIds

This public method returns an array of all the database identifiers.

Usage:

my @databaseIds = $annotationParser->allDatabaseIds();

allStandardNames

This public method returns an array of all standard names.

Usage:

my @standardNames = $annotationParser->allStandardNames();

Methods to do with files

file

This method returns the name of the file that was used to instantiate the object.

Usage:

my $file = $annotationParser->file;

serializeToDisk

This public method saves the current state of the Annotation Parser Object to a file, using the Storable package. The data are saved in network order for portability, just in case. The name of the object file is returned. By default, the name of the original file will be used to make the name of the object file (including the full path from where the file came), or the client can instead supply their own filename.

Usage:

my $fileName = $annotationParser->serializeToDisk;

my $fileName = $annotationParser->serializeToDisk(filename => $filename);
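For readers unfamiliar with Storable, the following self-contained sketch shows what saving in network order looks like, using Storable's nstore and retrieve functions. The $state structure is a made-up stand-in; the module's real internal layout is not shown here.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);
use Storable qw(nstore retrieve);

# Stand-in for parser state (illustrative only).
my $state = {
    file  => 'data/gene_association.sgd',
    goids => { ID0001 => { P => ['GO:0000001'] } },
};

# Temporary file for the example; it is removed when the program exits.
my (undef, $objectFile) = tempfile(SUFFIX => '.obj', UNLINK => 1);

# nstore writes in network byte order, so the file can be read back on
# machines whose native byte order differs.
nstore($state, $objectFile);

# retrieve rebuilds an independent copy of the data structure.
my $restored = retrieve($objectFile);
print "restored file name: $restored->{file}\n";
```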

AUTHORS

Elizabeth Boyle, ell@mit.edu

Gavin Sherlock, sherlock@genome.stanford.edu