NAME
GO::OntologyProvider::OntologyParser - Provides API for retrieving data from Gene Ontology files
SYNOPSIS
use GO::OntologyProvider::OntologyParser;
my $ontology = GO::OntologyProvider::OntologyParser->new(ontologyFile => "process.ontology");
print "The ancestors of GO:0006177 are:\n";
my $node = $ontology->nodeFromId("GO:0006177");
foreach my $ancestor ($node->ancestors){
    print $ancestor->goid, " ", $ancestor->term, "\n";
}
$ontology->printOntology();
DESCRIPTION
GO::OntologyProvider::OntologyParser implements the interface defined by GO::OntologyProvider, and parses a Gene Ontology (GO) file in plain-text (not XML) format. These files can be obtained from the Gene Ontology Consortium web site, http://www.geneontology.org/. From the information in the file, it creates a directed acyclic graph (DAG) structure in memory. GO terms are thus arranged into a tree-like structure in which each GO node can have multiple parent nodes and multiple child nodes.
This data structure can be used in conjunction with files in which certain genes are annotated to corresponding GO nodes.
Each GO ID (e.g. "GO:1234567") is associated with a GO node. That GO node contains the name of the GO term, a list of the nodes directly above it ("parent nodes"), and a list of the nodes directly below it ("child nodes"). The "ancestor nodes" of a node are all of the nodes that lie on any path from that node to the root of the ontology, with duplicates removed.
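The ancestor definition above can be illustrated with a minimal sketch that does not use this module at all. It assumes a hypothetical hash, %parents, mapping each GOID to the GOIDs of its direct parents; the IDs and the helper name ancestors_of are invented for illustration and are not part of the OntologyParser API.

```perl
use strict;
use warnings;

# Hypothetical toy DAG: each GOID maps to the GOIDs of its direct parents.
# The IDs and the shape of the graph are invented for illustration only.
my %parents = (
    'GO:0000001' => [],                              # root
    'GO:0000002' => ['GO:0000001'],
    'GO:0000003' => ['GO:0000001'],
    'GO:0000004' => ['GO:0000002', 'GO:0000003'],    # two parents
);

# Return the sorted, de-duplicated ancestors of $goid by following every
# parent link towards the root.
sub ancestors_of {
    my ($goid) = @_;
    my %seen;
    my @queue = @{ $parents{$goid} || [] };
    while (defined(my $id = shift @queue)) {
        next if $seen{$id}++;                        # reached by another path already
        push @queue, @{ $parents{$id} || [] };
    }
    my @sorted = sort keys %seen;
    return @sorted;
}

print join(", ", ancestors_of('GO:0000004')), "\n";
```

The root is reachable from GO:0000004 by two paths, yet it appears only once in the result, which is the "duplicates removed" part of the definition.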
The format of the GO file is as follows (taken from http://www.geneontology.org/doc/GO.doc.html):
Comment lines:
Lines that begin with ! are comment lines.
$ lines:
Lines in which the first non-space character is a $ reflect either the domain and aspect of the ontology (i.e. $text) or the end of the file (i.e. the $ character on a line by itself).
Versioning:
The first lines of each file after any html header information (in *.html files) always carry information about the version, the date of last update, (optionally) the source of the file, the name of the database, the domain of the file and the editors of the file, e.g.:
!
!Gene Ontology
![domain of file]
!
!editors: Michael Ashburner (FlyBase), Midori Harris (GO), Judith Blake (MGD)
!Leonore Reiser (TAIR), Karen Christie (SGD) and colleagues
!with software by Suzanna Lewis (FlyBase Berkeley).
Syntax:
Parent-child relationships between terms are represented by indentation:
parent_term
 child_term
Instance relationship:
%term0
 %term1 % term2
To be read as term1 being an instance of term0 and also an instance of term2.
Part of relationship:
%term0
 %term1 < term2 < term3
To be read as term1 being an instance of term0 and also a part of term2 and term3.
Line syntax (showing the order in which items appear on a line; * indicates optional item):
< | % term [; db cross ref]* [; synonym:text]* [ < | % term]*
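As a rough illustration of the line syntax above, the following sketch classifies a single line and splits a term line into its pieces. It is not the parser used by this module: the function name parse_line, the returned hash layout and the example GOID are assumptions, and escaped characters in real files are not handled.

```perl
use strict;
use warnings;

# Classify one line of the flat file and, for term lines, split it into
# the pieces named in the line syntax. Hypothetical helper, not module API.
sub parse_line {
    my ($line) = @_;
    $line =~ s/\s+$//;                                # drop trailing whitespace
    return +{ type => 'comment' } if $line =~ /^!/;
    return +{ type => 'dollar'  } if $line =~ /^\s*\$/;

    # Leading spaces give the depth; the first symbol gives the relationship
    # to the parent implied by indentation.
    my ($indent, $symbol, $rest) = $line =~ /^(\s*)([%<])\s*(.*)$/
        or return +{ type => 'unknown' };

    # Further "% term" / "< term" groups name additional parents.
    my ($first, @more) = split /\s*([%<])\s*/, $rest;

    # The first group carries the term name, then cross references and synonyms.
    my ($term, @fields) = split /\s*;\s*/, $first;
    my @synonyms = map  { /^synonym:(.*)$/ ? $1 : () } @fields;
    my @xrefs    = grep { !/^synonym:/ } @fields;

    my @extra_parents;
    while (my ($sym, $name) = splice(@more, 0, 2)) {
        last unless defined $name;
        ($name) = split /\s*;\s*/, $name;             # keep only the term name
        push @extra_parents, {
            relation => $sym eq '%' ? 'instance' : 'part-of',
            term     => $name,
        };
    }

    return +{
        type     => 'term',
        depth    => length($indent),
        relation => $symbol eq '%' ? 'instance' : 'part-of',
        term     => $term,
        xrefs    => \@xrefs,
        synonyms => \@synonyms,
        parents  => \@extra_parents,
    };
}

my $rec = parse_line(' %term1 ; GO:0000002 ; synonym:first term < term2 < term3');
print "$rec->{term} ($rec->{relation}) at depth $rec->{depth}\n";
print "  also $_->{relation} of $_->{term}\n" for @{ $rec->{parents} };
```

The depth is returned rather than resolved to a parent because finding the parent needs the preceding lines, which a per-line helper does not see.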
Instance Constructor
new
This is the constructor for an OntologyParser object. The constructor expects one of two named arguments: either 'ontologyFile' or 'objectFile'. When instantiated with an ontologyFile argument, it expects the value to be an ontology file created by the GO Consortium, in their file format. When instantiated with an objectFile argument, it expects the value to be a previously created OntologyParser object that has been serialized to disk (see serializeToDisk).
Usage:
my $ontology = GO::OntologyProvider::OntologyParser->new(ontologyFile => $ontologyFile);
my $ontology = GO::OntologyProvider::OntologyParser->new(objectFile => $objectFile);
Instance Methods
printOntology
This prints out the ontology, with redundancies, to STDOUT. It does not yet print out all of the ontology information (such as relationship type). This method will likely be removed in a future version, so it should not be relied upon.
Usage:
$ontologyParser->printOntology;
allNodes
This method returns an array of all the GO::Node objects that have been created.
Usage:
my @nodes = $ontologyParser->allNodes;
rootNode
This returns the root node in the ontology.
Usage:
my $rootNode = $ontologyParser->rootNode;
nodeFromId
This public method takes a GOID and returns the GO::Node that it corresponds to.
Usage:
my $node = $ontologyParser->nodeFromId($goid);
If the GOID does not correspond to a GO node, undef is returned. Calling any method on undef causes a fatal runtime error, so unless you can guarantee that every GOID you supply is valid, check that the return value of this method is defined.
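The defined check might look like the sketch below. So that it runs without the GO modules or an ontology file, it uses two hypothetical stand-in classes (My::FakeNode and My::FakeOntology); only nodeFromId, goid and term mirror methods documented here, and the helper describe and the term text are assumptions.

```perl
use strict;
use warnings;

# Hypothetical stand-ins for GO::Node and the parser, for illustration only.
{
    package My::FakeNode;
    sub new  { my ($class, %args) = @_; return bless {%args}, $class }
    sub goid { return $_[0]{goid} }
    sub term { return $_[0]{term} }
}
{
    package My::FakeOntology;
    sub new {
        my ($class, @nodes) = @_;
        my %by_id = map { ($_->goid, $_) } @nodes;
        return bless \%by_id, $class;
    }
    # Like the documented method: returns undef for an unknown GOID.
    sub nodeFromId { my ($self, $goid) = @_; return $self->{$goid} }
}

# Look up a GOID and only call node methods once the result is known
# to be defined.
sub describe {
    my ($ontology, $goid) = @_;
    my $node = $ontology->nodeFromId($goid);
    return "$goid: not found in ontology" unless defined $node;
    return $node->goid . ": " . $node->term;
}

my $ontology = My::FakeOntology->new(
    My::FakeNode->new(goid => 'GO:0006177', term => 'example term'),
);

print describe($ontology, $_), "\n" for 'GO:0006177', 'GO:9999999';
```

With a real parser object, the same describe helper should work unchanged, since it relies only on the three documented methods.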
numNodes
This public method returns the number of nodes that exist within the ontology.
Usage:
my $numNodes = $ontologyParser->numNodes;
serializeToDisk
Saves the current state of the OntologyParser object to a file, using the Storable package. The object is saved in network order for portability. Returns the name of the file. If no filename is provided, the name of the file used for object construction (including its directory, if one was given) is used, with .obj appended. If the object was instantiated from a file that already has a .obj suffix, that same filename is used.
This method currently causes a segmentation fault on MacOSX (at least 10.1.5 through 10.2.3) with perl 5.6 and Storable 1.0.14 when trying to store the process ontology. The failure occurs with both store and nstore. It has not been investigated whether this is a perl problem or a Storable problem (Storable contains a large amount of C code). The segmentation fault does not occur on Solaris with perl 5.6.1 and Storable 1.0.13, so it remains unclear whether the problem lies in MacOSX, perl or Storable. Newer versions of both perl and Storable exist, and the code should be tested with those as well.
Usage:
my $objectFile = $ontologyParser->serializeToDisk(filename=>$filename);
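The underlying Storable round trip and the default file naming described above can be sketched without the module. Here nstore and retrieve are real Storable functions, while default_object_file and the stored data are hypothetical stand-ins; the real method stores the parser object itself.

```perl
use strict;
use warnings;
use File::Spec;
use File::Temp qw(tempdir);
use Storable qw(nstore retrieve);

# Hypothetical helper mirroring the documented default: append ".obj"
# unless the source file name already ends in ".obj".
sub default_object_file {
    my ($source) = @_;
    return $source =~ /\.obj$/ ? $source : "$source.obj";
}

my $dir  = tempdir(CLEANUP => 1);
my $file = default_object_file(File::Spec->catfile($dir, 'process.ontology'));

# Stand-in data for illustration: one GOID with one parent.
my $data = { 'GO:0000002' => ['GO:0000001'] };

# nstore writes in network byte order, so the file is portable across machines.
nstore($data, $file) or die "could not store to $file";

# retrieve reads files written by either store or nstore.
my $copy = retrieve($file);

print "restored parents: @{ $copy->{'GO:0000002'} }\n";
```

Writing to a temporary directory keeps the sketch from leaving files behind; a real caller would pass its own filename.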
Authors
Gavin Sherlock; sherlock@genome.stanford.edu
Elizabeth Boyle; ell@mit.edu