NAME
Bio::GFF3::LowLevel::Parser - a fast, low-level GFF3 parser
SYNOPSIS
    use Bio::GFF3::LowLevel::Parser;

    my $p = Bio::GFF3::LowLevel::Parser->open( $file_or_fh );
    while( my $i = $p->next_item ) {
        if( ref $i eq 'ARRAY' ) {
            ## $i is an arrayref of feature lines that have the same ID,
            ## in the same format as returned by
            ## Bio::GFF3::LowLevel::gff3_parse_feature
            for my $f (@$i) {
                # $f is one location (one line) of this feature;
                # do something with it
            }
        }
        elsif( $i->{directive} ) {
            if( $i->{directive} eq 'FASTA' ) {
                my $fasta_filehandle = $i->{filehandle};
                ## parse the FASTA in the filehandle with BioPerl or
                ## however you want, or ignore it.
            }
            elsif( $i->{directive} eq 'gff-version' ) {
                print "it says it is GFF version $i->{value}\n";
            }
            elsif( $i->{directive} eq 'sequence-region' ) {
                print( "found a sequence-region, sequence $i->{seq_id},",
                       " from $i->{start} to $i->{end}\n"
                     );
            }
        }
        elsif( $i->{comment} ) {
            ## this is a comment in your GFF3 file, in case you want to do
            ## something with it.
            print "that comment said: '$i->{comment}'\n";
        }
        else {
            die 'this should never happen!';
        }
    }
DESCRIPTION
This is a fast, low-level parser for Generic Feature Format, version 3 (GFF3). Because it is low-level, it returns only plain hashrefs and arrayrefs. It does, however, reconstruct feature hierarchies using features' ID, Parent, and Derives_from attributes, and it groups together lines with the same ID (i.e. features that have multiple locations).
Features
Features are returned as arrayrefs containing one or more (never zero) feature lines, each parsed in the same format as "gff3_parse_feature" in Bio::GFF3::LowLevel. Each feature line has two additional keys for related features: child_features and derived_features, each of which is a (possibly empty) arrayref of features (i.e. arrayrefs) that refer to this one as a Parent or declare that they Derives_from it.
Note that, to make code that uses this parser easier to write, all features have child_features and derived_features arrayrefs. This means you don't have to check whether these exist before checking whether they contain anything.
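As an illustration of the structure described above, the following sketch walks a feature hierarchy recursively. It does not call the parser: the data is hand-built to imitate the documented shape, and the helper names make_line and describe_feature are hypothetical. It assumes each feature line carries type, start, and end keys, as lines from gff3_parse_feature are expected to; in real code the top-level features would come from next_item().

```perl
use strict;
use warnings;
use Scalar::Util 'refaddr';

# Hypothetical helper: build one feature line that always has the
# child_features and derived_features arrayrefs, as the parser promises.
sub make_line {
    my (%fields) = @_;
    return { child_features => [], derived_features => [], %fields };
}

# Hypothetical helper: return one indented text line per feature line,
# descending into child_features at each level.
sub describe_feature {
    my ( $feature, $depth ) = @_;
    $depth //= 0;

    my @out;
    for my $line (@$feature) {
        push @out, sprintf( '%s%s %d-%d',
            '  ' x $depth, $line->{type}, $line->{start}, $line->{end} );
    }

    # Gather children from every line of this feature, skipping any
    # child arrayref already seen (in case the lines share children).
    my %seen;
    my @children = grep { !$seen{ refaddr($_) }++ }
                   map  { @{ $_->{child_features} } } @$feature;

    push @out, describe_feature( $_, $depth + 1 ) for @children;
    return @out;
}

# A CDS with two locations, i.e. two lines sharing one ID.
my $cds = [
    make_line( type => 'CDS', start => 100, end => 200 ),
    make_line( type => 'CDS', start => 300, end => 400 ),
];
my $mrna = [ make_line( type => 'mRNA', start => 100, end => 400,
                        child_features => [$cds] ) ];
my $gene = [ make_line( type => 'gene', start => 50, end => 500,
                        child_features => [$mrna] ) ];

my @report = describe_feature($gene);
print "$_\n" for @report;
```

Because every line has a child_features arrayref, the recursion needs no existence checks; a leaf feature simply yields an empty list of children.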
Directives
Directives are returned as hashrefs, in the same format as "gff3_parse_directive" in Bio::GFF3::LowLevel.
Comments
Comments are parsed into a hashref of the form:
{ comment => 'text of the comment, not including the hash mark(s) and ending newline' }
FUNCTIONS
open( $file_or_filehandle, ... )
Make a new parser object that will parse the GFF3 from all of the files or filehandles that you give it, as if they were all a single stream.
max_lookback( $features )
Set the maximum number of features the parser will keep buffered in case features later in the file refer to them. By default there is no limit; the parser instead relies on the presence of '###' marks in the GFF3 file.
new
Returns a wrapped copy of this parser whose output is backward-compatible with the data returned by version 1.0 of this parser. Do not use in new code.
next_item()
Iterate through all of the items (features, directives, and comments) in the file(s) given to the parser. Features are arrayrefs of hashrefs, and directives and comments are hashrefs.
AUTHOR
Robert Buels <rmb32@cornell.edu>
COPYRIGHT AND LICENSE
This software is copyright (c) 2012 by Robert Buels.
This is free software; you can redistribute it and/or modify it under the same terms as the Perl 5 programming language system itself.