NAME

Bio::GFF3::LowLevel::Parser - a fast, low-level GFF3 parser

SYNOPSIS

use Bio::GFF3::LowLevel::Parser;

my $p = Bio::GFF3::LowLevel::Parser->open( $file_or_fh );

while( my $i = $p->next_item ) {

    if( ref $i eq 'ARRAY' ) {
        ## $i is an arrayref of feature lines that have the same ID,
        ## in the same format as returned by
        ## Bio::GFF3::LowLevel::gff3_parse_feature
        for my $f (@$i) {
           # for each location of this feature
           # do something with it
        }
    }
    elsif( $i->{directive} ) {
        if( $i->{directive} eq 'FASTA' ) {
            my $fasta_filehandle = $i->{filehandle};
            ## parse the FASTA in the filehandle with BioPerl or
            ## however you want.  or ignore it.
        }
        elsif( $i->{directive} eq 'gff-version' ) {
            print "it says it is GFF version $i->{value}\n";
        }
        elsif( $i->{directive} eq 'sequence-region' ) {
            print( "found a sequence-region, sequence $i->{seq_id},",
                   " from $i->{start} to $i->{end}\n"
                 );
        }
    }
    elsif( $i->{comment} ) {
        ## this is a comment in your GFF3 file, in case you want to do
        ## something with it.
        print "that comment said: '$i->{comment}'\n";
    }
    else {
        die 'this should never happen!';
    }

}

DESCRIPTION

This is a fast, low-level parser for Generic Feature Format, version 3 (GFF3). Because it is low-level, it returns only plain hashrefs. It does, however, reconstruct feature hierarchies using features' ID, Parent, and Derives_from attributes, and it groups together lines that share the same ID (i.e. features that have multiple locations).

Features

Features are returned as arrayrefs containing one or more (never zero) feature lines, each parsed into the same format as "gff3_parse_feature" in Bio::GFF3::LowLevel. Each line also has two additional keys for related features: child_features and derived_features. Each of these is a (possibly empty) arrayref of features (i.e. arrayrefs) that name this feature in their Parent or Derives_from attribute, respectively.

Note that, to make code that uses this parser easier to write, all features have child_features and derived_features arrayrefs. This means you don't have to check for the existence of these before seeing if they have anything in them.
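Because every feature carries a child_features arrayref, a hierarchy can be walked recursively without existence checks. The following is a minimal sketch of such a walk. It does not call the parser: the data is hand-built to mimic the structure described above, and make_feature and feature_outline are hypothetical helper names. It also assumes that reading child_features from the first line of a feature is sufficient.

```perl
use strict;
use warnings;

# Hypothetical helper: builds a stand-in for one parsed feature, i.e. an
# arrayref holding a single feature-line hashref. Only the keys used
# below are filled in; real parsed lines carry more.
sub make_feature {
    my ( $type, $id ) = @_;
    return [
        {
            type             => $type,
            attributes       => { ID => [$id] },
            child_features   => [],
            derived_features => [],
        }
    ];
}

# Hand-built hierarchy: gene1 -> mrna1 -> exon1
my $gene = make_feature( gene => 'gene1' );
my $mrna = make_feature( mRNA => 'mrna1' );
my $exon = make_feature( exon => 'exon1' );
push @{ $mrna->[0]{child_features} }, $exon;
push @{ $gene->[0]{child_features} }, $mrna;

# Depth-first walk that returns one indented line per feature.
# Assumption: children are read from the first line of each feature.
sub feature_outline {
    my ( $feature, $depth ) = @_;
    $depth ||= 0;
    my $first = $feature->[0];
    my @lines =
      ( ( '  ' x $depth ) . "$first->{type} $first->{attributes}{ID}[0]" );

    # No existence check needed: child_features is always an arrayref.
    for my $child ( @{ $first->{child_features} } ) {
        push @lines, feature_outline( $child, $depth + 1 );
    }
    return @lines;
}

my @outline = feature_outline($gene);
print "$_\n" for @outline;
```

The same pattern applies to derived_features by swapping the key that the loop reads.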

Directives

Directives are returned as hashrefs, in the same format as "gff3_parse_directive" in Bio::GFF3::LowLevel.

Comments

Comments are parsed into a hashref of the form:

{ comment => 'text of the comment, not including the hash mark(s) and ending newline' }

FUNCTIONS

open( $file_or_filehandle, ... )

Make a new parser object that will parse the GFF3 from all of the files or filehandles that you give it, as if they were all a single stream.

max_lookback( $features )

Set the maximum number of features the parser will keep buffered in case features later in the file refer to them. By default there is no limit, and the parser instead relies on the presence of '###' marks in the GFF3 file to know when buffered features can be released.
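As a rough illustration of open and max_lookback together, the sketch below feeds the parser two in-memory filehandles. The GFF3 content is made up, the lookback value of 1000 is arbitrary, and it is an assumption that max_lookback is called as a method on the parser object and that in-memory filehandles are accepted like any other filehandle. It requires Bio::GFF3::LowLevel::Parser to be installed.

```perl
use strict;
use warnings;
use Bio::GFF3::LowLevel::Parser;

# Two made-up GFF3 fragments. The mRNA in the second fragment names a
# gene from the first fragment as its Parent.
my $part1 = join '', map "$_\n",
    '##gff-version 3',
    join( "\t", qw( ctg1 . gene 1 100 . + . ID=gene1 ) );
my $part2 = join '', map "$_\n",
    join( "\t", qw( ctg1 . mRNA 1 100 . + . ID=mrna1;Parent=gene1 ) ),
    '###';

# Core Perl in-memory filehandles stand in for real files here.
open my $fh1, '<', \$part1 or die "cannot open in-memory handle: $!";
open my $fh2, '<', \$part2 or die "cannot open in-memory handle: $!";

# Both sources are handed to one parser, to be read as a single stream.
my $parser = Bio::GFF3::LowLevel::Parser->open( $fh1, $fh2 );

# Assumed call style: cap the number of buffered features.
$parser->max_lookback(1000);

my ( @features, @other );
while ( my $item = $parser->next_item ) {
    if   ( ref $item eq 'ARRAY' ) { push @features, $item }
    else                          { push @other,    $item }
}

for my $feature (@features) {
    my $n_children = @{ $feature->[0]{child_features} };
    print "$feature->[0]{attributes}{ID}[0]: $n_children child feature(s)\n";
}
```

A finite lookback trades completeness for bounded memory, so it is probably best suited to files that are known to keep related features close together.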

new

Returns a wrapped copy of this parser that returns data that is backward-compatible with what the 1.0 version of this parser returned. Do not use in new code.

next_item()

Iterate through all of the items (features, directives, and comments) in the file(s) given to the parser. Features are arrayrefs of hashrefs, and directives and comments are hashrefs.

AUTHOR

Robert Buels <rmb32@cornell.edu>

COPYRIGHT AND LICENSE

This software is copyright (c) 2012 by Robert Buels.

This is free software; you can redistribute it and/or modify it under the same terms as the Perl 5 programming language system itself.