NAME

Bio::GFF3::LowLevel - fast, low-level functions for parsing and formatting GFF3

SYNOPSIS

use Bio::GFF3::LowLevel qw/ gff3_parse_feature /;

open my $gff3_fh, '<', 'myfile.gff3' or die "Cannot open myfile.gff3: $!";
while( <$gff3_fh> ) {
  next if /^#/;    # skip comments and directives
  my $feat = gff3_parse_feature( $_ );
}

DESCRIPTION

These are low-level, fast functions for parsing and formatting GFF version 3 (GFF3) text. All they do is convert back and forth between low-level Perl data structures and GFF3 text.

Sometimes this is what you need when you are just doing simple transformations on GFF3. I found myself writing these functions over and over again, until I finally got fed up enough to just package them up properly.

These functions do no validation and do not reconstruct feature hierarchies or anything like that. If you want that, use Bio::FeatureIO.

All of the functions in this module are EXPORT_OK, meaning that you can list their names after the module name in your use statement to import them into your namespace.
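As an illustration, a script that both parses and formats features might import several functions at once. This is only a sketch: the particular selection of functions is arbitrary, and it assumes Bio::GFF3::LowLevel is installed.

```perl
use strict;
use warnings;

# Request each function by name in the import list.
use Bio::GFF3::LowLevel qw(
    gff3_parse_feature
    gff3_format_feature
    gff3_escape
);
```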

FUNCTIONS

gff3_parse_feature( $line )

Given a string containing a GFF3 feature line (i.e. not a comment), parses it and returns a hashref of its information, of the form:

{
    seq_id => 'chr02',
    source => 'AUGUSTUS',
    type   => 'transcript',
    start  => '23486',
    end    => '48209',
    score  => '0.02',
    strand => '+',
    phase  => undef,
    attributes => {
        ID => [
            'chr02.g3.t1'
          ],
        Parent => [
            'chr02.g3'
          ],
      },
}

Note that all values are simple scalars, except for attributes, which is a hashref as returned by "gff3_parse_attributes" below.

Unescaping is performed according to the GFF3 specification.
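The returned structure might be used along these lines. This is a minimal sketch with a made-up feature line that mirrors the hashref shown above; the variable names are illustrative only.

```perl
use strict;
use warnings;
use Bio::GFF3::LowLevel qw( gff3_parse_feature );

# A made-up feature line: nine tab-separated columns.
my $line = join( "\t",
    'chr02', 'AUGUSTUS', 'transcript', 23486, 48209, '0.02', '+', '.',
    'ID=chr02.g3.t1;Parent=chr02.g3',
) . "\n";

my $feat = gff3_parse_feature( $line );

# Plain columns come back as simple scalars; the "." phase comes back undef.
my $length = $feat->{end} - $feat->{start} + 1;
my $phase  = defined $feat->{phase} ? $feat->{phase} : 'none';

# Attribute values are arrayrefs, so index into them.
my $id = $feat->{attributes}{ID}[0];

print "$id ($feat->{type}) on $feat->{seq_id}: $length bp, phase $phase\n";
```

Because phase and other empty columns are undef rather than ".", check them with defined before using them in string or numeric operations.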

gff3_parse_attributes( $attr_string )

Given a GFF3 attribute string, parse it and return a hashref of its data, of the form:

{
  'attribute_name' => [ value, value, ... ],
  ...
}

Always returns a hashref. If the passed attribute string is undefined, or ".", the hashref returned will be empty. Attribute values are always arrayrefs, even if they have only one value.
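A short sketch of walking the returned hashref, using a made-up attribute string in which Dbxref carries two comma-separated values:

```perl
use strict;
use warnings;
use Bio::GFF3::LowLevel qw( gff3_parse_attributes );

# Made-up attribute string for illustration.
my $attrs = gff3_parse_attributes( 'ID=gene01;Dbxref=GO:0001,GO:0002' );

for my $name ( sort keys %$attrs ) {
    # Every value is an arrayref, even for single-valued attributes like ID.
    print "$name: ", join( ' | ', @{ $attrs->{$name} } ), "\n";
}

# "." (or undef) yields an empty hashref, so looping over the keys is always safe.
my $empty = gff3_parse_attributes( '.' );
print "no attributes\n" unless %$empty;
```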

gff3_parse_directive( $line )

Parses a GFF3 directive/metadata line. Returns a hashref of the form:

{  directive => 'directive-name',
   value     => 'the contents of the directive'
}

Returns nothing if the line could not be parsed as a GFF3 directive.

In addition, sequence-region and genome-build directives are parsed further: sequence-region hashrefs have additional seq_id, start, and end keys, and genome-build hashrefs have additional source and buildname keys.
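For example, a sequence-region directive could be handled roughly like this. The directive and feature lines here are made up for illustration.

```perl
use strict;
use warnings;
use Bio::GFF3::LowLevel qw( gff3_parse_directive );

# Made-up directive line.
my $d = gff3_parse_directive( '##sequence-region chr02 1 50000' );

if( $d && $d->{directive} eq 'sequence-region' ) {
    # The extra keys are only present for sequence-region directives.
    print "$d->{seq_id} spans $d->{start}..$d->{end}\n";
}

# A feature line is not a directive, so nothing comes back.
my $not_directive = gff3_parse_directive( "chr02\t.\tgene\t1\t100\t.\t+\t.\tID=g1" );
print "not a directive\n" unless $not_directive;
```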

gff3_format_feature( \%fields )

Given a hashref of feature information in the same format returned by "gff3_parse_feature" above, constructs a correctly-escaped line of GFF3 encoding that information.

The line ends with a single newline character, a UNIX-style line ending, regardless of the local operating system.
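A typical simple transformation is to parse a line, change one field, and write it back out. The sketch below does that with a made-up feature line; the new source value 'curated' is purely illustrative.

```perl
use strict;
use warnings;
use Bio::GFF3::LowLevel qw( gff3_parse_feature gff3_format_feature );

# Made-up input line.
my $line = join( "\t",
    'chr02', 'AUGUSTUS', 'transcript', 23486, 48209, '0.02', '+', '.',
    'ID=chr02.g3.t1',
) . "\n";

# Parse, change one field, and format the feature again.
my $feat = gff3_parse_feature( $line );
$feat->{source} = 'curated';

# The returned string already ends in "\n", so print it as-is.
my $out = gff3_format_feature( $feat );
print $out;
```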

gff3_format_attributes( \%attrs )

Given a hashref of GFF3 attributes in the same format returned by "gff3_parse_attributes" above, returns a correctly formatted and escaped GFF3 attribute string (the 9th column of a GFF3 feature line) encoding those attributes.

For convenience, single-valued attributes can have simple scalars as values in the passed hashref. For example, if a feature has only one ID attribute (as it should), you can pass { ID => 'foo' } instead of { ID => ['foo'] }.
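The scalar shorthand and the arrayref form can be mixed in one call, as in this sketch with made-up attributes. Parsing the result again shows that the scalar ID comes back as a one-element arrayref.

```perl
use strict;
use warnings;
use Bio::GFF3::LowLevel qw( gff3_format_attributes gff3_parse_attributes );

# Made-up attributes: ID as a plain scalar, Dbxref as an arrayref.
my $attr_string = gff3_format_attributes({
    ID     => 'gene01',
    Dbxref => [ 'GO:0001', 'GO:0002' ],
});
print "$attr_string\n";

# Parsing the string again gives arrayrefs for every attribute, including ID.
my $round_trip = gff3_parse_attributes( $attr_string );
print "ID is $round_trip->{ID}[0]\n";
```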

gff3_escape( $string )

Given a string, escapes special characters in that string according to the GFF3 specification.

gff3_unescape( $string )

Unescapes a GFF3-escaped string.
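The two functions are meant to be inverses of each other. A small round-trip sketch, using a made-up value that contains characters reserved in the attributes column:

```perl
use strict;
use warnings;
use Bio::GFF3::LowLevel qw( gff3_escape gff3_unescape );

# Made-up value containing ";", "=" and ",", which are reserved in column 9.
my $raw     = 'kinase; alpha=1, beta=2';
my $escaped = gff3_escape( $raw );

# Unescaping should give back the original string.
my $restored = gff3_unescape( $escaped );

print "escaped:  $escaped\n";
print "restored: $restored\n";
```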

AUTHOR

Robert Buels <rmb32@cornell.edu>

COPYRIGHT AND LICENSE

This software is copyright (c) 2012 by Robert Buels.

This is free software; you can redistribute it and/or modify it under the same terms as the Perl 5 programming language system itself.