NAME
Bio::GFF3::LowLevel - fast, low-level functions for parsing and formatting GFF3
SYNOPSIS
    use Bio::GFF3::LowLevel qw/ gff3_parse_feature /;

    open my $gff3_fh, '<', 'myfile.gff3' or die "cannot open myfile.gff3: $!";
    while( <$gff3_fh> ) {
        next if /^#/;    # skip comment and directive lines
        my $feat = gff3_parse_feature( $_ );
    }
DESCRIPTION
These are low-level, fast functions for parsing GFF version 3 files. All they do is convert back and forth between low-level Perl data structures and GFF3 text.
Sometimes this is what you need when you are just doing simple transformations on GFF3. I found myself writing these functions over and over again, until I finally got fed up enough to just package them up properly.
These functions do no validation and do not reconstruct feature hierarchies or anything like that. If you want that, use Bio::FeatureIO.
All of the functions in this module are EXPORT_OK, meaning that you can list their names in the import list when you use this module to make them available in your namespace.
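As a minimal sketch of importing by name, assuming the module is installed, the following requests two of the functions documented below. The attribute string is a made-up example.

```perl
use strict;
use warnings;

# The functions are EXPORT_OK, so each one is requested by name.
use Bio::GFF3::LowLevel qw( gff3_parse_attributes gff3_format_attributes );

# Illustrative attribute string (not from any real file).
my $attrs = gff3_parse_attributes( 'ID=gene01;Name=abc' );
print gff3_format_attributes( $attrs ), "\n";
```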
FUNCTIONS
gff3_parse_feature( $line )
Given a string containing a GFF3 feature line (i.e. not a comment), parses it and returns a hashref of its information, of the form:
    {
        seq_id     => 'chr02',
        source     => 'AUGUSTUS',
        type       => 'transcript',
        start      => '23486',
        end        => '48209',
        score      => '0.02',
        strand     => '+',
        phase      => undef,
        attributes => {
            ID => [
                'chr02.g3.t1'
            ],
            Parent => [
                'chr02.g3'
            ],
        },
    }
Note that all values are simple scalars, except for attributes, which is a hashref as returned by "gff3_parse_attributes" below.
Unescaping is performed according to the GFF3 specification.
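The returned structure can be read as in the following sketch, which assumes the module is installed. The feature line is built by hand to mirror the example hashref above and is not taken from a real file.

```perl
use strict;
use warnings;
use Bio::GFF3::LowLevel qw( gff3_parse_feature );

# A hand-built feature line; GFF3 columns are tab-separated.
my $line = join( "\t",
    'chr02', 'AUGUSTUS', 'transcript', 23486, 48209, '0.02', '+', '.',
    'ID=chr02.g3.t1;Parent=chr02.g3',
) . "\n";

my $feat = gff3_parse_feature( $line );

# The first eight columns come back as simple scalars.
printf "%s %s %d..%d\n", @{$feat}{qw( seq_id type start end )};

# Attribute values are arrayrefs, even when there is only one value.
print "ID: $feat->{attributes}{ID}[0]\n";
print "Parents: @{ $feat->{attributes}{Parent} }\n";

# A "." column (here, phase) is returned as undef, as in the structure above.
print "phase is undefined\n" unless defined $feat->{phase};
```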
gff3_parse_attributes( $attr_string )
Given a GFF3 attribute string, parse it and return a hashref of its data, of the form:
    {
        'attribute_name' => [ value, value, ... ],
        ...
    }
Always returns a hashref. If the passed attribute string is undefined, or ".", the hashref returned will be empty. Attribute values are always arrayrefs, even if they have only one value.
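A short sketch of the behaviour described above, assuming the module is installed. The attribute string is invented for illustration. In GFF3, multiple values for one attribute are comma-separated.

```perl
use strict;
use warnings;
use Bio::GFF3::LowLevel qw( gff3_parse_attributes );

# Illustrative attribute string with one multi-valued attribute.
my $attrs = gff3_parse_attributes( 'ID=exon1;Parent=mRNA1,mRNA2' );

# Every value is an arrayref, so the same loop handles all attributes.
for my $name ( sort keys %$attrs ) {
    print "$name: ", join( ', ', @{ $attrs->{$name} } ), "\n";
}

# Both undef and "." yield an empty hashref rather than undef.
my $from_dot   = gff3_parse_attributes( '.' );
my $from_undef = gff3_parse_attributes( undef );
print "no attributes\n" unless %$from_dot || %$from_undef;
```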
gff3_parse_directive( $line )
Parses a GFF3 directive/metadata line. Returns a hashref of the form:
    {
        directive => 'directive-name',
        value     => 'the contents of the directive'
    }
Returns nothing if the line could not be parsed as a GFF3 directive.
In addition, sequence-region and genome-build directives are parsed further. sequence-region hashrefs have additional seq_id, start, and end keys, and genome-build hashrefs have additional source and buildname keys.
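The following sketch, assuming the module is installed, walks over a few hand-written lines and prints the extra keys for the two specially handled directives. The sample lines and their values are made up.

```perl
use strict;
use warnings;
use Bio::GFF3::LowLevel qw( gff3_parse_directive );

# Hand-written sample lines; the last one is a feature line, not a directive.
my @lines = (
    "##gff-version 3\n",
    "##sequence-region chr02 1 48209\n",
    "##genome-build NCBI B36\n",
    "chr02\t.\tgene\t1\t100\t.\t+\t.\tID=g1\n",
);

my @parsed;
for my $line ( @lines ) {
    # Non-directive lines produce nothing, so skip them.
    my $d = gff3_parse_directive( $line ) or next;
    push @parsed, $d;

    if( $d->{directive} eq 'sequence-region' ) {
        print "region $d->{seq_id} from $d->{start} to $d->{end}\n";
    }
    elsif( $d->{directive} eq 'genome-build' ) {
        print "build $d->{buildname} from $d->{source}\n";
    }
    else {
        print "directive $d->{directive}\n";
    }
}
```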
gff3_format_feature( \%fields )
Given a hashref of feature information in the same format returned by "gff3_parse_feature" above, constructs a correctly-escaped line of GFF3 encoding that information.
The line ends with a single newline character, a UNIX-style line ending, regardless of the local operating system.
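Paired with "gff3_parse_feature", this supports the kind of simple transformation mentioned in the DESCRIPTION. The sketch below, assuming the module is installed, relabels the source column of a hand-built line and adds a Note attribute. The replacement source name and the note text are arbitrary.

```perl
use strict;
use warnings;
use Bio::GFF3::LowLevel qw( gff3_parse_feature gff3_format_feature );

# A hand-built input line, used only for illustration.
my $line = join( "\t",
    'chr02', 'AUGUSTUS', 'transcript', 23486, 48209, '0.02', '+', '.',
    'ID=chr02.g3.t1;Parent=chr02.g3',
) . "\n";

my $feat = gff3_parse_feature( $line );

# Edit the parsed structure in place.
$feat->{source} = 'my_pipeline';                      # arbitrary new source name
push @{ $feat->{attributes}{Note} }, 'relabeled';     # arbitrary note text

# The formatted line already ends in a newline, so print it as-is.
my $out = gff3_format_feature( $feat );
print $out;
```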
gff3_format_attributes( \%attrs )
Given a hashref of GFF3 attributes in the same format returned by "gff3_parse_attributes" above, returns a correctly formatted and escaped GFF3 attribute string (the 9th column of a GFF3 feature line) encoding those attributes.
For convenience, single-valued attributes can have simple scalars as values in the passed hashref. For example, if a feature has only one ID attribute (as it should), you can pass { ID => 'foo' } instead of { ID => ['foo'] }.
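A small sketch mixing both value styles, assuming the module is installed. The attribute names and values are invented. The string is parsed back to show that scalars and arrayrefs encode the same way.

```perl
use strict;
use warnings;
use Bio::GFF3::LowLevel qw( gff3_format_attributes gff3_parse_attributes );

my $attr_string = gff3_format_attributes({
    ID     => 'exon1',                 # scalar shorthand for a single value
    Parent => [ 'mRNA1', 'mRNA2' ],    # arrayref for multiple values
});
print "$attr_string\n";

# Parsing the string back always yields arrayrefs.
my $back = gff3_parse_attributes( $attr_string );
print "ID round-trips as an arrayref\n" if ref $back->{ID} eq 'ARRAY';
```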
gff3_escape( $string )
Given a string, escapes special characters in that string according to the GFF3 specification.
gff3_unescape( $string )
Unescapes a GFF3-escaped string.
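The two functions are inverses of each other, as the following sketch illustrates, assuming the module is installed. The sample text is made up and deliberately contains characters that are reserved in the attributes column (semicolon, comma, and percent sign).

```perl
use strict;
use warnings;
use Bio::GFF3::LowLevel qw( gff3_escape gff3_unescape );

# Made-up note text containing characters reserved in GFF3 attributes.
my $raw = 'kinase; ATP-binding, 50% identity';

my $escaped   = gff3_escape( $raw );
my $roundtrip = gff3_unescape( $escaped );

print "escaped:   $escaped\n";
print "unescaped: $roundtrip\n";
print "round trip preserved the text\n" if $roundtrip eq $raw;
```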
AUTHOR
Robert Buels <rmb32@cornell.edu>
COPYRIGHT AND LICENSE
This software is copyright (c) 2012 by Robert Buels.
This is free software; you can redistribute it and/or modify it under the same terms as the Perl 5 programming language system itself.