NAME
Text::Tidx - Index a delimited text file containing start-stop positions
SYNOPSIS
use Text::Tidx;
Text::Tidx::build("annot.txt");
$idx = Text::Tidx->new("annot.txt");
print $idx->query("chr1",240034);
FUNCTIONS
new(FILE)
Loads an index from a file.
query(CHR, POS)
Queries a loaded index and returns an array of the text lines that correspond to the key string CHR and the integer position POS.
build(FILE [, option1=>value1, ...])
Builds an index for FILE. By default, the first three columns are indexed as the key, start, and end fields.
The following options may be used:
- sep
Field separator. Defaults to a tab.
- chr
1-based index of the string key field. Use -1 for "not applicable". Defaults to 1.
- beg
1-based index of the field containing the start of the integer range. Defaults to 2.
- end
1-based index of the field containing the end of the integer range. Defaults to 3.
- skip
If an integer, the number of rows to skip. If a character, all rows beginning with that character are skipped. The default is '#', which skips comment lines (compatible with GFF, VCF, and similar formats).
- sub_e
If nonzero, the "end" of the range is not included in the range, i.e. one is subtracted from the end positions.
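The option semantics above can be sketched in plain Perl. This is only an illustration of how the documented defaults might map a row to a key and a range; it is not the module's implementation. The helpers parse_row and skip_rows are hypothetical names introduced here, and using an empty key when chr is -1 is an assumption of the sketch.

```perl
use strict;
use warnings;

# Hypothetical helper, not part of Text::Tidx: turn one row into
# (key, start, stop) following the documented option defaults.
sub parse_row {
    my ($line, %opt) = @_;
    my $sep   = defined $opt{sep} ? $opt{sep} : "\t";
    my $chr   = defined $opt{chr} ? $opt{chr} : 1;
    my $beg   = defined $opt{beg} ? $opt{beg} : 2;
    my $end   = defined $opt{end} ? $opt{end} : 3;
    my $sub_e = $opt{sub_e} || 0;

    chomp $line;
    # \Q...\E treats the separator literally; -1 keeps trailing empty fields.
    my @f = split /\Q$sep\E/, $line, -1;

    # chr => -1 means "no key field"; this sketch assumes an empty key then.
    my $key = $chr == -1 ? "" : $f[$chr - 1];
    my ($start, $stop) = ($f[$beg - 1], $f[$end - 1]);
    $stop -= 1 if $sub_e;    # exclusive end: move to the last included position
    return ($key, $start, $stop);
}

# Hypothetical helper: apply the documented "skip" rule to a list of rows.
sub skip_rows {
    my ($skip, @rows) = @_;
    $skip = '#' unless defined $skip;
    if ($skip =~ /^\d+$/) {
        # Integer: drop that many leading rows.
        return @rows[$skip .. $#rows];
    }
    # Character: drop every row that begins with it.
    return grep { index($_, $skip) != 0 } @rows;
}

my @rows = ("# comment", "chr1\t100\t200\tgeneA", "chr2\t50\t80\tgeneB");
for my $row (skip_rows('#', @rows)) {
    my ($key, $start, $stop) = parse_row($row, sub_e => 1);
    print "$key $start-$stop\n";
}
```

With sub_e set, the two data rows print as "chr1 100-199" and "chr2 50-79", and the comment row is dropped by the default skip character.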
DESCRIPTION
Text::Tidx allows you to index any text file using a key field and range coordinates and, later, to use that index for O(log(n)) range lookups into the file.
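To make the O(log(n)) claim concrete, here is a minimal sketch of a logarithmic range lookup using binary search. It assumes the ranges for one key are inclusive, sorted by start, and non-overlapping; that is a simplification chosen for the example and is not necessarily how Text::Tidx stores or searches its index. The function find_range is a hypothetical name.

```perl
use strict;
use warnings;

# Hypothetical helper, not part of Text::Tidx. Expects an array ref of
# [start, end, line] triples, sorted by start, with inclusive and
# non-overlapping ranges. Returns the matching line, or nothing.
sub find_range {
    my ($ranges, $pos) = @_;
    my ($lo, $hi) = (0, $#$ranges);
    while ($lo <= $hi) {
        my $mid = int(($lo + $hi) / 2);
        my ($start, $end, $line) = @{ $ranges->[$mid] };
        if    ($pos < $start) { $hi = $mid - 1 }    # continue in the left half
        elsif ($pos > $end)   { $lo = $mid + 1 }    # continue in the right half
        else                  { return $line }      # start <= pos <= end
    }
    return;    # no range contains $pos
}

my @ranges = (
    [100, 199, "chr1\t100\t199\tgeneA"],
    [300, 450, "chr1\t300\t450\tgeneB"],
    [900, 950, "chr1\t900\t950\tgeneC"],
);
my $hit = find_range(\@ranges, 320);
print defined $hit ? "$hit\n" : "no match\n";
```

Each iteration halves the search window, so a lookup over n ranges takes about log2(n) comparisons. Overlapping ranges would need a richer structure, such as an interval tree.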
This was written because, for me, it was significantly faster for very large files (>100k rows) and many searches (>10k) than entering all of the information into a database and doing range queries. It was even faster than SQLite's rtree extension or the "tabix" program, both of which do similar things and do them rather well.
Although it was designed for chromosome, start, stop indexing, it is not genome-specific and can index any delimited text file.
Indexes are loaded into RAM. If you only have a few lookups to do per Perl instance, this is expensive, and a database will be faster.
AUTHOR
Erik Aronesty, <earonesty@cpan.org>
COPYRIGHT AND LICENSE
Copyright (C) 2012 by Erik Aronesty
This library is free software; you can redistribute it and/or modify it under the same terms as Perl itself, either Perl version 5.10.1 or, at your option, any later version of Perl 5 you may have available.