NAME
File::FlexSort - Perl extension for sorting distributed, ordered files
SYNOPSIS
use File::FlexSort;
my $sort = new File::FlexSort(\@file_list, \&index_extract_function, [\&comparison_function]);
my $line = $sort->next_line;
print "$line\n";
DESCRIPTION
File::FlexSort is a simple solution for returning ordered data that is distributed among several ordered files. An example might be application server logs which record events from a computing cluster. FlexSort is an easy way to merge / parse / analyze files in this situation. It was built with the usual PERLish thoughts ... ease, intuition, FLEXIBILITY and speed.
Here's how it works ...
As arguments, FlexSort takes a reference to an array of filepaths/names and a reference to a subroutine. The files are the targets of the sort, and the subroutine determines the sort order. When passed a line (i.e. a scalar) from one of the files, the user-supplied subroutine must return a numeric index / key value associated with that line. This value determines the order in which lines are returned.
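As an illustration of the contract described above, here is a minimal sketch of an index subroutine. The log format (lines beginning with "yyyy-mm-dd hh:mm:ss"), the name timestamp_index, and the choice to return 0 for lines without a timestamp are all assumptions made for this example, not part of FlexSort.

```perl
use strict;
use warnings;

# Hypothetical index function: expects lines that begin with a timestamp
# such as "2001-03-04 12:30:45 ...". This log format is an assumption.
sub timestamp_index {
    my $line = shift;
    if ($line =~ /^(\d{4})-(\d{2})-(\d{2}) (\d{2}):(\d{2}):(\d{2})/) {
        # Concatenate the fields into one number, yyyymmddhhmmss.
        return "$1$2$3$4$5$6" + 0;
    }
    # Lines without a timestamp get index 0 so they sort first
    # (a choice made for this sketch only).
    return 0;
}

print timestamp_index("2001-03-04 12:30:45 server started"), "\n";
```

A reference to such a subroutine (\&timestamp_index) would be passed as the second argument to the constructor.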
More detail ...
For each file, FlexSort opens an IO::File or IO::Zlib object. It then examines the first line of each file and uses the subroutine to extract the index associated with that line. It creates a stack of the files, sorted by these index values.
When 'next_line' is called, FlexSort returns the line with the lowest index value. FlexSort then replenishes the stack, reads a new line from the corresponding file and places it in the proper position for the next call to 'next_line'.
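The procedure described above can be sketched roughly as follows. This is not FlexSort's actual source; merge_sorted and its arguments are hypothetical names, and in-memory filehandles stand in for real files so that the example is self-contained.

```perl
use strict;
use warnings;

# Illustrative only: merge already-sorted sources by a numeric index.
# merge_sorted() is a hypothetical name, not part of FlexSort's API.
sub merge_sorted {
    my ($handles, $index_sub) = @_;
    my @stack;

    # Prime the stack with the first line of every source.
    for my $fh (@$handles) {
        my $line = readline $fh;
        next unless defined $line;
        chomp $line;
        push @stack, { fh => $fh, line => $line, index => $index_sub->($line) };
    }
    @stack = sort { $a->{index} <=> $b->{index} } @stack;

    my @out;
    while (@stack) {
        # The entry at the front holds the lowest index.
        my $top = shift @stack;
        push @out, $top->{line};

        # Replenish from the same source and re-insert in sorted position.
        my $next = readline $top->{fh};
        next unless defined $next;
        chomp $next;
        my $entry = { fh => $top->{fh}, line => $next, index => $index_sub->($next) };
        my $pos = 0;
        # Strict < keeps reading the same source when indexes tie.
        $pos++ while $pos < @stack && $stack[$pos]{index} < $entry->{index};
        splice @stack, $pos, 0, $entry;
    }
    return @out;
}

# Two in-memory "files", each already ordered by its leading number.
my @data = ("1 a\n4 b\n5 c\n", "2 x\n3 y\n6 z\n");
my @handles;
for my $text (@data) {
    open my $fh, '<', \$text or die "open: $!";
    push @handles, $fh;
}

print "$_\n" for merge_sorted(\@handles, sub { (split ' ', $_[0])[0] });
```

The sketch collects all lines into a list for brevity; FlexSort itself hands back one line per call to 'next_line'.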
Additional Notes:
- By default, a single file is read until its index is no longer the lowest value.
- If a file name ends in .z or .gz, the file is opened with IO::Zlib instead.
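The suffix rule could look something like the sketch below. The helper names handle_class_for and open_for_reading are hypothetical, and the case-sensitive suffix match is an assumption; FlexSort's own check may differ.

```perl
use strict;
use warnings;
use IO::File;

# Hypothetical helper: name the handle class for a path, based on its suffix.
sub handle_class_for {
    my $path = shift;
    return $path =~ /\.(?:z|gz)$/ ? 'IO::Zlib' : 'IO::File';
}

# Hypothetical helper: open a path for reading with the chosen class.
sub open_for_reading {
    my $path  = shift;
    my $class = handle_class_for($path);
    my $mode  = 'r';
    if ($class eq 'IO::Zlib') {
        require IO::Zlib;    # loaded only when a compressed file is seen
        $mode = 'rb';
    }
    my $fh = $class->new($path, $mode) or die "Cannot open $path: $!";
    return $fh;
}

print handle_class_for($_), "\n" for qw(app.log app.log.gz archive.z);
```

Both classes provide a getline method, so the calling code can read lines the same way from either kind of handle.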
EXAMPLE
# This program looks at the files found
# in /logfiles and returns their records
# sorted by the date in mm-dd-yyyy
# format.
use File::Recurse;
use File::FlexSort;
recurse { push(@files, $_) } "/logfiles";
my $fs = new File::FlexSort(\@files, \&index_sub);
while (my $line = $fs->next_line) {
# ...
# some operations on $line
# ...
}
sub index_sub{
# Use this to extract a date of
# the form mm-dd-yyyy.
my $line = shift;
# Be cautious that only the date will be
# extracted.
$line =~ /(\d{2})-(\d{2})-(\d{4})/;
return "$3$1$2"; # Index is an integer, yyyymmdd;
# lower numbers will be read first.
}
TODO
Install a generic comparison function rather than relying on <.
EXPORT
None by default.
AUTHOR
Chris Brown, <chris.brown@alum.calberkeley.edu>
Copyright(c) 2001 Christopher Brown. All rights reserved. This program is free software; you can redistribute it and/or modify it under the terms of the License, distributed with PERL or until I say otherwise. Not intended for evil purposes. Yadda, yadda, yadda ...