NAME

File::FlexSort - Perl extension for sorting distributed, ordered files

SYNOPSIS

use File::FlexSort;

my $sort = new File::FlexSort(\@file_list, \&index_extract_function, [\&comparison_function]);

my $line = $sort->next_line;
print "$line\n";

DESCRIPTION

File::FlexSort is a simple solution for returning ordered data that is distributed among several ordered files. An example might be application server logs which record events from a computing cluster. FlexSort is an easy way to merge / parse / analyze files in this situation. It was built with the usual PERLish thoughts ... ease, intuition, FLEXIBILITY and speed.

Here's how it works ...

As arguments, FlexSort takes a reference to an array of filepaths/names and a reference to a subroutine. The files are the targets of the sort object, and the subroutine determines the sort order. When passed a line (i.e. a scalar) from one of the files, the user-supplied subroutine must return a numeric index / key value associated with that line. This value determines the order in which lines are returned: the line with the lowest value comes first.
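As an illustration of such a subroutine, here is a minimal sketch. The log format (each line starting with a time of day, HH:MM:SS), the name seconds_index, and the choice to return 0 for lines without a timestamp are all assumptions made for this example; none of them come from FlexSort itself.

```perl
use strict;
use warnings;

# Hypothetical log format: each line begins with "HH:MM:SS".
# Returns seconds since midnight, a number that grows with time.
sub seconds_index {
    my $line = shift;
    my ($h, $m, $s) = $line =~ /^(\d{2}):(\d{2}):(\d{2})/
        or return 0;    # assumption: lines without a timestamp sort first
    return $h * 3600 + $m * 60 + $s;
}

print seconds_index("00:01:05 node-a started"), "\n";   # prints 65
```

A reference to it, \&seconds_index, would be passed as the second argument to the constructor.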

More detail ...

For each file FlexSort opens an IO::File or IO::Zlib object. It then examines the first line of each file and uses the subroutine to extract the index associated with that line. It builds a stack of these lines, sorted by index value.

When 'next_line' is called, FlexSort returns the line with the lowest index value. FlexSort then replenishes the stack: it reads a new line from the corresponding file and places it in the proper position for the next call to 'next_line'.
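The stack mechanics described above can be sketched roughly as follows. This is not the module's actual code: in-memory arrays stand in for the open files, and the names @sources, leading_number, insert_sorted, fetch and next_line_sketch are hypothetical helpers invented for the illustration.

```perl
use strict;
use warnings;

# Each "file" is an array of already ordered lines (stand-ins for file handles).
my @sources = (
    [ "1 alpha", "4 delta",   "5 echo"    ],
    [ "2 bravo", "3 charlie", "6 foxtrot" ],
);

# Hypothetical index sub: the leading integer of a line.
sub leading_number { my ($n) = $_[0] =~ /^(\d+)/; return $n }

my @stack;    # entries are [ index, line, source number ], lowest index first

# Place an entry at its proper position in the sorted stack.
sub insert_sorted {
    my ($entry) = @_;
    my $i = 0;
    $i++ while $i < @stack && $stack[$i][0] <= $entry->[0];
    splice @stack, $i, 0, $entry;
}

# Read the next line from one source, if any, and put it on the stack.
sub fetch {
    my ($src) = @_;
    my $line = shift @{ $sources[$src] };
    return unless defined $line;
    insert_sorted([ leading_number($line), $line, $src ]);
}

# Prime the stack with the first line of every source.
fetch($_) for 0 .. $#sources;

# Return the line with the lowest index, then replenish from its source.
sub next_line_sketch {
    my $top = shift @stack or return;
    fetch($top->[2]);
    return $top->[1];
}

my @merged;
while (defined(my $line = next_line_sketch())) {
    push @merged, $line;
}
print "$_\n" for @merged;
```

Because every source is already ordered, only one pending line per source needs to sit on the stack at any time.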

Additional Notes:

- By default a single file is read until its index is no longer the lowest value.

- If the file ends in .z or .gz then the file is opened with IO::Zlib instead.
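The extension rule in the notes above could look something like the sketch below. The helper names handle_class and open_for_reading are hypothetical, and the case-sensitive match on the suffix is an assumption; the module's own test may differ.

```perl
use strict;
use warnings;
use IO::File;

# Pick the handle class from the file name (assumed case-sensitive).
sub handle_class {
    my ($path) = @_;
    return $path =~ /\.(?:z|gz)$/ ? 'IO::Zlib' : 'IO::File';
}

# Open a file for reading with whichever class applies.
# Both constructors return undef if the file cannot be opened.
sub open_for_reading {
    my ($path) = @_;
    if (handle_class($path) eq 'IO::Zlib') {
        require IO::Zlib;    # loaded only when a compressed file is seen
        return IO::Zlib->new($path, 'rb');
    }
    return IO::File->new($path, 'r');
}

print "$_ => ", handle_class($_), "\n" for 'app.log', 'app.log.gz';
```

Either kind of handle can then be read line by line with getline.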

EXAMPLE

   # This program looks at the files found
   # in /logfiles and returns their records
   # sorted by date. Each record carries a
   # date in mm-dd-yyyy format.

   use File::Recurse;
   use File::FlexSort;

   my @files;
   recurse { push(@files, $_) if -f $_ } "/logfiles";   # keep plain files only

   my $fs = new File::FlexSort(\@files, \&index_sub);
	
   while (my $line = $fs->next_line) {
      # ... some operations on $line ...
   }


   sub index_sub {

      # Use this to extract a date of
      # the form mm-dd-yyyy.

      my $line = shift;

      # Be cautious that only the date will be
      # extracted. Returning 0 for lines without
      # a date is an assumption of this example.
      my ($mm, $dd, $yyyy) = $line =~ /(\d{2})-(\d{2})-(\d{4})/
         or return 0;

      return "$yyyy$mm$dd";   # Index is an integer, yyyymmdd;
                              # lower numbers will be read first.

   }
	

TODO

Install a generic comparison function rather than relying on <.

EXPORT

None by default.

AUTHOR

Chris Brown, <chris.brown@alum.calberkeley.edu>

Copyright(c) 2001 Christopher Brown. All rights reserved. This program is free software; you can redistribute it and/or modify it under the terms of the License, distributed with PERL or until I say otherwise. Not intended for evil purposes. Yadda, yadda, yadda ...

SEE ALSO

perl, IO::File, IO::Zlib, Compress::Zlib.