NAME
Iterator::Merger - an iterator to efficiently mergesort iterators
SYNOPSIS
use Iterator::Merger qw(imerge);
# $ite1 and $ite2 are iterators generating sorted output
# These can be code references or filehandlers
my $merger = imerge($ite1, $ite2);
while (my $next = $merger->()) {
print $next, "\n";
}
# example with two files and a sub:
open(my $fh1, '<', 'foo.sorted') or die $!;
open(my $fh2, '<', 'bar.sorted') or die $!;
my $sub = do {
my $i = 0;
sub {
++$i<100_000_000 && sprintf("%08d", $i)
}
};
my $merger = imerge($fh1, $fh2, $sub);
while (my $next = $merger->()) {
print $next, "\n";
}
DESCRIPTION
Iterator::Merger
generates the fastest possible iterator to merge sorted iterators (merge sort).
Source iterators can be code references or filehandlers, and must produce an ascending order sorted output. The generated iterator is always a code reference.
See the IMPLEMENTATION section for details.
- imerge ITERATORS
-
ITERATORS is a list of either CODE or GLOB references that must generate an ascending sorted output. This function returns a CODE reference that generates the sort-merged output of all ITERATORS. Sorting is done using the
lt
comparison operator.CODE iterators are evaluated in scalar context until undef is returned.
GLOB iterators are evaluated in scalar context, relying on the $/ separator, until EOF.
- imerge_num ITERATORS
-
Same as
imerge
, but sorting is done using the<
comparison operator. - imerge_raw ITERATORS
-
Same as
imerge
, but no sorting is done, nor expected in source iterators. Iteratos are read in order until undef is returned or EOF is reached.
EXPORTS
Nothing by default. Functions can be imported explicitly
use Iterator::Merger qw(imerge imerge_num imerge_raw);
IMPLEMENTATION
imerge
and imerge_num
are very fast as the Perl comparison code is generated on the fly, but the price to pay is that any first call with a given number of iterators will take some time and memory to generate and evaluate the code. Subsequent calls with the same number of source iterators will reuse the same code.
The cost of this overhead has to be evaluated in comparison to the number of sorted elements to merge.
The size of the generated code doubles for each additional source iterator, up to a limit.
For example the comparison code for 9 iterators represents about 15KiB of minified Perl code.
When the number of source iterators exceeds 9, imerge
and imerge_num
resort to using the very fast Array::Heap module, when present. This limit was empirically chosen as a safe point where using a heap is almost as efficient as using "brute-force" generated comparison, without any generation overhead.
When Array::Heap is not present, this limit is raised to 12.
This limit can be changed using the $Iterator::Merger::Max_generate variable.
SEE ALSO
Sort::Merge, generally much slower but can be an interesting alternative when a small number of elements is to be merged, as it does not have any generation overhead.
AUTHOR
Thomas Drugeon, <tdrugeon@cpan.org>
COPYRIGHT AND LICENSE
Copyright (C) 2008-2021 by Thomas Drugeon
This library is free software; you can redistribute it and/or modify it under the same terms as Perl itself, either Perl version 5.32 or, at your option, any later version of Perl 5 you may have available.