NAME
Iterator::Merger - an iterator to efficiently merge sorted iterators
SYNOPSIS
use Iterator::Merger qw(imerge);
# $ite1 and $ite2 are iterators generating sorted output
# These can be code references or filehandlers
my $merger = imerge($ite1, $ite2);
while (my $next = $merger->()) {
print $next, "\n";
}
# example with two files and a sub:
open(my $fh1, '<', 'foo.sorted') or die $!;
open(my $fh2, '<', 'bar.sorted') or die $!;
my $sub = do {
my $i = 0;
sub {
++$i<100_000_000 && sprintf("%08d", $i)
}
};
my $merger = imerge($fh1, $fh2, $sub);
while (my $next = $merger->()) {
print $next, "\n";
}
DESCRIPTION
Iterator::Merger
generates the fastest possible iterator to merge sorted iterators (merge sort). Source iterators can be code references or filehandlers, and must produce a sorted output. The generated iterator is always a code reference.
See the IMPLEMENTATION section for more details.
- imerge ITERATORS
-
ITERATORS is a list of either CODE or GLOB references that must generate an ascending sorted output. This function returns a CODE reference that generates the sort-merged output of all ITERATORS. Sorting is done using the cmp comparison operator.
CODE iterators are evaluated in scalar context until undef is returned.
GLOB iterators are evaluated in scalar context, relying on the $/ separator, until EOF.
- imerge_num ITERATORS
-
Same as
imerge
, but sorting is done using the <=> comparison operator. - imerge_raw ITERATORS
-
Same as
imerge
, but no sorting is done, nor expected in source iterators. Iteratos are read in order until undef is returned or EOF is reached.
EXPORTS
Nothing by default. Functions can be imported explicitely
use Iterator::Merger qw(imerge imerge_num imerge_raw);
IMPLEMENTATION
imerge
and imerge_num
are very fast as the PErl comparison code is generated on the fly, but the price to pay is that any first call with a given number of iterators will take some time and memory to generate and evaluate the code. Subsequent calls with the same number of source iterators will reuse the same code.
The cost of this overhead has to be evaluated in comparison to the number of sorted elements to merge.
The size of the generated code doubles for each additionnal source iterator, up to a limit.
For example the comparison code for 9 iterators represents about 15KiB of minified Perl code.
When the number of source iterators exceeds 9, imerge
and imerge_num
ressort to using the very fast Array::Heap module. This limit was empirically chosen as a safe point where using a heap is almost as efficient as using "brute-force" generated comparison, without any generation overhead.
This limit can be changed using the $Iterator::Merger::Max_generate variable.
TODO
XS version with on the fly C code generation?
SEE ALSO
Sort::Merge, generally much slower but can be an interesting alternative when a small number of elements is to be merged, as it does not have any generation overhead.
AUTHOR
Thomas Drugeon, <tdrugeon@cpan.org>
COPYRIGHT AND LICENSE
Copyright (C) 2021 by Thomas Drugeon
This library is free software; you can redistribute it and/or modify it under the same terms as Perl itself, either Perl version 5.32 or, at your option, any later version of Perl 5 you may have available.
2 POD Errors
The following errors were encountered while parsing the POD:
- Around line 65:
You forgot a '=back' before '=head1'
- Around line 72:
=back without =over