NAME

Proc::ParallelLoop - Parallel looping constructs for Perl programs

SYNOPSIS

use Proc::ParallelLoop;

pardo sub{loop_test}, sub{loop_update}, sub{
   loop_body
};

pareach array_ref, sub{
   loop_body
};

DESCRIPTION

This module provides an easy way to write for loops and foreach loops that run with a controlled degree of parallelism. One very nice feature is that output is buffered when necessary, so that what appears on STDOUT and STDERR looks exactly as if your subroutine had been run on each parameter in plain old sequential fashion. The return status of each loop iteration is also preserved.
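The ordered-output behaviour described above can be illustrated with a minimal core-Perl sketch. It assumes a Unix-like system with fork and pipe, and the helper name ordered_output is hypothetical; it is not part of Proc::ParallelLoop's interface. Each iteration runs in a child whose STDOUT is redirected into a pipe, and the parent reads the pipes in iteration order, so the text comes out in loop order even when later iterations finish first. Only STDOUT is captured, to keep the example short.

```perl
use strict;
use warnings;
use IO::Handle;

# Hypothetical helper (not part of Proc::ParallelLoop): run $body->($item)
# for each item in its own child process, capture each child's STDOUT
# through a pipe, and return the captured text in iteration order rather
# than in the order the children happen to finish.
sub ordered_output {
    my ($items, $body) = @_;
    my @children;                          # [pid, read handle] per iteration
    for my $item (@$items) {
        pipe(my $r, my $w) or die "pipe: $!";
        STDOUT->flush;                     # don't hand unflushed output to the child
        my $pid = fork() // die "fork: $!";
        if ($pid == 0) {                   # child
            close $r;
            open(STDOUT, '>&', $w) or die "dup: $!";
            $body->($item);
            close STDOUT;                  # flush buffered output into the pipe
            exit 0;
        }
        close $w;                          # parent keeps only the read end
        push @children, [$pid, $r];
    }
    my $out = '';
    for my $child (@children) {            # read pipes in iteration order
        my ($pid, $r) = @$child;
        my $text = do { local $/; <$r> };  # slurp until the child closes the pipe
        $out .= $text // '';
        close $r;
        waitpid($pid, 0);
    }
    return $out;
}
```

This sketch starts every child at once and does not cap parallelism; limiting the number of concurrent workers is a separate concern.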

USAGE

The degree of parallelism defaults to 5: no more than that many subprocesses are allowed to run at any time. The default can be overridden by passing a hash reference, {"Max_Workers"=>n}, as an extra argument after the loop body.
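To make the worker limit concrete, here is a hedged sketch of one way to cap the number of live subprocesses using only fork and waitpid. The function run_bounded and its return value (the peak number of outstanding children) are hypothetical illustrations, not the module's implementation; the default of 5 simply mirrors the documented default.

```perl
use strict;
use warnings;

# Hypothetical sketch (not the module's code): run $body->($item) in child
# processes while never tracking more than $max_workers live children.
# Returns the largest number of children that were outstanding at once.
sub run_bounded {
    my ($items, $body, $max_workers) = @_;
    $max_workers //= 5;                    # mirror the documented default
    my %running;                           # pid => 1 for each unreaped child
    my $peak = 0;
    for my $item (@$items) {
        # At capacity: block until any one child exits, then reuse its slot.
        while (scalar(keys %running) >= $max_workers) {
            my $pid = waitpid(-1, 0);
            die "no child to reap" if $pid == -1;
            delete $running{$pid};
        }
        my $pid = fork() // die "fork: $!";
        if ($pid == 0) {                   # child: run one iteration
            $body->($item);
            exit 0;
        }
        $running{$pid} = 1;
        my $n = scalar keys %running;
        $peak = $n if $n > $peak;
    }
    waitpid($_, 0) for keys %running;      # reap the stragglers
    return $peak;
}
```

Reaping "any one" child with waitpid(-1, 0) frees a slot as soon as the fastest worker finishes, rather than waiting for the oldest one.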

There are two interfaces to this package: pardo and pareach. The first approximates the semantics of a typical for loop. pareach is more like a typical foreach loop in Perl. (Actually, for and foreach are synonyms in Perl, so I emphasize "typical" because they're usually used as if they have different semantics.)

LOOP CONTROL

The Perl keywords "next" and "last" do not work inside a pardo loop. However, you can simulate a "next" statement by using a "return" or "exit n" statement instead. "exit n" ends the current loop iteration, and the integer value n is preserved for possible use outside the pardo loop. "return" has the same effect as "exit 0".
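The reason "exit n" can carry a value out of an iteration is that each iteration runs in a separate process, and a parent can read a child's exit value from $? after reaping it. A minimal sketch, assuming fork is available; child_status is a hypothetical name:

```perl
use strict;
use warnings;

# Sketch: a loop body run in a child process can report an integer with
# "exit n"; the parent recovers it from $? after waitpid.
sub child_status {
    my ($body) = @_;
    my $pid = fork() // die "fork: $!";
    if ($pid == 0) {
        $body->();
        exit 0;            # falling off the end (or "return") => status 0
    }
    waitpid($pid, 0);
    return $? >> 8;        # the high byte of $? holds the child's exit value
}
```

Because the value travels as a process exit status, only integers from 0 to 255 survive the trip.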

There is no approximation for "last", since it would not really make sense in the context of parallel loop iterations.

BUGS

Signal handlers in Perl are documented to be unreliable. Proc::ParallelLoop avoids relying on signals by making the assumption that a child process closing its output descriptors means the child is finished, and that an IO event will be observable via select when this happens. It remains to be seen whether this will turn out to be a more reliable approach, though it seems to be holding up so far.
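The approach described above, treating end-of-file on a child's output pipe as the sign that the child has finished, can be sketched with IO::Select from the Perl core. The function wait_via_eof is hypothetical and handles a single child; it is meant only to show the mechanism, not to reproduce the module's code.

```perl
use strict;
use warnings;
use IO::Select;

# Sketch: no SIGCHLD handler is installed; EOF on the child's output pipe
# is taken to mean "child finished", and only then is the child reaped.
sub wait_via_eof {
    pipe(my $r, my $w) or die "pipe: $!";
    my $pid = fork() // die "fork: $!";
    if ($pid == 0) {
        close $r;
        syswrite($w, "done\n");
        exit 0;            # exiting closes $w, which the parent sees as EOF
    }
    close $w;              # parent must drop its own copy of the write end
    my $sel = IO::Select->new($r);
    my $buf = '';
    while ($sel->can_read) {
        my $n = sysread($r, $buf, 4096, length $buf);
        die "sysread: $!" unless defined $n;
        last if $n == 0;   # EOF: every writer has closed the pipe
    }
    close $r;
    waitpid($pid, 0);      # reap the (now finished) child
    return $buf;
}
```

One caveat of this technique: if the child starts its own subprocesses that inherit the write end of the pipe, the parent will not see EOF until those processes close it too.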

AUTHOR

Byron C. Darrah
bdarrah@pacbell.net

COPYRIGHT

Copyright (c) 2002 Byron C. Darrah. All rights reserved. This program is free software; you can redistribute it and/or modify it under the same terms as Perl itself.

The full text of the license can be found in the LICENSE file included with this module.

SEE ALSO

perl(1).

PUBLIC METHODS

You pass pardo() three args: a loop test, an update function, and a loop body. It behaves mostly like a for loop, but be careful that your loop test and update functions don't assume sequential execution.

For example:

for (my $i=0; $i<100; $i++) {
   ...
}

can be parallelized as:

{ my $i=0; pardo sub{ $i<100 }, sub{ $i++ }, sub{
   ...
};}
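The warning about sequential assumptions follows from where each piece runs. Assuming, as the notes under init_state suggest, that each loop body executes in its own subprocess, a hypothetical mini_pardo might look like the sketch below: the test and update closures run in the parent, while each body runs in a child forked with a snapshot of $i. This is an illustration of the semantics, not the module's implementation, and it does not limit parallelism.

```perl
use strict;
use warnings;

# Hypothetical sketch of pardo-like semantics (not the module's code): the
# loop test and update run in the parent; each body runs in a child forked
# at that moment, so it sees a snapshot of the loop variables and any
# changes it makes stay inside the child.
sub mini_pardo {
    my ($test, $update, $body) = @_;
    my @pids;
    for (; $test->(); $update->()) {
        my $pid = fork() // die "fork: $!";
        if ($pid == 0) {
            $body->();
            exit 0;
        }
        push @pids, $pid;
    }
    waitpid($_, 0) for @pids;
    return scalar @pids;       # number of iterations dispatched
}
```

Changes a body makes to shared variables such as $i stay inside its child, so the parent's loop test and update never observe them.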

You pass pareach() two args: a reference to an array of parameters and a subroutine reference. The subroutine will be called once for each item in the array, with the item passed as its argument.

For example:

foreach my $i ( @stuff ) {
   ...
}

can be parallelized as:

pareach [ @stuff ], sub{
   my $i=shift;
   ...
};

Both pardo and pareach return a list containing the return status of each iteration of the loop body, in the same order as if the loop had been executed sequentially.
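Reporting statuses in sequential order does not require the children to finish in order. One way to get this, sketched below with the hypothetical helper statuses_in_order, is to remember each child's pid in start order and reap them in that same order with waitpid. This is an assumption about technique, not a description of the module's internals.

```perl
use strict;
use warnings;

# Hypothetical sketch: fork one child per item, then wait for the children
# by pid in the order they were started, so the returned exit statuses
# follow iteration order no matter which child finished first.
sub statuses_in_order {
    my ($items, $body) = @_;
    my @pids;
    for my $item (@$items) {
        my $pid = fork() // die "fork: $!";
        if ($pid == 0) {
            $body->($item);
            exit 0;
        }
        push @pids, $pid;
    }
    my @status;
    for my $pid (@pids) {
        waitpid($pid, 0);
        push @status, $? >> 8;     # exit value of this iteration
    }
    return @status;
}
```

Waiting on a specific pid blocks only until that child exits; children that already finished are simply reaped immediately when their turn comes.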

PRIVATE METHODS

And of course, here are all the methods you should never call.

wait_for_all_jobs_to_finish

Usage     : wait_for_all_jobs_to_finish()
Purpose   : Wait for pending jobs to finish.
Returns   : N/A.
Argument  : None.
Throws    : No exceptions.
Comments  : Call this just before returning from a pardo-like loop.

init_state

Usage     : init_state()
Purpose   : Initialize global loop state.
Returns   : N/A.
Argument  : None.
Throws    : No exceptions.
Comments  : Note that even though pardo loops may nest, or be used by
          : modules that know nothing of each other, it is safe to
          : use global variables to store the loop state, because:
          :    1.  pardo is a synchronous function which does not
          :        return until it no longer needs the state
          :        information.
          :    2.  Child processes do not depend on the state
          :        variables.
          :    3.  ParallelLoop is not recursive and even if the outer
          :        program calling it is, each pardo task executes in an
          :        isolated subprocess.
          : Of course, pardo is not re-entrant or thread-safe, but if
          : you are doing anything in Perl that could try to invoke
          : pardo from a signal handler or a (non-process) thread,
          : you probably need to see the BOFH about increasing your
          : disk quota.

dispatch

Usage     : dispatch($subroutine, $parm)
Purpose   : Assign a worker process to execute a loop body.
Returns   : N/A.
Argument  : A subroutine representing a loop body, and a parameter to
          : be passed to the loop body as $_[0].
Throws    : No exceptions.
Comments  : If a loop body throws an exception, it will go uncaught.

wait_for_available_queue

Usage     : wait_for_available_queue($slots)
Purpose   : Sleep until we are allowed to start a new subprocess.
Returns   : N/A.
Argument  : Number of queue slots that must be available before returning.
Throws    : No exceptions.

check_for_death

Usage     : check_for_death()
Purpose   : Nonblocking check and handling for death of any worker
          : process.
Returns   : N/A.
Argument  : None.
Throws    : No exceptions.

See Also : waitpid
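A nonblocking reap of the kind check_for_death describes is commonly done with waitpid and the WNOHANG flag from the POSIX module. This is a minimal sketch, not the module's code; reap_finished is a hypothetical name, and it returns a hash of pid => exit value for every child that has already exited.

```perl
use strict;
use warnings;
use POSIX ':sys_wait_h';   # WNOHANG

# Sketch of a nonblocking reap: collect every child that has already
# exited without waiting for ones still running. waitpid returns 0 when
# children exist but none has exited, and -1 when there are no children.
sub reap_finished {
    my %done;
    while ((my $pid = waitpid(-1, WNOHANG)) > 0) {
        $done{$pid} = $? >> 8;
    }
    return %done;
}
```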

handle_event

Usage     : handle_event()
Purpose   : Wait for a child to die or for output to be available.
Returns   : N/A.
Argument  : None.
Throws    : No exceptions.
Comments  : Makes the assumption that child process death will cause
          : an IO event on that process's output descriptors.

cleanup_worker

Usage     : cleanup_worker($worker_index)
Purpose   : Clean up after a worker has been reaped.
Returns   : N/A.
Argument  : The worker's index into Proc_Order and the other hashes.
Throws    : No exceptions.

reclaim_worker_io

Usage     : reclaim_worker_io($worker_index)
Purpose   : Reclaim resources no longer needed for a long-dead worker process.
Returns   : N/A.
Argument  : Index of the dead worker in Proc_Order and other hashes.
Throws    : No exceptions.

gather_all_output

Usage     : gather_all_output()
Purpose   : Gather any output that may have been produced by child
          : processes and flush the output buffers of the current
          : process.
Returns   : N/A.
Argument  : None.
Throws    : No exceptions.

gather_proc_output

Usage     : gather_proc_output($worker_index)
Purpose   : Collect error and standard output from a worker process.
Returns   : N/A.
Argument  : Index of a worker process in Proc_Order and other hashes.
Throws    : No exceptions.

make_names

Usage     : make_names()
Purpose   : Make up some names for use as file handles.
Returns   : A list of three names.
Argument  : None.
Throws    : No exceptions.
Comments  : Reuse reclaimed names when possible, so we don't bloat
          : the symbol table needlessly.
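The name-reuse idea behind make_names can be sketched as a small pool in which fresh names are created only when no reclaimed ones are available. The functions make_names_sketch and reclaim_names_sketch and the FH_n_k naming scheme are hypothetical; the module's actual handle names are not documented here.

```perl
use strict;
use warnings;

# Hypothetical name pool: hand out three handle names at a time, preferring
# names returned via reclaim_names_sketch() over generating new ones, so
# the set of distinct names (and symbol-table entries) stays small.
{
    my @free;          # reclaimed names waiting to be reused
    my $counter = 0;   # suffix used to build fresh, unique names

    sub make_names_sketch {
        return splice(@free, 0, 3) if @free >= 3;
        $counter++;
        return map { "FH_${counter}_$_" } 0 .. 2;
    }

    sub reclaim_names_sketch {
        push @free, @_;
    }
}
```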