NAME
Parallel::Forker - Parallel job forking and management
SYNOPSIS
use Parallel::Forker;
use Time::HiRes qw(usleep); # For the usleep call in the poll loop below
$Fork = new Parallel::Forker (use_sig_child=>1);
$SIG{CHLD} = sub { Parallel::Forker::sig_child($Fork); };
$SIG{TERM} = sub { $Fork->kill_tree_all('TERM') if $Fork && $Fork->in_parent; die "Quitting...\n"; };
$Fork->schedule
(run_on_start => sub {print "child work here...";},
# run_on_start => \&child_subroutine, # Alternative: call a named sub.
run_on_finish => sub {print "parent cleanup here...";},
)->run();
$Fork->wait_all(); # Wait for all children to finish
# More processes
my $p1 = $Fork->schedule(...)->ready();
my $p2 = $Fork->schedule(..., run_after=>[$p1])->ready();
$Fork->wait_all(); # p1 will complete before p2 starts
# Other functions
$Fork->poll(); # Service any active children
foreach my $proc ($Fork->running()) { # Loop on each running child
print "Still running: ", $proc->name, "\n";
}
while ($Fork->is_any_left) {
$Fork->poll;
usleep(10*1000);
}
DESCRIPTION
Parallel::Forker manages parallel processes that are either subroutines or system commands. Forker supports most of the features of the other parallel-process packages, and adds the ability to specify complicated expressions that determine which processes run after others, or run when others fail.
Function names are loosely based on Parallel::ForkManager.
The unique property of Parallel::Forker is the ability to schedule processes based on expressions that are specified when the processes are defined. For example:
my $p1 = $Fork->schedule(..., label=>'p1');
my $p2 = $Fork->schedule(..., label=>'p2');
my $p3 = $Fork->schedule(..., run_after => ["p1 | p2"]);
my $p4 = $Fork->schedule(..., run_after => ["p1 & !p2"]);
Process p3 is specified to run after process p1 *or* p2 has completed successfully. Process p4 will run after p1 has finished successfully and p2 has completed with a bad exit status.
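The expressions above can be tried out in a complete script. The following is a minimal sketch, not taken from the distribution: the add_job helper and the %result hash are hypothetical names introduced for illustration, and the manager setup is copied from the SYNOPSIS. Each child simply exits with a chosen code, so p1 succeeds and p2 fails.

```perl
use strict;
use warnings;
use Parallel::Forker;

# Manager setup as in the SYNOPSIS.
my $Fork = Parallel::Forker->new(use_sig_child => 1);
$SIG{CHLD} = sub { Parallel::Forker::sig_child($Fork); };
$SIG{TERM} = sub { $Fork->kill_tree_all('TERM') if $Fork && $Fork->in_parent; die "Quitting...\n"; };

my %result;    # label => 'ok' or 'failed', filled in by the parent

# Hypothetical helper: schedule a child that exits with $exit_code.
sub add_job {
    my ($label, $exit_code, @run_after) = @_;
    return $Fork->schedule(
        label         => $label,
        run_on_start  => sub { exit($exit_code); },    # runs in the child
        run_on_finish => sub {                         # runs in the parent
            my ($proc, $status) = @_;
            $result{$label} = ($status == 0) ? 'ok' : 'failed';
        },
        (@run_after ? (run_after => [@run_after]) : ()),
    );
}

add_job('p1', 0);                 # succeeds
add_job('p2', 1);                 # fails with a bad exit status
add_job('p3', 0, 'p1 | p2');      # after p1 or p2 succeeds
add_job('p4', 0, 'p1 & !p2');     # after p1 succeeds and p2 fails

$Fork->ready_all;                 # mark every process ready for scheduling
$Fork->wait_all;                  # poll until nothing is running or runnable

print "$_: $result{$_}\n" for sort keys %result;
```

Under the semantics described above, all four processes should be run, with only p2 reported as failed. The order in which p1 and p2 finish is not fixed, which is why the results are printed sorted by label.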
For more examples, see the tests.
METHODS
- $self->find_proc_name (<name>)
-
Returns one or more Parallel::Forker::Process objects for the given name (one object returned) or label (one or more objects returned). Returns undef if no processes are found.
- $self->in_parent
-
Return true if and only if called from the parent process (the one that created the Forker object).
- $self->is_any_left
-
Return true if any processes are running, or runnable (need to run).
- $self->kill_all (<signal>)
-
Send a signal to all running children. You probably want to call this only from the parent process that created the Parallel::Forker object; to do so, wrap the call in "if ($self->in_parent)".
- $self->kill_tree_all (<signal>)
-
Send a signal to all running children and their subchildren.
- $self->max_proc (<number>)
-
Specify the maximum number of processes that the poll method will run at any one time. Defaults to undef, which runs all possible jobs at once. Max_proc takes effect when you schedule processes and mark them "ready," then rely on Parallel::Forker's poll method to move the processes from the ready state to the run state. (You should not call ->run yourself, as this starts a new process immediately, ignoring max_proc.)
- $self->new (<parameters>)
-
Create a new manager object. There may be more than one manager in any application, but applications taking advantage of the sig_child handler should call every manager's sig_child method in the application's SIGCHLD handler.
Parameters are passed by name, as with use_sig_child in the SYNOPSIS.
- $self->poll
-
See if any children need work, and service them. Start up to max_proc processes that are "ready" by calling their run method. Non-blocking; always returns immediately.
- $self->process (<process_name>)
-
Return Parallel::Forker::Process object for the specified process name, or undef if none is found. See also find_proc_name.
- $self->processes
-
Return Parallel::Forker::Process objects for all processes.
- $self->processes_sorted
-
Return Parallel::Forker::Process objects for all processes, sorted by name.
- $self->ready_all
-
Mark all processes as ready for scheduling.
- $self->reap_processes
-
Reap all processes that have no other processes waiting for them and that are in the is_done or is_parerr state. Returns the list of processes reaped. This reclaims memory when a large number of processes are being created, run, and destroyed.
- $self->running
-
Return Parallel::Forker::Process objects for all processes that are currently running.
- $self->schedule (<parameters>)
-
Register a new process, perhaps for later running. Returns a Parallel::Forker::Process object. Parameters are passed by name as follows:
- label
-
Optional name to use in run_after commands. Unlike name, this may be reused, in which case run_after will wait on all commands with the given label. Labels must contain only [a-zA-Z0-9_].
- name
-
Optional name to use in run_after commands. Note that names MUST be unique! When not specified, a unique number will be assigned automatically.
- run_on_start
-
Subroutine reference to execute when the job begins, in the forked process. The subroutine is called with one argument, a reference to the Parallel::Forker::Process that is starting.
If your callback is going to fork, you'd be advised to have the child:
$SIG{ALRM} = 'DEFAULT'; $SIG{CHLD} = 'DEFAULT';
This will prevent the child from inheriting the parent's handlers, and possibly confusing any child calls to waitpid.
- run_on_finish
-
Subroutine reference to execute when the job ends, in the master process. The subroutine is called with two arguments, a reference to the Parallel::Forker::Process that is finishing, and the exit status of the child process. Note the exit status will only be correct if a CHLD signal handler is installed.
- run_after
-
A list reference of processes that must be completed before this process can be runnable. You may pass a process object (from schedule), a process name, or a process label. You may use "|" or "&" in a string to run this process after ANY of the processes exit, or after ALL of them exit (the default). A "!" in front of a process name indicates to run if that process fails with a bad exit status. A "^" in front of a process name indicates to run if that process succeeds OR fails.
- $self->sig_child
-
Must be called in a $SIG{CHLD} handler by the parent process if use_sig_child was called with a "true" value. If there are multiple Parallel::Forker objects, each of their sig_child methods must be called in the $SIG{CHLD} handler.
- $self->state_stats
-
Return hash containing statistics with keys of state names, and values with number of processes in each state.
- $self->use_sig_child ( 0 | 1 )
-
This should always be called with a 0 or 1. If you install a $SIG{CHLD} handler which calls your Parallel::Forker object's sig_child method, you should also turn on use_sig_child, by calling it with a "true" argument. Then, calls to poll() will do less work when there are no child processes to be reaped. If not using the handler, call with 0 to prevent a warning.
- $self->wait_all
-
Wait until there are no running or runnable jobs left.
- $self->write_tree (filename => <filename>)
-
Print a dump of the execution tree.
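As an illustration of how max_proc, ready, poll, is_any_left and running fit together, here is a minimal sketch. The job count, the sleep times, and the $finished and $peak counters are assumptions made for the example; the manager setup is copied from the SYNOPSIS. Processes are marked ready rather than run directly, so that poll can apply the max_proc limit.

```perl
use strict;
use warnings;
use Parallel::Forker;
use Time::HiRes qw(usleep);

# Manager setup as in the SYNOPSIS.
my $Fork = Parallel::Forker->new(use_sig_child => 1);
$SIG{CHLD} = sub { Parallel::Forker::sig_child($Fork); };
$SIG{TERM} = sub { $Fork->kill_tree_all('TERM') if $Fork && $Fork->in_parent; die "Quitting...\n"; };

$Fork->max_proc(2);    # let poll start at most two children at a time

my $finished = 0;      # completions, counted in the parent
my $peak     = 0;      # largest number of children seen running together

for my $n (1 .. 5) {
    $Fork->schedule(
        run_on_start  => sub { usleep(200_000); },    # child: pretend to work
        run_on_finish => sub { $finished++; },        # parent: count completions
    )->ready();                                       # ready(), not run()
}

# Same shape as the poll loop in the SYNOPSIS, with a running-count probe.
while ($Fork->is_any_left) {
    $Fork->poll;
    my @running = $Fork->running;
    $peak = @running if @running > $peak;
    usleep(10_000);
}

print "finished=$finished peak=$peak\n";
```

If the limit is honored as documented, the peak should not exceed 2 while all five jobs still complete. Calling run on each process instead of ready would start all five immediately, since run ignores max_proc.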
DISTRIBUTION
The latest version is available from CPAN and from http://www.veripool.org/.
Copyright 2002-2010 by Wilson Snyder. This package is free software; you can redistribute it and/or modify it under the terms of either the GNU Lesser General Public License Version 3 or the Perl Artistic License Version 2.0.
AUTHORS
Wilson Snyder <wsnyder@wsnyder.org>