NAME

Parallel::Forker - Parallel job forking and management

SYNOPSIS

use Parallel::Forker;
my $Fork = Parallel::Forker->new(use_sig_child=>1);
$SIG{CHLD} = sub { Parallel::Forker::sig_child($Fork); };
$SIG{TERM} = sub { $Fork->kill_tree_all('TERM') if $Fork && $Fork->in_parent; die "Quitting...\n"; };

$Fork->schedule
   (run_on_start => sub {print "child work here...";},
    # run_on_start => \&child_subroutine,  # Alternative: call a named sub.
    run_on_finish => sub {print "parent cleanup here...";},
    )->run;

$Fork->wait_all;   # Wait for all children to finish

# More processes
my $p1 = $Fork->schedule(...)->ready;
my $p2 = $Fork->schedule(..., run_after=>[$p1])->ready;
$Fork->wait_all;   # p1 will complete before p2 starts

# Other functions
$Fork->poll;       # Service any active children
foreach my $proc ($Fork->running) {  # Loop on each running child
    print "Still running: ", $proc->name, "\n";
}

use Time::HiRes qw(usleep);  # Provides usleep
while ($Fork->is_any_left) {
    $Fork->poll;
    usleep(10*1000);
}

DESCRIPTION

Parallel::Forker manages parallel processes that are either subroutines or system commands. Forker supports most of the features of the other parallel-process packages, and adds the ability to specify complicated expressions that determine which processes run after others, or run when others fail.

Function names are loosely based on Parallel::ForkManager.

The unique property of Parallel::Forker is the ability to schedule processes based on expressions that are specified when the processes are defined. For example:

my $p1 = $Fork->schedule(..., label=>'p1');
my $p2 = $Fork->schedule(..., label=>'p2');
my $p3 = $Fork->schedule(..., run_after => ["p1 | p2"]);
my $p4 = $Fork->schedule(..., run_after => ["p1 & !p2"]);

Process p3 runs after either p1 *or* p2 has completed successfully. Process p4 runs after p1 has finished successfully and p2 has completed with a bad exit status.
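The example above can be fleshed out into a small runnable sketch. It assumes that both p1 and p2 exit successfully, so p3 should run and p4 should not. The job bodies are empty placeholders, and @finished and $note are hypothetical helpers used only to record which jobs completed.

```perl
use strict;
use warnings;
use Parallel::Forker;

# use_sig_child => 0 because this sketch installs no $SIG{CHLD} handler.
my $Fork = Parallel::Forker->new(use_sig_child => 0);

# Hypothetical helper: run_on_finish runs in the parent, so it can record
# which jobs actually ran to completion.
my @finished;
my $note = sub { my ($id) = @_; return sub { push @finished, $id } };

my $work = sub { };   # Placeholder child work; the child then exits with 0.

$Fork->schedule(label => 'p1', run_on_start => $work,
                run_on_finish => $note->('p1'));
$Fork->schedule(label => 'p2', run_on_start => $work,
                run_on_finish => $note->('p2'));
$Fork->schedule(run_after => ["p1 | p2"], run_on_start => $work,
                run_on_finish => $note->('p3'));
$Fork->schedule(run_after => ["p1 & !p2"], run_on_start => $work,
                run_on_finish => $note->('p4'));

$Fork->ready_all;   # Mark every scheduled process as ready
$Fork->wait_all;    # Poll until nothing is running or runnable

print "Finished: @finished\n";
```

Because p2 succeeds here, the condition for p4 can never become true, and wait_all returns without running it.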

For more examples, see the tests.

METHODS

$self->find_proc_name(<name>)

Returns one or more Parallel::Forker::Process objects for the given name (one object returned) or label (one or more objects returned). Returns undef if no processes are found.

$self->in_parent

Return true if and only if called from the parent process (the one that created the Forker object).

$self->is_any_left

Return true if any processes are running, or runnable (need to run).

$self->kill_all(<signal>)

Send a signal to all running children. You probably want to call this only from the parent process that created the Parallel::Forker object, so wrap the call in "if ($self->in_parent)".

$self->kill_tree_all(<signal>)

Send a signal to all running children and their subchildren.

$self->poll_interval(<usec>)

Set the time in microseconds between polls when using wait_all. The default is 100000 usec (100 milliseconds); smaller values may improve performance when jobs complete quickly.

$self->max_proc(<number>)

Specify the maximum number of processes that the poll method will run at any one time. Defaults to undef, which runs all possible jobs at once. Max_proc takes effect when you schedule processes and mark them "ready," then rely on Parallel::Forker's poll method to move the processes from the ready state to the run state. (You should not call ->run yourself, as this starts a new process immediately, ignoring max_proc.)
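As a rough illustration of max_proc, the sketch below schedules four jobs with a limit of two and marks each one ready, leaving poll (called from wait_all) to start them. The counters $running, $peak, and $finished are hypothetical bookkeeping, not part of the module; they rely on run_pre_start and run_on_finish both executing in the parent.

```perl
use strict;
use warnings;
use Parallel::Forker;
use Time::HiRes qw(usleep);

# use_sig_child => 0 because this sketch installs no $SIG{CHLD} handler.
my $Fork = Parallel::Forker->new(use_sig_child => 0, max_proc => 2);

# Hypothetical bookkeeping, kept in the parent process.
my ($running, $peak, $finished) = (0, 0, 0);

for my $n (1 .. 4) {
    $Fork->schedule(
        # Runs in the parent just before the fork.
        run_pre_start => sub {
            $running++;
            $peak = $running if $running > $peak;
        },
        # Runs in the child; a short sleep stands in for real work.
        run_on_start  => sub { usleep(200_000) },
        # Runs in the parent once the child has exited.
        run_on_finish => sub { $running--; $finished++ },
    )->ready;   # ready, not run, so that poll can enforce max_proc
}

$Fork->wait_all;
print "finished=$finished peak=$peak\n";
```

Calling ->run instead of ->ready in the loop would start all four children at once, since run bypasses the limit.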

$self->new(<parameters>)

Create a new manager object. There may be more than one manager in any application, but applications taking advantage of the sig_child handler should call every manager's sig_child method in the application's SIGCHLD handler.

Parameters are passed by name as follows:

max_proc => (<number>)

See the max_proc object method.

use_sig_child => ( 0 | 1 )

See the use_sig_child object method. This option must be specified to prevent a warning.

$self->poll

See if any children need work, and service them. Start up to max_proc processes that are "ready" by calling their run method. Non-blocking; always returns immediately.

$self->process(<process_name>)

Return Parallel::Forker::Process object for the specified process name, or undef if none is found. See also find_proc_name.

$self->processes

Return Parallel::Forker::Process objects for all processes.

$self->processes_sorted

Return Parallel::Forker::Process objects for all processes, sorted by name.

$self->ready_all

Mark all processes as ready for scheduling.

$self->reap_processes

Reap all processes that have finished (is_done or is_parerr) and that no other process is waiting on. Returns the list of processes reaped. This reclaims memory when a large number of processes are being created, run, and destroyed.
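A minimal sketch of reaping, assuming three independent jobs with placeholder bodies. The variables @reaped and @left are illustrative only. Since none of the jobs has dependents, all of them are expected to be reapable once wait_all returns.

```perl
use strict;
use warnings;
use Parallel::Forker;

# use_sig_child => 0 because this sketch installs no $SIG{CHLD} handler.
my $Fork = Parallel::Forker->new(use_sig_child => 0);

# Three independent jobs; the empty sub stands in for real child work.
$Fork->schedule(run_on_start => sub { })->ready for 1 .. 3;
$Fork->wait_all;

# Finished processes stay registered until they are reaped.
my @reaped = $Fork->reap_processes;
my @left   = $Fork->processes;

print "reaped ", scalar(@reaped), ", still registered ", scalar(@left), "\n";
```

In a long-running program that keeps scheduling new work, calling reap_processes periodically (for example after each poll) keeps the manager from holding on to every finished process.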

$self->running

Return Parallel::Forker::Process objects for all processes that are currently running.

$self->schedule(<parameters>)

Register a new process, possibly to run later. Returns a Parallel::Forker::Process object. Parameters are passed by name as follows:

label

Optional name to use in run_after commands. Unlike name, this may be reused, in which case run_after will wait on all commands with the given label. Labels must contain only [a-zA-Z0-9_].

name

Optional name to use in run_after commands. Note that names MUST be unique! When not specified, a unique number will be assigned automatically.

run_on_start

Subroutine reference to execute when the job begins, in the forked process. The subroutine is called with one argument, a reference to the Parallel::Forker::Process that is starting.

If your callback is going to fork, the child should first reset the inherited signal handlers:

$SIG{ALRM} = 'DEFAULT';
$SIG{CHLD} = 'DEFAULT';

This prevents the child from inheriting the parent's handlers, which could otherwise confuse any calls to waitpid made in the child.

run_on_finish

Subroutine reference to execute when the job ends, in the master process. The subroutine is called with two arguments, a reference to the Parallel::Forker::Process that is finishing, and the exit status of the child process. Note the exit status will only be correct if a CHLD signal handler is installed.
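The following sketch shows a run_on_finish callback collecting the status of each child. The job names ok_job and bad_job and the %status hash are hypothetical. The sketch only distinguishes zero from nonzero and makes no assumption about how a nonzero status is encoded. A CHLD handler is installed, as in the SYNOPSIS.

```perl
use strict;
use warnings;
use Parallel::Forker;

my $Fork = Parallel::Forker->new(use_sig_child => 1);
$SIG{CHLD} = sub { Parallel::Forker::sig_child($Fork); };

# Hypothetical store, filled in by the parent: process name => status.
my %status;
my $record = sub {
    my ($proc, $status) = @_;   # The two arguments described above
    $status{ $proc->name } = $status;
};

$Fork->schedule(name => 'ok_job',
                run_on_start  => sub { exit(0) },   # Simulated success
                run_on_finish => $record)->run;
$Fork->schedule(name => 'bad_job',
                run_on_start  => sub { exit(3) },   # Simulated failure
                run_on_finish => $record)->run;

$Fork->wait_all;

for my $name (sort keys %status) {
    printf "%s %s\n", $name, $status{$name} == 0 ? "succeeded" : "failed";
}
```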

run_pre_start

Subroutine reference to execute before forking the child, in the master process. The subroutine is called with one argument, a reference to the Parallel::Forker::Process that is starting.

run_after

A list reference of processes that must complete before this process becomes runnable. You may pass a process object (from schedule), a process name, or a process label. Within a string, use "&" to run this process after ALL of the listed processes exit (the default), or "|" to run after ANY of them exits. A "!" in front of a process name means run only if that process fails with a bad exit status. A "^" in front of a process name means run whether that process succeeds or fails.
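To illustrate the "!" and "^" prefixes, the sketch below simulates a failing job and three followers. The names build, deploy, report, and cleanup are hypothetical, as are the @finished array and the $note helper. With build failing, report and cleanup are expected to run while deploy is not.

```perl
use strict;
use warnings;
use Parallel::Forker;

# use_sig_child => 0 because this sketch installs no $SIG{CHLD} handler.
my $Fork = Parallel::Forker->new(use_sig_child => 0);

# Hypothetical helper: record, in the parent, which jobs completed.
my @finished;
my $note = sub { my ($id) = @_; return sub { push @finished, $id } };

$Fork->schedule(name => 'build',
                run_on_start  => sub { exit(1) },   # Simulated failure
                run_on_finish => $note->('build'));

# Runs only if build succeeds.
$Fork->schedule(name => 'deploy', run_after => ['build'],
                run_on_start => sub { }, run_on_finish => $note->('deploy'));

# Runs only if build fails.
$Fork->schedule(name => 'report', run_after => ['!build'],
                run_on_start => sub { }, run_on_finish => $note->('report'));

# Runs whether build succeeds or fails.
$Fork->schedule(name => 'cleanup', run_after => ['^build'],
                run_on_start => sub { }, run_on_finish => $note->('cleanup'));

$Fork->ready_all;
$Fork->wait_all;

print "Finished: @finished\n";
```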

$self->sig_child

Must be called in a $SIG{CHLD} handler by the parent process if use_sig_child was set to a true value. If there are multiple Parallel::Forker objects, each of their sig_child methods must be called in the $SIG{CHLD} handler.
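With more than one manager, a single handler can notify all of them. This is a sketch only: the @managers array and the $done counter are hypothetical, and the job bodies are placeholders.

```perl
use strict;
use warnings;
use Parallel::Forker;

# Two independent managers, both relying on the CHLD handler.
my @managers = (
    Parallel::Forker->new(use_sig_child => 1),
    Parallel::Forker->new(use_sig_child => 1),
);

# One handler for the whole application; it notifies every manager,
# since the signal does not say which manager owns the child.
$SIG{CHLD} = sub {
    foreach my $mgr (@managers) {
        Parallel::Forker::sig_child($mgr);
    }
};

my $done = 0;   # Hypothetical counter, updated in the parent
foreach my $mgr (@managers) {
    $mgr->schedule(
        run_on_start  => sub { },          # Placeholder child work
        run_on_finish => sub { $done++ },
    )->run;
}

$_->wait_all foreach @managers;
print "done=$done\n";
```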

$self->state_stats

Return hash containing statistics with keys of state names, and values with number of processes in each state.
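A short sketch of reading the statistics after a run. It prints whatever state names the module reports rather than assuming particular keys. The $total variable is illustrative, and should equal the number of processes still registered with the manager.

```perl
use strict;
use warnings;
use Parallel::Forker;

# use_sig_child => 0 because this sketch installs no $SIG{CHLD} handler.
my $Fork = Parallel::Forker->new(use_sig_child => 0);

# Three jobs with placeholder bodies.
$Fork->schedule(run_on_start => sub { })->ready for 1 .. 3;
$Fork->wait_all;

my %stats = $Fork->state_stats;

my $total = 0;
for my $state (sort keys %stats) {
    print "$state: $stats{$state}\n";
    $total += $stats{$state};
}
print "total: $total\n";
```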

$self->use_sig_child( 0 | 1 )

This should always be called with a 0 or 1. If you install a $SIG{CHLD} handler that calls your Parallel::Forker object's sig_child method, you should also turn on use_sig_child by calling it with a true argument; calls to poll() then do less work when there are no child processes to be reaped. If you are not using the handler, call it with 0 to prevent a warning.

$self->wait_all

Wait until there are no running or runnable jobs left.

$self->write_tree(filename => <filename>)

Print a dump of the execution tree.

DISTRIBUTION

The latest version is available from CPAN and from https://www.veripool.org/parallel-forker.

Copyright 2002-2020 by Wilson Snyder. This package is free software; you can redistribute it and/or modify it under the terms of either the GNU Lesser General Public License Version 3 or the Perl Artistic License Version 2.0.

AUTHORS

Wilson Snyder <wsnyder@wsnyder.org>

SEE ALSO

Parallel::Forker::Process