NAME

Parallel::SubFork::Task - Run Perl functions in forked processes.

SYNOPSIS

use Parallel::SubFork::Task;

# Run some arbitrary Perl code in a separate process
my $task = Parallel::SubFork::Task->start(\&job, @args);
$task->wait_for();

# Create and execute the task (same as above)
my $task2 = Parallel::SubFork::Task->new(\&job, @args);
$task2->execute();
$task2->wait_for();

# Wait with a live progress indicator
local $| = 1; # Force print to flush the output
my $task3 = Parallel::SubFork::Task->start(\&job, @args);
while ($task3->wait_for(0.5)) {
	print ".";
}

# Access any of the properties
printf "PID of task was %s\n", $task->pid;
printf "Args of task were %s\n", join(", ", $task->args);
printf "Exit code: %d\n", $task->exit_code;

DESCRIPTION

This module provides a simpler way to run arbitrary Perl code in a different process. It is a thin wrapper over the system calls fork and waitpid. The idea is to execute any standard Perl function in a different process without the inconvenience of managing the forks by hand.
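
To give an idea of what is being wrapped, here is a minimal sketch of the fork and waitpid bookkeeping done by hand with core Perl only. The helper run_in_child is a hypothetical name used for illustration; it is not part of this module and is not meant to mirror the module's actual implementation.

```perl
use strict;
use warnings;
use POSIX ();

# Hypothetical helper: run $code->(@args) in a child process and
# return the child's exit code to the caller.
sub run_in_child {
    my ($code, @args) = @_;

    my $pid = fork();
    die "Can't fork: $!" unless defined $pid;

    if ($pid == 0) {
        # Child: run the function and use its return value as the exit code
        my $rc = $code->(@args);
        POSIX::_exit($rc);
    }

    # Parent: wait for this specific child only
    waitpid($pid, 0);
    return $? >> 8;
}

my $exit_code = run_in_child(sub { return $_[0] + $_[1] }, 1, 2);
print "Child exited with $exit_code\n";
```

Every caller would have to repeat the fork check, the child branch and the waitpid call; this is the kind of boilerplate a task object is meant to hide.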

TASK

This module is used to encapsulate a task, i.e. the function to be executed in a different process and its arguments. In a nutshell, a task consists of a reference to a Perl function (\&my_sub) or an anonymous subroutine such as a closure (sub { 1; }), and optionally the arguments to provide to that function.

A task also stores some runtime properties such as the PID of the process that executed the code, the exit code and the exit status of the process. These properties can then be inspected by the parent process through their dedicated accessors.

There are also some helper methods that are used to create the child process and to wait for it to finish.

PROCESSES

Keep in mind that the function being executed is run in a different process. This means that any modification performed within that function will only affect the process running the task. This is true even for global variables. All data exchange or communication between the parent and the child process has to be implemented manually through standard inter-process communication (IPC) mechanisms (see perlipc).
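
The following sketch illustrates both points with core Perl only: a global variable changed in the child stays untouched in the parent, and a pipe is used to pass the child's value back. The helper counter_seen_by_child and the variable $counter are hypothetical names; a pipe is just one of the IPC mechanisms described in perlipc.

```perl
use strict;
use warnings;
use POSIX ();

our $counter = 0;

# Hypothetical helper: fork a child that increments the global counter
# and reports its own view of it through a pipe.
sub counter_seen_by_child {
    pipe(my $reader, my $writer) or die "Can't create pipe: $!";

    my $pid = fork();
    die "Can't fork: $!" unless defined $pid;

    if ($pid == 0) {
        # Child: this change is only visible in the child process
        close $reader;
        $counter++;
        syswrite($writer, "$counter\n");
        close $writer;
        POSIX::_exit(0);
    }

    # Parent: read what the child sent, then reap the child
    close $writer;
    my $line = <$reader>;
    close $reader;
    waitpid($pid, 0);

    chomp $line;
    return $line;
}

my $child_value = counter_seen_by_child();
print "child saw $child_value, parent still has $counter\n";
```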

The child process used to execute the Perl subroutine has its environment left unchanged. This means that all file descriptors, signal handlers and other resources are still available. It's up to the subroutine to prepare a proper environment for itself.
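
As an illustration, a task could start by undoing whatever it does not want to inherit. The sketch below uses core Perl only; prepare_child_environment and child_handler_was_reset are hypothetical names, and which handlers or file handles need attention depends entirely on the application.

```perl
use strict;
use warnings;
use File::Spec;
use POSIX ();

# Hypothetical helper, meant to be called first thing inside the task.
sub prepare_child_environment {
    # Drop the signal handlers inherited from the parent
    $SIG{$_} = 'DEFAULT' for qw(INT TERM HUP CHLD);

    # Detach from the parent's standard input
    open(STDIN, '<', File::Spec->devnull) or die "Can't reopen STDIN: $!";
    return;
}

# The parent installs a handler that makes no sense in the child
$SIG{TERM} = sub { print "parent-only cleanup\n"; exit 0 };

# Hypothetical check: true if the child no longer has a code ref handler
sub child_handler_was_reset {
    my $pid = fork();
    die "Can't fork: $!" unless defined $pid;

    if ($pid == 0) {
        prepare_child_environment();
        POSIX::_exit(ref $SIG{TERM} ? 1 : 0);
    }

    waitpid($pid, 0);
    return ($? >> 8) == 0;
}

print "handler reset in child: ", (child_handler_was_reset() ? "yes" : "no"), "\n";
```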

RETURN VALUES

The subroutine's return value will be used as the process exit code. This is the only thing that the invoking process can get back from the task without some kind of IPC, which means that the return value should be an integer. Furthermore, since the return value is used as an exit value, 0 is considered a successful execution while any other value is usually interpreted as an error.
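
One practical consequence, on POSIX systems, is that only the low 8 bits of an exit code reach the parent, so return values are best kept between 0 and 255. The sketch below shows this with core Perl only; exit_code_of is a hypothetical helper and not part of this module.

```perl
use strict;
use warnings;
use POSIX ();

# Hypothetical helper: run $code in a child and return the exit code
# as seen by the parent.
sub exit_code_of {
    my ($code) = @_;

    my $pid = fork();
    die "Can't fork: $!" unless defined $pid;

    # Child: the return value becomes the exit code
    POSIX::_exit($code->()) if $pid == 0;

    waitpid($pid, 0);
    return $? >> 8;
}

for my $value (0, 3, 255, 256) {
    printf "returned %3d -> exit code %d\n", $value, exit_code_of(sub { $value });
}
```

A value such as 256 is expected to come back as 0, which the parent would read as success.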

EXIT

The subroutine is free to raise any exception through die or any similar mechanism. If an exception is caught by the framework, the task is treated as failed and an appropriate error exit value is used.
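
A minimal sketch of how an exception can be turned into an exit value, using core Perl only. The helper run_guarded is hypothetical, and the error value 1 is an assumption made for this example; the value actually used by the framework is not specified here.

```perl
use strict;
use warnings;
use POSIX ();

# Hypothetical helper: run $code in a child, mapping exceptions to an
# error exit code.
sub run_guarded {
    my ($code, @args) = @_;

    my $pid = fork();
    die "Can't fork: $!" unless defined $pid;

    if ($pid == 0) {
        my $rc;
        my $ok = eval { $rc = $code->(@args); 1 };
        $rc = 1 unless $ok;    # assumed error value for this sketch
        POSIX::_exit($rc);
    }

    waitpid($pid, 0);
    return $? >> 8;
}

printf "normal return: %d\n", run_guarded(sub { return 0 });
printf "exception:     %d\n", run_guarded(sub { die "boom\n" });
```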

If the subroutine needs to terminate its execution through the system call exit, consider using _exit as defined in the module POSIX instead. This is because exit not only terminates the current process but also performs some cleanup, such as calling the functions registered with atexit and flushing all stdio streams, before finishing the process. Normally, only the main process should call exit; in the case of a fork, the children should finish their execution through POSIX::_exit.
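
The difference can be observed with an END block, which Perl runs on exit but not on POSIX::_exit. This is a sketch with core Perl only; cleanup_ran_in_child, $report and $parent_pid are hypothetical names introduced for the example.

```perl
use strict;
use warnings;
use POSIX ();

$| = 1;    # keep nothing buffered that a child could flush a second time

my $parent_pid = $$;
our $report;    # write end of a pipe, only set inside a child

# Cleanup hook inherited by every forked child
END {
    syswrite($report, "cleanup\n") if $$ != $parent_pid && $report;
}

# Hypothetical helper: true if the END block ran in the child
sub cleanup_ran_in_child {
    my ($use_posix_exit) = @_;

    pipe(my $reader, my $writer) or die "Can't create pipe: $!";
    my $pid = fork();
    die "Can't fork: $!" unless defined $pid;

    if ($pid == 0) {
        close $reader;
        $report = $writer;
        POSIX::_exit(0) if $use_posix_exit;    # skips END blocks
        exit(0);                               # runs END blocks
    }

    close $writer;
    my $line = <$reader>;    # undef if the child wrote nothing
    close $reader;
    waitpid($pid, 0);
    return defined $line ? 1 : 0;
}

printf "exit:  END block ran in child? %s\n", cleanup_ran_in_child(0) ? "yes" : "no";
printf "_exit: END block ran in child? %s\n", cleanup_ran_in_child(1) ? "yes" : "no";
```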

PROCESS WAIT

Waiting for a process to finish can be problematic, as there are multiple ways of waiting for processes to finish, each with its own advantages and disadvantages.

The easiest way is to register a signal handler for the CHLD signal. This has the advantage of delivering the child notifications as they happen; the disadvantage is that there's no way to control which children trigger the notifications. This is quite inconvenient because many of Perl's built-in functions and operators, such as `ls` (backticks), system and even open (when used in conjunction with a |), use child processes for their work, and a CHLD handler could interfere with them.

Another alternative is to wait for all processes launched, but this can also interfere with other processes launched manually through fork.

Finally, the safest way is to wait explicitly only for the processes that we know to have started and nothing else. This way there will be no interference with other processes. This is exactly the approach used by this module.
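
The last approach can be sketched with core Perl as follows. The helpers spawn and reap_known are hypothetical names; the point is only that waitpid is called with the PIDs that were recorded at fork time, so that an unrelated call to system is left alone.

```perl
use strict;
use warnings;
use POSIX ();

# Hypothetical helper: fork a child that runs $code and return its PID.
sub spawn {
    my ($code) = @_;
    my $pid = fork();
    die "Can't fork: $!" unless defined $pid;
    POSIX::_exit($code->()) if $pid == 0;
    return $pid;
}

# Hypothetical helper: reap exactly the given PIDs and nothing else.
sub reap_known {
    my (@pids) = @_;
    my %exit_code_for;
    for my $pid (@pids) {
        waitpid($pid, 0);
        $exit_code_for{$pid} = $? >> 8;
    }
    return %exit_code_for;
}

my @pids = map { my $n = $_; spawn(sub { $n }) } 1 .. 3;

# An unrelated child: system() reaps it on its own and is not disturbed
system($^X, '-e', 'exit 0') == 0 or die "system failed: $?";

my %exit_code_for = reap_known(@pids);
printf "PID %d exited with %d\n", $_, $exit_code_for{$_} for @pids;
```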

METHODS

A task defines the following methods:

start

Creates and executes a new task. This is simply a small shortcut for starting new tasks.

In order to manage tasks easily, consider using the module Parallel::SubFork instead.

Parameters:

$code

The code reference to execute in a different process.

@args (optional)

The arguments to pass to the code reference.

new

Creates a new task. This is simply a constructor and the task will not be started yet.

The task can later be started through a call to "execute".

In order to manage tasks easily, consider using the module Parallel::SubFork instead.

Parameters:

$code

The code reference to execute.

@args (optional)

The arguments to pass to the code reference.

code

Accessor to the function (code reference) that will be executed in a different process. This is what the child process will execute.

This function is expected to return 0 for success and any other integer to indicate a failure. The function is free to raise any kind of exception as the framework will catch all exceptions and return an error value instead.

The function will receive its parameters normally through the variable @_.

pid

The PID of the process executing the subroutine, i.e. the child's PID.

exit_code

The exit code of the task. This is the value returned by exit, POSIX::_exit or return.

status

The exit status returned to the parent process, as described by wait. The status can be inspected through the "WAIT" macros in POSIX.
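
For instance, a wait status can be decoded with the macros exported by POSIX. In this sketch describe_status is a hypothetical helper and the status comes from a plain waitpid call; the same decoding should apply to the value returned by this accessor, assuming it holds the value of $? as described under "wait_for".

```perl
use strict;
use warnings;
use POSIX qw(WIFEXITED WEXITSTATUS WIFSIGNALED WTERMSIG);

# Hypothetical helper: turn a wait status into a readable description.
sub describe_status {
    my ($status) = @_;

    return sprintf("exited with code %d", WEXITSTATUS($status))
        if WIFEXITED($status);
    return sprintf("killed by signal %d", WTERMSIG($status))
        if WIFSIGNALED($status);
    return "unknown status $status";
}

my $pid = fork();
die "Can't fork: $!" unless defined $pid;
POSIX::_exit(3) if $pid == 0;    # child: exit with code 3

waitpid($pid, 0);
print "Child ", describe_status($?), "\n";
```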

args

The arguments that will be given to the subroutine being executed in a separate process. The subroutine will receive these very same arguments through @_.

This method always returns its values as a list and not as an array ref.

execute

Executes the task (the code reference encapsulated by this task) in a new process. The code reference will be invoked with the arguments passed in the constructor.

This method performs the actual fork and returns immediately in the invoker, while the child process starts executing the code defined in the code reference. Once the subroutine has finished, the child process exits right away.

The invoker (the parent process) should call "wait_for" in order to wait for the child process to finish and obtain its exit value.

wait_for

Waits until the process running the task (the code reference) has finished. By default this method waits forever, until the task finishes either naturally or due to an error.

If a parameter is passed then it is assumed to be the number of seconds to wait. Once the timeout has expired the method will return with a true value. This is the only condition under which the method will return with a true value.

If the module Time::HiRes is available then the timeout can be fractional (e.g. 0.5 for half a second); otherwise whole seconds have to be provided. If not, Perl will truncate the value during the conversion to int.

The timeout is implemented through sleep and has all the caveats of sleep; see perldoc -f sleep for more details. Remember that sleep could take a second less than requested (sleep 1 could do no sleep at all) and that mixing calls to sleep and alarm is at your own risk, as sleep is sometimes implemented through alarm. Furthermore, if a fractional timeout between 0 and 1 second is provided and Time::HiRes is not available, Perl will truncate the value to 0.

The exit code of the process can be inspected through the accessor "exit_code", and the actual status (the value returned in $? by waitpid) can be accessed through the accessor "status".

Parameters:

$timeout (optional)

The number of seconds to wait until the method returns due to a timeout. If undef then the method doesn't apply a timeout and waits until the task has finished.

Returns:

If the method was invoked without a timeout then a false value will always be returned, no matter the outcome of the task. If a timeout was provided then the method will return a true value only when the timeout has been reached otherwise a false value will be returned.
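
The return convention described above can be reproduced with a small polling loop built on waitpid and sleep. This is only a sketch with core modules: wait_with_timeout is a hypothetical helper, the 0.05 second polling interval is an arbitrary choice, and the module's own implementation may differ.

```perl
use strict;
use warnings;
use POSIX qw(WNOHANG);
use Time::HiRes qw(sleep time);

# Hypothetical helper: returns true only if the timeout expired,
# false once the child has been reaped.
sub wait_with_timeout {
    my ($pid, $timeout) = @_;

    if (!defined $timeout) {
        waitpid($pid, 0);    # no timeout: block until the child is done
        return 0;
    }

    my $deadline = time() + $timeout;
    while (1) {
        my $reaped = waitpid($pid, WNOHANG);
        return 0 if $reaped == $pid || $reaped == -1;
        return 1 if time() >= $deadline;
        sleep(0.05);
    }
}

$| = 1;
my $pid = fork();
die "Can't fork: $!" unless defined $pid;
if ($pid == 0) {
    sleep(0.3);    # child: pretend to work for a moment
    POSIX::_exit(0);
}

print "." while wait_with_timeout($pid, 0.1);
print " done\n";
```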

kill

Sends a signal to the process. This is a simple wrapper over the system call kill. It takes the same kind of signal as the built-in kill function.

NOTE: Calling kill doesn't guarantee that the task will die. Most signals can be caught by the process and may not kill it. In order to be sure that the process is killed it is advised to call "wait_for". Even if the signal kills the process, "wait_for" has to be called, otherwise the task's process will be flagged as a zombie process (see http://en.wikipedia.org/wiki/Zombie_process).

The following code snippet shows how to properly kill a task:

my $task = Parallel::SubFork::Task->start(\&job);
if ($task->wait_for(2)) {
	# Impatient block
	$task->kill('KILL');
	$task->wait_for();
}

Parameters:

$signal

The signal to send to the process. Same as the first parameter passed to the Perl built-in.

Returns:

The same value as Perl's kill.
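
For comparison, this is roughly what sending a signal and reaping the child looks like with the Perl built-ins alone. The helper kill_and_reap is hypothetical. The sketch uses KILL because that signal cannot be caught; with a catchable signal the blocking waitpid could wait for as long as the child keeps running.

```perl
use strict;
use warnings;
use POSIX ();

# Hypothetical helper: send $signal to $pid, then reap the child.
# Returns what kill returned and the signal number that ended the child.
sub kill_and_reap {
    my ($pid, $signal) = @_;

    my $sent = kill($signal, $pid);
    waitpid($pid, 0);    # always reap, otherwise a zombie is left behind
    return ($sent, $? & 127);
}

my $pid = fork();
die "Can't fork: $!" unless defined $pid;
if ($pid == 0) {
    sleep(30);    # child: a job that takes too long
    POSIX::_exit(0);
}

my ($sent, $signal_number) = kill_and_reap($pid, 'KILL');
print "signalled $sent process, child ended by signal $signal_number\n";
```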

NOTES

The API is not yet frozen and could change as the module goes public.

SEE ALSO

Take a look at POE for asynchronous multitasking and networking.

AUTHOR

Emmanuel Rodriguez, <emmanuel.rodriguez@gmail.com>

COPYRIGHT AND LICENSE

Copyright (C) 2008-2010 by Emmanuel Rodriguez

This library is free software; you can redistribute it and/or modify it under the same terms as Perl itself, either Perl version 5.8.8 or, at your option, any later version of Perl 5 you may have available.