NAME
Sub::Slice::Manual - user guide for Sub::Slice
USING Sub::Slice
Sub::Slice is a way of breaking down a long-running process and maintaining state across a stateless protocol. This allows the client to draw a progress bar or abort the process part-way through.
The mechanism used by Sub::Slice is similar to the session management used on many web user authentication systems. However rather than simply passing an ID back as a token as these systems do, in Sub::Slice a data structure with richer information is passed to the client, allowing the client to make some intelligent decisions rather than blindly maintain state.
Overview
Use of Sub::Slice is best explained with a minimal example. Assume that there is a remoting protocol between the client and server such as XML/HTTP. For the sake of brevity, assume that methods called in package Server:: on the client are magically remoted to the server.
The server does two things. The first is to issue a token for the client to use:
#Server
sub create_token {
    my $job = new Sub::Slice();
    return $job->token;
}
The second is to provide the routine into which the token is passed for each iteration:
sub do_work {
    my $token = shift;
    my $job = new Sub::Slice(token => $token);

    at_start $job, sub {
        my $files = files_to_process();

        #Store some data defining the work to do
        $job->store("files", $files);
    };

    at_stage $job, "each_iteration", sub {
        #Get some work
        my $files = $job->fetch("files");
        my $file = shift @$files;
        my $was_ok = process_file($file);

        #Record we did the work
        $job->store("files", $files);

        #Check if there's any more work left to do
        $job->done() unless(@$files);
    };
}
The client somehow gets a token back from the server. It then passes this back to the server for each iteration. It can inspect the token to check if there is any more work left.
#Client
my $token = Server::create_token();
for(1 .. MAX_ITERATIONS) {
    Server::do_work($token);
    last if $token->{done};
}
Named stages
You can create any number of named stages. This is useful if you need to complete several phases of iterative processing.
sub do_work {
    my $token = shift;
    my $job = new Sub::Slice(token => $token);

    at_start $job, sub {
        my $files = files_to_process();
        $job->store("files", $files);
        $job->store("position", 0);
        $job->next_stage("check_files"); #explicitly start with this stage
    };

    at_stage $job, "check_files", sub {
        my $files = $job->fetch("files");
        my $position = $job->fetch("position");
        my $file = $files->[$position];
        check_file($file) or $job->abort("check failed for $file");
        $job->store("position", $position+1);

        #Decide when to move to next stage (once the last file has been checked)
        $job->next_stage("publish_files") if($position+1 == scalar @$files);
    };

    at_stage $job, "publish_files", sub {
        my $files = $job->fetch("files");
        my $file = shift @$files;
        my $was_ok = process_file($file);
        $job->store("files", $files);
        $job->done() unless(@$files);
    };
}
If next_stage is NOT called, Sub::Slice begins iterations using the FIRST at_stage block within the routine. In the example above, we could have omitted the next_stage call from at_start and the "check_files" sub would still have been called first.
Return values
Return values from the at_* methods are accessible via $job->return_value(). This allows you to capture the return value from the coderef and do something with it (such as store it into $job with store(), or return it to the caller):
sub do_work {
    my $token = shift;
    my $job = new Sub::Slice(token => $token);

    at_start $job, sub {
        my $files = files_to_process();
        $job->store("files", $files);
        $job->store("position", 0);

        #This won't return from do_work, only from the at_* coderef
        return scalar(@$files) . " files to process";
    };

    at_stage $job, "check_files", sub {
        my $files = $job->fetch("files");
        my $position = $job->fetch("position");
        my $file = $files->[$position];
        check_file($file) or $job->abort("check failed for $file");
        $job->store("position", $position+1);
        return "processed $file";
    };

    #Get the return value from the at_* coderef
    return $job->return_value;
}
Aborting a job
If there is a problem, the server may abort the Sub::Slice iteration. Extending the example above:
my $was_ok = process_file($file);
$job->abort("Unable to process $file") unless($was_ok);
The client can detect this by checking the abort property of the token:
for(1 .. MAX_ITERATIONS) {
    Server::do_work($token);
    if($token->{done} || $token->{abort}) {
        warn "Remote error: " . $token->{error} if($token->{abort});
        last;
    }
}
The error property is also set by the server calling abort().
Premature termination from the client
The client can stop the iteration by setting the done property of the token to a true value:
for(1 .. MAX_ITERATIONS) {
    Server::do_work($token);
    $token->{done} = 1 if( user_pressed_cancel() );
    last if($token->{done} || $token->{abort});
}
Handling transport errors
If a transport error occurs during one of the client-server exchanges in a sub-sliced job, the client will know there has been an error from its transport client interface. It will NOT know what state the server is in - the server may or may not have successfully completed the last batch of iterations.
The client should therefore simply re-present the token to the server (which won't have been updated to reflect the work that may or may not have been done) and let the server worry about how to interpret this. For example, the server could reprocess the units of work if they are re-runnable, or elect to skip them if it knows they have already been done.
Of course, if the transport keeps failing, the client is entitled to give up on the job entirely or save it for retrying later. If the client does just give up on the job, the server should have a cleaner process in place to mop up any dangling Sub::Slice jobs that were never finished off by the client (see "Using a belt-and-braces cleaner process").
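A minimal client-side sketch of this retry-then-give-up policy (assuming the Server::do_work proxy from the earlier examples dies on a transport failure; MAX_RETRIES is a made-up constant):

```perl
use constant MAX_RETRIES => 3;

my $failures = 0;
until($token->{done} || $token->{abort}) {
    #On a transport error, simply re-present the same (un-updated) token;
    #the server decides whether to redo or skip the affected work
    eval { Server::do_work($token) };
    if($@) {
        last if ++$failures > MAX_RETRIES; #give up - the cleaner process will mop up
    } else {
        $failures = 0;
    }
}
```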
Tuning the iterations per call from the client
You can use the iterations property of the token to tell the server the maximum number of iterations it should perform on the next call. Beware that zero is interpreted as an infinite number of iterations.
use Time::HiRes qw(time); #import time() to get a high-resolution clock
use constant ACCEPTABLE_WAIT => 5; #Can cope with 5 sec between updates on client

while(1) {
    my $i1 = $token->{count};
    my $t1 = time();
    Server::do_work($token);
    my $di = $token->{count} - $i1;
    my $dt = time() - $t1;
    my $iterations_per_sec = $di/$dt;

    #Calculate the optimal number of iterations to perform per call
    $token->{iterations} = int($iterations_per_sec * ACCEPTABLE_WAIT) || 1;
    last if($token->{done} || $token->{abort});
}
Using at_start and at_end
The at_start and at_end blocks are run before the first iteration and after the last, respectively. They allow you to perform initialisation and a "commit" once all the at_stage steps have completed successfully (such as sending a set of generated files to a sink):
sub do_work {
    my $token = shift;
    my $job = new Sub::Slice(token => $token);

    at_start $job, sub {
        find_files($job);
    };

    at_stage $job, "each_iteration", sub {
        work_through_list($job);
    };

    at_end $job, sub {
        post_files($job);
    };
}
WARNING: You might consider using at_start and at_end to allocate and deallocate resources used for the job (an example would be using a lock file to ensure that jobs are run serially), but at_end will not be run if a job dies in an at_stage coderef. This means that resources allocated in at_start could leak if a job is aborted by something dying in an at_stage coderef. You can code defensively around this by trapping any exceptions and deallocating resources if the job has been aborted:
sub do_work {
    my $token = shift;
    my $job = new Sub::Slice(token => $token);

    #Trap any exceptions
    eval {
        at_start $job, sub {
            my $id = create_lock_file();
            $job->store(lockfile_id => $id);
        };

        #NB when something dies, Sub::Slice sets $job->abort($@) and rethrows the exception
        at_stage $job, "each_iteration", sub {
            #Either your code or Sub::Slice::Backend::* might raise an exception
            die("an exception") unless went_smoothly($job);
        };
    };
    my $error = $@; #save away $@

    #Clean up whether the job has succeeded or failed.
    #We could have used an at_end block for the successful case,
    #but we'd still need to check for the job being aborted here.
    if($job->token->{done} || $job->token->{abort}) {
        my $id = $job->fetch("lockfile_id");
        release_lock_file($id);
    }

    die($error) if($error); #Rethrow exceptions
}
Storing BLOB data
If you need to store large amounts of data against a key, you can use the store_blob method rather than the store method.
$job->store('key1' => $wee_thing);
$job->store_blob('key2' => $huge_thing);
To get this back on a subsequent iteration, use the fetch_blob method:
$wee_thing = $job->fetch('key1');
$huge_thing = $job->fetch_blob('key2');
Alternatively you can let Sub::Slice take care of when to use BLOB storage by giving it a threshold size:
my $job = new Sub::Slice(token => $token, 'auto_blob_threshold' => 4096); #Store anything >4K as a BLOB
$job->store('key1' => $wee_thing);
$job->store('key2' => $huge_thing); #this will be stored as a blob if bigger than 4K
$wee_thing = $job->fetch('key1');
$huge_thing = $job->fetch('key2');
Note that currently auto_blob_threshold only applies to scalars containing characters or bytes (not references).
It's up to the backend exactly what it does with BLOB data. The different store/fetch methods give the backend the opportunity to use a more efficient storage strategy for large chunks of BLOB data. For example, in the default Filesystem backend, normal key-value data is serialised into a Storable file for each Sub::Slice job, whereas BLOB data is held outside in separate files.
Using a belt-and-braces cleaner process
Although Sub::Slice cleans up jobs that are finished, the data from jobs that were never completed will persist. In real life, this kind of "should never happen" error has a habit of happening occasionally, so it's advisable to use a cleaner process to clear up any ancient Sub::Slice junk:
# clean up jobs from the default path, after the default period
perl -MSub::Slice::Backend::Filesystem -we "Sub::Slice::Backend::Filesystem->new()->cleanup"
This will delete anything over the default age threshold from the default subslice path. You can override defaults:
# clean up jobs from /var/tmp/sslice, after 20 days
Sub::Slice::Backend::Filesystem->new(path => '/var/tmp/sslice')->cleanup(20)
Alternatively, you can clean up with a couple of find(1) commands (if under UNIX):
find /var/tmp/sub_slice -type f -mtime +1 -exec rm {} \;
find /var/tmp/sub_slice -type d -mtime +1 -empty -exec rmdir {} \;
This is basically what the cleanup call does, plus some additional error checking.
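The effect of the two find(1) passes can be demonstrated on a scratch directory (a stand-in for the real Sub::Slice path; the job and file names here are made up for illustration):

```shell
# Build a scratch directory containing one stale, abandoned job
dir=$(mktemp -d)
mkdir "$dir/job1"
touch "$dir/job1/state"
# Backdate the job file so it looks two days old (GNU touch)
touch -d "2 days ago" "$dir/job1/state"

# Pass 1: delete stale files
find "$dir" -type f -mtime +1 -exec rm {} \;

# Pass 2: remove stale empty directories. Note that deleting the file
# updated job1's mtime, so the directory itself will only match
# -mtime +1 (and be rmdir'd) on a later cleanup run
find "$dir" -type d -mtime +1 -empty -exec rmdir {} \;

test ! -e "$dir/job1/state" && echo "stale file removed"
```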
EXAMPLES
Simple HTTP remoting
This is a "roll your own remoting protocol" example to show how Sub::Slice interacts with some real remoting code.
##################################################################
# Client
##################################################################
use XML::Simple;
use HTTP::Request::Common qw(POST);
use LWP::UserAgent;

use constant MAX_ITERATIONS => 100;

my $token = create_token();
my $i = 0;
while (!$token->{done}) {
    die "Long loop, presumed infinite" if $i++ > MAX_ITERATIONS;
    do_work($token);
    last if $token->{done} || $token->{abort};
}

# Proxy methods for remoting
sub create_token {
    _call_method("create_token");
}

sub do_work {
    _call_method("do_work", shift());
}

# Primitive HTTP remoting
sub _call_method {
    my ($name, $arg) = @_;
    my $xml = (defined $arg) ? XMLout($arg) : '<opt/>'; #Serialise data as XML
    my $ua = LWP::UserAgent->new;
    my $req = POST 'http://myserver/cgi-bin/server.pl',
        [ method => $name, args => $xml ];
    my $res = $ua->request($req);
    die($res->message) unless($res->is_success);
    $xml = $res->content();
    %$arg = %{XMLin($xml)}; #Deserialise and copy back into caller's structure
    $arg;
}
##################################################################
# Server - server.pl (installed into cgi-bin on myserver)
##################################################################
use CGI;
use XML::Simple;
use Sub::Slice;

use constant ALLOWED_METHOD => { map {$_ => 1} qw(create_token do_work) };

# Primitive HTTP server dispatcher
my $method = CGI::param('method');
die("Unrecognised method ($method) called")
    unless ALLOWED_METHOD->{$method};
my $args = XMLin( CGI::param('args') ); #Deserialise data
my $out = $method->($args); #symbolic call - safe as $method has been checked
$out = XMLout({ %$out }); #unbless - XMLout won't serialise a blessed object
print "Content-type: text/xml\n\n";
print $out;

sub create_token {
    my $job = new Sub::Slice();
    return $job->token;
}

sub do_work { #Mindlessly simple example
    my $token = shift;
    my $job = new Sub::Slice(token => $token);

    at_start $job, sub {
        $job->store("steps_to_take", 50);
        $job->store("position", 0);
    };

    at_stage $job, "check_files", sub {
        my $position = $job->fetch("position");
        $job->store("position", $position+1);
        $job->done() if($position >= $job->fetch("steps_to_take"));
    };

    $job->token;
}
Note that we return the job token from &do_work, so that it gets updated on the client. If we didn't do this, we would keep resetting position to 0, and we would loop forever (until we hit MAX_ITERATIONS).
Using Sub::Slice with SOAP::Lite
Rolling your own remoting protocol may be fun, but it's often more sensible to use a standard one such as XML/RPC or SOAP. Here's an example SOAP server using Sub::Slice with SOAP::Lite.
##################################################################
# SOAP Server - soapserver.pl (installed into cgi-bin on myserver)
##################################################################
use strict;
use SOAP::Transport::HTTP;

SOAP::Transport::HTTP::CGI
    -> dispatch_to('My::SoapServer')
    -> handle;

package My::SoapServer;
use Sub::Slice;

sub create_token {
    my $class = shift;
    my @args = @_;
    my $job = new Sub::Slice();

    at_start $job, sub {
        $job->store(args => \@args); #store a reference - one value per key
        $job->store("steps_to_take", scalar @args);
        $job->store("position", 0);
    };

    return $job->token;
}

sub do_work {
    my $class = shift;
    my ($token) = @_;
    my $job = new Sub::Slice( token => $token );

    at_stage $job, "check_files", sub {
        my $position = $job->fetch("position");
        $job->store("position", $position+1);
        $job->done() if($position >= $job->fetch("steps_to_take"));
    };

    return $job->token;
}
Here's its client-side counterpart. The first few lines are where we define our SOAP application and process the request:
##################################################################
# SOAP Client
##################################################################
use strict;
use SOAP::Lite;

my $soap = SOAP::Lite
    ->uri("http://myserver.org/My/SoapServer")
    ->proxy("http://myserver/cgi-bin/soapserver.pl");

my @filenames = qw(badger.xhtml vole.xhtml dormouse.xhtml);
my ($token) = $soap->create_token(@filenames)->result;
while (!$token->{done} && !$token->{abort}) {
    ($token) = $soap->do_work($token)->result;
}
It starts by defining the $soap object, which will connect to our application at the proxy URL. The methods called on $soap are transparently proxied over HTTP to our application. The proxied methods don't return the values from the remote method, but rather a SOAP message object; to access the return values from the remote method, we call result() on the SOAP message object. Finally, our example retrieves an updated copy of the $token after each remote call.
See SOAP::Lite for more information. In particular, the sections on AutoBinding and AutoDispatching may give you some ideas about how to improve on keeping the token up-to-date in the client.
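For instance, with SOAP::Lite's autodispatch mode, remote calls look like plain method calls and return their results directly rather than as a SOAP message object. A sketch only, assuming the same My::SoapServer endpoints as above:

```perl
use SOAP::Lite +autodispatch =>
    uri   => 'http://myserver.org/My/SoapServer',
    proxy => 'http://myserver/cgi-bin/soapserver.pl';

#Calls are dispatched over SOAP transparently; we still re-assign $token
#each time so that the client sees the server's updates
my $token = My::SoapServer->create_token(qw(badger.xhtml vole.xhtml));
$token = My::SoapServer->do_work($token)
    until $token->{done} || $token->{abort};
```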
VERSION
$Revision: 1.16 $ on $Date: 2004/12/17 16:31:27 $ by $Author: tims $
AUTHOR
John Alden <cpan _at_ bbc _dot_ co _dot_ uk>
COPYRIGHT
(c) BBC 2004. This program is free software; you can redistribute it and/or modify it under the GNU GPL.
See the file COPYING in this distribution, or http://www.gnu.org/licenses/gpl.txt