NAME

Grid::Request - An API for submitting jobs to a computational grid such as SGE or Condor.

DESCRIPTION

An API for submitting work to a Distributed Resource Management (DRM) system such as Sun Grid Engine (SGE) or Condor.

SYNOPSIS

use Grid::Request;
my $request = Grid::Request->new( project => "SomeProject" );

$request->times(2);
$request->command("/path/to/executable");
$request->initialdir("/path/to/initial/directory");
$request->error("/path/to/dir/stderr.err");

# Note, most of the methods in this module may also be called
# with get_ and set_ prefixes. For example, the above code would
# also have worked if coded like so:

$request->set_times(2);
$request->set_command("/path/to/executable");
$request->set_initialdir("/path/to/initial/directory");
$request->set_error("/path/to/dir/stderr.err");

# When retrieving information (accessor behavior), you can call
# such methods with no arguments to return the information, or
# the "get_" may be prepended. For example:

my $times = $request->times();
my $times_another_way = $request->get_times();
# Please note that calling the get version of a method while
# providing arguments does not make sense and will likely not work:

# WRONG
my $times_wrong_way = $request->get_times(3);

# Finally, submit the request...
my @id = $request->submit();
print "The first ID for this request is $id[0].\n";

# ...and wait for the results. This step is optional; it is only
# needed if you wish to block, that is, wait for the request to
# complete before moving on to other tasks.
$request->wait_for_request();

# Or, instead of calling submit() and wait_for_request(), submit and block in one call:
$request->submit_and_wait();

exit;

CONSTRUCTOR AND INITIALIZATION

Grid::Request->new(%args);

Description: This is the object constructor. Parameters are passed to the constructor in the form of a hash. Examples:

my $req = Grid::Request->new( project => "SomeProject" );

or

my $req = Grid::Request->new( project    => "SomeProject",
                              opsys      => "Linux",
                              initialdir => "/path/to/initialdir",
                              output     => "/path/to/output",
                              times      => 5,
                            );

Users may also add a "debug" flag to the constructor call for increased
reporting:

my $req = Grid::Request->new( project => "SomeProject",
                              debug   => 1 );

Parameters: Only the 'project' parameter is mandatory when calling the constructor.

Returns: $obj, a Grid::Request object.

CONFIGURATION

By default, the configuration file that determines which grid engine type to use and where to store temporary files is ~/.grid_request.conf, in the invoking user's home directory. The file needs to have a [request] header and settings for the 'tempdir' and 'drm' parameters. The following is an example:

[request]
tempdir=/path/to/grid/accessible/tmp/directory
drm=SGE

The 'tempdir' directory must be accessible to the grid execution machines, for instance over NFS. Users may provide an alternate path to a different configuration file by specifying the 'config' parameter to the constructor:

my $req = Grid::Request->new( project => "SomeProject",
                              config => "/some/other/dir/request.conf",
                            );

Another way of specifying an alternate configuration is to define
the GRID_CONFIG environment variable.
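The configuration file is a small INI-style file. As a rough illustration of the format only (this is not the module's own loader), the following core-Perl sketch parses the [request] section and checks that 'tempdir' and 'drm' are present. `read_request_config` is a hypothetical name introduced here.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of Grid::Request): parse the [request]
# section of an INI-style configuration and verify the required keys.
sub read_request_config {
    my ($fh) = @_;
    my (%conf, $section);
    while (my $line = <$fh>) {
        chomp $line;
        next if $line =~ /^\s*(?:[#;].*)?$/;        # skip blanks and comments
        if ($line =~ /^\s*\[\s*(\S+)\s*\]\s*$/) {    # section header
            $section = $1;
            next;
        }
        next unless defined $section && $section eq 'request';
        if ($line =~ /^\s*([^=]+?)\s*=\s*(.*?)\s*$/) {
            $conf{$1} = $2;                          # key=value pair
        }
    }
    for my $required (qw(tempdir drm)) {
        die "Missing '$required' in [request] section\n"
            unless defined $conf{$required};
    }
    return \%conf;
}

# Usage: read the example configuration from an in-memory string.
my $text = "[request]\ntempdir=/path/to/grid/accessible/tmp/directory\ndrm=SGE\n";
open(my $fh, '<', \$text) or die "Cannot open in-memory config: $!";
my $conf = read_request_config($fh);
close($fh);
print "DRM: $conf->{drm}, tempdir: $conf->{tempdir}\n";
```

Failing early on a missing key mirrors the requirement that both 'tempdir' and 'drm' be configured.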

CLASS AND OBJECT METHODS

$obj->account([account]);

Description: The account attribute is used to affiliate a grid job with a particular account. Grid engines differ in their treatment of the account attribute.

Parameters: To use as a setter, the first parameter will be used to set (or reset) the account attribute for the command.

Returns: The currently set account (if called with no parameters).

$obj->add_param($scalar | @list | \%hash);

Description: Add a command line argument to the executable when it is executed on the grid. Since many executables associate meaning with the order that command line arguments are given, Grid::Request also honors the order in which parameters are added. They are reassembled at runtime on the grid in the same order that they were added.

Parameters: If called with a single scalar argument, the argument is treated as a simple, "anonymous" parameter: no attempt is made to interpret the string, and it is added verbatim to the list of parameters used to build the command line invoked on the grid. If two arguments are passed, they are read as "key" and "value"; if three are passed, they are read as "key", "value" and "type". The type can be "ARRAY", "DIR", "PARAM", "FILE", or "TEMPFILE" (the default is "PARAM" when fewer than three arguments are passed). The type determines how the parameter is used to parallelize the work:

ARRAY: The job is iterated over the elements of the array, with the value of the parameter changed to the next element each time. The array must contain simple strings and be passed as an array reference in VALUE. Newlines are stripped. Nested data structures are not respected.

DIR: The files in the directory are iterated over. If a directory contains 25 files, there will be at least 25 jobs, with the name of each file used as the parameter value for one invocation.

FILE: VALUE is interpreted as the path to a file containing entries (one per line) to iterate over. The file may contain hundreds of entries to generate a corresponding number of jobs.

TEMPFILE: Works like FILE, except that the HTC system deletes (cleans up) the file when the request has finished being processed. Use with caution.

PARAM: The default type. Provides simple parameter support; no iteration occurs.

If greater clarity and flexibility is desired, one may wish to pass named parameters in a hash reference instead:

$obj->add_param( { key   => "--someparam",
                   value => "somevalue",
                   type  => "DIR",
                 });

The 3 supported keys are case insensitive, so "KEY", "Value" and "tYpE" are also valid. Unrecognized keys will generate warnings.

If more than three arguments are passed to the method, an error occurs.
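The argument rules above can be summarized in a small normalizing function. This is a hypothetical sketch (`normalize_param` and `%VALID_TYPE` are names invented here), not Grid::Request's implementation; in particular, storing an anonymous parameter under 'value' and requiring exact, upper-case type names are assumptions.

```perl
use strict;
use warnings;
use Carp qw(croak carp);

# Types documented for add_param.
my %VALID_TYPE = map { $_ => 1 } qw(ARRAY DIR PARAM FILE TEMPFILE);

# Hypothetical sketch: turn any documented add_param calling form into
# a hash reference with key, value and type entries.
sub normalize_param {
    my @args = @_;
    croak "add_param: more than 3 arguments" if @args > 3;
    croak "add_param: no arguments" unless @args;

    my %param;
    if (@args == 1 && ref $args[0] eq 'HASH') {
        # Named form: keys are case insensitive, unknown keys warn.
        while (my ($k, $v) = each %{ $args[0] }) {
            my $lk = lc $k;
            if ($lk =~ /^(?:key|value|type)$/) {
                $param{$lk} = $v;
            } else {
                carp "add_param: unrecognized key '$k'";
            }
        }
    } elsif (@args == 1) {
        # Anonymous parameter, used verbatim (storage slot is assumed).
        %param = (value => $args[0]);
    } else {
        # Positional form: key, value and optional type.
        @param{qw(key value type)} = @args;
    }

    $param{type} = 'PARAM' unless defined $param{type};
    croak "add_param: unknown type '$param{type}'"
        unless $VALID_TYPE{ $param{type} };
    return \%param;
}

my $p = normalize_param({ KEY => '--x=$(Name)', Value => '/tmp/list.txt', tYpE => 'FILE' });
print "$p->{key} $p->{value} $p->{type}\n";
```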

For each parameter that is added, the 'key' dictates how the parameter is processed as a command line argument and how the values from the iterable directory, array or file are dropped into the final command line invocation. Parameter keys can make use of two tokens: '$(Index)' and '$(Name)'. The '$(Index)' token is replaced at runtime with the actual sequence number of the job on the grid. The '$(Name)' token is replaced with the string taken from the iterable file, directory or array, depending on the parameter type:

FILE  -> $(Name) is repeatedly replaced with each line in the file
DIR   -> $(Name) is repeatedly replaced with the name of each file in the directory 
ARRAY -> $(Name) is repeatedly replaced with each scalar value of the element of the array

Examples:

FILE
   # Single quotes stop Perl from interpolating the $(Name) token.
   $request->add_param({ type  => "FILE",
                         key   => '--string=$(Name)',
                         value => "/path/to/some/file.txt",
                       });

DIR
   $request->add_param({ type   => "DIR",
                         key    => '--filepath=$(Name)',
                         value  => "/path/to/some/directory",
                       });

ARRAY
   my @array = ("first", "second", "third");
   $request->add_param({ type   => "ARRAY",
                         key    => '--element=$(Name)',
                         value  => \@array,
                       });

Returns: None.
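To make the token rules for add_param concrete, the sketch below expands an ARRAY-type key into one argument per job. It is a hypothetical illustration (`expand_key` and `expand_array_param` are not Grid::Request functions); the real substitution happens at runtime on the grid, and starting $(Index) at 1 is an assumption, since DRMs number tasks differently.

```perl
use strict;
use warnings;

# Hypothetical: replace $(Name) and $(Index) in a parameter key in a
# single pass, so a substituted name is never re-scanned for tokens.
sub expand_key {
    my ($key, $name, $index) = @_;
    (my $arg = $key) =~ s/\$\((Name|Index)\)/$1 eq 'Name' ? $name : $index/ge;
    return $arg;
}

# Hypothetical: expand an ARRAY-type parameter into one argument per job.
sub expand_array_param {
    my ($key, $values, $first_index) = @_;
    $first_index //= 1;                          # assumed starting index
    my @args;
    for my $i (0 .. $#$values) {
        (my $name = $values->[$i]) =~ s/\n//g;   # newlines are stripped
        push @args, expand_key($key, $name, $first_index + $i);
    }
    return @args;
}

my @args = expand_array_param('--element=$(Name) --job=$(Index)', ["alpha\n", 'beta']);
print "$_\n" for @args;
# --element=alpha --job=1
# --element=beta --job=2
```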

$obj->class([$class]);

Description: This method is used to set and retrieve the request's class attribute. A request's class describes its general purpose or what it will be used for. For example, a command can be marked as a request for "engineering" or "marketing". Ad hoc requests will generally not use a class setting. If in doubt, leave the class attribute unset.

Parameters: With no parameters, this method functions as a getter. With one parameter, the method sets the request's class. No validation is performed on the class passed in.

Returns: The currently set class (when called with no arguments).

$obj->command([$command]);

Description: This method is used to set or retrieve the executable that will be called for the request.

Parameters: With no parameters, this method functions as a getter. With one parameter, the method sets the executable. Currently, this module does not attempt to verify whether the executable is actually present or whether permissions on the executable allow it to be called by the DCE.

Returns: The currently set executable (when called with no arguments).

$obj->email([$email]);

Description: This method is used to set or retrieve the email of the user submitting the request. The email is important for notifications and for tracking purposes in case something goes wrong.

Parameters: With no parameters, this method functions as a getter and returns the currently configured email address. If the request has not yet been submitted, the user may set or reset the email address by providing an argument. The address is not currently validated for RFC compliance.

Returns: The email address currently set, or undef if unset (when called with no arguments).

$obj->end_time();

Description: Retrieve the finish time of the request.

Parameters: None.

Returns: The ending time of the request (the time the DCE finished processing the request), or undef if the end_time has not yet been established.

$obj->error([errorfile]);

Description: This method sets the error file, or resets it if the request has not yet been submitted. The error file receives all STDERR output from the invocation of the executable, so it should be in a globally accessible location on the filesystem. The attribute may not be changed with this method once the request has been submitted.

Parameters: To set the error file, call this method with one parameter, which should be the path to the file where STDERR is to be written.

Returns: When called with no arguments, this method returns the currently set error file, or undef if not yet set.

$obj->getenv([1]);

Description: The getenv method is used to set whether the user's environment should be replicated to the DCE or not. To replicate your environment, call this method with an argument that evaluates to true. Calling it with a 0 argument, or an expression that evaluates to false, will turn off environment replication. The default is NOT to replicate the user environment across the DCE.

Parameters: This method behaves as a getter when called with no arguments. If called with one or more arguments, the first is evaluated for truth and the attribute is set to 1 or 0 accordingly.

Returns: The current setting for getenv (if called with no arguments).

$obj->ids();

Description: This method functions only as a getter. It returns the DRM ids associated with the overall request after it has been submitted.

Parameters: None.

Returns: A list of ids in list context, or a reference to an array of ids in scalar context.
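The list/scalar distinction, which ids() shares with submit() and submit_serially(), is easy to trip over. The sketch below uses a hypothetical stand-in, `fake_ids`, with made-up ids instead of a live request, to show what each calling context receives.

```perl
use strict;
use warnings;

# Hypothetical stand-in for a context-sensitive getter such as ids():
# a list in list context, an array reference in scalar context.
sub fake_ids {
    my @ids = (101, 102, 103);   # made-up DRM ids for illustration
    return wantarray ? @ids : \@ids;
}

my @ids     = fake_ids();        # list context: the ids themselves
my $ids_ref = fake_ids();        # scalar context: an array reference
print "First id: $ids[0]; count via ref: ", scalar(@$ids_ref), "\n";
```

Note that `my $n = $request->ids();` yields a reference, not a count; use `scalar(@{ $request->ids() })` or assign to an array first when the number of ids is needed.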

$obj->is_submitted();

Description: Returns whether a request object has been submitted.

Parameters: None.

Returns: 1 if the request has been submitted and 0 if it has not.

$obj->project([$project]);

Description: The project attribute is used to affiliate usage of the DRM with a particular administrative project. This will allow for more effective control and allocation of resources, especially when high priority projects must be fulfilled. Therefore, the "project" is mandatory when the request object is built. However, the user may still change the project attribute as long as the job has not yet been submitted (after submission most attributes are locked).

Parameters: The first parameter will be used to set (or reset) the project attribute for the request, as long as the request has not been submitted.

Returns: The currently set project (if called with no parameters).

$obj->input([path]);

Description:

Parameters:

Returns:

$obj->initialdir([path]);

Description: This method sets the directory where the DCE will be chdir'd to before invoking the executable. This is an optional parameter, and if the user leaves it unspecified, the default will be that the DCE will be chdir'd to the root directory "/" before beginning the request. Use of initialdir is encouraged to promote use of relative paths.

Parameters: A scalar holding the path to the directory the DCE should chdir to before invoking the executable.

Returns: When called with no arguments, returns the currently set initialdir, or undef if not yet set.

$obj->length([length]);

Description: This method is used to characterize how long the request is expected to take to complete. For long running requests, an attempt to match appropriate resources is made. If unsure, leave this setting unset.

Parameters: "short", "medium", "long". No attempt is made to validate the length passed in when used as a setter.

Returns: The currently set length attribute (when called with no arguments).

$obj->name([name]);

Description: The name attribute for request objects is optional and is provided as a convenience to users to name their requests.

Parameters: A scalar name for the request.

Returns: When called with no arguments, returns the current name, or undef if not yet set. The name cannot be changed once a request is submitted.

$obj->next_command();

Description: The module allows a request to encapsulate multiple commands. This method starts work on a new command by moving a cursor. Commands are processed in the order in which they are created. The only attribute that a new command inherits from the command that preceded it is the project; users are free to change it by calling the project() method.

Parameters: None.

Returns: None.

$obj->opsys([$os]);

Description: The default operating system that the request will be processed on is Linux. Users can choose to submit requests to other operating systems by using this method. Available operating systems are "Linux" and "Solaris"; an attempt to set the opsys attribute to anything else results in an error. Multiple values must be comma separated, so if you would like your command to run on Linux or Solaris:

$obj->opsys("Linux,Solaris");

and for Linux only:

$obj->opsys("Linux");

Parameters: "Linux", "Solaris", or a comma-separated combination such as "Linux,Solaris", when called as a setter (with one argument).

Returns: When called with no arguments, returns the operating system the request will run on, which defaults to "Linux".
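The opsys rules (a comma-separated list drawn from a fixed set of systems) can be checked before calling the setter. The following is a hypothetical validator, `validate_opsys`, written for illustration only; tolerating spaces around commas and matching names case-sensitively are assumptions about what the module accepts.

```perl
use strict;
use warnings;

# Operating systems documented for opsys().
my %SUPPORTED_OS = map { $_ => 1 } qw(Linux Solaris);

# Hypothetical: validate a comma-separated opsys value and return it
# in canonical "A,B" form, dying on anything unsupported.
sub validate_opsys {
    my ($value) = @_;
    my @systems = split /\s*,\s*/, $value;
    die "No operating system given\n" unless @systems;
    for my $os (@systems) {
        die "Unsupported operating system '$os'\n" unless $SUPPORTED_OS{$os};
    }
    return join ',', @systems;
}

print validate_opsys("Linux,Solaris"), "\n";   # Linux,Solaris
```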

$obj->hosts([hostname]);

Description: Used to set the list of possible machines to run the jobs on. If this value is not set, any host that matches the other requirements will be used, according to the grid engine in use. Hostnames are passed in comma-separated form with no spaces.

Parameters: hostname(s), example "machine1,machine2"

Returns: When called with no arguments, returns the hosts if set.
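Because hosts() takes one comma-separated string with no spaces, it can help to build that string from a Perl list. `hosts_string` below is a hypothetical helper, not part of Grid::Request, and the final call on a request object is shown only as a comment.

```perl
use strict;
use warnings;

# Hypothetical: join machine names into the "machine1,machine2" form
# described for hosts(), rejecting names with commas or whitespace.
sub hosts_string {
    my @hosts = @_;
    for my $h (@hosts) {
        die "Host name '$h' must not contain commas or whitespace\n"
            if $h =~ /[,\s]/;
    }
    return join ',', @hosts;
}

my $hosts = hosts_string(qw(machine1 machine2));
print "$hosts\n";   # machine1,machine2
# $request->hosts($hosts);   # on a real Grid::Request object
```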

$obj->memory([megabytes]);

Description: Used to set the minimum amount of physical memory needed.

Parameters: The amount of memory in megabytes, for example 10MB or 512MB.

Returns: When called with no arguments, returns the memory if set.

$obj->pass_through([pass_value]);

Description: Used to pass strings to the underlying DRM (Distributed Resource Management) system (Condor, SGE, LSF, etc.) as part of the request's requirements. Such pass-throughs are forwarded unchanged. This is an advanced option and should only be used by those familiar with the underlying DRM.

Parameters: $string

Returns: None.

$obj->output([path]);

Description: Sets the path for the output file, which would hold all of the output directed to STDOUT by the request on the DCE. This method functions as a setter and getter.

Parameters: A path to a file. The file must be globally accessible on the filesystem; otherwise, the location will not be accessible to compute nodes on the DCE. This attribute may not be changed once a request is submitted.

Returns: When called with no arguments, the method returns the currently set path for the output file, or undef if not yet set.

$obj->params();

Description: Retrieve the list of currently registered parameters for the request.

Parameters: None.

Returns: The method returns a list of hash references.

$obj->priority([priority]);

Description: Use this method to set the optional priority attribute on the request. The priority setting is used to help allocate the appropriate resources to the request. Higher priority requests may displace lower priority requests.

Parameters: Scalar priority value.

Returns: The current priority, or undef if unset.

$obj->set_env_list(@vars);

Description: This method is used to establish the environment that a request to the grid should run under. Users may pass this method a list of strings in "key=value" format. Each key becomes an environment variable set to "value" before execution of the command begins. Normally, a request does not copy the user's environment in this way: the environment is only established on the DCE if the user invokes the getenv method or sets it with this method. This method allows the user to override the environment with his or her own notion of what the environment should be at runtime on the grid.

Parameters: A list of strings in "key=value" format. If any string does not contain the equals (=) sign, it is skipped and a warning is generated.

Returns: None.
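The input rules for set_env_list (accept "key=value", warn about and skip anything else) can be sketched as follows. `parse_env_list` is a hypothetical helper, not the module's code, and splitting only on the first '=' so that values may themselves contain '=' is an assumption.

```perl
use strict;
use warnings;

# Hypothetical: turn "key=value" strings into a hash, warning about and
# skipping entries without an equals sign (or with an empty key).
sub parse_env_list {
    my @vars = @_;
    my %env;
    for my $entry (@vars) {
        my ($key, $value) = split /=/, $entry, 2;   # split on first '=' only
        unless (defined $value && length $key) {
            warn "Skipping '$entry': not in key=value format\n";
            next;
        }
        $env{$key} = $value;
    }
    return \%env;
}

my $env = parse_env_list('PATH=/usr/bin:/bin', 'OPTS=-a=1', 'BROKEN');
print "$_=$env->{$_}\n" for sort keys %$env;
```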

$obj->simulate([value]);

Description: This method is used to toggle the simulate flag for the request. If this method is passed a true value, the request will not be submitted to the grid, but will appear to have been submitted. This is most useful in development and testing environments to conserve resources. When a request marked simulate is submitted, the request ID returned will be -1. Note that this attribute cannot be modified once a request is submitted.

Parameters: A true value (such as 1) to mark the request as a simulation, or a false value or expression (such as 0) to mark the request for execution.

Returns: When called with no arguments, this method returns the current value of the simulate toggle: 1 for simulation, 0 for execution.

$obj->start_time();

Description: Retrieve the start time when the request began processing. Any attempt to set the time will result in an error.

Parameters: None.

Returns: $time, the start time (scalar) that the grid began processing the request.

$obj->state();

Description: Retrieve the "state" attribute of the request. This method is "read only" and an attempt to set the state will result in an error. The states are:

INIT
INTERRUPTED
FAILURE
FINISHED
RUNNING
SUSPENDED
UNKNOWN
WAITING

Parameters: None.

Returns: $state, a scalar with the current state of the request.

$obj->stop([$id]);

Description: Stop a request that has already been submitted.

Parameters: Request ID (optional)

Returns: None.

$obj->submit_serially();

Description: Calling this method is equivalent to calling submit with the serial flag set to a true value, e.g. $obj->submit(1).

Parameters: None.

Returns: The array of grid ids in list context, or an array reference in scalar context.

$obj->submit([$serial]);

Description: Submit the request to the grid for execution.

Parameters: An optional parameter which, if true, causes the commands to be executed serially. The default is asynchronous execution.

Returns: The array of DRM ids in list context, or an array reference in scalar context.

$obj->submit_and_wait();

Description: Submit the request for execution on the grid and wait for the request to finish executing before returning control (block).

Parameters: None.

Returns: $id, the request's id.

$obj->times([times]);

Description: Sometimes it may be desirable to execute a command more than one time. For instance, a user may choose to run an executable many times, with each invocation operating on a different input file. This technique allows for very powerful parallelization of commands. The times method establishes how many times the executable should be invoked.

Parameters: An integer number may be passed in to set the times attribute on the request object. If no argument is passed, the method functions as a getter and returns the currently set "times" attribute, or undef if unset. The setting cannot be changed after the request has been submitted.

Returns: $times, when called with no arguments.

$obj->to_xml();

Description: Returns the XML representation of the entire request.

Parameters: None.

Returns: $xml, a scalar XML string.

$obj->command_count();

Description: Returns the number of currently configured commands in the overall request.

Parameters: None.

Returns: $count, a scalar.

$obj->wait_for_request();

Description: Once a request has been submitted, a user may choose to wait for the request to complete before proceeding. This is called "blocking". To block and wait for a request, submit it (by calling submit()) and then call wait_for_request(). Control returns once the request has finished (either completed or errored). If this method is called before the request has been submitted, a warning is generated.

Parameters: None.

Returns: None.

$obj->get_tasks();

Description: Retrieve the tasks for this request.

Parameters: None.

Returns: A reference to a hash of hashes (HoH) representing the tasks for this request. The outer hash is keyed by task index, and each value is a hash reference holding that task's data. The following is an example of the returned data structure:

$hashref = {
          '1' => {
                 'returnValue' => 0,
                 'message'     => undef,
                 'state'       => 'FINISHED'
               },
          '2' => {
                 'returnValue' => -1,
                 'message'     => 'Failed task.',
                 'state'       => 'FAILED'
               }
        }
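A typical use of get_tasks() is to scan for tasks that did not finish cleanly. The sketch below copies the sample data shown above instead of querying a live request; treating any non-zero 'returnValue' as a failure is an assumption, not documented module behavior.

```perl
use strict;
use warnings;

# Sample data copied from the get_tasks() example; with a live request
# this would come from $request->get_tasks().
my $tasks = {
    '1' => { returnValue => 0,  message => undef,          state => 'FINISHED' },
    '2' => { returnValue => -1, message => 'Failed task.', state => 'FAILED'   },
};

# Walk the tasks in index order and collect those with a non-zero return value.
my @failed;
for my $index (sort { $a <=> $b } keys %$tasks) {
    my $task = $tasks->{$index};
    next if $task->{returnValue} == 0;
    push @failed, $index;
    printf "Task %s: state=%s, message=%s\n",
        $index, $task->{state}, $task->{message} // '(none)';
}
print scalar(@failed), " task(s) failed\n";
```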

ENVIRONMENT

The GRID_CONFIG environment variable is checked for an alternate path to the configuration file holding the DRM engine in use and the shared temporary directory to use.

However, if the getenv() method is called, this module will read and store the entire environment and attempt to recreate it for the job(s) on the grid.

DIAGNOSTICS

"Initialization failed. Too many arguments."

The object could not be initialized when the constructor was called. Too many arguments were provided to "new".

BUGS

None known.

SEE ALSO

Config::IniFiles
Hash::Util
IO::Scalar
Log::Log4perl
Schedule::DRMAAc
Grid::Request::Command
Grid::Request::HTC
Grid::Request::DRM::SGE
XML::Writer