NAME

GRID::Cluster - Virtual clusters using SSH links

SYNOPSIS

use GRID::Cluster;
use List::Util qw(sum);   # provides sum, used to add up the partial results

my $np = 4;     # Number of processes
my $N = 1000;   # Number of iterations
my $clean = 0;  # The files are not removed when the execution is finished

my $machine = [ 'host1', 'host2', 'host3' ];                # Hosts
my $debug = { host1 => 0, host2 => 0, host3 => 0 };         # Debug mode in every host
my $max_num_np = { host1 => 1, host2 => 1, host3 => 1 };    # Maximum number of processes supported by every host

my $c = GRID::Cluster->new(host_names => $machine, debug => $debug, max_num_np => $max_num_np)
  || die "No machines have been initialized in the cluster";

# Transference of files to remote hosts
$c->copyandmake(
  dir => 'pi',
  makeargs => 'pi',
  files => [ qw{pi.c Makefile} ],
  cleanfiles => $clean,
  cleandirs => $clean, # remove the whole directory at the end
  keepdir => 1,
);

# This method changes the remote working directory of all hosts
$c->chdir("pi/")  || die "Can't change to pi/\n";

# Tasks are created and executed in remote machines using the method 'qx'
my @commands = map {  "./pi $_ $N $np |" } 0..$np-1;
print "Pi Value: ", sum(@{$c->qx(@commands)}), "\n";

DESCRIPTION

This module is based on the module GRID::Machine. It provides a set of methods to create 'virtual' clusters by the use of SSH links for communications among different remote hosts.

Since the main features of GRID::Machine are zero administration and minimal installation, GRID::Cluster directly inherits these features.

Mainly, GRID::Cluster provides:

  • An extension of the Perl qx operator. Instead of a single command it receives a list of commands. Commands are executed - via SSH - using the master-worker paradigm.

  • Services for transferring files among machines.

DEPENDENCIES

This module requires these other modules and libraries:

  • GRID::Machine module by Casiano Rodriguez Leon

METHODS

The Constructor new

This method returns a new instance of an object.

There are two ways to call the constructor. The first one looks like:

my $cluster = GRID::Cluster->new(
                                 debug      => {machine1 => 0, machine2 => 0,...},
                                 max_num_np => {machine1 => 1, machine2 => 1,...},
                                );

where:

  • debug is a reference to a hash that specifies which machines will be in debugging mode. It is optional.

  • max_num_np is a reference to a hash containing the maximum number of processes supported for each machine.
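When every machine shares the same settings, both hashes can be derived from a single host list. The following is a minimal sketch; the helper name uniform_cluster_args and the host names are illustrative and not part of GRID::Cluster:

```perl
use strict;
use warnings;

# Hypothetical helper: build constructor arguments that give every host
# the same process limit and the same debug flag.
sub uniform_cluster_args {
    my ($hosts, $np_per_host, $debug_flag) = @_;
    my %max_num_np = map { $_ => $np_per_host } @$hosts;
    my %debug      = map { $_ => $debug_flag }  @$hosts;
    return (debug => \%debug, max_num_np => \%max_num_np);
}

my %args = uniform_cluster_args([qw(machine1 machine2 machine3)], 1, 0);

# With reachable hosts, the arguments would then be passed on:
# my $cluster = GRID::Cluster->new(%args);
```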

The second one looks like:

my $cluster = GRID::Cluster->new(config => $config_file_name);

where:

  • config is the name of the file containing the cluster specification. The specification is written in Perl itself. The code inside the config file must return a list defining the max_num_np and debug parameters as in the previous call. See the following example:

    $ cat MachineConfig.pm
    my %debug = (machine1 => 0, machine2 => 0, machine3 => 0, machine4 => 0);
    my %max_num_np = (machine1 => 3, machine2 => 1, machine3 => 1, machine4 => 1);

    return (debug => \%debug, max_num_np => \%max_num_np);
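A file of this kind can be evaluated with Perl's do FILE, which yields the value of the file's return statement. The sketch below shows one way to read such a specification; it is not necessarily how GRID::Cluster loads it, and the temporary file and host names are only for illustration:

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Write a small specification file like the one above (example host names).
my ($fh, $config_file) = tempfile(SUFFIX => '.pm', UNLINK => 1);
print {$fh} <<'END_CONFIG';
my %debug      = (machine1 => 0, machine2 => 0);
my %max_num_np = (machine1 => 3, machine2 => 1);

return (debug => \%debug, max_num_np => \%max_num_np);
END_CONFIG
close $fh or die "Can't close $config_file: $!";

# 'do FILE' evaluates the file; in list context it gives back the returned list.
my %spec = do $config_file;
die "Can't load $config_file: ", ($@ || $!) unless %spec;

printf "%s: up to %d process(es)\n", $_, $spec{max_num_np}{$_}
    for sort keys %{ $spec{max_num_np} };
```

The temporary file is addressed by an absolute path, so the lookup does not depend on the current directory being in @INC.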

The Method qx

The syntax of the method qx is:

my $result = $cluster->qx(@commands); 

It receives a list of commands and executes each command as a remote process, using a farm-based approach. At any given time a chunk of commands is being executed; the size of the chunk depends on the number of processors. As soon as a command finishes, another one is sent to the newly idle worker (if there are pending tasks).
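The farm scheme can be illustrated with a purely local simulation. It involves no SSH and is not GRID::Cluster's own code; the function name simulate_farm and the task durations are hypothetical. Each task goes to whichever worker becomes idle first, and the outcome is recorded at the index of the task that produced it:

```perl
use strict;
use warnings;

# Simulate a farm with $np workers; each task has a duration in abstract
# time units. Returns, in task order, the worker that ran each task and
# the time at which it finished.
sub simulate_farm {
    my ($np, @durations) = @_;
    my @free_at = (0) x $np;    # time at which each worker becomes idle
    my @schedule;

    for my $task (0 .. $#durations) {
        # Pick the worker that becomes idle first (lowest index wins ties).
        my $w = 0;
        for my $i (1 .. $#free_at) {
            $w = $i if $free_at[$i] < $free_at[$w];
        }
        $free_at[$w] += $durations[$task];
        $schedule[$task] = { worker => $w, finished => $free_at[$w] };
    }
    return @schedule;
}

my @schedule = simulate_farm(2, 5, 1, 1, 1);
printf "task %d ran on worker %d, finished at t=%d\n",
    $_, $schedule[$_]{worker}, $schedule[$_]{finished} for 0 .. $#schedule;
```

With two workers and one long task followed by three short ones, the long task occupies worker 0 while worker 1 handles all three short tasks, which is why the mapping from commands to machines cannot be predicted from their order alone.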

In a scalar context, qx returns a reference to a list that contains the outputs of the @commands. Observe, however, that no assumption can be made about the processor where an individual command c in @commands is executed.

An example of use:

$ cat uname_echo_qx.pl
#!/usr/bin/perl
use strict;
use warnings;

use GRID::Cluster;
use Data::Dumper;

my $cluster = GRID::Cluster->new(max_num_np => {orion => 1, europa => 1},);

my @commands = ("uname -a", "echo Hello");
my $result = $cluster->qx(@commands);

print Dumper($result);

This example produces the following output:

$ ./uname_echo_qx.pl
$VAR1 = [
          'Linux europa 2.6.24-24-generic #1 SMP Wed Apr 15 15:11:35 UTC 2009 x86_64 GNU/Linux
',
          'Hello
'
        ];

Observe that the first output corresponds to the first command, uname -a, and the second output to the second command, echo Hello. Notice also that we can't assume that the first command will be executed on the first machine, the second one on the second machine, etc. We can only be certain that all the commands will be executed on some machine of the cluster pool.
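Because the outputs come back in the order of the commands, they can be paired by index. The following sketch uses a hypothetical helper, label_outputs, and a hand-written array reference standing in for what qx would return:

```perl
use strict;
use warnings;

# Hypothetical helper: pair each command with its output by position.
sub label_outputs {
    my ($commands, $result) = @_;
    die "Expected one output per command\n" unless @$commands == @$result;
    return map {
        my $out = $result->[$_];
        chomp $out;                       # drop the trailing newline
        [ $commands->[$_], $out ];
    } 0 .. $#$commands;
}

my @commands = ('echo Hello', 'echo World');
my $result   = [ "Hello\n", "World\n" ];  # stand-in for $cluster->qx(@commands)

printf "%-12s => %s\n", @$_ for label_outputs(\@commands, $result);
```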

The Method copyandmake

The syntax of the method copyandmake is:

my $result = $cluster->copyandmake(
               dir => $dir,
               files => [ @files ],      # files to transfer
               make => $command,         # execute $command $commandargs
               makeargs => $commandargs, # after the transference
               cleanfiles => $cleanup,   # remove files at the end
               cleandirs => $cleanup,    # remove the whole directory at the end
             );

and it returns a GRID::Cluster::Result object.

copyandmake copies (using scp) the files @files to a directory named $dir in remote machines. The directory $dir will be created if it does not exist. After the file transfer the command specified by the copyandmake option

make => 'command'

will be executed with the arguments specified in the option makeargs. If the make option is not specified but there is a file named Makefile among the transferred files, the make program will be executed. Set the make option to the number 0 or the string '' if you want to avoid the execution of any command after the transfer.

The transferred files will be removed when the connection finishes if the option cleanfiles is set. If the option cleandirs is set, the created directory and all the files below it will be removed. Observe that the directory and the files will be kept if they were not created by this connection. By default, the call to copyandmake sets dir as the current directory in remote machines. Use the option keepdir => 1 to avoid this.
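The rules for choosing the post-transfer command can be summarised in a few lines of Perl. This is a sketch of the documented behaviour only, not the module's implementation; the helper name post_transfer_command is hypothetical:

```perl
use strict;
use warnings;

# Hypothetical helper mirroring the documented rules. Returns the command
# line to run after the transfer, or nothing when no command should run.
sub post_transfer_command {
    my (%opt) = @_;
    my $args = defined $opt{makeargs} ? $opt{makeargs} : '';

    if (exists $opt{make}) {
        # make => 0 or make => '' disables the post-transfer command.
        return unless $opt{make};
        return $args eq '' ? $opt{make} : "$opt{make} $args";
    }

    # No 'make' option: fall back to make when a Makefile was transferred.
    if (grep { $_ eq 'Makefile' } @{ $opt{files} || [] }) {
        return $args eq '' ? 'make' : "make $args";
    }
    return;
}

# The options used in the SYNOPSIS select "make pi":
my $cmd = post_transfer_command(files => [qw(pi.c Makefile)], makeargs => 'pi');
print "$cmd\n";
```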

The Method chdir

The syntax of this method is as follows:

my $result = $cluster->chdir($remote_dir);

and it returns a GRID::Cluster::Result object.

The method chdir changes the remote working directory to $remote_dir in every remote machine.

INSTALLATION

To install GRID::Cluster, follow these steps:

  • Set automatic ssh-authentication with machines where you have an account.

    SSH includes the ability to authenticate users using public keys. Instead of authenticating the user with a password, the SSH server on the remote machine will verify a challenge signed by the user's private key against its copy of the user's public key. To achieve this automatic ssh-authentication you have to:

    • Generate a key pair using the ssh-keygen utility. For example:

      local.machine$ ssh-keygen -t rsa -N ''

      The option -t selects the type of key you want to generate. There are three types of keys: rsa1, rsa and dsa. The -N option is followed by the passphrase. The -N '' setting indicates that no passphrase will be used. This is useful when used with key restrictions or when dealing with cron jobs, batch commands and automatic processing, which is the context in which this module was designed. If you still don't like having a private key without a passphrase, provide a passphrase and use ssh-agent to avoid the inconvenience of typing the passphrase each time. ssh-agent is a program you run once per login session and load your keys into. From that moment on, any ssh client will contact ssh-agent and no more passphrase typing will be needed.

      By default, your identification will be saved in a file /home/user/.ssh/id_rsa. Your public key will be saved in /home/user/.ssh/id_rsa.pub.

    • Once you have generated a key pair, you must install the public key on the remote machine. To do it, append the public component of the key in

      /home/user/.ssh/id_rsa.pub

      to file

      /home/user/.ssh/authorized_keys

      on the remote machine. If the ssh-copy-id script is available, you can do it using:

      local.machine$ ssh-copy-id -i ~/.ssh/id_rsa.pub user@remote.machine

      Alternatively you can write the following command:

      $ ssh remote.machine "umask 077; cat >> .ssh/authorized_keys" < /home/user/.ssh/id_rsa.pub

      The umask command is needed since the SSH server will refuse to read a /home/user/.ssh/authorized_keys file that has loose permissions.

    • Edit your local configuration file /home/user/.ssh/config (see man ssh_config in UNIX) and create a new section for GRID::Cluster connections to that host. Here follows an example:

      ...

       # A new section inside the config file:
       # it will be used when writing a command like:
       #                  $ ssh gridyum
      
       Host gridyum
      
       # My username in the remote machine
       user my_login_in_the_remote_machine
      
       # The actual name of the machine: by default the one provided in the
       # command line
       Hostname real.machine.name
      
       # The port to use: by default 22
       Port 2048
      
       # The identity pair to use. By default ~/.ssh/id_rsa and ~/.ssh/id_dsa
       IdentityFile /home/user/.ssh/yumid
      
       # Useful to detect a broken network
       BatchMode yes
      
       # Useful when the home directory is shared across machines,
       # to avoid warnings about changed host keys when connecting
       # to local host
       NoHostAuthenticationForLocalhost yes
      
      
       # Another section ...
       Host another.remote.machine an.alias.for.this.machine
       user mylogin_there
      
       ...
      
      This way you don't have to specify your login name on the remote machine even if it differs from your login name on the local machine, you don't have to specify the port if it isn't 22, etc. This is the recommended way to work with GRID::Cluster. Avoid cluttering the constructor new.

    • Once the public key is installed on the remote machine, you should be able to authenticate using your private key:

      $ ssh remote.machine
      Linux remote.machine 2.6.15-1-686-smp #2 SMP Mon Mar 6 15:34:50 UTC 2006 i686
      Last login: Sat Jul  7 13:34:00 2007 from local.machine
      user@remote.machine:~$

      You can also automatically execute commands in the remote server:

      local.machine$ ssh remote.machine uname -a
      Linux remote.machine 2.6.15-1-686-smp #2 SMP Mon Mar 6 15:34:50 UTC 2006 i686 GNU/Linux

  • Before running the tests, set the environment variable GRID_REMOTE_MACHINES on the local machine to point to a set of machines that are available using automatic authentication. For example, in bash:

    export GRID_REMOTE_MACHINES=user@machine_1.domain:user@machine_2.domain:...:user@machine_n.domain

    Otherwise most connectivity tests will be skipped. This and the previous steps are optional.

  • Follow the traditional steps:

    perl Makefile.PL
    make
    make test
    make install
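
The GRID_REMOTE_MACHINES value described above is a colon-separated list of user@host entries. A minimal sketch of splitting such a value in Perl follows; the helper name remote_machines is hypothetical, and how the test suite itself reads the variable is an assumption:

```perl
use strict;
use warnings;

# Split a GRID_REMOTE_MACHINES-style value into its user@host entries.
sub remote_machines {
    my ($value) = @_;
    return () unless defined $value && length $value;
    return grep { length $_ } split /:/, $value;   # ignore empty entries
}

my @machines = remote_machines($ENV{GRID_REMOTE_MACHINES});
my $message  = @machines
    ? "Connectivity tests can use: @machines\n"
    : "GRID_REMOTE_MACHINES is not set; connectivity tests would be skipped\n";
print $message;
```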

SEE ALSO

AUTHORS

Eduardo Segredo Gonzalez <esegredo@ull.es> and Casiano Rodriguez Leon <casiano@ull.es>

ACKNOWLEDGEMENTS

This work has been supported by the EC (FEDER) and the Spanish Ministry of Science and Innovation inside the 'Plan Nacional de I+D+i' with the contract number TIN2008-06491-C04-02.

Also, it has been supported by the Canary Government project number PI2007/015.

The work of Eduardo Segredo has been developed under the contract PTA2003-02-01053.

COPYRIGHT AND LICENSE

Copyright (C) 2009 by Casiano Rodriguez Leon and Eduardo Segredo Gonzalez. All rights reserved.

This library is free software; you can redistribute it and/or modify it under the same terms as Perl itself, either Perl version 5.8.8 or, at your option, any later version of Perl 5 you may have available.

This program is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.