NAME
Helios::Tutorial - a tutorial for getting started with Helios
DESCRIPTION
This is a short tutorial to introduce the Helios system's basic concepts and to show some quick examples of how to get started working with Helios.
HELIOS CONCEPTS
There are a few basic concepts you need to learn in order to understand the way Helios works. Once you understand these concepts, it will be simple for you to create Helios applications and manage a Helios collective.
Jobs
Jobs are simply a set of parameters for services (see below) that represent a discrete unit of work. Jobs are represented by XML-style markup and can be submitted programmatically via the Helios API, via the command line helios_job_submit.pl program, or via HTTP request to the submitJob.pl CGI program.
Services
Services are Perl classes that define how jobs of a certain type should be processed. Service classes are subclasses of Helios::Service, and implement a run() method to perform a job's operations. The run() method marks the job as successful or failed just before it ends. Services can be configured across the collective (see below) using Helios's built-in configuration subsystem, which can be accessed via the Helios::Panoptes web interface or by using the helios_config_* shell commands.
Services are loaded into memory by the helios.pl service daemon program. When jobs are submitted to Helios for a particular service, worker processes (see below) are launched to actually perform the work.
Workers
Workers are processes launched by helios.pl service daemons to actually perform jobs. A worker will instantiate its associated service class, do some preparation, and call the service object's run() method. In normal operation, a worker process performs one job and then exits, but in "OVERDRIVE" mode a worker process will stay in memory and perform as many jobs as possible, until 1) there are no more jobs in the queue, 2) it is told to HOLD or HALT job processing, or 3) it encounters an error processing a job that causes it to exit.
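To make the difference between normal and OVERDRIVE operation concrete, here is a toy model of the loop just described. It is only a sketch: run_worker(), the queue of coderefs, and the $signal callback are invented for illustration and are not how helios.pl is actually implemented.

```perl
use strict;
use warnings;

# Toy model of a worker's job loop; this is NOT Helios's actual code.
# $queue     - arrayref of pending jobs (here, plain coderefs)
# $overdrive - true to keep pulling jobs, false to stop after one job
# $signal    - coderef returning 'HOLD' or 'HALT' to stop, or '' to continue
sub run_worker {
    my ($queue, $overdrive, $signal) = @_;
    my $done = 0;
    while (@$queue) {                    # 1) stop when the queue is empty
        last if $signal->() ne '';       # 2) stop when told to HOLD or HALT
        my $job = shift @$queue;
        my $ok  = eval { $job->(); 1 };
        last unless $ok;                 # 3) stop when a job raises an error
        $done++;
        last unless $overdrive;          # normal mode: one job per worker
    }
    return $done;
}

my @queue     = map { my $n = $_; sub { print "job $n done\n" } } 1 .. 3;
my $no_signal = sub { '' };

my $normal    = run_worker(\@queue, 0, $no_signal);   # performs one job
my $overdrive = run_worker(\@queue, 1, $no_signal);   # drains the rest
print "normal: $normal, overdrive: $overdrive\n";
```

With three queued jobs, the first call handles one job and the second call handles the remaining two.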
Collective
A collective is a group of servers running helios.pl daemons connected to the same Helios database. Services in a collective can be centrally administered using the Helios::Panoptes web interface.
In addition to these basics, there are a couple of other Helios concepts that will not be dealt with in this tutorial but are worth knowing:
# End of section covered by CEB Toolbox, Inc. copyright.
Jobtypes
Every job in the Helios system has a jobtype, which is sort of an abstraction of a queue. For now, all you need to know is every Helios service has a corresponding jobtype with the same name. When you submit a job to Helios, you will set the jobtype to the name of the service you want to run the job.
# the following section is Copyright (C) 2008-9 by CEB Toolbox, Inc.
Metajobs
Metajobs are large batches of jobs submitted together to Helios. Bound together by XML, a metajob will be burst apart into its constituent jobs when first serviced by Helios. Metajobs can greatly decrease the time it takes to submit large batches of jobs into the Helios job queue. Also, in conjunction with worker OVERDRIVE mode, metajobs allow workers to achieve maximum system throughput.
A BASIC HELIOS SERVICE
Writing a Helios service involves writing a service class, a Perl class that subclasses Helios::Service. Your service class will need to implement the service's run() method. The run() method will be passed a Helios::Job object representing the job to be performed.
Here's a very simple sample class as an example:
package TestService;
use strict;
use warnings;
use base qw(Helios::Service);
sub run
{
    my $self = shift;
    my $job = shift;
    my $config = $self->getConfig();
    my $args = $self->getJobArgs($job);
    foreach my $arg (keys %$args)
    {
        $self->logMsg($job, "param:".$arg." value:".$args->{$arg});
        print '*** JOBID: '.$job->getJobid().' param: '.$arg.' value: '.$args->{$arg}." ***\n";
    }
    $self->completedJob($job);
}
1;
This service is extremely simple: it picks up the service's configuration and the given job's arguments, logs the job's arguments in the Helios log, and prints them to the terminal. Then it calls the completedJob() method to mark the job as finished successfully. Despite its simplicity, all Helios services ultimately follow this same basic pattern.
Let's take a closer look at this simple example. First, let's look at the package declaration and modules:
package TestService;
use strict;
use warnings;
use base qw(Helios::Service);
In addition to declaring the service's name with the package declaration, we've also enabled the strict and warnings pragmas. We declare our service to be a subclass of Helios::Service with the use base pragma.
Next, we have the run() method. This is the only required method in your service class. It starts by pulling in config parameters and job arguments from the Helios system:
sub run
{
    my $self = shift;
    my $job = shift;
    my $config = $self->getConfig();
    my $args = $self->getJobArgs($job);
The only parameter directly passed to run() is a Helios::Job object that represents the job the service needs to run. After stashing the service in the $self variable and the Helios::Job object in the $job variable, the run() method does two more things before the actual job processing starts. First, it grabs the service's configuration using the getConfig() method, and then gets the job's arguments using the getJobArgs() method. Both the service configuration and job arguments are returned as hashrefs, so it will be easy to work with them later in the run() method.
Next we have the rest of the run() method:
    foreach my $arg (keys %$args)
    {
        $self->logMsg($job, "param:".$arg." value:".$args->{$arg});
        print '*** JOBID: '.$job->getJobid().' param: '.$arg.' value: '.$args->{$arg}." ***\n";
    }
    $self->completedJob($job);
}
The foreach block is just looping through all the arguments in the job argument hashref and using the logMsg() method to log them in the Helios system log. It then also prints them to the terminal. In reality, this part of the run() method could be anything: a mathematical computation, the processing of a file, a call to another function or method in another Perl module. What work you actually do in your run() method is entirely up to you!
Note: one thing you don't normally do in Helios services is print to the terminal, since usually there is no terminal to print to. But we'll be running this service later in debug mode, and it will be helpful for you to see the job do something on the screen.
What is important, however, is what happens when your work is done. The last thing in this run() method (and indeed, all run() methods) is the call to mark the job as completed successfully or failed. This run() method is very, very simple, so in this case we are going to assume the job is successful and mark it as such by calling the completedJob() method. The only parameter for completedJob() is the Helios::Job object that run() was passed. If we had decided instead that the $job had failed, we would have used the failedJob() method:
$self->failedJob($job,"It failed!");
The failedJob() method works like completedJob() except it marks the job as failed rather than succeeded in the system. In addition, you may also specify an error message that will be recorded with the job so you can see why the job failed.
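One common way to decide between the two calls is to do the work inside an eval block and mark the job failed if anything dies. The sketch below shows that pattern. So that it runs without Helios installed, it uses a made-up StubService class in place of Helios::Service; the stub merely records which method was called, and SafeService and its "filename" argument are hypothetical as well.

```perl
use strict;
use warnings;

# Stand-in for Helios::Service so this sketch runs without Helios installed.
# It only records which method was called; the real base class does far more.
package StubService;
sub new          { my $class = shift; return bless { calls => [] }, $class }
sub getJobArgs   { my ($self, $job) = @_; return $job->{args} }
sub logMsg       { my ($self, $job, $msg) = @_; push @{ $self->{calls} }, "log: $msg" }
sub completedJob { my ($self, $job) = @_; push @{ $self->{calls} }, 'completed' }
sub failedJob    { my ($self, $job, $err) = @_; push @{ $self->{calls} }, "failed: $err" }

# Hypothetical service; in a real service the parent would be Helios::Service.
package SafeService;
our @ISA = ('StubService');

sub run
{
    my $self = shift;
    my $job = shift;
    my $args = $self->getJobArgs($job);

    # Do the real work inside eval so that any die() is caught here.
    my $ok = eval {
        die "missing filename argument\n" unless defined $args->{filename};
        $self->logMsg($job, "processing $args->{filename}");
        1;
    };

    # Mark the job exactly once, as the last thing run() does.
    if ($ok) {
        $self->completedJob($job);
    }
    else {
        my $err = $@ || 'unknown error';
        chomp $err;
        $self->failedJob($job, $err);
    }
}

package main;

my $good = SafeService->new();
$good->run({ args => { filename => 'Rise.mp3' } });
print join(', ', @{ $good->{calls} }), "\n";

my $bad = SafeService->new();
$bad->run({ args => {} });
print join(', ', @{ $bad->{calls} }), "\n";
```

The first job is logged and marked completed; the second has no filename, so the die() message becomes the error text passed to failedJob().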
Once we've marked the job as completed or failed, the run() method is over.
So those, in a nutshell, are the basics of creating a Helios service class. All Helios service classes ultimately use this design pattern. This makes creating new Helios services easy, either by writing new code or adapting existing code.
STARTING A HELIOS SERVICE AND SUBMITTING A JOB
Having read through the last section, you may ask, "But how do I actually get this TestService thing to run a job?" If you've got your helios.ini configured and ready, you're almost ready to go.
Make sure the path to your helios.ini is set in the HELIOS_INI environment variable, and that the variable is exported. At the command line:
export HELIOS_INI=/path/to/helios.ini
Also make sure it is an absolute path; relative paths will confuse the Helios service loader/daemon. Also, for this tutorial, go ahead and enable debug mode by setting the HELIOS_DEBUG environment variable:
export HELIOS_DEBUG=1
This will allow you to see some extra Helios debugging messages and prevent the service daemon from daemonizing, allowing you to stop it from the command line.
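If you want to double-check your environment before going further, a small script along these lines can help. It is only a sketch: check_helios_ini() is a hypothetical helper, not something shipped with Helios, and it checks just the two points made above (the variable is set, and the path is absolute) plus whether the file is readable.

```perl
use strict;
use warnings;
use File::Spec;

# Returns a list of problems with the given HELIOS_INI value (empty list = OK).
# check_helios_ini() is a hypothetical helper, not part of Helios.
sub check_helios_ini {
    my ($path) = @_;
    return ('HELIOS_INI is not set') unless defined $path && length $path;
    my @problems;
    push @problems, "HELIOS_INI is not an absolute path: $path"
        unless File::Spec->file_name_is_absolute($path);
    push @problems, "HELIOS_INI does not point to a readable file: $path"
        unless -f $path && -r _;
    return @problems;
}

my @problems = check_helios_ini($ENV{HELIOS_INI});
if (@problems) {
    print "WARNING: $_\n" for @problems;
}
else {
    print "HELIOS_INI looks OK\n";
}
print 'HELIOS_DEBUG is ', ($ENV{HELIOS_DEBUG} ? 'on' : 'off'), "\n";
```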
First, we'll go ahead and submit the job we want to run by using the helios_job_submit.pl program at the command line:
helios_job_submit.pl -v TestService "<job><params><myarg1>This is a test</myarg1></params></job>"
This will submit a job with a jobtype of TestService, meaning it is meant to be run by the service named "TestService". In the XML arguments for the job, there is actually only one argument, named 'myarg1', that has the value "This is a test". Of course, you can have a large number of arguments; the limit in the default Helios MySQL database schema is about 16MB, though you really should not be submitting that much data as job arguments, at least while you are learning the system.
The -v option tells helios_job_submit.pl to return the jobid of the new job. If you use the -v option or you enabled HELIOS_DEBUG, you should receive a message if your Helios setup is functioning properly:
Job submit successful. JOBID: 9
(The jobid will vary depending on how many jobs you have submitted to the system previously.) If you received an error, there is most likely a problem with your Helios configuration; go back to the install instructions, fix the problem, and try again.
So now that you have submitted a job to Helios, how do you make it run? If you saved the service we discussed above in a file called TestService.pm in the current directory, you can start the service using the helios.pl service loader/daemon:
helios.pl TestService
If you enabled HELIOS_DEBUG, you'll see a lot of messages scroll on the screen as helios.pl does some setup, attempts to load your service class, and parses the configuration for the service in helios.ini and in the Helios database. If that all goes well, the service daemon will look for jobs, see the job you submitted earlier, and launch a worker process to run the job. The worker process will call the run() method you defined, logging the job arguments to the Helios log and marking the job as completed. You'll see the job arguments printed on the screen:
*** JOBID: 9 param: myarg1 value: This is a test ***
Once all that is done, you'll see a "0 waiting TestService jobs." message. At that point you can press Ctrl-C to exit the service daemon. You can also open another terminal session and submit another job and watch it being processed if you like.
(If you didn't enable HELIOS_DEBUG, the service daemon will still do all the things described, but you'll only see a message that your TestService class was loaded, and then helios.pl will daemonize, disconnecting from your terminal in the process.)
If you want to check the log messages your service wrote to the log while processing the job, you can use the helios_job_info command to find out a job's start and complete times, whether it ran successfully, and any log messages it recorded. If you have the jobid from the job submitted earlier, issue a command like this:
helios_job_info --jobid=9 --args --logs
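to see a full report on the job like the sample below (your jobtype and log lines will reflect your own service, so they will differ from this sample):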
Jobid: 9
Jobtype: Helios::TestService
Submit Time: Fri Mar 7 17:12:08 2014
Complete Time: Fri Mar 7 17:13:00 2014
Exitstatus: 0
Args:
<job><params><myarg1>This is a test</myarg1></params></job>
Logs:
Fri Mar 7 17:13:00 2014 [localhost:13432] INFO Helios::TestService says, "Hello World!"
Fri Mar 7 17:13:00 2014 [localhost:13432] INFO JOBARG=myarg1 VALUE=This is a test
You can also use the Helios::Panoptes web application to view and search the Helios system log. In addition to messages related to specific jobs, Helios::Panoptes will also show log messages that the helios.pl service daemon logged about starting up, seeing jobs, and launching processes to handle those jobs. It is worth becoming familiar with these messages so you will be able to understand what is happening to your jobs and services as you develop, deploy, and manage services in your Helios collective.
SUBMITTING JOBS
In the previous section, you saw that you can submit jobs to Helios using the helios_job_submit.pl command line program. There are actually 3 ways to submit jobs to Helios:
helios_job_submit.pl, a shell program
over HTTP with the included submitJob.pl CGI script
in your own Perl programs, using the Helios::Job class
If you want to submit jobs via the shell or over HTTP, check the perldoc for helios_job_submit.pl and submitJob.pl for more information.
Sometimes you need more integration than a shell or CGI script can provide, especially if you're running in a persistent environment like FastCGI or mod_perl. In those cases, you should use the Helios job submission API directly.
To use the Helios job submission API, you initialize Helios using the Helios::Service class, create a Helios::Job object, and submit it to the system.
For example:
use strict;
use warnings;
use Helios::Service;
use Helios::Job;
# create a Helios::Service object, initialize it with prep()
# then get the $config hash with getConfig()
my $service = Helios::Service->new();
$service->prep() or die($service->errstr);
my $config = $service->getConfig();
# create your job arguments in XML
# then instantiate a Helios::Job object
# give it the Helios $config with setConfig()
# tell it the service class that should process the job with setJobType()
# set your job arguments with setArgString()
my $jobxml = '<job><params><filename>Rise.mp3</filename></params></job>';
my $job = Helios::Job->new();
$job->setConfig($config);
$job->setJobType('MP3IndexerService');
$job->setArgString($jobxml);
# finally, submit the job to the system
my $jobid = $job->submit();
The first thing to do is to instantiate a Helios::Service object, call the prep() method to parse the configuration and initialize a connection to the Helios collective database, and get the basic Helios configuration by calling the getConfig() method.
Once you have the Helios configuration, you're ready to create your job. Create an XML string specifying the job arguments in XML. Then instantiate the Helios::Job object with the new() method. Give your job object the Helios configuration you retrieved earlier (with setConfig()) and the name of the service class you want to service the job (with setJobType()). Finally, set the job's arguments by using the setArgString() method.
Then submit the job to Helios using the submit() method. If the job submission was successful, submit() will return the jobid of the newly submitted job. If something goes wrong, submit() will throw an exception.
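Because submit() throws an exception on failure, a program that submits many jobs will usually want to catch errors per job rather than die on the first one. Here is a minimal sketch of that idea. So that it runs without a Helios collective, the actual submission is passed in as a coderef; submit_all() and the fake submitter are hypothetical, and in a real program the coderef would wrap the Helios::Job calls from the example above.

```perl
use strict;
use warnings;

# Submit a list of argument strings through $submit, a coderef that returns a
# jobid or dies. In a real program $submit would wrap the Helios::Job calls
# shown earlier (new, setConfig, setJobType, setArgString, submit).
# submit_all() itself is a hypothetical helper, not part of Helios.
sub submit_all {
    my ($submit, @arg_strings) = @_;
    my (@jobids, @errors);
    for my $xml (@arg_strings) {
        my $jobid = eval { $submit->($xml) };
        if (defined $jobid) {
            push @jobids, $jobid;
        }
        else {
            my $err = $@ || 'submit returned no jobid';
            chomp $err;
            push @errors, "$xml: $err";
        }
    }
    return (\@jobids, \@errors);
}

# Fake submitter for demonstration only: hands out sequential jobids and
# rejects anything that does not start with <job>.
my $next_id     = 1;
my $fake_submit = sub {
    my ($xml) = @_;
    die "not a job document\n" unless $xml =~ /^<job>/;
    return $next_id++;
};

my ($jobids, $errors) = submit_all(
    $fake_submit,
    '<job><params><filename>Rise.mp3</filename></params></job>',
    'oops',
    '<job><params><filename>Fall.mp3</filename></params></job>',
);
print "submitted jobids: @$jobids\n";
print "failed: $_\n" for @$errors;
```

The two well-formed strings are "submitted" and the bad one is collected as an error instead of stopping the batch.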
Once the job is submitted, it goes into the Helios collective's job queue marked for the service you specified. When a service with that name starts, the helios.pl daemon will see jobs for that service are available, and will launch worker processes to process them. The worker processes will pull the jobs from the queue and call your service's run() method, passing it the Helios::Job object. Once your run() method has marked the job as a success or failure and returned, the worker process will end or, if the OVERDRIVE configuration parameter has been set, the worker process will pull another job from the queue and call your service's run() method again.
JOB ARGUMENT XML
Helios job arguments are normally specified in XML-like markup that follows a relatively simple format:
<job>
    <params>
        <argument_tag>argumentValue</argument_tag>
        ...
    </params>
</job>
While the markup language is definitely XML-like and must be well-formed like XML, in reality there is no DTD to validate against, and the tags in the <params> section are left entirely up to the user to define. This gives you maximal flexibility in determining the names and values of your job arguments, and also makes it simple to parse the arguments into the job argument hash for Helios services to use. Take the following job arguments, for example:
<job>
    <params>
        <id>456</id>
        <type>blog</type>
        <email>hanse@davion.gov</email>
    </params>
</job>
In the run() method of a service, calling the getJobArgs() method with a job with the above arguments will yield a reference to a hash like this:
{
    'id' => '456',
    'type' => 'blog',
    'email' => 'hanse@davion.gov'
}
So the tag names become the keys of the hash, and the enclosed strings become the hash values.
Keep in mind that although job argument XML can be flexible, the XML parser is set up to do things relatively simply, so complex XML structures should be avoided. In Helios, "jobs" are really only parameters to "services," so job arguments are best kept simple. The logic of your application should go in your Helios service class.
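Since the arguments are just a flat set of tags, it is easy to generate the argument string from a hash instead of writing it by hand. The following sketch shows one way to do that. build_job_xml() and xml_escape() are hypothetical helpers, not part of Helios, and the escaping assumes the Helios parser decodes the standard XML entities the way XML parsers normally do.

```perl
use strict;
use warnings;

# Escape the characters that would break well-formedness in element content.
sub xml_escape {
    my ($text) = @_;
    $text =~ s/&/&amp;/g;
    $text =~ s/</&lt;/g;
    $text =~ s/>/&gt;/g;
    return $text;
}

# Build a flat <job><params>...</params></job> string from a hashref.
# build_job_xml() is a hypothetical helper, not part of Helios.
sub build_job_xml {
    my ($args) = @_;
    my $params = '';
    for my $name (sort keys %$args) {
        # Keys become tag names, so refuse anything that is not a simple name.
        die "invalid argument name: $name\n" unless $name =~ /^[A-Za-z_][\w.-]*$/;
        $params .= "<$name>" . xml_escape($args->{$name}) . "</$name>";
    }
    return "<job><params>$params</params></job>";
}

my $xml = build_job_xml({ id => 456, type => 'blog', email => 'hanse@davion.gov' });
print "$xml\n";
```

The keys are sorted only to make the output predictable; the order of the tags inside <params> does not matter once they are parsed into a hash.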
CONFIGURING SERVICES
In the previous simple TestService example, you saw that the service's configuration is available via the getConfig() method. But how is that configuration set up? The Helios configuration system provides the ability to centrally configure services across an entire collective and, if necessary, tailor a service's configuration on a per host basis.
The first piece of the Helios configuration system is the helios.ini file. All of the configuration parameters set in the [global] section of helios.ini are available not just to the helios.pl service daemon, but to all Helios services running in a particular collective. You may also put configuration parameters specific to your service in helios.ini by creating a section named the same as the service:
[global]
dsn=dbi:mysql:host=hostname;db=helios_db
user=helios
password=password
[TestService]
loggers=HeliosX::Logger::File
logfile_path=/var/log/helios/
logfile_priority_threshold=6
The [TestService] section here would set up the logging configuration specifically for the TestService service (see below for more about the Helios logging system). While all Helios services will see the configuration options set in the [global] section, only the TestService service will see the configuration options set in the TestService section.
While you can set the configuration options for your service in helios.ini and distribute the helios.ini between all of your hosts, that is a very tedious and unwieldy way to manage a service's configuration. In addition to the helios.ini file, configuration parameters for a service can also be set using the helios_config_set command. The helios_config_set command takes 4 arguments:
- --service
The service you are setting the config parameter for.
- --hostname
The host you are setting the config parameter for. A parameter can be set to affect a service on a single host or on every host in the collective. If you do not specify a --hostname, helios_config_set will assume the parameter should only affect the specified service on the current host. If you want the parameter to affect the service running on any host, set the hostname to an asterisk ("*").
- --param
The name of the config parameter to set.
- --value
The actual value of the parameter to set.
For example, if you want Helios to run up to 5 TestService workers at a time on the current host, you can issue the following command to set the MAX_WORKERS config parameter:
helios_config_set --service=TestService --param=MAX_WORKERS --value=5
To enable OVERDRIVE mode on TestService workers on every host in your Helios collective, use the --hostname parameter and set it to '*':
helios_config_set --service=TestService --hostname='*' --param=OVERDRIVE --value=1
If you want to check your work, you can use the helios_config_get command, with the same options:
helios_config_get --service=TestService --param=MAX_WORKERS
You can also use the helios_config_unset command to delete a parameter from the collective database entirely.
You can also use the Helios::Panoptes web application to set config parameters for your services. Also, remember that though Helios defines a lot of special configuration parameters itself, you can use the Helios configuration subsystem to specify other parameters your service might need. For example, if you have a Helios service called Indexer, which has a landing directory where it stores incoming files, you can specify a "landing_zone" parameter available to all Indexer instances running on every host of your collective:
helios_config_set --service=Indexer --hostname='*' --param=landing_zone --value=/mnt/SAN1/incoming
Regardless of how you set configuration parameters, when your service class calls the getConfig() method, a hashref will be returned that will contain the configuration options specific to the service running on that particular host. The hash keys will be the parameter name, while the hash values will be the values specified for that particular parameter. The hash will contain:
any parameters set in the helios.ini [global] section,
any parameters set in helios.ini in a section whose name matches the service's name,
any parameters in the Helios collective database matching the service's name and a hostname set to '*',
any parameters in the collective database with the service's name and a hostname set to the current host.
Each of the above items will override the config options set by the previous ones. For example, if you set a 'log_priority_threshold' option for a service for the current host, it will override any 'log_priority_threshold' options set for the service globally (hostname = '*') or in helios.ini. In this way you can set configuration options for services running across the collective but isolate specific instances of a service on particular hosts if necessary.
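The override order can be pictured as merging four hashes, with each later layer replacing keys from the earlier ones. The sketch below imitates that behavior with made-up sample values. merge_config() is a hypothetical helper written for this illustration; it is not how Helios itself assembles the hash returned by getConfig().

```perl
use strict;
use warnings;

# Merge configuration layers in the order listed above; later layers win.
# The layer contents are made-up sample data, and merge_config() is a
# hypothetical helper that only imitates the documented precedence.
sub merge_config {
    my (@layers) = @_;
    my %config;
    for my $layer (@layers) {
        %config = (%config, %$layer);
    }
    return \%config;
}

my $ini_global  = { dsn => 'dbi:mysql:host=hostname;db=helios_db', user => 'helios' };
my $ini_service = { MAX_WORKERS => 1, log_priority_threshold => 6 };
my $db_all      = { OVERDRIVE => 1, log_priority_threshold => 5 };     # hostname '*'
my $db_host     = { MAX_WORKERS => 5, log_priority_threshold => 7 };   # current host

my $config = merge_config($ini_global, $ini_service, $db_all, $db_host);
printf "%s=%s\n", $_, $config->{$_} for sort keys %$config;
```

In this sample the host-specific values for MAX_WORKERS and log_priority_threshold win, while OVERDRIVE comes from the collective-wide setting and the database settings come from [global].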
LOGGING
You will note in the TestService example the use of the logMsg() method to send messages to the Helios logging system. The Helios logging system is an extensible system to keep track of what goes on in the Helios system and during job processing.
Inside of your service, the logMsg() method is what you need to log messages to the Helios logging system. The logMsg() method takes 3 parameters:
the Helios::Job object representing the current job (optional)
the priority level of the message (optional)
a string with the message you want to add to the log
If you pass a Helios::Job object in your call to logMsg(), the jobid will be recorded along with the message.
The priority levels of messages are defined in Helios::LogEntry::Levels. If you import these levels with the ':all' tag at the beginning of your service:
use Helios::LogEntry::Levels ':all';
you can use symbols rather than integers to specify the severity of your log entry. If you don't specify a priority level, the message will default to LOG_INFO priority.
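To give a feel for how symbolic levels and a priority threshold fit together, here is a small self-contained sketch. It rests on two assumptions that you should check against the Helios::LogEntry::Levels documentation: that the levels use the usual syslog names and numbers, and that a message is kept when its number is at or below the threshold. The constants are defined locally so the sketch runs without Helios, and should_log() is a hypothetical helper.

```perl
use strict;
use warnings;

# Syslog-style levels, defined locally so the sketch runs without Helios.
# ASSUMPTION: Helios::LogEntry::Levels uses the same names and numbers.
use constant {
    LOG_EMERG   => 0,
    LOG_ALERT   => 1,
    LOG_CRIT    => 2,
    LOG_ERR     => 3,
    LOG_WARNING => 4,
    LOG_NOTICE  => 5,
    LOG_INFO    => 6,
    LOG_DEBUG   => 7,
};

# ASSUMPTION: a message is kept when its level number is at or below the
# configured threshold (lower number = more severe), as in syslog.
sub should_log {
    my ($level, $threshold) = @_;
    $level = LOG_INFO unless defined $level;   # default priority, per the text
    return $level <= $threshold ? 1 : 0;
}

my $threshold = 6;   # sample threshold value
for my $case ([LOG_ERR, 'disk full'], [undef, 'job started'], [LOG_DEBUG, 'raw args dump']) {
    my ($level, $msg) = @$case;
    printf "%-14s %s\n", $msg, should_log($level, $threshold) ? 'logged' : 'dropped';
}
```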
The default, internal Helios logging system records messages in a table in the Helios collective database. You can access log messages for a specific job using the helios_job_info command. You can also use the Helios::Panoptes application to view log messages for particular jobs and more system-level messages recorded by the helios.pl daemon. Helios::Panoptes will also allow you to filter and search for messages matching certain criteria.
You can check the Helios::Service man page entry for the logMsg() method for information on logging, and the Helios::Configuration page for more information about logging configuration. If you want to configure your Helios collective to use some other logging system, check the Helios::Logger man page for information about creating your own Helios interfaces to other logging systems.
A MORE USEFUL EXAMPLE
Included in the eg/ directory of your Helios distribution is a simple sample Helios application called MP3IndexerService. Unlike the TestService service class discussed in this tutorial, MP3IndexerService actually does something useful: given a list of filenames of MP3s, MP3IndexerService will parse the ID3 and other useful information and store it in a database table. It can be useful for finding duplicate copies of tracks or just reviewing the different artists, albums, etc. that you have on your hard drive. A look at its code will reveal it uses all the major Helios subsystems (job queuing, configuration, logging) in some way or another. Though it remains a very simple application, it demonstrates how easily a useful Helios application can be written.
SEE ALSO
helios.pl, Helios::Service, Helios::Job, Helios::Panoptes
AUTHOR
Andrew Johnson, <lajandy at cpan dot org>
COPYRIGHT AND LICENSE
Copyright (C) 2012-4 by Andrew Johnson.
Portions of this document, where noted, are Copyright (C) 2008-9 by CEB Toolbox, Inc.
This library is free software; you can redistribute it and/or modify it under the same terms as Perl itself, either Perl version 5.8.0 or, at your option, any later version of Perl 5 you may have available.
WARRANTY
This software comes with no warranty of any kind.