NAME

mod_perl - Embed a Perl interpreter in the Apache HTTP server

DESCRIPTION

The Apache/Perl integration project brings together the full power of the Perl programming language and the Apache HTTP server. This is achieved by linking the Perl runtime library into the server and providing an object oriented Perl interface to the server's C language API. These pieces are seamlessly glued together by the `mod_perl' server plugin, making it is possible to write Apache modules entirely in Perl. In addition, the persistent interpreter embedded in the server avoids the overhead of starting an external interpreter and the penalty of Perl start-up (compile) time.

Without question, the most popular Apache/Perl module is Apache::Registry module. This module emulates the CGI environment, allowing programmers to write scripts that run under CGI or mod_perl without change. Existing CGI scripts may require some changes, simply because a CGI script has a very short lifetime of one HTTP request, allowing you to get away with "quick and dirty" scripting. Using mod_perl and Apache::Registry requires you to be more careful, but it also gives new meaning to the work "quick"! Apache::Registry maintains a cache of compiled scripts, which happens the first time a script is accessed by a child server or once again if the file is updated on disk.

Although it may be all you need, a speedy CGI replacement is only a small part of this project. Callback hooks are in place for each stage of a request. Apache-Perl modules may step in during the handler, header parser, uri translate, authentication, authorization, access, type check, fixup and logger stages of a request.

FAQ

The mod_perl FAQ is maintained by Frank Cringle <fdc@cliwe.ping.de>: http://perl.apache.org/faq/

Apache/Perl API

See 'perldoc Apache' for info on how to use the Perl-Apache API.

See the lib/ directory for example modules and apache-modlist.html for a comprehensive list.

See the eg/ directory for example scripts.

mod_perl

For using mod_perl as a CGI replacement see the cgi_to_mod_perl document.

You may load modules at server startup via:

PerlModule Apache::SSI SomeOther::Module

Optionally:

PerlRequire  perl-scripts/script_to_load_at_startup.pl

A PerlRequire file is commonly used for intialization during server startup time. A PerlRequire file name can be absolute or relative to ServerRoot or a path in @INC. A PerlRequire'd file must return a true value, i.e., the end of this file should have a:

1; #return true value

See eg/startup.pl for an example to start with.

In an httpd.conf <Location /foo> or .htaccess you need:

PerlHandler sub_routine_name

This is the name of the subroutine to call to handle each request. e.g. in the PerlModule Apache::Registry this is "Apache::Registry::handler".

If PerlHandler is not a defined subroutine, mod_perl assumes it is a package name which defines a subroutine named "handler".

PerlHandler   Apache::Registry

Would load Registry.pm (if it is not already) and call it's subroutine "handler".

There are several stages of a request where the Apache API allows a module to step in and do something. The Apache documentation will tell you all about those stages and what your modules can do. By default, these hooks are disabled at compile time, see the INSTALL document for information on enabling these hooks. The following configuration directives take one argument, which is the name of the subroutine to call. If the value is not a subroutine name, mod_perl assumes it is a package name which implements a 'handler' subroutine.

PerlChildInitHandler          (requires apache_1.3.0 or higher)
PerlPostReadRequestHandler    (requires apache_1.3.0 or higher)
PerlInitHandler
PerlTransHandler    
PerlHeaderParserHandler       
PerlAccessHandler
PerlAuthenHandler
PerlAuthzHandler
PerlTypeHandler
PerlFixupHandler
PerlHandler
PerlLogHandler
PerlCleanupHandler
PerlChildExitHandler          (requires apache_1.3.0 or higher)

Only ChildInit, ChildExit, PostReadRequest and Trans handlers are not allowed in .htaccess files.

Modules can check if the code is being run in the parent server during startup by checking the $Apache::Server::Starting variable.

RESTARTING

PerlFreshRestart

By default, if a server is restarted (ala kill -USR1 `cat logs/httpd.pid`), Perl scripts and modules are not reloaded. To reload PerlRequire's, PerlModule's, other use()'d modules and flush the Apache::Registry cache, enable with this command:

PerlFreshRestart On  
PERL_DESTRUCT_LEVEL

With Apache versions 1.3.0 and higher, mod_perl will call the perl_destruct() Perl API function during the child exit phase. This will cause proper execution of END blocks found during server startup along with invoking the DESTROY method on global objects who are still alive. It is possible that this operation may take a long time to finish, causing problems during a restart. If your code does not contain and END blocks or DESTROY methods which need to be run during child server shutdown, this destruction can be avoided by setting the PERL_DESTRUCT_LEVEL environment variable to -1.

ENVIRONMENT

Under CGI the Perl hash %ENV is magical in that it inherits environment variables from the parent process and will set them should a process spawn a child. However, with mod_perl we're in the parent process that would normally setup the common environment variables before spawning a CGI process. Therefore, mod_perl must feed these variables to %ENV directly. Normally, this does not happen until the response stage of a request when PerlHandler is called. If you wish to set variables that will be available before then, such as for a PerlAuthenHandler, you may use the PerlSetEnv configuration directive:

PerlSetEnv  SomeKey  SomeValue

You may also use the PerlPassEnv directive to pass an already existing environment variable to Perl's %ENV:

PerlPassEnv SomeKey 
CONFIGURATION

The PerlSetVar and PerlAddVar directives provide a simple mechanism for passing information from configuration files to Perl modules or Registry scripts.

The PerlSetVar directive allows you to set a key/value pair.

PerlSetVar  SomeKey  SomeValue

Perl modules or scripts retrieve configuration values using the $r->dir_config method.

$SomeValue = $r->dir_config('SomeKey');

The PerlAddVar directive allows you to emulate Perl arrays:

PerlAddVar  SomeKey  FirstValue
PerlAddVar  SomeKey  SecondValue
...         ...      ...
PerlAddVar  SomeKey  Nth-Value

In the Perl modules the values are extracted using the $r->dir_config->get method.

@array = $r->dir_config->get('SomeKey');

Alternatively in your code you can extend the setting with:

$r->dir_config->add(SomeKey => 'Bar');

PerlSetVar and PerlAddVar handle keys case-insensitively.

GATEWAY_INTERFACE

The standard CGI environment variable GATEWAY_INTERFACE is set to CGI-Perl/1.1 when running under mod_perl.

MOD_PERL

The environment variable `MOD_PERL' is set so scripts can say:

if(exists $ENV{MOD_PERL}) { 
    #we're running under mod_perl
    ...
}
else {
    #we're NOT running under mod_perl
}

BEGIN blocks

Perl executes BEGIN blocks during the compile time of code as soon as possible. The same is true under mod_perl. However, since mod_perl normally only compiles scripts and modules once, in the parent server or once per-child, BEGIN blocks in that code will only be run once. As perlmod explains, once a BEGIN has run, it is immediately undefined. In the mod_perl environment, this means BEGIN blocks will not be run during each incoming request unless that request happens to be one that is compiling the code.

Modules and files pulled in via require/use which contain BEGIN blocks will be executed: - only once, if pulled in by the parent process - once per-child process if not pulled in by the parent process - an additional time, once per-child process if the module is pulled in off of disk again via Apache::StatINC - an additional time, in the parent process on each restart if PerlFreshRestart is On - unpredictable if you fiddle with %INC yourself

Apache::Registry scripts which contain BEGIN blocks will be executed: - only once, if pulled in by the parent process via Apache::RegistryLoader - once per-child process if not pulled in by the parent process - an additional time, once per-child process if the script file has changed on disk - an additional time, in the parent process on each restart if pulled in by the parent process via Apache::RegistryLoader and PerlFreshRestart is On

END blocks

As perlmod explains, an END subroutine is executed as late as possible, that is, when the interpreter is being exited. In the mod_perl environment, the interpreter does not exit until the server is shutdown. However, mod_perl does make a special case for Apache::Registry scripts.

Normally, END blocks are executed by Perl during it's perl_run() function, which is called once each time the Perl program is executed, e.g. once per (mod_cgi) CGI scripts. However, mod_perl only calls perl_run() once, during server startup. Any END blocks encountered during main server startup, i.e. those pulled in by the PerlRequire or by any PerlModule are suspended and run at server shutdown, aka child_exit (requires apache 1.3.0+). Any END blocks that are encountered during compilation of Apache::Registry scripts are called after the script done is running, including subsequent invocations when the script is cached in memory. All other END blocks encountered during other Perl*Handler callbacks, e.g. PerlChildInitHandler, will be suspended while the process is running and called during child_exit when the process is shutting down. Module authors may be wish to use $r->register_cleanup as an alternative to END blocks if this behavior is not desirable.

MEMORY CONSUMPTION

Don't be alarmed by the size of your httpd after you've linked with mod_perl. No matter what, your httpd will be larger than normal to start, simply because you've linked with perl's runtime.

Here's I'm just running

% /usr/bin/perl -e '1 while 1'

  PID USERNAME PRI NICE   SIZE   RES STATE   TIME   WCPU    CPU COMMAND
10214 dougm     67    0   668K  212K run     0:04 71.55% 21.13% perl

Now with a few random modules:

% /usr/bin/perl -MDBI -MDBD::mSQL -MLWP::UserAgent -MFileHandle -MIO -MPOSIX -e '1 while 1'

10545 dougm     49    0  3732K 3340K run     0:05 54.59% 21.48% perl

Here's my httpd linked with libperl.a, not having served a single request:

10386 dougm      5    0  1032K  324K sleep   0:00  0.12%  0.11% httpd-a

You can reduce this if you configure perl 5.004+ with -Duseshrplib. Here's my httpd linked with libperl.sl, not having served a single request:

10393 dougm      5    0   476K  368K sleep   0:00  0.12%  0.10% httpd-s

Now, once the server starts receiving requests, the embedded interpreter will compile code for each 'require' file it has not seen yet, each new Apache::Registry subroutine that's compiled, along with whatever modules it's use'ing or require'ing. Not to mention AUTOLOADing. (Modules that you 'use' will be compiled when the server starts unless they are inside an eval block.) httpd will grow just as big as our /usr/bin/perl would, or a CGI process for that matter, it all depends on your setup. The mod_perl_tuning document gives advice on how to best setup your mod_perl server environment.

The mod_perl INSTALL document explains how to build the Apache:: extensions as shared libraries (with 'perl Makefile.PL DYNAMIC=1'). This may save you some memory, however, it doesn't work on a few systems such as aix and unixware.

However, on most systems, this strategy will only make the httpd look smaller. When in fact, an httpd with Perl linked static with take up less real memory and preform faster than shared libraries at the same time. See the mod_perl_tuning document for details.

MEMORY TIPS

Leaks

If you are using a module that leaks or have code of their own that leaks, in any case using the apache configuration directive 'MaxRequestsPerChild' is your best bet to keep the size down.

Perl Options

Newer Perl versions also have other options to reduce runtime memory consumption. See Perl's INSTALL file for details on -DPACK_MALLOC and -DTWO_POT_OPTIMIZE. With these options, my httpd shrinks down ~150K.

Server Startup

Use the PerlRequire and PerlModule directives to load commonly used modules such as CGI.pm, DBI, etc., when the server is started. On most systems, server children will be able to share this space.

Importing Functions

When possible, avoid importing of a module functions into your namespace. The aliases which are created can take up quite a bit of space. Try to use method interfaces and fully qualified Package::function names instead. Here's a freshly started httpd who's served one request for a script using the CGI.pm method interface:

TTY   PID USERNAME  PRI NI   SIZE   RES  STATE   TIME %WCPU  %CPU COMMAND
  p4  5016 dougm     154 20  3808K  2636K sleep   0:01  9.62  4.07 httpd

Here's a freshly started httpd who's served one request for the same script using the CGI.pm function interface:

TTY   PID USERNAME  PRI NI   SIZE   RES  STATE   TIME %WCPU  %CPU COMMAND
  p4  5036 dougm     154 20  3900K  2708K sleep   0:01  3.19  2.18 httpd

Now do the math: take that difference, figure in how many other scripts import the same functions and how many children you have running. It adds up!

Global Variables

It's always a good idea to stay away from global variables when possible. Some variables must be global so Perl can see them, such as a module's @ISA or $VERSION variables. In common practice, a combination of use strict and use vars keeps modules clean and reduces a bit of noise. However, use vars also creates aliases as the Exporter does, which eat up more space. When possible, try to use fully qualified names instead of use vars. Example:

package MyPackage;
use strict;
@MyPackage::ISA = qw(...);
$MyPackage::VERSION = "1.00";

vs.

package MyPackage;
use strict;
use vars qw(@ISA $VERSION);
@ISA = qw(...);
$VERSION = "1.00";
Further Reading

In case I forgot to mention, read Vivek Khera's mod_perl_tuning document for more tips on improving Apache/mod_perl performance.

SWITCHES

Normally when you run perl from the command line or have the shell invoke it with `#!', you may choose to pass perl switch arguments such as -w or -T. Since the command line is only parsed once, when the server starts, these switches are unavailable to mod_perl scripts. However, most command line arguments have a perl special variable equivilant. For example, the $^W variable coresponds to the -w switch. Consult perlvar for more details. With mod_perl it is also possible to turn on warnings globaly via the PerlWarn directive:

PerlWarn On

The switch which enables taint checks does not have a special variable, so mod_perl provides the PerlTaintCheck directive to turn on taint checks. In httpd.conf, enable with:

PerlTaintCheck On

Now, any and all code compiled inside httpd will be checked.

The environment variable PERL5OPT can be used to set additional perl startup flags such as -d and -D. See perlrun.

PERSISTENT DATABASE CONNECTIONS

Another popular use of mod_perl is to take advantage of it's persistance to maintain open database connections. The basic idea goes like so:

#Apache::Registry script
use strict;
use vars qw($dbh);

$dbh ||= SomeDbPackage->connect(...);

Since $dbh is a global variable, it will not go out of scope, keeping the connection open for the lifetime of a server process, establishing it during the script's first request for that process.

It's recommended that you use one of the Apache::* database connection wrappers. Currently for DBI users there is Apache::DBI and for Sybase users Apache::Sybase::DBlib. These modules hide the peculiar code example above. In addition, different scripts may share a connection, minimizing resource consumption. Example:

#httpd.conf has
# PerlModule Apache::DBI
#DBI scripts look exactly as they do under CGI
use strict;
my $dbh = DBI->connect(...);

Although $dbh shown here will go out of scope when the script ends, the Apache::DBI module's reference to it does not, keep the connection open.

WARNING: Do not attempt to open a persistent database connection in the parent process (via PerlRequire or PerlModule). If you do, children will get a copy of this handle, causing clashes when the handle is used by two processes at the same time. Each child must have it's own unique connection handle.

STACKED HANDLERS

With the mod_perl stacked handlers mechanism, it is possible for more than one Perl*Handler to be defined and run during each stage of a request.

Perl*Handler directives can define any number of subroutines, e.g. (in config files)

PerlTransHandler OneTrans TwoTrans RedTrans BlueTrans

With the method, Apache->push_handlers, callbacks can be added to the stack by scripts at runtime by mod_perl scripts.

Apache->push_handlers takes the callback hook name as it's first argument and a subroutine name or reference as it's second. e.g.:

Apache->push_handlers("PerlLogHandler", \&first_one);

$r->push_handlers("PerlLogHandler", sub {
    print STDERR "__ANON__ called\n";
    return 0;
});

After each request, this stack is cleared out.

All handlers will be called unless a handler returns a status other than OK or DECLINED, this needs to be considered more. Post apache-1.2 will have a DONE return code to signal termiation of a stage, which Rob and I came up with while back when first discussing the idea of stacked handlers. 2.0 won't come for quite sometime, so mod_perl will most likely handle this before then.

example uses:

CGI.pm maintains a global object for it's plain function interface. Since the object is global, it does not go out of scope, DESTROY is never called. CGI->new can call:

Apache->push_handlers("PerlCleanupHandler", \&CGI::_reset_globals);

This function will be called during the final stage of a request, refreshing CGI.pm's globals before the next request comes in.

Apache::DCELogin establishes a DCE login context which must exist for the lifetime of a request, so the DCE::Login object is stored in a global variable. Without stacked handlers, users must set

PerlCleanupHandler Apache::DCELogin::purge

in the configuration files to destroy the context. This is not "user-friendly". Now, Apache::DCELogin::handler can call:

Apache->push_handlers("PerlCleanupHandler", \&purge);

Persistent database connection modules such as Apache::DBI could push a PerlCleanupHandler handler that iterates over %Connected, refreshing connections or just checking that ones have not gone stale. Remember, by the time we get to PerlCleanupHandler, the client has what it wants and has gone away, we can spend as much time as we want here without slowing down response time to the client.

PerlTransHandlers may decide, based or uri or other condition, whether or not to handle a request, e.g. Apache::MsqlProxy. Without stacked handlers, users must configure:

PerlTransHandler Apache::MsqlProxy::translate
PerlHandler      Apache::MsqlProxy

PerlHandler is never actually invoked unless translate() sees the request is a proxy request ($r->proxyreq), if it is a proxy request, translate() set $r->handler("perl-script"), only then will PerlHandler handle the request. Now, users do not have to specify 'PerlHandler Apache::MsqlProxy', the translate() function can set it with push_handlers().

Includes, footers, headers, etc., piecing together a document, imagine (no need for SSI parsing!):

PerlHandler My::Header Some::Body A::Footer

This was my first test:

#My.pm
package My;

sub header {
    my $r = shift;
    $r->content_type("text/plain");
    $r->send_http_header;
    $r->print("header text\n");
}
sub body   { shift->print("body text\n")   }
sub footer { shift->print("footer text\n") }
1;
__END__ 
#in config
<Location /foo>
SetHandler "perl-script"
PerlHandler My::header My::body My::footer
</Location>

Parsing the output of another PerlHandler? this is a little more tricky, but consider:

<Location /foo>
  SetHandler "perl-script"
  PerlHandler OutputParser SomeApp 
</Location>
<Location /bar>
  SetHandler "perl-script"
  PerlHandler OutputParser AnotherApp
</Location>

Now, OutputParser goes first, but it untie's *STDOUT and re-tie's to it's own package like so:

 package OutputParser;

 sub handler {
     my $r = shift; 
     untie *STDOUT;	
     tie *STDOUT => 'OutputParser', $r;
 }

 sub TIEHANDLE {
     my($class, $r) = @_;
     bless { r => $r}, $class;
 }

 sub PRINT {
     my $self = shift;
     for (@_) {
         #do whatever you want to $_
	 $self->{r}->print($_ . "[insert stuff]");
     }
 }

 1;
 __END__

To build in this feature, configure with:

% perl Makefile.PL PERL_STACKED_HANDLERS=1 [PERL_FOO_HOOK=1,etc]

Another method 'Apache->can_stack_handlers' will return TRUE if mod_perl was configured with PERL_STACKED_HANDLERS=1, FALSE otherwise.

PERL METHOD HANDLERS

See mod_perl_method_handlers.

PERL SECTIONS

With <Perl></Perl> sections, it is possible to configure your server entirely in Perl.

<Perl> sections can contain *any* and as much Perl code as you wish. These sections are compiled into a special package who's symbol table mod_perl can then walk and grind the names and values of Perl variables/structures through the Apache core config gears. Most of the configurations directives can be represented as $Scalars or @Lists. A @List inside these sections is simply converted into a single-space delimited string for you inside. Here's an example:

#httpd.conf
<Perl>
@PerlModule = qw(Mail::Send Devel::Peek);

#run the server as whoever starts it
$User  = getpwuid($>) || $>;
$Group = getgrgid($)) || $); 

$ServerAdmin = $User;

</Perl>

Block sections such as <Location></Location> are represented in a %Hash, e.g.:

 $Location{"/~dougm/"} = {
     AuthUserFile => '/tmp/htpasswd',
     AuthType => 'Basic',
     AuthName => 'test',
     DirectoryIndex => [qw(index.html index.htm)],	
     Limit => {
	 METHODS => 'GET POST',
	 require => 'user dougm',
     },
 };

 #If a Directive can take say, two *or* three arguments
 #you may push strings and the lowest number of arguments
 #will be shifted off the @List
 #or use array reference to handle any number greater than
 #the minimum for that directive

 push @Redirect, "/foo", "http://www.foo.com/";

 push @Redirect, "/imdb", "http://www.imdb.com/";

 push @Redirect, [qw(temp "/here" "http://www.there.com")];

Other section counterparts include %VirtualHost, %Directory and %Files.

These are somewhat boring examples, but they should give you the basic idea. You can mix in any Perl code your heart desires. See eg/httpd.conf.pl and eg/perl_sections.txt for some examples.

A tip for syntax checking outside of httpd:

<Perl>
#!perl

#... code here ...

__END__
</Perl>

Now you may run perl -cx httpd.conf.

It may be the case that <Perl> sections are not completed or an oversight was made in an certain area. If they do not behave as you expect, please send a report to the modperl mailing list.

To configure this feature build with 'perl Makefile.PL PERL_SECTIONS=1'

mod_perl and mod_include integration

As of apache 1.2.0, mod_include can handle Perl callbacks.

A `sub' key value may be anything a Perl*Handler can be: subroutine name, package name (defaults to package::handler), Class->method call or anonymous sub {}

Example:

Child <!--#perl sub="sub {print $$}" --> accessed
<!--#perl sub="sub {print ++$Access::Cnt }" --> times. <br>

<!--#perl sub="Package::handler" arg="one" arg="two" -->

#don't forget to escape double quotes!
Perl is
       <!--#perl sub="sub {for (0..10) {print \"very \"}}"-->
       fun to use!

The Apache::Include module makes it simple to include Apache::Registry scripts with the mod_include perl directive.

Example:

<!--#perl sub="Apache::Include" arg="/perl/ssi.pl" -->

You can also use 'virtual include' to include Apache::Registry scripts of course. However, using #perl will save the overhead of making Apache go through the motions of creating/destroying a subrequest and making all the necessary access checks to see that the request would be allowed outside of a 'virtual include' context.

To enable perl in mod_include parsed files, when building apache the following must be present in the Configuration file:

EXTRA_CFLAGS=-DUSE_PERL_SSI -I. `perl -MExtUtils::Embed -ccopts`

mod_perl's Makefile.PL script can take care of this for you as well:

perl Makefile.PL PERL_SSI=1

If you're interested in sprinkling Perl code inside your HTML documents, you'll also want to look at the Apache::Embperl (http://perl.apache.org/embperl/), Apache::ePerl and Apache::SSI modules.

DEBUGGING

MOD_PERL_TRACE

To enable mod_perl debug tracing configure mod_perl with the PERL_TRACE option:

perl Makefile.PL PERL_TRACE=1

The trace levels can then be enabled via the MOD_PERL_TRACE environment variable which can contain any combination of:

d - Trace directive handling during configuration read
s - Trace processing of perl sections
h - Trace Perl*Handler callbacks
g - Trace global variable handling, intepreter construction, END blocks, etc.
all - all of the above
spinning httpds

To see where an httpd is "spinning", try adding this to your script or a startup file:

use Carp ();
$SIG{'USR1'} = sub { 
   Carp::confess("caught SIGUSR1!");
};

Then issue the command line:

kill -USR1 <spinning_httpd_pid>

PROFILING

It is possible to profile code run under mod_perl with the Devel::DProf module available on CPAN. However, you must have apache version 1.3.0 or higher and the PerlChildExitHandler enabled. When the server is started, Devel::DProf installs an END block to write the tmon.out file, which will be run when the server is shutdown. Here's how to start and stop a server with the profiler enabled:

% setenv PERL5OPT -d:DProf
% httpd -X -d `pwd` &
... make some requests to the server here ...
% kill `cat logs/httpd.pid`
% unsetenv PERL5OPT
% dprofpp

See also: Apache::DProf

BENCHMARKING

How much faster is mod_perl that CGI? There are many ways to benchmark the two, see the benchmark/ directory for some examples.

See also: Apache::Timeit

WARNINGS

See mod_perl_traps.

SUPPORT

See the SUPPORT file.

Win32

See INSTALL.win32 for building from sources.

Info about win32 binary distributions of mod_perl are available from:

http://perl.apache.org/distributions/

REVISION

$Id: mod_perl.pod,v 1.22 2002/01/02 09:41:23 stas Exp $

AUTHOR

Doug MacEachern