Server Configuration
The next step after building and installing your new mod_perl enabled Apache server is to configure it. The configuration process consists of two parts: configuring the Apache directives and configuring the mod_perl specific directives.
Prior to version 1.3.4, the default Apache install used three configuration files -- httpd.conf, srm.conf, and access.conf. The 1.3.4 version began distributing the configuration directives in a single file -- httpd.conf. This Guide uses the httpd.conf in its examples.
So the only file that you should need to edit is httpd.conf, which by default is put into the conf directory under the server root. The server root is the directory that you chose for the Apache installation, or the default one, which is /usr/local/apache/ on many UNIX platforms.
Apache Configuration
To minimize the number of things that can go wrong, it can be a good idea to configure Apache itself first (without mod_perl) and make sure that it works.
The Apache distribution comes with an extensive configuration manual. In addition, each section of the configuration file includes helpful comments explaining how every directive should be configured and what the default values are.
Configuration Directives
If you didn't move Apache's directories around, the installation program will have configured everything for you. Just start the server and check that it works. To start the server use the apachectl utility, which comes bundled with the Apache distribution and resides in the same directory as httpd (the Apache server itself). Execute:
/usr/local/apache/bin/apachectl start
Now you can test the server by trying to access http://localhost from a browser.
For a basic setup there are just a few things to configure. If you have moved directories, you have to update their paths in httpd.conf. Many directives refer to these paths; here are just two of them:
ServerRoot "/usr/local/apache"
DocumentRoot "/home/httpd/docs"
If this is not a test machine and referring to it as localhost isn't what you want, you should set the name by which the machine is to be known to the external world:
ServerName www.example.com
If you want to run the server on a port other than 80, edit the Port directive:
Port 8080
You might want to change the user and group names the server will run under. Note that if started as root user (which is generally the case), the parent process will continue to run as root, but children will run as the user and group you have specified. For example:
User nobody
Group nobody
There are other directives that you might need to configure as well. As mentioned earlier, you will find them all in httpd.conf.
After the single-valued directives come the Directory and Location sections of the configuration. This is where you can define, for each directory and location, its own behaviour, which will apply to every request that falls into its domain.
<Directory>, <Location> and <Files>
I'll explain the basics of the <Directory>, <Location> and <Files> sections. Remember that there is more to know, and the rest of the information is available in the Apache documentation. The information I'll present here is important for understanding the mod_perl configuration section.
<Directory directory> ... </Directory>
Can appear in server and virtual host configurations.
<Directory> and </Directory> are used to enclose a group of directives which will apply only to the named directory and sub-directories of that directory. Any directive which is allowed in a directory context may be used.
Directory is either the full path to a directory, or a wild-card string. In a wild-card string, ? matches any single character, * matches any sequence of characters, and [] matches character ranges. (This is similar to the shell's file globs.) None of the wildcards will match a `/' character. For example:
<Directory /home/httpd/docs>
Options Indexes
</Directory>
If you want to use a regex to match then the <DirectoryMatch regex> ... </DirectoryMatch> syntax should be used.
If multiple (non-regular expression) directory sections match the directory (or its parents) containing a document, then the directives are applied in the order of shortest match first, interspersed with the directives from the .htaccess files. For example, with
<Directory />
AllowOverride None
</Directory>
<Directory /home/httpd/docs/*>
AllowOverride FileInfo
</Directory>
for access to the document /home/httpd/docs/index.html the steps are:
1. Apply directive AllowOverride None (disabling .htaccess files).
2. Apply directive AllowOverride FileInfo (for directory /home/httpd/docs/).
3. Apply any FileInfo directives in /home/httpd/docs/.htaccess.
<Files filename> ... </Files>
Can appear in server and virtual host configurations, and .htaccess files as well.
The <Files> directive provides for access control by filename. It is comparable to the <Directory> and <Location> directives. It should be closed with a </Files> directive. The directives given within this section will be applied to any object with a basename (last component of filename) matching the specified filename. <Files> sections are processed in the order they appear in the configuration file, after the <Directory> sections and .htaccess files are read, but before <Location> sections. Note that <Files> can be nested inside <Directory> sections to restrict the portion of the filesystem they apply to.
The filename argument should include a filename, or a wild-card string, where ? matches any single character, and * matches any sequence of characters. Extended regular expressions can also be used, with the addition of the ~ character. For example:
<Files ~ "\.(gif|jpe?g|png)$">
would match most common Internet graphics formats. Another alternative is the <FilesMatch regex> ... </FilesMatch> syntax.
<Location URL> ... </Location>
Can appear in server and virtual host configurations.
The <Location> directive provides for access control by URL. It is similar to the <Directory> directive, and starts a subsection which is terminated with a </Location> directive. <Location> sections are processed in the order they appear in the configuration file, after the <Directory> sections and .htaccess files are read, and after the <Files> sections.
This is the directive that is most often used with mod_perl.
URLs do not have to refer to real directories or files within the filesystem at all; <Location> operates completely outside the filesystem. Indeed it may be wise to ensure that <Location>s do not match real paths to avoid confusion.
The URL may use wildcards. In a wild-card string, ? matches any single character, and * matches any sequence of characters. For regex matches use the <LocationMatch regex> ... </LocationMatch> syntax.
The Location functionality is especially useful when combined with the SetHandler directive. For example, to enable status requests, but allow them only from browsers at example.com, you might use:
<Location /status>
SetHandler server-status
order deny,allow
deny from all
allow from .example.com
</Location>
How Directory, Location and Files Sections are Merged
When configuring the server, it's important to understand the order in which the rules of each section apply to requests. The order of merging is:
1. <Directory> (except regular expressions) and .htaccess are processed simultaneously (with .htaccess overriding <Directory>)
2. <DirectoryMatch>, and <Directory> with regular expressions
3. <Files> and <FilesMatch> are processed simultaneously
4. <Location> and <LocationMatch> are processed simultaneously
Apart from <Directory>, each group is processed in the order that they appear in the configuration files. <Directory> (group 1 above) is processed in the order shortest directory component to longest. If multiple <Directory> sections apply to the same directory then they are processed in the configuration file order.
Sections inside <VirtualHost> sections are applied after the corresponding sections outside the virtual host definition. This allows virtual hosts to override the main server configuration.
Sub-Grouping of <Location>, <Directory> and <Files> Sections
Let's say that you want all files, except for a few in a specific directory and below, to be handled in the same way. For example, suppose we want all files in /home/httpd/docs to be served as plain files, but files ending in .html and .txt to be processed by the content handler of our Apache::MyFilter module.
<Directory /home/httpd/docs>
<FilesMatch "\.(html|txt)$">
SetHandler perl-script
PerlHandler Apache::MyFilter
</FilesMatch>
</Directory>
Thus, it is possible to embed sections inside sections to create subgroups which have their own distinct behavior. Alternatively you can use <Files> inside an .htaccess file.
Note that you can't have <Files> and <FilesMatch> sub-sections inside a <Location> section, but you can inside a <Directory> section.
Options Values Merging
Normally, if multiple Options directives could apply to a directory, then the most specific one is taken complete; the options are not merged. However if all the options on the Options directive are preceded by a + or - symbol, the options are merged. Any options preceded by + are added to the options currently in force, and any options preceded by - are removed.
For example, without any + and - symbols:
<Directory /home/httpd/docs>
Options Indexes FollowSymLinks
</Directory>
<Directory /home/httpd/docs/shtml>
Options Includes
</Directory>
then only Includes will be set for the /home/httpd/docs/shtml directory. However if the second Options directive uses the + and - symbols:
<Directory /home/httpd/docs>
Options Indexes FollowSymLinks
</Directory>
<Directory /home/httpd/docs/shtml>
Options +Includes -Indexes
</Directory>
then the options FollowSymLinks and Includes are set for the /home/httpd/docs/shtml directory.
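The merging rule can be sketched in a few lines of Perl. This is only an illustration of the rule as described above, not Apache's implementation; merge_options() is a hypothetical helper that takes the options currently in force and the argument string of one Options directive.

```perl
use strict;
use warnings;

# Hypothetical helper: apply one Options line to the set currently in force.
# $current is an array ref of option names, $line is the directive's argument.
sub merge_options {
    my ($current, $line) = @_;
    my @words = split ' ', $line;

    # Merging happens only if every option carries a + or - prefix;
    # otherwise the new line replaces the inherited set completely.
    my $all_signed = !grep { !/^[+-]/ } @words;
    return [ sort @words ] unless $all_signed;

    my %set = map { $_ => 1 } @$current;
    for my $word (@words) {
        my ($sign, $name) = $word =~ /^([+-])(.+)$/;
        if ($sign eq '+') { $set{$name} = 1 }
        else              { delete $set{$name} }
    }
    return [ sort keys %set ];
}

my $docs = merge_options([], 'Indexes FollowSymLinks');
print "replace: @{ merge_options($docs, 'Includes') }\n";
print "merge:   @{ merge_options($docs, '+Includes -Indexes') }\n";
```

The first call mirrors the example without symbols, the second the example with + and - symbols.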
mod_perl Configuration
When you have tested that the Apache server works on your machine, it's time to configure mod_perl. Some of the configuration directives are already familiar to you, but mod_perl introduces a few new ones.
It can be a good idea to keep all the mod_perl related configuration at the end of the configuration file, after the native Apache configurations.
META: explain Include file directive to load mod_perl side configuration.
Alias Configurations
First, you need to specify the locations on a file-system where the scripts will be found.
Add configuration directives like these but reflecting your own file-system:
# for plain cgi-bin:
ScriptAlias /cgi-bin/ /usr/local/myproject/cgi/
# for Apache::Registry mode
Alias /perl/ /usr/local/myproject/cgi/
# Apache::PerlRun mode
Alias /cgi-perl/ /usr/local/myproject/cgi/
Alias provides a mapping of a URL to a file system object under mod_perl. ScriptAlias is used for mod_cgi.
Alias defines the start of the URL path to the script you are referencing. For example, using the above configuration, fetching http://www.example.com/perl/test.pl will cause the server to look for the file test.pl at /usr/local/myproject/cgi, and execute it as an Apache::Registry script if we define Apache::Registry to be the handler for the /perl location (see below).
The URLs http://www.example.com/cgi-bin/test.pl and http://www.example.com/cgi-perl/test.pl will also be mapped to /usr/local/myproject/cgi/test.pl. This means that you can have all your CGI scripts located at the same place in the file-system, and call the script in any of three modes simply by changing the directory name component of the URL (cgi-bin|perl|cgi-perl). This makes it easy to migrate your scripts to mod_perl. (Although this is the configuration we have used above, i.e. all three Aliases pointing to the same directory within your file system, you can of course have them point to different directories if you prefer.)
If your script does not seem to be working while running under mod_perl, you can easily call the script in straight mod_cgi mode without making any script changes (in most cases), simply by changing the URL you invoke it with.
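As a rough illustration of the prefix translation described above, the following Perl sketch maps the three URL prefixes onto the same directory. It only mimics the idea of the Alias directive; translate() is a hypothetical helper and does not reflect how Apache performs the mapping internally.

```perl
use strict;
use warnings;

# The three aliases from the configuration above, all pointing at one directory.
my @aliases = (
    [ '/cgi-bin/'  => '/usr/local/myproject/cgi/' ],
    [ '/perl/'     => '/usr/local/myproject/cgi/' ],
    [ '/cgi-perl/' => '/usr/local/myproject/cgi/' ],
);

# Hypothetical helper: replace the first matching URL prefix with its directory.
sub translate {
    my ($uri) = @_;
    for my $alias (@aliases) {
        my ($prefix, $dir) = @$alias;
        return $dir . substr($uri, length $prefix)
            if index($uri, $prefix) == 0;
    }
    return;    # no alias matched
}

print "$_ -> ", translate($_), "\n"
    for qw(/cgi-bin/test.pl /perl/test.pl /cgi-perl/test.pl);
```

All three URIs resolve to the same file, which is what lets you switch modes by changing only the URL.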
ScriptAlias is actually the same as:
Alias /foo/ /path/to/foo/
SetHandler cgi-handler
where SetHandler cgi-handler invokes mod_cgi. The latter will be overwritten if you enable Apache::Registry. In other words, ScriptAlias does not work for mod_perl; it only appears to work when the additional configuration is in there. If the Apache::Registry configuration came before the ScriptAlias, scripts would be run under mod_cgi. While handy, ScriptAlias is a known kludge; it's always better to use Alias and SetHandler.
Of course you can choose any other alias (it will be used later in the configuration). You can use all three modes or just some of them. But you should remember that it is undesirable to run scripts in plain mod_cgi from a mod_perl-enabled server: the price is too high, and it is better to run these on a plain Apache server. (See Standalone mod_perl Enabled Apache Server)
<Location> Configuration
The <Location> section assigns a number of rules which the server should follow when the request's URI matches the Location domain. It's widely accepted to use /perl as the base URI of the Perl scripts running under mod_perl, like /cgi-bin for mod_cgi. Let's review the following very widely used <Location> section:
<Location /perl>
SetHandler perl-script
PerlHandler Apache::Registry
Options ExecCGI
allow from all
PerlSendHeader On
</Location>
This configuration causes all requests whose URI starts with /perl to be handled by the mod_perl Apache module with the handler from the Apache::Registry Perl module. Let's review the directives inside the <Location> section in the example:
<Location /perl>
Remember the Alias from the above section? We use the same Alias here; if you were to use a Location that does not have a matching Alias, the server would fail to locate the script in the file system. You need the Alias setting only if the code that should be executed is located in a file. So Alias just provides the URI to filepath translation rule.
Sometimes there is no script to be executed. Instead there is some module whose method is being executed, as with /perl-status, where the code is stored in an Apache module. In such cases we don't need Alias settings for those <Location>s.
SetHandler perl-script
This assigns the mod_perl Apache module to handle the content generation phase.
PerlHandler Apache::Registry
Here we tell Apache to use the Apache::Registry Perl module for the actual content generation.
Options ExecCGI
The Options directive accepts various parameters (options), one of which is the ExecCGI option that tells the server that the file is a program and should be executed, instead of just displayed like a plain HTML file. If you omit this option then, depending on the client's configuration, the script will either be rendered as plain text or trigger a Save-As dialog.
allow from all
This directive is used to set access control based on domain. The above setting allows any client to run the script from any domain.
PerlSendHeader On
PerlSendHeader On tells the server to send an HTTP header to the browser on every script invocation. You will want to turn this off for nph (non-parsed-headers) scripts.
The PerlSendHeader On setting invokes ap_send_http_header() after parsing your script headers. It is only meant for CGI emulation, and it's always better to use CGI->header from the CGI.pm module or $r->send_http_header directly to send the HTTP header.
</Location>
Closes the <Location> section definition.
Note that sometimes you will have to preload the module before using it in the <Location> section. In the case of Apache::Registry the configuration will look like this:
PerlModule Apache::Registry
<Location /perl>
SetHandler perl-script
PerlHandler Apache::Registry
Options ExecCGI
allow from all
PerlSendHeader On
</Location>
PerlModule is equivalent to Perl's native use() function call.
No changes are required to the /cgi-bin location (mod_cgi), since it has nothing to do with mod_perl.
Here is another very similar example, this time using Apache::PerlRun (More about Apache::PerlRun):
<Location /cgi-perl>
SetHandler perl-script
PerlHandler Apache::PerlRun
Options ExecCGI
allow from all
PerlSendHeader On
</Location>
The only difference from the Apache::Registry configuration is the argument of the PerlHandler directive, where Apache::Registry has been replaced with Apache::PerlRun.
PerlModule and PerlRequire Directives
As we saw earlier, a module should be loaded before it is used. PerlModule and PerlRequire are the two mod_perl directives equivalent to Perl's use() and require() respectively. Since they are equivalent, the same rules apply to their arguments. Thus you would pass Apache::DBI as an argument for PerlModule, and Apache/DBI.pm for PerlRequire.
You may load modules from the configuration file at server startup e.g.:
PerlModule Apache::DBI CGI DBD::mysql
Generally the modules are preloaded from the startup script, usually called startup.pl. This is a file with plain Perl code which is executed through the PerlRequire directive. For example:
PerlRequire /home/httpd/perl/lib/startup.pl
As with any file with Perl code that gets require()'d, it must return a true value. To ensure that this happens don't forget to add 1; at the end of the file.
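The effect of the trailing 1; can be seen with plain Perl, outside of any server. The sketch below writes two throwaway files and tries to require() each of them; try_require() and the file names are hypothetical and exist only for this demonstration.

```perl
use strict;
use warnings;
use File::Temp qw(tempdir);

my $dir = tempdir(CLEANUP => 1);

# Hypothetical helper: write a file and report whether require() accepts it.
sub try_require {
    my ($name, $code) = @_;
    my $path = "$dir/$name";
    open my $fh, '>', $path or die "Cannot write $path: $!";
    print {$fh} $code;
    close $fh or die "Cannot close $path: $!";
    return eval { require $path; 1 } ? 'loaded' : "failed: $@";
}

# The last statement of the first file evaluates to 0, so require() rejects it.
print try_require('bad.pl',  "my \$ready = 0;\n"), "\n";
print try_require('good.pl', "my \$ready = 0;\n1;\n"), "\n";
```

The first file fails with a "did not return a true value" error, while the second one loads.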
Perl*Handlers
As you know, Apache specifies about eleven phases of the request loop, namely (and in order): Post-Read-Request, URI Translation, Header Parsing, Access Control, Authentication, Authorization, MIME type checking, FixUp, Response (Content phase), Logging and finally Cleanup. These are the stages of a request where the Apache API allows a module to step in and do something. There is a dedicated Perl*Handler directive for each of these stages, specifically:
PerlChildInitHandler
PerlPostReadRequestHandler
PerlInitHandler
PerlTransHandler
PerlHeaderParserHandler
PerlAccessHandler
PerlAuthenHandler
PerlAuthzHandler
PerlTypeHandler
PerlFixupHandler
PerlHandler
PerlLogHandler
PerlCleanupHandler
PerlChildExitHandler
The first four handlers cannot be used in <Location>, <Directory> or <Files> sections, nor in .htaccess files. This is mainly because all of these sections require a known path to the file in order to bind a requested path with one or more of the identifiers above. Starting from PerlHeaderParserHandler (5th), the URI has already been mapped to a physical pathname, and thus can be used to match the <Location>, <Directory> or <Files> configuration section, or to look in a .htaccess file if one exists at the specified directory in the translated path.
The Apache documentation (or even better -- the "Writing Apache Modules with Perl and C" book by Doug MacEachern and Lincoln Stein) will tell you all about those stages and what your modules can do. By default, these hooks are disabled at compile time, see the INSTALL document for information on enabling them.
Note that by default the Perl API expects a subroutine called handler to handle the request in the registered Perl*Handler module. Thus if your module implements this subroutine, you can register the handler like this:
Perl*Handler Apache::SomeModule
Replace Perl*Handler with the name of a specific handler from the list given above. mod_perl will load the specified module for you. But if you decide to give the handler routine a different name, like my_handler, you must preload the module and explicitly write the chosen name:
PerlModule Apache::SomeModule
Perl*Handler Apache::SomeModule::my_handler
Please note that the former approach will not preload the module at startup, so you should either explicitly preload it with the PerlModule directive, or add it to the startup file, or use a nice shortcut that the Perl*Handler syntax provides:
Perl*Handler +Apache::SomeModule
Notice the leading + character. It's equivalent to:
PerlModule Apache::SomeModule
Perl*Handler Apache::SomeModule
If a module needs to know which handler is currently being run, it can find out with the current_callback method. This method is most useful to PerlDispatchHandlers which wish to take action for certain phases only.
if($r->current_callback eq "PerlLogHandler") {
$r->warn("Logging request");
}
Stacked Handlers
With the mod_perl stacked handlers mechanism, it is possible for more than one Perl*Handler to be defined and run during each stage of a request.
Perl*Handler directives can define any number of subroutines, e.g. (in configuration files):
PerlTransHandler OneTrans TwoTrans RedTrans BlueTrans
With the Apache->push_handlers() method, callbacks can be added to the stack at runtime by mod_perl scripts.
Apache->push_handlers() takes the callback hook name as its first argument and a subroutine name or reference as its second, e.g.:
Apache->push_handlers("PerlLogHandler", \&first_one);
$r->push_handlers("PerlLogHandler", sub {
print STDERR "__ANON__ called\n";
return 0;
});
After each request, this stack is cleared out.
All handlers will be called unless a handler returns a status other than OK or DECLINED.
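The stopping rule can be pictured with a small stand-alone simulation. This is a sketch of the rule only, not mod_perl's dispatcher: run_stack() is a hypothetical function, the request object is an empty hash, and the status values are defined locally as assumptions (real handlers take them from Apache::Constants).

```perl
use strict;
use warnings;

# Assumed values for this sketch; real handlers import these names
# from Apache::Constants instead of defining them.
use constant { OK => 0, DECLINED => -1, SERVER_ERROR => 500 };

# Hypothetical dispatcher: call each handler on the stack in turn and stop
# at the first status that is neither OK nor DECLINED.
sub run_stack {
    my ($r, @stack) = @_;
    my $status = DECLINED;
    for my $handler (@stack) {
        $status = $handler->($r);
        last unless $status == OK || $status == DECLINED;
    }
    return $status;
}

my @called;
my @stack = (
    sub { push @called, 'first';  OK },
    sub { push @called, 'second'; SERVER_ERROR },
    sub { push @called, 'third';  OK },    # never reached
);

my $status = run_stack({}, @stack);
print "status=$status called=@called\n";
```

Because the second handler returns an error status, the third one is never called.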
Some example uses:
CGI.pm maintains a global object for its plain function interface. Since the object is global, it does not go out of scope, so DESTROY is never called. CGI->new can call:
Apache->push_handlers("PerlCleanupHandler", \&CGI::_reset_globals);
This function will be called during the final stage of a request, refreshing CGI.pm's globals before the next request comes in.
Apache::DCELogin establishes a DCE login context which must exist for the lifetime of a request, so the DCE::Login object is stored in a global variable. Without stacked handlers, users must set
PerlCleanupHandler Apache::DCELogin::purge
in the configuration files to destroy the context. This is not "user-friendly". Now, Apache::DCELogin::handler can call:
Apache->push_handlers("PerlCleanupHandler", \&purge);
Persistent database connection modules such as Apache::DBI could push a PerlCleanupHandler handler that iterates over %Connected, refreshing connections or just checking that connections have not gone stale. Remember, by the time we get to PerlCleanupHandler, the client has what it wants and has gone away, so we can spend as much time as we want here without slowing down response time to the client (although the process itself is unavailable for serving new requests before the operation is completed).
PerlTransHandlers may decide, based on URI or some other condition, whether or not to handle a request, e.g. Apache::MsqlProxy. Without stacked handlers, users must configure it themselves:
PerlTransHandler Apache::MsqlProxy::translate
PerlHandler Apache::MsqlProxy
PerlHandler is never actually invoked unless translate() sees that the request is a proxy request ($r->proxyreq). If it is a proxy request, translate() sets $r->handler("perl-script"), and only then will PerlHandler handle the request. Now, users do not have to specify PerlHandler Apache::MsqlProxy; the translate() function can set it with push_handlers().
Imagine piecing together a document from includes, headers, footers, etc. (no need for SSI parsing!):
PerlHandler My::Header Some::Body A::Footer
A small example:
# My.pm
package My;
sub header {
my $r = shift;
$r->content_type("text/plain");
$r->send_http_header;
$r->print("header text\n");
}
sub body { shift->print("body text\n") }
sub footer { shift->print("footer text\n") }
1;
__END__
# in httpd.conf or perl.conf
<Location /foo>
SetHandler "perl-script"
PerlHandler My::header My::body My::footer
</Location>
Parsing the output of another PerlHandler? This is a little more tricky, but consider:
<Location /foo>
SetHandler "perl-script"
PerlHandler OutputParser SomeApp
</Location>
<Location /bar>
SetHandler "perl-script"
PerlHandler OutputParser AnotherApp
</Location>
Now, OutputParser goes first, but it untie()'s *STDOUT and re-tie()'s it to its own package like so:
package OutputParser;
sub handler {
my $r = shift;
untie *STDOUT;
tie *STDOUT => 'OutputParser', $r;
}
sub TIEHANDLE {
my($class, $r) = @_;
bless { r => $r}, $class;
}
sub PRINT {
my $self = shift;
for (@_) {
#do whatever you want to $_
$self->{r}->print($_ . "[insert stuff]");
}
}
1;
__END__
To build in this feature, configure with:
% perl Makefile.PL PERL_STACKED_HANDLERS=1 [PERL_FOO_HOOK=1,etc]
Another method, Apache->can_stack_handlers, will return TRUE if mod_perl was configured with PERL_STACKED_HANDLERS=1, and FALSE otherwise.
Perl Method Handlers
If a Perl*Handler is prototyped with $$, this handler will be invoked as a method, e.g.:
package My;
@ISA = qw(BaseClass);
sub handler ($$) {
my($class, $r) = @_;
...;
}
package BaseClass;
sub method ($$) {
my($class, $r) = @_;
...;
}
__END__
Configuration:
PerlHandler My
or
PerlHandler My->handler
Since the handler is invoked as a method, it may inherit from other classes:
PerlHandler My->method
In this case, the My class inherits this method from BaseClass.
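The inheritance behaviour can be tried out in plain Perl. In the sketch below the two classes follow the example above, while invoke() is a hypothetical stand-in for the way a configured handler name is resolved; it is not mod_perl's own code, and the request is represented by a plain string.

```perl
use strict;
use warnings;

package BaseClass;
sub method ($$) {
    my ($class, $r) = @_;
    return "$class handled $r via BaseClass::method";
}

package My;
our @ISA = ('BaseClass');
sub handler ($$) {
    my ($class, $r) = @_;
    return "$class handled $r via My::handler";
}

package main;

# Hypothetical resolver: split a "Class->method" string (defaulting to
# "handler") and call the result as a class method.
sub invoke {
    my ($spec, $r) = @_;
    my ($class, $method) = split /->/, $spec;
    $method = 'handler' unless defined $method;
    return $class->$method($r);
}

print invoke($_, 'request'), "\n" for 'My', 'My->handler', 'My->method';
```

In every case the first argument is the class name My, even when the code that runs lives in BaseClass.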
To build in this feature, configure with:
% perl Makefile.PL PERL_METHOD_HANDLERS=1 [PERL_FOO_HOOK=1,etc]
PerlFreshRestart
To reload PerlRequire'd and PerlModule'd files and other use()'d modules, and to flush the Apache::Registry cache on server restart, add:
PerlFreshRestart On
Make sure you read Evil things might happen when using PerlFreshRestart.
Starting from mod_perl version 1.22, PerlFreshRestart is ignored when mod_perl is built as a DSO. But it almost doesn't matter, since mod_perl DSO will do a full tear-down (perl_destruct()), so it's still a FreshRestart, just fresher than with static (non-DSO) mod_perl :)
But if you have:
PerlFreshRestart Off
and mod_perl DSO, you will still get a FreshRestart.
PerlSetVar, PerlSetEnv and PerlPassEnv
PerlSetEnv key val
PerlPassEnv key
PerlPassEnv passes, and PerlSetEnv sets and passes, environment variables to your scripts. You can access them in your scripts through %ENV (e.g. $ENV{"key"}).
Regarding the setting of PerlPassEnv PERL5LIB in httpd.conf: if you turn on taint checks (PerlTaintCheck On), $ENV{PERL5LIB} will be ignored (unset).
PerlSetVar is very similar to PerlSetEnv, but you extract its value with another method.
PerlSetVar key val
or
push @{ $Location{"/"}->{PerlSetVar} }, [ key => 'val' ];
and in the code you read it with:
my $r = Apache->request;
print $r->dir_config('key');
The above prints:
val
Note that you cannot do this:
push @{ $Location{"/"}->{PerlSetVar} }, [ key => \%hash ];
All values are treated as strings, so you will get a stringified reference to a hash as the value, which cannot be turned back into the original hash upon retrieval.
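What happens to a reference once it has been turned into a string can be shown without any server at all. The following plain Perl sketch uses a hypothetical %hash and simply stringifies a reference to it, which is assumed here to be comparable to what storing it as a string value does.

```perl
use strict;
use warnings;

my %hash = (colour => 'red');

# Storing a reference where only strings are kept leaves just its text form.
my $stored = '' . \%hash;

print "stored value:   $stored\n";
print "still a ref?    ", (ref $stored ? 'yes' : 'no'), "\n";

# Under strict refs the string cannot be used as a hash reference again.
my $colour = eval { $stored->{colour} };
print "dereferencing:  ", (defined $colour ? $colour : "failed: $@");
```

The stored value looks like HASH(0x...) and carries no link back to the original data.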
PerlSetupEnv
See PerlSetupEnv Off.
PerlWarn and PerlTaintCheck
For PerlWarn and PerlTaintCheck directives see 'Switches -w, -T' section.
MinSpareServers MaxSpareServers StartServers MaxClients MaxRequestsPerChild
MinSpareServers, MaxSpareServers, StartServers and MaxClients are standard Apache configuration directives that control the number of servers that are launched at server startup and kept alive while the server is running.
MaxRequestsPerChild lets you specify the maximum number of requests for each child to serve. A process that has served MaxRequestsPerChild requests is killed and a new one replaces it.
These five directives are very important for achieving the best performance from your server. The 'Tuning Apache's Configuration Variables for the Best Performance' section provides the required details.
Start-up File
There is more that can be done at server startup than just preloading files, before child processes are spawned to receive incoming requests. You might want to register code that will initialize a database connection for each child when it is forked, tie read-only dbm files, etc.
The startup file is an ideal place to put the code that should be executed when the server starts. Once you have prepared the code, load it before the rest of the mod_perl configuration directives like this:
PerlRequire /home/httpd/perl/lib/startup.pl
I must stress that all the code that is run at server initialization time is run with root privileges if you are executing it as the root user (you have to, unless you choose to run the server on an unprivileged port, numbered 1024 or higher). This means that anyone who has write access to a script or module that is loaded by PerlModule or PerlRequire effectively has root access to the system. You might want to take a look at the new and experimental PerlOpmask directive and the PERL_OPMASK_DEFAULT compile time option to try to disable some dangerous operators.
Since the startup file is a file written in plain perl, one can validate its syntax with:
% perl -c /home/httpd/perl/lib/startup.pl
The Sample Start-up File
Let's look at a real world startup file:
startup.pl
----------
use strict;
# extend @INC if needed
use lib qw(/dir/foo /dir/bar);
# make sure we are in a sane environment.
$ENV{GATEWAY_INTERFACE} =~ /^CGI-Perl/
or die "GATEWAY_INTERFACE not Perl!";
# for things in the "/perl" URL
use Apache::Registry;
#load perl modules of your choice here
#this code is interpreted *once* when the server starts
use LWP::UserAgent ();
use Apache::DBI ();
use DBI ();
# tell me more about warnings
use Carp ();
$SIG{__WARN__} = \&Carp::cluck;
# Load CGI.pm and call its compile() method to precompile
# (but not to import) its autoloaded methods.
use CGI ();
CGI->compile(':all');
# init the connections for each child
Apache::DBI->connect_on_init
("DBI:mysql:$Match::Config::c{db}{DB_NAME}::$Match::Config::c{db}{SERVER}",
$Match::Config::c{db}{USER},
$Match::Config::c{db}{USER_PASSWD},
{
PrintError => 1, # warn() on errors
RaiseError => 0, # don't die on error
AutoCommit => 1, # commit executes immediately
}
);
Now we'll review the code explaining why each line is used.
use strict;
This pragma is worth using in every script longer than half a dozen lines. It will save a lot of time and debugging later on.
use lib qw(/dir/foo /dir/bar);
The only chance to permanently modify @INC before the server is started is with this command. Later the running code can modify @INC only for the moment it require()'s some file, and then @INC's value gets reset to the previous one.
$ENV{GATEWAY_INTERFACE} =~ /^CGI-Perl/
or die "GATEWAY_INTERFACE not Perl!";
A sanity check: if Apache wasn't properly built, the above code will abort the server startup.
use Apache::Registry;
use LWP::UserAgent ();
use Apache::DBI ();
use DBI ();
Preload the modules that get used by our Perl code serving the requests. Unless you need the symbols (variables and subroutines) exported by the modules you preload to accomplish something within the startup file, don't import them, since it's just a waste of startup time. Instead use the empty list (), which tells use() not to call the import() function at all.
use Carp ();
$SIG{__WARN__} = \&Carp::cluck;
This is a useful snippet to enable extended warnings logged in the error_log file. In addition to basic warnings, a trace of calls is added which makes the tracking of the potential problem a much easier task, since you know who called whom. For example, with normal warnings you might see:
Use of uninitialized value at
/usr/lib/perl5/site_perl/5.005/Apache/DBI.pm line 110.
but you have no idea where it was called from. When we use Carp as shown above we might see:
Use of uninitialized value at
/usr/lib/perl5/site_perl/5.005/Apache/DBI.pm line 110.
Apache::DBI::connect(undef, 'mydb::localhost', 'user',
'passwd', 'HASH(0x87a5108)') called at
/usr/lib/perl5/site_perl/5.005/i386-linux/DBI.pm line 382
DBI::connect('DBI', 'DBI:mysql:mydb::localhost', 'user',
'passwd', 'HASH(0x8375e4c)') called at
/usr/lib/perl5/site_perl/5.005/Apache/DBI.pm line 36
Apache::DBI::__ANON__('Apache=SCALAR(0x87a50c0)') called at
PerlChildInitHandler subroutine
`Apache::DBI::__ANON__' line 0
eval {...} called at PerlChildInitHandler subroutine
`Apache::DBI::__ANON__' line 0
we clearly see that the warning was triggered by eval()'uating Apache::DBI::__ANON__, which called DBI::connect (with the arguments that we see as well), which in turn called the Apache::DBI::connect method. Now we know where to look for the problem.
use CGI ();
CGI->compile(':all');
Some modules create their subroutines at run time to improve their load time. This helps when the module includes many subroutines, but only a few are actually used. CGI.pm falls into this category. Since with mod_perl the module is loaded only once, it might be a good idea to precompile all or a part of its methods.
CGI.pm's compile() method performs this task. Notice that this is a proprietary function of this module; other modules may or may not implement this feature, and may use this or some other name for it. As with all modules we preload in the startup file, we don't import symbols from them as they will be lost when they go out of the file's scope.
Note that starting with $CGI::VERSION 2.46, the recommended method to precompile the code in CGI.pm is:
use CGI qw(-compile :all);
But the old method is still available for backward compatibility.
See also the 'Apache::Status -- Embedded interpreter status information' section.
What Modules Should You Add to the Start-up File and Why
Every module loaded at the server startup will be shared among server children, saving a lot of RAM on your machine. Usually I put most of the code I develop into modules and preload them.
You can even preload your CGI scripts with Apache::RegistryLoader and preopen the database connections with Apache::DBI. (See Preload Perl modules at server startup).
The Confusion with use() at the Server Start-up File
Some people wonder why you need to duplicate the use() clause in the startup file and in the script itself. The confusion arises from misunderstanding the use() function. use() normally performs two operations, namely require() and import(), called within a BEGIN block. See the section "use()" for a detailed explanation of the use(), require() and import() functions.
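The difference between loading and importing can be seen in a small standalone sketch. It is only an illustration: the Startup and Script package names are made up to stand in for the startup file and a script, and the core File::Basename module is used in place of a real application module.

```perl
use strict;
use warnings;

# Stands in for startup.pl: load the module, import nothing.
package Startup;
use File::Basename ();

# Stands in for a script or handler: the module is already loaded,
# so this use() only imports basename() into the Script package.
package Script;
use File::Basename qw(basename);

sub conf_name { return basename('/usr/local/apache/conf/httpd.conf') }

package main;

print "Startup has basename(): ", (defined(&Startup::basename) ? "yes" : "no"), "\n";
print "Script has basename():  ", (defined(&Script::basename)  ? "yes" : "no"), "\n";

# A fully qualified call works from any package without importing.
print File::Basename::basename('/usr/local/apache/bin/httpd'), "\n";
print Script::conf_name(), "\n";
```

Only the package that asked for the symbol gets it, which is why a preload in the startup file does not replace the use() line in the script.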
In the startup file we don't want to import any symbols since they will be lost when we leave the scope of the startup file anyway, i.e. they won't be visible to any of the child processes in which our mod_perl scripts run. Instead we want to preload the module in the startup file and then import any symbols that we actually need in each script individually.
Normally when we write use MyModule; use() will both load the module and import its symbols. So for the startup file we write use MyModule (); and the empty parentheses ensure that the module is loaded but that no symbols are imported. Then in the actual mod_perl script we use use() in the standard way, e.g. use MyModule; and since the module has already been preloaded the only action taken is to import the symbols. For example in the startup file you write:
use CGI ();
since you probably don't need any symbols to be exported there. But in your code you probably would write:
use CGI qw(:html);
Likewise, just because you have use()'d Apache::Constants in the startup file does not mean you can have the following handler:
package MyModule;
sub handler {
my $r = shift;
## Cool stuff goes here
return OK;
}
1;
You would either need to add:
use Apache::Constants qw( OK );
Or use the fully qualified name:
return Apache::Constants::OK;
If you want to use the function interface without exporting the symbols, use fully qualified function names, e.g. CGI::param. The same rule applies to variables: you can import variables or you can access them by their full name, e.g. $My::Module::bar. When you use the object oriented (method) interface you don't need to import the method symbols at all.
Technically, you aren't required to supply the use() statement in your handler code if the module was already loaded at server startup (i.e. via PerlRequire or startup.pl). However, when writing your code you should not assume that the module has been preloaded. In the future, you or someone else will revisit this code and will not understand how it was possible to use a module's methods without first loading the module itself.
Read the Exporter and perlmod manpages for more information about import().
The Confusion with Global Variables in the Start-up File
PerlRequire allows you to execute code that preloads modules and performs other functions. Imported or defined variables are visible in the scope of the startup file. It is wrong to assume that global variables that were defined in the startup file will be visible to child processes.
If you define or import variables in your scripts they will be visible inside the child process which is running the script, but they will not be shared between siblings. Remember that every script runs in a specially (uniquely) named package, so it cannot access variables from other packages unless it inherits from them or use()'s them.
Apache Configuration in Perl
With <Perl>...</Perl> sections in httpd.conf, it is possible to configure your server entirely in Perl.
Behind the scenes mod_perl defines an Apache::ReadConfig package, where all the variables you define inside the <Perl> sections go. This means that you can create a module that declares the package Apache::ReadConfig, put the configuration code inside it, and then load it with PerlModule, PerlRequire or from within the startup file.
Usage
<Perl> sections can contain any and as much Perl code as you wish. These sections are compiled into a special package whose symbol table mod_perl can then walk and grind the names and values of Perl variables/structures through the Apache core configuration gears. Most of the configuration directives can be represented as scalars ($scalar) or lists (@list). A @List inside these sections is simply converted into a space delimited string for you. Here is an example:
httpd.conf
------------
<Perl>
@PerlModule = qw(Mail::Send Devel::Peek);
#run the server as whoever starts it
$User = getpwuid($>) || $>;
$Group = getgrgid($)) || $);
$ServerAdmin = $User;
</Perl>
Block sections such as <Location>..</Location> are represented in a %Location hash, e.g.:
$Location{"/~dougm/"} = {
AuthUserFile => '/tmp/htpasswd',
AuthType => 'Basic',
AuthName => 'test',
DirectoryIndex => [qw(index.html index.htm)],
Limit => {
METHODS => 'GET POST',
require => 'user dougm',
},
};
If an Apache directive can take two or three arguments you may push strings (the lowest number of arguments will be shifted off the @List), or use an array reference to handle any number greater than the minimum for that directive:
push @Redirect, "/foo", "http://www.foo.com/";
push @Redirect, "/imdb", "http://www.imdb.com/";
push @Redirect, [qw(temp "/here" "http://www.there.com")];
Other section counterparts include %VirtualHost, %Directory and %Files.
To pass all environment variables to the children with a single configuration directive, rather than listing each one via PassEnv or PerlPassEnv, a <Perl> section could read in a file and:
push @PerlPassEnv, [$key => $val];
or
Apache->httpd_conf("PerlPassEnv $key $val");
These are somewhat simple examples, but they should give you the basic idea. You can mix in any Perl code your heart desires. See eg/httpd.conf.pl and eg/perl_sections.txt in the mod_perl distribution for more examples.
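The "read in a file" step mentioned above might look roughly like the following sketch. The read_env_pairs() helper and the KEY=VALUE file format are assumptions made for illustration, not part of mod_perl; the helper just returns pairs in the form used by the push example above.

```perl
use strict;
use warnings;

# Hypothetical helper: parse lines of the form KEY=VALUE from a file and
# return a list of [key, value] pairs. Blank lines, comment lines and
# lines without an '=' are skipped.
sub read_env_pairs {
    my ($path) = @_;
    open my $fh, '<', $path or die "Cannot open $path: $!";
    my @pairs;
    while (my $line = <$fh>) {
        chomp $line;
        next if $line =~ /^\s*$/ or $line =~ /^\s*#/;
        my ($key, $val) = split /=/, $line, 2;
        next unless defined $val;
        push @pairs, [$key => $val];
    }
    close $fh;
    return @pairs;
}

# Inside a <Perl> section the pairs could then be pushed as shown above,
# e.g. (the file name is a placeholder):
#   push @PerlPassEnv, read_env_pairs('/path/to/env.conf');
```

Keeping the parsing in a plain subroutine means it can be tested from the command line before it is used in httpd.conf.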
Assume that you have a cluster of machines with similar configurations and only small distinctions between them. Ideally you would want to maintain a single configuration file, but because the configurations aren't exactly the same (e.g. the ServerName directive) it's not that simple.
<Perl> sections come to the rescue. Now you have a single configuration file and the full power of Perl to tweak the local configuration. For example, to solve the problem of the ServerName directive you might want to have this <Perl> section:
<Perl>
chomp($ServerName = `hostname`); # strip the trailing newline
</Perl>
Or, if you want to allow personal directories on all machines except those whose names start with secure:
<Perl>
chomp($ServerName = `hostname`); # strip the trailing newline
if ( $ServerName !~ /^secure/) {
$UserDir = "public.html";
} else {
$UserDir = "DISABLED";
}
</Perl>
Enabling
To enable <Perl> sections you should build mod_perl with perl Makefile.PL PERL_SECTIONS=1.
Caveats
Be careful when you declare package names inside the <Perl> sections, for example in this code:
<Perl>
package My::Trans;
use Apache::Constants qw(:common);
sub handler{ OK }
$PerlTransHandler = "My::Trans";
</Perl>
The PerlTransHandler you have tried to define is actually undefined, because code inside <Perl> sections goes into the Apache::ReadConfig package, which is already declared for you. Once you declare a different package, the variable is set in that package instead. If you define a different package name within <Perl> sections, make sure to close the scope of the package and return to the Apache::ReadConfig package when you want to define the configuration parameters, like this:
<Perl>
package My::Trans;
use Apache::Constants qw(:common);
sub handler{ OK }
package Apache::ReadConfig;
$PerlTransHandler = "My::Trans";
</Perl>
The next section shows how to dump the configuration you have made with the help of <Perl> sections.
Verifying
To check the <Perl> section syntax outside of httpd, we make it look like a Perl script:
<Perl>
#!perl
# ... code here ...
__END__
</Perl>
Now you may run:
perl -cx httpd.conf
You can see how you have configured the <Perl> sections through the /perl-status location, by choosing Perl Section Configuration from the menu. In order to make this item show up in the menu you should set $Apache::Server::SaveConfig to a true value. When you do that, the Apache::ReadConfig namespace (in which the configuration data is stored) will not be flushed, making configuration data available to Perl modules at request time.
Example:
<Perl>
$Apache::Server::SaveConfig = 1;
$DocumentRoot = ...
...
</Perl>
At request time, the value of $DocumentRoot can be accessed with the fully qualified name $Apache::ReadConfig::DocumentRoot.
You can dump the configuration of <Perl> sections like this:
<Perl>
use Apache::PerlSections();
...
# Configuration Perl code here
...
print STDERR Apache::PerlSections->dump();
</Perl>
Alternatively you can store it in a file:
Apache::PerlSections->store("httpd_config.pl");
You can then require() that file in some other <Perl> section.
Strict <Perl> Sections
If the Perl code doesn't compile, the server won't start. If the generated Apache config is invalid, <Perl> sections have always just logged an error and carried on, since there might be globals in the section that are not intended for the config. Setting:
$Apache::Server::StrictPerlSections = 1;
makes mod_perl refuse invalid Apache configuration syntax and croak (die) if it is encountered. At this time the default value is 0. (This variable was added in mod_perl version 1.22.)
Debugging
If you compile mod_perl with PERL_TRACE=1 and set the environment variable MOD_PERL_TRACE then you should see some useful diagnostics when mod_perl is processing <Perl> sections.
Validating the Configuration Syntax
apachectl configtest tests the configuration file without starting the server. You can safely modify the configuration file on your production server if you can successfully run this test before you restart the server. Of course it is not 100% perfect, but it will reveal any syntax errors you might have made while editing the file.
'apachectl configtest' is the same as 'httpd -t', and it doesn't just parse the code in startup.pl, it actually executes it. <Perl> configuration has always started Perl during the configuration read, and Perl{Require,Module} do so as well.
If you want your startup code to have control over the -t (configtest) server launch, start the server configuration test with:
httpd -t -Dsyntax_check
and in your startup file, add (at the top):
return if Apache->define('syntax_check');
if you want to prevent the code in the file from being executed.
Enabling Remote Server Configuration Reports
The nifty mod_info module displays the complete server configuration in your browser. In order to use it you have to compile it in, or load it as an object if the server was compiled with DSO mode enabled. Then uncomment the already prepared section in the httpd.conf file:
<Location /server-info>
SetHandler server-info
Order deny,allow
Deny from all
Allow from www.example.com
</Location>
Now restart the server and issue the request:
http://www.example.com/server-info
Publishing Port Numbers other than 80
It is advised not to publish the 8080 (or similar) port number in URLs, but rather to use a proxying rewrite rule in the thin (httpd_docs) server:
RewriteRule .*/perl/(.*) http://my.url:8080/perl/$1 [P]
One problem with publishing port 8080 is that IE 4.x reportedly has a bug when re-posting data to a non-port-80 URL: it drops the port designator and uses port 80 anyway.
The other reason is that the firewalls the users work behind might have all ports closed, except port 80.
Configuring Apache + mod_perl with mod_macro
mod_macro is an Apache module written by Fabien Coelho that lets you define and use macros in the Apache configuration file.
mod_macro can be really useful when you have many virtual hosts, and where each virtual host has a number of scripts/modules, most of them with a moderately complex configuration setup.
First download the latest version of mod_macro from http://www.cri.ensmp.fr/~coelho/mod_macro/ , and configure your Apache server to use this module.
Here are some useful macros for mod_perl users:
# set up a registry script
<Macro registry>
SetHandler "perl-script"
PerlHandler Apache::Registry
Options +ExecCGI
</Macro>
# example
Alias /stuff /usr/www/scripts/stuff
<Location /stuff>
Use registry
</Location>
If your registry scripts are all located in the same directory, and your aliasing rules consistent, you can use this macro:
# set up a registry script for a specific location
<Macro registry $location $script>
Alias /$location /usr/www/scripts/$script
<Location /$location>
SetHandler "perl-script"
PerlHandler Apache::Registry
Options +ExecCGI
</Location>
</Macro>
# example
Use registry stuff stuff.pl
If you're using content handlers packaged as modules, you can use the following macro:
# set up a mod_perl content handler module
<Macro modperl $module>
SetHandler "perl-script"
Options +ExecCGI
PerlHandler $module
</Macro>
#examples
<Location /perl-status>
PerlSetVar StatusPeek On
PerlSetVar StatusGraph On
PerlSetVar StatusDumper On
Use modperl Apache::Status
</Location>
The following macro sets up a Location for use with HTML::Embperl. Here we define all ".html" files to be processed by Embperl.
<Macro embperl>
SetHandler "perl-script"
Options +ExecCGI
PerlHandler HTML::Embperl
PerlSetEnv EMBPERL_FILESMATCH \.html$
</Macro>
# examples
<Location /mrtg>
Use embperl
</Location>
Macros are also very useful for things that tend to be verbose, such as setting up Basic Authentication:
# Sets up Basic Authentication
<Macro BasicAuth $realm $group>
Order deny,allow
Satisfy any
AuthType Basic
AuthName $realm
AuthGroupFile /usr/www/auth/groups
AuthUserFile /usr/www/auth/users
Require group $group
Deny from all
</Macro>
# example of use
<Location /stats>
Use BasicAuth WebStats Admin
</Location>
Finally, here is a complete example that uses macros to set up simple virtual hosts. It uses the BasicAuth macro defined previously (yes, macros can be nested!).
<Macro vhost $ip $domain $docroot $admingroup>
<VirtualHost $ip>
ServerAdmin webmaster@$domain
DocumentRoot /usr/www/htdocs/$docroot
ServerName www.$domain
<Location /stats>
Use BasicAuth Stats-$domain $admingroup
</Location>
</VirtualHost>
</Macro>
# define some virtual hosts
Use vhost 10.1.1.1 example.com example example-admin
Use vhost 10.1.1.2 example.net examplenet examplenet-admin
mod_macro is also useful in a non-vhost setting. Some sites, for example, have lots of scripts which people use to view various statistics, email settings, etc. It is much easier to read things like:
use /forwards email/showforwards
use /webstats web/showstats
General Pitfalls
My CGI/Perl Code Gets Returned as Plain Text Instead of Being Executed by the Webserver
Check your configuration files and make sure that the ExecCGI option is turned on in your configuration:
<Location /perl>
SetHandler perl-script
PerlHandler Apache::Registry
Options ExecCGI
allow from all
PerlSendHeader On
</Location>
My Script Works under mod_cgi, but when Called via mod_perl I Get a 'Save-As' Prompt
Did you put PerlSendHeader On in the configuration part of the <Location foo></Location>?
Is There a Way to Provide a Different startup.pl File for Each Individual Virtual Host
No. Any virtual host will be able to see the routines from a startup.pl loaded for any other virtual host.
Is There a Way to Modify @INC on a Per-Virtual-Host or Per-Location Basis.
You can use PerlSetEnv PERL5LIB ... or a PerlFixupHandler with the lib pragma (use lib qw(...)).
An even better way is to use Apache::PerlVINC.
A Script From One Virtual Host Calls a Script with the Same Path From the Other Virtual Host
This was a bug in the past, last fixed in 1.15_01; if you are running 1.15, that could be the problem. You should set this variable in a startup file (loaded with PerlRequire):
$Apache::Registry::NameWithVirtualHost = 1;
But, as we know, sometimes a bug turns out to be a feature. If the same script is running for more than one virtual host on the same machine, compiling it separately for each host can be a waste. Set the variable to 0 in a startup script if you want to turn it off and have this bug as a feature. (This only makes sense if you are sure that there will be no other scripts with the same path/name.) It also saves you some memory.
$Apache::Registry::NameWithVirtualHost = 0;
The Server No Longer Retrieves the DirectoryIndex Files for a Directory
The problem was reported by users who declared mod_perl configuration inside a <Directory> section for all files matching *.pl. The problem went away after placing the mod_perl configuration in a <Files> section.
Configuration Security Concerns
It is better not to advertise the port that the mod_perl server is running on to the outside world, for it creates a potential security risk by revealing which module(s) and/or OS your web server is running.
The more modules you have in your web server, the more complex the code in your webserver.
The more complex the code in your web server, the more chances for bugs.
The more chance for bugs, the more chance that some of those bugs may involve security.
I was never completely sure why the default of the ServerTokens directive in Apache is Full rather than Minimal. It seems that you would only set it to Full if you are debugging.
For more information see Publishing Port Numbers other than 80.
Another approach is to modify the httpd sources to reveal no unwanted information, so that even if the port is known, a HEAD request will return an empty or phony Server: field.
Apache Restarts Twice On Start
When the server is started, the configuration and module initialization phases are called twice in total before the children get forked. The second pass is done in order to ensure that a future restart will work correctly, by making sure that all modules can survive a restart (SIGHUP). This is very important if you restart a production server.
You can control what code to execute only on the start or only on restart by checking the value of $Apache::Server::Starting and $Apache::Server::ReStarting respectively. The former variable is true when the server is starting and the latter when it's restarting.
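As a minimal sketch, a startup file could branch on these two variables like this. The log messages are made up for illustration, and the variables are only set when the file is loaded by a mod_perl 1.x server.

```perl
# startup.pl fragment: report which phase loaded this file.
# warn() output ends up in the error_log.
if ($Apache::Server::Starting) {
    warn "startup.pl: loaded while the server is starting\n";
}
if ($Apache::Server::ReStarting) {
    warn "startup.pl: loaded while the server is restarting\n";
}
```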
(META: And add an example that writes to the log file - "was restarted 1, 2 times")
Knowing the proxy_pass'ed Connection Type
Let's say that you have a frontend server running mod_ssl, mod_rewrite and mod_proxy. You want to make sure that the user is using a secure connection for some specific actions like login information submission. You don't want to let the user log in unless the request was submitted through a secure port.
Since you have to proxy_pass the request between the frontend and backend servers, the backend cannot know where the connection has come from. Neither is using the HTTP headers reliable.
A possible solution for this problem is to have the mod_perl server listen on two different ports (e.g. 8000 and 8001), and have the mod_rewrite proxy rule in the regular server redirect to port 8000 and the mod_rewrite proxy rule in the SSL virtual host redirect to port 8001. In the mod_perl server just check the PORT variable to tell if the connection is encrypted or not.
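The port check on the mod_perl side could be sketched as follows. The came_via_ssl() helper is hypothetical, and the sketch assumes the port assignment from the example above (8001 for requests proxied from the SSL virtual host) and that the script reads the port from the standard CGI variable SERVER_PORT.

```perl
use strict;
use warnings;

# Assumption: the SSL virtual host on the frontend proxies to port 8001
# and the plain HTTP host proxies to port 8000.
my $SECURE_PORT = 8001;

# Returns true if the request arrived on the port reserved for traffic
# that the frontend received over SSL.
sub came_via_ssl {
    my ($port) = @_;
    return defined $port && $port == $SECURE_PORT;
}

# In an Apache::Registry script the check might then read:
#   die "login requires a secure connection"
#       unless came_via_ssl($ENV{SERVER_PORT});
```

This only works if the backend ports are not reachable from outside, since anyone who can connect to port 8001 directly would be treated as secure.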
Adding Custom Configuration Directives
This is all covered in great detail in the Eagle Book. Here is just a simple example showing how to add your own configuration directive.
Makefile.PL
-----------
package Apache::TestDirective;
use ExtUtils::MakeMaker;
use Apache::ExtUtils qw(command_table);
use Apache::src ();
my @directives = (
{ name => 'Directive4',
errmsg => 'Anything',
args_how => 'RAW_ARGS',
req_override=> 'OR_ALL',
},
);
command_table(\@directives);
WriteMakefile(
'NAME' => 'Apache::TestDirective',
'VERSION_FROM' => 'TestDirective.pm',
'INC' => Apache::src->new->inc,
);
TestDirective.pm
----------------
package Apache::TestDirective;
use strict;
use Apache::ModuleConfig ();
use DynaLoader ();
if($ENV{MOD_PERL}) {
no strict;
$VERSION = '0.01';
@ISA = qw(DynaLoader);
__PACKAGE__->bootstrap($VERSION); #command table, etc.
}
sub Directive4 {
warn "Directive4 @_\n";
}
1;
__END__
In the mod_perl source tree, add this to t/docs/startup.pl:
use blib qw(/home/dougm/test/Apache/TestDirective);
and at the bottom of t/conf/httpd.conf:
PerlModule Apache::TestDirective
Directive4 hi
Test it:
% make start_httpd
% make kill_httpd
You should see:
Directive4 Apache::TestDirective=HASH(0x83379d0)
Apache::CmdParms=SCALAR(0x862b80c) hi
And in the error log file:
% grep Directive4 t/logs/error_log
Directive4 Apache::TestDirective=HASH(0x83119dc)
Apache::CmdParms=SCALAR(0x8326878) hi
If it didn't work as expected try building mod_perl with PERL_TRACE=1, then do:
setenv MOD_PERL_TRACE all
before starting the server. Now you should get some useful diagnostics.