Standalone mod_perl Enabled Apache Server

Installation in 10 lines

The installation is very simple. This example shows installation on the Linux operating system.

% cd /usr/src
% lwp-download http://www.apache.org/dist/apache_x.x.x.tar.gz
% lwp-download http://perl.apache.org/dist/mod_perl-x.xx.tar.gz
% tar xzvf apache_x.x.x.tar.gz
% tar xzvf mod_perl-x.xx.tar.gz
% cd mod_perl-x.xx
% perl Makefile.PL APACHE_SRC=../apache_x.x.x/src \
  DO_HTTPD=1 USE_APACI=1 EVERYTHING=1
% make && make test && make install
% cd ../apache_x.x.x
% make install

That's all!

Notes: Replace x.xx and x.x.x with the real version numbers of mod_perl and Apache respectively. The z flag tells GNU tar to uncompress the archive as well as extract the files. You might need superuser permissions for the make install steps.

Installation in 10 paragraphs

If you have the lwp-download utility installed, you can use it to download the sources of both packages:

% lwp-download http://www.apache.org/dist/apache_x.x.x.tar.gz
% lwp-download http://perl.apache.org/dist/mod_perl-x.xx.tar.gz

lwp-download is part of the LWP module (from the libwww-perl package); you will need LWP installed in order for mod_perl's make test step to pass.

Extract both sources. Usually I extract all the sources into /usr/src/, but your mileage may vary. Move the tarballs to the directory where you want to extract them, and chdir to it. If you have a non-GNU tar utility it will be unable to decompress the archives, so you will do it in two steps: first uncompress the packages with:

gzip -d apache_x.x.x.tar.gz
gzip -d mod_perl-x.xx.tar.gz

then un-tar them with:

tar xvf apache_x.x.x.tar 
tar xvf mod_perl-x.xx.tar

You can use gunzip instead of gzip -d if you prefer. If you have GNU tar, each archive can be extracted in a single step:

% cd /usr/src
% tar xzvf apache_x.x.x.tar.gz
% tar xzvf mod_perl-x.xx.tar.gz

chdir to the mod_perl source directory:

% cd mod_perl-x.xx

Now build the Makefile. For your first installation and most basic work, the parameters in the example below are the only ones you will need. APACHE_SRC tells Makefile.PL where to find the Apache src directory. If you have followed my suggestion and have extracted both sources under the directory /usr/src, then issue the command:

% perl Makefile.PL APACHE_SRC=../apache_x.x.x/src \
  DO_HTTPD=1 USE_APACI=1 EVERYTHING=1

There are many additional optional parameters. You can find some of them later in this section and in the Server Configuration section.

While running perl Makefile.PL ... the process will check for prerequisites and tell you if something is missing. If you are missing some of the perl packages or other software, you will have to install them before you proceed.

Next make the project. The command make builds the mod_perl extension and also calls make in the Apache source directory to build httpd. Then we run the test suite, and finally install the mod_perl modules in their proper places.

% make && make test && make install

Note that if make fails, neither make test nor make install will be executed. If make test fails, make install will not be executed.
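The && chaining relies on each command's exit status; a quick shell demonstration (not part of the build itself) shows why a failed step stops the chain:

```shell
# '&&' runs the next command only if the previous one succeeded (exit
# status 0), so a broken 'make' never reaches 'make install'.
false && echo "never printed"      # 'false' fails, so echo is skipped
echo "chain exit status: $?"       # prints the failing status: 1
```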

Now change to the Apache source directory and run make install. This will install Apache's headers, default configuration files, build the Apache directory tree and put httpd in it.

% cd ../apache_x.x.x
% make install

When you execute the above command, the Apache installation process will tell you how to start a freshly built webserver (you need to know the path of apachectl, more about that later) and where to find the configuration files. Write down both, since you will need this information very soon. On my machine the two important paths are:

/usr/local/apache/bin/apachectl
/usr/local/apache/conf/httpd.conf

Now the build and installation processes are complete.

Configuration

First, a simple configuration. Configure Apache as you usually would (set Port, User, Group, ErrorLog, other file paths etc.).

Start the server and make sure it works, then shut it down. The apachectl utility can be used to start and stop the server:

% /usr/local/apache/bin/apachectl start
% /usr/local/apache/bin/apachectl stop

Now we will configure Apache to run Perl CGI scripts under the Apache::Registry handler.

You can add configuration directives to a separate file and tell httpd.conf to include it, but for now we will simply add them to the main configuration file. We will add the mod_perl configuration directives to the end of httpd.conf. In fact you can place them anywhere in the file, but they are easier to find at the end.
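If you later decide to keep the mod_perl settings in a separate file, Apache's Include directive can pull it in (the filename here is just an example):

```
# in httpd.conf -- read additional directives from another file
Include conf/mod_perl.conf
```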

For the moment we will assume that you will put all the scripts which you want to be executed by the mod_perl enabled server under the directory /home/httpd/perl. We will alias this directory to the URI /perl.

Add the following configuration directives to httpd.conf:

Alias /perl/ /home/httpd/perl/

PerlModule Apache::Registry
<Location /perl>
  SetHandler perl-script
  PerlHandler Apache::Registry
  Options ExecCGI
  PerlSendHeader On
  allow from all
</Location>

Now create a four-line test script in /home/httpd/perl/:

test.pl
-------
#!/usr/bin/perl -w
use strict;
print "Content-type: text/html\r\n\r\n";
print "It worked!!!\n";

Note that the server is probably running as a user with a restricted set of privileges, perhaps as user nobody or www. Look for the User directive in httpd.conf to find the userid of the server.
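One quick way to find it is to grep the configuration file for the directive. The sample below writes a tiny two-line stand-in config to /tmp so the command is self-contained; in real life you would grep your actual httpd.conf instead:

```shell
# Create a two-line stand-in for httpd.conf (illustration only):
cat > /tmp/httpd.conf.sample <<'EOF'
User nobody
Group nobody
EOF

# The server's userid is whatever the User directive says:
grep '^User' /tmp/httpd.conf.sample    # prints: User nobody
```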

Make sure that you have read and execute permissions for test.pl.

% chmod u+rx /home/httpd/perl/test.pl

Test that the script works from the command line, by executing it:

% /home/httpd/perl/test.pl

You should see:

Content-type: text/html

It worked!!!

Assuming that the server's userid is nobody, make the script owned by this user. We have already made it readable and executable by its owner.

% chown nobody /home/httpd/perl/test.pl

Now it is time to test that mod_perl enabled Apache can execute the script.

Start the server ('apachectl start'). Check in logs/error_log to see that indeed the server has started--verify the correct date and time of the log entry.

To get Apache to execute the script we simply fetch its URI. Assuming that your httpd.conf has been configured with the directive Port 80, start your favorite browser and fetch the following URI:

http://www.example.com/perl/test.pl

If you have the loop-back device (127.0.0.1) configured, you can use the URI:

http://localhost/perl/test.pl

In either case, you should see:

It worked!!!

If your server is listening on a port other than 80, for example 8000, then fetch the URI:

http://www.example.com:8000/perl/test.pl

or whatever is appropriate.

If something went wrong, go through the installation process again and make sure you didn't make a mistake. If that doesn't help, read the INSTALL pod document (perldoc INSTALL) in the mod_perl distribution directory.

Now that your mod_perl server is working, copy some of your Perl CGI scripts into the directory /home/httpd/perl/ or below it.

If your programming techniques are good, chances are that your scripts will work with no modifications at all. With the mod_perl enabled server you will see them working very much faster.

If your programming techniques are sloppy, some of your scripts will not work and they may exhibit strange behaviour. Depending on the degree of sloppiness, they may need anything from minor tweaking to a major rewrite to make them work properly. (See Sometimes My Script Works, Sometimes It Does Not.)

The above setup is very basic, but as with Perl, you can start to benefit from mod_perl from the very first moment you try it. As you become more familiar with mod_perl you will want to start writing Apache handlers and make more use of its power.

One Plain and One mod_perl Enabled Apache Server

Since we are going to run two Apache servers we will need two complete (and different) sets of configuration, log and other files. We need a special directory layout. While some of the directories can be shared between the two servers (assuming that both are built from the same source distribution), others should be separated. From now on I will refer to these two servers as httpd_docs (plain Apache) and httpd_perl (Apache/mod_perl).

For this illustration, we will use /usr/local as our root directory. The Apache installation directories will be stored under this root. (/usr/local/bin, /usr/local/lib and so on.)

First let's prepare the sources. We will assume that all the sources go into the /usr/src directory. Since you will probably want to tune each copy of Apache separately, it is better to use two separate copies of the Apache source for this configuration. For example you might want only the httpd_docs server to be built with the mod_rewrite module.

Having two independent source trees will prove helpful, unless you use dynamic shared objects (DSO), which are covered later in this section.

Make two subdirectories:

% mkdir /usr/src/httpd_docs
% mkdir /usr/src/httpd_perl

Next put a set of the Apache sources into the /usr/src/httpd_docs directory (replace the directory /tmp with the path to the downloaded file and x.x.x with the version of Apache that you have downloaded):

% cd /usr/src/httpd_docs
% gzip -dc /tmp/apache_x.x.x.tar.gz | tar xvf -

or if you have GNU tar:

% tar xvzf /tmp/apache_x.x.x.tar.gz

Just to check that the extraction went correctly:

% ls -l
drwxr-xr-x  8 stas  stas 2048 Apr 29 17:38 apache_x.x.x/

Now prepare the httpd_perl server sources:

% cd /usr/src/httpd_perl
% gzip -dc /tmp/apache_x.x.x.tar.gz | tar xvf -
% gzip -dc /tmp/mod_perl-x.xx.tar.gz | tar xvf -

% ls -l
drwxr-xr-x  8 stas  stas 2048 Apr 29 17:38 apache_x.x.x/
drwxr-xr-x  8 stas  stas 2048 Apr 29 17:38 mod_perl-x.xx/

Time to decide on the desired directory structure layout (where the Apache files go):

ROOT = /usr/local

The two servers can share the following directories (so we will not duplicate data):

/usr/local/bin/
/usr/local/lib
/usr/local/include/
/usr/local/man/
/usr/local/share/

Important: we assume that both servers are built from the same Apache source version.

The two servers will store their specific files in either the httpd_docs/ or the httpd_perl/ sub-directories:

/usr/local/etc/httpd_docs/
               httpd_perl/

/usr/local/sbin/httpd_docs/
                httpd_perl/

/usr/local/var/httpd_docs/logs/
                          proxy/
                          run/
               httpd_perl/logs/
                          proxy/
                          run/

After completion of the compilation and the installation of both servers, you will need to configure them.

To make things clear before we proceed to the details: configure the plain Apache server (/usr/local/etc/httpd_docs/httpd.conf) to listen on Port 80, and configure the mod_perl Apache server (/usr/local/etc/httpd_perl/httpd.conf) with a Port different from the one the plain Apache server listens to (e.g. 8080). The port numbers issue will be discussed later.
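In other words, the two configuration files might carry directives like these (the port numbers are only the values used in this example):

```
# /usr/local/etc/httpd_docs/httpd.conf
Port 80

# /usr/local/etc/httpd_perl/httpd.conf
Port 8080
```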

The next step is to configure and compile the sources. Below are the procedures to compile both servers, using the directory layout I have just suggested.

Configuration and Compilation of the Sources.

I will use x.x.x instead of real version numbers so this document will never become obsolete :).

Building the httpd_docs Server

Sources Configuration:
% cd /usr/src/httpd_docs/apache_x.x.x
% make clean
% env CC=gcc \
./configure --prefix=/usr/local \
  --sbindir=/usr/local/sbin/httpd_docs \
  --sysconfdir=/usr/local/etc/httpd_docs \
  --localstatedir=/usr/local/var/httpd_docs \
  --runtimedir=/usr/local/var/httpd_docs/run \
  --logfiledir=/usr/local/var/httpd_docs/logs \
  --proxycachedir=/usr/local/var/httpd_docs/proxy

Notice that you don't actually have to list all these options; it's enough to replace them all with --target=httpd_docs.

% ./configure --prefix=/usr/local \
  --target=httpd_docs

This will use the default directory layout, but will replace apache with httpd_docs everywhere. It'll even rename apachectl to be httpd_docsctl. But we will continue with the manual directory tuning in the scenario below.

If you need some other modules, such as mod_rewrite and mod_include (SSI), add them to the end of this list:

....
....
--proxycachedir=/usr/local/var/httpd_docs/proxy \
--enable-module=include --enable-module=rewrite

OS specific note: on AIX the httpd executable is at least 100K smaller if compiled with gcc than if compiled with cc. Remove the env CC=gcc part if you want to use the default compiler. If you do want to use it, (ba)?sh users will not need the env command, while t?csh users will have to keep it.

It's very important to use the same compiler that Perl was built with. See the section 'What Compiler Should Be Used to Build mod_perl' for more information.

Note: Add --layout to see the resulting directories' layout without actually running the configuration process.

Source Compilation:
% make
% make install

Rename httpd to httpd_docs:

% mv /usr/local/sbin/httpd_docs/httpd \
/usr/local/sbin/httpd_docs/httpd_docs

Now modify the apachectl utility to point to the renamed httpd via your favorite text editor or by using perl:

% perl -p -i -e 's|httpd_docs/httpd|httpd_docs/httpd_docs|' \
/usr/local/sbin/httpd_docs/apachectl

Building the httpd_perl Server

Before you start to configure the mod_perl sources, you should be aware that there are a few Perl modules that have to be installed before building mod_perl. You will be alerted if any required modules are missing when you run the perl Makefile.PL command below. If you discover that something is missing, grab it from your nearest CPAN repository (if you do not know what that is, pay a visit to http://www.perl.com/CPAN) or run the CPAN interactive shell via the command line perl -MCPAN -e shell.

Make sure the sources are clean:

% cd /usr/src/httpd_perl/apache_x.x.x
% make clean
% cd /usr/src/httpd_perl/mod_perl-x.xx
% make clean

It is important to make clean, since some of the versions are not binary compatible (e.g. Apache 1.3.3 vs 1.3.4), so any "third-party" C modules need to be re-compiled against the latest header files.

% cd /usr/src/httpd_perl/mod_perl-x.xx

% /usr/local/bin/perl Makefile.PL \
APACHE_PREFIX=/usr/local \
APACHE_SRC=../apache_x.x.x/src \
DO_HTTPD=1 \
USE_APACI=1 \
PERL_STACKED_HANDLERS=1 \
ALL_HOOKS=1 \
APACI_ARGS=--sbindir=/usr/local/sbin/httpd_perl, \
       --sysconfdir=/usr/local/etc/httpd_perl, \
       --localstatedir=/usr/local/var/httpd_perl, \
       --runtimedir=/usr/local/var/httpd_perl/run, \
       --logfiledir=/usr/local/var/httpd_perl/logs, \
       --proxycachedir=/usr/local/var/httpd_perl/proxy

Notice that all the APACI_ARGS (above) must be passed as one long line if you work with t?csh! With (ba)?sh it works correctly the way it is shown above, breaking the long lines with '\'. When t?csh passes the APACI_ARGS arguments to ./configure it does not alter the newlines, but it strips the backslashes, thus breaking the configuration process.

Notice that just like in httpd_docs configuration you can use --target=httpd_perl instead of specifying each directory separately. Note that this option has to be the very last argument in APACI_ARGS, otherwise 'make test' tries to run "httpd_perl," which fails.

This will use the default directory layout, but will replace apache with httpd_perl everywhere. It'll even rename apachectl to httpd_perlctl. But we will continue with the manual directory tuning in the scenario below.

As with httpd_docs you might need other modules such as mod_rewrite, so add them at the end of this list:

....
....
--proxycachedir=/usr/local/var/httpd_perl/proxy, \
--enable-module=rewrite

Note: PERL_STACKED_HANDLERS=1 is needed for Apache::DBI

Now, build, test and install the httpd_perl.

% make && make test && make install

Note: Apache puts a stripped version of httpd at /usr/local/sbin/httpd_perl/httpd. The original version which includes debugging symbols (if you need to run a debugger on this executable) is located at /usr/src/httpd_perl/apache_x.x.x/src/httpd.

Note: You may have noticed that we did not run make install in the Apache source directory. When USE_APACI is enabled, APACHE_PREFIX will specify the --prefix option for Apache's configure utility, which gives the installation path for Apache. When this option is used, mod_perl's make install will also make install for Apache, installing the httpd binary, the support tools, and the configuration, log and document trees.

If make test fails, look into t/logs and see what is in there. Also see make test fails.

While running perl Makefile.PL ... mod_perl might complain, warning you about a missing libgdbm library. This is a crucial warning. See Missing or Misconfigured libgdbm.so for more info.

Now rename httpd to httpd_perl:

% mv /usr/local/sbin/httpd_perl/httpd \
/usr/local/sbin/httpd_perl/httpd_perl

Update the apachectl utility to drive the renamed httpd:

% perl -p -i -e 's|httpd_perl/httpd|httpd_perl/httpd_perl|' \
/usr/local/sbin/httpd_perl/apachectl

Configuration of the servers

Now that we have completed the build process, the last stage before running the servers is to configure them.

Basic httpd_docs Server Configuration

Configuring the httpd_docs server is a very easy task. Starting from version 1.3.4 of Apache, there is only one file to edit. Open /usr/local/etc/httpd_docs/httpd.conf in your favorite text editor and configure it as you usually would, except make sure that you configure the log file directory (/usr/local/var/httpd_docs/logs and so on) and the other paths according to the layout you have decided to use.

Start the server with:

/usr/local/sbin/httpd_docs/apachectl start

Basic httpd_perl Server Configuration

Edit the /usr/local/etc/httpd_perl/httpd.conf. As with the httpd_docs server configuration, make sure that ErrorLog and other file location directives are set to point to the right places, according to the chosen directory layout.

The first thing to do is to set a Port directive - it should be different from the one used by the plain Apache server (Port 80), since we cannot bind two servers to the same port number on the same machine. Here we will use 8080. Some developers use port 81, but you can bind to ports below 1024 only if the server has root permissions. If you are running on a multiuser machine, there is a chance that someone already uses that port, or will start using it in the future, which could cause problems. If you are the only user on your machine, you can basically pick any unused port number. Many organizations use firewalls which may block some of the ports, so port number choice can be a controversial topic. From my experience the most popular port numbers are: 80, 81, 8000 and 8080. Personally, I prefer port 8080. Of course with the two server scenario you can hide the nonstandard port number from firewalls and users, by using either mod_proxy's ProxyPass directive or a proxy server like Squid.

For more details see Publishing Port Numbers other than 80, Running One Webserver and Squid in httpd Accelerator Mode, Running Two Webservers and Squid in httpd Accelerator Mode and Using mod_proxy.

Now we proceed to the mod_perl specific directives. It will be a good idea to add them all at the end of httpd.conf, since you are going to fiddle about with them a lot in the early stages.

First, you need to specify the location where all mod_perl scripts will be located.

Add the following configuration directive:

  # mod_perl scripts will be called from
Alias /perl/ /usr/local/myproject/perl/

From now on, all requests for URIs starting with /perl will be executed under mod_perl and will be mapped to the files in /usr/local/myproject/perl/.

Now we configure the /perl location.

PerlModule Apache::Registry

<Location /perl>
  #AllowOverride None
  SetHandler perl-script
  PerlHandler Apache::Registry
  Options ExecCGI
  allow from all
  PerlSendHeader On
</Location>

This configuration causes any script that is called with a path prefixed with /perl to be executed under the Apache::Registry module and as a CGI (hence the ExecCGI--if you omit this option the script will be printed to the user's browser as plain text, or will possibly trigger a 'Save-As' window). The Apache::Registry module lets you run your (carefully written) Perl CGI scripts almost completely unchanged under mod_perl. The PerlModule directive is the equivalent of Perl's require(): we load the Apache::Registry module before it is put to use by the PerlHandler Apache::Registry directive.

PerlSendHeader On tells the server to send an HTTP header to the browser on every script invocation. You will want to turn this off for nph (non-parsed-headers) scripts.

This is only a very basic configuration. The Server Configuration section covers the rest of the details.

Now start the server with:

/usr/local/sbin/httpd_perl/apachectl start

Running Two webservers and Squid in httpd Accelerator Mode

While I have detailed the mod_perl server installation, you are on your own with installing the Squid server (see Getting Helped for more details). I run Linux, so I downloaded the RPM package, installed it, configured /etc/squid/squid.conf, fired off the server, and all was set.

Basically once you have Squid installed, you just need to modify the default squid.conf as I will explain below, then you are ready to run it.

First, let's take a look at what we have already running and what we want from squid.

We have the httpd_docs and httpd_perl servers listening on ports 80 and 8080. We want squid to listen on port 80, to forward requests for static objects (plain HTML pages, images and so on) to the port which the httpd_docs server listens to, and dynamic requests to httpd_perl's port. This is known as httpd accelerator mode in proxy dialect.

Our httpd_docs server is listening to port 80, so we will have to reconfigure it to listen to port 81, since port 80 will be taken by Squid. Both copies of Apache will reside on the same machine as Squid. A proxy server makes all the magic behind it transparent to the user. Both Apache servers return the data to Squid (unless it is already cached by Squid). The client never sees the other ports and never knows that there might be more than one server running. Do not confuse this scenario with mod_rewrite, where a server redirects the request somewhere according to the rewrite rules and then forgets all about it.

Squid can be used as a straightforward proxy server. ISPs and other companies generally use it to cut down the incoming traffic by caching the most popular requests. However we want to run it in httpd accelerator mode. Two directives (httpd_accel_host and httpd_accel_port) enable this mode. We will see more details shortly.

If you are currently using Squid in the regular proxy mode, you can extend its functionality by running both modes concurrently. To accomplish this, you can extend the existing Squid configuration with httpd accelerator mode's related directives or you can just create one from scratch.

Now that you have Squid listening to port 80, you have to move the httpd_docs server to listen, for example, to port 81 (your mileage may vary :). So you have to modify httpd_docs/conf/httpd.conf and restart the httpd_docs server. But if you are working on a production server, do not do this until Squid is up and running!

Let's go through the changes we should make to the default configuration file. Since this file (/etc/squid/squid.conf) is huge (about 60k+) and we will not alter 95% of its default settings, my suggestion is to write a new one including only the modified directives.

We want to enable the redirect feature, to be able to serve requests by more than one server (in our case we have two: the httpd_docs and httpd_perl servers). So we specify httpd_accel_host as virtual. This assumes that your server has multiple interfaces - Squid will bind to all of them.

httpd_accel_host virtual

Then we define the default port the requests will be sent to, unless redirected. We assume that most requests will be for static documents (it is also easier to define redirect rules for the mod_perl server, because its URIs start with /perl or similar). We have our httpd_docs listening on port 81, hence this particular choice.

httpd_accel_port 81

And, as described before, Squid listens to port 80.

http_port 80

We do not use icp (icp is used for cache sharing between neighboring machines, which is more relevant in the proxy mode).

icp_port 0

hierarchy_stoplist defines a list of words which, if found in a URL, causes the object to be handled directly by the cache. In other words, use this cache and do not query neighboring caches for certain objects. Note that I have configured the /cgi-bin and /perl aliases for my dynamic documents, if you named them in a different way, make sure you use the correct aliases here.

hierarchy_stoplist /cgi-bin /perl

Now we tell squid not to cache dynamic pages.

acl QUERY urlpath_regex /cgi-bin /perl
no_cache deny QUERY

Please note that the last two directives are controversial ones. If you want your scripts to be more compliant with the HTTP standards, according to the HTTP specs the headers of your scripts should carry the caching directives: Last-Modified and Expires. What are they for? If you set the headers correctly, there is no need to tell the Squid accelerator NOT to try to cache anything. Squid will not bother your mod_perl servers a second time if a request is (a) cachable and (b) still in the cache. Many mod_perl applications will produce identical results on identical requests if not much time has elapsed between the requests. So your Squid might have a hit ratio of 50%, which means that the mod_perl servers will have only half as much work to do as they did before you installed Squid (or mod_proxy). But this is only possible if you set the headers correctly.
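For illustration, a cachable response from a mod_perl script might carry headers like these (the dates are made up; in practice you would compute Expires from the time the content was generated):

```
Content-Type: text/html
Last-Modified: Thu, 29 Apr 1999 17:38:00 GMT
Expires: Thu, 29 Apr 1999 17:43:00 GMT
```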

For more information, refer to the chapter Correct Headers - A quick guide for mod_perl users.

Even if you insert a user-ID and date in your page, caching can save resources when you set the expiration time to 1 second. A user might double click where a single click would do, thus sending two requests in parallel. Squid could serve the second request.

But if you are lazy, or just have too many things to deal with, you can leave the above directives the way I described. Just keep in mind that one day you will want to reread this snippet and the headers generation tutorial to squeeze even more power from your servers without investing money in more memory and better hardware.

While testing you might want to enable the debugging options and watch the log files in /var/log/squid/. But turn debugging off on your production server. Below I show the directive commented out. The parameter 28 selects the access control debugging section.

# debug_options ALL,1 28,9

We need to provide a way for squid to dispatch requests to the correct servers. Static object requests should be redirected to httpd_docs unless they are already cached, while requests for dynamic documents should go to the httpd_perl server. The configuration below tells Squid to fire off 10 redirect daemons at the specified path of the redirect daemon and (as suggested by Squid's documentation) disables rewriting of any Host: headers in redirected requests. The redirection daemon script is listed below.

redirect_program /usr/lib/squid/redirect.pl
redirect_children 10
redirect_rewrites_host_header off

The next directive sets the maximum allowed request size in kilobytes. This one is pretty obvious. If you are using POST to upload files, set this to the largest file's size plus a few extra kbytes.

request_size 1000 KB

Then we have access permissions, which I will not explain. You might want to read the documentation, so as to avoid any security problems.

acl all src 0.0.0.0/0.0.0.0
acl manager proto cache_object
acl localhost src 127.0.0.1/255.255.255.255
acl myserver src 127.0.0.1/255.255.255.255
acl SSL_ports port 443 563
acl Safe_ports port 80 81 8080 443 563
acl CONNECT method CONNECT

http_access allow manager localhost
http_access allow manager myserver
http_access deny manager
http_access deny !Safe_ports
http_access deny CONNECT !SSL_ports
# http_access allow all

Since Squid should be run as a non-root user, you need these two directives if you are invoking Squid as root.

cache_effective_user squid
cache_effective_group squid

Now configure the memory size to be used for caching. The Squid documentation warns that the actual size of the Squid process can grow to be three times larger than the value you set.

cache_mem 20 MB

Keep pools of allocated (but unused) memory available for future use. Read more about it in the Squid documents.

memory_pools on

Now tighten the runtime permissions of the cache manager CGI script (cachemgr.cgi, which comes bundled with squid) on your production server.

cachemgr_passwd disable shutdown
#cachemgr_passwd none all

Now the redirection daemon script (you should put it at the location you have specified in the redirect_program parameter in the config file above, and make it executable by the webserver of course):

#!/usr/local/bin/perl

$|=1;

while (<>) {
    # redirect to mod_perl server (httpd_perl)
  print($_), next 
    if s|www.example.com(:81)?/perl/|www.example.com:8080/perl/|o;

    # send it unchanged to the plain apache server (httpd_docs)
  print;
}

Here is what the regular expression above does: it matches all the URIs that include either the string www.example.com/perl/ or www.example.com:81/perl/, and replaces the match with www.example.com:8080/perl/. When the match-and-replace succeeds, the resulting URI is printed; otherwise the original URI is printed unchanged.

The above redirector can be more complex of course, but you know Perl, right?

A few notes regarding the redirector script:

You must disable buffering; $|=1; does the job. If you do not disable buffering, STDOUT will be flushed only when its buffer becomes full--and its default size is about 4096 characters. So if you have an average URL of 70 chars, the buffer will be flushed only after about 59 (4096/70) requests, and only then will the requests finally reach the server. Your users will not wait that long (unless you have hundreds of requests per second, in which case the buffer will be flushed very frequently because it fills very fast).

If you think that this is a very inefficient way to redirect, let me try to prove the opposite. The redirector runs as a daemon: Squid fires up N redirect daemons, so there is no Perl interpreter startup cost per request. Exactly as with mod_perl, perl is loaded all the time and the code has already been compiled, so the redirect is very fast (not much slower than if the redirector were written in C). Squid keeps an open pipe to each redirect daemon, so the per-request communication overhead is very small.

Now it is time to restart the server. On Linux I do it with:

/etc/rc.d/init.d/squid restart

Now the setup is complete ...

Almost... When you try the new setup, you will be surprised and upset to discover port 81 showing up in the URLs of the static objects (HTML pages and the like). Hey, we did not want the user to see port 81 and use it instead of 80, since then requests would bypass the squid server and all the hard work we went through would have been a waste of time!

The solution is to make both squid and httpd_docs listen to the same port. This can be accomplished by binding each one to a specific interface (so they are listening to different sockets). Modify httpd.conf in the httpd_docs configuration directory:

Port 80
BindAddress 127.0.0.1
Listen 127.0.0.1:80

Modify squid.conf:

http_port 80
tcp_incoming_address 123.123.123.3
tcp_outgoing_address 127.0.0.1
httpd_accel_host 127.0.0.1
httpd_accel_port 80

Here 123.123.123.3 should be replaced with the IP address of your main server. Now restart Squid and httpd_docs (it doesn't matter which one you start first), and voila--the port number is gone.

You must also have an entry in /etc/hosts (chances are that it's already there):

127.0.0.1 localhost.localdomain localhost

Now if your scripts are generating HTML including fully qualified self references, using the 8080 or other port, you should fix them to generate links to point to port 80 (which means not using the port at all in the URI). If you do not do this, users will bypass Squid and will make direct requests to the mod_perl server's port.

The only question left is what to do with users who bookmarked your services while port 8080 was still in the URL. Do not worry about it. The most important thing is for your scripts to return full URLs, so if a user comes in from a link with port 8080 in it, let it be. Just make sure that all subsequent calls to your server are rewritten correctly. After a while users will update their bookmarks. You can send them email if you know their addresses, or you could leave a note on your pages asking users to update their bookmarks. You will avoid this problem altogether if you do not publish non-80 ports in the first place. See Publishing Port Numbers other than 80.

<META> Need to write up a section about server logging with squid. One thing I sure would like to know is how requests are logged with this setup. I have, as most everyone I imagine, log rotation, analysis, archiving scripts and they all assume a single log. <METAMETA> So what now with Apache TransferLog/ErrorLog/Refererlog? </METAMETA> Does one have different logs that have to be merged (up to 3 for each server + squid) ? Even when squid responds to a request out of its cache I'd still want the thing to be logged. </META>

See Using mod_proxy for information about X-Forwarded-For.

To save you some keystrokes, here is the whole modified squid.conf:

http_port 80
tcp_incoming_address 123.123.123.3
tcp_outgoing_address 127.0.0.1
httpd_accel_host 127.0.0.1
httpd_accel_port 80

icp_port 0

hierarchy_stoplist /cgi-bin /perl
acl QUERY urlpath_regex /cgi-bin /perl
no_cache deny QUERY

# debug_options ALL,1 28,9

redirect_program /usr/lib/squid/redirect.pl
redirect_children 10
redirect_rewrites_host_header off

request_size 1000 KB

acl all src 0.0.0.0/0.0.0.0
acl manager proto cache_object
acl localhost src 127.0.0.1/255.255.255.255
acl myserver src 127.0.0.1/255.255.255.255
acl SSL_ports port 443 563
acl Safe_ports port 80 81 8080 443 563
acl CONNECT method CONNECT

http_access allow manager localhost
http_access allow manager myserver
http_access deny manager
http_access deny !Safe_ports
http_access deny CONNECT !SSL_ports
# http_access allow all

cache_effective_user squid
cache_effective_group squid

cache_mem 20 MB

memory_pools on

cachemgr_passwd disable shutdown

Note that all directives should start at the beginning of the line, so if you cut and paste from the text make sure you remove the leading whitespace from each line.

Running One Webserver and Squid in httpd Accelerator Mode

When I was first told about Squid, I thought: "Hey, now I can drop the httpd_docs server and have just Squid and the httpd_perl server". Since all my static objects will be cached by Squid, I do not need the light httpd_docs server.

But I was wrong. Why? Because I would still have the overhead of loading the objects into Squid the first time. If a site has many static objects, then unless a huge chunk of memory is devoted to Squid they won't all be cached, and the heavy mod_perl server will still end up serving static objects.

How would one measure the overhead? The difference between the two servers is in memory consumption; everything else (e.g. I/O) should be equal. So you have to estimate the time needed for first-time fetching of each static object at a peak period, and from that the number of additional servers you need for serving the static objects. This will allow you to calculate the additional memory requirements. I imagine that this amount could be significant in some installations.

So I have decided to accept a little more administration overhead and to stick with the squid, httpd_docs and httpd_perl scenario, where I can optimize and fine-tune everything. Of course this may not be your situation. If you feel that the scenario from the previous section is too complicated for you, make it simpler: have only one server with mod_perl built in, and let Squid do most of the job that the plain light Apache used to do. As I explained in the previous paragraph, you should pick this lighter setup only if you can make Squid cache most of your static objects. If it cannot, your mod_perl server will end up doing work we do not want it to do.

If you are still with me, install Apache with mod_perl and Squid. Then use a configuration similar to that in the previous section, but this time without httpd_docs. Also, we do not need the redirector any more, and we specify httpd_accel_host as the name of the server rather than virtual. Because we do not redirect, there is no need to bind two servers to the same port, so there are no BindAddress or Listen directives in httpd.conf.

The modified configuration (see the explanations in the previous section):

httpd_accel_host put.your.hostname.here
httpd_accel_port 8080
http_port 80
icp_port 0

hierarchy_stoplist /cgi-bin /perl
acl QUERY urlpath_regex /cgi-bin /perl
no_cache deny QUERY

# debug_options ALL,1 28,9

# redirect_program /usr/lib/squid/redirect.pl
# redirect_children 10
# redirect_rewrites_host_header off

request_size 1000 KB

acl all src 0.0.0.0/0.0.0.0
acl manager proto cache_object
acl localhost src 127.0.0.1/255.255.255.255
acl myserver src 127.0.0.1/255.255.255.255
acl SSL_ports port 443 563
acl Safe_ports port 80 81 8080 443 563
acl CONNECT method CONNECT

http_access allow manager localhost
http_access allow manager myserver
http_access deny manager
http_access deny !Safe_ports
http_access deny CONNECT !SSL_ports
# http_access allow all

cache_effective_user squid
cache_effective_group squid

cache_mem 20 MB

memory_pools on

cachemgr_passwd disable shutdown

One Light and One Heavy Server where All HTML is Perl-generated

META: a lot of info duplication in tricks section! remove/modify/merge it.

Instead of keeping all your Perl scripts in /perl and your static content everywhere else, you could keep your static content in special directories and keep your Perl scripts everywhere else. You can still use the light/heavy apache separation approach described above, with a few minor modifications.

Installation and Configuration

First you need to compile your light Apache with mod_proxy and mod_rewrite:

% ./configure --prefix=[snip...] --enable-module=rewrite \
                                 --enable-module=proxy

In the light Apache's httpd.conf file, turn rewriting on:

RewriteEngine on

and list the static directories something like this:

RewriteRule ^/img - [L]
RewriteRule ^/style - [L]

The [L] means that the rewrite engine should stop if it has a match. This is necessary because the very last rewrite rule proxies everything to the heavy server:

RewriteRule ^/(.*) http://www.example.com:8080/$1 [P]

This line (which must be the last RewriteRule) is the difference between a server for which static content is the default and one for which dynamic (perlish) content is the default.

The above RewriteRule assumes that the heavy server runs on the same machine as the light server. You can just insert a different URL if the heavy Apache is elsewhere, but keeping the two servers on the one machine and treating them as one has some advantages, as you will see later.

You should also add the ProxyPassReverse directive:

ProxyPassReverse / http://www.example.com/

so that the user doesn't see the port number :8080 in her browser's location window.

Of course www.example.com should be replaced with your own domain name.

It is possible to use localhost in the RewriteRule above if the heavy and light servers are on the same machine, but your heavy server might accidentally say localhost in a client redirect (see below) which would not be good. Also, if your heavy server understands virtual hosts, you probably don't want to use the name localhost.
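Putting the pieces above together, the relevant fragment of the light Apache's httpd.conf looks something like this (the directory names and the domain are examples; the ordering is what matters -- static rules first, the catch-all proxy rule last):

```apache
RewriteEngine on

# static prefixes are served locally; [L] stops rule processing
RewriteRule ^/img - [L]
RewriteRule ^/style - [L]

# everything else is proxied to the heavy server (must come last)
RewriteRule ^/(.*) http://www.example.com:8080/$1 [P]

# hide the :8080 in redirects issued by the heavy server
ProxyPassReverse / http://www.example.com/
```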

Tricks, Traps and Gotchas

  • 'Closing your shutters' temporarily

    Very occasionally, your mod_perl server will suffer glitches. Perhaps you changed a module and restarted your mod_perl httpd when a perl -cw would have given you some very interesting information! Since all your html is dynamically generated, suddenly nobody can view any pages on your site. Disaster!! Worse, your users are getting cryptic Unable to contact upstream server error messages on a grey background, not the nice customised error messages you generate with Perl.

    If you insert a line into the light Apache's httpd.conf file:

    RewriteRule ^/(.*) /sorry.html [L]

    after the list of static directories but before the rule that proxies everything else to the heavy apache, your users now get a (relatively) nice `Sorry for the inconvenience' message instead of the cryptic message described above. What's more, because this sorry.html RewriteRule is listed after the image directory, you can refer to your images in it. Now all you have to do is figure out how to fix the module you broke.

    Of course you need to prepare the file sorry.html in advance of all this. When you alter the configuration you will have to restart the light server for the changes to take effect, and when you have fixed all the errors in the mod_perl server you must remove the change and restart the light server again too.

    This situation is easy to prevent. See Safe Code Updates on a Live Production Server for more info.

  • Logging

    There are a number of different ways to maintain logs of your hits. The easiest way is to let both Apaches log to their own access_log file. Unfortunately, this means that many requests will be logged twice, which makes it tricky to merge the two logfiles, should you want to. Also, as far as the heavy Apache is concerned, all requests will appear to come from the IP address of the machine on which the light apache is running. If you are logging IP addresses as part of your access_log the logs written by the heavy Apache will be fairly meaningless.

    One solution is to tell the heavy Apache not to bother logging requests that seem to come from the light Apache's machine. You might do this by installing a custom PerlLogHandler or just piping to access_log via grep -v (match all but this pattern) for the light Apache machine's IP address. In this scenario, the access_log written by the light Apache is the more important access_log, but you need to look for any direct accesses to the heavy server in case the proxy server is sometimes bypassed.

    Note that you don't want to pipe the access_log from the heavy Apache to /dev/null. If you do this, you won't be able to see any requests that bypass the lightweight Apache and come straight in on the port to which the heavy server is listening. Every time you see one of these requests you should ask yourself Why? and take steps to eliminate it.

    It's easy to get the logger to log the original client's IP address instead of the proxy server's. Look for mod_proxy_add_forward at Building and Using mod_proxy for hints.

  • Eliminating :8080's

    By 8080 we mean the port your mod_perl enabled Apache is listening to. Substitute whatever port you have chosen.

    There are a number of ways in which the user can somehow be directed to URLs which have :8080 in them. If you are running the heavy Apache on a different machine from that of the light Apache, then provided that the heavy Apache has the same ServerName as the light Apache this will be less of a problem, but this section may still apply to you.

    If the user requests a URL that maps to a directory but omits the trailing slash (/), Apache issues a client redirect (status 301) to the correct URL. Unfortunately, the Apache that issues this redirect will most likely be the heavy Apache, since most distinct requests are answered by it. It will issue the redirect to its own port on its own ServerName, and because the redirect is a so-called client redirect, the URL (with the :8080 on the end) appears in the body of the response as well as in the header. This means that the ProxyPassReverse in the light Apache's configuration file, which is supposed to catch such things, will be unable to catch this. :-(

    Since this will tend only to be a problem when the heavy and light Apaches are running on different ports on the same machine, if the light and heavy apaches have the same DocumentRoot we can have the light apache figure out that a request is for a directory without a trailing slash. Then it can do the redirect itself, before the heavy Apache finds out about it:

    RewriteCond /www/shop%{SCRIPT_FILENAME} -d
    RewriteRule ^(.+[^/])$ $1/ [R]   

    Note that these two lines should be after the RewriteRules for the static directories but before the final all-encompassing RewriteRule that proxies everything else to the heavy Apache.

    Beware: if you put these two lines in the light httpd.conf before the static directories are mentioned, then in this setup the light httpd may find itself in an infinite loop if somebody requests, for example, /img.

    Another way in which :8080's can creep into URLs is if you have Perl code which issues a redirect to http://$ENV{HTTP_HOST}/.... If you are migrating from one heavy server to one heavy and one light, you may find a few of these. If you replace HTTP_HOST with SERVER_NAME, all should be well. Note that you may need to do this whether or not the light and heavy servers are on the same machine.
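For example (a sketch; the page name is hypothetical, and in real use Apache sets SERVER_NAME for you):

```perl
#!/usr/bin/perl -w
use strict;

# stand-in so the sketch runs outside Apache; normally set by the server
$ENV{SERVER_NAME} ||= 'www.example.com';

# BAD: HTTP_HOST echoes whatever Host: header the client sent, which
# can carry :8080 if the request reached the heavy server directly
# print "Location: http://$ENV{HTTP_HOST}/done.html\n\n";

# BETTER: SERVER_NAME is the canonical name from the ServerName directive
print "Location: http://$ENV{SERVER_NAME}/done.html\n\n";
```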

    The :8080 effect can be insidious. Once a user gets a URL with :8080 in it, odd things will happen. If the heavy and light Apaches have the same DocumentRoot (normal if they are on the same machine) and/or the heavy Apache is able to deliver the same static content as the light Apache, the user's browser will display :8080 in the location box for every subsequent URL on your site until they follow an absolute link (e.g. http://www.example.com/file/stuff as opposed to just /file/stuff). At least if the heavy Apache can serve images, your site will still look normal. If the request is in a password-protected area, then the user may have to log in twice.

    If the heavy and light Apaches do not share the same DocumentRoot (normal if they are on different servers) and/or the heavy Apache cannot serve images, then all your pages will be imageless. This is a fairly compelling reason to run your light and heavy servers on the same machine and to have them share a DocumentRoot.

    Regardless of how hard you try to eliminate :8080s, they will crop up from time to time. You should occasionally examine the access_log of the heavy Apache. Assuming you aren't bothering to log requests that come via the light Apache, any requests that appear should be investigated.

    Interestingly, if the final catch-all RewriteRule proxies to localhost:8080, it is possible that localhost will leak into stray client redirects. Moral: use your server's name in redirects, unless you have a very good reason not to.

  • Security

    Because all http requests will appear to your Perl scripts to be coming from the light httpd, you must be careful not to authenticate based on the IP address from which a request came. This can be easy to overlook if you are moving from a single-server to a dual-server configuration.

    The URLs that return the /server-status and /perl-status of your Apache servers are often protected based on IP address. The /server-status URL for the heavy server is probably safe if the light Apache also defines an identical /server-status URL, but the /perl-status URL should be protected.

    If you must authenticate based on IP address, you should either make sure that the light Apache's IP address is not in any way privileged or you should block access to port 8080 from anywhere except the light Apache's IP address.

    If your heavy and light httpds can both serve static content (where :8080s only affect URLs - not content), then blocking port 8080 is not recommended. After all, if a user gets onto port 8080 in this scenario, the worst that will happen is that URLs will look odd.

    Note that if you are using the X-Forwarded-For HTTP header, then this subsection is of limited relevance to you.

mod_proxy

mod_proxy implements a proxy/cache for Apache. It implements proxying capability for FTP, CONNECT (for SSL), HTTP/0.9, and HTTP/1.0. The module can be configured to connect to other proxy modules for these and other protocols.

Concepts and Configuration Directives

In the following explanation, we will use www.example.com as the main server that users access when they want some kind of service, and backend.example.com as the machine that does the heavy work. The main server and the back-end server are distinct; they may or may not coexist on the same machine.

The mod_proxy module is built into the server that answers requests to the www.example.com hostname. It doesn't matter what functionality is built into the backend.example.com server.

ProxyPass

You can use the ProxyPass configuration directive to map remote hosts into the space of the local server; the local server does not act as a proxy in the conventional sense, but appears to be a mirror of the remote server.

Let's explore what this rule does:

ProxyPass   /modperl/ http://backend.example.com/modperl/

When a user initiates a request to http://www.example.com/modperl/foo.pl, the request is proxied to http://backend.example.com/modperl/foo.pl. From that moment, whenever the back-end server issues a redirect, the user will see http://backend.example.com/ in her location window instead of http://www.example.com/.

You have probably noticed many examples of this in real-life Internet deployments. Free email providers and other heavy online services display the login or main page from their main server, but once you log in you see something like x11.example.com, then w59.example.com, etc. These are the back-end servers that do the actual work.

Obviously this is not a very nice solution, but usually users don't really care what they see in the location window, so you can get away with this approach. As I'll show in a minute, there is a better solution which removes this caveat and provides even more useful functionality.

ProxyPassReverse

This directive lets Apache adjust the URL in the Location header of HTTP redirect responses. This is essential, for instance, when Apache is used as a reverse proxy, to prevent HTTP redirects issued by the back-end servers from bypassing the reverse proxy in front of them. It is generally used in conjunction with the ProxyPass directive to build a complete front-end proxy server.

ProxyPass          /modperl/  http://backend.example.com/modperl/
ProxyPassReverse   /modperl/  http://backend.example.com/modperl/

When a user initiates a request to http://www.example.com/modperl/foo.pl, the request is proxied to http://backend.example.com/modperl/foo.pl, but on the way back ProxyPassReverse corrects the location URL to http://www.example.com/modperl/foo.pl. This happens absolutely transparently: the user never knows that anything happened to the request behind the scenes.

Note that this ProxyPassReverse directive can also be used in conjunction with the proxy pass-through feature:

RewriteRule ... [P]

from mod_rewrite, because it doesn't depend on a corresponding ProxyPass directive.

Security Issues

Whenever you use mod_proxy you need to make sure that your server will not become an open proxy for free riders. To block this you should have this setting:

RewriteRule ^proxy:.*  -  [F]

which makes sure that a request of the form proxy:http://www.example.com doesn't keep your processes busy, and returns the status Forbidden instead.

Start by testing your own server: telnet to the port the server is listening on and issue an external proxy request:

% telnet www.example.com 80
Trying 128.9.176.32...
Connected to www.example.com
Escape character is '^]'.
HEAD http://www.example.org/ HTTP/1.1
Host: www.example.org

HTTP/1.0 403 Forbidden
Date: Mon, 10 Apr 2000 08:42:31 GMT
Server: Apache/1.3.13-dev (Unix)
Connection: close
Content-Type: text/html; charset=iso-8859-1

Connection closed by foreign host.

As you can see, we are not allowed to make a proxy request to a different server, just as we wanted. This particular hole has been secured on our box.

Buffering Feature

In addition to correcting the URI on its way back from the back-end server, mod_proxy provides a buffering service that mod_perl and similar heavy modules benefit from. The buffering feature allows mod_perl to hand the generated data over to mod_proxy and move on to serving new requests, instead of waiting for a possibly slow client to receive all the data.

This figure depicts this feature:

              [socket]                   wire     `o'
[mod_perl] => [      ] => [mod_proxy] => ____ =>  /|\
              [buffer]                            / \

From looking at this figure it's easy to see that the bottleneck is the socket buffer; it has to be able to absorb all the data that mod_perl has generated in order to untie the mod_perl process immediately.

ProxyReceiveBufferSize is the name of the parameter that specifies the size of the socket buffer. Configuring:

ProxyReceiveBufferSize 16384

will create a buffer of 16k in size. If mod_perl generates output whose size is under 16k, the process is immediately untied and allowed to serve new requests. If the output is bigger than 16k, the following takes place:

  1. The first 16k will enter the system buffer.

  2. mod_proxy picks the first 8k and sends down the wire.

  3. mod_perl writes the next 8k into the place of the 8k of data that was just picked by mod_proxy.

Stages 2 and 3 are repeated until mod_perl has run out of data to send. When that happens, mod_perl goes about its own business, and stage 2 is repeated until all the data has been picked up from the system buffer and sent down the wire.

Of course you want to make the buffer size as big as possible, since you want the heavy mod_perl processes to be utilized in the most effective way; you don't want them to waste their time waiting for a client to receive the data, especially if the client has a slow downstream connection.

As the name ProxyReceiveBufferSize states, this buffering applies only to downstream data (coming from the origin server to the proxy), not upstream data. There is no buffering of data uploaded from the client browser to the proxy, so you cannot use this technique to prevent the heavy mod_perl server from being tied up during a large POST such as a file upload. Falling back to mod_cgi seems to be the best solution for those specific scripts whose major function is receiving a big upstream transfer.
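For instance, such upload scripts can stay under plain mod_cgi on the light front-end server; the alias and path here are hypothetical:

```apache
# light server's httpd.conf: run big-upload scripts under mod_cgi so a
# slow POST occupies a cheap front-end process, not a heavy mod_perl one
ScriptAlias /upload/ /usr/local/apache-light/cgi-bin/upload/
```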

<META: check this ---> Of course, just like mod_perl, mod_proxy writes the data it proxy-passes into its outgoing socket buffer, so the mod_proxy process is released as soon as the last chunk of data has been deposited into this buffer, even if the client has not completed the download. The OS takes care of completing the transfer and releasing the TCP socket used for it.

Therefore, if you don't use mod_proxy and mod_perl sends its data directly to the client, and you have a big socket buffer, the mod_perl process will likewise be released as soon as the last chunk of data enters the buffer. Just as with mod_proxy, the OS will take care of completing the data transfer.

<based on this comment> Yes, the same applies (but the receive and transmit buffers may be of different sizes, depending on the OS).

What I don't know is whether the call to close the socket waits until all the data has actually been sent successfully. If it doesn't wait, you may not be notified of any failure, but because the proxying Apache can write to the socket transmission buffer as fast as it can read, it should be possible for it to copy all the data from the receive buffer to the transmission buffer and then release the receive buffer, leaving the mod_perl Apache free to do other things while the proxying Apache waits for the client to acknowledge the data transmission. (That last part is the bit I am not sure about.)

</META>

Unfortunately you cannot set the socket buffer size as large as you want, because there is a limit to the available physical memory, and the OS has its own upper limits on the possible buffer size.

This doesn't mean that you cannot change the OS-imposed limits, but to do so you have to know the appropriate techniques. In the next section we will present a few operating systems and the ways to raise their socket buffer limits.

To solve the physical memory limits you just have to add more memory.

Setting the Buffering Limits on Various OSes

As we have just seen, there are a few parameters we might want to adjust for our needs.

IOBUFSIZE Source Code Definition

The first is the parameter used by proxy_util.c:ap_proxy_send_fb() to loop over content being proxy-passed in 8KB chunks (as of this writing), passing each chunk on to the client. In other words, it specifies the size of the chunks of data sent down the wire.

This parameter is defined by IOBUFSIZE:

#define IOBUFSIZE 8192

You have no control over this setting in the server configuration file, therefore you might want to change it in the source files, before you compile the server.

ProxyReceiveBufferSize Configuration Directive

You can control the socket buffer size with the ProxyReceiveBufferSize directive:

ProxyReceiveBufferSize 16384

The above setting sets the buffer size to 16KB. If it is not set explicitly, or if it is set to 0, the default buffer size is used. The number should be an integral multiple of 512.

Note that if you set the value of ProxyReceiveBufferSize bigger than the OS limit, the default value will be used.

Both the default and the maximum possible value of ProxyReceiveBufferSize depend on the Operating System.

  • Linux

    For 2.2 kernels the maximum limit is in /proc/sys/net/core/rmem_max and the default value is in /proc/sys/net/core/rmem_default. If you want to increase the RCVBUF size above 65535, the default maximum value, you first have to raise the absolute limit in /proc/sys/net/core/rmem_max. To do that at run time, execute this command to raise it to 128KB:

    % echo 131072 > /proc/sys/net/core/rmem_max

    You probably want to put this command into /etc/rc.d/rc.local so the change takes effect at system reboot.

    On Linux with kernel 2.2.5 the maximum and default values are either 32KB or 64KB. You can also change the default and maximum values during kernel compilation; to do so, alter the SK_RMEM_DEFAULT and SK_RMEM_MAX definitions respectively.

  • FreeBSD

    Under FreeBSD it's possible to configure the kernel to have bigger socket buffers:

    % sysctl -w kern.ipc.maxsockbuf=2621440
  • Solaris

    Under Solaris this upper limit is specified by the tcp_max_buf parameter; it is reported to be 256KB.

  • Non Listed OSes

    If you use an OS that is not listed here and know the details required for this section, please submit them to me.

When you tell the kernel to allow bigger socket buffers you can set bigger values for ProxyReceiveBufferSize, e.g. 1MB (1048576).

Hacking the Code

Some folks have patched the Apache source code to make the application buffer size configurable as well. After the patch, two configuration directives became available:

  • ProxyReceiveBufferSize -- sets the socket buffer size

  • ProxyInternalBufferSize -- sets the application buffer size

To patch the source, rename ap_bcreate() to ap_bcreate_size() and add a size parameter, which defaults to IOBUFSIZE if 0 is passed. Then add

#define ap_bcreate(p,flags) ap_bcreate_size(p,flags,0)

and add a new ap_bcreate() which calls ap_bcreate_size(), for binary compatibility.

Actually, ProxyReceiveBufferSize should have been called ProxySocketBufferSize. That would also remove some of the confusion about what it actually does.

Caching

META: complete the conf details

Apache can do caching as well. It's relevant to mod_perl only if you produce the proper headers, so that your scripts' output can be cached. See the Apache documentation for more details on configuring this capability.

Building process

To build mod_proxy into Apache just add --enable-module=proxy during the Apache ./configure stage.

Front-end Back-end Proxying with Virtual Hosts

This section explains a configuration setup for proxying your back-end mod_perl servers when you need to use Virtual Hosts.

The approach is to use a unique port number for each virtual host on the back-end server, so you can redirect from the front-end server to localhost:1234 and so on, and name-based virtual hosts on the front end, though any technique on the front end will do.

If you run the front-end and the back-end servers on the same machine you can prevent any direct outside connections to the back-end server if you bind tightly to address 127.0.0.1 (localhost) as you will see in the following configuration example.

The front-end (light) server configuration:

<VirtualHost 10.10.10.10>
  ServerName www.example.com
  ServerAlias example.com
  RewriteEngine On
  RewriteOptions 'inherit'
  RewriteRule \.(gif|jpg|png|txt)$ - [last]
  RewriteRule ^/(.*)$ http://localhost:4077/$1 [proxy]
</VirtualHost>

<VirtualHost 10.10.10.10>
  ServerName foo.example.com
  RewriteEngine On
  RewriteOptions 'inherit'
  RewriteRule \.(gif|jpg|png|txt)$ - [last]
  RewriteRule ^/(.*)$ http://localhost:4078/$1 [proxy]
</VirtualHost>

The above front-end configuration handles two virtual hosts: www.example.com and foo.example.com. The two setups are almost identical.

The front-end server will handle files with the extensions .gif, .jpg, .png and .txt internally; the rest will be proxified to the back-end server.

The only difference between the two virtual hosts settings is that the former rewrites requests to the port 4077 at the back-end machine and the latter to the port 4078.

If your server is configured to run traditional CGI scripts (mod_cgi) as well as mod_perl CGI programs, then it would be beneficial to configure the front-end server to run the traditional CGI scripts directly. This can be done by altering the gif|jpg|png|txt Rewrite rule to add |cgi at the end, or adding a new rule to handle all /cgi-bin/* locations locally. Similarly, static HTML pages can be served by the front-end server by adding |html to the rule.
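A sketch of the modified rule set for one of the virtual hosts (the extensions and the /cgi-bin/ prefix are examples):

```apache
# serve static pages and traditional CGI scripts on the front end
RewriteRule \.(gif|jpg|png|txt|html)$ - [last]
RewriteRule ^/cgi-bin/ - [last]

# the rest is still proxified to the back-end server
RewriteRule ^/(.*)$ http://localhost:4077/$1 [proxy]
```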

The back-end (heavy) server configuration:

Port 80

PerlPostReadRequestHandler My::ProxyRemoteAddr

Listen 4077
<VirtualHost localhost:4077>
  ServerName www.example.com
  DocumentRoot /home/httpd/docs/www.example.com	
  DirectoryIndex index.shtml index.html
</VirtualHost>

Listen 4078
<VirtualHost localhost:4078>
  ServerName foo.example.com
  DocumentRoot /home/httpd/docs/foo.example.com
  DirectoryIndex index.shtml index.html
</VirtualHost>

The back-end server can tell which virtual host a request is addressed to by checking the port number the request was proxified to, and uses the appropriate virtual host section to handle it.

We set "Port 80" so that any redirects don't get sent directly to the back-end port.

To get the real remote IP addresses through the proxy, the My::ProxyRemoteAddr handler is used, based on the mod_proxy_add_forward Apache module. Prior to mod_perl 1.22 this setting had to be made per virtual host, since it wasn't inherited by the virtual hosts.

The following configuration is yet another useful example, showing the other way around: it specifies what is to be proxified, and then the rest is served by the front end:

RewriteEngine     on
RewriteLogLevel   0
RewriteRule       ^/(perl.*)$  http://127.0.0.1:8052/$1   [P,L]
RewriteRule       ^proxy:.*       -                         [F]
ProxyRequests     on
NoCache           *
ProxyPassReverse  /  http://www.example.com/

This way we don't have to specify a rule for static objects to be served by the front-end, as we did in the previous example, which handled files with the extensions .gif, .jpg, .png and .txt internally.

Getting the Remote Server IP in the Back-end server in the Proxy Setup

Ask Bjoern Hansen has written the mod_proxy_add_forward module for Apache. It sets the X-Forwarded-For field when doing a ProxyPass, similar to what Squid can do. Its location is specified in the download section.

Basically, this module adds an extra HTTP header to proxied requests. You can access that header in the mod_perl-enabled server and use it to set the IP address of the remote client. You won't need to compile anything into the back-end server.

Build

Download the module and pass its location as the value of the --activate-module argument to the ./configure utility within the Apache source tree, so that the module can be found:

./configure \
"--with-layout=Apache" \
"--activate-module=src/modules/extra/mod_proxy_add_forward.c" \
"--enable-module=proxy_add_forward" \
... other options ...

--enable-module=proxy_add_forward enables this module as you have guessed already.

Use

If you are using Apache::{Registry,PerlRun} just put something like the following into startup.pl:

sub My::ProxyRemoteAddr ($) {
  my $r = shift;
 
  # we'll only look at the X-Forwarded-For header if the request
  # comes from our proxy at localhost
  return OK unless ($r->connection->remote_ip eq "127.0.0.1");

  # Select last value in the chain -- original client's ip
  if (my ($ip) = $r->headers_in->{'X-Forwarded-For'} =~ /([^,\s]+)$/) {
    $r->connection->remote_ip($ip);
  }
      
  return OK;
}

And in httpd.conf:

PerlPostReadRequestHandler My::ProxyRemoteAddr

Otherwise you retrieve it directly in your code.
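A minimal sketch of extracting the address yourself (the header value below is a made-up example of what $r->headers_in->{'X-Forwarded-For'} might return; the last address in the comma-separated chain is the original client's):

```perl
# hypothetical value of the X-Forwarded-For header
my $xff = '10.0.0.1, 192.168.1.5, 203.0.113.9';

# select the last value in the chain -- the original client's IP
my ($ip) = $xff =~ /([^,\s]+)$/;
print "$ip\n";    # prints 203.0.113.9
```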

Security

Different sites have different needs. If you use the header to set the IP address, Apache believes it; this is reflected in the logging, for example. You really don't want anyone but your own system to set the header, which is why the "recommended code" above checks where the request really came from before changing remote_ip.

Generally you shouldn't trust the X-Forwarded-For header. You only want to rely on X-Forwarded-For headers from proxies you control yourself. If you know how to spoof a cookie, you've probably got the general idea of forging HTTP headers and can spoof the X-Forwarded-For header as well. The only address you can count on as a reliable value is the one from r->connection->remote_ip.

From that point on, the remote IP address is correct. You should be able to access REMOTE_ADDR as usual.

Caveats

It was reported that Ben Laurie's Apache-SSL does not put the IP addresses in the X-Forwarded-For header--it does not set up such a header at all. However, the REMOTE_ADDR it sets up does contain the IP address of the original client machine.

You could do the same thing with other environment variables, although several of them are probably preserved. You should run some tests or, better still, inspect the code to see which ones.

Prior to mod_perl 1.22 the PerlPostReadRequestHandler My::ProxyRemoteAddr directive had to be repeated for each virtual host, since it wasn't inherited by the virtual hosts.

mod_proxy_add_forward Module's Order Precedence Importance

Some users report that they cannot get this module to work as advertised: they verify that the module is built in, but the front-end server does not generate the X-Forwarded-For header when requests are proxied to the back-end server. As a result, the back-end server has no idea what the remote IP is.

As it turns out, mod_proxy_add_forward needs to be configured in Apache before mod_proxy in order to operate properly, since Apache gives highest precedence to the last defined module.

Moving the two build options around while compiling Apache appears to have no effect on the default configuration order of the modules: in each case, the build shows mod_proxy_add_forward last in the list (or first, as reported by /server-info).

The solution is to explicitly define the configuration order in the httpd.conf file, so that mod_proxy_add_forward appears before mod_proxy and therefore gets executed after mod_proxy. (Modules are executed in reverse order, i.e. the module that was added first is executed last.)

Obviously, this list needs to be tailored to match the build environment, but to ease the task just insert an AddModule directive before each entry reported by httpd -l (removing http_core.c, of course):

ClearModuleList
AddModule mod_env.c
[more modules snipped]
AddModule mod_proxy_add_forward.c
AddModule mod_proxy.c
AddModule mod_rewrite.c
AddModule mod_setenvif.c

Note that the above snippet is added to httpd.conf.

With this change, the X-Forwarded-For header is now being sent to the back-end server, and the remote IP appears in the back-end server's access log.

HTTP Authentication With Two Servers Plus a Proxy

Assuming that you have a setup of one "front-end" server which proxies the "back-end" (mod_perl) server, if you need to perform authentication it should be handled entirely by the "back-end" server. If Apache proxies correctly, it will pass through all the authentication information, making the "front-end" Apache somewhat "dumb", as it does nothing but pass the information through.

In the configuration file your Auth stuff needs to be inside <Directory ...> ... </Directory> sections because if you use a <Location ...> ... </Location> section the proxy server will take the authentication information for itself and not pass it on.
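A sketch of what that might look like on the back-end server (the directory path and password file are illustrative, not taken from the text):

```apache
# Auth configuration must live in a <Directory> section; in a
# <Location> section the proxy would consume the credentials itself
<Directory /home/httpd/docs/secret>
  AuthName "Restricted Area"
  AuthType Basic
  AuthUserFile /home/httpd/conf/htpasswd
  Require valid-user
</Directory>
```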

The same applies to mod_ssl, if plugged into a front-end server. It will properly encode/decode all the SSL requests.

mod_rewrite Examples

In the mod_proxy + mod_perl servers scenario, ProxyPass was used to redirect all requests to the mod_perl server by matching the beginning of the relative URI (e.g. /perl). What should you do if you want everything except files with extensions like .gif or .cgi to be proxied to the mod_perl server? These files are to be served by the light Apache server, which carries the mod_proxy module.

RewriteEngine On
# handle GIF and JPG images, traditional CGI's directly
RewriteRule \.(gif|jpg|png|css|txt|cgi)$ - [last]
RewriteRule ^/cgi-bin - [last]
# pass off everything but images to the heavy-weight server via proxy
RewriteRule ^/(.*)$ http://localhost:4077/$1 [proxy]

The example above rewrites everything to the mod_perl server, while handling locally all requests for files with the extensions gif, jpg, png, css, txt and cgi, and for relative URIs starting with /cgi-bin (e.g. if you want some scripts to be executed under mod_cgi).

That is, first, handle locally what you want to handle locally, then hand off everything else to the back-end guy.

This is the configuration of the logging facilities:

RewriteLogLevel 1
RewriteLog "| /usr/local/apache_proxy/bin/rotatelogs \
/usr/local/apache-common/logs/r_log 86400"

It says: log all the rewrites through the pipe to the rotatelogs utility, which will rotate the logs every 24 hours (86400 seconds).

More examples:

Redirect all those IE5 requests for favicon.ico to a central image:

RewriteRule .*favicon.ico /wherever/favicon.ico [PT,NS]

A quick way to make dynamic pages look static:

RewriteRule ^/wherever/([a-zA-Z]+).html /perl-bin/$1.cgi [PT]

Caching in mod_proxy

This is not really mod_perl related, so I'll just stress one point: if you want the caching to work, the following HTTP headers should be supplied: Last-Modified, Content-Length and Expires.
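In a mod_perl response handler these headers could be set along the following lines (a sketch using the mod_perl 1.x API; the dates and $body variable are illustrative):

```perl
# sketch: supply the headers mod_proxy needs in order to cache the response
my $body = "<html><body>Hello</body></html>";
$r->header_out('Last-Modified'  => 'Sat, 01 Jan 2000 12:00:00 GMT');
$r->header_out('Content-Length' => length $body);
$r->header_out('Expires'        => 'Sun, 02 Jan 2000 12:00:00 GMT');
```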