Code Unloading
We urge you to preload as much code as possible, as it helps to increase the amount of memory which is shared and so reduces the memory footprint. But sometimes we want to unload code that was previously loaded. For example, you could load many modules to do some configuration or initialization work at server startup, but none of the children will need these modules later. As the code is no longer needed, you can unload it.
For example, if you use XML::Parser in a <Perl> section only, you could remove it with:
  # remove the module from %INC so a later require() would reload it
  delete $INC{'XML/Parser.pm'};
  # clear out the package's symbol table
  Apache::PerlRun->flush_namespace('XML::Parser');
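A minimal sketch of how this might look inside a <Perl> configuration section (the startup-time work itself is only hinted at here):

  <Perl>
      use XML::Parser ();
      # ... do the configuration/initialization work with XML::Parser ...

      # the children will never need XML::Parser again, so unload it
      delete $INC{'XML/Parser.pm'};
      Apache::PerlRun->flush_namespace('XML::Parser');
  </Perl>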
##########################################################
One day this should become a nice intro to the Perl memory sharing/multithreading stuff. I added the section below but then removed it, since Shane's reply includes flaws. You don't want the number of active processes to equal the number of processors, because the CPU is not the only resource a process consumes. If we let a single process own a CPU and it goes to sleep on a lengthy IO from disk or a DB query, the CPU cycles will be wasted.
Multithreading or not Multithreading
I want to quote this question that showed up on the mod_perl list, and the answer, in their entirety. By the way, this question was asked before perl 5.6 and apache 2.0 were released (both with multithreading support).
John Henckel has asked:
Thanks for your help. I am slowly coming to terms with the fact that Perl is not multithreaded and so it will never be able to scale the way I need it to. It is only a couple thousand lines of code. I will have to rewrite it as a servlet in Java or else a C program using the FastCGI protocol.
Many people think FCGI can't handle multiple concurrent requests; however, that is only a Perl limitation. The C FCGI interface, with multithreading, can process hundreds of CGI requests simultaneously in ONE process with ONE socket to the Apache server.
Shane has replied:
Arg...! Okay, let's back up. Multithreading is not a good idea. What you're talking about is a process which takes 4 seconds to complete... what are we talking about, remote system communication? You have a process which takes 4 seconds to finish; it doesn't matter what you use as your design... it still takes 4 seconds. If you have 100 requests for a 4 second process... it's going to take 400 seconds. (Actually, due to context switching, probably more like 500.)
Multithreading ADDs to, not subtracts from, the load. But you see, it all depends on what sort of architecture you're going to use too. If you're going to use a single processor machine, multithreading is going to slow the whole process down. If you're using a computer that has 100s of processors, then clearly, multithreading is the approach to take.
The perl "limitation" of which is you speak is not a limitation per se. Its a limitation if your only using one instance of the perl interpretor. But you are free to use more than one instance of that interpretor. (perldoc perlembed) However, it makes zero sense to have more interpretors than processors... due the context switching issue.
The best way to solve the problem is to have one "processing" thread per processor. (I.e. the thread that does work on the request that is supposed to take 4 seconds.) That thread can be written directly in C, or you can write some code for a Perl interpreter to process. No big deal... just keep in mind, if you start opening up more processing threads than processors you have, and start cramming requests down their throats faster than once per four seconds, the thing's going to tumble like dominoes. The best thing you could possibly do is this: set up an engine of your own to handle this 4 second long process. Initiate as many threads as you have processors. Start a queue of processes that can be however long. Hand out the 4 second long processes to the queue.
This design will keep things from crashing. Even better would be to have a series of computers that grab requests from the others, i.e. set up a central thread written in C and have it act as a queue. Processing threads grab new requests from that queue and deliver them. This is of course based on your premise of running a process that takes 4 seconds.
If this is about remote communication, which I have no idea why it wouldn't be unless you're doing some strange number crunching, then what you want to do is read up on select() and poll(). Or if you want to have it work really well, read up on RT signal queues, and run on the 2.3.x linux kernel, or some other unix variant.
I'll reiterate... having more threads than processors in ANY language is a bad idea, if it can be avoided. Which it can, if you use the newer programming stuff. Moving to Java will not solve your problems, it will create ones 100x worse in terms of performance. Moving to C will not solve your problems. Moving to Perl will not solve your problems. Investigating your problem clearly will help eventually solve your problem. (BTW: No CGI request should take 4 seconds to process, unless you are querying a database on the other side of the planet, or unless it's some sort of mathematics lab searching for prime numbers or something. I might sound like I'm joking..., but I most certainly am not.)
########################################### An attempt to explain why memory goes unshared:
From: Stas Bekman <sbekman@stason.org>
Perl is a language with weak data typing, i.e. you don't specify variable size (type) as you do in strongly typed languages like C. Therefore Perl allocates heap memory on demand, rather than using the (unmodifiable) text and (modifiable) data memory pages used by C. The latter are allocated when the program is loaded into memory, before it starts to run; the former is allocated at run time.
On the heap there is no separation between data and text pages: when you call malloc() it just allocates the chunk you asked for, and this leads to the case where the code (static in most cases) and the variables (dynamic) land on the same memory pages.
From: shane@isupportlive.com
Well... the structs for the "Code Values" are mixed with both code and variables, in the case of lexicals. However, during runtime these are not altered. There are structs within the main code value struct that will be altered during run time, namely the recursive lexical array. (That's what I call it, but I'm sure Malcolm uses a more correct word :->) However, the actual code (a series of highly optimized opcodes... instruction-set-workalike type stuff that takes a few clock cycles each to do, depending on the op and the architecture) is not memory inline with the structs for the recursive lexical array.
What I'm saying is that if you include all your code at the very beginning, the design of perl will not alter that code, so it should be allowed to stay fixed and shared. Basically it just holds an opcode pointer as to which opcode it's working at within the CV. The recursive lexical array itself just has a pointer within the "code value" struct to itself. So basically that main code struct should never need to be realloc'd, so it's fairly unlikely that it would need to be non-shared. However, maybe someone who understands how something becomes "unshared" within the kernel could be quite helpful. If you were to change where something was pointing within a struct, would that cause it to be unshared? I think that it's fairly unlikely, but I suppose it's possible. If that's the case then it's quite likely that code pieces could become unshared, I suppose. However the main hunk of actual function opcodes would remain fixed, only the execution pointer (where it's pointing at within the present program) would change. So, in final (!), the code should always be shared. However, if you change the file and it checks the date on it and reloads it, obviously it won't be shared :-).
> Perl is a language with weak data typing, i.e. you don't specify
> variable size (type) as you do in strongly typed languages like C.
> Therefore Perl allocates heap memory on demand, rather than using the
> (unmodifiable) text and (modifiable) data memory pages used by C. The
> latter are allocated when the program is loaded into memory, before it
> starts to run; the former is allocated at run time.
Yes that's true. There is some compile time stuff where it organizes the variable names within the lexical array, but I'm not sure whether or not it actually reserves space for those things at that time. I'm really not sure about that item.
> On the heap there is no separation between data and text pages: when
> you call malloc() it just allocates the chunk you asked for, and this
> leads to the case where the code (static in most cases) and the
> variables (dynamic) land on the same memory pages.
This is a weak area in my knowledge. I'm not certain how the kernel actually marks segments as shared and not... so I'll refrain from commenting.
> I'm not sure this is a very good explanation. Perl gurus are very
> welcome to correct/improve my attempt to explain this.
I've tried to explain what I can. The best book for this is "Advanced Perl Programming", published by O'Reilly (of course). There is a chapter in there written by Malcolm Beattie (well, pieces of the chapter) that is pretty good..., but I'm afraid it might not go into enough depth on these exact issues. Not only that, these are also very kernel related... you have to understand how both pieces fit together, and frankly I couldn't answer that, and I don't know a person alive who could :-). (I'm sure there are some, but who?)
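As a rough illustration of the copy-on-write unsharing discussed above, here is a sketch (not from the email thread; it assumes the GTop module is installed, and the data sizes are arbitrary) that watches a child's shared memory drop once it writes to data created before the fork:

  use strict;
  use GTop ();

  # data created before the fork -- initially shared with the child
  my @preloaded = ('x' x 1000) x 5000;

  my $gtop = GTop->new;

  defined(my $pid = fork) or die "fork failed: $!";
  if ($pid == 0) {
      my $before = $gtop->proc_mem($$)->share;
      $_ .= 'y' for @preloaded;              # write to every element
      my $after  = $gtop->proc_mem($$)->share;
      printf "child shared memory: %d -> %d bytes\n", $before, $after;
      exit 0;
  }
  waitpid($pid, 0);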
###########################################
Ged has written this:
"=head2 URI Translation
For many reasons, a server can never allow access to its entire directory hierarchy. Although there is really no indication of this given to the Web browser, the path given in a requested URI is therefore a virtual path, and early in the processing of a request the virtual path given in the request must be translated to a path rooted at the filesystem root so that Apache can determine what resource is really being requested. This path can be considered to be a physical path, although it may not physically exist.
Especially in mod_perl systems, you may intend that the translated path does not physically exist, because your module responds when it sees a request for this non-existent path by sending a virtual document. It writes this document "on the fly", especially for this request, and the document then vanishes. Many of the documents you see on the Web, for example most documents which change their appearance depending on what the browser asks for, do not physically exist. This is one of the most important features of the Web, and it is one of the great powers of mod_perl that it allows you complete flexibility to create virtual documents."
He also said:

> Anyway, I think it needs to be said somewhere, and
> preferably so that the reader will see it before he sees things like
> the Alias directives. It kinda gets assumed by all the other docs and
> newbies flounder around without grasping the concept. I speak from
> personal experience!
###########################################
> > META: Why cover IO::File when Symbol is quicker & lighter?
> >
> > It provides other functionality as well. Do you think that it might be
> > better just to mention it and not delve into the details?
>
> Either just mention it or alternatively, put Symbol first (with note
> regarding 5.6.0) and then follow with IO::File with a leading remark
> like "In some situations you may want to take a fully object oriented
> approach to file handling." And then have the IO::File stuff.
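A short sketch of the two styles being compared (the file name is just an example; under 5.6.0+ a plain lexical passed to open() works as well):

  use Symbol ();
  use IO::File ();

  # lightweight: an anonymous glob from Symbol
  my $fh = Symbol::gensym();
  open $fh, "/etc/motd" or die "open: $!";
  my @lines = <$fh>;
  close $fh;

  # fully object oriented: IO::File
  my $io = IO::File->new("/etc/motd", "r") or die "open: $!";
  my @more = $io->getlines;
  $io->close;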
###########################################

You must rewrite the performance.pod =head2 Preload Perl modules - Real Numbers section.

It's crap!!!
First define the goal of the testing, and probably use GTop to make the process easier.
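One possible way to get such real numbers with GTop (this is only an assumption about what the rewritten section might measure; CGI.pm is just a convenient guinea pig):

  use strict;
  use GTop ();

  my $gtop   = GTop->new;
  my $before = $gtop->proc_mem($$)->size;

  require CGI;
  CGI->compile(':all');    # force CGI.pm to compile its methods now

  my $after = $gtop->proc_mem($$)->size;
  printf "CGI.pm costs this process %d bytes\n", $after - $before;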
###########################################
debug.pod:
Collect all these notes about stack trace generation (here in this file).
The stack trace looks right. It would be more useful to see the line numbers, which you can see if you follow this tip for building mod_perl, from the SUPPORT doc:
- CORE DUMPS

  If you get a core dump, please send a backtrace if possible. Before
  you try, build mod_perl with:

    perl Makefile.PL PERL_DEBUG=1

  which will:

    - add `-g' to EXTRA_CFLAGS
    - turn on PERL_TRACE
    - set PERL_DESTRUCT_LEVEL=2 (additional checks during Perl cleanup)
    - link against libperld if it exists
-----------------
Apache::DB/ httpd -X -D DEBUG
> if you set OPTIMIZE => '-g', in the Makefile.PL and start httpd under gdb,
> it's easy to debug.
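A possible gdb session along those lines (paths and prompts are illustrative only):

  % perl Makefile.PL PERL_DEBUG=1 ...
  % make && make install
  % gdb /usr/local/apache/bin/httpd
  (gdb) run -X -D DEBUG
  [ reproduce the crash from another terminal ]
  (gdb) bt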
###########################################
The Eagle book's Appendix B lists all the Makefile.PL options (we'd better include them all).
###########################################
A reader suggested:
> It might be helpful to replicate some information about the Apache Life
> Cycle (Eagle book pg 56), before talking about the server startup file.
config.pod
###########################################
debug: =head3 Safe Resource Locking
You must review the section and correct this issue:
If the file is reopened in the next script invocation, the previous fh will be closed and unlocked, but only from within the same process.
Regarding leakage: if you use open IN, ... there is probably no leakage because it's the same handle. In the case of gensym or IO::File use, you should check this issue.
All the above is tentative and should be validated.
It also seems to be duplicated in a few places.
I think that the Symbol material is duplicated in two places!
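A sketch of what the "safe" style under discussion looks like with a lexical IO::File handle (the file name is made up; this is an illustration, not the section's final text):

  use Fcntl qw(:flock);
  use IO::File ();

  sub update_counter {
      my $fh = IO::File->new("+</tmp/counter.txt")
          or die "open: $!";
      flock($fh, LOCK_EX) or die "flock: $!";
      # ... read, modify and write the counter ...
      # when $fh goes out of scope the file is closed and the lock
      # released, even if the code above die()s -- no handle leaks
      # across requests
  }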
###########################################
From: "Joseph R. Junkin" <jjunkin@datacrawler.com> Subject: Re: [summary+rfc] When One Machine is not Enough...
I doubt it will help, but you are free to look at a talk I gave:
Analysis of the Open Source Application Platform http://www.datacrawler.com/talk/tech/platform/
It addresses the scalability issues you have already covered.
###########################################
die() issue:
Merge porting.html#die_and_mod_perl and snippets.html#Redirecting_Errors_to_the_Client.
Add notes from Matt's email, "Warning: $SIG{__DIE__} considered dangerous", regarding its interaction with the eval {} / if $@ try/catch idiom.
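A minimal illustration of the pitfall (an assumption about what the warning covers, not a quote from Matt's email): a global __DIE__ handler also fires for exceptions that an enclosing eval {} is about to catch, unless it is guarded with $^S:

  local $SIG{__DIE__} = sub {
      die @_ if $^S;      # $^S is true while executing inside an eval
      print STDERR "really fatal: @_";
  };

  eval { die "caught\n" };
  print "handled: $@" if $@;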
###########################################
Makefile.PL: build.pl gets called twice when running from the CPAN shell, because it does 'make' and 'make install' and both call 'manifypods'. Need to add source change control, so that if manifypods was already called for 'make', it won't be called again for 'make install'.
########################################### (merge of status.pod and debug.pod)
I'm thinking of merging the Apache::Status and the Debug sections, since the two are closely related and Apache::Status allows you to debug the code to some extent.
#####################################################################
Stick this para at the beginning of the performance chapter.
One of the most important issues in improving performance is the reduction of memory usage. The less memory each server process uses, the more server processes you can start, and thus the better performance you get (from the user's point of view: faster response).
See Global vs Fully Qualified Variables
#####################################################################
Important for both book and guide: the strategy chapter talks about performance improvement among other things. The performance chapter doesn't refer to it, but this is a very important part of it.
#####################################################################
Include the very important performance improvement notes from:

  http://www.apache.org/docs/misc/perf-tuning.html
  http://www.apache.org/docs/misc/perf.html
#####################################################################
Add benchmarks with and without keep-alive!
#####################################################################
Mention these packages in the debug section:
Devel::Symdump - dump symbol names or the symbol table
Apache::Symdump - shows
Apache::Status uses Devel::Symdump to show the process's internals
#####################################################################
Describe the backlog setting, i.e. the ListenBacklog directive (performance...).
On that note you might want to set the BackLog parameter (I forget the precise name); it depends on whether you want users to wait indefinitely or just get an error.
#####################################################################
> > What is the best way to have a Location directive apply to an entire
> > site except for a single directory?
>
> Set the site-wide handler in a <Location "/"> and override the handler
> for the "register" dir by setting the default handler in <Location
> "/register">. Unfortunately, I don't know the name of the default
> handler.
SetHandler default-handler
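So the configuration might look roughly like this (the handler name is made up for the example):

  <Location />
      SetHandler perl-script
      PerlHandler My::SiteHandler
  </Location>

  <Location /register>
      SetHandler default-handler
  </Location>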
#####################################################################
META: add a section about setting and passing environment variables: It includes and merges (PerlSetVar, SetVar and Pass*), %ENV, (creating your own directives?), subprocess
Notes:
* I'd suggest using $r->subprocess_env() instead. I guess %ENV will work in many situations, but it might bite you later when you can't figure out why a particular env variable isn't getting set in certain situations (speaking from experience).
* I was going to suggest that too. %ENV controls the environment of the currently running Perl process, but child processes come from the "subprocess env", which only the call above sets.
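A sketch of the $r->subprocess_env() suggestion above (the package, variable name and value are invented for the example):

  package My::Fixup;
  use strict;
  use Apache::Constants qw(OK);

  sub handler {
      my $r = shift;
      # visible to CGI scripts and other subprocesses spawned for this
      # request, without relying on the server's own %ENV
      $r->subprocess_env(SITE_REGION => 'eu');
      return OK;
  }
  1;

and in httpd.conf something like:

  PerlFixupHandler My::Fixup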
#######################################################################
Add a MOD_PERL_TRACE=all example...
An email:
> > > Any suggestions? How might I debug this?
> >
> > hmm, can you put a warn() trace in your sub SiteMap, I wonder if it's
> > called the first time, but util.pm is not reloaded when Apache restarts
> > itself on startup.
> > any difference if you turn Off PerlFreshRestart?
> > is mod_perl configured as a dso or static?
> >
> > -Doug
>
> mod_perl is static (my initial message included commands I used to build
> mod_perl/apache).
>
> PerlFreshRestart Off has no effect.
>
> It does look like it's failing to load on the second pass, though, since I
> get one response from the "warn" you suggested:
>
>   # bin/httpd -X
>   util.pm: MSELproxy::util about to bootstrap MSELproxy::util ...
>   [Fri Oct 1 00:43:05 1999] null: ...saw SiteMap...
>   Syntax error on line 14 of /usr/local/apache/conf/perl.conf:
>   Invalid command 'SiteMap', perhaps mis-spelled or defined by a
>   module not included in the server configuration
... more evidence ... output of # MOD_PERL_TRACE=all bin/httpd -X
  perl_parse args: '/dev/null' ...allocating perl interpreter...ok
  constructing perl interpreter...ok
  ok
  running perl interpreter...ok
  mod_perl: 0 END blocks encountered during server startup
  perl_cmd_require: conf/perl-startup.pl
  attempting to require `conf/perl-startup.pl'
  loading perl module 'Apache::Constants::Exports'...ok
  loading perl module 'Apache'...ok
  loading perl module 'MSELproxy::util'...[Fri Oct 1 00:54:26 1999] util.pm: MSELproxy::util about to bootstrap MSELproxy::util ... ok
  loading perl module 'Apache'...ok
  loading perl module 'MSELproxy::AccessManager'...ok
  loading perl module 'Apache'...ok
  loading perl module 'MSELproxy::OCLC'...ok
  loading perl module 'Apache'...ok
  loading perl module 'MSELproxy::RLG'...ok
  blessing cmd_parms=(0xbfffdb2c)
  [Fri Oct 1 00:54:26 1999] null: ...saw SiteMap...   <---
  [root@pembroke apache]#
  loading perl module 'Apache'...ok
  perl_startup: perl aleady running...ok
  loading perl module 'Apache'...ok
  cmd_cleanup: SvREFCNT($MSELproxy::util::$obj) == 1
  cmd_cleanup: SvREFCNT($MSELproxy::util::$obj) == 1
  loading perl module 'Apache'...ok
  perl_cmd_require: conf/perl-startup.pl
  attempting to require `conf/perl-startup.pl'
  loading perl module 'Apache'...ok
  loading perl module 'MSELproxy::util'...ok
  loading perl module 'Apache'...ok
  loading perl module 'MSELproxy::AccessManager'...ok
  loading perl module 'Apache'...ok
  loading perl module 'MSELproxy::OCLC'...ok
  loading perl module 'Apache'...ok
  loading perl module 'MSELproxy::RLG'...ok
  Syntax error on line 14 of /usr/local/apache/conf/perl.conf:
  Invalid command 'SiteMap', perhaps mis-spelled or defined by a
  module not included in the server configuration
#######################################################################
This is stuff to be integrated into the DB section, mostly all by jwb:
Date: Thu, 14 Oct 1999 17:21:18 -0700
From: Jeffrey Baker <jwb@cp.net>
To: modperl@apache.org, dbi-users@isc.org
Cc: sbekman@iil.intel.com
Subject: More on web application performance with DBI
Hi all,
I have converted my text-only guide to web application performance using mod_perl and DBI into HTML. The guide now lives alongside my DBI examples page at http://www.saturn5.com/~jwb/dbi-performance.html .
I have also conducted a silly benchmark to see how all of these optimizations affect performance. Please remember that it is dangerous to extrapolate the results of a benchmark, especially one as rudimentary as this. With that said, please consider the following data.
Environment:

  DB Server:        Oracle 8.0.6, Sun Ultra2, 2 CPUs, 2GB RAM, Sun A1000 disks
  App Server:       Linux, PII 350, 128MB RAM, Apache 1.3.6, mod_perl 1.19
  Benchmark Client: ApacheBench on same machine as application server
Each benchmark consisted of a single request selecting one row from the database with a randomly selected primary key. The benchmark was run through 1000 requests with 10 simultaneous clients. The results were recorded using each level of optimization from my tutorial.
  Zero optimization:                        41.67 requests/second
  Stage 1 (persistent connections):        140.17 requests/second
  Stage 2 (bound parameters):              139.20 requests/second
  Stage 3 (persistent statement handles):  251.13 requests/second
It is interesting that the Stage 2 optimization didn't gain anything over Stage 1. I think this is because of the relative simplicity of my query, the small size of the test database (1000 rows), and the lack of other clients connecting to the database at the same time. In a real application, the cache thrashing that is caused by dynamic SQL statements would probably be detrimental to performance. In any case Stage 2 paves the way for Stage 3, which certainly does increase the request rate!
So, check it out at http://www.saturn5.com/~jwb/dbi-performance.html
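The three stages roughly correspond to code like the following sketch (this is not taken from jwb's page; the DSN, table and column names are invented):

  use strict;
  use Apache::DBI ();   # stage 1: DBI->connect() is now cached per child
  use DBI ();
  use Apache::Constants qw(OK);

  sub handler {
      my $r   = shift;
      my $dbh = DBI->connect("dbi:Oracle:test", "user", "password",
                             { RaiseError => 1 });

      # stage 2: placeholders; stage 3: the statement handle itself is
      # cached and reused across requests in this child
      my $sth = $dbh->prepare_cached(
          "SELECT name FROM users WHERE id = ?");
      $sth->execute(int rand 1000);
      my ($name) = $sth->fetchrow_array;
      $sth->finish;

      $r->send_http_header('text/plain');
      $r->print($name);
      return OK;
  }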
Date: Wed, 23 Feb 2000 23:18:23 -0800
From: Jeffrey W. Baker <jwbaker@acm.org>
To: modperl@apache.org
Subject: Re: Database connection pooling... (beyond Apache::DBI)
Greg Stark wrote:
>
> Sean Chittenden <sean@serverninjas.com> writes:
>
> > Howdy. We're all probably pretty familiar with Apache::DBI and
> > the fact that it opens a database connection per apache process. Sounds
> > groovy and works well with only one or two servers. Everything is gravy
> > until you get a cluster of servers, ie 20-30 machines, each with 300+
> > processes.
>
> 300+ perl processes per machine? No way. The only way that would make _any_
> sense is if your perl code is extremely i/o dependent and your perl code is
> extremely light. Even then you're way better off having the i/o operations
> queued quickly and processed asynchronously.
This conversation happens on an approximately biweekly schedule, either on modperl or dbi-users, or some other list I have the misfortune of frequenting. Please allow me to expand upon this subject a bit.
I have not yet gotten a satisfactory answer from anyone who starts these threads regarding why they want connection pooling. I suspect that people think it is needed because everyone else (Netscape, Microsoft, Bea) is doing it. There is a particular kind of application where pooled connections are useful, and there are particular situations where it is a waste. Every project I have ever done falls into the latter category, and I can only think of a few cases that fall under the former.
Connection pooling is a system where your application server threads or processes, which number n on a single machine, share a pool of database connections which number fewer than n. This is done to minimize the number of database connections which are open at once, which in turn is supposed to reduce the load on the database server. This is effective when database activity is a small fraction of the total load on the application server. For example, if your application server mostly performs matrix manipulation, and only occasionally hits the database, it would make sense for it to relinquish the connection when it is not in use.
The downside to connection pooling is that it imposes some overhead. The connections must be managed properly, and the scheme should be transparent to the programmer whose code is using it. So when a piece of code requests a database connection, the connection manager needs to decide which one to return. It may have to wait for one to free up, or it may have to open one based on some low-water-mark heuristic. It may also need to decide that a connection consumer has died or gone away, possibly taking the connection with it. So you can see that opening a pooled connection is more computationally expensive than opening a dedicated connection.
This pooling overhead is a total waste of time when the majority of what your application is doing is database-related. If your program will issue 100 queries and perform a transaction during the course of fulfilling a request, pooled connections will not make sense. The reason is that Apache already provides a mechanism for killing off database connections in this scenario. If a process or thread is sitting about idle, Apache will come along and terminate it, freeing the database connection in the process. For database-bound or transactional programs, the one-to-one mapping of processes to database connections is ideal.
Pooling is also less attractive because modern databases can handle many connections. Oracle with MTS will run fine with just as many connections as you care to open. The application designer should study how many connections he realistically plans to open. If your application is bound by database performance, it makes sense to cap the number of clients, so you would not allow your applications to open too many connections. If your application is transactional, you don't have any choice but to give each process its own dedicated connection. If your application is compute-bound, then your database is lightly loaded and you won't mind opening a lot of connections.
The summary is that if your application is database-bound, or is processing transactions, you don't need or even want connection pooling.
###################################
=> Security
It's a good idea to protect your various monitors like perl-status and the like with a password. The less information you provide to intruders, the harder their break-in task will be! (One of the biggest helps you can give these bad guys is showing them all the scripts you use, if some of them are in the public domain; they can find out most of them just by browsing your site. The moment they know the name of a script, they can grab its source from the web (wherever the script came from), study it, and probably find a few or even many security holes.) Security by obscurity doesn't really work against a determined intruder, but it definitely helps to wave away some of the less determined malicious fellas.
e.g.:
  <Location /sys-monitor>
      SetHandler perl-script
      PerlHandler Apache::VMonitor
      AuthUserFile /home/httpd/perl/.htpasswd
      AuthGroupFile /dev/null
      AuthName "SH Admin"
      AuthType Basic
      <Limit GET POST>
          require user foo bar
      </Limit>
  </Location>
And the passwd file /home/httpd/perl/.htpasswd:

  foo:1SA3h/d27mCp
  bar:WbWQhZM3m4kl
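The password file itself can be created and extended with Apache's htpasswd utility, e.g.:

  % htpasswd -c /home/httpd/perl/.htpasswd foo
  % htpasswd /home/httpd/perl/.htpasswd bar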
###################################
> There's nothing wrong with Ralf's guide per se, but I think
> you should mention in your Adding a proxy server section that
> mod_rewrite might be necessary if dynamic content is intermixed
> with static content.
That sounds reasonable indeed. I'll add it. Don't get me wrong - I'm not against adding more things, I'm against duplication, which creates a mess. So now that you've made it clear that we need this, I'll certainly add it.
Would you add something about using mod_rewrite to handle my scenario to the guide?
Perhaps what you're looking for resembles this:
  RewriteRule ^/(images|static)/ - [S=1]
  RewriteRule (.+) http://backend$1 [P,L]
John D Groenveld wrote:
>
> I've been using mod_proxy
> to proxypass my static content away from my /modperl
> directories. Now, I'd like to make my root
> dynamic and thus pass everything except /images and
> /static.
> I've looked at the guide and tuning docs, as well
> as the mod_proxy docs, but I must be missing
> something.
###################################
Just a snippet to try...
try this (in the mod_perl-x.xx directory):
  % make start_httpd
  % strace -o strace.out -p `cat t/logs/httpd.pid` &
  % make run_tests
  % grep open strace.out | grep .htaccess > send_to_modperl_list
  % make kill_httpd
and send us that file. I have the feeling there's a .htaccess in your tree that the process can't read.
###################################
Apache::RegistryNG is just waiting for more people to bang on it. So, if you make your module a subclass of Apache::RegistryNG, that will help things move forward a bit :)
###################################
At the strategy section, put this (first work on it):
REDUCING THE NUMBER OF LARGE PROCESSES
Unfortunately, simply reducing the size of each HTTPD process is not enough on a very busy site. You also need to reduce the quantity of these processes. This reduces memory consumption even more, and results in fewer processes fighting for the attention of the CPU. If you can reduce the quantity of processes so that they all fit into RAM, your response time improves even more.
The idea of the techniques outlined below is to offload the normal document delivery (such as static HTML and GIF files) from the mod_perl HTTPD, and let it only handle the mod_perl requests. This way, your large mod_perl HTTPD processes are not tied up delivering simple content when a smaller process could perform the same job more efficiently.
In the techniques below where there are two HTTPD configurations, the same httpd executable can be used for both configurations; there is no need to build HTTPD both with and without mod_perl compiled into it. With Apache 1.3 this can be done with the DSO configuration -- just configure one httpd invocation to dynamically load mod_perl and the other not to do so.
These approaches work best when most of the requests are for static content rather than mod_perl programs. Log file analysis becomes a bit of a challenge when you have multiple servers running on the same host, since you must log to different files.
TWO MACHINES
The simplest way is to put all static content on one machine, and all mod_perl programs on another. The only trick is to make sure all links are properly coded to refer to the proper host. The static content will be served up by lots of small HTTPD processes (configured not to use mod_perl), and the relatively few mod_perl requests can be handled by the smaller number of large HTTPD processes on the other machine.
The drawback is that you must maintain two machines, and this can get expensive. For extremely large projects, this is the best way to go.
TWO IP ADDRESSES
Similar to above, but one HTTPD runs bound to one IP address, while the other runs bound to another IP address. The only difference is that one machine runs both servers. Total memory usage is reduced because the majority of files are served by the smaller HTTPD processes, so there are fewer large mod_perl HTTPD processes sitting around.
This is accomplished using the httpd.conf directive BindAddress to make each HTTPD respond only to one IP address on this host. One will have mod_perl enabled, and the other will not.
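A minimal sketch of the two configurations (the IP addresses are of course just placeholders):

  # httpd.conf -- the light server, no mod_perl
  BindAddress 192.168.1.1
  Port 80

  # httpd+perl.conf -- the heavy mod_perl-enabled server
  BindAddress 192.168.1.2
  Port 80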
USING ProxyPass WITH TWO SERVERS
To overcome the limitation of the alternate port above, you can use dual Apache HTTPD servers with just a slight difference in configuration. Essentially, you set up two servers just as you would with the two-ports-on-the-same-IP-address method above. However, in your primary HTTPD configuration you add a line like this:
ProxyPass /programs http://localhost:8042/programs
Where your mod_perl enabled HTTPD is running on port 8042, and has only the directory programs within its DocumentRoot. This assumes that you have included the mod_proxy module in your server when it was built.
Now, when you access http://www.domain.com/programs/printenv it will internally be passed through to your HTTPD running on port 8042 as the URL http://localhost:8042/programs/printenv and the result relayed back transparently. To the client, it all seems as if it is just one server running. This can also be used on the dual-host version to hide the second server from view if desired.
The directory structure assumes that F is the C directory, and the mod_perl programs are in F and F. I start them as follows:

  daemon httpd
  daemon httpd -f conf/httpd+perl.conf

SQUID ACCELERATOR
Another approach to reducing the number of large HTTPD processes on one machine is to use an accelerator such as Squid (which can be found at http://squid.nlanr.net/Squid/ on the web) between the clients and your large mod_perl HTTPD processes. The idea here is that squid will handle the static objects from its cache while the HTTPD processes will handle mostly just the mod_perl requests once the cache is primed. This reduces the number of HTTPD processes and thus reduces the amount of memory used.

To set this up, just install the current version of Squid (at this writing, this is version 1.1.22) and use the RunAccel script to start it. You will need to reconfigure your HTTPD to use an alternate port, such as 8042, rather than its default port 80. To do this, you can either change the F line C or add a C directive to match the port specified in the F file. Your URLs do not need to change. The benefit of using the C directive is that redirected URLs will still use the default port 80 rather than your alternate port, which might reveal your real server location to the outside world and bypass the accelerator.

In the F file, you will probably want to add C and C to the C parameter so that these are always passed through to the HTTPD server under the assumption that they always produce different results. This is very similar to the two port, ProxyPass version above, but the Squid cache may be more flexible to fine tune for dynamic documents that do not change on every view. The Squid proxy server also seems to be more stable and robust than the Apache 1.2.4 proxy module.

One drawback to using this accelerator is that the logfiles will always report access from IP address 127.0.0.1, which is the local host loopback address. Also, any access permissions or other user tracking that requires the remote IP address will always see the local address. The following code uses a feature of recent mod_perl versions (tested with mod_perl 1.16 and Apache 1.3.3) to trick Apache into logging the real client address and giving that information to mod_perl programs for their purposes.

First, in your F file add the following code:

  use Apache::Constants qw(OK);

  sub My::SquidRemoteAddr ($) {
      my $r = shift;
      if (my ($ip) = $r->header_in('X-Forwarded-For') =~ /([^,\s]+)$/) {
          $r->connection->remote_ip($ip);
      }
      return OK;
  }

Next, add this to your F file:

  PerlPostReadRequestHandler My::SquidRemoteAddr

This will cause every request to have its C address overridden by the value set in the C header added by Squid. Note that if you have multiple proxies between the client and the server, you want the IP address of the last machine before your accelerator. This will be the right-most address in the X-Forwarded-For header (assuming the other proxies append their addresses to this same header, like Squid does.) If you use apache with mod_proxy at your frontend, you can use Ask Bjørn Hansen's mod_proxy_add_forward module from ftp://ftp.netcetera.dk/pub/apache/ to make it insert the C header.

###################################

config.pod: use Eric's presentation:

  http://conferences.oreilly.com/cd/apache/presentations/echolet/contents.html

###################################

mod_perl Humour.

* mod_perl for embedded devices:

  Q: mod_perl for my Palm Pilot dumps core when built as a DSO, and the
     Palm lacks the memory to build statically, what should I do?

  A: you should get another Palm Pilot to act as a reverse proxy

  by Eric Cholet.
#################################################

DBI tips to improve performance. Need to work on the snippets below:

What if the user_id has something that needs to be quoted? I speak of the general case. User data should not get anywhere *near* an SQL line... it should always be inserted via placeholders or with very, very careful consideration to quoting.

Ahh, I see. I basically do the latter, with $dbh->quote. The contents of $Session are entirely system-generated. The user gives a ticket through the URL, yes, but that is parsed and validated and checked for presence in the DB before you even get to code that works like I had described.

I agree - but you should always be aware of the issues with using placeholders for the database engine that you use. Sybase in particular has a deficient implementation, which tends to run out of space and creates locking contention. Using stored procs instead is a lot better (although it doesn't solve the quoting problems). OTOH, Oracle caches compiled SQL, and using placeholders means it's not caching SQL with specific data in it. The values can get bound into the compiled SQL just as easily, and it speeds things up by a noticeable amount (a factor of ~3 in my tests).

If we are on this topic, I have a few questions. I've just read the DBI manpage; there is a prepare_cached() call. It's useless in mod_cgi if used only once with the same params across the script. If I use Apache::DBI and replace all prepare statements (which include placeholders) with prepare_cached(), does it mean that, like with module preloading, the prepare will be called only once per unique statement through the whole life of the child? Otherwise the use of placeholders is pointless if you do only one execute() call per unique prepare() statement -- the only benefit is DBI taking care of quoting the values for you. I don't remember anyone ever mentioning prepare_cached(). What's the verdict?

Simply adding the "_cached" to "prepare()" in one of my utilities increased the performance eight fold (Oracle, non-mod_perl environment). I don't know the fine points of whether it is possible to share cached prepares across children (can you even fork with db connections?), but if your code is doing the same query(ies) over and over, definitely give it a try.

Not necessarily; it depends on your database. Oracle does caching which persists until it needs the space for something else; if you're finding information about customers, it's much more efficient for there to be one entry in the library cache like this:

  select * from customers where customer_id = :p1

than it is for there to be lots of them like:

  select * from customers where customer_id = 123
  select * from customers where customer_id = 465
  select * from customers where customer_id = 789

since Oracle has to parse, compile and cache each one separately. I don't know if other databases do this kind of caching.

Ok, this makes sense. I just read the MySQL manual - to my grief, it doesn't cache :( So, I still think to use prepare_cached() to cache on the DBI side, but it's said to work through the life of $dbh, and since my $dbh is a my() lexical variable, I don't understand whether I get this benefit or not. I know that Apache::DBI maintains a pool of connections; does it preserve the cache of prepared statements as well (I mean, does it preserve the whole $dbh object)? If it does, I get a speedup at least for the whole life of a single connection.
I think that the speedup is even better than the one you have been talking about, since if Oracle caches the prepared statement, DBI still reaches out to Oracle; if it's a local cache, we get a few more savings. Has anyone deployed the scenario I have tried to present here? It seems like a good candidate for the performance chapter of the guide if it really makes things faster...

The statement cursors will be cached per $dbh, which Apache::DBI caches, so there is an extreme performance boost... as your application runs, caching all its cursors, database queries will run at execution speed; no query parsing will be involved anymore. On Oracle, the performance improvement I saw was 100% by using the prepare_cached functionality. If you have just a small number of web servers, the caching difference between Oracle & MySQL will be small on the db end. It's when you have a lot of DBI handles that things might get inefficient. But I'm sure you are running a proxy front end, right Stas? :)

Be warned: there are some pitfalls associated with prepare_cached(). It actually gives you a reference to the *same* cached statement handle, not just a similar copy. So you can't do this:

  my $sth1 = $dbh->prepare_cached('select name from table where id=?');
  my $sth2 = $dbh->prepare_cached('select name from table where id=?');

$sth1 & $sth2 are now the same object! If you try to use them independently, they'll stomp all over each other. That said, prepare_cached() can be a huge win when using a slow database like Oracle. For mysql, it doesn't seem to help much, since mysql is so darn fast at preparing its statements.

Sometimes you have to be careful about that, yes. For instance, I was repeatedly executing a statement to insert data into a varchar column. The first value to insert just happened to be a number, so DBD::mysql thought that it was a numeric column, and subsequent insertions failed using that same statement handle. I'm not sure what the correct solution should have been in that case, but I reverted back to calling $dbh->quote($val) and putting it directly into the SQL. My opinion is that mysql should do a better job of figuring out which fields are actually numeric and which are strings - i.e. get the info from the database, not from the format of the data I'm passing it.

Actually, I'm a big fan of placeholders. I think they make the programming task a lot easier, since you don't have to worry about quoting data values. They can also be quite nice when you've got values in a nice data structure and you want to pass them all to the database - just put them in the bound-vars list, and forget about constructing some big SQL string. I believe mysql just emulates true placeholders by doing the quoting, etc. behind the scenes, so it's probably not much faster to use placeholders than directly embedded values. But I think placeholders are cleaner, generally, and more fun.

In my experience, prepare_cached() is just a judgment call. It hasn't seemed to be a big performance win for mysql, so sometimes I use it, sometimes I don't. I always use it with Oracle, though.

prepare_cached() is implemented by the database handle (and really the database itself). For example, in Oracle it speeds things up. In MySQL, it is exactly the same as prepare(), because DBD::mysql does not implement it, because MySQL itself has no mechanism for doing this. As I said in a previous message, prepare_cached() doesn't cache anything under MySQL.
However, you can implement your own statement handle caching scheme pretty easily by either subclassing DBI or writing a DB access module of your own (my preferred method):

  my $db  = MyDB->new;
  my $sql = 'SELECT 1';
  my $sth = $db->get_sth($sql);
  $sth->execute or die $dbh->errstr;
  my ($numone) = $sth->fetchrow_array;
  $sth->finish or die $dbh->errstr; # This is doubly necessary with this caching scheme!

  sub get_sth {
      my $self = shift;
      my $sql  = shift;
      return $self->{sth_cache}->{$sql} if exists $self->{sth_cache}->{$sql};
      $self->{sth_cache}->{$sql} = $self->{dbh}->prepare($sql)
          or die $self->{dbh}->errstr;
      return $self->{sth_cache}->{$sql};
  }

I've used that in a few situations and it appears to speed things up a bit. For mod_perl, we would probably want to make $self->{sth_cache} global.

You know, I just benchmarked this on a machine running PostgreSQL and it didn't actually speed things up (or slow it down). However, I suspect that under mod_perl, if this were something that were globally shared inside a child process, it might make a difference. Plus it also depends on the database used.

(Contributors: Randal L. Schwartz, Steve Willer, Michael Peppler, Mark Cogan, Eric Hammond, Russell D. Weiss, Joshua Chamas, Ken Williams, Peter Grimes)

#################################################

As a quick side note, I actually found that it's faster to write the logs directly into a .gz, and read them out of the .gz, through pipes. It takes longer (significantly, in my experience) to read 100 megs from the drive than it does to compress or uncompress 5 megs of data.

#################################################

performance.pod - extend on the Apache::TimeIt package

#################################################

Add a new section - contributing to the guide - with incentives and guidelines for contributions (diff against pod...)

#################################################

security.pod: add the Apache::Auth* modules

#################################################

examples of Apache::Session::DBI code:

  use strict;
  use DBI;
  use Apache::Session::DBI;
  use CGI;
  use CGI::Carp qw(fatalsToBrowser);

  # Recommendation from mod_perl_traps:
  use Carp ();
  local $SIG{__WARN__} = \&Carp::cluck;

  [...]

  # Initiate a session ID
  my $session = ();
  my $opts = { autocommit => 0, lifetime => 3600 }; # 3600 is one hour

  # Read in the cookie if this is an old session
  my $r = Apache->request;
  my $no_cookie = '';
  my $cookie = $r->header_in('Cookie');
  {
      # eliminate logging from Apache::Session::DBI's use of `warn'
      local $^W = 0;

      if (defined($cookie) && $cookie ne '') {
          $cookie =~ s/SESSION_ID=(\w*)/$1/;
          $session = Apache::Session::DBI->open($cookie, $opts);
          $no_cookie = 'Y' unless defined($session);
      }
      # Could have been obsolete - get a new one
      $session = Apache::Session::DBI->new($opts) unless defined($session);
  }

  # Might be a new session, so let's give them a cookie back
  if (! defined($cookie) || $no_cookie) {
      local $^W = 0;

      my $session_cookie = "SESSION_ID=$session->{'_ID'}";
      $r->header_out("Set-Cookie" => $session_cookie);
  }