A Must Read!
This document was born because some users are reluctant to learn Perl before jumping into mod_perl. I will try to cover some of the most frequent pure Perl questions asked on the list.
Update: I'm moving most of the pure Perl related topics from everywhere in the Guide to this chapter. From now on other chapters will refer to sections in this chapter if required.
Before you decide to skip this chapter make sure you know all the information provided here. The rest of the Guide assumes that you have read this chapter and understood it.
perldoc's Rarely Known But Very Useful Options
First of all, I want to stress that you cannot become a Perl hacker without knowing how to read Perl documentation and search through it. Books are good, but an easily accessible and searchable Perl reference is at your fingertips and is a great time saver.
While you can use online Perl documentation on the Web, the perldoc
utility provides you with access to the documentation installed on your system. To find out what Perl manpages are available execute:
% perldoc perl
To find what functions perl has, execute:
% perldoc perlfunc
To learn the syntax and to find examples of a specific function, you would execute (e.g. for open()
):
% perldoc -f open
Note: In perl5.00503 and earlier, there is a bug in this and the -q
options of perldoc
. It won't call pod2man
, but will display the section in POD format instead. Despite this bug it's still readable and very useful.
To search through the Perl FAQ (perlfaq manpage) sections you would (e.g. for the open
keyword) execute:
% perldoc -q open
This will show you all the matching Q&A sections, still in POD format.
To read the perldoc manpage you execute:
% perldoc perldoc
Tracing Warnings Reports
Sometimes it's very hard to understand what a warning is complaining about. You see the source code, but you cannot understand why some specific snippet produces that warning. The mystery often results from the fact that the code can be called from different places if it's located inside a subroutine.
Here is an example:
warnings.pl
-----------
#!/usr/bin/perl -w
correct();
incorrect();
sub correct{
print_value("Perl");
}
sub incorrect{
print_value();
}
sub print_value{
my $var = shift;
print "My value is $var\n";
}
In the code above, print_value() prints the passed value, correct() passes the value to print and in incorrect() we forgot to pass it. When we run the script:
% ./warnings.pl
we get the warning:
Use of uninitialized value at ./warnings.pl line 16.
Perl complains about an undefined variable $var
at the line that attempts to print its value:
print "My value is $var\n";
But how do we know why it is undefined? The reason here obviously is that the calling function didn't pass the argument. But how do we know who was the caller? In our example there are two possible callers, in the general case there can be many of them, perhaps located in other files.
We can use the caller() function, which tells who has called us, but even that might not be enough: it's possible to have a longer sequence of called subroutines, and not just two. For example, here it is sub third() which is at fault, and calling caller() in sub first(), where the warning is triggered, would only point us to sub second(), which would not help us very much:
sub third{
second();
}
sub second{
my $var = shift;
first($var);
}
sub first{
my $var = shift;
print "Var = $var\n"
}
The solution is quite simple. What we need is a full call stack trace leading to the call that triggered the warning.
The Carp
module comes to our aid with its cluck() function. Let's modify the script by adding a couple of lines. The rest of the script is unchanged.
warnings2.pl
-----------
#!/usr/bin/perl -w
use Carp ();
local $SIG{__WARN__} = \&Carp::cluck;
correct();
incorrect();
sub correct{
print_value("Perl");
}
sub incorrect{
print_value();
}
sub print_value{
my $var = shift;
print "My value is $var\n";
}
Now when we execute it, we see:
Use of uninitialized value at ./warnings2.pl line 19.
main::print_value() called at ./warnings2.pl line 14
main::incorrect() called at ./warnings2.pl line 7
Take a moment to understand the call stack trace. The deepest calls are printed first. So the second line tells us that the warning was triggered in print_value(); the third, that print_value() was called by the incorrect() subroutine.
script => incorrect() => print_value()
We go into incorrect()
and indeed see that we forgot to pass the variable. Of course when you write a subroutine like print_value
it would be a good idea to check the passed arguments before starting execution. We omitted that step to contrive an easily debugged example.
Sure, you say, I could find that problem by simple inspection of the code!
Well, you're right. But I promise you that your task would be quite complicated and time consuming if your code has some thousands of lines. In addition, under mod_perl, certain uses of the eval
operator and "here documents" are known to throw off Perl's line numbering, so the messages reporting warnings and errors can have incorrect line numbers. (See Finding the Line Which Triggered the Error or Warning for more information).
Getting the trace helps a lot.
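To see what such a trace is made of, here is a minimal sketch that walks the call stack by hand with caller(). It is only an illustration of the idea; Carp::cluck() does this for us and formats the result better. The helper name stack_trace() and the subs outer() and inner() are made up for this example.

```perl
#!/usr/bin/perl -w
use strict;

# Hypothetical helper: collect one "sub called at file line N" entry
# per stack frame by asking caller() about each level in turn.
sub stack_trace {
    my @frames;
    my $level = 0;
    # caller() returns an empty list once we run past the outermost
    # frame, which ends the loop.
    while (my ($package, $file, $line, $sub) = caller($level++)) {
        push @frames, "$sub called at $file line $line";
    }
    return @frames;
}

# Two made-up subs, so that the trace has some depth.
sub outer { return inner() }
sub inner { return stack_trace() }

# The deepest call comes first, as in the cluck() output.
print "$_\n" for outer();
```

Each entry names the subroutine that was called and the place it was called from, which is the same information the cluck() trace shows.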
Variables Globally, Lexically Scoped And Fully Qualified
META: complete
META: I should say something in here first about symbol tables. Advanced Perl Programming/Perl Guts chapter is a good source for that
Also see the clarification of my()
vs. use vars
- Ken Williams writes:
Yes, there is quite a bit of difference! With use vars(), you are
making an entry in the symbol table, and you are telling the
compiler that you are going to be referencing that entry without an
explicit package name.
With my(), NO ENTRY IS PUT IN THE SYMBOL TABLE. The compiler
figures out at compile time which my() variables (i.e. lexical
variables) are the same as each other, and once you hit execute time
you cannot go looking those variables up in the symbol table.
And my()
vs. local()
- Randal Schwartz writes:
local() creates a temporal-limited package-based scalar, array,
hash, or glob -- when the scope of definition is exited at runtime,
the previous value (if any) is restored. References to such a
variable are *also* global... only the value changes. (Aside: that
is what causes variable suicide. :)
my() creates a lexically-limited non-package-based scalar, array, or
hash -- when the scope of definition is exited at compile-time, the
variable ceases to be accessible. Any references to such a variable
at runtime turn into unique anonymous variables on each scope exit.
Additional reading references
For more information see: Using global variables and sharing them between modules/packages and an article by Mark-Jason Dominus about how Perl handles variables and name-spaces, and the difference between use vars()
and my()
- http://www.plover.com/~mjd/perl/FAQs/Namespaces.html .
my() Scoped Variable in Nested Subroutines
Before we proceed let's make the assumption that we want to develop the code under the strict
pragma. We will use lexically scoped variables (with the help of the my() operator) whenever possible.
The Poison
Let's look at this code:
nested.pl
-----------
#!/usr/bin/perl
use strict;
sub print_power_of_2 {
my $x = shift;
sub power_of_2 {
return $x ** 2;
}
my $result = power_of_2();
print "$x^2 = $result\n";
}
print_power_of_2(5);
print_power_of_2(6);
Don't let the weird subroutine names fool you: the print_power_of_2() subroutine should print the square of the number passed to it. Let's run the code and see whether it works:
% ./nested.pl
5^2 = 25
6^2 = 25
Ouch, something is wrong. Maybe there is a bug in Perl and it doesn't work correctly with the number 6? Let's try again using 5 and 7:
print_power_of_2(5);
print_power_of_2(7);
And run it:
% ./nested.pl
5^2 = 25
7^2 = 25
Wow, does it work only for 5? How about using 3 and 5:
print_power_of_2(3);
print_power_of_2(5);
and the result is:
% ./nested.pl
3^2 = 9
5^2 = 9
Now we start to understand--only the first call to the print_power_of_2() function works correctly. Which makes us think that our code has some kind of memory for results of the first execution, or it ignores the arguments in subsequent executions.
The Diagnosis
Let's follow the guidelines and use the -w
flag. Now execute the code:
% ./nested.pl
Variable "$x" will not stay shared at ./nested.pl line 9.
5^2 = 25
6^2 = 25
We have never seen such a warning message before and we don't quite understand what it means. The diagnostics
pragma will certainly help us. Let's prepend this pragma before the strict
pragma in our code:
#!/usr/bin/perl -w
use diagnostics;
use strict;
And execute it:
% ./nested.pl
Variable "$x" will not stay shared at ./nested.pl line 10 (#1)
(W) An inner (nested) named subroutine is referencing a lexical
variable defined in an outer subroutine.
When the inner subroutine is called, it will probably see the value of
the outer subroutine's variable as it was before and during the
*first* call to the outer subroutine; in this case, after the first
call to the outer subroutine is complete, the inner and outer
subroutines will no longer share a common value for the variable. In
other words, the variable will no longer be shared.
Furthermore, if the outer subroutine is anonymous and references a
lexical variable outside itself, then the outer and inner subroutines
will never share the given variable.
This problem can usually be solved by making the inner subroutine
anonymous, using the sub {} syntax. When inner anonymous subs that
reference variables in outer subroutines are called or referenced,
they are automatically rebound to the current values of such
variables.
5^2 = 25
6^2 = 25
Well, now everything is clear. We have the inner subroutine power_of_2() and the outer subroutine print_power_of_2() in our code.
When the inner power_of_2() subroutine is called for the first time, it sees the value of the outer print_power_of_2() subroutine's $x
variable. On subsequent calls the $x
variable won't be updated, no matter what its value is in the outer subroutine. There are two copies of the $x
variable, no longer a single one shared by the two routines.
The Remedy
The diagnostics
pragma suggests that the problem can be solved by making the inner subroutine anonymous.
An anonymous subroutine can act as a closure with respect to lexically scoped variables. Basically this means that if you define a subroutine in a particular lexical context at a particular moment, then it will run in that same context later, even if called from outside that context. The upshot of this is that when the subroutine runs, you get the same copies of the lexically scoped variables which were visible when the subroutine was defined. So you can pass arguments to a function when you define it, as well as when you invoke it.
Let's rewrite the code to use this technique:
anonymous.pl
--------------
#!/usr/bin/perl
use strict;
sub print_power_of_2 {
my $x = shift;
my $func_ref = sub {
return $x ** 2;
};
my $result = &$func_ref();
print "$x^2 = $result\n";
}
print_power_of_2(5);
print_power_of_2(6);
Now $func_ref
contains a reference to an anonymous function, which we later use when we need to get the power of two. (In Perl, a function is the same thing as a subroutine.) Since it is anonymous, the function will automatically be rebound to the new value of the outer scoped variable $x, and the results will now be as expected.
Let's verify:
% ./anonymous.pl
5^2 = 25
6^2 = 36
Indeed, anonymous.pl worked as we expected.
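The remark that a closure lets you pass arguments to a function when you define it can be illustrated with a small sketch. The helper name make_power() is hypothetical; it returns an anonymous subroutine that remembers the lexical $power that was in scope when the subroutine was created.

```perl
#!/usr/bin/perl -w
use strict;

# Hypothetical helper: build an anonymous sub bound to its own $power.
sub make_power {
    my $power = shift;
    return sub {
        my $x = shift;
        return $x ** $power;   # $power is the copy captured at creation
    };
}

my $square = make_power(2);    # the argument 2 is passed at definition
my $cube   = make_power(3);    # a second closure with its own $power

print "5^2 = ", $square->(5), "\n";   # prints: 5^2 = 25
print "5^3 = ", $cube->(5), "\n";     # prints: 5^3 = 125
```

Each call to make_power() creates a fresh $power, so the two closures do not interfere with each other.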
When You Cannot Get Rid of The Inner Subroutine
First you might wonder, why in the world will someone need to define an inner subroutine? Well, for example to reduce some of Perl's script startup overhead you might decide to write a daemon that will compile the scripts and modules only once, and cache the pre-compiled code in memory. When some script is to be executed, you just tell the daemon the name of the script to run and it will do the rest and do it much faster.
Seems like an easy task, and it is. The only problem is once the script is compiled, how do you execute it? Or let's put it the other way: after it was executed for the first time and it stays compiled in the daemon memory, how do you call it again? If you could get all developers to code the scripts so each has a subroutine called run() that will actually execute the code in the script then you have half of the problem solved.
But how does the daemon know to refer to some specific script if they all run in the main::
name space? One solution might be to ask the developers to declare a package in each and every script, and for the package name to be derived from the script name. However, since there is a chance that there will be more than one script with the same name but residing in different directories, the directory has to be a part of the package name too in order to prevent name-space collisions. And don't forget that a script may be moved from one directory to another, so you will have to make sure that the package name is corrected every time the script gets moved.
But why enforce these strange rules on developers, when we can arrange for our daemon to do this work? Every script that the daemon is about to execute for the first time should be wrapped inside a package whose name is constructed from the mangled path to the script, and inside a subroutine called run(). For example if the daemon is about to execute the script /tmp/hello.pl:
hello.pl
--------
#!/usr/bin/perl
print "Hello\n";
Prior to running it, the daemon will change the code to be:
wrapped_hello.pl
----------------
package cache::tmp::hello_2epl;
sub run{
#!/usr/bin/perl
print "Hello\n";
}
The package name is constructed from the prefix cache::
, each directory separation slash is replaced with ::
, and non alphanumeric characters are encoded so that for example .
(a dot) becomes _2e
(an underscore followed by the ASCII code for a dot in hex representation).
% perl -e 'printf "%x",ord(".")'
prints: 2e
. The underscore plays the same role as the %
character in URL encoding (%2E
), but since %
has a special meaning in Perl (it is the prefix of a hash variable) it couldn't be used.
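The mangling rules just described can be sketched in a few lines. The function name path_to_package() is hypothetical, and this is only an illustration of the rules as stated here, not the code of any real daemon or module. It does not deal with corner cases such as a path component that starts with a digit.

```perl
#!/usr/bin/perl -w
use strict;

# Hypothetical helper: derive a package name from a script path.
sub path_to_package {
    my $path = shift;
    # Encode every character that is neither alphanumeric nor a slash
    # as an underscore followed by its two-digit hex code.
    $path =~ s/([^A-Za-z0-9\/])/sprintf("_%02x", ord $1)/eg;
    # Turn each directory separator into a package separator.
    $path =~ s{/}{::}g;
    # The leading slash has become the "::" that follows the prefix.
    return "cache" . $path;
}

print path_to_package("/tmp/hello.pl"), "\n";   # prints: cache::tmp::hello_2epl
```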
Now when the daemon is requested to execute the script /tmp/hello.pl, all it has to do is to build the package name as before based on the location of the script and call its run() subroutine:
use cache::tmp::hello_2epl;
cache::tmp::hello_2epl::run();
We have just written a partial prototype of the daemon we desired. The only method now remaining undefined is how to pass the path to the script to the daemon. This detail is left to the reader as an exercise.
If you are familiar with the Apache::Registry
module, you know that it works in almost the same way. It uses a different package prefix and the generic function is called handler() and not run(). The scripts to run are passed through the HTTP protocol's headers.
Now you understand that there are cases where your normal subroutines can become inner, since if your script was a simple:
simple.pl
---------
#!/usr/bin/perl
sub hello { print "Hello" }
hello();
Wrapped into a run() subroutine it becomes:
simple.pl
---------
package cache::simple_2epl;
sub run{
#!/usr/bin/perl
sub hello { print "Hello" }
hello();
}
Therefore, hello() is an inner subroutine and if you have used my() scoped variables defined and altered outside and used inside hello(), it won't work as you expect starting from the second call, as was explained in the previous section.
Remedies for Inner Subroutines
First of all, there is nothing to worry about as long as you don't forget to turn warnings on. If you do happen to have the "my() Scoped Variable in Nested Subroutines" problem, Perl will always alert you.
Given that you have a script that has this problem, what are the ways to solve it? There are many of them and we will discuss some of them here.
We will use the following code to show the different solutions.
multirun.pl
-----------
#!/usr/bin/perl -w
use strict;
for (1..3){
print "run: [time $_]\n";
run();
}
sub run {
my $counter = 0;
increment_counter();
increment_counter();
sub increment_counter{
$counter++;
print "Counter is equal to $counter !\n";
}
} # end of sub run
This code executes the run() subroutine three times, which in turn initializes the $counter
variable to 0 every time it is executed and then calls the inner subroutine increment_counter() twice. Sub increment_counter() prints $counter
's value after incrementing it. One might expect to see the following output:
run: [time 1]
Counter is equal to 1 !
Counter is equal to 2 !
run: [time 2]
Counter is equal to 1 !
Counter is equal to 2 !
run: [time 3]
Counter is equal to 1 !
Counter is equal to 2 !
But as we have already learned from the previous sections, this is not what we are going to see. Indeed, when we run the script we see:
% ./multirun.pl
Variable "$counter" will not stay shared at ./multirun.pl line 18.
run: [time 1]
Counter is equal to 1 !
Counter is equal to 2 !
run: [time 2]
Counter is equal to 3 !
Counter is equal to 4 !
run: [time 3]
Counter is equal to 5 !
Counter is equal to 6 !
Obviously, the $counter
variable is not reinitialized on each execution of run(). It retains its value from the previous execution, and sub increment_counter() increments that.
One of the workarounds is to use globally declared variables, with the vars
pragma.
multirun1.pl
-----------
#!/usr/bin/perl -w
use strict;
use vars qw($counter);
for (1..3){
print "run: [time $_]\n";
run();
}
sub run {
$counter = 0;
increment_counter();
increment_counter();
sub increment_counter{
$counter++;
print "Counter is equal to $counter !\n";
}
} # end of sub run
If you run this and the other solutions offered below, the expected output will be generated:
% ./multirun1.pl
run: [time 1]
Counter is equal to 1 !
Counter is equal to 2 !
run: [time 2]
Counter is equal to 1 !
Counter is equal to 2 !
run: [time 3]
Counter is equal to 1 !
Counter is equal to 2 !
By the way, the warning we saw before has gone, and so has the problem, since there is no my()
(lexically defined) variable used in the nested subroutine.
Another approach is to use fully qualified variables. This is better, since less memory will be used, but it adds a typing overhead:
multirun2.pl
-----------
#!/usr/bin/perl -w
use strict;
for (1..3){
print "run: [time $_]\n";
run();
}
sub run {
$main::counter = 0;
increment_counter();
increment_counter();
sub increment_counter{
$main::counter++;
print "Counter is equal to $main::counter !\n";
}
} # end of sub run
You can also pass the variable to the subroutine by value and make the subroutine return it after it was updated. This adds time and memory overheads, so it may not be a good idea if the variable can be very large, or if speed of execution is an issue.
Don't rely on the fact that the variable is small during the development of the application; it can grow quite big in situations you don't expect. For example, a very simple HTML form text entry field can return a few megabytes of data if one of your users is bored and wants to test how good your code is. It's not uncommon to see users Copy-and-Paste 10Mb core dump files into a form's text fields and then submit them for your script to process.
multirun3.pl
-----------
#!/usr/bin/perl -w
use strict;
for (1..3){
print "run: [time $_]\n";
run();
}
sub run {
my $counter = 0;
$counter = increment_counter($counter);
$counter = increment_counter($counter);
sub increment_counter{
my $counter = shift || 0 ;
$counter++;
print "Counter is equal to $counter !\n";
return $counter;
}
} # end of sub run
Finally, you can use references to do the job. The version of increment_counter() below accepts a reference to the $counter
variable and increments its value after first dereferencing it. When you use a reference, the variable you use inside the function is physically the same bit of memory as the one outside the function. This technique is often used to enable a called function to modify variables in a calling function.
multirun4.pl
-----------
#!/usr/bin/perl -w
use strict;
for (1..3){
print "run: [time $_]\n";
run();
}
sub run {
my $counter = 0;
increment_counter(\$counter);
increment_counter(\$counter);
sub increment_counter{
my $r_counter = shift;
$$r_counter++;
print "Counter is equal to $$r_counter !\n";
}
} # end of sub run
Here is yet another and more obscure reference usage. We modify the value of $counter
inside the subroutine by using the fact that variables in @_
are aliases for the actual scalar parameters. Thus if you called a function with two arguments, those would be stored in $_[0]
and $_[1]
. In particular, if an element $_[0]
is updated, the corresponding argument is updated (or an error occurs if it is not updatable).
multirun5.pl
-----------
#!/usr/bin/perl -w
use strict;
for (1..3){
print "run: [time $_]\n";
run();
}
sub run {
my $counter = 0;
increment_counter($counter);
increment_counter($counter);
sub increment_counter{
$_[0]++;
print "Counter is equal to $_[0] !\n";
}
} # end of sub run
Now you have at least five workarounds to choose from.
For more information please refer to perlref and perlsub manpages.
use(), require(), do(), %INC and @INC Explained
The @INC array
@INC
is a special Perl variable which is the equivalent of the shell's PATH
variable. Whereas PATH
contains a list of directories to search for executables, @INC
contains a list of directories from which Perl modules and libraries can be loaded.
When you use(), require() or do() a filename or a module, Perl gets a list of directories from the @INC
variable and searches them for the file it was requested to load. If the file that you want to load is not located in one of the listed directories, you have to tell Perl where to find the file. You can either provide a path relative to one of the directories in @INC
, or you can provide the full path to the file.
The %INC hash
%INC
is another special Perl variable that is used to cache the names of the files and the modules that were successfully loaded and compiled by use(), require() or do() functions. Before attempting to load a file or a module with use() or require(), Perl checks whether it's already in the %INC
hash. If it's there, the loading and therefore the compilation are not performed at all. Otherwise the file is loaded into memory and an attempt is made to compile it.
If the file is successfully loaded and compiled, a new key-value pair is added to %INC
. The key is the name of the file or module as it was passed to one of the three functions we have just mentioned, and if it was found in any of the @INC
directories except "."
the value is the full path to it in the file system.
The following examples will make it easier to understand the logic.
First, let's see what @INC
contains on my system:
% perl -e 'print join "\n", @INC'
/usr/lib/perl5/5.00503/i386-linux
/usr/lib/perl5/5.00503
/usr/lib/perl5/site_perl/5.005/i386-linux
/usr/lib/perl5/site_perl/5.005
.
Notice the .
(current directory) is the last directory in the list.
Now let's load the module strict.pm
and see the contents of %INC
:
% perl -e 'use strict; print map {"$_ => $INC{$_}\n"} keys %INC'
strict.pm => /usr/lib/perl5/5.00503/strict.pm
Since strict.pm
was found in /usr/lib/perl5/5.00503/ directory and /usr/lib/perl5/5.00503/ is a part of @INC
, %INC
includes the full path as the value for the key strict.pm
.
Now let's create the simplest module in /tmp/test.pm
:
test.pm
-------
1;
It does nothing, but returns a true value when loaded. Now let's load it in different ways:
% cd /tmp
% perl -e 'use test; print map {"$_ => $INC{$_}\n"} keys %INC'
test.pm => test.pm
Since the file was found relative to .
(the current directory), the relative path is inserted as the value. If we alter @INC
, by adding /tmp to the end:
% cd /tmp
% perl -e 'BEGIN{push @INC, "/tmp"} use test; \
print map {"$_ => $INC{$_}\n"} keys %INC'
test.pm => test.pm
Here we still get the relative path, since the module was found first relative to "."
. The directory /tmp was placed after .
in the list. If we execute the same code from a different directory, the "."
directory won't match,
% cd /
% perl -e 'BEGIN{push @INC, "/tmp"} use test; \
print map {"$_ => $INC{$_}\n"} keys %INC'
test.pm => /tmp/test.pm
so we get the full path. We can also prepend the path with unshift(), so it will be used for matching before "."
and therefore we will get the full path as well:
% cd /tmp
% perl -e 'BEGIN{unshift @INC, "/tmp"} use test; \
print map {"$_ => $INC{$_}\n"} keys %INC'
test.pm => /tmp/test.pm
The code:
BEGIN{unshift @INC, "/tmp"}
can be replaced with the more elegant:
use lib "/tmp";
This does essentially the same thing as the BEGIN block above.
These approaches to modifying @INC
can be labor intensive, since if you want to move the script around in the file-system you have to modify the path. This can be painful, for example, when you move your scripts from development to a production server.
There is a module called FindBin
which solves this problem in the plain Perl world, but unfortunately it won't work under mod_perl, since it's a module and as any module it's loaded only once. So the first script using it will have all the settings correct, but the rest of the scripts will not if located in a different directory from the first.
For the completeness of this section, I'll present this module anyway.
If you use this module, you don't need to write a hard coded path. The following snippet does all the work for you (the file is /tmp/load.pl):
load.pl
-------
#!/usr/bin/perl
use FindBin ();
use lib "$FindBin::Bin";
use test;
print "test.pm => $INC{'test.pm'}\n";
In the above example $FindBin::Bin
is equal to /tmp. If we move the script somewhere else, e.g. to /tmp/x, then in the code above $FindBin::Bin
equals /tmp/x.
% /tmp/load.pl
test.pm => /tmp/test.pm
Just like with use lib
but no hard coded path required.
You can use this workaround to make it work under mod_perl.
do 'FindBin.pm';
unshift @INC, "$FindBin::Bin";
require test;
#maybe test::import( ... ) here if need to import stuff
You will have a slight overhead because you will load from disk and recompile the FindBin
module on each request, so it may not be worth it.
Modules, Libraries and Files
Before we proceed, let's define what we mean by module, and library or file.
The Library or the File
A file which contains perl subroutines and other code.
It generally doesn't include a package declaration.
Its last statement returns true.
It can be named in any way desired, but generally its extension is .pl or .ph.
Examples:
config.pl
----------
$dir = "/home/httpd/cgi-bin";
$cgi = "/cgi-bin";
1;
mysubs.pl
----------
sub print_header{
print "Content-type: text/plain\r\n\r\n";
}
1;
The Module
A file which contains perl subroutines and other code.
It generally declares a package name at the beginning of it.
Its last statement returns true.
The naming convention requires it to have a .pm extension.
Example:
MyModule.pm
-----------
package My::Module;
$My::Module::VERSION = 0.01;
sub new{ return bless {}, shift;}
END { print "Quitting\n"}
1;
require()
require() reads a file containing Perl code and compiles it. Before attempting to load the file it looks up the argument in %INC
to see whether it has already been loaded. If it has, require() just returns without doing a thing. Otherwise an attempt will be made to load and compile the file.
require() has to find the file it has to load. If the argument is a full path to the file, it just tries to read it. For example:
require "/home/httpd/perl/mylibs.pl";
If the path is relative, require() will attempt to search for the file in all the directories listed in @INC
. For example:
require "mylibs.pl";
If there is more than one occurrence of the file with the same name in the directories listed in @INC
the first occurrence will be used.
The file must return TRUE as the last statement to indicate successful execution of any initialization code. Since you never know what changes the file will go through in the future, you cannot be sure that the last statement will always return TRUE. That's why the suggestion is to put "1;
" at the end of file.
Although you should use the real filename for most files, if the file is a module, you may use the following convention instead:
require My::Module;
This is equal to:
require "My/Module.pm";
If require() fails to load the file, either because it couldn't find the file in question, or because the code failed to compile, or because it didn't return TRUE, then the program will die(). To prevent this, the require() statement can be enclosed in an eval() block, as in this example:
require.pl
----------
#!/usr/bin/perl -w
eval { require "/file/that/does/not/exists"};
if ($@) {
print "Failed to load, because : $@"
}
print "\nHello\n";
When we execute the program:
% ./require.pl
Failed to load, because : Can't locate /file/that/does/not/exists in
@INC (@INC contains: /usr/lib/perl5/5.00503/i386-linux
/usr/lib/perl5/5.00503 /usr/lib/perl5/site_perl/5.005/i386-linux
/usr/lib/perl5/site_perl/5.005 .) at require.pl line 3.
Hello
We see that the program didn't die(), because Hello was printed. This trick is useful when you want to check whether a user has some module installed and its absence is not critical: the program can then run without the module, with reduced functionality.
If we remove the eval() part and try again:
require1.pl
-----------
#!/usr/bin/perl -w
require "/file/that/does/not/exists";
print "\nHello\n";
% ./require1.pl
Can't locate /file/that/does/not/exists in @INC (@INC contains:
/usr/lib/perl5/5.00503/i386-linux /usr/lib/perl5/5.00503
/usr/lib/perl5/site_perl/5.005/i386-linux
/usr/lib/perl5/site_perl/5.005 .) at require1.pl line 3.
The program just die()s in the last example, which is what you want in most cases.
For more information refer to the perlfunc manpage.
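The eval-wrapped require() shown earlier in this section can be packaged as a small test for optional modules. This is only a sketch: the helper name have_module() and the module name No::Such::Module::Here are made up for the example, while Carp is used as a module that ships with Perl.

```perl
#!/usr/bin/perl -w
use strict;

# Hypothetical helper: return 1 if the named module can be loaded,
# 0 otherwise, without letting a failed require() kill the program.
sub have_module {
    my $module = shift;
    # Do by hand what "require My::Module" does: My::Module -> My/Module.pm
    (my $file = "$module.pm") =~ s{::}{/}g;
    return eval { require $file; 1 } ? 1 : 0;
}

for my $module (qw(Carp No::Such::Module::Here)) {
    my $state = have_module($module) ? "available" : "missing";
    print "$module is $state\n";
}
```

A program could call such a helper once at startup and switch off the features that depend on a missing module.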
use()
use(), just like require(), loads and compiles files containing Perl code, but it works with modules only. The only way to pass a module to load is by its module name and not its filename. If the module is located in MyCode.pm, the correct way to use() it is:
use MyCode
and not:
use "MyCode.pm"
use() translates the passed argument into a file name replacing ::
with /
and appending .pm at the end. So My::Module
becomes My/Module.pm.
use() is exactly equivalent to:
BEGIN { require Module; import Module LIST; }
Internally it calls require() to do the loading and compilation chores. When require() finishes its job, import() is called unless ()
is the second argument. The following pairs are equivalent:
use MyModule;
BEGIN {require MyModule; import MyModule; }
use MyModule qw(foo bar);
BEGIN {require MyModule; import MyModule ("foo","bar"); }
use MyModule ();
BEGIN {require MyModule; }
The first pair exports the default tags. This happens if the module sets @EXPORT
to a list of tags to be exported by default. The module's manpage generally describes which tags are exported by default.
The second pair exports all the tags passed as arguments. No default tags are exported unless explicitly told to.
The third pair describes the case where the caller does not want any symbols to be imported.
import()
is not a builtin function, it's just an ordinary static method call into the "MyModule
" package to tell the module to import the list of features back into the current package. See the Exporter manpage for more information.
When you write your own modules, always remember that it's better to use @EXPORT_OK
instead of @EXPORT
, since the former doesn't export symbols unless it was asked to. Exports pollute the namespace of the module user. Also avoid short or common symbol names to reduce the risk of name clashes.
When functions and variables aren't exported you can still access them using their full names, like $My::Module::bar
or My::Module::foo()
. By convention you can use a leading underscore on names to informally indicate that they are internal and not for public use.
There's a corresponding "no" command that un-imports symbols imported by use, i.e., it calls unimport Module LIST instead of import().
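The most common place to meet "no" is with pragmas. The sketch below, with a made-up greet() subroutine, switches off one strict check for a single block only:

```perl
use strict;

sub greet { return "hello" }

my $name = 'greet';

# Under "use strict" calling a sub through its name in a string is fatal.
my $result = eval { &$name() };
print "strict refs: ", (defined $result ? $result : "refused"), "\n";

{
    no strict 'refs';          # un-imports the 'refs' check
    $result = &$name();        # symbolic reference allowed in this block only
}
print "no strict refs: $result\n";
```

The first call is refused and the second returns "hello". Outside the block the strict check is in force again.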
do()
While do() behaves almost identically to require(), it reloads the file unconditionally. It doesn't check %INC to see whether the file was already loaded.
If do() cannot read the file, it returns undef and sets $! to report the error. If do() can read the file but cannot compile it, it returns undef and sets an error message in $@. If the file is successfully compiled, do() returns the value of the last expression evaluated.
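The outcomes described above can be told apart as in the following sketch. The helpers write_file() and try_do() are hypothetical, and the files are created in a temporary directory only to keep the example self-contained.

```perl
use strict;
use File::Temp qw(tempdir);
use File::Spec;

my $dir = tempdir(CLEANUP => 1);

# Write a small Perl file into the temporary directory; return its path.
sub write_file {
    my ($name, $code) = @_;
    my $path = File::Spec->catfile($dir, $name);
    open my $fh, '>', $path or die "Cannot write $path: $!";
    print $fh $code;
    close $fh;
    return $path;
}

# Hypothetical wrapper that reports which outcome do() produced.
sub try_do {
    my $file = shift;
    $@ = '';                                   # forget any earlier error
    my $value = do $file;
    return "value: $value"           if defined $value;
    return "failed to compile: $@"   if $@;
    return "cannot read: $!";
}

print try_do(write_file('good.pl', "2 + 3;\n")), "\n";          # value: 5
print try_do(write_file('bad.pl',  "sub broken {\n")), "\n";    # failed to compile: ...
print try_do(File::Spec->catfile($dir, 'missing.pl')), "\n";    # cannot read: ...

# do() reloads unconditionally: the file runs again on every call.
my $counter = write_file('count.pl', "++\$main::loads;\n");
print try_do($counter), "\n" for 1 .. 2;       # value: 1, then value: 2
```

A file whose last expression is itself undef looks like a failure to this wrapper, so files loaded this way usually end with a true value.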
Using Global Variables and Sharing Them Between Modules/Packages
Making Variables Global
When you first wrote $x in your code you created a global variable. It is visible everywhere in the file where you used it. If you defined it inside a package, it is visible inside that package. But this works only if you do not use the strict pragma, and you HAVE to use this pragma if you want to run your scripts under mod_perl. Read The strict pragma to find out why.
Making Variables Global With strict Pragma On
First you use:
use strict;
Then you use:
use vars qw($scalar %hash @array);
Starting from this moment the variables are global only in the package where you defined them. If you want to share global variables between packages, here is what you can do.
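A minimal sketch of package-scoped globals declared with use vars follows. The package My::Counter and its variables are made up for illustration.

```perl
package My::Counter;
use strict;
use vars qw($count @history);    # declared as globals of My::Counter

$count = 0;

sub bump {
    my $label = shift;
    $count++;                    # short names work inside the package
    push @history, $label;
    return $count;
}

package main;
use strict;

My::Counter::bump('first');
My::Counter::bump('second');

# Outside the package the short name $count is not declared, so the
# fully qualified name is required.
print "count:   $My::Counter::count\n";      # count:   2
print "history: @My::Counter::history\n";    # history: first second
```

Writing a bare $count in package main would be a compile-time error under strict, which is exactly the protection the pragma is there to give.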
Using Exporter.pm to Share Global Variables
Assume that you want to share the CGI.pm object (I will use $q) between your modules. For example, you create it in script.pl, but you want it to be visible in My::HTML. First, you make $q global.
script.pl:
----------------
use vars qw($q);
use CGI;
use lib qw(.);
use My::HTML qw($q); # My/HTML.pm is in the same dir as script.pl
$q = new CGI;
My::HTML::printmyheader();
----------------
Note that we have imported $q from My::HTML. And My::HTML does the export of $q:
My/HTML.pm
----------------
package My::HTML;
use strict;
BEGIN {
use Exporter ();
@My::HTML::ISA = qw(Exporter);
@My::HTML::EXPORT = qw();
@My::HTML::EXPORT_OK = qw($q);
}
use vars qw($q);
sub printmyheader{
# Whatever you want to do with $q... e.g.
print $q->header();
}
1;
-------------------
So $q is shared between the My::HTML package and script.pl. It will work vice versa as well, if you create the object in My::HTML but use it in script.pl. You have true sharing, since if you change $q in script.pl, it will be changed in My::HTML as well.
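The sharing can be checked in a single file. This is only a sketch: My::HTML is defined inline, so import() is called by hand where a real script would say use My::HTML qw($q), a plain string stands in for the CGI object, and current() is a made-up accessor.

```perl
use strict;

package My::HTML;
use Exporter ();
use vars qw(@ISA @EXPORT_OK $q);
@ISA       = qw(Exporter);
@EXPORT_OK = qw($q);

sub current { return $q }        # reads $My::HTML::q

package main;
use vars qw($q);

My::HTML->import('$q');          # $main::q now names the same scalar

$q = 'set in script';
print My::HTML::current(), "\n"; # set in script

$My::HTML::q = 'set in My::HTML';
print "$q\n";                    # set in My::HTML
```

Both names refer to one scalar, so there is no copy that could get out of date.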
What if you need to share $q between more than two packages? For example you want My::Doc to share $q as well.
You leave My::HTML untouched, and modify script.pl to include:
use My::Doc qw($q);
Then you write My::Doc exactly like My::HTML - except of course that the content is different :).
One possible pitfall is when you want to use My::Doc in both My::HTML and script.pl. Only if you add
use My::Doc qw($q);
into My::HTML will $q be shared. Otherwise My::Doc will not share $q any more. To make things clear here is the code:
script.pl:
----------------
use vars qw($q);
use CGI;
use lib qw(.);
use My::HTML qw($q); # My/HTML.pm is in the same dir as script.pl
use My::Doc qw($q); # Ditto
$q = new CGI;
My::HTML::printmyheader();
----------------
My/HTML.pm
----------------
package My::HTML;
use strict;
BEGIN {
use Exporter ();
@My::HTML::ISA = qw(Exporter);
@My::HTML::EXPORT = qw();
@My::HTML::EXPORT_OK = qw($q);
}
use vars qw($q);
use My::Doc qw($q);
sub printmyheader{
# Whatever you want to do with $q... e.g.
print $q->header();
My::Doc::printtitle('Guide');
}
1;
-------------------
My/Doc.pm
----------------
package My::Doc;
use strict;
BEGIN {
use Exporter ();
@My::Doc::ISA = qw(Exporter);
@My::Doc::EXPORT = qw();
@My::Doc::EXPORT_OK = qw($q);
}
use vars qw($q);
sub printtitle{
my $title = shift || 'None';
print $q->h1($title);
}
1;
-------------------
Using the Perl Aliasing Feature to Share Global Variables
As the title says you can import a variable into a script or module without using Exporter.pm. I have found it useful to keep all the configuration variables in one module My::Config. But then I have to export all the variables in order to use them in other modules, which is bad for two reasons: it pollutes other packages' name spaces with extra tags, which increases the memory requirements; and it adds the overhead of keeping track of which variables should be exported from the configuration module and which imported for some particular package. I solve this problem by keeping all the variables in one hash %c and exporting that. Here is an example of My::Config:
package My::Config;
use strict;
use vars qw(%c);
%c = (
# All the configs go here
scalar_var => 5,
array_var => [
'foo',
'bar',
],
hash_var => {
foo => 'Foo',
bar => 'BARRR',
},
);
1;
Now in packages that want to use the configuration variables I have either to use the fully qualified names like $My::Config::c{scalar_var}, which I dislike, or import them as described in the previous section. But hey, since we have only one variable to handle, we can make things even simpler and save the loading of the Exporter.pm package. We will use the Perl aliasing feature for exporting and saving the keystrokes:
package My::HTML;
use strict;
use lib qw(.);
# Global Configuration now aliased to global %c
use My::Config (); # My/Config.pm in the same dir as script.pl
use vars qw(%c);
*c = \%My::Config::c;
# Now you can access the variables from My::Config
print $c{scalar_var};
print $c{array_var}[0];
print $c{hash_var}{foo};
Of course %c is global everywhere you use it as described above, and if you change it somewhere it will affect any other packages you have aliased %My::Config::c to.
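The effect of the alias can be seen in a one-file sketch. Both packages are defined inline here purely for illustration, and scalar_var() is a made-up accessor.

```perl
use strict;

package My::Config;
use vars qw(%c);
%c = (
    scalar_var => 5,
    array_var  => [qw(foo bar)],
    hash_var   => { foo => 'Foo', bar => 'BARRR' },
);

package My::HTML;
use vars qw(%c);
*c = \%My::Config::c;          # %My::HTML::c is another name for %My::Config::c

sub scalar_var { return $c{scalar_var} }

package main;

print My::HTML::scalar_var(), "\n";          # 5

$My::Config::c{scalar_var} = 42;             # change through the original name
print My::HTML::scalar_var(), "\n";          # 42

$My::HTML::c{hash_var}{foo} = 'changed';     # change through the alias
print $My::Config::c{hash_var}{foo}, "\n";   # changed
```

Only the hash slot of the glob is assigned, so a scalar $c or an array @c in My::HTML would remain separate variables.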
Note that aliases work only with global or local() variables. You cannot write:
my *c = \%My::Config::c;
which is an error. But you can write:
local *c = \%My::Config::c;
For more information about aliasing, refer to the Camel book, second edition, pages 51-52.
The Scope of the Special Perl Variables
Special Perl variables like $| (buffering), $^T (script start time), $^W (warnings), $/ (input record separator), $\ (output record separator) and many more are all global variables. This means that you cannot scope them with my(). Only local() is permitted to do that. Since the child server doesn't usually exit, if in one of your scripts you modify a global variable it will be changed for the rest of the process' life and will affect all the scripts executed by the same process.
We will demonstrate the case on the input record separator variable. If you undefine this variable, the diamond operator will read in the whole file at once, provided you have enough memory. Remembering this, you should never write code like the example below.
$/ = undef;
open IN, "file" ....
# slurp it all into a variable
$all_the_file = <IN>;
The proper way is to have a local() keyword before the special variable is changed, like this:
local $/ = undef;
open IN, "file" ....
# slurp it all inside a variable
$all_the_file = <IN>;
But there is a catch. local() will propagate the changed value to all the code below it, including any subroutines called from there. The modified value will be in effect until the enclosing block ends; at file level that means until the script terminates, unless it is changed again somewhere else in the script.
A cleaner approach is to enclose the whole of the code that is affected by the modified variable in a block, like this:
{
local $/ = undef;
open IN, "file" ....
# slurp it all inside a variable
$all_the_file = <IN>;
}
That way when Perl leaves the block it restores the original value of the $/ variable, and you don't need to worry elsewhere in your program about its value being changed here.
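A subroutine body is a block too, so the same technique fits naturally into a small helper. The sketch below assumes a hypothetical slurp() function. It uses a lexical filehandle and three-argument open() rather than the bareword IN handle shown above, and a temporary file so that it can run on its own.

```perl
use strict;
use File::Temp qw(tempfile);

# Hypothetical helper: read a whole file in one go.
sub slurp {
    my $path = shift;
    local $/;                      # undef only until slurp() returns
    open my $fh, '<', $path or die "Cannot open $path: $!";
    my $content = <$fh>;
    close $fh;
    return $content;
}

my ($out, $path) = tempfile(UNLINK => 1);
print $out "line one\nline two\n";
close $out;

my $all = slurp($path);
print length($all), " characters read\n";                     # 18 characters read
print "separator restored\n" if defined $/ and $/ eq "\n";
```

Writing local $/ with no assignment already leaves the variable undefined, so the explicit = undef is optional.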
Compiled Regular Expressions
When using a regular expression that contains an interpolated Perl variable, if it is known that the variable (or variables) will not change during the execution of the program, a standard optimization technique is to add the /o modifier to the regexp pattern. This directs the compiler to build the internal table once, for the entire lifetime of the script, rather than every time the pattern is executed. Consider:
my $pat = '^foo$'; # likely to be input from an HTML form field
foreach( @list ) {
print if /$pat/o;
}
This is usually a big win in loops over lists, or when using grep() or map() operators.
In long-lived mod_perl scripts, however, the variable can change according to the invocation and this can pose a problem. The first invocation of a fresh httpd child will compile the regex and perform the search correctly. However, all subsequent uses by that child will continue to match the original pattern, regardless of the current contents of the Perl variables the pattern is supposed to depend on. Your script will appear to be broken.
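The effect is easy to reproduce outside mod_perl. In this sketch the made-up handler() subroutine stands in for a script that is run several times by the same process:

```perl
use strict;

# Stand-in for code that runs many times in one long-lived process.
sub handler {
    my ($pat, $text) = @_;
    return $text =~ /$pat/o ? "match" : "no match";
}

print handler('^foo$', 'foo'), "\n";   # match    (pattern compiled here)
print handler('^bar$', 'bar'), "\n";   # no match (still testing ^foo$)
print handler('^bar$', 'foo'), "\n";   # match    (still testing ^foo$)
```

The second and third calls pass a new pattern, yet the results show that the pattern from the first call is still the one being tested.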
There are two solutions to this problem:
The first is to use eval q//, to force the code to be evaluated each time. Just make sure that the eval block covers the entire loop of processing, and not just the pattern match itself.
The above code fragment would be rewritten as:
my $pat = '^foo$';
eval q{
foreach( @list ) {
print if /$pat/o;
}
};
Just saying:
foreach( @list ) {
eval q{ print if /$pat/o; };
}
is going to be a horribly expensive proposition.
You can use this approach if you require more than one pattern match operator in a given section of code. If the section contains only one operator (be it an m// or s///), you can rely on the property of the null pattern, that reuses the last pattern seen. This leads to the second solution, which also eliminates the use of eval.
The above code fragment becomes:
my $pat = '^foo$';
"foo" =~ /$pat/; # dummy match (MUST NOT FAIL!)
foreach( @list ) {
print if //;
}
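A runnable sketch of the same idea follows. The function name grep_with_null_pattern() is made up, and it assumes the caller can supply a string that is known to match, so that the dummy match is guaranteed to succeed.

```perl
use strict;

# Hypothetical helper: return the elements of @list that match $pat,
# compiling the pattern once per call.
sub grep_with_null_pattern {
    my ($pat, $known_match, @list) = @_;

    # Dummy match: makes /$pat/ the last successfully matched pattern.
    $known_match =~ /$pat/ or die "dummy match failed for /$pat/";

    my @hits;
    foreach (@list) {
        push @hits, $_ if //;      # empty pattern reuses the one above
    }
    return @hits;
}

print join(' ', grep_with_null_pattern('^fo+', 'foo', qw(foo bar food))), "\n";   # foo food
print join(' ', grep_with_null_pattern('ar$',  'bar', qw(foo bar food))), "\n";   # bar
```

Unlike /o, the pattern is recompiled on every call, so a changed $pat is picked up the next time the function runs.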
The only gotcha is that the dummy match that boots the regular expression engine must absolutely, positively succeed, otherwise the pattern will not be cached, and the // will match everything. If you can't count on fixed text to ensure the match succeeds, you have two possibilities.
If you can guarantee that the pattern variable contains no meta-characters (things like *, +, ^, $...), you can use the dummy match:
"$pat" =~ /\Q$pat\E/; # guaranteed if no meta-characters present
If there is a possibility that the pattern can contain meta-characters, you should search for the pattern or the non-searchable \377 character as follows:
"\377" =~ /$pat|^\377$/; # guaranteed if meta-characters present
Another approach:
It depends on the complexity of the regexp to which you apply this technique. One common usage where a compiled regexp is usually more efficient is to "match any one of a group of patterns" over and over again.
Maybe with a helper routine, it's easier to remember. Here is one slightly modified from Jeffrey Friedl's example in his book "Mastering Regular Expressions".
#####################################################
# Build_MatchMany_Function
# -- Input: list of patterns
# -- Output: A code ref which matches its $_[0]
# against ANY of the patterns given in the
# "Input", efficiently.
#
sub Build_MatchMany_Function {
my @R = @_;
my $expr = join '||', map { "\$_[0] =~ m/\$R[$_]/o" } ( 0..$#R );
my $matchsub = eval "sub { $expr }";
die "Failed in building regex @R: $@" if $@;
$matchsub;
}
Example usage:
@some_browsers = qw(Mozilla Lynx MSIE AmigaVoyager lwp libwww);
$Known_Browser=Build_MatchMany_Function(@some_browsers);
while (<ACCESS_LOG>) {
# ...
$browser = get_browser_field($_);
if ( ! &$Known_Browser($browser) ) {
print STDERR "Unknown Browser: $browser\n";
}
# ...
}
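For comparison, here is a sketch of a variant that is not taken from the book: it precompiles each pattern with the qr// operator (available from Perl 5.005) instead of generating code with a string eval. The name build_match_many_qr() and the sample user-agent strings are made up.

```perl
use strict;

# Variant of the helper above: compile every pattern once with qr//
# and return a closure that tries them in turn.
sub build_match_many_qr {
    my @compiled = map { qr/$_/ } @_;
    return sub {
        my $text = shift;
        foreach my $re (@compiled) {
            return 1 if $text =~ $re;
        }
        return 0;
    };
}

my $known_browser =
    build_match_many_qr(qw(Mozilla Lynx MSIE AmigaVoyager lwp libwww));

foreach my $agent ('Mozilla/4.0 (compatible; MSIE 5.0)', 'Lynx/2.8', 'SomeRobot/1.0') {
    print "Unknown Browser: $agent\n" unless $known_browser->($agent);
}
```

Only the third string is reported as unknown. Because the compiled patterns live inside the closure, a new list of patterns simply means building a new closure, which avoids the /o problem described earlier.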