NAME
Regexp::Common::Apache2 - Apache2 Expressions
SYNOPSIS
use Regexp::Common qw( Apache2 );
use Regexp::Common::Apache2 qw( $ap_true $ap_false );
while( <> )
{
my $pos = pos( $_ ) // 0;
/\G$RE{Apache2}{Word}/gmc and print "Found a word expression at pos $pos\n";
/\G$RE{Apache2}{Variable}/gmc and print "Found a variable $+{varname} at pos $pos\n";
}
# Override Apache2 expressions by the legacy ones
$RE{Apache2}{-legacy => 1};
# or use it with the Legacy prefix:
if( $str =~ /^$RE{Apache2}{LegacyVariable}$/ )
{
print( "Found variable $+{variable} with name $+{varname}\n" );
}
VERSION
v0.2.1
DESCRIPTION
This is the Perl port of Apache2 expressions.
The regular expressions have been designed based on the Apache2 Backus-Naur Form (BNF) definition as described below in "APACHE2 EXPRESSION".
You can also use the extended (legacy) patterns by calling Regexp::Common::Apache2 like this:
$RE{Apache2}{-legacy => 1}
All of the regular expressions use named capture. See "%+" in perlvar for more information on named capture.
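The SYNOPSIS loop relies on Perl's \G anchor combined with the /gc modifiers, and reads the named groups back from %+. The following core-Perl sketch shows that scanning technique on its own. The token patterns $var, $op and $num are hypothetical, simplified stand-ins, not the patterns provided by Regexp::Common::Apache2.

```perl
use strict;
use warnings;

# Hypothetical, simplified token patterns; NOT the ones shipped by
# Regexp::Common::Apache2.
my $var = qr/%\{(?<varname>\w+)\}/;
my $op  = qr/(?<op>-?(?:eq|ne|lt|le|gt|ge)|==|!=)/;
my $num = qr/(?<digits>[0-9]+)/;

my $input = '%{SERVER_PORT} -eq 443';
my @tokens;
pos( $input ) = 0;
while( pos( $input ) < length( $input ) )
{
    my $pos = pos( $input );
    # With /gc a failed match leaves pos() untouched, so the next
    # alternative is tried at the very same \G position.
    if( $input =~ /\G\s+/gc )
    {
        next;
    }
    elsif( $input =~ /\G$var/gc )
    {
        push( @tokens, "var:$+{varname}\@$pos" );
    }
    elsif( $input =~ /\G$op/gc )
    {
        push( @tokens, "op:$+{op}\@$pos" );
    }
    elsif( $input =~ /\G$num/gc )
    {
        push( @tokens, "num:$+{digits}\@$pos" );
    }
    else
    {
        die( "Unexpected input at position $pos\n" );
    }
}
print( join( "\n", @tokens ), "\n" );
```

Because /c keeps pos() after a failed attempt, the order of the elsif branches decides which pattern wins at a given position.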
APACHE2 EXPRESSION
comp
BNF:
stringcomp
| integercomp
| unaryop word
| word binaryop word
| word "in" listfunc
| word "=~" regex
| word "!~" regex
| word "in" "{" words "}"
$RE{Apache2}{Comp}
For example:
"Jack" != "John"
123 -ne 456
# etc
This uses other expressions, namely "stringcomp", "integercomp", "word", "listfunc", "regex" and "words".
The capture names are:
- comp
-
Contains the entire capture block
- comp_binary
-
Matches the expression that uses a binary operator, such as:
==, =, !=, <, <=, >, >=, -ipmatch, -strmatch, -strcmatch, -fnmatch
- comp_binaryop
-
The binary operator used if the expression is a binary comparison. Binary operators are:
==, =, !=, <, <=, >, >=, -ipmatch, -strmatch, -strcmatch, -fnmatch
- comp_integercomp
-
When the comparison is for an integer comparison as opposed to a string comparison.
- comp_list
-
Contains the list used to check a word against, such as:
"Jack" in {"John", "Peter", "Jack"}
- comp_listfunc
-
This contains the listfunc when the expression contains a word checked against a list function, such as:
"Jack" in listMe("some arguments")
- comp_regexp
-
The regular expression used when a word is compared to a regular expression, such as:
"Jack" =~ /\w+/
Here, comp_regexp would contain
/\w+/
- comp_regexp_op
-
The regular expression operator used when a word is compared to a regular expression, such as:
"Jack" =~ /\w+/
Here, comp_regexp_op would contain
=~
- comp_stringcomp
-
When the comparison is for a string comparison as opposed to an integer comparison.
- comp_unary
-
Matches the expression that uses unary operator, such as:
-d, -e, -f, -s, -L, -h, -F, -U, -A, -n, -z, -T, -R
For example:
-A /some/uri.html # (same as -U)
-d /some/folder # file is a directory
-e /some/folder/file.txt # file exists
-f /some/folder/file.txt # file is a regular file
-F /some/folder/file.txt # file is a regular file and is accessible to all (Apache2 does a sub query to check)
-h /some/folder/link.txt # true if file is a symbolic link
-n %{QUERY_STRING} # true if string is not empty (opposite of -z)
-s /some/folder/file.txt # true if file is not empty
-L /some/folder/link.txt # true if file is a symbolic link (same as -h)
-R 192.168.1.1/24 # remote IP matches this IP block; same as %{REMOTE_ADDR} -ipmatch 192.168.1.1/24
-T %{HTTPS} # false if string is empty, "0", "off", "false", or "no" (case insensitive). True otherwise.
-U /some/uri.html # check if the URI is accessible to all (Apache2 does a sub query to check)
-z %{QUERY_STRING} # true if string is empty (opposite of -n)
- comp_word
-
Contains the word that is the object of the comparison.
- comp_word_in_list
-
Contains the expression of a word checked against a list, such as:
"Jack" in {"John", "Peter", "Jack"}
- comp_word_in_listfunc
-
Contains the word when it is being compared to a listfunc, such as:
"Jack" in listMe("some arguments")
- comp_word_in_regexp
-
Contains the expression of a word checked against a regular expression, such as:
"Jack" =~ /\w+/
Here the word Jack would be captured in comp_word.
- comp_worda
-
Contains the first word in the comparison expression
- comp_wordb
-
Contains the second word in the comparison expression
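The -T operator listed under comp_unary follows a specific truth rule. This module only recognises the syntax and does not evaluate anything, so the following is merely a plain Perl sketch of that documented rule; the helper name ap_is_true is hypothetical.

```perl
use strict;
use warnings;

# Hypothetical helper applying the -T rule documented above:
# false for "", "0", "off", "false" or "no" (case-insensitive), true otherwise.
sub ap_is_true
{
    my $str = shift( @_ );
    return(0) if( !defined( $str ) || $str eq '' );
    return( $str =~ /\A(?:0|off|false|no)\z/i ? 0 : 1 );
}

for my $value ( 'on', 'OFF', '', '0', 'No', 'yes' )
{
    printf( "%-6s => %s\n", "'$value'", ap_is_true( $value ) ? 'true' : 'false' );
}
```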
cond
BNF:
"true"
| "false"
| "!" cond
| cond "&&" cond
| cond "||" cond
| comp
| "(" cond ")"
$RE{Apache2}{Cond}
For example:
use Regexp::Common::Apache2 qw( $ap_true $ap_false );
($ap_false && $ap_true)
The capture names are:
- cond
-
Contains the entire capture block
- cond_and
-
Contains the expression like:
($ap_true && $ap_true)
- cond_and_expr1
-
The first expression in an ANDed condition, such as:
$ap_true && $ap_false
- cond_and_expr2
-
The second expression in an ANDed condition, such as:
$ap_true && $ap_false
- cond_comp
-
Contains the comparison expression. See "comp" above.
- cond_expr
-
The expression captured after a negation, such as:
!-e /some/folder/file.txt
Here cond_expr would contain
-e /some/folder/file.txt
- cond_false
-
Contains the false expression like:
($ap_false)
false # as a literal word
0 # 0 as a standalone number not surrounded by any number or letter
- cond_neg
-
Contains the expression if it is preceded by an exclamation mark, such as:
!$ap_true
- cond_or
-
Contains the expression like:
($ap_true || $ap_true)
- cond_or_expr1
-
The first expression in an ORed condition, such as:
$ap_true || $ap_false
- cond_or_expr2
-
The second expression in an ORed condition, such as:
$ap_true || $ap_false
- cond_parenthesis
-
Contains the condition when it is embedded within parentheses.
- cond_true
-
Contains the true expression like:
($ap_true)
true # as a literal word
1 # 1 as a standalone number not surrounded by any number or letter
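cond_true and cond_false accept 1 and 0 only as standalone numbers. A minimal sketch of how that restriction can be expressed with lookarounds, assuming simplified patterns named $cond_true and $cond_false (hypothetical, not the module's own):

```perl
use strict;
use warnings;

# Hypothetical, simplified patterns for the boolean literals described above.
# The lookarounds reject a literal glued to another letter or digit.
my $cond_true  = qr/(?<cond_true>(?<![A-Za-z0-9])(?:true|1)(?![A-Za-z0-9]))/;
my $cond_false = qr/(?<cond_false>(?<![A-Za-z0-9])(?:false|0)(?![A-Za-z0-9]))/;

my %kind;
for my $text ( 'true', '1', '!0', '10', 'v1', 'false' )
{
    $kind{ $text } = $text =~ $cond_true  ? 'true'
                   : $text =~ $cond_false ? 'false'
                   : 'none';
    print( "$text => $kind{$text}\n" );
}
```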
expr
BNF: cond | string
$RE{Apache2}{Expr}
The capture names are:
- expr
-
Contains the entire capture block
- expr_cond
-
Contains the expression of the condition
- expr_string
-
Contains the expression of a string
function
BNF: funcname "(" words ")"
$RE{Apache2}{Function}
For example:
base64("Some string")
someFunc()
md5( "one arg" )
otherFunc( %{some_var}, "quoted", split( /\w+/, "John Paul" ) )
The capture names are:
- function
-
Contains the entire capture block
- func_args
-
Contains the list of arguments. In the first example above, this would be
Some string
- func_name
-
The name of the function. In the first example above, this would be
base64
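Because arguments may themselves contain function calls, as in the otherFunc example, matching a function requires balanced parentheses. The sketch below is a hypothetical simplification rather than the module's actual pattern; it shows one way to do that in Perl with a (?(DEFINE)...) block and the recursive (?&args) call. Note that this simplified version keeps quotes and any trailing whitespace inside func_args.

```perl
use strict;
use warnings;

# Hypothetical, simplified function-call pattern (not the module's own).
# The (?&args) recursion keeps nested parentheses balanced, so calls such
# as split(...) may appear inside the arguments.
my $function = qr/
    (?<function>
        (?<func_name>[a-zA-Z_]\w*)
        \( \s* (?<func_args>(?&args)) \s* \)
    )
    (?(DEFINE)
        (?<args> (?: [^()]++ | \( (?&args) \) )* )
    )
/x;

my $call = 'otherFunc( %{some_var}, "quoted", split( /\w+/, "John Paul" ) )';
my( $name, $args );
if( $call =~ /\A$function\z/ )
{
    ( $name, $args ) = ( $+{func_name}, $+{func_args} );
    print( "name: $name\nargs: $args\n" );
}
```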
integercomp
BNF:
word "-eq" word | word "eq" word
| word "-ne" word | word "ne" word
| word "-lt" word | word "lt" word
| word "-le" word | word "le" word
| word "-gt" word | word "gt" word
| word "-ge" word | word "ge" word
$RE{Apache2}{IntegerComp}
For example:
123 -ne 456
789 gt 234
# etc
The hyphen before the operator is optional, so you can write eq instead of -eq.
The capture names are:
- integercomp
-
Contains the entire capture block
- integercomp_op
-
Contains the comparison operator
- integercomp_worda
-
Contains the first word in the integer comparison
- integercomp_wordb
-
Contains the second word in the integer comparison
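To illustrate the optional hyphen, here is a sketch using a hypothetical, simplified integercomp pattern that only accepts digits as words, together with a dispatch table mapping each operator to Perl's numeric comparison. Evaluation is outside the scope of this module; it is shown only to make the operator semantics concrete.

```perl
use strict;
use warnings;

# Hypothetical, simplified integercomp pattern: plain digits on both sides and
# an operator whose leading hyphen is optional, as documented above.
my $integercomp = qr/
    (?<integercomp>
        (?<integercomp_worda>[0-9]+) \s+
        (?<integercomp_op>-?(?:eq|ne|lt|le|gt|ge)) \s+
        (?<integercomp_wordb>[0-9]+)
    )
/x;

# Perl numeric equivalents, keyed by the operator without its hyphen.
my %compare = (
    'eq' => sub{ $_[0] == $_[1] },
    'ne' => sub{ $_[0] != $_[1] },
    'lt' => sub{ $_[0] <  $_[1] },
    'le' => sub{ $_[0] <= $_[1] },
    'gt' => sub{ $_[0] >  $_[1] },
    'ge' => sub{ $_[0] >= $_[1] },
);

my %result;
for my $expr ( '123 -ne 456', '789 gt 234', '5 -le 4' )
{
    next unless( $expr =~ /\A$integercomp\z/ );
    # Copy the captures first: a successful s/// below would reset %+.
    my $worda = $+{integercomp_worda};
    my $wordb = $+{integercomp_wordb};
    ( my $op = $+{integercomp_op} ) =~ s/\A-//;
    $result{ $expr } = $compare{ $op }->( $worda, $wordb ) ? 1 : 0;
    print( "$expr => $result{$expr}\n" );
}
```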
listfunc
BNF: listfuncname "(" words ")"
$RE{Apache2}{ListFunc}
For example:
base64("Some string")
someFunc()
md5( "one arg" )
otherFunc( %{some_var}, "quoted", split( /\w+/, "John Paul" ) )
This is quite similar to the "function" regular expression.
The capture names are:
- listfunc
-
Contains the entire capture block
- func_args
-
Contains the list of arguments. In the first example above, this would be
Some string
- func_name
-
The name of the function. In the first example above, this would be
base64
regex
BNF:
"/" regpattern "/" [regflags]
| "m" regsep regpattern regsep [regflags]
$RE{Apache2}{Regexp}
For example:
/\w+/i
# or
m,\w+,i
The capture names are:
- regex
-
Contains the entire capture block
- regflags
-
The regular expression modifiers. See perlre.
This can be any combination of:
i, s, m, g
- regpattern
-
Contains the regular expression. See perlre for examples and an explanation of how to use regular expressions. Apache2 uses PCRE, i.e. Perl Compatible Regular Expressions.
- regsep
-
Contains the regular expression separator, which can be any of:
/, #, $, %, ^, |, ?, !, ', ", ",", ;, :, ".", _, -
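The m form requires the closing separator to be the same character as the opening one, which a named back reference (\k<regsep>) expresses directly. The pattern below is a hypothetical simplification, not the one shipped with the module; it also reuses the name regpattern in both alternatives, which Perl allows (%+ returns the leftmost group of that name that matched).

```perl
use strict;
use warnings;

# Hypothetical, simplified regex pattern (not the module's own):
# /pattern/flags or m<sep>pattern<sep>flags. \k<regsep> forces the closing
# separator to be the same character as the opening one.
my $regex = qr{
    (?<regex>
        (?:
            / (?<regpattern>(?:\\.|[^\\/])+) /
          | m (?<regsep>[/\#\$%^|?!'",;:._-])
              (?<regpattern>(?:\\.|(?!\k<regsep>).)+)
              \k<regsep>
        )
        (?<regflags>[ismg]*)
    )
}x;

my %parsed;
for my $re ( '/\w+/i', 'm,\w+,i', 'm#a/b#' )
{
    next unless( $re =~ /\A$regex\z/ );
    $parsed{ $re } = {
        regsep     => $+{regsep} // '/',
        regpattern => $+{regpattern},
        regflags   => $+{regflags},
    };
    print( "$re: sep=$parsed{$re}{regsep} pattern=$parsed{$re}{regpattern} flags=$parsed{$re}{regflags}\n" );
}
```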
string
BNF: substring | string substring
$RE{Apache2}{String}
For example:
URI accessed is: %{REQUEST_URI}
The capture names are:
- string
-
Contains the entire capture block
stringcomp
BNF:
word "==" word
| word "!=" word
| word "<" word
| word "<=" word
| word ">" word
| word ">=" word
$RE{Apache2}{StringComp}
For example:
"John" == "Jack"
sub(s/\w+/Jack/i, "John") != "Jack"
# etc
The capture names are:
- stringcomp
-
Contains the entire capture block
- stringcomp_op
-
Contains the comparison operator
- stringcomp_worda
-
Contains the first word in the string comparison
- stringcomp_wordb
-
Contains the second word in the string comparison
substring
BNF: cstring | variable | rebackref
$RE{Apache2}{Substring}
For example:
Jack
# or
%{REQUEST_URI}
See the "variable" and "word" regular expressions for more on those.
The capture names are:
- rebackref
-
Contains a regular expression back reference such as $1, $2, etc. up to $9
- substring
-
Contains the entire capture block
variable
BNF:
"%{" varname "}"
| "%{" funcname ":" funcargs "}"
| "v(" varname ")"
$RE{Apache2}{Variable}
# or to enable legacy variable:
$RE{Apache2}{LegacyVariable}
For example:
%{REQUEST_URI}
# or
%{md5:"some string"}
# or
v(REQUEST_URI)
# legacy variable allows extended variable. See LEGACY APACHE2 EXPRESSION below
See the "word" and "cond" regular expressions for more on those.
The capture names are:
- variable
-
Contains the entire capture block
- var_func
-
Contains the text for the function and its arguments if this is a function.
- var_func_args
-
Contains the function arguments.
- var_func_name
-
Contains the function name.
- varname
-
Contains the variable name without the percent sign (or the dollar sign, if the legacy regular expression is enabled) and without the surrounding curly braces, if any
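A simplified, hypothetical pattern for the three variable forms above (not the module's own), showing how the capture names can be laid out; varname is used in two alternatives, and %+ returns whichever one actually matched.

```perl
use strict;
use warnings;

# Hypothetical, simplified variable pattern (not the module's own) covering
# %{NAME}, %{func:args} and v(NAME).
my $variable = qr/
    (?<variable>
        %\{ (?<var_func>(?<var_func_name>\w+) : (?<var_func_args>[^}]*)) \}
      | %\{ (?<varname>\w+) \}
      | v\( (?<varname>\w+) \)
    )
/x;

my %seen;
for my $v ( '%{REQUEST_URI}', '%{md5:"some string"}', 'v(REQUEST_URI)' )
{
    next unless( $v =~ /\A$variable\z/ );
    $seen{ $v } = defined( $+{var_func_name} )
        ? "func=$+{var_func_name} args=$+{var_func_args}"
        : "name=$+{varname}";
    print( "$v => $seen{$v}\n" );
}
```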
word
BNF:
digits
| "'" string "'"
| '"' string '"'
| word "." word
| variable
| function
| "(" word ")"
| rebackref
$RE{Apache2}{Word}
This is the most complex regular expression, since it uses all the others and can recurse deeply.
For example:
12
# or
"John"
# or
'Jack'
# or
%{REQUEST_URI}
# or
%{HTTP_HOST}.%{HTTP_PORT}
# or
md5("some string")
# or any word surrounded by parentheses, such as:
("John")
See the "string", "word", "variable" and "function" regular expressions for more on those.
The capture names are:
- rebackref
-
Contains a regular expression back reference such as $1, $2, etc. up to $9
- word
-
Contains the entire capture block
- word_digits
-
If the word is actually digits, this contains those digits.
- word_dot_word
-
This contains the text when two words are separated by a dot.
- word_enclosed
-
Contains the value of the word enclosed by single or double quotes or by surrounding parentheses.
- word_function
-
Contains the word containing a "function"
- word_quote
-
If the word is enclosed by single or double quotes, this contains the single or double quote character
- word_variable
-
Contains the word containing a "variable"
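word_quote and word_enclosed work together: the closing quote must be the same character as the opening one. A sketch with a hypothetical pattern (not the module's own) covering only the digits, quoted string and %{NAME} alternatives:

```perl
use strict;
use warnings;

# Hypothetical, simplified word pattern for three of the alternatives above:
# digits, a quoted string closed by the same quote character, and %{NAME}.
my $word = qr/
    (?<word>
        (?<word_digits>[0-9]+)
      | (?<word_quote>["']) (?<word_enclosed>(?:\\.|(?!\k<word_quote>).)*) \k<word_quote>
      | (?<word_variable>%\{\w+\})
    )
/x;

my %value;
for my $w ( '12', '"John"', q{'Jack'}, q{"Jack'} )
{
    next unless( $w =~ /\A$word\z/ );
    # Prefer the unquoted content when the word was quoted
    $value{ $w } = $+{word_enclosed} // $+{word};
    print( "$w => $value{$w}\n" );
}
```

The mismatched "Jack' is rejected because \k<word_quote> only accepts the quote character that opened the word.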
words
BNF:
word
| word "," word
$RE{Apache2}{Words}
For example:
"Jack"
# or
"John", "Peter", "Paul"
See the "word" and "list" regular expressions for more on those.
The capture names are:
- words
-
Contains the entire capture block
- words_word
-
Contains the word
- words_list
-
Contains the list
ADVANCED APACHE2 EXPRESSION
comp
BNF:
stringcomp
| integercomp
| unaryop word
| word binaryop word
| word "in" listfunc
| word "=~" regex
| word "!~" regex
| word "in" "{" list "}"
$RE{Apache2}{TrunkComp}
For example:
"Jack" != "John"
123 -ne 456
# etc
This uses other expressions, namely "stringcomp", "integercomp", "word", "listfunc", "regex" and "list".
This is similar to the regular "comp" in "APACHE2 EXPRESSION", except it uses "list" instead of "words".
The capture names are:
- comp
-
Contains the entire capture block
- comp_binary
-
Matches the expression that uses a binary operator, such as:
==, =, !=, <, <=, >, >=, -ipmatch, -strmatch, -strcmatch, -fnmatch
- comp_binaryop
-
The binary operator used if the expression is a binary comparison. Binary operators are:
==, =, !=, <, <=, >, >=, -ipmatch, -strmatch, -strcmatch, -fnmatch
- comp_integercomp
-
When the comparison is for an integer comparison as opposed to a string comparison.
- comp_list
-
Contains the list used to check a word against, such as:
"Jack" in {"John", "Peter", "Jack"}
- comp_listfunc
-
This contains the listfunc when the expression contains a word checked against a list function, such as:
"Jack" in listMe("some arguments")
- comp_regexp
-
The regular expression used when a word is compared to a regular expression, such as:
"Jack" =~ /\w+/
Here, comp_regexp would contain
/\w+/
- comp_regexp_op
-
The regular expression operator used when a word is compared to a regular expression, such as:
"Jack" =~ /\w+/
Here, comp_regexp_op would contain
=~
- comp_stringcomp
-
When the comparison is for a string comparison as opposed to an integer comparison.
- comp_unary
-
Matches the expression that uses unary operator, such as:
-d, -e, -f, -s, -L, -h, -F, -U, -A, -n, -z, -T, -R
- comp_word
-
Contains the word that is the object of the comparison.
- comp_word_in_list
-
Contains the expression of a word checked against a list, such as:
"Jack" in {"John", "Peter", "Jack"}
- comp_word_in_listfunc
-
Contains the word when it is being compared to a listfunc, such as:
"Jack" in listMe("some arguments")
- comp_word_in_regexp
-
Contains the expression of a word checked against a regular expression, such as:
"Jack" =~ /\w+/
Here the word Jack would be captured in comp_word.
- comp_worda
-
Contains the first word in the comparison expression
- comp_wordb
-
Contains the second word in the comparison expression
cond
BNF:
"true"
| "false"
| "!" cond
| cond "&&" cond
| cond "||" cond
| comp
| "(" cond ")"
$RE{Apache2}{TrunkCond}
Same as "cond" in "APACHE2 EXPRESSION"
expr
BNF: cond | string
$RE{Apache2}{TrunkExpr}
Same as "expr" in "APACHE2 EXPRESSION"
function
BNF: funcname "(" words ")"
$RE{Apache2}{TrunkFunction}
Same as "function" in "APACHE2 EXPRESSION"
integercomp
BNF:
word "-eq" word | word "eq" word
| word "-ne" word | word "ne" word
| word "-lt" word | word "lt" word
| word "-le" word | word "le" word
| word "-gt" word | word "gt" word
| word "-ge" word | word "ge" word
$RE{Apache2}{TrunkIntegerComp}
Same as "integercomp" in "APACHE2 EXPRESSION"
join
BNF:
"join" ["("] list [")"]
| "join" ["("] list "," word [")"]
$RE{Apache2}{TrunkJoin}
For example:
join({"word1", "word2"})
# or
join({"word1", "word2"}, ', ')
The capture names are:
- join
-
Contains the entire capture block
- join_list
-
Contains the value of the list
- join_word
-
Contains the value for word used to join the list
list
BNF:
split
| listfunc
| "{" words "}"
| "(" list ")"
$RE{Apache2}{TrunkList}
For example:
split( /\w+/, "Some string" )
# or
{"some", "words"}
# or
(split( /\w+/, "Some string" ))
# or
( {"some", "words"} )
This uses "split", "listfunc", "words" and "list".
The capture names are:
- list
-
Contains the entire capture block
- list_func
-
Contains the value if a "listfunc" is used
- list_list
-
Contains the value if this is a list embedded within parentheses
- list_split
-
Contains the value if the list is based on a split
- list_words
-
Contains the value for a list of words.
listfunc
BNF: listfuncname "(" words ")"
$RE{Apache2}{TrunkListFunc}
Same as "listfunc" in "APACHE2 EXPRESSION"
regany
BNF: regex | regsub
$RE{Apache2}{TrunkRegany}
For example:
/\w+/i
# or
m,\w+,i
This regular expression includes "regex" and "regsub".
The capture names are:
- regany
-
Contains the entire capture block
- regany_regex
-
Contains the regular expression. See "regex"
- regany_regsub
-
Contains the substitution regular expression. See "regsub"
regex
BNF:
"/" regpattern "/" [regflags]
| "m" regsep regpattern regsep [regflags]
$RE{Apache2}{TrunkRegexp}
Same as "regex" in "APACHE2 EXPRESSION"
regsub
BNF: "s" regsep regpattern regsep string regsep [regflags]
$RE{Apache2}{TrunkRegsub}
For example:
s/\w+/John/gi
The capture names are:
- regflags
-
The modifiers used, which can be any combination of:
i, s, m, g
See perlre for an explanation of their usage and meaning.
- regstring
-
The string replacing the text found by the regular expression
- regsub
-
Contains the entire capture block
- regpattern
-
Contains the regular expression, which is Perl compatible since Apache2 uses PCRE.
- regsep
-
Contains the regular expression separator, which can be any of:
/, #, $, %, ^, |, ?, !, ', ", ",", ;, :, ".", _, -
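A sketch of how a regsub expression could be parsed and then applied with Perl's own engine. Both $regsub and apply_regsub are hypothetical; this module itself only matches the syntax. The i, s and m flags are turned into an inline (?...) modifier, while g selects between a single and a global replacement. Back references such as $1 inside regstring are not expanded by this simplified version.

```perl
use strict;
use warnings;

# Hypothetical, simplified regsub pattern: s<sep>pattern<sep>string<sep>flags,
# where every separator must repeat the first one (\k<regsep>).
my $regsub = qr/
    (?<regsub>
        s (?<regsep>[\/\#\$%^|?!'",;:._-])
        (?<regpattern>(?:\\.|(?!\k<regsep>).)+) \k<regsep>
        (?<regstring>(?:\\.|(?!\k<regsep>).)*)  \k<regsep>
        (?<regflags>[ismg]*)
    )
/x;

# Hypothetical helper: parse a regsub expression and apply it to $target.
# Returns undef if the expression is not a valid (simplified) regsub.
sub apply_regsub
{
    my( $expr, $target ) = @_;
    return unless( $expr =~ /\A$regsub\z/ );
    my( $pattern, $string, $flags ) = ( $+{regpattern}, $+{regstring}, $+{regflags} );
    # i, s and m become an inline modifier; g picks one or all replacements
    ( my $inline = $flags ) =~ tr/g//d;
    my $re = length( $inline ) ? qr/(?$inline)$pattern/ : qr/$pattern/;
    if( $flags =~ /g/ )
    {
        $target =~ s/$re/$string/g;
    }
    else
    {
        $target =~ s/$re/$string/;
    }
    return( $target );
}

print( apply_regsub( 's/\w+/John/gi', 'Peter and Paul' ), "\n" );
```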
split
BNF:
"split" ["("] regany "," list [")"]
| "split" ["("] regany "," word [")"]
$RE{Apache2}{TrunkSplit}
For example:
split( /\w+/, "Some string" )
This uses "regany", "list" and "word".
The capture names are:
- split
-
Contains the entire capture block
- split_regex
-
Contains the regular expression used for the split
- split_list
-
The list being split. It can also be a word. See below
- split_word
-
The word being split. It can also be a list. See above
string
BNF: substring | string substring
$RE{Apache2}{TrunkString}
Same as "string" in "APACHE2 EXPRESSION"
stringcomp
BNF:
word "==" word
| word "!=" word
| word "<" word
| word "<=" word
| word ">" word
| word ">=" word
$RE{Apache2}{TrunkStringComp}
Same as "stringcomp" in "APACHE2 EXPRESSION"
sub
BNF: "sub" ["("] regsub "," word [")"]
$RE{Apache2}{TrunkSub}
For example:
sub(s/\w/John/gi,"Peter")
The capture names are:
- sub
-
Contains the entire capture block
- sub_regsub
-
Contains the substitution expression, i.e. in the example above, this would be:
s/\w/John/gi
- sub_word
-
The target for the substitution. In the example above, this would be "Peter".
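Assuming sub() applies its substitution the same way Perl's s/// operator does (Apache2 uses PCRE), the example above can be reproduced in plain Perl. Since the pattern is \w rather than \w+, every letter of "Peter" is replaced individually:

```perl
use strict;
use warnings;

# Plain-Perl equivalent of sub(s/\w/John/gi, "Peter"): \w matches one word
# character at a time, so each of the five letters is replaced.
( my $result = 'Peter' ) =~ s/\w/John/gi;

# With \w+ the whole word is matched once instead.
( my $whole = 'Peter' ) =~ s/\w+/John/gi;

print( "$result\n$whole\n" );
```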
substring
BNF: cstring | variable
$RE{Apache2}{TrunkSubstring}
For example:
Jack
# or
%{REQUEST_URI}
# or
%{:sub(s/\b\w+\b/Peter/, "John"):}
See the "variable" and "word" regular expressions for more on those.
This is different from "substring" in "APACHE2 EXPRESSION" in that it does not include regular expression back references, i.e. $1, $2, etc.
The capture names are:
- substring
-
Contains the entire capture block
variable
BNF:
"%{" varname "}"
| "%{" funcname ":" funcargs "}"
| "v('" varname "')"
| "%{:" word ":}"
| "%{:" cond ":}"
| rebackref
$RE{Apache2}{TrunkVariable}
For example:
%{REQUEST_URI}
# or
%{md5:"some string"}
# or
%{:sub(s/\b\w+\b/Peter/, "John"):}
# or a reference to previous regular expression capture groups
$1, $2, etc.
See the "word" and "cond" regular expressions for more on those.
The capture names are:
- rebackref
-
Contains the regular expression back reference such as $1, $2, etc., but without the leading dollar sign or the enclosing curly braces, if any. Thus, for $1 or ${1}, rebackref would be 1.
- variable
-
Contains the entire capture block
- var_backref
-
Contains the regular expression back reference such as $1, $2, etc. This includes the leading dollar sign and the enclosing curly braces, if any, such as ${1}.
- var_cond
-
If this is a condition inside a variable, such as:
%{:$ap_true == $ap_false:}
- var_func
-
Contains the text for the function and its arguments if this is a function.
- var_func_args
-
Contains the function arguments.
- var_func_name
-
Contains the function name.
- var_word
-
A variable containing a word. See "word" for more information about word expressions.
- varname
-
Contains the variable name without the percent sign (or the dollar sign, if the legacy regular expression is enabled) and without the surrounding curly braces, if any
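The difference between var_backref and rebackref can be sketched with a hypothetical, simplified pattern (not the module's own) that uses a regex conditional, (?(<brace>)\}), so that a closing curly brace is required only when an opening one was matched. The brace group name is an assumption of this sketch.

```perl
use strict;
use warnings;

# Hypothetical, simplified back-reference pattern: a single digit after a
# dollar sign, optionally wrapped in curly braces.
# (?(<brace>)\}) requires the closing brace only if the opening one matched.
my $backref = qr/
    (?<var_backref>
        \$ (?<brace>\{)? (?<rebackref>[0-9]) (?(<brace>)\})
    )
/x;

my %parts;
for my $ref ( '$1', '${1}', '${1', '$1}' )
{
    next unless( $ref =~ /\A$backref\z/ );
    $parts{ $ref } = [ $+{var_backref}, $+{rebackref} ];
    print( "$ref: var_backref=$parts{$ref}[0] rebackref=$parts{$ref}[1]\n" );
}
```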
word
BNF:
digits
| "'" string "'"
| '"' string '"'
| word "." word
| variable
| sub
| join
| function
| "(" word ")"
$RE{Apache2}{TrunkWord}
This is the most complex regular expression, since it uses all the others and can recurse deeply.
For example:
12
# or
"John"
# or
'Jack'
# or
%{REQUEST_URI}
# or
%{HTTP_HOST}.%{HTTP_PORT}
# or
%{:sub(s/\b\w+\b/Peter/, "John"):}
# or
sub(s,\w+,Paul,gi, "John")
# or
join({"Paul", "Peter"}, ', ')
# or
md5("some string")
# or any word surrounded by parentheses, such as:
("John")
See the "string", "word", "variable", "sub", "join" and "function" regular expressions for more on those.
The capture names are:
- word
-
Contains the entire capture block
- word_digits
-
If the word is actually digits, this contains those digits.
- word_dot_word
-
This contains the text when two words are separated by a dot.
- word_enclosed
-
Contains the value of the word enclosed by single or double quotes or by surrounding parentheses.
- word_function
-
Contains the word containing a "function"
- word_join
-
Contains the word containing a "join"
- word_quote
-
If the word is enclosed by single or double quotes, this contains the single or double quote character
- word_sub
-
If the word is a substitution, this contains the substitution
- word_variable
-
Contains the word containing a "variable"
words
BNF:
word
| word "," list
$RE{Apache2}{TrunkWords}
For example:
"Jack"
# or
"John", {"Peter", "Paul"}
# or
sub(s/\b\w+\b/Peter/, "John"), {"Peter", "Paul"}
See the "word" and "list" regular expressions for more on those.
It is different from "words" in "APACHE2 EXPRESSION" in that it uses "list" instead of "word".
The capture names are:
- words
-
Contains the entire capture block
- words_word
-
Contains the word
- words_list
-
Contains the list
LEGACY
When using legacy mode, the regular expressions are more lax in what they accept for four types of expressions:
- 1. comp
-
Same as "comp", and it extends it by adding support for legacy regular expression comparisons, i.e. without using the tilde (~). For example:
$HTTP_COOKIES = /lang\%22\%3A\%22([a-zA-Z]+\-[a-zA-Z]+)\%22\%7D;?/
In the current version of Apache2 expressions, this would rather be written as:
%{HTTP_COOKIES} =~ /lang\%22\%3A\%22([a-zA-Z]+\-[a-zA-Z]+)\%22\%7D;?/
Both are supported in legacy expressions.
The additional capture groups available are:
- 2. cond
-
It is the same as "cond", except it also accepts a vanilla variable as a valid condition, such as $REQUEST_URI, so that an expression like the following would work:
!$REQUEST_URI
It adds the following capture groups:
- cond_variable
-
Contains the variable used in the condition, including the leading dollar or percent sign and any surrounding curly braces.
- 3. variable
-
Same as "variable", but extended to accept vanilla variables such as $REQUEST_URI. In current Apache2 expressions, a variable is denoted by a percent sign and surrounding curly braces, for example: %{REQUEST_URI}
Legacy variables also include regular expression back references such as $1, $2, etc. Their capture group names are:
- variable
-
Contains the entire variable.
- varname
-
Contains the variable name without the dollar or percent sign and without any surrounding curly braces.
- var_backref
-
The regular expression back reference including the dollar sign and any surrounding curly braces. For example: $1 or ${1}
- rebackref
-
The regular expression back reference excluding the dollar sign and any surrounding curly braces. For example, with $1 or ${1}, rebackref would contain 1.
- var_func_name
-
The variable-embedded function name
- var_func_args
-
The variable-embedded function arguments
- 4. word
-
word is extended to also accept a regular expression back reference such as $1, $2, etc.
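A sketch of the legacy comparison shown under comp above: a vanilla variable, an equal sign and a slash-delimited regular expression, without the tilde. The pattern $legacy_comp and the cookie value are hypothetical and made up for illustration; the extracted pattern is then run with Perl's engine rather than Apache2's PCRE.

```perl
use strict;
use warnings;

# Hypothetical, simplified legacy comparison: a vanilla $VARIABLE, "=" and a
# slash-delimited regex without the tilde. Not the module's own pattern.
my $legacy_comp = qr{
    \A
    \$ (?<varname>[A-Za-z_]\w*) \s* = \s*
    / (?<regpattern>(?:\\.|[^\\/])+) /
    \z
}x;

my $expr = '$HTTP_COOKIES = /lang\%22\%3A\%22([a-zA-Z]+\-[a-zA-Z]+)\%22\%7D;?/';
my( $varname, $pattern );
if( $expr =~ $legacy_comp )
{
    ( $varname, $pattern ) = ( $+{varname}, $+{regpattern} );
}

# Made-up cookie value; \% and \- in the extracted pattern are literal % and -.
my $cookie = 'session=abc; prefs={%22lang%22%3A%22en-GB%22%7D;';
my( $lang ) = $cookie =~ /$pattern/;
print( "$varname: $lang\n" );
```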
CAVEAT
Functions need to have their arguments enclosed in parentheses. For example:
%{REMOTE_ADDR} -in split s/.*?IP Address:([^,]+)/$1/, PeerExtList('subjectAltName')
will not work, but the following will:
%{REMOTE_ADDR} -in split(s/.*?IP Address:([^,]+)/$1/, PeerExtList('subjectAltName'))
Maybe this will be adjusted in future versions.
CHANGES & CONTRIBUTIONS
Feel free to reach out to the author for possible corrections, improvements, or suggestions.
AUTHOR
Jacques Deguest <jack@deguest.jp>
SEE ALSO
https://httpd.apache.org/docs/current/expr.html and https://httpd.apache.org/docs/trunk/en/expr.html
COPYRIGHT & LICENSE
Copyright (c) 2020 DEGUEST Pte. Ltd.
You can use, copy, modify and redistribute this package and associated files under the same terms as Perl itself.