NAME

Regexp::Common::Apache2 - Apache2 Expressions

SYNOPSIS

use Regexp::Common qw( Apache2 );
use Regexp::Common::Apache2 qw( $ap_true $ap_false );

while( <> )
{
    my $pos = pos( $_ ) // 0;
    /\G$RE{Apache2}{Word}/gmc      and  print "Found a word expression at pos $pos\n";
    /\G$RE{Apache2}{Variable}/gmc  and  print "Found a variable $+{varname} at pos $pos\n";
}

# Override the Apache2 expressions with the legacy ones
my $legacy_re = $RE{Apache2}{-legacy => 1};
# or use it with the Legacy prefix:
if( $str =~ /^$RE{Apache2}{LegacyVariable}$/ )
{
    print( "Found variable $+{variable} with name $+{varname}\n" );
}

VERSION

v0.2.1

DESCRIPTION

This is the Perl port of Apache2 expressions.

The regular expressions have been designed based on Apache2 Backus-Naur Form (BNF) definition as described below in "APACHE2 EXPRESSION"

You can also use the extended legacy patterns (see "LEGACY" below) by calling Regexp::Common::Apache2 like:

$RE{Apache2}{-legacy => 1}

All of the regular expressions use named capture. See "%+" in perlvar for more information on named capture.

APACHE2 EXPRESSION

comp

BNF:

stringcomp
| integercomp
| unaryop word
| word binaryop word
| word "in" listfunc
| word "=~" regex
| word "!~" regex
| word "in" "{" words "}"

$RE{Apache2}{Comp}

For example:

"Jack" != "John"
123 -ne 456
# etc

This uses other expressions, namely "stringcomp", "integercomp", "word", "listfunc", "regex", "words"

The capture names are:

comp

Contains the entire capture block

comp_binary

Matches the expression that uses a binary operator, such as:

==, =, !=, <, <=, >, >=, -ipmatch, -strmatch, -strcmatch, -fnmatch

comp_binaryop

The binary operator used if the expression is a binary comparison. It is one of:

==, =, !=, <, <=, >, >=, -ipmatch, -strmatch, -strcmatch, -fnmatch

comp_integercomp

When the comparison is for an integer comparison as opposed to a string comparison.

comp_list

Contains the list used to check a word against, such as:

"Jack" in {"John", "Peter", "Jack"}

comp_listfunc

This contains the listfunc when the expression contains a word checked against a list function, such as:

"Jack" in listMe("some arguments")

comp_regexp

The regular expression used when a word is compared to a regular expression, such as:

"Jack" =~ /\w+/

Here, comp_regexp would contain /\w+/

comp_regexp_op

The regular expression operator used when a word is compared to a regular expression, such as:

"Jack" =~ /\w+/

Here, comp_regexp_op would contain =~

comp_stringcomp

When the comparison is for a string comparison as opposed to an integer comparison.

comp_unary

Matches the expression that uses a unary operator, such as:

-d, -e, -f, -s, -L, -h, -F, -U, -A, -n, -z, -T, -R

For example:

-A /some/uri.html # (same as -U)
-d /some/folder # file is a directory
-e /some/folder/file.txt # file exists
-f /some/folder/file.txt # file is a regular file
-F /some/folder/file.txt # file is a regular file and is accessible via all configured access controls (Apache2 performs an internal subrequest to check)
-h /some/folder/link.txt # true if file is a symbolic link
-n %{QUERY_STRING} # true if string is not empty (opposite of -z)
-s /some/folder/file.txt # true if file is not empty
-L /some/folder/link.txt # true if file is a symbolic link (same as -h)
-R 192.168.1.1/24 # remote IP matches this IP block; same as %{REMOTE_ADDR} -ipmatch 192.168.1.1/24
-T %{HTTPS} # false if string is empty, "0", "off", "false", or "no" (case insensitive). True otherwise.
-U /some/uri.html # true if the URI is accessible via all configured access controls (Apache2 performs an internal subrequest to check)
-z %{QUERY_STRING} # true if string is empty (opposite of -n)

comp_word

Contains the word that is the object of the comparison.

comp_word_in_list

Contains the expression of a word checked against a list, such as:

"Jack" in {"John", "Peter", "Jack"}

comp_word_in_listfunc

Contains the word when it is being compared to a listfunc, such as:

"Jack" in listMe("some arguments")

comp_word_in_regexp

Contains the expression of a word checked against a regular expression, such as:

"Jack" =~ /\w+/

Here, the word Jack would be captured in comp_word

comp_worda

Contains the first word in the comparison expression

comp_wordb

Contains the second word in the comparison expression
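
To make the -T rule listed under comp_unary concrete, here is a small plain-Perl emulation of it. It is only a sketch, assuming the rule is exactly as described above (whole value, case-insensitive). The helper name ap_is_true is hypothetical and is not part of Regexp::Common::Apache2.

```perl
use strict;
use warnings;

# Hypothetical helper, not part of Regexp::Common::Apache2: emulates the -T
# rule described above. A value is false if it is empty, "0", "off",
# "false" or "no" (case-insensitive), and true otherwise.
sub ap_is_true
{
    my $str = shift;
    return( 0 ) if( !defined( $str ) || !length( $str ) );
    return( $str =~ /\A(?:0|off|false|no)\z/i ? 0 : 1 );
}

foreach my $value ( '', '0', 'Off', 'FALSE', 'no', 'on', '1', 'yes' )
{
    printf( "-T '%s' is %s\n", $value, ap_is_true( $value ) ? 'true' : 'false' );
}
```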

cond

BNF:

"true"
| "false"
| "!" cond
| cond "&&" cond
| cond "||" cond
| comp
| "(" cond ")"

$RE{Apache2}{Cond}

For example:

use Regexp::Common::Apache2 qw( $ap_true $ap_false );
($ap_false && $ap_true)

The capture names are:

cond

Contains the entire capture block

cond_and

Contains the expression like:

($ap_true && $ap_true)

cond_and_expr1

The first expression in an ANDed condition, such as:

$ap_true && $ap_false

cond_and_expr2

The second expression in an ANDed condition, such as:

$ap_true && $ap_false

cond_comp

Contains the comparison expression. See "comp" above.

cond_expr

The expression captured after a negation, such as:

!-e /some/folder/file.txt

Here cond_expr would contain -e /some/folder/file.txt

cond_false

Contains the false expression like:

($ap_false)
false # as a literal word
0 # 0 as a standalone number not surrounded by any number or letter

cond_neg

Contains the expression if it is preceded by an exclamation mark, such as:

!$ap_true

cond_or

Contains the expression like:

($ap_true || $ap_true)

cond_or_expr1

The first expression in an ORed condition, such as:

$ap_true || $ap_false

cond_or_expr2

The second expression in an ORed condition, such as:

$ap_true || $ap_false

cond_parenthesis

Contains the condition when it is embedded within parentheses.

cond_true

Contains the true expression like:

($ap_true)
true # as a literal word
1 # 1 as a standalone number not surrounded by any number or letter

expr

BNF: cond | string

$RE{Apache2}{Expr}

The capture names are:

expr

Contains the entire capture block

expr_cond

Contains the expression of the condition

expr_string

Contains the expression of a string

function

BNF: funcname "(" words ")"

$RE{Apache2}{Function}

For example:

base64("Some string")
someFunc()
md5(  "one arg" )
otherFunc( %{some_var}, "quoted", split( /\w+/, "John Paul" ) )

The capture names are:

function

Contains the entire capture block

func_args

Contains the list of arguments. In the first example above, this would be Some string

func_name

The name of the function. In the first example above, this would be base64

integercomp

BNF:

word "-eq" word | word "eq" word
| word "-ne" word | word "ne" word
| word "-lt" word | word "lt" word
| word "-le" word | word "le" word
| word "-gt" word | word "gt" word
| word "-ge" word | word "ge" word

$RE{Apache2}{IntegerComp}

For example:

123 -ne 456
789 gt 234
# etc

The hyphen before the operator is optional, so you can say eq instead of -eq

The capture names are:

integercomp

Contains the entire capture block

integercomp_op

Contains the comparison operator

integercomp_worda

Contains the first word in the integer comparison

integercomp_wordb

Contains the second word in the integer comparison
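
The capture names above are read from %+ after a successful match. The sketch below uses a deliberately simplified, hypothetical stand-in pattern (a "word" is reduced to a run of digits) rather than the module's real $RE{Apache2}{IntegerComp}, so that it runs without the module installed. It also shows the optional leading hyphen on the operator.

```perl
use strict;
use warnings;
use 5.010;    # named captures and %+

# Simplified stand-in for $RE{Apache2}{IntegerComp}: a "word" is only a run
# of digits here, whereas the real grammar accepts any "word".
my $integercomp = qr/
    (?<integercomp_worda>\d+)
    \s+
    (?<integercomp_op>-?(?:eq|ne|lt|le|gt|ge))    # the hyphen is optional
    \s+
    (?<integercomp_wordb>\d+)
/x;

foreach my $expr ( '123 -ne 456', '789 gt 234', '12 -xx 34' )
{
    if( $expr =~ /\A$integercomp\z/ )
    {
        print( "$expr: $+{integercomp_worda} [$+{integercomp_op}] $+{integercomp_wordb}\n" );
    }
    else
    {
        print( "$expr: no match\n" );
    }
}
```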

listfunc

BNF: listfuncname "(" words ")"

$RE{Apache2}{ListFunc}

For example:

base64("Some string")
someFunc()
md5(  "one arg" )
otherFunc( %{some_var}, "quoted", split( /\w+/, "John Paul" ) )

This is quite similar to the "function" regular expression

The capture names are:

listfunc

Contains the entire capture block

func_args

Contains the list of arguments. In the first example above, this would be Some string

func_name

The name of the function. In the first example above, this would be base64

regex

BNF:

"/" regpattern "/" [regflags]
| "m" regsep regpattern regsep [regflags]

$RE{Apache2}{Regexp}

For example:

/\w+/i
# or
m,\w+,i

The capture names are:

regex

Contains the entire capture block

regflags

The regular expression modifiers. See perlre

This can be any combination of:

i, s, m, g

regpattern

Contains the regular expression. See perlre for examples and an explanation of how to use regular expressions. Apache2 uses PCRE (Perl Compatible Regular Expressions).

regsep

Contains the regular expression separator, which can be any of:

/, #, $, %, ^, |, ?, !, ', ", ",", ;, :, ".", _, -
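
The way regsep, regpattern and regflags fit together can be illustrated with a simplified, hypothetical stand-in pattern (not the module's actual $RE{Apache2}{Regexp}). It accepts either /pattern/flags or m<sep>pattern<sep>flags, and uses a named backreference so the closing separator must match the opening one. Escaped separators inside the pattern are not handled.

```perl
use strict;
use warnings;
use 5.010;

# Simplified stand-in for $RE{Apache2}{Regexp}: either /pattern/flags or
# m<sep>pattern<sep>flags, using the separators listed above. Escaped
# separators inside the pattern are not handled in this sketch.
my $regex = qr{
    (?:
        (?<regsep>/)                            # plain /.../ form
      |
        m (?<regsep>[/\#\$%^|?!'",;:._-])       # m<sep>...<sep> form
    )
    (?<regpattern>.*?)
    \k<regsep>                                  # same separator closes the pattern
    (?<regflags>[ismg]*)
}x;

foreach my $str ( '/\w+/i', 'm,\w+,i', 'm#^/api/#' )
{
    next unless( $str =~ /\A$regex\z/ );
    print( "sep=$+{regsep} pattern=$+{regpattern} flags=$+{regflags}\n" );
}
```

Perl allows the same capture name (regsep) in both alternatives. \k<regsep> and $+{regsep} refer to whichever one actually matched.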

string

BNF: substring | string substring

$RE{Apache2}{String}

For example:

URI accessed is: %{REQUEST_URI}

The capture names are:

string

Contains the entire capture block

stringcomp

BNF:

word "==" word
| word "!=" word
| word "<"  word
| word "<=" word
| word ">"  word
| word ">=" word

$RE{Apache2}{StringComp}

For example:

"John" == "Jack"
sub(s/\w+/Jack/i, "John") != "Jack"
# etc

The capture names are:

stringcomp

Contains the entire capture block

stringcomp_op

Contains the comparison operator

stringcomp_worda

Contains the first word in the string comparison

stringcomp_wordb

Contains the second word in the string comparison
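
As a rough illustration of these capture names, the following sketch uses a hypothetical, much simplified stand-in for $RE{Apache2}{StringComp}. In it, a "word" is only a double-quoted string or a run of digits, which is an assumption made to keep the example short.

```perl
use strict;
use warnings;
use 5.010;

# Simplified stand-in for $RE{Apache2}{StringComp}: a "word" is reduced to a
# double-quoted string or a run of digits; the real grammar is far richer.
my $word       = qr/"[^"]*"|\d+/;
my $stringcomp = qr/
    (?<stringcomp_worda>$word)
    \s*
    (?<stringcomp_op>==|!=|<=|>=|<|>)    # two-character operators tried first
    \s*
    (?<stringcomp_wordb>$word)
/x;

foreach my $expr ( '"John" == "Jack"', '"a" <= "b"', '10 > 9' )
{
    next unless( $expr =~ /\A$stringcomp\z/ );
    print( "worda=$+{stringcomp_worda} op=$+{stringcomp_op} wordb=$+{stringcomp_wordb}\n" );
}
```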

substring

BNF: cstring | variable | rebackref

$RE{Apache2}{Substring}

For example:

Jack
# or
%{REQUEST_URI}

See "variable" and "word" regular expressions for more on those.

The capture names are:

rebackref

Contains a regular expression back reference such as $1, $2, etc., up to $9

substring

Contains the entire capture block

variable

BNF:

"%{" varname "}"
| "%{" funcname ":" funcargs "}"
| "v(" varname ")"

$RE{Apache2}{Variable}
# or to enable legacy variable:
$RE{Apache2}{LegacyVariable}

For example:

%{REQUEST_URI}
# or
%{md5:"some string"}
# or
v(REQUEST_URI)
# the legacy variable accepts an extended syntax; see "LEGACY" below

See "word" and "cond" regular expressions for more on those.

The capture names are:

variable

Contains the entire capture block

var_func

Contains the text for the function and its arguments if this is a function.

var_func_args

Contains the function arguments.

var_func_name

Contains the function name.

varname

Contains the variable name without the leading percent sign (or dollar sign, if the legacy regular expression is enabled) and without any surrounding braces
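
The three variable forms and their capture names can be sketched with a simplified, hypothetical stand-in pattern (not the module's real $RE{Apache2}{Variable}). Here function arguments are simply "anything up to the closing brace", which is an assumption for brevity.

```perl
use strict;
use warnings;
use 5.010;

# Simplified stand-in for $RE{Apache2}{Variable}, covering its three forms:
# %{varname}, %{funcname:funcargs} and v(varname). Perl allows the same
# capture name (varname) in different alternatives; %+ returns the one that
# actually matched.
my $variable = qr/
    (?<variable>
        %\{ (?<var_func> (?<var_func_name>\w+) : (?<var_func_args>[^}]*) ) \}
      |
        %\{ (?<varname>\w+) \}
      |
        v\( (?<varname>\w+) \)
    )
/x;

foreach my $str ( '%{REQUEST_URI}', '%{md5:"some string"}', 'v(REQUEST_URI)' )
{
    next unless( $str =~ /\A$variable\z/ );
    if( defined( $+{var_func_name} ) )
    {
        print( "$+{variable}: function $+{var_func_name}, args $+{var_func_args}\n" );
    }
    else
    {
        print( "$+{variable}: variable $+{varname}\n" );
    }
}
```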

word

BNF:

digits
| "'" string "'"
| '"' string '"'
| word "." word
| variable
| function
| "(" word ")"
| rebackref

$RE{Apache2}{Word}

This is the most complex regular expression used, since it uses all the others and can recurse deeply

For example:

12
# or
"John"
# or
'Jack'
# or
%{REQUEST_URI}
# or
%{HTTP_HOST}.%{HTTP_PORT}
# or
md5("some string")
# or any word surrounded by parenthesis, such as:
("John")

See "string", "word", "variable", "function" regular expressions for more on those.

The capture names are:

rebackref

Contains a regular expression back reference such as $1, $2, etc., up to $9

word

Contains the entire capture block

word_digits

If the word is actually digits, this contains those digits.

word_dot_word

This contains the text when two words are separated by a dot.

word_enclosed

Contains the value of the word enclosed by single or double quotes or by surrounding parenthesis.

word_function

Contains the word if it is a "function"

word_quote

If the word is enclosed by single or double quotes, this contains the single or double quote character

word_variable

Contains the word if it is a "variable"
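
The relationship between word_digits, word_quote and word_enclosed can be illustrated with a hypothetical stand-in covering only two branches of the word grammar: digits, and a value enclosed in matching quotes. This is not the module's real $RE{Apache2}{Word}, which also handles variables, functions, dotted words and more.

```perl
use strict;
use warnings;
use 5.010;

# Simplified stand-in for two branches of $RE{Apache2}{Word}: digits, and a
# string enclosed in matching single or double quotes. The backreference
# \k<word_quote> makes the closing quote match the opening one.
my $word = qr/
    (?<word>
        (?<word_digits>\d+)
      |
        (?<word_quote>["'])        # opening quote character
        (?<word_enclosed>.*?)      # value between the quotes
        \k<word_quote>             # same character closes the word
    )
/x;

foreach my $str ( '12', '"John"', q{'Jack'}, q{"mismatched'} )
{
    if( $str =~ /\A$word\z/ )
    {
        my $what = defined( $+{word_digits} )
            ? "digits $+{word_digits}"
            : "quoted ($+{word_quote}) $+{word_enclosed}";
        print( "$str => $what\n" );
    }
    else
    {
        print( "$str => no match\n" );
    }
}
```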

words

BNF:

word
| word "," word

$RE{Apache2}{Words}

For example:

"Jack"
# or
"John", "Peter", "Paul"

See "word" and "list" regular expressions for more on those.

The capture names are:

words

Contains the entire capture block

words_word

Contains the word

words_list

Contains the list

ADVANCED APACHE2 EXPRESSION

comp

BNF:

stringcomp
| integercomp
| unaryop word
| word binaryop word
| word "in" listfunc
| word "=~" regex
| word "!~" regex
| word "in" "{" list "}"

$RE{Apache2}{TrunkComp}

For example:

"Jack" != "John"
123 -ne 456
# etc

This uses other expressions, namely "stringcomp", "integercomp", "word", "listfunc", "regex", "list"

This is similar to the regular "comp" in "APACHE2 EXPRESSION", except it uses "list" instead of "words"

The capture names are:

comp

Contains the entire capture block

comp_binary

Matches the expression that uses a binary operator, such as:

==, =, !=, <, <=, >, >=, -ipmatch, -strmatch, -strcmatch, -fnmatch

comp_binaryop

The binary operator used if the expression is a binary comparison. It is one of:

==, =, !=, <, <=, >, >=, -ipmatch, -strmatch, -strcmatch, -fnmatch

comp_integercomp

When the comparison is for an integer comparison as opposed to a string comparison.

comp_list

Contains the list used to check a word against, such as:

"Jack" in {"John", "Peter", "Jack"}

comp_listfunc

This contains the listfunc when the expression contains a word checked against a list function, such as:

"Jack" in listMe("some arguments")

comp_regexp

The regular expression used when a word is compared to a regular expression, such as:

"Jack" =~ /\w+/

Here, comp_regexp would contain /\w+/

comp_regexp_op

The regular expression operator used when a word is compared to a regular expression, such as:

"Jack" =~ /\w+/

Here, comp_regexp_op would contain =~

comp_stringcomp

When the comparison is for a string comparison as opposed to an integer comparison.

comp_unary

Matches the expression that uses a unary operator, such as:

-d, -e, -f, -s, -L, -h, -F, -U, -A, -n, -z, -T, -R

comp_word

Contains the word that is the object of the comparison.

comp_word_in_list

Contains the expression of a word checked against a list, such as:

"Jack" in {"John", "Peter", "Jack"}

comp_word_in_listfunc

Contains the word when it is being compared to a listfunc, such as:

"Jack" in listMe("some arguments")

comp_word_in_regexp

Contains the expression of a word checked against a regular expression, such as:

"Jack" =~ /\w+/

Here, the word Jack would be captured in comp_word

comp_worda

Contains the first word in the comparison expression

comp_wordb

Contains the second word in the comparison expression

cond

BNF:

"true"
| "false"
| "!" cond
| cond "&&" cond
| cond "||" cond
| comp
| "(" cond ")"

$RE{Apache2}{TrunkCond}

Same as "cond" in "APACHE2 EXPRESSION"

expr

BNF: cond | string

$RE{Apache2}{TrunkExpr}

Same as "expr" in "APACHE2 EXPRESSION"

function

BNF: funcname "(" words ")"

$RE{Apache2}{TrunkFunction}

Same as "function" in "APACHE2 EXPRESSION"

integercomp

BNF:

word "-eq" word | word "eq" word
| word "-ne" word | word "ne" word
| word "-lt" word | word "lt" word
| word "-le" word | word "le" word
| word "-gt" word | word "gt" word
| word "-ge" word | word "ge" word

$RE{Apache2}{TrunkIntegerComp}

Same as "integercomp" in "APACHE2 EXPRESSION"

join

BNF:

"join" ["("] list [")"]
| "join" ["("] list "," word [")"]

$RE{Apache2}{TrunkJoin}

For example:

join({"word1", "word2"})
# or
join({"word1", "word2"}, ', ')

This uses "list" and "word"

The capture names are:

join

Contains the entire capture block

join_list

Contains the value of the list

join_word

Contains the value for word used to join the list

list

BNF:

split
| listfunc
| "{" words "}"
| "(" list ")"

$RE{Apache2}{TrunkList}

For example:

split( /\w+/, "Some string" )
# or
{"some", "words"}
# or
(split( /\w+/, "Some string" ))
# or
( {"some", "words"} )

This uses "split", "listfunc", "words" and "list"

The capture names are:

list

Contains the entire capture block

list_func

Contains the value if a "listfunc" is used

list_list

Contains the value if this is a list embedded within parenthesis

list_split

Contains the value if the list is based on a split

list_words

Contains the value for a list of words.

listfunc

BNF: listfuncname "(" words ")"

$RE{Apache2}{TrunkListFunc}

Same as "listfunc" in "APACHE2 EXPRESSION"

regany

BNF: regex | regsub

$RE{Apache2}{TrunkRegany}

For example:

/\w+/i
# or
m,\w+,i

This regular expression includes "regex" and "regsub"

The capture names are:

regany

Contains the entire capture block

regany_regex

Contains the regular expression. See "regex"

regany_regsub

Contains the substitution regular expression. See "regsub"

regex

BNF:

"/" regpattern "/" [regflags]
| "m" regsep regpattern regsep [regflags]

$RE{Apache2}{TrunkRegexp}

Same as "regex" in "APACHE2 EXPRESSION"

regsub

BNF: "s" regsep regpattern regsep string regsep [regflags]

$RE{Apache2}{TrunkRegsub}

For example:

s/\w+/John/gi

The capture names are:

regflags

The modifiers used, which can be any combination of:

i, s, m, g

See perlre for an explanation of their usage and meaning

regstring

The string replacing the text found by the regular expression

regsub

Contains the entire capture block

regpattern

Contains the regular expression, which follows Perl syntax since Apache2 uses PCRE (Perl Compatible Regular Expressions).

regsep

Contains the regular expression separator, which can be any of:

/, #, $, %, ^, |, ?, !, ', ", ",", ;, :, ".", _, -
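
How regsep, regpattern, regstring and regflags split a substitution can be shown with a hypothetical, simplified stand-in for $RE{Apache2}{TrunkRegsub}. It assumes no escaped separators inside the pattern or replacement string, and it is not the module's actual pattern.

```perl
use strict;
use warnings;
use 5.010;

# Simplified stand-in for $RE{Apache2}{TrunkRegsub}:
#   s <sep> regpattern <sep> string <sep> [regflags]
# The same separator must be used three times. Escaped separators are not
# handled in this sketch.
my $regsub = qr{
    s
    (?<regsep>[/\#\$%^|?!'",;:._-])
    (?<regpattern>.*?)  \k<regsep>
    (?<regstring>.*?)   \k<regsep>
    (?<regflags>[ismg]*)
}x;

foreach my $str ( 's/\w+/John/gi', 's,\w+,Paul,gi' )
{
    next unless( $str =~ /\A$regsub\z/ );
    print( "sep=$+{regsep} pattern=$+{regpattern} string=$+{regstring} flags=$+{regflags}\n" );
}
```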

split

BNF:

"split" ["("] regany "," list [")"]
| "split" ["("] regany "," word [")"]

$RE{Apache2}{TrunkSplit}

For example:

split( /\w+/, "Some string" )

This uses "regany", "list" and "word"

The capture names are:

split

Contains the entire capture block

split_regex

Contains the regular expression used for the split

split_list

The list being split, if the split target is a list. The target can also be a word; see split_word below

split_word

The word being split, if the split target is a word. The target can also be a list; see split_list above

string

BNF: substring | string substring

$RE{Apache2}{TrunkString}

Same as "string" in "APACHE2 EXPRESSION"

stringcomp

BNF:

word "==" word
| word "!=" word
| word "<"  word
| word "<=" word
| word ">"  word
| word ">=" word

$RE{Apache2}{TrunkStringComp}

Same as "stringcomp" in "APACHE2 EXPRESSION"

sub

BNF: "sub" ["("] regsub "," word [")"]

$RE{Apache2}{TrunkSub}

For example:

sub(s/\w/John/gi,"Peter")

The capture names are:

sub

Contains the entire capture block

sub_regsub

Contains the substitution expression, i.e. in the example above, this would be:

s/\w/John/gi

sub_word

The target for the substitution. In the example above, this would be "Peter"
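
For readers coming from Perl, the sub() example above corresponds roughly to the following substitution. This is an analogy only, assuming Apache2 applies the substitution the way Perl's s/// does (both use Perl-style regular expressions). It does not use Regexp::Common::Apache2.

```perl
use strict;
use warnings;

# Perl analogue (assumption: Apache2 applies the substitution like Perl's
# s///) of the Apache2 expression: sub(s/\w/John/gi,"Peter")
my $target = 'Peter';
( my $result = $target ) =~ s/\w/John/gi;    # /g: every word character is replaced
print( "$result\n" );                         # prints JohnJohnJohnJohnJohn
```

Because \w matches a single character and the g flag is set, each of the five letters of "Peter" is replaced, rather than the whole word.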

substring

BNF: cstring | variable

$RE{Apache2}{TrunkSubstring}

For example:

Jack
# or
%{REQUEST_URI}
# or
%{:sub(s/\b\w+\b/Peter/, "John"):}

See "variable" and "word" regular expressions for more on those.

This is different from "substring" in "APACHE2 EXPRESSION" in that it does not include regular expression back references, i.e. $1, $2, etc.

The capture names are:

substring

Contains the entire capture block

variable

BNF:

"%{" varname "}"
| "%{" funcname ":" funcargs "}"
| "v('" varname "')"
| "%{:" word ":}"
| "%{:" cond ":}"
| rebackref

$RE{Apache2}{TrunkVariable}

For example:

%{REQUEST_URI}
# or
%{md5:"some string"}
# or
%{:sub(s/\b\w+\b/Peter/, "John"):}
# or a reference to previous regular expression capture groups
$1, $2, etc.

See "word" and "cond" regular expressions for more on those.

The capture names are:

rebackref

Contains the regular expression back reference such as $1, $2, etc.

This excludes the leading dollar sign and the enclosing braces, if any. Thus, for $1 or ${1}, rebackref would be 1

variable

Contains the entire capture block

var_backref

Contains the regular expression back reference such as $1, $2, etc

This includes the leading dollar sign and any enclosing braces, such as ${1}

var_cond

If this is a condition inside a variable, such as:

%{:$ap_true == $ap_false:}

var_func

Contains the text for the function and its arguments if this is a function.

var_func_args

Contains the function arguments.

var_func_name

Contains the function name.

var_word

A variable containing a word. See "word" for more information about word expressions.

varname

Contains the variable name without the leading percent sign (or dollar sign, if the legacy regular expression is enabled) and without any surrounding braces
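
The difference between var_backref and rebackref described above can be illustrated with a hypothetical stand-in pattern that only handles back references (not the module's actual $RE{Apache2}{TrunkVariable}). It assumes a single digit, as in $1 to $9.

```perl
use strict;
use warnings;
use 5.010;

# Simplified stand-in showing the difference between var_backref (whole
# back reference, with $ and optional braces) and rebackref (digit only).
my $backref = qr/
    (?<var_backref>
        \$
        (?: \{ (?<rebackref>[0-9]) \} | (?<rebackref>[0-9]) )
    )
/x;

foreach my $str ( '$1', '${2}' )
{
    next unless( $str =~ /\A$backref\z/ );
    print( "var_backref=$+{var_backref} rebackref=$+{rebackref}\n" );
}
```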

word

BNF:

digits
| "'" string "'"
| '"' string '"'
| word "." word
| variable
| sub
| join
| function
| "(" word ")"

$RE{Apache2}{TrunkWord}

This is the most complex regular expression used, since it uses all the others and can recurse deeply

For example:

12
# or
"John"
# or
'Jack'
# or
%{REQUEST_URI}
# or
%{HTTP_HOST}.%{HTTP_PORT}
# or
%{:sub(s/\b\w+\b/Peter/, "John"):}
# or
sub(s,\w+,Paul,gi, "John")
# or
join({"Paul", "Peter"}, ', ')
# or
md5("some string")
# or any word surrounded by parenthesis, such as:
("John")

See "string", "word", "variable", "sub", "join", "function" regular expressions for more on those.

The capture names are:

word

Contains the entire capture block

word_digits

If the word is actually digits, this contains those digits.

word_dot_word

This contains the text when two words are separated by a dot.

word_enclosed

Contains the value of the word enclosed by single or double quotes or by surrounding parenthesis.

word_function

Contains the word if it is a "function"

word_join

Contains the word if it is a "join"

word_quote

If the word is enclosed by single or double quotes, this contains the single or double quote character

word_sub

If the word is a substitution, this contains the substitution

word_variable

Contains the word if it is a "variable"

words

BNF:

word
| word "," list

$RE{Apache2}{TrunkWords}

For example:

"Jack"
# or
"John", {"Peter", "Paul"}
# or
sub(s/\b\w+\b/Peter/, "John"), {"Peter", "Paul"}

See "word" and "list" regular expressions for more on those.

It is different from "words" in "APACHE2 EXPRESSION" in that the element after the comma is a "list" instead of a "word"

The capture names are:

words

Contains the entire capture block

words_word

Contains the word

words_list

Contains the list

LEGACY

When using legacy mode, the regular expressions are more lax in what they accept for four types of expressions:

1. comp

Same as "comp", but extended to support the legacy regular expression comparison, i.e. without the tilde (~). For example:

$HTTP_COOKIES = /lang\%22\%3A\%22([a-zA-Z]+\-[a-zA-Z]+)\%22\%7D;?/

In the current version of Apache2 expressions, this would instead be written as:

%{HTTP_COOKIES} =~ /lang\%22\%3A\%22([a-zA-Z]+\-[a-zA-Z]+)\%22\%7D;?/

Both are supported in legacy expressions.

The additional capture groups available are:

comp_in_regexp_legacy

Contains the entire legacy regular expression.

comp_regexp (unchanged)

Contains the regular expression.

comp_regexp_op

Contains the operator, which may be =, == or !=

comp_word (unchanged)

Contains the word being compared
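
The legacy comparison form and its capture groups can be sketched with a simplified, hypothetical stand-in pattern. It assumes the left-hand side is a plain $VARIABLE and the right-hand side a /.../ regular expression. The real module's patterns differ and cover more cases, and whether its comp_word keeps the leading dollar sign is not specified above.

```perl
use strict;
use warnings;
use 5.010;

# Simplified stand-in for the legacy comparison form described above:
#   $VARIABLE <op> /regular expression/
# where <op> is =, == or !=. The capture names mirror the ones listed above,
# but the real module's patterns differ.
my $legacy_comp = qr{
    (?<comp_in_regexp_legacy>
        (?<comp_word>\$\w+)
        \s*
        (?<comp_regexp_op>==|!=|=)
        \s*
        (?<comp_regexp>/(?:\\.|[^/\\])*/)    # backslash escapes are kept as-is
    )
}x;

my $expr = '$HTTP_COOKIES = /lang\%22\%3A\%22([a-zA-Z]+\-[a-zA-Z]+)\%22\%7D;?/';
if( $expr =~ /\A$legacy_comp\z/ )
{
    print( "word=$+{comp_word} op=$+{comp_regexp_op} regexp=$+{comp_regexp}\n" );
}
```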

2. cond

It is the same as "cond", except that it also accepts a plain variable such as $REQUEST_URI as a valid condition, so that an expression such as the following would work:

!$REQUEST_URI

It adds the following capture groups:

cond_variable

Contains the variable used in the condition, including the leading dollar or percent sign and any surrounding braces.

3. variable

Same as "variable", but extended to accept a plain variable such as $REQUEST_URI. In current Apache2 expressions, a variable is denoted by a percent sign, with its name potentially surrounded by braces. For example:

%{REQUEST_URI}

The legacy variable also accepts regular expression back references such as $1, $2, etc.

Its capture groups names are:

variable

Contains the entire variable.

varname

Contains the variable name without the dollar or percent sign and without any surrounding braces.

var_backref

The regular expression back reference including the dollar sign and any surrounding braces. For example: $1 or ${1}

rebackref

The regular expression back reference excluding the dollar sign and any surrounding braces. For example, for $1 or ${1}, rebackref would contain 1

var_func_name

The variable-embedded function name

var_func_args

The variable-embedded function arguments

4. word

word is extended to also accept a regular expression back reference such as $1, $2, etc.

CAVEAT

Functions need to have their arguments enclosed in parentheses. For example:

%{REMOTE_ADDR} -in split s/.*?IP Address:([^,]+)/$1/, PeerExtList('subjectAltName')

will not work, but the following will:

%{REMOTE_ADDR} -in split(s/.*?IP Address:([^,]+)/$1/, PeerExtList('subjectAltName'))

Maybe this will be adjusted in future versions.

CHANGES & CONTRIBUTIONS

Feel free to reach out to the author for possible corrections, improvements, or suggestions.

AUTHOR

Jacques Deguest <jack@deguest.jp>

SEE ALSO

https://httpd.apache.org/docs/current/expr.html and https://httpd.apache.org/docs/trunk/en/expr.html

COPYRIGHT & LICENSE

Copyright (c) 2020 DEGUEST Pte. Ltd.

You can use, copy, modify and redistribute this package and associated files under the same terms as Perl itself.