NAME
Apache2::Expression - Apache2 Expressions
SYNOPSIS
use Apache2::Expression;
my $exp = Apache2::Expression->new( legacy => 1 );
my $hash = $exp->parse( $expression );
VERSION
v0.1.1
DESCRIPTION
Apache2::Expression parses Apache2 expressions, such as those found in SSI (Server Side Includes).
METHODS
parse
This method takes a string representing an Apache2 expression as its argument and returns a hash reference describing the elements that make up the expression.
It takes an optional hash of parameters, as follows:
legacy
-
When set to a true value, this enables the Apache2 legacy expression syntax. See Regexp::Common::Apache2 for more information on what this means.
trunk
-
When set to a true value, this enables Apache2 experimental and advanced expressions. See Regexp::Common::Apache2 for more information on what this means.
For example, parsing:
$HTTP_COOKIE = /lang\%22\%3A\%22([a-zA-Z]+\-[a-zA-Z]+)\%22\%7D;?/
would return:
{
elements => [
{
elements => [
{
elements => [
{
elements => [],
name => "HTTP_COOKIE",
raw => "\$HTTP_COOKIE",
re => { variable => "\$HTTP_COOKIE", varname => "HTTP_COOKIE" },
subtype => "variable",
type => "variable",
},
{
elements => [],
flags => undef,
pattern => "lang\\%22\\%3A\\%22([a-zA-Z]+\\-[a-zA-Z]+)\\%22\\%7D;?",
raw => "/lang\\%22\\%3A\\%22([a-zA-Z]+\\-[a-zA-Z]+)\\%22\\%7D;?/",
re => {
regex => "/lang\\%22\\%3A\\%22([a-zA-Z]+\\-[a-zA-Z]+)\\%22\\%7D;?/",
regpattern => "lang\\%22\\%3A\\%22([a-zA-Z]+\\-[a-zA-Z]+)\\%22\\%7D;?",
regsep => "/",
},
sep => "/",
type => "regex",
},
],
op => "=",
raw => "\$HTTP_COOKIE = /lang\\%22\\%3A\\%22([a-zA-Z]+\\-[a-zA-Z]+)\\%22\\%7D;?/",
re => {
comp => "\$HTTP_COOKIE = /lang\\%22\\%3A\\%22([a-zA-Z]+\\-[a-zA-Z]+)\\%22\\%7D;?/",
comp_in_regexp_legacy => "\$HTTP_COOKIE = /lang\\%22\\%3A\\%22([a-zA-Z]+\\-[a-zA-Z]+)\\%22\\%7D;?/",
comp_regexp => "/lang\\%22\\%3A\\%22([a-zA-Z]+\\-[a-zA-Z]+)\\%22\\%7D;?/",
comp_regexp_op => "=",
comp_word => "\$HTTP_COOKIE",
},
regexp => "/lang\\%22\\%3A\\%22([a-zA-Z]+\\-[a-zA-Z]+)\\%22\\%7D;?/",
subtype => "regexp",
type => "comp",
word => "\$HTTP_COOKIE",
},
],
raw => "\$HTTP_COOKIE = /lang\\%22\\%3A\\%22([a-zA-Z]+\\-[a-zA-Z]+)\\%22\\%7D;?/",
re => {
cond => "\$HTTP_COOKIE = /lang\\%22\\%3A\\%22([a-zA-Z]+\\-[a-zA-Z]+)\\%22\\%7D;?/",
cond_comp => "\$HTTP_COOKIE = /lang\\%22\\%3A\\%22([a-zA-Z]+\\-[a-zA-Z]+)\\%22\\%7D;?/",
},
subtype => "comp",
type => "cond",
},
],
raw => "\$HTTP_COOKIE = /lang\\%22\\%3A\\%22([a-zA-Z]+\\-[a-zA-Z]+)\\%22\\%7D;?/",
}
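Because every node carries its own elements array reference, the result is naturally processed recursively. The following is a minimal sketch, assuming the nested structure shown above: it uses a trimmed, hand-written copy of that output (not a call to parse) and a hypothetical helper, walk_elements, that is not part of Apache2::Expression. It collects each node's type and sub type, depth first.

```perl
use strict;
use warnings;

# Trimmed, hand-written copy of the structure shown above for:
#   $HTTP_COOKIE = /lang.../
my $tree = {
    raw      => '$HTTP_COOKIE = /lang/',
    elements => [
        {
            type     => 'cond',
            subtype  => 'comp',
            elements => [
                {
                    type     => 'comp',
                    subtype  => 'regexp',
                    op       => '=',
                    elements => [
                        { type => 'variable', subtype => 'variable', name => 'HTTP_COOKIE', elements => [] },
                        { type => 'regex', sep => '/', elements => [] },
                    ],
                },
            ],
        },
    ],
};

# Hypothetical helper: depth-first walk collecting "type/subtype" labels,
# indented by nesting depth. Nodes without a subtype show only their type.
sub walk_elements {
    my( $node, $depth, $out ) = @_;
    for my $elem ( @{ $node->{elements} || [] } ) {
        my $label = $elem->{type};
        $label .= '/' . $elem->{subtype} if defined $elem->{subtype};
        push @$out, ( '  ' x $depth ) . $label;
        walk_elements( $elem, $depth + 1, $out );
    }
    return $out;
}

my $lines = walk_elements( $tree, 0, [] );
print "$_\n" for @$lines;
```

Run against this trimmed copy, it prints the cond/comp node, then the comp/regexp node, then its two children (the variable and the regex), each level indented by two more spaces.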
The properties returned in the hash are:
elements
-
An array reference of the sub-elements contained in this element, providing a more granular breakdown.
Each entry in the elements array reference is itself a hash of one of the types described below.
name
-
The name of the element. For example, if this is a function, this would be the function name; if this is a variable, this would be the variable name without its leading dollar or percent sign and without any surrounding braces.
raw
-
The raw string, or chunk of string that was processed.
re
-
The hash of capture groups provided by Regexp::Common::Apache2, made available for finer-grained control.
regexp
-
The regular expression, with its enclosing separators, that a word is compared against, as in the example above.
subtype
-
A sub type that provides more information about the type of expression processed.
This can be any of the types mentioned below, plus the following: binary (for comparison), list (for word-to-list comparison), negative, parenthesis, rebackref, regexp and unary (for comparison). See below for possible combinations.
type
-
The main type matching the Apache2 expression. This can be comp, cond, digits, function, integercomp, join, listfunc, quote (for quoted words), regex, stringcomp, variable or word.
See below for possible combinations.
word
-
If this is a word, this contains the word. In the example above, $HTTP_COOKIE is the word used in the regular expression comparison.
parse_args
Given a string that typically represents a function's arguments, this method uses PPI to parse it and returns an array of the parameters as strings.
Parsing function arguments is non-trivial because they can contain nested function calls.
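To show why nesting makes this non-trivial, here is a deliberately simpler sketch that swaps PPI out for a small hand-written scanner; it is not what parse_args does. It assumes a comma separates arguments only when it sits outside any (), {} or [], outside single or double quotes, and is not backslash-escaped. split_top_level_args is a hypothetical name. Known limitation: an unescaped comma inside a regular expression such as /,/ would still be treated as a separator.

```perl
use strict;
use warnings;

# Hypothetical helper: split an argument string on top-level commas only.
sub split_top_level_args {
    my( $str ) = @_;
    my @args;
    my $buf   = '';
    my $depth = 0;
    my $quote;    # current quote character while inside a quoted string
    my @chars = split //, $str;
    for( my $i = 0; $i < @chars; $i++ ) {
        my $c = $chars[$i];
        if( $c eq '\\' && $i + 1 < @chars ) {
            # Keep the escaped character verbatim (e.g. the \, in /\,/)
            $buf .= $c . $chars[++$i];
            next;
        }
        if( defined $quote ) {
            undef $quote if $c eq $quote;    # closing quote
        }
        elsif( $c eq '"' || $c eq "'" ) {
            $quote = $c;                     # opening quote
        }
        elsif( $c =~ /[({\[]/ ) {
            $depth++;
        }
        elsif( $c =~ /[)}\]]/ ) {
            $depth--;
        }
        elsif( $c eq ',' && $depth == 0 ) {
            push @args, $buf;                # top-level separator
            $buf = '';
            next;
        }
        $buf .= $c;
    }
    push @args, $buf if length $buf;
    # Trim surrounding whitespace from each argument
    s/^\s+|\s+$//g for @args;
    return @args;
}

my @args   = split_top_level_args( q{/\,/, $ip_list} );
my @nested = split_top_level_args( q{md5(%{REMOTE_ADDR}), join({"a", "b"}, ', ')} );
print "$_\n" for @args, @nested;
```

Both calls yield two arguments: the commas inside the nested call, the braces and the quoted ', ' are all left alone, which is exactly what a naive split on commas would get wrong.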
COMBINATIONS
- comp
-
Type: comp
Possible sub types:
binary
-
When a binary operator is used, such as:
==, =, !=, <, <=, >, >=, -ipmatch, -strmatch, -strcmatch, -fnmatch
Example:
192.168.2.10 -ipmatch 192.168.2/24
Here, 192.168.2.10 would be captured in property worda, ipmatch (without the leading dash) in property op, and 192.168.2/24 in property wordb. The array reference in property elements contains more information on worda and wordb. The details of the elements for worda can also be accessed as an array reference with property worda_def, and likewise for wordb with wordb_def.
function
-
This contains the function name and arguments when the left-hand side word is compared to a list function.
For example:
192.168.1.10 in split( /\,/, $ip_list )
In this example, 192.168.1.10 would be captured in word and split( /\,/, $ip_list ) in function, with the array reference elements containing more information about the word and the function. The details of the elements for word can also be accessed as an array reference with property word_def, and likewise for function with function_def.
list
-
When a word on the left-hand side is compared to a list of words, such as:
%{SOME_VALUE} in {"John", "Peter", "Paul"}
In this example, %{SOME_VALUE} would be captured in property word and "John", "Peter", "Paul" (without the enclosing braces or any spaces just inside them) in property list. The array reference elements may contain more information on word and on each element in list. The details of the elements for word can also be accessed as an array reference with property word_def, and likewise for list with list_def.
regexp
-
When the left-hand side word is compared to a regular expression.
For example:
%{HTTP_COOKIE} =~ /lang\%22\%3A\%22([a-zA-Z]+\-[a-zA-Z]+)\%22\%7D;?/
In this example, %{HTTP_COOKIE} would be captured in property word, /lang\%22\%3A\%22([a-zA-Z]+\-[a-zA-Z]+)\%22\%7D;?/ in property regexp, and =~ in property op. Check the array reference in property elements for more details about the word and the regular expression in regexp. The details of the elements for word can also be accessed as an array reference with property word_def, and likewise for regexp with regexp_def.
unary
-
When one of the following operators is used against a word:
-d, -e, -f, -s, -L, -h, -F, -U, -A, -n, -z, -T, -R
For example:
-A /some/uri.html # same as -U
-d /some/folder # true if the file is a directory
-e /some/folder/file.txt # true if the file exists
-f /some/folder/file.txt # true if the file is a regular file
-F /some/folder/file.txt # true if the file is a regular file and is accessible to all (Apache2 does a subrequest to check)
-h /some/folder/link.txt # true if the file is a symbolic link
-n %{QUERY_STRING} # true if the string is not empty (opposite of -z)
-s /some/folder/file.txt # true if the file is not empty
-L /some/folder/link.txt # true if the file is a symbolic link (same as -h)
-R 192.168.1.1/24 # true if the remote IP matches this IP block; same as %{REMOTE_ADDR} -ipmatch 192.168.1.1/24
-T %{HTTPS} # false if the string is empty, "0", "off", "false", or "no" (case insensitive); true otherwise
-U /some/uri.html # true if the URI is accessible to all (Apache2 does a subrequest to check)
-z %{QUERY_STRING} # true if the string is empty (opposite of -n)
In the example -e /some/folder/file.txt, e (without the leading dash) would be captured in op and /some/folder/file.txt in word. Check the array reference in property elements for more information about the word in word. The details of the elements for word can also be accessed as an array reference with property word_def.
See Regexp::Common::Apache2::comp for more information.
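The -T rule is the least obvious of the unary operators above. A minimal sketch of it in Perl follows, assuming only the semantics stated in the comment above: the string is false when empty, "0", "off", "false" or "no", compared case insensitively, and true otherwise. is_apache_true is a hypothetical helper; it is neither part of Apache2::Expression nor Apache's own implementation.

```perl
use strict;
use warnings;

# Hypothetical helper mirroring the documented -T rule.
sub is_apache_true {
    my( $str ) = @_;
    return 0 if !defined $str || $str eq '';
    # Whole-string, case-insensitive match against the false values
    return 0 if $str =~ /\A(?:0|off|false|no)\z/i;
    return 1;
}

for my $v ( 'on', 'OFF', '', '0', 'No', 'yes' ) {
    printf "%-6s => %s\n", "'$v'", is_apache_true( $v ) ? 'true' : 'false';
}
```

Anchoring with \A and \z matters: without them, a value such as "nothing" would be wrongly treated as false because it starts with "no".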
Available properties:
op
-
Contains the operator used. See Regexp::Common::Apache2::comp, "stringcomp" in Regexp::Common::Apache2 and "integercomp" in Regexp::Common::Apache2
For unary operators, this may be:
-d, -e, -f, -s, -L, -h, -F, -U, -A, -n, -z, -T, -R
For binary operators:
==, =, !=, <, <=, >, >=, -ipmatch, -strmatch, -strcmatch, -fnmatch
For integer comparison:
-eq, -ne, -lt, -le, -gt, -ge
For string comparison:
==, !=, <, <=, >, >=
In all cases, op contains the operator without its leading dash, if any.
word
-
The word being compared.
worda
-
The first word being compared, on the left of the operator. For example, 12 in:
12 -ne 10
wordb
-
The second word, the one being compared to, on the right of the operator.
See "comp" in Regexp::Common::Apache2 for more information.
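Which properties are set depends on the sub type. The sketch below rebuilds a readable expression from a comp hash using only the properties documented above (op, word, worda, wordb, list, regexp, raw). It assumes, as stated for op, that alphabetic operators are stored without their leading dash, so the dash is restored. describe_comp and the sample hashes are hypothetical and hand-written, not output captured from the module.

```perl
use strict;
use warnings;

# Hypothetical helper: rebuild an expression string from a comp node.
sub describe_comp {
    my( $def ) = @_;
    my $op = $def->{op} // '';
    # Operators such as ipmatch or e are stored without the leading dash
    $op = "-$op" if $op =~ /^[a-zA-Z]+$/;
    my $st = $def->{subtype} // '';
    if( $st eq 'binary' ) {
        return join( ' ', $def->{worda}, $op, $def->{wordb} );
    }
    elsif( $st eq 'unary' ) {
        return "$op $def->{word}";
    }
    elsif( $st eq 'regexp' ) {
        return join( ' ', $def->{word}, $op, $def->{regexp} );
    }
    elsif( $st eq 'list' ) {
        return $def->{word} . ' in {' . $def->{list} . '}';
    }
    # Fall back to the raw chunk for anything else
    return $def->{raw} // '';
}

my $binary = { subtype => 'binary', op => 'ipmatch', worda => '192.168.2.10', wordb => '192.168.2/24' };
my $unary  = { subtype => 'unary', op => 'e', word => '/some/folder/file.txt' };
my $list   = { subtype => 'list', word => '%{SOME_VALUE}', list => '"John", "Peter", "Paul"' };

print describe_comp( $binary ), "\n";    # 192.168.2.10 -ipmatch 192.168.2/24
print describe_comp( $unary ), "\n";     # -e /some/folder/file.txt
print describe_comp( $list ), "\n";      # %{SOME_VALUE} in {"John", "Peter", "Paul"}
```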
- cond
-
Type: cond
Possible sub types:
and
-
When the condition is an ANDed expression, such as:
$ap_true && $ap_false
In this case, $ap_true would be captured in property expr1 and $ap_false in property expr2. The details of the elements can also be accessed as array references with properties and_def, and_expr1_def and and_expr2_def.
comp
-
Contains the expression when the condition is a comparison.
This recurses, and you can find more information in the array reference in property elements. For what it contains, see the comp type.
cond
-
Default sub type
negative
-
When the condition is negative, i.e., prefixed by an exclamation mark.
For example:
!-z /some/folder/file.txt
Check the details in the array reference in property elements. The details of the elements can also be accessed as an array reference with property negative_def.
or
-
When the condition is an ORed expression, such as:
$ap_true || $ap_false
In this case, $ap_true would be captured in property expr1 and $ap_false in property expr2. The details of the elements can also be accessed as array references with properties and_def, and_expr1_def and and_expr2_def.
parenthesis
-
When the condition is enclosed in parentheses.
Check the array reference in property elements for information about the enclosed condition. The details of the elements can also be accessed as an array reference with property parenthesis_def.
variable
-
Contains the expression when the condition is based on a variable, such as:
%{REQUEST_URI}
Check the array reference in property elements for more details about the variable, especially the property name, which would contain the name of the variable; in this case: REQUEST_URI. The details of the elements for the variable can also be accessed as an array reference with property variable_def.
Available properties:
args
-
Function arguments. See the content of the elements array reference for a breakdown of the arguments provided.
is_negative
-
True if the condition is negative.
name
-
Function name
See "cond" in Regexp::Common::Apache2 for more information.
- function
-
Type: function
Possible sub types: none
Available properties:
args
-
Function arguments. See the content of the elements array reference for a breakdown of the arguments provided. The details of the elements for those arguments can also be accessed as an array reference with property args_def.
name
-
Function name
See "function" in Regexp::Common::Apache2 for more information.
- integercomp
-
Type: integercomp
Possible sub types: none
Available properties:
op
-
Contains the operator used. See "integercomp" in Regexp::Common::Apache2
worda
-
The first word being compared, on the left of the operator. For example, 12 in:
12 -ne 10
The details of the elements for worda can also be accessed as an array reference with property worda_def.
wordb
-
The second word, the one being compared to, on the right of the operator.
The details of the elements for wordb can also be accessed as an array reference with property wordb_def.
See "integercomp" in Regexp::Common::Apache2 for more information.
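The integer operators map directly onto Perl's numeric comparisons, so a dispatch table keyed by op is a natural way to evaluate an integercomp node. This is a minimal sketch, assuming op holds the operator without its leading dash (as described for op under comp) and that worda and wordb have already been resolved to plain integers. %INT_OPS and eval_integercomp are hypothetical names.

```perl
use strict;
use warnings;

# Hypothetical dispatch table keyed by the stored op (no leading dash).
my %INT_OPS = (
    eq => sub { $_[0] == $_[1] },
    ne => sub { $_[0] != $_[1] },
    lt => sub { $_[0] <  $_[1] },
    le => sub { $_[0] <= $_[1] },
    gt => sub { $_[0] >  $_[1] },
    ge => sub { $_[0] >= $_[1] },
);

sub eval_integercomp {
    my( $def ) = @_;
    my $code = $INT_OPS{ $def->{op} } or die "Unknown integer operator '$def->{op}'\n";
    return $code->( $def->{worda}, $def->{wordb} ) ? 1 : 0;
}

print eval_integercomp( { op => 'ne', worda => 12, wordb => 10 } ), "\n";    # 1
```

Unknown operators die rather than silently returning false, which keeps a typo in op from looking like a failed comparison.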
- join
-
Type: join
Possible sub types: none
Available properties:
list
-
The list of strings to be joined. See the content of the elements array reference for a breakdown of the arguments provided. The details of the elements for those arguments can also be accessed as an array reference with property list_def.
word
-
The word used to join the list. This parameter is optional.
Details for the word parameter, if any, can be found in the elements array reference or accessed with the word_def property.
For example:
join({"John Paul Doe"}, ', ') # or
join({"John", "Paul", "Doe"}, ', ') # or just
join({"John", "Paul", "Doe"})
See "join" in Regexp::Common::Apache2 for more information.
- listfunc
-
Type: listfunc
Possible sub types: none
Available properties:
args
-
Function arguments. See the content of the elements array reference for a breakdown of the arguments provided. The details of the elements for those arguments can also be accessed as an array reference with property args_def.
name
-
Function name
See "listfunc" in Regexp::Common::Apache2 for more information.
- regex
-
Type: regex
Possible sub types: none
Available properties:
flags
-
The regular expression flags, for example:
mgis
pattern
-
Regular expression pattern, excluding enclosing separators.
sep
-
The separator used. It can be any of: /, #, $, %, ^, |, ?, !, ', ", ",", ";", ":", ".", _, and -
See "regex" in Regexp::Common::Apache2 for more information.
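The three properties above can be recovered from a regular expression literal with a single Perl match. This is a sketch under two assumptions: that, as in Apache's expression grammar, a separator other than / must be introduced by a leading m (as in m#...#), and that flags are limited to g, i, m and s. split_regex_literal is a hypothetical helper, not the module's own parser; like the example output earlier, it sets flags to undef when none are present.

```perl
use strict;
use warnings;

# Separators listed above, as a character class.
my $SEP = qr{[/#\$%^|?!'",;:._-]};

# Hypothetical helper: split a regex literal into sep, pattern and flags.
sub split_regex_literal {
    my( $str ) = @_;
    # Either m<sep>...<sep> or /.../ ; the closing separator must match
    # whichever opening form was used (\1 or \2), then optional flags.
    return unless $str =~ m{\A(?:m($SEP)|(/))(.*)(?:\1|\2)([gims]*)\z}s;
    my $sep = defined $1 ? $1 : $2;
    return {
        sep     => $sep,
        pattern => $3,
        flags   => length( $4 ) ? $4 : undef,
    };
}

my $re  = split_regex_literal( '/lang\%22([a-zA-Z]+)/' );
my $re2 = split_regex_literal( 'm#^/some/uri#i' );
printf "%s | %s | %s\n", $_->{sep}, $_->{pattern}, $_->{flags} // '(none)' for $re, $re2;
```

The greedy (.*) backtracks to the last matching separator, which is what lets the second example keep the / characters inside its pattern.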
- stringcomp
-
Type: stringcomp
Possible sub types: none
Available properties:
op
-
Contains the operator used. See "stringcomp" in Regexp::Common::Apache2
worda
-
The first word being compared, on the left of the operator. For example, 'foo' in:
'foo' != 'bar'
The details of the elements for worda can also be accessed as an array reference with property worda_def.
wordb
-
The second word, the one being compared to, on the right of the operator.
The details of the elements for wordb can also be accessed as an array reference with property wordb_def.
See "stringcomp" in Regexp::Common::Apache2 for more information.
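Unlike integercomp, the string operators compare lexically, so '10' < '9' is true. A minimal sketch, assuming Apache's string comparison is a plain lexical comparison (as its expression documentation describes for these operators), maps each op onto Perl's string operators. %STR_OPS and eval_stringcomp are hypothetical names, and locale or collation subtleties are ignored.

```perl
use strict;
use warnings;

# Hypothetical dispatch table: Apache string operators -> Perl string ops.
my %STR_OPS = (
    '==' => sub { $_[0] eq $_[1] },
    '!=' => sub { $_[0] ne $_[1] },
    '<'  => sub { $_[0] lt $_[1] },
    '<=' => sub { $_[0] le $_[1] },
    '>'  => sub { $_[0] gt $_[1] },
    '>=' => sub { $_[0] ge $_[1] },
);

sub eval_stringcomp {
    my( $def ) = @_;
    my $code = $STR_OPS{ $def->{op} } or die "Unknown string operator '$def->{op}'\n";
    return $code->( $def->{worda}, $def->{wordb} ) ? 1 : 0;
}

# Lexical, not numeric: "1" sorts before "9"
print eval_stringcomp( { op => '<', worda => '10', wordb => '9' } ), "\n";    # 1
```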
- variable
-
Type: variable
Possible sub types:
function
-
A function call, such as:
%{md5:"some arguments"}
rebackref
-
A regular expression back reference, such as $1, $2, etc., up to $9.
variable
-
A variable, such as:
%{REQUEST_URI} # or, with legacy expressions enabled, ${REQUEST_URI}
Available properties:
args
-
Function arguments. See the content of the elements array reference for a breakdown of the arguments provided.
name
-
Function name, or variable name.
value
-
The regular expression back reference value, such as 1, 2, etc.
See "variable" in Regexp::Common::Apache2 for more information.
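The three variable sub types can be told apart by their surface syntax. The sketch below classifies a single token accordingly, assuming back references run from $1 to $9 as described above and that the ${NAME} form is only accepted when legacy mode is on. classify_variable is a hypothetical helper that returns only the subset of properties it can infer (name, args, value); it is far less thorough than Regexp::Common::Apache2.

```perl
use strict;
use warnings;

# Hypothetical helper: guess the variable sub type of a single token.
sub classify_variable {
    my( $str, $legacy ) = @_;
    if( $str =~ /\A\$([1-9])\z/ ) {
        return { subtype => 'rebackref', value => $1 };
    }
    if( $str =~ /\A%\{(\w+):(.*)\}\z/s ) {
        return { subtype => 'function', name => $1, args => $2 };
    }
    if( $str =~ /\A%\{(\w+)\}\z/ ) {
        return { subtype => 'variable', name => $1 };
    }
    if( $legacy && $str =~ /\A\$\{(\w+)\}\z/ ) {
        return { subtype => 'variable', name => $1 };    # legacy syntax
    }
    return;
}

for my $token ( '%{REQUEST_URI}', '%{md5:"some arguments"}', '$3' ) {
    my $def = classify_variable( $token );
    print "$token => $def->{subtype}\n";
}
```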
- word
-
Type: word
Possible sub types:
digits
-
When the word contains one or more digits.
dotted
-
When the word contains words separated by dots, such as:
192.168.1.10
function
-
When the word is a function.
parens
-
When the word is surrounded by parentheses.
quote
-
When the word is surrounded by single or double quotes
rebackref
-
When the word is a regular expression back reference, such as $1, $2, etc., up to $9.
regex
-
This is an extension I added to make some functions work, such as:
split( /\w+/, $ip_list )
Without it, the regular expression would not be recognised, as the Apache BNF stands.
variable
-
When the word is a variable, for example %{REQUEST_URI}. It can also be a variable such as ${REQUEST_URI} if legacy mode is enabled.
Available properties:
flags
-
The regular expression flags used, such as
mgis
parens
-
Contains an array reference of the open and close parenthesis, such as:
["(", ")"]
pattern
-
The regular expression pattern
quote
-
Contains the type of quote used if the sub type is
quote
regex
-
Contains the regular expression
sep
-
The separator used in the regular expression, such as
/
value
-
The value of the digits if the sub type is digits or rebackref.
word
-
The word enclosed in quotes
See "word" in Regexp::Common::Apache2 for more information.
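Several of the word sub types can be recognised from their shape alone. This sketch covers digits, dotted, quote and parens, filling in the value, quote, word and parens properties described above where they apply. classify_word is a hypothetical helper; it checks digits before dotted so that a plain number is not reported as dotted, and it does not attempt the function, regex, rebackref or variable cases.

```perl
use strict;
use warnings;

# Hypothetical helper: guess the word sub type from its surface form.
sub classify_word {
    my( $str ) = @_;
    return { subtype => 'digits', value => $str } if $str =~ /\A\d+\z/;
    return { subtype => 'dotted' } if $str =~ /\A\w+(?:\.\w+)+\z/;
    if( $str =~ /\A(["'])(.*)\1\z/s ) {
        # quote holds the quote character, word the enclosed text
        return { subtype => 'quote', quote => $1, word => $2 };
    }
    return { subtype => 'parens', parens => [ '(', ')' ] } if $str =~ /\A\(.*\)\z/s;
    return { subtype => 'unknown' };
}

for my $w ( '42', '192.168.1.10', '"John"', '(%{REQUEST_URI})' ) {
    print "$w => ", classify_word( $w )->{subtype}, "\n";
}
```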
CAVEAT
This module supports Apache2 expressions well. However, some expressions are difficult to process. For example:
Expressions with functions that do not use enclosing parentheses:
%{REMOTE_ADDR} -in split s/.*?IP Address:([^,]+)/$1/, PeerExtList('subjectAltName')
Instead, use:
%{REMOTE_ADDR} -in split(s/.*?IP Address:([^,]+)/$1/, PeerExtList('subjectAltName'))
There is no mechanism yet to prevent infinite recursion. This needs to be implemented.
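One common way to add such a mechanism is to carry a depth counter through every recursive call and abort past a fixed limit. The sketch below applies that idea to the elements tree returned by parse, not to the parser itself. $MAX_DEPTH, its value of 64, and count_nodes are hypothetical choices for illustration; the cyclic structure is built on purpose to show the guard tripping instead of recursing forever.

```perl
use strict;
use warnings;

my $MAX_DEPTH = 64;    # hypothetical limit

# Hypothetical guarded walk: counts nodes in an elements tree and dies
# instead of recursing forever when the tree is too deep or cyclic.
sub count_nodes {
    my( $node, $depth ) = @_;
    $depth //= 0;
    die "Maximum recursion depth ($MAX_DEPTH) exceeded\n" if $depth > $MAX_DEPTH;
    my $count = 1;
    $count += count_nodes( $_, $depth + 1 ) for @{ $node->{elements} || [] };
    return $count;
}

my $tree = { type => 'cond', elements => [ { type => 'variable', elements => [] } ] };
print count_nodes( $tree ), "\n";    # 2

my $loop = { type => 'cond' };
$loop->{elements} = [ $loop ];       # cyclic on purpose
my $ok = eval { count_nodes( $loop ); 1 };
print $ok ? "counted\n" : "refused: $@";
```

The same counter could be threaded through a recursive parser; a limit comfortably below Perl's deep-recursion warning threshold of 100 calls keeps the failure explicit.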
CHANGES & CONTRIBUTIONS
Feel free to reach out to the author for possible corrections, improvements, or suggestions.
AUTHOR
Jacques Deguest <jack@deguest.jp>
SEE ALSO
Apache2::SSI, Regexp::Common::Apache2, https://httpd.apache.org/docs/current/expr.html
COPYRIGHT & LICENSE
Copyright (c) 2020 DEGUEST Pte. Ltd.
You can use, copy, modify and redistribute this package and associated files under the same terms as Perl itself.