NAME
Apache2::Expression - Apache2 Expressions
SYNOPSIS
use Apache2::Expression;
my $exp = Apache2::Expression->new( legacy => 1 );
my $hash = $exp->parse( '$HTTP_COOKIE = /lang\%22\%3A\%22([a-zA-Z]+\-[a-zA-Z]+)\%22\%7D;?/' );
VERSION
v0.1.0
DESCRIPTION
Apache2::Expression is used to parse Apache2 expressions, like those found in SSI (Server Side Includes).
METHODS
parse
This method takes a string representing an Apache2 expression as argument, and returns a hash reference containing the details of the elements that make up the expression.
It takes an optional hash of parameters, as follows:
- legacy
-
When this is provided with a positive value, this will enable Apache2 legacy expressions. See Regexp::Common::Apache2 for more information on what this means.
- trunk
-
When this is provided with a positive value, this will enable Apache2 experimental and advanced expressions. See Regexp::Common::Apache2 for more information on what this means.
For example :
$HTTP_COOKIE = /lang\%22\%3A\%22([a-zA-Z]+\-[a-zA-Z]+)\%22\%7D;?/
would return :
{
elements => [
{
elements => [
{
elements => [
{
elements => [],
name => "HTTP_COOKIE",
raw => "\$HTTP_COOKIE",
re => { variable => "\$HTTP_COOKIE", varname => "HTTP_COOKIE" },
subtype => "variable",
type => "variable",
},
{
elements => [],
flags => undef,
pattern => "lang\\%22\\%3A\\%22([a-zA-Z]+\\-[a-zA-Z]+)\\%22\\%7D;?",
raw => "/lang\\%22\\%3A\\%22([a-zA-Z]+\\-[a-zA-Z]+)\\%22\\%7D;?/",
re => {
regex => "/lang\\%22\\%3A\\%22([a-zA-Z]+\\-[a-zA-Z]+)\\%22\\%7D;?/",
regpattern => "lang\\%22\\%3A\\%22([a-zA-Z]+\\-[a-zA-Z]+)\\%22\\%7D;?",
regsep => "/",
},
sep => "/",
type => "regex",
},
],
op => "=",
raw => "\$HTTP_COOKIE = /lang\\%22\\%3A\\%22([a-zA-Z]+\\-[a-zA-Z]+)\\%22\\%7D;?/",
re => {
comp => "\$HTTP_COOKIE = /lang\\%22\\%3A\\%22([a-zA-Z]+\\-[a-zA-Z]+)\\%22\\%7D;?/",
comp_in_regexp_legacy => "\$HTTP_COOKIE = /lang\\%22\\%3A\\%22([a-zA-Z]+\\-[a-zA-Z]+)\\%22\\%7D;?/",
comp_regexp => "/lang\\%22\\%3A\\%22([a-zA-Z]+\\-[a-zA-Z]+)\\%22\\%7D;?/",
comp_regexp_op => "=",
comp_word => "\$HTTP_COOKIE",
},
regexp => "/lang\\%22\\%3A\\%22([a-zA-Z]+\\-[a-zA-Z]+)\\%22\\%7D;?/",
subtype => "regexp",
type => "comp",
word => "\$HTTP_COOKIE",
},
],
raw => "\$HTTP_COOKIE = /lang\\%22\\%3A\\%22([a-zA-Z]+\\-[a-zA-Z]+)\\%22\\%7D;?/",
re => {
cond => "\$HTTP_COOKIE = /lang\\%22\\%3A\\%22([a-zA-Z]+\\-[a-zA-Z]+)\\%22\\%7D;?/",
cond_comp => "\$HTTP_COOKIE = /lang\\%22\\%3A\\%22([a-zA-Z]+\\-[a-zA-Z]+)\\%22\\%7D;?/",
},
subtype => "comp",
type => "cond",
},
],
raw => "\$HTTP_COOKIE = /lang\\%22\\%3A\\%22([a-zA-Z]+\\-[a-zA-Z]+)\\%22\\%7D;?/",
}
The properties returned in the hash are:
- elements
-
An array reference of the sub elements contained, which provide a more granular definition.
Whatever the elements array reference contains is defined in one of the types below.
- name
-
The name of the element. For example, if this is a function, this would be the function name, or if this is a variable, this would be the variable name without its leading dollar or percent sign and without any surrounding curly braces.
- raw
-
The raw string, or chunk of string that was processed.
- re
-
This contains the hash of capture groups as provided by Regexp::Common::Apache2. It is made available to enable finer-grained control.
- regexp
- subtype
-
A sub type that provides more information about the type of expression processed.
This can be any of the types mentioned below, plus the following ones: binary (for comparison), list (for word to list comparison), negative, parenthesis, rebackref, regexp, unary (for comparison)
See below for possible combinations.
- type
-
The main type matching the Apache2 expression. This can be comp, cond, digits, function, integercomp, quote (for quoted words), regex, stringcomp, listfunc, variable, word
See below for possible combinations.
- word
-
If this is a word, this contains the word. In the example above, $HTTP_COOKIE would be the word used in the regular expression comparison.
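The nesting described above can be explored with a small recursive walk over the elements array references. The following is only a sketch: the helper name outline is hypothetical, and the tree is a hand-written, shortened copy of the structure shown earlier (the regular expression is reduced to /lang/) so that the snippet runs without the module installed. In real use, the hash reference returned by parse would take its place.

```perl
use strict;
use warnings;

# Hand-written, shortened copy of the documented structure (assumption:
# a real tree returned by parse() has the same shape, with more keys).
my $tree =
{
    raw      => '$HTTP_COOKIE = /lang/',
    elements =>
    [
        {
            type => 'cond', subtype => 'comp', raw => '$HTTP_COOKIE = /lang/',
            elements =>
            [
                {
                    type => 'comp', subtype => 'regexp', op => '=',
                    raw  => '$HTTP_COOKIE = /lang/',
                    elements =>
                    [
                        { type => 'variable', subtype => 'variable', name => 'HTTP_COOKIE', raw => '$HTTP_COOKIE', elements => [] },
                        { type => 'regex', pattern => 'lang', sep => '/', raw => '/lang/', elements => [] },
                    ],
                },
            ],
        },
    ],
};

# Hypothetical helper: return one indented line per node, depth first.
sub outline
{
    my( $node, $depth ) = @_;
    $depth //= 0;
    my @lines;
    foreach my $el ( @{ $node->{elements} || [] } )
    {
        # Label is "type/subtype", or just "type" when there is no sub type
        my $label = $el->{type} // 'unknown';
        $label .= '/' . $el->{subtype} if( defined( $el->{subtype} ) );
        push( @lines, ( '  ' x $depth ) . $label . ': ' . $el->{raw} );
        push( @lines, outline( $el, $depth + 1 ) );
    }
    return( @lines );
}

print( "$_\n" ) for( outline( $tree ) );
```

Each output line shows the type and sub type of a node followed by its raw string, indented by its depth in the tree.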
parse_args
Given a string that typically represents a function's arguments, this method will use PPI to parse it and return an array of parameters as strings.
Parsing function arguments is non-trivial, as they can contain function calls within function calls.
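To illustrate why nesting makes this non-trivial, here is a minimal sketch of splitting an argument string on top-level commas only. It is not how parse_args works (which relies on PPI, as stated above); it swaps in a plain character scanner, and the helper name split_top_level_args is hypothetical. It assumes commas inside regular expression literals are backslash-escaped, as in the split( /\,/, $ip_list ) example; an unescaped comma inside a regular expression would be split incorrectly.

```perl
use strict;
use warnings;

# Hypothetical helper, not the module's parse_args: split on commas that
# sit outside quotes, parentheses and curly braces.
sub split_top_level_args
{
    my $str   = shift;
    my @chars = split( //, $str );
    my( @args, $quote );
    my $depth = 0;
    my $buf   = '';
    while( @chars )
    {
        my $c = shift( @chars );
        if( $c eq '\\' )
        {
            # Keep an escaped character verbatim, whatever it is
            $buf .= $c;
            $buf .= shift( @chars ) if( @chars );
        }
        elsif( defined( $quote ) )
        {
            # Inside a quoted string: only the matching quote ends it
            $buf .= $c;
            undef( $quote ) if( $c eq $quote );
        }
        elsif( $c eq '"' || $c eq "'" )
        {
            $quote = $c;
            $buf .= $c;
        }
        elsif( $c eq '(' || $c eq '{' )
        {
            $depth++;
            $buf .= $c;
        }
        elsif( $c eq ')' || $c eq '}' )
        {
            $depth--;
            $buf .= $c;
        }
        elsif( $c eq ',' && $depth == 0 )
        {
            # A top-level comma ends the current argument
            push( @args, $buf );
            $buf = '';
        }
        else
        {
            $buf .= $c;
        }
    }
    push( @args, $buf ) if( length( $buf ) );
    # Trim surrounding white space
    s/^\s+|\s+$//g for( @args );
    return( @args );
}

print( "$_\n" ) for( split_top_level_args( q{md5("a,b"), join({"x", "y"}, ', ')} ) );
```

The nested call and the quoted strings are kept intact, so the sample string yields two arguments rather than five.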
COMBINATIONS
- comp
-
Type: comp
Possible sub types:
- binary
-
When a binary operator is used, such as :
==, =, !=, <, <=, >, >=, -ipmatch, -strmatch, -strcmatch, -fnmatch
Example :
192.168.2.10 -ipmatch 192.168.2/24
Here, 192.168.2.10 would be captured in property worda, ipmatch (without leading dash) would be captured in property op, and 192.168.2/24 would be captured in property wordb.
The array reference in property elements will contain more information on worda and wordb.
Also, the details of elements for worda can be accessed with property worda_def as an array reference, and likewise for wordb with wordb_def.
- function
-
This contains the function name and arguments when the lefthand side word is compared to a list function.
For example :
192.168.1.10 in split( /\,/, $ip_list )
In this example, 192.168.1.10 would be captured in word and split( /\,/, $ip_list ) would be captured in function, with the array reference elements containing more information about the word and the function.
Also, the details of elements for word can be accessed with property word_def as an array reference, and likewise for function with function_def.
- list
-
Is true when the comparison is of a word on the lefthand side to a list of words, such as :
%{SOME_VALUE} in {"John", "Peter", "Paul"}
In this example, %{SOME_VALUE} would be captured in property word and "John", "Peter", "Paul" (without the enclosing curly braces or any spaces after and before them) would be captured in property list.
The array reference elements will possibly contain more information on word and each element in list.
Also, the details of elements for word can be accessed with property word_def as an array reference, and likewise for list with list_def.
- regexp
-
When the lefthand side word is being compared to a regular expression.
For example :
%{HTTP_COOKIE} =~ /lang\%22\%3A\%22([a-zA-Z]+\-[a-zA-Z]+)\%22\%7D;?/
In this example, %{HTTP_COOKIE} would be captured in property word, /lang\%22\%3A\%22([a-zA-Z]+\-[a-zA-Z]+)\%22\%7D;?/ would be captured in property regexp, and =~ would be captured in property op.
Check the array reference in property elements for more details about the word and the regular expression in regexp.
Also, the details of elements for word can be accessed with property word_def as an array reference, and likewise for regexp with regexp_def.
- unary
-
When the following operator is used against a word :
-d, -e, -f, -s, -L, -h, -F, -U, -A, -n, -z, -T, -R
For example:
-A /some/uri.html           # (same as -U)
-d /some/folder             # file is a directory
-e /some/folder/file.txt    # file exists
-f /some/folder/file.txt    # file is a regular file
-F /some/folder/file.txt    # file is a regular file and is accessible to all (Apache2 does a sub query to check)
-h /some/folder/link.txt    # true if file is a symbolic link
-n %{QUERY_STRING}          # true if string is not empty (opposite of -z)
-s /some/folder/file.txt    # true if file is not empty
-L /some/folder/link.txt    # true if file is a symbolic link (same as -h)
-R 192.168.1.1/24           # remote ip match this ip block; same as %{REMOTE_ADDR} -ipmatch 192.168.1.1/24
-T %{HTTPS}                 # false if string is empty, "0", "off", "false", or "no" (case insensitive). True otherwise.
-U /some/uri.html           # check if the uri is accessible to all (Apache2 does a sub query to check)
-z %{QUERY_STRING}          # true if string is empty (opposite of -n)
In the example -e /some/folder/file.txt, e (without leading dash) would be captured in op and /some/folder/file.txt would be captured in word.
Check the array reference in property elements for more information about the word in word.
Also, the details of elements for word can be accessed with property word_def as an array reference.
See here for more information: Regexp::Common::Apache2::comp
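The sub types above each populate a different set of properties, so code consuming a comp element usually needs to branch on subtype. The following is a minimal sketch of such a dispatch. The helper name describe_comp is hypothetical, and the nodes are hand-written from the examples above rather than produced by the module; it also assumes that list and function are available as plain strings, as the descriptions above suggest.

```perl
use strict;
use warnings;

# Hypothetical helper: summarise a "comp" node using the properties
# documented for its sub type.
sub describe_comp
{
    my $node = shift;
    my $sub  = $node->{subtype} // '';
    return( sprintf( 'binary: left=%s op=%s right=%s', @{ $node }{qw( worda op wordb )} ) ) if( $sub eq 'binary' );
    return( sprintf( 'regexp: word=%s op=%s regexp=%s', @{ $node }{qw( word op regexp )} ) ) if( $sub eq 'regexp' );
    return( sprintf( 'list: word=%s list=%s', @{ $node }{qw( word list )} ) ) if( $sub eq 'list' );
    return( sprintf( 'function: word=%s function=%s', @{ $node }{qw( word function )} ) ) if( $sub eq 'function' );
    return( sprintf( 'unary: op=%s word=%s', @{ $node }{qw( op word )} ) ) if( $sub eq 'unary' );
    return( 'unknown sub type: ' . $sub );
}

# Hand-written nodes mirroring the examples above (not module output)
my @nodes =
(
    { type => 'comp', subtype => 'binary', worda => '192.168.2.10', op => 'ipmatch', wordb => '192.168.2/24' },
    { type => 'comp', subtype => 'regexp', word => '%{HTTP_COOKIE}', op => '=~', regexp => '/lang/' },
    { type => 'comp', subtype => 'unary', op => 'e', word => '/some/folder/file.txt' },
);

print( describe_comp( $_ ), "\n" ) for( @nodes );
```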
Available properties:
- op
-
Contains the operator used. See Regexp::Common::Apache2::comp, "stringcomp" in Regexp::Common::Apache2 and "integercomp" in Regexp::Common::Apache2
This may be for unary operators :
-d, -e, -f, -s, -L, -h, -F, -U, -A, -n, -z, -T, -R
For binary operators :
==, =, !=, <, <=, >, >=, -ipmatch, -strmatch, -strcmatch, -fnmatch
For integer comparison :
-eq, -ne, -lt, -le, -gt, -ge
For string comparison :
==, !=, <, <=, >, >=
In all the possible operators above, op contains the value, but without the leading dash, if any.
- word
-
The word being compared.
- worda
-
The first word being compared, and on the left of the operator. For example :
12 -ne 10
- wordb
-
The second word, being compared to, and on the right of the operator.
See "comp" in Regexp::Common::Apache2 for more information.
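Since op is stored without its leading dash, and some operators appear in more than one of the lists above, a lookup table can help when deciding how to treat a comparison. The following is a sketch only: the %OPS table is copied from the operator lists above, while the helper name op_families and the family labels are hypothetical.

```perl
use strict;
use warnings;

# Operators as listed above, without their leading dash
my %OPS =
(
    unary   => [qw( d e f s L h F U A n z T R )],
    binary  => [ '==', '=', '!=', '<', '<=', '>', '>=', qw( ipmatch strmatch strcmatch fnmatch ) ],
    integer => [qw( eq ne lt le gt ge )],
    string  => [ '==', '!=', '<', '<=', '>', '>=' ],
);

# Hypothetical helper: return the sorted names of every family an operator
# belongs to. A leading dash, if any, is removed first.
sub op_families
{
    my $op = shift;
    $op =~ s/^-//;
    my @found;
    foreach my $family ( sort( keys( %OPS ) ) )
    {
        push( @found, $family ) if( grep { $_ eq $op } @{ $OPS{ $family } } );
    }
    return( @found );
}

foreach my $op ( '-ne', '==', 'ipmatch', '-e' )
{
    print( $op, ' => ', join( ', ', op_families( $op ) ), "\n" );
}
```

An operator such as == is reported under both binary and string, which reflects the overlap between the lists above.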
- cond
-
Type: cond
Possible sub types:
- and
-
When the condition is an ANDed expression such as :
$ap_true && $ap_false
In this case, $ap_true would be captured in property expr1 and $ap_false would be captured in property expr2.
Also, the details of elements for the variable can be accessed with property and_def as an array reference, and with and_expr1_def and and_expr2_def.
- comp
-
Contains the expression when the condition is actually a comparison.
This will recurse and you can see more information in the array reference in the property elements. For more information on what it will contain, check the comp type.
- cond
-
Default sub type
- negative
-
When the condition is negative, i.e. prefixed by an exclamation mark.
For example :
!-z /some/folder/file.txt
You need to check for the details in array reference contained in property elements
Also the details of elements for the variable can be accessed with property negative_def as an array reference.
- or
-
When the condition is an ORed expression such as :
$ap_true || $ap_false
In this case, $ap_true would be captured in property expr1 and $ap_false would be captured in property expr2.
Also, the details of elements for the variable can be accessed with property and_def as an array reference, and with and_expr1_def and and_expr2_def.
- parenthesis
-
When the condition is embedded within parentheses.
You need to check the array reference in property elements for information about the embedded condition.
Also the details of elements for the variable can be accessed with property parenthesis_def as an array reference.
- variable
-
Contains the expression when the condition is based on a variable, such as :
%{REQUEST_URI}
Check the array reference in property elements for more details about the variable, especially the property name which would contain the name of the variable; in this case :
REQUEST_URI
Also the details of elements for the variable can be accessed with property variable_def as an array reference.
Available properties:
- args
-
Function arguments. See the content of the elements array reference for more breakdown on the arguments provided.
- is_negative
-
If the condition is negative, this value is true
- name
-
Function name
See "cond" in Regexp::Common::Apache2 for more information.
- function
-
Type: function
Possible sub types: none
Available properties:
- args
-
Function arguments. See the content of the elements array reference for more breakdown on the arguments provided.
Also the details of elements for those args can be accessed with property args_def as an array reference.
- name
-
Function name
See "function" in Regexp::Common::Apache2 for more information.
- integercomp
-
Type: integercomp
Possible sub types: none
Available properties:
- op
-
Contains the operator used. See "integercomp" in Regexp::Common::Apache2
- worda
-
The first word being compared, and on the left of the operator. For example :
12 -ne 10
Also the details of elements for worda can be accessed with property worda_def as an array reference.
- wordb
-
The second word, being compared to, and on the right of the operator.
Also the details of elements for wordb can be accessed with property wordb_def as an array reference.
See "integercomp" in Regexp::Common::Apache2 for more information.
- join
-
Type: join
Possible sub types: none
Available properties:
- list
-
The list of strings to be joined. See the content of the elements array reference for more breakdown on the arguments provided.
Also the details of elements for those args can be accessed with property list_def as an array reference.
- word
-
The word used to join the list. This parameter is optional.
Details for the word parameter, if any, can be found in the elements array reference or can be accessed with the word_def property.
For example :
join({"John Paul Doe"}, ', ')
# or
join({"John", "Paul", "Doe"}, ', ')
# or just
join({"John", "Paul", "Doe"})
See "join" in Regexp::Common::Apache2 for more information.
- listfunc
-
Type: listfunc
Possible sub types: none
Available properties:
- args
-
Function arguments. See the content of the elements array reference for more breakdown on the arguments provided.
Also the details of elements for those args can be accessed with property args_def as an array reference.
- name
-
Function name
See "listfunc" in Regexp::Common::Apache2 for more information.
- regex
-
Type: regex
Possible sub types: none
Available properties:
- flags
-
Example:
mgis
- pattern
-
Regular expression pattern, excluding enclosing separators.
- sep
-
Type of separators used. It can be: /, #, $, %, ^, |, ?, !, ', ", ",", ";", ":", ".", _, and -
See "regex" in Regexp::Common::Apache2 for more information.
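Because pattern excludes the separators and flags is kept apart, a regex element can be turned into a compiled Perl pattern for testing. The following is a sketch under assumptions: the helper name regex_node_to_qr is hypothetical, the node is hand-written from the example at the top of this document, and only the i, m, s and x flags are carried over as inline modifiers (others, such as g, are dropped). Apache uses PCRE, so most patterns should behave the same in Perl, but this is not guaranteed for every construct.

```perl
use strict;
use warnings;

# Hypothetical helper: build a compiled Perl regular expression from the
# "pattern" and "flags" properties of a "regex" node.
sub regex_node_to_qr
{
    my $node = shift;
    my %seen;
    # Keep each supported modifier once; ignore anything else
    my $mods = join( '', grep { /^[imsx]$/ && !$seen{ $_ }++ } split( //, $node->{flags} // '' ) );
    my $src  = '(?' . $mods . ':' . $node->{pattern} . ')';
    return( qr/$src/ );
}

# Hand-written node based on the example output shown earlier
my $node =
{
    type    => 'regex',
    sep     => '/',
    flags   => undef,
    pattern => 'lang\\%22\\%3A\\%22([a-zA-Z]+\\-[a-zA-Z]+)\\%22\\%7D;?',
};

my $re = regex_node_to_qr( $node );
if( 'lang%22%3A%22en-GB%22%7D' =~ $re )
{
    print( "Language tag: $1\n" );
}
```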
- stringcomp
-
Type: stringcomp
Possible sub types: none
Available properties:
- op
-
Contains the operator used. See "stringcomp" in Regexp::Common::Apache2
- worda
-
The first word being compared, and on the left of the operator. For example :
12 -ne 10
Also the details of elements for worda can be accessed with property worda_def as an array reference.
- wordb
-
The second word, being compared to, and on the right of the operator.
Also the details of elements for wordb can be accessed with property wordb_def as an array reference.
See "stringcomp" in Regexp::Common::Apache2 for more information.
- variable
-
Type: variable
Possible sub types:
- function
-
%{md5:"some arguments"}
- rebackref
-
This is a regular expression back reference, such as $1, $2, etc. up to 9.
- variable
-
%{REQUEST_URI}
# or by enabling the legacy expressions
${REQUEST_URI}
Available properties:
- args
-
Function arguments. See the content of the elements array reference for more breakdown on the arguments provided.
- name
-
Function name, or variable name.
- value
-
The regular expression back reference value, such as 1, 2, etc.
See "variable" in Regexp::Common::Apache2 for more information.
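A common use of the variable type is to find out which variables an expression refers to. The following is a minimal sketch that follows the elements array references with an explicit queue instead of recursion. The helper name variable_names is hypothetical and the tree is hand-written for illustration; it only relies on the type, subtype, name and elements properties described in this document.

```perl
use strict;
use warnings;

# Hypothetical helper: collect the names of plain variables (type and sub
# type both "variable") found anywhere in a parsed tree.
sub variable_names
{
    my $root = shift;
    my %names;
    my @todo = ( $root );
    while( my $node = shift( @todo ) )
    {
        if( ( $node->{type} // '' ) eq 'variable' && ( $node->{subtype} // '' ) eq 'variable' )
        {
            $names{ $node->{name} }++;
        }
        # Queue the children, if any
        push( @todo, @{ $node->{elements} || [] } );
    }
    return( sort( keys( %names ) ) );
}

# Hand-written tree (not module output) mixing the three sub types
my $tree =
{
    elements =>
    [
        { type => 'variable', subtype => 'variable', name => 'REQUEST_URI', elements => [] },
        { type => 'variable', subtype => 'rebackref', value => 1, elements => [] },
        {
            type => 'variable', subtype => 'function', name => 'md5',
            elements =>
            [
                { type => 'variable', subtype => 'variable', name => 'HTTP_COOKIE', elements => [] },
            ],
        },
    ],
};

print( join( ', ', variable_names( $tree ) ), "\n" );
```

Back references and function-style variables are skipped, while a plain variable nested inside another element is still found.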
- word
-
Type: word
Possible sub types:
- digits
-
When the word contains one or more digits.
- dotted
-
When the word contains words separated by dots, such as
192.168.1.10
- function
-
When the word is a function.
- parens
-
When the word is surrounded by parentheses
- quote
-
When the word is surrounded by single or double quotes
- rebackref
-
When the word is a regular expression back reference such as $1, $2, etc. up to 9.
- regex
-
This is an extension I added so that some functions can work, such as
split( /\w+/, $ip_list)
Without it, the regular expression would not be recognised, given the Apache BNF as it stands.
- variable
-
When the word is a variable, for example %{REQUEST_URI}. It can also be a variable like ${REQUEST_URI} if the legacy mode is enabled.
Available properties:
- flags
-
The regular expression flags used, such as
mgis
- parens
-
Contains an array reference of the open and close parenthesis, such as:
["(", ")"]
- pattern
-
The regular expression pattern
- quote
-
Contains the type of quote used if the sub type is quote
- regex
-
Contains the regular expression
- sep
-
The separator used in the regular expression, such as
/
- value
-
The value of the digits if the sub type is digits or rebackref
- word
-
The word enclosed in quotes
See "variable" in Regexp::Common::Apache2 for more information.
CAVEAT
This module supports Apache2 expressions well. However, some expressions are difficult to process. For example:
Expressions with functions not using enclosing parenthesis:
%{REMOTE_ADDR} -in split s/.*?IP Address:([^,]+)/$1/, PeerExtList('subjectAltName')
Instead, use:
%{REMOTE_ADDR} -in split(s/.*?IP Address:([^,]+)/$1/, PeerExtList('subjectAltName'))
There is no mechanism yet to prevent infinite recursion. This needs to be implemented.
CHANGES & CONTRIBUTIONS
Feel free to reach out to the author for possible corrections, improvements, or suggestions.
AUTHOR
Jacques Deguest <jack@deguest.jp>
SEE ALSO
Apache2::SSI, Regexp::Common::Apache2, https://httpd.apache.org/docs/current/expr.html
COPYRIGHT & LICENSE
Copyright (c) 2020 DEGUEST Pte. Ltd.
You can use, copy, modify and redistribute this package and associated files under the same terms as Perl itself.