NAME
PRE - A regular expression base class
SYNOPSIS
use PRE;
DESCRIPTION
This module implements a parallel regular expression parser, and supports the conversion of a PRE to a parallel finite automata. A homegrown recursive descent parser is used to build the parse tree, and the method used to convert the parallel regular expression to a PFA is a modified Thompson's construction that accounts for the additional interleave operator (&).
Recursive Descent-safe Regex Grammar:
R -> P
P -> OP'
P -> '&' OP' | epsilon
O -> CO'
O' -> '|' CO' | epsilon
C -> SC'
C' -> .SC' | epsilon
S -> LS'
S' -> *S' | epsilon
L -> a | b | c |..| 0 | 1 | 2 |..| (R) | epsilon
Terminal symbols: a,b,c,..,z,0,1,2,..,9,|,*,(,)
NOTE: Concatenation operator, '.', is not a terminal symbol
and should not be included in the regex
FAQ:
Q: Does this support Perl regular expressions?
A: No, just the regular expression using the terminal symbols
listed above.
Valid terminal characters include:
a b c d e f g h i j k l m n o p q r s t u v w x y z
A B C D E F G H I J K L M N O P Q R S T U V W X Y Z
0 1 2 3 4 5 6 7 8 9 + - = , ? [ ] { } . ~ ^ @ % $
: ; < >
Methods
new
-
Create a brand spaking new pRE object; does not accept regex here
set_epsilon
-
Not really for public consumption, but could be in the future. Defines how epsilon (i.e., the null string) is represented in the parse tree.
get_epsilon_symbol
-
Returns the string representation of the null string
set_pre
-
Defines a regular expression for RE to parse through
get_pre
-
Returns the regular expression set by
set_pre
as a string set_current
-
Meant for private consumption. Initializes the stack used to store what part of the regex has yet to be parsed
reset_current
-
Meant for private consumption. Initializes the stack used to store what part of the regex has yet to be parsed
get_current
-
Returns what is in the _CURRENT_STR stack
to_pfa
-
Not implemented, but will create an PFA using Thompson's Method; from there one could do a PFA->to_dfa, and compare the resulting dfa to the one from RE->to_dfa.
thompson
-
Guts of RE->to_pfa; uses depth first parse tree traversal
parse
-
Parses regular expressin set by RE->set_pre, and stores the parse tree in _PARSE_TREE
cat_endmarker
-
Adds '#', or the end of regex marker to the parse tree; not for public consumption
match
-
Matches current terminal symbol with terminal character, and loads up the next lookahead character
lookahead
-
Returns value of current lookahead
nexttoken
-
Sets next token as lookahead
R
-
R -> O
P
-
P -> OP'
P_prime
-
P' -> '&'O | epsilon
O
-
O -> CO'
O_prime
-
O' -> '|'CO' | epsilon
C
-
C -> SC'
C_prime
-
C' -> .SC' | epsilon
S
-
S -> LS'
S_prime
-
S' -> *S' | epsilon
L
-
L -> a | b | c |..| 0 | 1 | 2 |..| (R)
get_next_pos
-
Returns the next position, used in creating leaf nodes for terminal symbols (minus null string)
get_curr_pos
-
Returns the current count of terminal symbols (minus null string)
set_parse_tree
-
Set parse tree
get_parse_tree
-
Return parse tree
get_terminals
-
Returns array of terminal symbols
is_terminal
-
Checks to see if given character is a terminal symbol
is_member
-
General subroutine used to test if an element is already in an array
get_symbols
-
Returns array of all symbols used in current regex
trace_on
-
Turns on tracing - allows to trace the recursive descent parsing
trace_off
-
Turns tracing off
trace
-
Returns value of _TRACE
toggle_cat_state
-
Toggles cat state instead of cat'ing a '.' to everything
get_cat_state
-
Returns $self->{_CAT_STATE} (1|0)
set_error
-
Increments error count for regex parsing
get_error
-
Returns error count
set_done
-
Sets done flag
done
-
Returns if done or not
DESTROY
-
Called automatically when object is destroyed either explicitly or automatically when references go to 0 or when the main program is finished
AUTHOR
Brett D. Estrade - <estrabd AT mailcan DOT com>
CAVEATS
Currently, all states are stored as labels. There is also no integrity checking for consistency among the start, final, and set of all states.
BUGS
Not saying it is bug free, just saying I haven't hit any yet :)
AVAILABILITY
Anonymous CVS Checkout at http://www.brettsbsd.net/cgi-bin/viewcvs.cgi/
ACKNOWLEDGEMENTS
This suite of modules started off as a homework assignment for a compiler class I took for my MS in computer science at the University of Southern Mississippi.
COPYRIGHT
This code is released under the same terms as Perl.