NAME

PRE - A regular expression base class

SYNOPSIS

use PRE;

DESCRIPTION

This module implements a parallel regular expression parser, and supports the conversion of a PRE to a parallel finite automata. A homegrown recursive descent parser is used to build the parse tree, and the method used to convert the parallel regular expression to a PFA is a modified Thompson's construction that accounts for the additional interleave operator (&).

Recursive Descent-safe Regex Grammar:

R  -> P

P  -> OP'

P  -> '&' OP' | epsilon

O  -> CO'

O' -> '|' CO' | epsilon

C  -> SC'

C' -> .SC' | epsilon

S  -> LS'

S' -> *S' | epsilon

L  -> a | b | c |..| 0 | 1 | 2 |..| (R) | epsilon

Terminal symbols: a,b,c,..,z,0,1,2,..,9,|,*,(,)

NOTE: Concatenation operator, '.', is not a terminal symbol
and should not be included in the regex

FAQ:
  Q: Does this support Perl regular expressions?
  A: No, just the regular expression using the terminal symbols
     listed above.

Valid terminal characters include:

a b c d e f g h i j k l m n o p q r s t u v w x y z

A B C D E F G H I J K L M N O P Q R S T U V W X Y Z

0 1 2 3 4 5 6 7 8 9 + - = , ? [ ] { } . ~ ^ @ % $

: ; < >

Methods

new

Create a brand spaking new pRE object; does not accept regex here

set_epsilon

Not really for public consumption, but could be in the future. Defines how epsilon (i.e., the null string) is represented in the parse tree.

get_epsilon_symbol

Returns the string representation of the null string

set_pre

Defines a regular expression for RE to parse through

get_pre

Returns the regular expression set by set_pre as a string

set_current

Meant for private consumption. Initializes the stack used to store what part of the regex has yet to be parsed

reset_current

Meant for private consumption. Initializes the stack used to store what part of the regex has yet to be parsed

get_current

Returns what is in the _CURRENT_STR stack

to_pfa

Not implemented, but will create an PFA using Thompson's Method; from there one could do a PFA->to_dfa, and compare the resulting dfa to the one from RE->to_dfa.

thompson

Guts of RE->to_pfa; uses depth first parse tree traversal

parse

Parses regular expressin set by RE->set_pre, and stores the parse tree in _PARSE_TREE

cat_endmarker

Adds '#', or the end of regex marker to the parse tree; not for public consumption

match

Matches current terminal symbol with terminal character, and loads up the next lookahead character

lookahead

Returns value of current lookahead

nexttoken

Sets next token as lookahead

R

R -> O

P

P -> OP'

P_prime

P' -> '&'O | epsilon

O

O -> CO'

O_prime

O' -> '|'CO' | epsilon

C

C -> SC'

C_prime

C' -> .SC' | epsilon

S

S -> LS'

S_prime

S' -> *S' | epsilon

L

L -> a | b | c |..| 0 | 1 | 2 |..| (R)

get_next_pos

Returns the next position, used in creating leaf nodes for terminal symbols (minus null string)

get_curr_pos

Returns the current count of terminal symbols (minus null string)

set_parse_tree

Set parse tree

get_parse_tree

Return parse tree

get_terminals

Returns array of terminal symbols

is_terminal

Checks to see if given character is a terminal symbol

is_member

General subroutine used to test if an element is already in an array

get_symbols

Returns array of all symbols used in current regex

trace_on

Turns on tracing - allows to trace the recursive descent parsing

trace_off

Turns tracing off

trace

Returns value of _TRACE

toggle_cat_state

Toggles cat state instead of cat'ing a '.' to everything

get_cat_state

Returns $self->{_CAT_STATE} (1|0)

set_error

Increments error count for regex parsing

get_error

Returns error count

set_done

Sets done flag

done

Returns if done or not

DESTROY

Called automatically when object is destroyed either explicitly or automatically when references go to 0 or when the main program is finished

AUTHOR

Brett D. Estrade - <estrabd AT mailcan DOT com>

CAVEATS

Currently, all states are stored as labels. There is also no integrity checking for consistency among the start, final, and set of all states.

BUGS

Not saying it is bug free, just saying I haven't hit any yet :)

AVAILABILITY

Anonymous CVS Checkout at http://www.brettsbsd.net/cgi-bin/viewcvs.cgi/

ACKNOWLEDGEMENTS

This suite of modules started off as a homework assignment for a compiler class I took for my MS in computer science at the University of Southern Mississippi.

COPYRIGHT

This code is released under the same terms as Perl.