HISTORY
0.10 -- July 12, 2004
Prerequisite
Damian Conway's NEXT module is now a prerequisite for this module.
It is standard in 5.8 (I believe). It's used to properly re-dispatch
method calls from the __object__ base classes.
Hierarchy Changes
Thanks to Mike Lambert, the inheritance system has had a complete
overhaul so that it actually *works* now. See the documentation on
writing a sub-class in Regexp::Parser::Handlers, as well as the
notes in Regexp::Parser::Hierarchy.
There are now abstract classes *anchor*, *assertion*, and *branch*.
You can't call their new() method directly; you can only call it
through an object that inherits from that class.
There are no longer *star*, *plus*, and *curly* classes; they have
been combined into one class, *quant*. You pass it the min and max,
and the object's "type" is determined dynamically.
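
As a rough illustration of a type being derived from the bounds, here is a minimal sketch. The helper quant_type() is hypothetical and not part of Regexp::Parser, and the mapping it uses (no upper bound with a minimum of 0 or 1 gives "star" or "plus", everything else gives "curly") is an assumption based on the old class names, not a statement of what the module does.

```perl
use strict;
use warnings;

# Hypothetical helper, NOT part of Regexp::Parser: pick a type name from a
# quantifier's bounds. An undefined or empty max means "no upper bound".
# The mapping itself is an assumption made for illustration.
sub quant_type {
    my ($min, $max) = @_;
    my $unbounded = !defined($max) || $max eq '';
    return 'star' if $min == 0 && $unbounded;   # behaves like *
    return 'plus' if $min == 1 && $unbounded;   # behaves like +
    return 'curly';                             # {n}, {n,}, {n,m}
}

for my $bounds ([0, undef], [1, undef], [2, 5], [3, undef]) {
    my ($min, $max) = @$bounds;
    printf "{%s,%s} -> %s\n", $min, defined $max ? $max : '', quant_type($min, $max);
}
```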
Character Class Hashes
Character classes (*anyof* objects) now have another attribute,
"chars", which is a hash reference holding characters (e.g. 'A') and
the number of times that character appeared in the character class.
The character class "[A-CB-E]" would have a character map of "{ A =>
1, B => 2, C => 2, D => 1, E => 1}". This will reflect ranges and
embedded classes (such as "[:cntrl:]" or "\p{Print}").
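
The counting in the "[A-CB-E]" example can be reproduced with a few lines of plain Perl. This is only a sketch of the arithmetic: char_counts() is a hypothetical helper, it takes ranges as pairs rather than parsing a regex, and it does not use the module's own code.

```perl
use strict;
use warnings;

# Hypothetical helper: count how many times each character is covered by a
# list of ranges, each given as [first, last] (a single char is [c, c]).
sub char_counts {
    my @ranges = @_;
    my %chars;
    for my $range (@ranges) {
        my ($lo, $hi) = map { ord($_) } @$range;
        $chars{ chr($_) }++ for $lo .. $hi;
    }
    return \%chars;
}

# The two ranges of the class [A-CB-E].
my $map = char_counts(['A', 'C'], ['B', 'E']);
print join(', ', map { "$_ => $map->{$_}" } sort keys %$map), "\n";
# prints: A => 1, B => 2, C => 2, D => 1, E => 1
```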
To aid in the "unrolling" of embedded classes, a new method of the
parser object has been added: get_property(). It takes a POSIX or
Unicode property name and returns the string defining the characters
it matches. This string is in the format described in perlunicode.
The *prop* object takes this string and creates a hash reference in
the object's "chars" attribute (as does the *anyof_class* for a
POSIX class, and the built-in Perl classes "\w", "\D", etc.). The
get_property() method relies on utf8_heavy.pl's utf8::SWASHNEW().
To determine the characters matched by your locale's "\w", "\d", and
"\s", a new parser method cache_locale() has been added. This takes
one of 'w', 'd', or 's', and returns a hash reference of non-Unicode
characters (values from 0 to 255) that are matched by that Perl
class. See the documentation for *anyof* in Regexp::Parser::Objects.
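
The idea of collecting the characters from 0 to 255 that a Perl class matches can be sketched as follows. class_chars() is a hypothetical stand-in written for this note; the internals of cache_locale() and the exact shape of the hash it returns are not shown here, so the "char => 1" layout is an assumption.

```perl
use strict;
use warnings;
use locale;    # let \w, \d and \s follow the current locale

# Hypothetical stand-in for the idea behind cache_locale(); this is NOT
# the module's implementation.
sub class_chars {
    my ($which) = @_;    # 'w', 'd', or 's'
    my %rx = (w => qr/\w/, d => qr/\d/, s => qr/\s/);
    my $rx = $rx{$which} or die "unknown class '$which'";
    # Keep every single-byte character that the class matches.
    my %chars = map { $_ => 1 } grep { $_ =~ $rx } map { chr($_) } 0 .. 255;
    return \%chars;
}

my $word = class_chars('w');
print "'A' is a word character\n" if $word->{A};
print "' ' is not a word character\n" unless $word->{' '};
```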
Diagnostics and Bug Fixes
"/^+/" was raising the wrong warning ("RPe_ZQUANT" instead of
"RPe_NULNUL").
Quantifier errors ("RPe_EQUANT" and "RPe_NESTED") are now raised on
the first pass.
There is now a test of the standard diagnostic messages.
I left something out of the Unicode property grammar. There can be a
caret ("^") as the first character inside the braces of a property,
negating the sense of that property. However, "\p{^A}" will render
as "\P{A}", and vice versa. This may change in future versions, but
I see no reason (at the present moment) to distinguish between
"\p{^A}" and "\P{A}".
POSIX Classes
You can no longer create your own POSIX character class handlers. I
think this is one thing that should *not* be extended. Use Unicode
properties.
0.021 -- July 3, 2004
*anyof_class* Changed
If an *anyof_class* element is a Unicode property or a Perl class
(like "\w" or "\S"), the object's "data" field points to the
underlying object type (*prop*, *alnum*, etc.). If the element is a
POSIX class, the "data" field is the string "POSIX". POSIX classes
don't exist in a regex outside of a character class, so I'm a little
wary of making them objects in their own right, even if it would
create a better sense of uniformity.
Documentation
Fixed some poor wording, and documented the problem with using
SUPER:: inside MyClass::__object__.
Bug Fixes
Character classes weren't closing properly in the tree. Fixed.
Standard escapes ("\a", "\e", etc.) were being returned as *exact*
nodes instead of *anyof_char* nodes when inside character classes.
Fixed. (Mike Lambert)
Non-grouping parentheses weren't being parsed properly. Fixed. (Mike
Lambert)
Flags weren't being turned off. Fixed.
0.02 -- July 1, 2004
Better Abstracting
The object() method calls force_object(). force_object() creates an
object no matter what pass the parser is making; object() will
return immediately if it's just the first pass. This means that
force_object() should be used to create stand-alone objects.
Each object now has an insert() method that defines how it gets
placed into the regex tree. Most objects inherit theirs from the
base object class.
The walker() method is also now abstracted -- each node it comes
across will have its walk() method called. And the ending node for
stack-type nodes has been abstracted to the ender() method of the
node.
The init() method has been moved to another file to help keep *this*
file as abstract as possible. Regexp::Parser installs its handlers
in Regexp/Parser/Handlers.pm. That file might end up being where
documentation on writing handlers goes.
The documentation on sub-classing includes an ordered list of what
packages a method is looked up in for a given object of type 'OBJ':
YourMod::OBJ, YourMod::__object__, Regexp::Parser::OBJ,
Regexp::Parser::__object__.
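
The lookup order above can be pictured with a small stand-alone sketch. lookup_order() and find_method() are hypothetical helpers written for this note, and the toy subs are defined inline; the sketch only walks the four packages in the listed order and says nothing about how the module itself dispatches.

```perl
use strict;
use warnings;

# The search order from the text, for a parser subclass $mod and an
# object type $type.
sub lookup_order {
    my ($mod, $type) = @_;
    return (
        "${mod}::${type}",
        "${mod}::__object__",
        "Regexp::Parser::${type}",
        "Regexp::Parser::__object__",
    );
}

# Return the first package in that order that defines $method itself.
sub find_method {
    my ($mod, $type, $method) = @_;
    for my $pkg (lookup_order($mod, $type)) {
        no strict 'refs';
        return $pkg if defined &{"${pkg}::${method}"};
    }
    return;
}

# Toy subs for the demonstration (hypothetical, defined inline).
sub YourMod::__object__::visual      { 'from YourMod base' }
sub Regexp::Parser::exact::visual    { 'from stock exact' }
sub Regexp::Parser::__object__::walk { 'from stock base' }

print find_method('YourMod', 'exact', 'visual'), "\n";   # YourMod::__object__
print find_method('YourMod', 'exact', 'walk'), "\n";     # Regexp::Parser::__object__
```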
Cleaner Grammar Flow
Now the only places 'atom' gets pushed to the queue are after an
opening parenthesis or after 'atom' matches. This makes things flow
more cleanly.
Flag Handlers
Flag handlers now receive an additional argument that says whether
they're being turned on or off. Also, if the flag handler returns 0,
that flag is removed from the resulting object's visual flag set.
That means "(?gi-o)" becomes "(?i)".
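
A minimal sketch of that filtering, assuming a plain hash of code references keyed by flag letter. The %handler table and visual_flags() are hypothetical names made up for this illustration and do not reflect how handlers are registered in the module.

```perl
use strict;
use warnings;

# Hypothetical handler table: each handler receives a true value when the
# flag is being turned on and a false value when it is being turned off,
# and returns 0 to drop the flag from the visual flag set.
my %handler = (
    i => sub { my ($on) = @_; 1 },
    g => sub { my ($on) = @_; 0 },
    o => sub { my ($on) = @_; 0 },
);

# Build the visual form of (?ON-OFF) from the flags the handlers keep.
sub visual_flags {
    my ($on, $off) = @_;    # e.g. ('gi', 'o') for (?gi-o)
    my $keep = sub {
        my ($flags, $state) = @_;
        join '', grep { $handler{$_} && $handler{$_}->($state) } split //, $flags;
    };
    my ($von, $voff) = ($keep->($on, 1), $keep->($off, 0));
    return '(?' . $von . (length $voff ? "-$voff" : '') . ')';
}

print visual_flags('gi', 'o'), "\n";    # prints (?i)
```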
Diagnostics and Bug Fixes
More tests added (specifically, making sure "(?(N)T|F)" works
right). In doing so, found that the "too many branches" error wasn't
being raised until the second pass. Figured out how to improve the
grammar to get it to work properly. Also added tests for the new
captures() method.
I changed the field 'class' to 'family' in objects. I was getting
confused by it, so I figured it was a sign that I'd chosen an awful
name for the field. There will still be a class() method in
__object__, but it will throw a "use of class() is deprecated"
warning.
Quantifiers of the form "{n}" were being misrepresented as "{n,}".
It's been corrected. (Mike Lambert)
"\b" was being turned into "b" inside a character class, instead of
a backspace. (Mike Lambert)
Fixed errant "Quantifier unexpected" warning raised by a zero-width
assertion followed by "?", which doesn't warrant the warning.
Added "Unrecognized escape" warnings to *all* escape sequence
handlers.
The 'g', 'c', and 'o' flags now evoke "Useless ..." warnings when
used in flag and non-capturing group constructs.
0.01 -- June 29, 2004
First Release
Documentation not complete, etc.