NAME
Regexp::Parsertron - Parse a Perl regexp into a data structure of type Tree
Warning: Development version. See "Version Numbers" for details.
Synopsis
This is scripts/synopsis.pl:
#!/usr/bin/env perl
use v5.10;
use strict;
use warnings;
use Regexp::Parsertron;
# ---------------------
my($re) = qr/Perl|JavaScript/i;
my($parser) = Regexp::Parsertron -> new(verbose => 1);
# Return 0 for success and 1 for failure.
my($result) = $parser -> parse(re => $re);
say "Calling append(text => '|C++', uid => 6)";
$parser -> append(text => '|C++', uid => 6);
$parser -> print_raw_tree;
$parser -> print_cooked_tree;
my($as_string) = $parser -> as_string;
say "Original: $re. Result: $result. (0 is success)";
say "as_string: $as_string";
say 'Perl error count: ', $parser -> perl_error_count;
say 'Marpa error count: ', $parser -> marpa_error_count;
And its output:
Test count: 1. Parsing (in qr/.../ form): '(?^i:Perl|JavaScript)'.
Root. Attributes: {text => "Root", uid => "0"}
|--- open_parenthesis. Attributes: {text => "(", uid => "1"}
| |--- question_mark. Attributes: {text => "?", uid => "2"}
| |--- caret. Attributes: {text => "^", uid => "3"}
| |--- flag_set. Attributes: {text => "i", uid => "4"}
| |--- colon. Attributes: {text => ":", uid => "5"}
| |--- character_set. Attributes: {text => "Perl|JavaScript", uid => "6"}
|--- close_parenthesis. Attributes: {text => ")", uid => "7"}
Calling append(text => '|C++', uid => 6)
Root. Attributes: {text => "Root", uid => "0"}
|--- open_parenthesis. Attributes: {text => "(", uid => "1"}
| |--- question_mark. Attributes: {text => "?", uid => "2"}
| |--- caret. Attributes: {text => "^", uid => "3"}
| |--- flag_set. Attributes: {text => "i", uid => "4"}
| |--- colon. Attributes: {text => ":", uid => "5"}
| |--- character_set. Attributes: {text => "Perl|JavaScript|C++", uid => "6"}
|--- close_parenthesis. Attributes: {text => ")", uid => "7"}
Name Uid Text
---- --- ----
open_parenthesis 1 (
question_mark 2 ?
caret 3 ^
flag_set 4 i
colon 5 :
character_set 6 Perl|JavaScript|C++
close_parenthesis 7 )
Original: (?^i:Perl|JavaScript). Result: 0. (0 is success)
as_string: (?^i:Perl|JavaScript|C++)
Perl error count: 0
Marpa error count: 0
Note: The 1st tree is printed due to verbose => 1 in the call to "new([%opts])", while the 2nd is due to the call to "print_raw_tree()". The columnar output is due to the call to "print_cooked_tree()".
The Edit Methods
The term "edit methods" refers to any one or more of these methods, each of which can change the text of a node:
- o "append(%opts)"
- o "prepend(%opts)"
- o "set(%opts)"
The edit methods are exercised in t/get.set.t, as well as scripts/synopsis.pl (above).
Description
Parses a regexp into a tree object managed by the Tree module, and provides various methods for updating and retrieving that tree's contents.
This module uses Marpa::R2 and Moo.
Distributions
This module is available as a Unix-style distro (*.tgz).
See http://savage.net.au/Perl-modules/html/installing-a-module.html for help on unpacking and installing distros.
Installation
Install Regexp::Parsertron as you would any Perl module:
Run:
cpanm Regexp::Parsertron
or run:
sudo cpan Regexp::Parsertron
or unpack the distro, and then use:
perl Makefile.PL
make (or dmake or nmake)
make test
make install
Constructor and Initialization
new() is called as my($parser) = Regexp::Parsertron -> new(k1 => v1, k2 => v2, ...).
It returns a new object of type Regexp::Parsertron.
Key-value pairs accepted in the parameter list (see corresponding methods for details [e.g. "re([$regexp])"]):
- o re => $regexp
-
The does() method of Scalar::Does is called to see what re is. If it's already of the form qr/$re/, then it's processed as is, but if it's not, then it's transformed using qr/$re/.
Warning: Currently, the input is expected to have been pre-processed by Perl via qr/$regexp/.
Default: ''.
- o verbose => $integer
-
Takes values 0, 1 or 2, which print more and more progress reports.
Used for debugging.
Default: 0 (print nothing).
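The qr/.../ string form mentioned in the warning above can be inspected with core Perl alone. The following is a minimal sketch, not the module's own code: it uses ref() as a simple stand-in for the Scalar::Does check (an assumption made to keep the example free of dependencies), and all variable names are illustrative.

```perl
#!/usr/bin/env perl
use strict;
use warnings;

# A regexp already compiled by qr/.../ stringifies with its flags embedded.
my $compiled = qr/Perl|JavaScript/i;
my $as_text  = "$compiled";

# A plain string is not a Regexp object, so wrap it with qr/.../.
# Note: ref() is used here only as a stand-in for Scalar::Does.
my $plain    = 'Perl|JavaScript';
my $promoted = ref($plain) eq 'Regexp' ? $plain : qr/$plain/;

print "Compiled: $as_text\n";
print "Promoted: $promoted\n";
```

On Perl 5.14 and later, the first line printed shows the same '(?^i:Perl|JavaScript)' form that appears in the Synopsis output.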
Methods
append(%opts)
Append some text to the text of a node.
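%opts is a hash with these (key => value) pairs:
- o text => $string
-
The text to append to the text of the node.
- o uid => $uid
-
The uid of the node to update.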
See scripts/synopsis.pl for sample code.
Note: Calling append() never changes the uids of nodes, so repeated calls to append() with the same uid will apply more and more updates to the same node.
See also "get($uid)", "prepend(%opts)", "set(%opts)" and t/get.set.t.
as_string()
Returns the parsed regexp as a string. The string contains all edits applied with methods such as "append(%opts)".
error_str()
Returns the last error, as a string.
Errors will be in 1 of 2 categories:
- o Perl errors
-
These arise when Perl cannot interpret the string form of the regexp supplied by you, when the code checks it using qr/$re/.
- o Marpa errors
-
These arise when the BNF within the module is such that the string form of the regexp cannot be parsed by Marpa.
If you can use the regexp in Perl code, then you should never get this error. In other words, if Perl accepts the regexp and the module does not, then the BNF in this module is wrong (barring bugs in Perl of course).
See also "marpa_error_count()", "perl_error_count()" and "warning_str()".
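The "Perl errors" category can be illustrated without the module: wrapping qr/$re/ in eval traps the exception Perl raises for a malformed pattern. This is a sketch of the general technique only. The counter, the error variable and the sample patterns are hypothetical and are not taken from the module's source.

```perl
#!/usr/bin/env perl
use strict;
use warnings;

# One valid pattern and one with an unmatched parenthesis.
my @candidates  = ('Perl|JavaScript', '(unclosed');
my $error_count = 0;
my $last_error  = '';

for my $text (@candidates)
{
    # qr/.../ dies at run time if the interpolated pattern is invalid,
    # in which case eval returns undef and sets $@.
    my $re = eval { qr/$text/ };

    if (! defined $re)
    {
        $error_count++;
        $last_error = $@;
    }
}

print "Error count: $error_count\n";
print "Last error: $last_error";
```

Marpa errors are a different matter: they concern patterns that Perl accepts but the module's BNF does not, so they cannot be reproduced with core Perl.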
get($uid)
Get the text of the node whose uid is $uid.
Returns undef if the given $uid is not found.
See also "append(%opts)", "prepend(%opts)" and "set(%opts)".
marpa_error_count()
Returns an integer count of errors detected by Marpa. This value should always be 0.
See also "error_str()", "perl_error_count()" and "warning_str()".
Used basically for debugging.
new([%opts])
Here, '[]' indicates an optional parameter.
See "Constructor and Initialization" for details on the parameters accepted by "new()".
parse([%opts])
Here, '[]' indicates an optional parameter.
Parses the regexp supplied with the parameter re in the call to "new()", or in the call to "re([$regexp])", or in the call to parse(re => $regexp) itself. The last of these takes precedence.
The hash %opts takes the same (key => value) pairs as "new()" does.
See "Constructor and Initialization" for details.
perl_error_count()
Returns an integer count of errors detected by Perl. This value should always be 0.
See also "error_str()", "marpa_error_count()" and "warning_str()".
Used basically for debugging.
prepend(%opts)
Prepend some text to the text of a node.
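%opts is a hash with these (key => value) pairs:
- o text => $string
-
The text to prepend to the text of the node.
- o uid => $uid
-
The uid of the node to update.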
Note: Calling prepend() never changes the uids of nodes, so repeated calls to prepend() with the same uid will apply more and more updates to the same node.
See also "append(%opts)", "get($uid)", "set(%opts)" and t/get.set.t.
print_cooked_tree()
Prints, in a pretty format, the tree built from parsing.
See the "Synopsis" for sample output.
See also "print_raw_tree()".
print_raw_tree()
Prints, in a simple format, the tree built from parsing.
See the "Synopsis" for sample output.
See also "print_cooked_tree()".
re([$regexp])
Here, '[]' indicates an optional parameter.
Gets or sets the regexp to be processed.
Note: re is a parameter to "new([%opts])".
reset()
Resets various internal things, except test_count.
Used basically for debugging.
set(%opts)
Set the text of a node to $opts{text}.
%opts is a hash with these (key => value) pairs:
- o text => $string
-
The text to use to overwrite the text of the node.
- o uid => $uid
-
The uid of the node to update.
See also "append(%opts)", "prepend(%opts)" and "get($uid)".
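A short sketch of the three edit methods working together with get(), using only the calls documented in this section. It assumes, as in the Synopsis, that the alternation text of qr/Perl|JavaScript/i is held by the node with uid 6. The replacement texts and the uid 99 are illustrative.

```perl
#!/usr/bin/env perl
use strict;
use warnings;

use Regexp::Parsertron;

my $parser = Regexp::Parsertron -> new;
my $result = $parser -> parse(re => qr/Perl|JavaScript/i);

die "Parse failed\n" if ($result != 0);

my $uid      = 6; # Assumption: taken from the tree printed in the Synopsis.
my $original = $parser -> get($uid);

# Both calls target the same node, so the edits accumulate.
$parser -> prepend(text => 'C|', uid => $uid);
$parser -> append(text => '|C++', uid => $uid);

my $edited = $parser -> get($uid);

# set() overwrites the node's text, discarding the edits above.
$parser -> set(text => 'Ruby|Go', uid => $uid);

my $replaced = $parser -> as_string;
my $missing  = $parser -> get(99); # Assumption: no node has this uid.

print "Original: $original\n";
print "Edited:   $edited\n";
print "Replaced: $replaced\n";
print 'Missing:  ', (defined $missing ? $missing : 'undef'), "\n";
```

Checking the return value of get() with defined() before using it avoids warnings when a uid does not exist in the tree.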
tree()
Returns an object of type Tree. Ignore the root node.
Each node's meta method returns a hashref of information about the node. See the "FAQ" for details.
See also the source code for "print_cooked_tree()" and "print_raw_tree()" for ideas on how to use this object.
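The loop below sketches one way to walk the tree yourself. It assumes the returned object is an instance of the Tree module from CPAN, whose traverse(), is_root(), value() and meta() methods are used here, and that each node's name is stored as its value. If these assumptions do not hold for your installed version, the source of print_cooked_tree() is the better guide.

```perl
#!/usr/bin/env perl
use strict;
use warnings;

use Regexp::Parsertron;

my $parser = Regexp::Parsertron -> new;

die "Parse failed\n" if ($parser -> parse(re => qr/Perl|JavaScript/i) != 0);

my @rows;

# In list context, Tree's traverse() returns every node (pre-order by default).
for my $node ($parser -> tree -> traverse)
{
    next if ($node -> is_root); # The root carries no regexp text.

    my $meta = $node -> meta;

    push @rows, [$node -> value, $$meta{uid}, $$meta{text}];
}

# One line per node: name, uid, text.
printf "%-20s %3s  %s\n", @$_ for @rows;
```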
uid()
Returns the last-used uid.
Each node in the tree is given a uid, which allows methods like "append(%opts)" to work.
verbose([$integer])
Here, '[]' indicates an optional parameter.
Gets or sets the verbosity level, within the range 0 .. 2. Higher numbers print more progress reports.
Used basically for debugging.
Note: verbose is a parameter to "new([%opts])".
warning_str()
Returns the last Marpa warning, as a string.
See also "error_str()", "perl_error_count()" and "marpa_error_count()".
FAQ
How do I use this module?
Herewith a brief tutorial.
- o Start with a simple program and a simple regexp
-
This code, scripts/tutorial.pl, is a cut-down version of scripts/synopsis.pl:
#!/usr/bin/env perl
use v5.10;
use strict;
use warnings;
use Regexp::Parsertron;
# ---------------------
my($re) = qr/Perl|JavaScript/i;
my($parser) = Regexp::Parsertron -> new(verbose => 1);
# Return 0 for success and 1 for failure.
my($result) = $parser -> parse(re => $re);
say "Original: $re. Result: $result. (0 is success)";
Running it outputs:
Test count: 1. Parsing (in qr/.../ form): '(?^i:Perl|JavaScript)'.
Root. Attributes: {text => "Root", uid => "0"}
|--- open_parenthesis. Attributes: {text => "(", uid => "1"}
| |--- question_mark. Attributes: {text => "?", uid => "2"}
| |--- caret. Attributes: {text => "^", uid => "3"}
| |--- flag_set. Attributes: {text => "i", uid => "4"}
| |--- colon. Attributes: {text => ":", uid => "5"}
| |--- character_set. Attributes: {text => "Perl|JavaScript", uid => "6"}
|--- close_parenthesis. Attributes: {text => ")", uid => "7"}
Original: (?^i:Perl|JavaScript). Result: 0. (0 is success)
- o Examine the tree and determine which nodes you wish to edit
-
The nodes are uniquely identified by their uids.
- o Proceed as does scripts/synopsis.pl
-
Add these lines to the end of the tutorial code, and re-run:
$parser -> append(text => '|C++', uid => 6);
$parser -> print_raw_tree;
The extra output, showing node uid == 6, is:
Root. Attributes: {text => "Root", uid => "0"}
|--- open_parenthesis. Attributes: {text => "(", uid => "1"}
| |--- question_mark. Attributes: {text => "?", uid => "2"}
| |--- caret. Attributes: {text => "^", uid => "3"}
| |--- flag_set. Attributes: {text => "i", uid => "4"}
| |--- colon. Attributes: {text => ":", uid => "5"}
| |--- character_set. Attributes: {text => "Perl|JavaScript|C++", uid => "6"}
|--- close_parenthesis. Attributes: {text => ")", uid => "7"}
- o Test also with "prepend(%opts)" and "set(%opts)"
-
See t/get.set.t for sample code.
- o Since everything works, make a cup of tea
What is the purpose of this module?
- o To provide a stand-alone parser for regexps
- o To help me learn more about regexps
- o To become, I hope, a replacement for the horrendously complex Regexp::Assemble
What is the format of the nodes in the tree built by this module?
Each node's name is the name of the Marpa-style event which was triggered by detection of some text within the regexp.
Each node's meta method returns a hashref with these (key => value) pairs:
- o text => $string
-
This is the text within the regexp which triggered the event just mentioned.
- o uid => $integer
-
This is the unique id of the 'current' node.
You will often use this uid to specify which node to work on.
See also the source code for "print_cooked_tree()" and "print_raw_tree()" for ideas on how to use this object.
See the "Synopsis" for sample code and a report after parsing a tiny regexp.
Does this module interpret regexps in any way?
No. You have to run your own Perl code to do that. This module just parses them into a data structure.
And that really means this module does not match the regexp against anything. If I appear to do that while debugging new code, you can't rely on that appearing in production versions of the module.
Does this module re-write regexps?
No, unless you call one of "The Edit Methods".
Does this module handle both Perl 5 and Perl 6?
No. It will only handle Perl 5 syntax.
Does this module handle regexps for various versions of Perl 5?
Not yet. Version-dependent regexp syntax will be supported for recent versions of Perl. This is done by having tokens within the BNF which are replaced at start-up time with version-dependent details.
There are no such tokens at the moment.
All debugging is done assuming the regexp syntax as documented online. See "References" for the urls in question.
So which version of Perl is supported?
As of 2018-01-14, I'm using Perl V 5.20.2 and making the BNF match the Perl regexp docs listed in "References" below.
Is this a (Marpa) exhaustion-hating or exhaustion-loving app?
Exhaustion-loving.
In short, Marpa will always report 'Marpa parse exhausted', but this is not an error.
See https://metacpan.org/pod/distribution/Marpa-R2/pod/Exhaustion.pod#Exhaustion
Will this code be modified to run under Marpa::R3 when the latter is stable?
Yes.
Scripts
This diagram indicates the flow of logic from script to script:
xt/author/re_tests
|
V
xt/author/generate.tests.pl
|
V
xt/authors/perl-5.21.11.tests
|
V
perl -Ilib t/perl-5.21.11.t > xt/author/perl-5.21.11.log 2>&1
If xt/author/perl-5.21.11.log only contains lines starting with 'ok', then all Perl and Marpa errors have been hidden, so t/perl-5.21.11.t is ready to live in t/. Before that time it lives in xt/author/.
TODO
- o How to best define 'code' in the BNF.
- o Things to be aware of:
- o I could traverse the tree and store a pointer to each node in an array
-
This would mean fast access to nodes in random order.
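The idea in the last item can be sketched with core Perl. Plain hashrefs stand in for Tree nodes here, which is an assumption made purely to keep the example self-contained. The node names and uids are a trimmed copy of the Synopsis tree, and the variable names are hypothetical.

```perl
#!/usr/bin/env perl
use strict;
use warnings;

# Stand-in tree: hashrefs instead of Tree objects (illustration only).
my $tree =
{
    name => 'Root', uid => 0, text => 'Root', children =>
    [
        {
            name => 'open_parenthesis', uid => 1, text => '(', children =>
            [
                {name => 'flag_set',      uid => 4, text => 'i',               children => []},
                {name => 'character_set', uid => 6, text => 'Perl|JavaScript', children => []},
            ],
        },
        {name => 'close_parenthesis', uid => 7, text => ')', children => []},
    ],
};

# Walk the tree once, storing a reference to each node at index uid.
my @node_by_uid;
my @queue = ($tree);

while (my $node = shift @queue)
{
    $node_by_uid[$$node{uid}] = $node;

    push @queue, @{$$node{children} };
}

# Later edits go straight to the node, with no further tree walk.
# The array holds references, so the tree itself is updated too.
$node_by_uid[6]{text} .= '|C++';

print "$node_by_uid[6]{name}: $node_by_uid[6]{text}\n";
```

The trade-off is that the array must be rebuilt if nodes are ever added or removed. Since the edit methods never change uids, the index would stay valid across calls to them.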
References
http://www.pcre.org/. PCRE - Perl Compatible Regular Expressions.
http://perldoc.perl.org/perlre.html. This is the definitive document.
http://perldoc.perl.org/perlrecharclass.html#Extended-Bracketed-Character-Classes.
http://perldoc.perl.org/perlretut.html. Samples with commentary.
http://perldoc.perl.org/perlop.html#Regexp-Quote-Like-Operators
http://perldoc.perl.org/perlrequick.html
http://perldoc.perl.org/perlrebackslash.html
http://www.nntp.perl.org/group/perl.perl5.porters/2016/02/msg234642.html
See Also
Regexp::SAR. This is vaguely a version of Set::FA::Element.
And many others...
Machine-Readable Change Log
The file Changes was converted into Changelog.ini by Module::Metadata::Changes.
Version Numbers
Version numbers < 1.00 represent development versions. From 1.00 up, they are production versions.
Repository
https://github.com/ronsavage/Regexp-Parsertron
References
https://code.activestate.com/lists/perl5-porters/209610/
Support
Email the author, or log a bug on RT:
https://rt.cpan.org/Public/Dist/Display.html?Name=Regexp::Parsertron.
Author
Regexp::Parsertron was written by Ron Savage <ron@savage.net.au> in 2011.
Marpa's homepage: http://savage.net.au/Marpa.html.
Copyright
Australian copyright (c) 2016, Ron Savage.
All Programs of mine are 'OSI Certified Open Source Software';
you can redistribute them and/or modify them under the terms of
The Artistic License 2.0, a copy of which is available at:
http://opensource.org/licenses/alphabetical.