NAME
Parse::Eyapp::datagenerationtut - Tutorial on Using Parse::Eyapp as a Data Generator for Testing
INTRODUCTION
The examples for this tutorial can be found in the directory examples/generator
in the distribution of Parse::Eyapp
.
To understand the code you will need some familiarity with Test::LectroTest::Generator, however, we will make an attempt to introduce the basics of Test::LectroTest::Generator needed.
While parsing is the process of determining the membership of a string to a language, generation is the reverse problem. Using a context free grammar it is possible to generate strings belonging to the language described by that grammar.
Context free grammars can be used to generate tests. The programmer designs a grammar that defines a set of inputs that will be able to find some set of bugs.
This tutorial shows how to use Parse::Eyapp to generate phrases belonging to the language defined by a given grammar. We will generate inputs to test a simple calculator.
Compiling and Running the Example
The grammar describing the language is in the file Generator.eyp
. Calling eyapp with option -c
will show the contents of the file without the semantic actions:
Parse-Eyapp/examples/generator$ eyapp -c Generator.eyp
# file: Generator.eyp
# compile with: eyapp -b '' Generator.eyp
# then run: ./Generator.pm
%strict
%token NUM VARDEF VAR
%right '='
%left '-' '+'
%left '*' '/'
%left NEG
%right '^'
%%
stmts:
stmt
| stmts ';' stmt
;
stmt:
VARDEF '=' exp
;
exp:
NUM
| VAR
| exp '+' exp
| exp '-' exp
| exp '*' exp
| exp '/' exp
| '-' exp %prec NEG
| exp '^' exp
| '(' exp ')'
;
%%
This grammar defines a language of sequences of semicolon separated assignments. The right hand side of an assignment can be any valid arithmetic expression including numbers and variables.
First we compile the grammar with option -b ''
to produce a modulino:
Parse-Eyapp/examples/generator$ eyapp -b '' Generator.eyp
Now, the module has execution permits and its first line contains the #!
header:
Parse-Eyapp/examples/generator$ ls -ltr | tail -1
-rwxr-xr-x 1 lusasoft lusasoft 7844 2009-01-12 08:30 Generator.pm
Parse-Eyapp/examples/generator$ head -1 Generator.pm
#!/usr/bin/perl
The use of option -b ''
combined with the fact that we have added these lines
67 unless (caller) {
68 __PACKAGE__->main(@ARGV);
69 }
at the end of the grammar file Generator.eyp
provide the generated file with a dual nature: it is a module and an executable at the same time. This is what is know as a modulino (term coined by Brian d Foy).
Here follows the results of several executions. Each run produces a set of assignments. The first output line reports the result of the randomly generated program.
Parse-Eyapp/examples/generator$ ./Generator.pm
# result: -3
SC=-3
# result: error. Division by zero.
M=(-4/6+4)+9+2*8*9+4;
XQ=8/3;
EI=XQ*2/0/0;
BL=5+EI+4/5/XQ
As you can see in the former run, only variables that were defined in previous assignments are used in later assignments. However, the generated source may produce run-time errors and exceptions (which is good thing when testing a calculator).
Parse-Eyapp/examples/generator$ ./Generator.pm
# result: 6
CF=(6)
Parse-Eyapp/examples/generator$ ./Generator.pm
# result: -710.2
I=(3*-8+7/5);
R=2+8*I*4+5*2+I/I
Parse-Eyapp/examples/generator$ ./Generator.pm
# result: Calculator syntax differs from Perl...
RY=2--2+(3+6)+(7*7*4^1+2*0/8*5/3)
GENERATING PHRASES FROM A CONTEXT FREE GRAMMAR
Using YYExpect
and Test::LectroTest::Generator to generate tokens
The basic idea of using Parse::Eyapp to generate phrases for the language defined by a given context free grammar is simple: change the lexer by a token generator. Instead of reading from some input, randomly generate one of the valid tokens.
We can use the method YYExpect
to know what tokens are valid. For versions 1.137 and later of Parse::Eyapp, the method YYExpect
returns the set of valid tokens at the time it is called. For previous versions (and this is also true for Parse::Yapp), YYExpect
only returns a subset of the whole set of valid tokens.
In this example, the token generator has been isolated in the sub gen_lexer
in the file GenSupport.pm
:
47 sub gen_lexer {
48 my $parser = shift;
49
50 my $token = $parser->generate_token;
51 my $attr = $parser->generate_attribute($token);
52 #$attr = $WHITESPACES->generate.$attr;
53
54 return ($token, $attr);
55 }
The token and its attribute are generated in lines 50 and 51. The methods generate_token
and generate_attribute
are also in the module GenSupport.pm
. They are methods of the parser object since the grammar Generator.eyp
not only uses but inherits this module. See line 3 of Generator.eyp
:
Parse-Eyapp/examples/generator$ sed -ne '19,24p' Generator.eyp | cat -n
1 %{
2 use base q{Parse::Eyapp::TokenGen};
3 use base q{GenSupport};
4 %}
5
6 %%
The method generate_token
obtains the set of valid tokens using YYExpect
(line 29). Then uses the Frequency
function in Test::LectroTest::Generator to produce a Test::LectroTest::Generator
object (line 31). The method generate
of such object is used to generate the actual token (line 33).
26 sub generate_token {
27 my $parser = shift;
28
29 my @token = $parser->YYExpect;
30
31 my $tokengen = Frequency( map { [$parser->token_weight($_), Unit($_)] } @token);
32
33 return $tokengen->generate;
34 }
The Parse::Eyapp::TokenGen method token_weight
returns the weight associated with a token, assuming it was previously set using one of the Parse::Eyapp::TokenGen methods like set_tokenweightsandgenerators
or set_tokenweights
. See the code of method main
in GenSupport.pm
:
examples/generator$ sed -ne '98,/^ *)/p' GenSupport.pm | cat -n
1 my $parser = $package->new();
2
3 $parser->set_tokenweightsandgenerators(
4 NUM => [ 2, Int(range=>[0, 9], sized=>0)],
5 VAR => [
6 0, # At the beginning, no variables are defined
7 Gen {
8 return Elements(keys %st)->generate if keys %st;
9 return Int(range=>[0, 9], sized=>0)->generate;
10 },
11 ],
12 VARDEF => [
13 2,
14 String( length=>[1,2], charset=>"A-NP-Z", size => 100 )
15 ],
16 '=' => 2, '-' => 1, '+' => 2,
17 '*' => 4, '/' => 2, '^' => 0.5,
18 ';' => 1, '(' => 1, ')' => 2,
19 '' => 2, 'error' => 0,
20 );
A Brief Introduction to Test::LectroTest::Generator
The module GenSupport.pm
uses Test::LectroTest::Generator to build generators for the required tokens. Thus the call to
Int(range=>[0, 9], sized=>0)
builds a Test::LectroTest::Generator object that produces integers in the range [0,9]. Such objects have a method generate
that produces the actual item. The following debugger session illustrates the way to use Test::LectroTest::Generator:
pl@europa:~/LEyapp$ perl -wde 0
main::(-e:1): 0
DB<1> use Test::LectroTest::Generator qw{:all}
DB<2> $i = Int(range=>[0, 9], sized=>0)
DB<3> p $i->generate
6
DB<4> p $i->generate
9
The String
method builds a Test::LectroTest::Generator object that produces strings:
DB<5> $v = String( length=>[1,2], charset=>"A-NP-Z", size => 100 )
DB<6> p $v->generate
HM
DB<7> p $v->generate
Y
DB<8> p $v->generate
KE
The Elements
method builds a Test::LectroTest::Generator object that produces one of a given list of elements:
DB<9> @a = map { $v->generate } 1..10
DB<10> x @a
0 'UC'
1 'P'
2 'IF'
3 'EJ'
4 'H'
5 'VC'
6 'CF'
7 'K'
8 'T'
9 'IG'
DB<11> $x = Elements(@a)
DB<12> p $x->generate
P
DB<13> p $x->generate
P
DB<14> p $x->generate
EJ
DB<15> p $x->generate
VC
Even more interesting for our purpose is the Frequency
method, which produces one of a given list of elements with a given probability distribution.
The following example illustrates its use. First we build a weight list where the odd elements have weight 2 and the even elements have weight 1:
DB<16> @w = map { $_ % 2 ? 2 : 1 } 0..9
DB<21> @w{@a} = @w
DB<24> x \%w
0 HASH(0xd3cc80)
'CF' => 1
'EJ' => 2
'H' => 1
'IF' => 1
'IG' => 2
'K' => 2
'P' => 2
'T' => 1
'UC' => 1
'VC' => 2
We now use Frequency
to build a Test::LectroTest::Generator object that produces one of the given list of elements @a
according to the specified probability:
DB<29> $f = Frequency( map { [$w{$_}, Unit($_)] } @a)
Let us generate 10 items. We see that odd elements appear more frequently than even elements:
DB<30> @r = map { $f->generate } 1..10
DB<31> p "@r"
VC UC K UC VC VC K EJ P P
Generating Token Attributes
Once the token was generated through the call to generate_token
at line 50:
45 #my $WHITESPACES = String( length=>[0,1], charset=>" \t\n", size => 100 );
46
47 sub gen_lexer {
48 my $parser = shift;
49
50 my $token = $parser->generate_token;
51 my $attr = $parser->generate_attribute($token);
52 #$attr = $WHITESPACES->generate.$attr;
53
54 return ($token, $attr);
55 }
the associated attributed is generated via the generate_attribute
method in GenSupport.pm
. If needed, random combination of white spaces can be added to the generated attribute via an appropriate generator (line 52).
The generate_attribute
method uses the method generate
of the generator associated with such token. If no generator object was set, the attribute returned is the token itself (line 42):
36 sub generate_attribute {
37 my $parser = shift;
38 my $token = shift;
39
40 my $gen = $parser->token_generator($token);
41 return $gen->generate if defined($gen);
42 return $token;
43 }
Holding Semantic Constraints
The attribute generator associated with the token VAR
is more complex than the others. It was defined in the call to set_tokenweightsandgenerators
:
examples/generator$ sed -ne '98,/^ *)/p' GenSupport.pm | cat -n
1 my $parser = $package->new();
2
3 $parser->set_tokenweightsandgenerators(
4 NUM => [ 2, Int(range=>[0, 9], sized=>0)],
5 VAR => [
6 0, # At the beginning, no variables are defined
7 Gen {
8 return Elements(keys %st)->generate if keys %st;
9 return Int(range=>[0, 9], sized=>0)->generate;
10 },
11 ],
12 VARDEF => [
13 2,
14 String( length=>[1,2], charset=>"A-NP-Z", size => 100 )
15 ],
16 '=' => 2, '-' => 1, '+' => 2,
17 '*' => 4, '/' => 2, '^' => 0.5,
18 ';' => 1, '(' => 1, ')' => 2,
19 '' => 2, 'error' => 0,
20 );
The Gen
function of Test::LectroTest::Generator creates a new generator from a given code. Since a variable can't be used unless it is defined, we use a symbol table %st
to keep record of the variables that were defined in previous assignments. If no defined variables exists, the defined generator returns a digit between 0 and 9.
Each time a new assignment to a variable occurs, such variable is added to the symbol table. This is achieved through the semantic action associated with the assignment production rule:
examples/generator$ sed -ne '35,41p' Generator.eyp | cat -n
1 stmt:
2 VARDEF '=' exp
3 {
4 my $parser = shift;
5 $parser->defined_variable($_[0]);
6 "$_[0]=$_[2]";
7 }
The defined_variable
method in GenSupport.pm
simply sets the corresponding entry in the symbol table:
examples/generator$ sed -ne '19,24p' GenSupport.pm | cat -n
1 my %st; # Symbol Table
2 sub defined_variable {
3 my ($parser, $var) = @_;
4
5 $st{$var} = 1;
6 }
The semantic action associated with VARDEF '=' exp
returns the string "$_[0]=$_[2]"
containing the actual phrase. Since this is the semantic action required for most productions we make it our default action:
examples/generator$ sed -ne '13,17p' Generator.eyp | cat -n
1 %defaultaction {
2 my $parser = shift;
3
4 return join '', @_;
5 }
The syntactic variable stmts
generates sequences of stmt
separated by semicolons:
examples/generator$ sed -ne '26,33p' Generator.eyp | cat -n
1 stmts:
2 stmt
3 {
4 $_[0]->deltaweight(VAR => +1); # At least one variable is defined now
5 $_[1];
6 }
7 | stmts ';' { "\n" } stmt
8 ;
The second production is left recursive. As a consequence, the stmt
in the first production (line 2) is the first statement of the sequence. A small derivation can convince you of this property:
stmts-> stmt
stmts => stmts';' stmt => stmts';' stmt ';' stmt => stmt ';' stmt ';' stmt
----
Thus, when the reduction by the production stmts -> stmt
occurs, we are sure that the first statement has been processed. In such case we increase the weight of token VAR
one unit (which was initially zero, see the call to set_tokenweightsandgenerators
),
$_[0]->deltaweight(VAR => +1);
The weight of VAR
is now 1, giving chances for variables to appear in the right hand side of an assignment. The Parse::Eyapp::Tokengen method deltaweight
increases (decreases if negative) the weight of the given tokens using the associated values.
Dynamically Changing the Probability Distribution
The semantic actions for the productions
exp -> '(' exp ')'
and
exp -> '-' exp
show a way to modify the weights associated with some tokens:
43 exp:
44 NUM
45 | VAR
46 | exp '+' exp
47 | exp '-' exp
48 | exp '*' exp
49 | exp '/' exp
50 | '-' { $_[0]->pushdeltaweight('-' => -1) } exp %prec NEG
51 {
52 $_[0]->popweight();
53 "-$_[3]"
54 }
55 | exp '^' exp
56 | '(' { $_[0]->pushdeltaweight('(' => -1, ')' => +1, '+' => +1, ); }
57 exp
58 ')'
59 {
60 $_[0]->popweight;
61 "($_[3])"
62 }
63 ;
After seeing a '('
we decrease by one the weight of '('
to avoid expressions with nested parenthesis. We also increase the weight of token '+'
, since parenthesis are often used to give more priority to a sum over a multiplication or division. This is achieved via the pushdeltaweight
method. The old weight is recovered after the closing parenthesis is seen using the popweight
method.
Computing the Expected Result
Function evaluate_using_perl
in GenSupport.pm
finds the expected value for the generated expression. The calculator expression is roughly translated to a Perl expression and evaluated using the Perl interpreter:
57 sub evaluate_using_perl { # if possible
58 my $perlexp = shift;
59
60 $perlexp =~ s/\b([a-zA-Z])/\$$1/g; # substitute A by $A everywhere
61 $perlexp =~ s/\^/**/g; # substitute power operator: ^ by **
62
63 my $res = eval "no warnings; no strict;$perlexp";
64 if ($@ =~ /Illegal division/) {
65 $res = "error. Division by zero.";
66 }
67 elsif ($@) { # Our calc notation is incompatible with perl in a few gotchas
68 # Perl interprets -- in a different way
69 $@ =~ m{(.*)}; # Show only the first line of error message
70 $res = "Calculator syntax differs from Perl. Can't compute the result: $1";
71 }
72
73 $res;
74 }
The calculator language differs from Perl. In the calculator, two consecutive minus like in 2--3
are interpreted as 2+3
while for Perl the former expression is an error. This limitation is here to illustrate a limitation of the approach: it gives a way to generate complex structured inputs but the programmer must find a way to compute what the expected value is.
APPENDIX: FILES
File GenSupport.pm
Parse-Eyapp/examples/generator$ cat -n GenSupport.pm
1 package GenSupport;
2 use strict;
3 use warnings;
4
5 use Getopt::Long;
6 use Test::LectroTest::Generator qw(:all);
7 use Parse::Eyapp::TokenGen;
8
9 sub _Error {
10 my $parser = shift;
11
12 my $t = $parser->YYCurval;
13 my @e = $parser->YYExpect();
14 my $attr = $parser->YYSemval(0);
15 local $" = " ";
16 warn "Error:\nCurrent attribute: <$attr>\nCurrent token: <$t>\nExpected: <@e>\n";
17 }
18
19 my %st; # Symbol Table
20 sub defined_variable {
21 my ($parser, $var) = @_;
22
23 $st{$var} = 1;
24 }
25
26 sub generate_token {
27 my $parser = shift;
28
29 my @token = $parser->YYExpect;
30
31 my $tokengen = Frequency( map { [$parser->token_weight($_), Unit($_)] } @token);
32
33 return $tokengen->generate;
34 }
35
36 sub generate_attribute {
37 my $parser = shift;
38 my $token = shift;
39
40 my $gen = $parser->token_generator($token);
41 return $gen->generate if defined($gen);
42 return $token;
43 }
44
45 #my $WHITESPACES = String( length=>[0,1], charset=>" \t\n", size => 100 );
46
47 sub gen_lexer {
48 my $parser = shift;
49
50 my $token = $parser->generate_token;
51 my $attr = $parser->generate_attribute($token);
52 #$attr = $WHITESPACES->generate.$attr;
53
54 return ($token, $attr);
55 }
56
57 sub evaluate_using_perl { # if possible
58 my $perlexp = shift;
59
60 $perlexp =~ s/\b([a-zA-Z])/\$$1/g; # substitute A by $A everywhere
61 $perlexp =~ s/\^/**/g; # substitute power operator: ^ by **
62
63 my $res = eval "no warnings; no strict;$perlexp";
64 if ($@ =~ /Illegal division/) {
65 $res = "error. Division by zero.";
66 }
67 elsif ($@) { # Our calc notation is incompatible with perl in a few gotchas
68 # Perl interprets -- in a different way
69 $@ =~ m{(.*)}; # Show only the first line of error message
70 $res = "Calculator syntax differs from Perl. Can't compute the result: $1";
71 }
72
73 $res;
74 }
75
76
77 sub Run {
78 my($self)=shift;
79 my $yydebug = shift || 0;
80
81 return $self->YYParse(
82 yylex => \&gen_lexer,
83 yyerror => \&_Error,
84 yydebug => $yydebug, # 0x1F
85 );
86 }
87
88 sub main {
89 my $package = shift;
90
91 my $debug = shift || 0;
92 my $result = GetOptions (
93 "debug!" => \$debug,
94 );
95
96 $debug = 0x1F if $debug;
97
98 my $parser = $package->new();
99
100 $parser->set_tokenweightsandgenerators(
101 NUM => [ 2, Int(range=>[0, 9], sized=>0)],
102 VAR => [
103 0, # At the beginning, no variables are defined
104 Gen {
105 return Elements(keys %st)->generate if keys %st;
106 return Int(range=>[0, 9], sized=>0)->generate;
107 },
108 ],
109 VARDEF => [
110 2,
111 String( length=>[1,2], charset=>"A-NP-Z", size => 100 )
112 ],
113 '=' => 2, '-' => 1, '+' => 2,
114 '*' => 4, '/' => 2, '^' => 0.5,
115 ';' => 1, '(' => 1, ')' => 2,
116 '' => 2, 'error' => 0,
117 );
118
119 my $exp = $parser->Run( $debug );
120
121 my $res = evaluate_using_perl($exp);
122
123 print "\n# result: $res\n$exp\n";
124 }
125
126 1;
File Generator.eyp
Parse-Eyapp/examples/generator$ cat -n Generator.eyp
1 # file: Generator.eyp
2 # compile with: eyapp -b '' Generator.eyp
3 # then run: ./Generator.pm
4 %strict
5 %token NUM VARDEF VAR
6
7 %right '='
8 %left '-' '+'
9 %left '*' '/'
10 %left NEG
11 %right '^'
12
13 %defaultaction {
14 my $parser = shift;
15
16 return join '', @_;
17 }
18
19 %{
20 use base q{Parse::Eyapp::TokenGen};
21 use base q{GenSupport};
22 %}
23
24 %%
25
26 stmts:
27 stmt
28 {
29 $_[0]->deltaweight(VAR => +1); # At least one variable is defined now
30 $_[1];
31 }
32 | stmts ';' { "\n" } stmt
33 ;
34
35 stmt:
36 VARDEF '=' exp
37 {
38 my $parser = shift;
39 $parser->defined_variable($_[0]);
40 "$_[0]=$_[2]";
41 }
42 ;
43 exp:
44 NUM
45 | VAR
46 | exp '+' exp
47 | exp '-' exp
48 | exp '*' exp
49 | exp '/' exp
50 | '-' { $_[0]->pushdeltaweight('-' => -1) } exp %prec NEG
51 {
52 $_[0]->popweight();
53 "-$_[3]"
54 }
55 | exp '^' exp
56 | '(' { $_[0]->pushdeltaweight('(' => -1, ')' => +1, '+' => +1, ); }
57 exp
58 ')'
59 {
60 $_[0]->popweight;
61 "($_[3])"
62 }
63 ;
64
65 %%
66
67 unless (caller) {
68 __PACKAGE__->main(@ARGV);
69 }
SEE ALSO
Test::LectroTest::Generator by Tom Moertel
The Design and Implementation of a Grammar-based Data Generator (1992) by Peter M. Maurer, Software Practice and Experience http://www.cs.ubc.ca/local/reading/proceedings/spe91-95/spe/./vol22/issue3/spe756pm.pdf
yagg: an easy-to-use generator for structured test inputs by David Coppit and Jiexin Lian. ASE '05: Proceedings of the 20th IEEE/ACM international Conference on Automated software engineering. 2005, pages 356-359.
Writing Randomly by Randall Schwartz. Linux Magazine Column 04 (Sep 1999). http://www.stonehenge.com/merlyn/LinuxMag/col04.html
Generating Test Data with Enhanced Context Free Grammars by Peter M. Maurer http://cs.baylor.edu/~maurer/aida/dgl-source/documentation/gen_tst.pdf
Modules as Programs by Brian d Foy http://www252.pair.com/comdog/mastering_perl/Chapters/18.modulinos.html
How a Script Becomes a Module by Brian d Foy. On Perlmonks: http://www.perlmonks.org/index.pl?node_id=396759.
AUTHOR
Casiano Rodriguez-Leon (casiano@ull.es)
ACKNOWLEDGMENTS
This work has been supported by CEE (FEDER) and the Spanish Ministry of Educacion y Ciencia through Plan Nacional I+D+I number TIN2005-08818-C04-04 (ULL::OPLINK project http://www.oplink.ull.es/). Support from Gobierno de Canarias was through GC02210601 (Grupos Consolidados). The University of La Laguna has also supported my work in many ways and for many years.
A large percentage of code is verbatim taken from Parse::Yapp 1.05. The author of Parse::Yapp is Francois Desarmenien.
I wish to thank Francois Desarmenien for his Parse::Yapp module, to my students at La Laguna and to the Perl Community. Special thanks to my family and Larry Wall.
LICENCE AND COPYRIGHT
Copyright (c) 2006-2008 Casiano Rodriguez-Leon (casiano@ull.es). All rights reserved.
Parse::Yapp copyright is of Francois Desarmenien, all rights reserved. 1998-2001
These modules are free software; you can redistribute it and/or modify it under the same terms as Perl itself. See perlartistic.
This program is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.