NAME

Parse::Eyapp::debuggingtut - Debugging Parse::Eyapp Programs

INTRODUCTION

The sources of error when programming with eyapp are many and various. Some of them are minor, as having a nonterminal without production rules or a terminal that is never produced by the lexical analyzer. These kind of errors can be catched with the help of the %strict directive.

In the following sections we will discuss three main kind of errors that correspond to three development stages:

Conflict errors:

Conflicts with the grammar: the grammar is ambiguous or is not clear - perhaps due to the fact that eyapp uses only a lookahead symbol - which sort of tree must be built for some inputs
Tree building errors:

There are no conflicts but the parser does not build the syntax tree as expected. May be it rejects correct sentences or accepts incorrect ones. Or may be it accepts correct ones but the syntax tree has not the shape we want (i.e. we have a precedence problem).
Semantic errors:

We have solved the conflicts and trees are satisfactory but we have errors inside the semantic actions.

Each time you discover an error write a test that covers that error. Section "TREE EQUALITY" deals with the problem of checking if the gnerated abstract syntax tree has the correct shape and attributes.

As Andreas Zeller points out in his article "Beautiful Debugging" finding the causes of a failing program must follow the scientific method:

1. Observe the failure (there are conflicts or ambiguity, there are precedence problems, there are semantic errors, the output is wrong)
2. Guess a hypothesis for the failure (if necessary use eyapp -v option, yydebug, the Perl debugger, etc. to build the hypothesis). If you use continous testing it is likely related with the recently written code.
3. Based on your hypothesis make predictions
3. Using appropriate input tests and the available tools (eyapp -v option, yydebug, the Perl debugger, etc.) see if your predictions hold. Reject your hypothesis if they don't hold.
4. Repeat the last two steps until your hypothesis is confirmed. The hypothesis then becomes a theory.
5. Convert the knowledge and informal tests developed during this process in a formal test that covers the failure

THE `%strict` DIRECTIVE

By default, identifiers appearing in the rule section will be classified as terminal if they don't appear in the left hand side of any production rules.

The directive %strict forces the declaration of all tokens. The following eyapp program issues a warning:

pl@nereida:~/LEyapp/examples/eyapplanguageref$ cat -n bugyapp2.eyp
     1  %strict
     2  %%
     3  expr: NUM;
     4  %%
pl@nereida:~/LEyapp/examples/eyapplanguageref$ eyapp bugyapp2.eyp
Warning! Non declared token NUM at line 3 of bugyapp2.eyp

To keep silent the compiler declare all tokens using one of the token declaration directives (%token, %left, etc.)

pl@nereida:~/LEyapp/examples/eyapplanguageref$ cat -n bugyapp3.eyp
     1  %strict
     2  %token NUM
     3  %%
     4  expr: NUM;
     5  %%
pl@nereida:~/LEyapp/examples/eyapplanguageref$ eyapp bugyapp3.eyp
pl@nereida:~/LEyapp/examples/eyapplanguageref$ ls -ltr | tail -1
-rw-r--r-- 1 pl users 2395 2008-10-02 09:41 bugyapp3.pm

It is a good practice to use %strict at the beginning of your grammar.

CONFLICT ERRORS

The following simplified eyapp program has some errors. The generated language is made of lists of declarations (D stands for declaration) followed by lists of sentences (S stands for stament) separated by semicolons:

pl@europa:~/LEyapp/examples/debuggingtut$ cat -n Debug.eyp
   1  %token D S
   2
   3  %{
   4  our $VERSION = '0.01';
   5  %}
   6
   7  %%
   8  p:
   9                  ds ';' ss
  10          |       ss
  11  ;
  12
  13  ds:
  14            D ';' ds
  15          | D
  16      {
  17        print "Reducing by rule:\n";
  18        print "\tds -> D\n";
  19        $_[1];
  20      }
  21  ;
  22
  23  ss:
  24            S ';' ss
  25          | S
  26  ;
  27
  28  %%
  29
  30  my $tokenline = 0;
  31
  32  sub _Error {
  33          my $parser = shift;
  34          my ($token) = $parser->YYCurval;
  35          my ($what) = $token ? "input: '$token'" : "end of input";
  36          die "Syntax error near $what line num $tokenline\n";
  37  }
  38
  39  my $input;
  40
  41  sub _Lexer {
  42
  43          for ($input) {
  44        s{^(\s)}{} and $tokenline += $1 =~ tr{\n}{};
  45              return ('',undef) unless $_;
  46        return ($1,$1) if s/^(.)//;
  47          }
  48          return ('',undef);
  49  }
  50
  51  sub Run {
  52          my ($self) = shift;
  53
  54          $input = shift;
  55
  56          return $self->YYParse( yylex => \&_Lexer, yyerror => \&_Error,
  57                           yydebug => 0xF
  58    );
  59  }

Sometimes the presence of actions, attribute names and support code makes more difficult the readability of the grammar. Yuo can use the -c option of eyapp, to see only the syntactic parts:

$ eyapp -c examples/debuggingtut/Debug.eyp
%token D S

%%

p:
      ds ';' ss
    | ss
;
ds:
      D ';' ds
    | D
;
ss:
      S ';' ss
    | S
;

$

When compiling this grammar, eyapp produces a warning message announcing the existence of a conflict:

pl@nereida:~/LEyapp/examples$ eyapp Debug.eyp
1 shift/reduce conflict (see .output file)
State 4: shifts:
  to state    8 with ';'

The existence of warnings triggers the creation of a file Debug.output containing information about the grammar and the syntax analyzer.

Let us see the contents of the Debug.output file:

pl@nereida:~/LEyapp/examples$ cat -n Debug.output
   1  Warnings:
   2  ---------
   3  1 shift/reduce conflict (see .output file)
   4  State 4: shifts:
   5    to state    8 with ';'
   6
   7  Conflicts:
   8  ----------
   9  State 4 contains 1 shift/reduce conflict
  10
  11  Rules:
  12  ------
  13  0:      $start -> p $end
  14  1:      p -> ds ';' ss
  15  2:      p -> ss
  16  3:      ds -> D ';' ds
  17  4:      ds -> D
  18  5:      ss -> S ';' ss
  19  6:      ss -> S
  20
  21  States:
  22  -------
  23  State 0:
  24
  25          $start -> . p $end      (Rule 0)
  26
  27          D       shift, and go to state 4
  28          S       shift, and go to state 1
  29
  30          p       go to state 2
  31          ss      go to state 3
  32          ds      go to state 5
  33
  ..  .........................................
  55  State 4:
  56
  57          ds -> D . ';' ds        (Rule 3)
  58          ds -> D .       (Rule 4)
  59
  60          ';'     shift, and go to state 8
  61
  62          ';'     [reduce using rule 4 (ds)]
  63
  ..  .........................................
  84  State 8:
  85
  86          ds -> D ';' . ds        (Rule 3)
  87
  88          D       shift, and go to state 4
  89
  90          ds      go to state 11
  91
  ..  .........................................
 112  State 12:
 113
 114          p -> ds ';' ss .        (Rule 1)
 115
 116          $default        reduce using rule 1 (p)
 117
 118
 119  Summary:
 120  --------
 121  Number of rules         : 7
 122  Number of terminals     : 4
 123  Number of non-terminals : 4
 124  Number of states        : 13

The parser generated by Parse::Eyapp is based on a deterministic finite automata. Each state of the automata remembers what production rules are candidates to apply and what have been seen from the right hand side of the production rule. The problem, according to the warning, occurs in state 4. State 4 contains:

55  State 4:
56
57          ds -> D . ';' ds        (Rule 3)
58          ds -> D .       (Rule 4)
59
60          ';'     shift, and go to state 8
61
62          ';'     [reduce using rule 4 (ds)]
63

An state is a set of production rules with a marker (the dot in rules 3 and 4) somewhere in its right hand side. If the parser is in state 4 is because the production rules ds -> D ';' ds and ds -> D are potential candidates to build the syntax tree. That they will be or not depends on what will happen next when more input is processed.

The dot that appears on the right hand side means position in our guessing. The fact that ds -> D .';' ds is in state 4 means that if the parser is in state 4 we have already seen D and we expect to see a semicolon followed by ds (or something derivable from ds). If such thing happens this production will be the right one (will be the handle in the jargon). The comment

60          ';'     shift, and go to state 8

means that if the next token is a semicolon the next state will be state 8:

84  State 8:
85
86          ds -> D ';' . ds        (Rule 3)
87
88          D       shift, and go to state 4
89
90          ds      go to state 11

As we see state 8 has the item ds -> D ';' . ds which means that we have already seen a D and a semicolon.

The fact that ds -> D . is in state 4 means that we have already seen D and since the dot is at the end of the rule, this production can be the right one, even if a semicolon is just waiting in the input.

That is why eyapp talks about a shift/reduce conflict with ';' because in state 4 there are two rules that compete to be the right one:

pl@nereida:~/LEyapp/examples$ eyapp Debug.eyp
1 shift/reduce conflict (see .output file)

We can guess that the right item (the states of the automata are called LR(0) items in the jargon) is ds -> D .';' ds and shift to state 8 consuming the semicolon, expecting to see something derivable from ds later or guess that ds -> D . is the right LR(0) item and reduce for such rule. This is the meaning of the comments in state 4:

60          ';'     shift, and go to state 8
61
62          ';'     [reduce using rule 4 (ds)]

To illustrate the problem let us consider the phrases D;S and D;D;S.

For both phrases, after consuming the D the parser will go to state 4 and the current token will be the semicolon.

For the first phrase D;S the correct decison is to use rule 4 ds -> D (to reduce in the jargon). For the second phrase D;D;S the correct decision is to follow rule 3 ds -> D . ';' ds.

The parser generated by eyapp could know wich rule is correct for each case if it were allowed to look at the token after the semicolon: if it is a S is rule 4, if it is a D is rule 3. But the parsers generated by Eyapp do not lookahead more than the next token (this is what the "1" means when we say that Parse::Eyapp parsers are LALR(1)) and therefore is not in condition to decide which producction rule applies.

To solve the conflict the Eyapp programmer has to reformulate the grammar modifying priorities and reorganizing the rules. Rewriting the recursive rule for ds to be let recursive solves the conflict:

pl@nereida:~/LEyapp/examples$ sed -ne '/^ds:/,/^;/p' Debug1.eyp | cat -n
 1  ds:
 2            ds ';' D
 3          | D
 4      {
 5        print "Reducing by rule:\n";
 6        print "\tds -> D\n";
 7        $_[1];
 8      }
 9  ;

Now, for any phrase matching the pattern D ; ... the action to build the tree is to reduce by ds -> D.

The rightmost antiderivation for D;D;S is:

           Derivation                 |             Tree
--------------------------------------+-----------------------------
D;D;S <= ds;D;S <= ds;S <= ds;ss <= p |  p(ds(ds(D),';',D),';',ss(S))

while the rightmost antiderivation for D;S is:

           Derivation                 |             Tree
--------------------------------------+-----------------------------
D;S <= ds;S <= ds;ss <= p             |      p(ds(D),';',ss(S))

When we recompile the modified grammar no warnings appear:

pl@nereida:~/LEyapp/examples$ eyapp Debug1.eyp
pl@nereida:~/LEyapp/examples$

ERRORS DURING TREE CONSTRUCTION

Let us write the typical client program:

pl@nereida:~/LEyapp/examples$ cat -n usedebug1.pl
 1  #!/usr/bin/perl -w
 2  # usetreebypass.pl prueba2.exp
 3  use strict;
 4  use Debug1;
 5
 6  sub slurp_file {
 7    my $fn = shift;
 8    my $f;
 9
10    local $/ = undef;
11    if (defined($fn)) {
12      open $f, $fn  or die "Can't find file $fn!\n";
13    }
14    else {
15      $f = \*STDIN;
16    }
17    my $input = <$f>;
18    return $input;
19  }
20
21  my $input = slurp_file( shift() );
22
23  my $parser = Debug1->new();
24
25  $parser->Run($input);

When executing the program we observe an abnormal behavior. We activate the option yydebug => 0xF in the call to a YYParser. The integer parameter yydebug of new and YYParse controls the level of debugging. Different levels of verbosity can be obtained by setting the bits of this argument. It works as follows:

/============================================================\
| Bit Value  | Outputs                                       |
|------------+-----------------------------------------------|
|  0x01      |  Token reading (useful for Lexer debugging)   |
|------------+-----------------------------------------------|
|  0x02      |  States information                           |
|------------+-----------------------------------------------|
|  0x04      |  Driver actions (shifts, reduces, accept...)  |
|------------+-----------------------------------------------|
|  0x08      |  Parse Stack dump                             |
|------------+-----------------------------------------------|
|  0x10      |  Error Recovery tracing                       |
\============================================================/

Let us see what happens when the input is D;S. We have introduced some white spaces and carriage returns between the terminals:

pl@nereida:~/LEyapp/examples$ usedebug1.pl
D

;

S

----------------------------------------
In state 0:
Stack:[0]
Need token. Got >D<
Shift and go to state 4.
----------------------------------------
In state 4:
Stack:[0,4]
Don't need token.
Reduce using rule 4 (ds --> D): Reducing by rule:
        ds -> D
Back to state 0, then go to state 5.
----------------------------------------
In state 5:
Stack:[0,5]
Need token. Got ><
Syntax error near end of input line num 1

What's going on? After reading the carriage return

Need token. Got >D<

the parser receives an end of file. ¿Why?. Something is going wrong in the communications between lexer and parser. Let us review the lexical analyzer:

pl@nereida:~/LEyapp/examples$ sed -ne '/sub.*_Lexer/,/^}/p' Debug1.eyp | cat -n
 1  sub _Lexer {
 2
 3          for ($input) {
 4             s{^(\s)}{} and $tokenline += $1 =~ tr{\n}{};
 5             return ('',undef) unless $_;
 6             return ($1,$1) if s/^(.)//;
 7          }
 8          return ('',undef);
 9  }

The error is at line 4. Only a single white space is eaten! The second white in the input does not match lines 5 and 6 and the contextualizing for finishes. Line 8 the returns the ('',undef) signalling the end of input.

Let us write a new version Debug2.eyp that fixes the problem:

pl@nereida:~/LEyapp/examples$ sed -ne '/sub.*_Lexer/,/^}/p' Debug2.eyp | cat -n
 1  sub _Lexer {
 2
 3          for ($input) {
 4             s{^(\s+)}{} and $tokenline += $1 =~ tr{\n}{};
 5             return ('',undef) unless $_;
 6             return ($1,$1) if s/^(.)//;
 7          }
 8          return ('',undef);
 9  }

Now the analysis seems to work:

pl@nereida:~/LEyapp/examples$ usedebug2.pl
D

;

S

----------------------------------------
In state 0:
Stack:[0]
Need token. Got >D<
Shift and go to state 4.
----------------------------------------
In state 4:
Stack:[0,4]
Don't need token.
Reduce using rule 4 (ds --> D): Reducing by rule:
        ds -> D
Back to state 0, then go to state 5.
----------------------------------------
In state 5:
Stack:[0,5]
Need token. Got >;<
Shift and go to state 8.
----------------------------------------
In state 8:
Stack:[0,5,8]
Need token. Got >S<
Shift and go to state 1.
----------------------------------------
In state 1:
Stack:[0,5,8,1]
Need token. Got ><
Reduce using rule 6 (ss --> S): Back to state 8, then go to state 10.
----------------------------------------
In state 10:
Stack:[0,5,8,10]
Don't need token.
Reduce using rule 1 (p --> ds ; ss): Back to state 0, then go to state 2.
----------------------------------------
In state 2:
Stack:[0,2]
Shift and go to state 7.
----------------------------------------
In state 7:
Stack:[0,2,7]
Don't need token.
Accept.

THE LR PARSING ALGORITHM: UNDERSTANDING THE OUTPUT OF `yydebug`

The YYParse methods implements the generic LR parsing algorithm. It very much works Parse::Yapp::YYParse and as yacc/bison yyparse. It accepts almost the same arguments as Class->new (Being Class the name of the generated class).

The parser uses two tables and a stack. The two tables are called the action table and the goto table. The stack is used to keep track of the states visited.

At each step the generated parser consults the action table and takes one decision: To shift to a new state consuming one token (and pushing the current state in the stack) or to reduce by some production rule. In the last case the parser pops from its stack as many states as symbols are on the right hand side of the production rule. Here is a Perl/C like pseudocode summarizing the activity of YYParse:

 1   my $parser = shift; # The parser object
 2   push(@stack, $parser->{startstate});
 3   $b = $parser->YYLexer(); # Get the first token
 4   FOREVER: {
 5     $s = top(0);  # Get the state on top of the stack
 6     $a = $b;
 7     switch ($parser->action[$s->state][$a]) {
 8       case "shift t" : 
 9         my $t;
10         $t->{state} = t;
11         $t->{attr}  = $a->{attr};
12         push($t); 
13         $b = $parser->YYLexer(); # Call the lexical analyzer
14         break;
15       case "reduce A->alpha" : 
16         # Call the semantic action with the attributes of the rhs as args
17         my $semantic  = $parser->Semantic{A ->alpha}; # The semantic action
18         my $r;
19         $r->{attr} = $semantic->($parser, top(|alpha|-1)->attr, ... , top(0)->attr); 
20  
21         # Pop as many states as symbols on the rhs of A->alpha
22         pop(|alpha|);  
23  
24         # Goto next state 
25         $r->{state} = $parser->goto[top(0)][A]; 
26         push($r); 
27         break;
28       case "accept" : return (1); 
29       default : $parser->YYError("syntax error"); 
30     }
31     redo FOREVER;
32   }

Here |alpha| stands for the length of alpha. Function top(k) returns the state in position k from the top of the stack, i.e. the state at depth k. Function pop(k) extracts k states from the stack. The call $state->attr returns the attribute associated with $state. The call $parser->Semantic{A ->alpha} returns the semantic action associated with production A ->alpha.

Let us see a trace for the small gramar in examples/aSb.yp:

pl@nereida:~/LEyapp/examples$ /usr/local/bin/paste.pl aSb.yp aSb.output | head -5
%%                                             | Rules:
S:                 { print "S -> epsilon\n" }  | ------
    |   'a' S 'b'  { print "S -> a S b\n" }    | 0:    $start -> S $end
;                                              | 1:    S -> /* empty */
%%                                             | 2:    S -> 'a' S 'b'

The tables in file aSb.output describe the actions and transitions to take:

pl@nereida:~/LEyapp/examples$ cat -n aSb.output
   .  .........................................
   7  States:
   8  -------
   9  State 0:
  10
  11          $start -> . S $end      (Rule 0)
  12
  13          'a'     shift, and go to state 2
  14
  15          $default        reduce using rule 1 (S)
  16
  17          S       go to state 1
  18
  19  State 1:
  20
  21          $start -> S . $end      (Rule 0)
  22
  23          $end    shift, and go to state 3
  24
  25  State 2:
  26
  27          S -> 'a' . S 'b'        (Rule 2)
  28
  29          'a'     shift, and go to state 2
  30
  31          $default        reduce using rule 1 (S)
  32
  33          S       go to state 4
  34
  35  State 3:
  36
  37          $start -> S $end .      (Rule 0)
  38
  39          $default        accept
  40
  41  State 4:
  42
  43          S -> 'a' S . 'b'        (Rule 2)
  44
  45          'b'     shift, and go to state 5
  46
  47  State 5:
  48
  49          S -> 'a' S 'b' .        (Rule 2)
  50
  51          $default        reduce using rule 2 (S)
  52
  53
  54  Summary:
  55  --------
  56  Number of rules         : 3
  57  Number of terminals     : 3
  58  Number of non-terminals : 2
  59  Number of states        : 6

When executed with yydebug set and input aabb we obtain the following output:

pl@nereida:~/LEyapp/examples$ use_aSb.pl
----------------------------------------
In state 0:
Stack:[0]
aabb                       <----------- user input
Need token. Got >a<
Shift and go to state 2.  
----------------------------------------
In state 2:
Stack:[0,2]
Need token. Got >a<
Shift and go to state 2.
----------------------------------------
In state 2:
Stack:[0,2,2]
Need token. Got >b<
Reduce using rule 1 (S --> /* empty */): S -> epsilon
Back to state 2, then go to state 4.

The output S-> epsilon is consequence of the semantic action associated with such production rule.

----------------------------------------
In state 4:
Stack:[0,2,2,4]
Shift and go to state 5.
----------------------------------------
In state 5:
Stack:[0,2,2,4,5]
Don't need token.
Reduce using rule 2 (S --> a S b): S -> a S b
Back to state 2, then go to state 4.

As a result of reducing by rule 2 the semantic action is executed

{ print "S -> a S b\n" }

and the three last visited states are popped from the stack, and the stack becomes [0,2]. But that means that we are now in state 2 seeing a S. If you look at the table above being in state2 and seeing a S we go to state 4.

----------------------------------------
In state 4:
Stack:[0,2,4]
Need token. Got >b<
Shift and go to state 5.
----------------------------------------
In state 5:
Stack:[0,2,4,5]
Don't need token.
Reduce using rule 2 (S --> a S b): S -> a S b
Back to state 0, then go to state 1.
----------------------------------------
In state 1:
Stack:[0,1]
Need token. Got ><
Shift and go to state 3.
----------------------------------------
In state 3:
Stack:[0,1,3]
Don't need token.
Accept.

ERRORS INSIDE SEMANTIC ACTIONS

A third type of error occurs when the code inside a semantic action does'nt behave as expected.

the semantic actions are translated in anonymous methods of the parser object. Since they are anonymous we ca't use breakpoints as

b subname # stop when arriving at sub ''name''

c subname  # contine up to reach sub ''name''

Furthermore the file loaded by the client program is the generated .pm. The code in Debug.pm is alien to us - Was automatically generated by Parse::Eyapp - and it can be difficult to find where our inserted semantic actions are.

To watch the execution of a semantic action is simple: We use the debugger f file.eyp option to switch the viewing filename to our grammar file.

pl@nereida:~/LEyapp/examples$ perl -wd usedebug2.pl

Loading DB routines from perl5db.pl version 1.28
Editor support available.

Enter h or `h h' for help, or `man perldebug' for more help.

main::(usedebug2.pl:21):        my $input = slurp_file( shift() );
  DB<1> f Debug2.eyp
1       2       #line 3 "Debug2.eyp"
3
4:      our $VERSION = '0.01';
5
6       7       8       9       10

Now we can set a breakpoint at any line of our grammar file. Thus the 18 in the command b 18 refers to line 18 in Debug2.eyp. The command l shows the corresponding lines of the .eyp file

  DB<2> b 18
  DB<3> l
11      12      13      14      15      16      #line 17 "Debug2.eyp"
17
18:b          print "Reducing by rule:\n";
19:           print "\tds -> D\n";
20:           $_[1];

We issue now the command c (continue). The execution continues up to linea 18 of Debug2.eyp:

  DB<3> c
D

;

S

Debug2::CODE(0x85129d8)(Debug2.eyp:18):
18:           print "Reducing by rule:\n";
  DB<3> n
Reducing by rule:

Now we can issue any debugger commands (like x, p, etc.) to investigate the internal state of our program and determine what are the reasons for any abnormal behavior.

Debug2::CODE(0x85129d8)(Debug2.eyp:19):
19:           print "\tds -> D\n";
  DB<3> x $_[0]{GRAMMAR}
0  ARRAY(0x8538360)
   0  ARRAY(0x855aa88)
      0  '_SUPERSTART'
      1  '$start'
      2  ARRAY(0x855ab60)
         0  'p'
         1  '$end'
      3  0
   1  ARRAY(0x855a890)
      0  'p_1'
      1  'p'
      2  ARRAY(0x855a8fc)
         0  'ds'
         1  ';'
         2  'ss'
      3  0
   2  ARRAY(0x855a800)
      0  'p_2'
      1  'p'
      2  ARRAY(0x855a830)
         0  'ss'
      3  0
   3  ARRAY(0x855a764)
      0  'ds_3'
      1  'ds'
      2  ARRAY(0x855a7a0)
         0  'ds'
         1  ';'
         2  'D'
      3  0
   4  ARRAY(0x85421d4)
      0  'ds_4'
      1  'ds'
      2  ARRAY(0x855a6e0)
         0  'D'
      3  0
   5  ARRAY(0x8538474)
      0  'ss_5'
      1  'ss'
      2  ARRAY(0x854f9c8)
         0  'S'
         1  ';'
         2  'ss'
      3  0
   6  ARRAY(0x85383b4)
      0  'ss_6'
      1  'ss'
      2  ARRAY(0x85383f0)
         0  'S'
      3  0
  DB<4>

Using a second c the execution continues until reaching the end of the program:

  DB<3> c
Debugged program terminated.  Use q to quit or R to restart,
  use o inhibit_exit to avoid stopping after program termination,
  h q, h R or h o to get additional info.
  DB<3>

TREE EQUALITY

The more the time invested writing tests the less the time spent debugging. This section deals with the Parse::Eyapp::Node method equal which can be used to test that the trees have the shape we expect.

$node->equal

A call $tree1->equal($tree2) compare the two trees $tree1 and $tree2. Two trees are considered equal if their root nodes belong to the same class, they have the same number of children and the children are (recursively) equal.

In Addition to the two trees the programmer can specify pairs attribute_key => equality_handler:

$tree1->equal($tree2, attr1 => \&handler1, attr2 => \&handler2, ...)

In such case the definition of equality is more restrictive: Two trees are considered equal if

Their root nodes belong to the same class,
They have the same number of children
For each of the specified attributes occur that for both nodes the existence and definition of the key is the same
Assuming the key exists and is defined for both nodes, the equality handlers return true for each of its attributes and
The respective children are (recursively) equal.

An attribute handler receives as arguments the values of the attributes of the two nodes being compared and must return true if, and only if, these two attributes are considered equal. Follows an example:

examples/Node$ cat -n equal.pl
   1  #!/usr/bin/perl -w
   2  use strict;
   3  use Parse::Eyapp::Node;
   4
   5  my $string1 = shift || 'ASSIGN(VAR(TERMINAL))';
   6  my $string2 = shift || 'ASSIGN(VAR(TERMINAL))';
   7  my $t1 = Parse::Eyapp::Node->new($string1, sub { my $i = 0; $_->{n} = $i++ for @_ });
   8  my $t2 = Parse::Eyapp::Node->new($string2);
   9
  10  # Without attributes
  11  if ($t1->equal($t2)) {
  12    print "\nNot considering attributes: Equal\n";
  13  }
  14  else {
  15    print "\nNot considering attributes: Not Equal\n";
  16  }
  17
  18  # Equality with attributes
  19  if ($t1->equal($t2, n => sub { return $_[0] == $_[1] })) {
  20    print "\nConsidering attributes: Equal\n";
  21  }
  22  else {
  23    print "\nConsidering attributes: Not Equal\n";
  24  }

When the former program is run without arguments produces the following output:

examples/Node$ equal.pl

Not considering attributes: Equal

Considering attributes: Not Equal

Using `equal` During Testing

During the development of your compiler you add new stages to the existing ones. The consequence is that the AST is decorated with new attributes. Unfortunately, this implies that tests you wrote using is_deeply and comparisons against formerly correct abstract syntax trees are no longer valid. This is due to the fact that is_deeply requires both tree structures to be equivalent in every detail and that our new code produces a tree with new attributes.

Instead of is_deeply use the equal method to check for partial equivalence between abstract syntax trees. You can follow these steps:

Dump the tree for the source inserting Data::Dumper statements
Carefully check that the tree is really correct
Decide which attributes will be used for comparison
Write the code for the expected value editing the output produced by Data::Dumper
Write the handlers for the attributes you decided. Write the comparison using equal.

Tests using this methodology will not fail even if later code decorating the AST with new attributes is introduced.

See an example that checks an abstract syntax tree produced by the simple compiler (see examples/typechecking/Simple-Types-XXX.tar.gz) for a really simple source:

Simple-Types/script$ cat prueba27.c
int f() {
}

The first thing is to obtain a description of the tree, that can be done executing the compiler under the control of the perl debugger, stopping just after the tree has been built and dumping the tree with Data::Dumper:

pl@nereida:~/Lbook/code/Simple-Types/script$ perl -wd usetypes.pl prueba27.c
main::(usetypes.pl:5):  my $filename = shift || die "Usage:\n$0 file.c\n";
  DB<1> c 12
main::(usetypes.pl:12): Simple::Types::show_trees($t, $debug);
  DB<2> use Data::Dumper
  DB<3> $Data::Dumper::Purity = 1
  DB<4> p Dumper($t)
$VAR1 = bless( {
                 ..............................................
               }, 'PROGRAM' );
...............................................................

Once we have the shape of a correct tree we can write our tests:

examples/Node$ cat -n testequal.pl
   1  #!/usr/bin/perl -w
   2  use strict;
   3  use Parse::Eyapp::Node;
   4  use Data::Dumper;
   5  use Data::Compare;
   6
   7  my $debugging = 0;
   8
   9  my $handler = sub {
  10    print Dumper($_[0], $_[1]) if $debugging;
  11    Compare($_[0], $_[1])
  12  };
  13
  14  my $t1 = bless( {
  15                   'types' => {
  16                                'CHAR' => bless( { 'children' => [] }, 'CHAR' ),
  17                                'VOID' => bless( { 'children' => [] }, 'VOID' ),
  18                                'INT' => bless( { 'children' => [] }, 'INT' ),
  19                                'F(X_0(),INT)' => bless( {
  20                                   'children' => [
  21                                      bless( { 'children' => [] }, 'X_0' ),
  22                                      bless( { 'children' => [] }, 'INT' ) ]
  23                                 }, 'F' )
  24                              },
  25                   'symboltable' => { 'f' => { 'type' => 'F(X_0(),INT)', 'line' => 1 } },
  26                   'lines' => 2,
  27                   'children' => [
  28                                   bless( {
  29                                            'symboltable' => {},
  30                                            'fatherblock' => {},
  31                                            'children' => [],
  32                                            'depth' => 1,
  33                                            'parameters' => [],
  34                                            'function_name' => [ 'f', 1 ],
  35                                            'symboltableLabel' => {},
  36                                            'line' => 1
  37                                          }, 'FUNCTION' )
  38                                 ],
  39                   'depth' => 0,
  40                   'line' => 1
  41                 }, 'PROGRAM' );
  42  $t1->{'children'}[0]{'fatherblock'} = $t1;
  43
  44  # Tree similar to $t1 but without some attributes (line, depth, etc.)
  45  my $t2 = bless( {
  46                   'types' => {
  47                                'CHAR' => bless( { 'children' => [] }, 'CHAR' ),
  48                                'VOID' => bless( { 'children' => [] }, 'VOID' ),
  49                                'INT' => bless( { 'children' => [] }, 'INT' ),
  50                                'F(X_0(),INT)' => bless( {
  51                                   'children' => [
  52                                      bless( { 'children' => [] }, 'X_0' ),
  53                                      bless( { 'children' => [] }, 'INT' ) ]
  54                                 }, 'F' )
  55                              },
  56                   'symboltable' => { 'f' => { 'type' => 'F(X_0(),INT)', 'line' => 1 } },
  57                   'children' => [
  58                                   bless( {
  59                                            'symboltable' => {},
  60                                            'fatherblock' => {},
  61                                            'children' => [],
  62                                            'parameters' => [],
  63                                            'function_name' => [ 'f', 1 ],
  64                                          }, 'FUNCTION' )
  65                                 ],
  66                 }, 'PROGRAM' );
  67  $t2->{'children'}[0]{'fatherblock'} = $t2;
  68
  69  # Tree similar to $t1 but without some attributes (line, depth, etc.)
  70  # and without the symboltable and types attributes used in the comparison
  71  my $t3 = bless( {
  72                   'types' => {
  73                                'CHAR' => bless( { 'children' => [] }, 'CHAR' ),
  74                                'VOID' => bless( { 'children' => [] }, 'VOID' ),
  75                                'INT' => bless( { 'children' => [] }, 'INT' ),
  76                                'F(X_0(),INT)' => bless( {
  77                                   'children' => [
  78                                      bless( { 'children' => [] }, 'X_0' ),
  79                                      bless( { 'children' => [] }, 'INT' ) ]
  80                                 }, 'F' )
  81                              },
  82                   'children' => [
  83                                   bless( {
  84                                            'symboltable' => {},
  85                                            'fatherblock' => {},
  86                                            'children' => [],
  87                                            'parameters' => [],
  88                                            'function_name' => [ 'f', 1 ],
  89                                          }, 'FUNCTION' )
  90                                 ],
  91                 }, 'PROGRAM' );
  92
  93  $t3->{'children'}[0]{'fatherblock'} = $t2;
  94
  95  # Without attributes
  96  if (Parse::Eyapp::Node::equal($t1, $t2)) {
  97    print "\nNot considering attributes: Equal\n";
  98  }
  99  else {
 100    print "\nNot considering attributes: Not Equal\n";
 101  }
 102
 103  # Equality with attributes
 104  if (Parse::Eyapp::Node::equal(
 105        $t1, $t2,
 106        symboltable => $handler,
 107        types => $handler,
 108      )
 109     ) {
 110        print "\nConsidering attributes: Equal\n";
 111  }
 112  else {
 113    print "\nConsidering attributes: Not Equal\n";
 114  }
 115
 116  # Equality with attributes
 117  if (Parse::Eyapp::Node::equal(
 118        $t1, $t3,
 119        symboltable => $handler,
 120        types => $handler,
 121      )
 122     ) {
 123        print "\nConsidering attributes: Equal\n";
 124  }
 125  else {
 126    print "\nConsidering attributes: Not Equal\n";
 127  }

The code defining tree $t1 was obtained from an output using Data::Dumper. The code for trees $t2 and $t3 was written using cut-and-paste from $t1. They have the same shape than $t1 but differ in their attributes. Tree $t2 shares with $t1 the attributes symboltable and types used in the comparison and so equal returns true when compared. Since $t3 differs from $t1 in the attributes symboltable and types the call to equal returns false.

FORMATTING Parse::Eyapp PROGRAMS

I use these rules for indenting Parse::Eyapp programs:

Use uppercase identifiers for tokens, lowecase identifiers for syntactic variables
The syntactic variable that defines the rule must be at in a single line at the leftmost position:
```
synvar:
     'a' othervar 'c'
  |  'b' anothervar SOMETOKEN
;
```
The separation bar | goes indented relative to the left side of the rule. Each production starts two spaces from the bar. The first right hand side is aligned with the rest.
The semicolon ; must also be in its own line at column 0
If there is an empty production it must be the first one and must be commented
```
syntacvar:
     /* empty */
  |  'a' othervar 'c'
  |  'b' anothervar 
;
```

Only very short semantic actions can go in the same line than the production. Semantic actions requiring more than one line must go in its own idented block like in:

  exp:
      $NUM            { $NUM->[0] }
    | $VAR
       {
	 my $id = $VAR->[0];
	 my $val = $s{$id};
	 $_[0]->semantic_error("Accesing undefined variable $id at line $VAR->[1].\n")
	 unless defined($val);
	 return $val;
       }
    | $VAR '=' $exp   { $s{$VAR->[0]} = $exp }
    | exp.x '+' exp.y { $x + $y }
    | exp.x '-' exp.y { $x - $y }
    | exp.x '*' exp.y { $x * $y }
    | exp.x '/'.barr exp.y
      {
	 return($x/$y) if $y;
	 $_[0]->semantic_error("Illegal division by zero at line $barr->[1].\n");
	 undef
      }
    | '-' $exp %prec NEG  { -$exp }
    | exp.x '^' exp.y     { $x ** $y }
    | '(' $exp ')'        { $exp }
  ;

REFERENCES

The classic Dragon's book Compilers: Principles, Techniques, and Tools by Alfred V. Aho, Ravi Sethi and Jeffrey D. Ullman (Addison-Wesley 1986)

AUTHOR

Casiano Rodriguez-Leon (casiano@ull.es)

ACKNOWLEDGMENTS

This work has been supported by CEE (FEDER) and the Spanish Ministry of Educacion y Ciencia through Plan Nacional I+D+I number TIN2005-08818-C04-04 (ULL::OPLINK project http://www.oplink.ull.es/). Support from Gobierno de Canarias was through GC02210601 (Grupos Consolidados). The University of La Laguna has also supported my work in many ways and for many years.

A large percentage of code is verbatim taken from Parse::Yapp 1.05. The author of Parse::Yapp is Francois Desarmenien.

I wish to thank Francois Desarmenien for his Parse::Yapp module, to my students at La Laguna and to the Perl Community. Special thanks to my family and Larry Wall.

LICENCE AND COPYRIGHT

These modules are free software; you can redistribute it and/or modify it under the same terms as Perl itself. See perlartistic.

This program is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.

1 POD Error

The following errors were encountered while parsing the POD:

Around line 524:: Non-ASCII character seen before =encoding in '¿Why?.'. Assuming CP1252

To install Parse::Eyapp, copy and paste the appropriate command in to your terminal.

cpanm

cpanm Parse::Eyapp

CPAN shell

perl -MCPAN -e shell
install Parse::Eyapp

For more information on module installation, please visit the detailed CPAN module installation guide.

	Global
`s`	Focus search bar
`?`	Bring up this help dialog

	GitHub
`g` `p`	Go to pull requests
`g` `i`	go to github issues (only if github is preferred repository)

	POD
`g` `a`	Go to author
`g` `c`	Go to changes
`g` `i`	Go to issues
`g` `d`	Go to dist
`g` `r`	Go to repository/SCM
`g` `s`	Go to source
`g` `b`	Go to file browse

	Search terms
module: (e.g. module:Plugin)
distribution: (e.g. distribution:Dancer auth)
author: (e.g. author:SONGMU Redis)
version: (e.g. version:1.00)