NAME
design.pod - description PIRC's design.
DESCRIPTION
This document describes the design and implementation of PIRC, a PIR Compiler.
OVERVIEW
PIRC currently consists of a PIR parser, together with a lexer. It also has the beginning of semantic actions in the parser. Through the use of a vtable, several back-ends can be implemented, leaving the parser untouched.
Documentation of the lexer and the parser can be generated by running:
make docs
which will generate html files in the doc
directory.
This document will only provide a high-level overview.
THE LEXER
The lexer is defined in pirlexer.c
. The header file lists all tokens that may be returned by the lexer.
The lexer reads the complete file contents into a buffer, from which it reads the individual words, or tokens. A buffer is much faster than using getc()
for each character, as I/O is relatively slow.
THE PARSER
The parser is defined in pirparser.c
. The header file only predeclares the parser_state
structure, but its definition is written in the C file, to hide the implementation details from other files. Access to specific fields is done through accessor functions, defined in the header file as well.
The parser communicates with the lexer through the lexer's accessor function. Of these, the next_token()
function is most important: it requests the next token from the lexer.
The parser does not know anything about the spelling of tokens, although it can request these through find_keyword()
.
SEMANTIC ACTIONS
The parser calls at a number of places emit
functions. These are hooks to which a function can be hooked, that will be called when the parser calls that function. This is implemented using vtables. Of course, not all hooks need to be used. If a hook is not assigned a function by the user, the default empty function is invoked. This is done to prevent NULL checks; let's just hope the optimizer sees the invoked function is empty, so the overhead of calling it is removed.
Example Vtable methods
This section gives a simple example to show how things are used. Let's consider a simplified version of a long_invocation
. Syntactically, it looks like this (this is the simplified version):
long-invocation -> '.begin_call' '\n'
arguments
'.call' invokable '\n'
results
'.end_call' '\n'
invocant -> IDENTIFIER | PREG
The parsing routine for long-invocation (again, its simplified version) looks as follows:
static void
long_invocation(parser_state *p) {
emit_invocation_start(p); /* indicate start of invocation */
match(p, T_PCC_BEGIN);
match(p, T_NEWLINE);
arguments(p);
match(p, T_PCC_CALL); /* check for token '.call' */
match(p, T_NEWLINE); /* check for a newline */
/* get current token from lexer and store it as the invokable object */
emit_invokable(p, get_current_token(p->lexer));
/* check whether it was an invokable object and get next token */
switch (p->curtoken) {
case T_IDENTIFIER:
case T_PREG:
emit_invokable(p, get_current_token(p->lexer));
break;
default:
syntax_error(p, 1, "invokable object expected");
break;
}
results(p);
match(p, T_PCC_END); /* accept token '.end_call' */
match(p, T_NEWLINE); /* accept the newline token */
emit_invocation_end(p); /* close down invocation sequence */
}
static void
arguments(parser_state *p) {
emit_args_start(p); /* start sequence of arguments */
/* handle arguments */
emit_args_end(p); /* stop sequence of arguments */
}
static void
results(p) {
emit_results_start(p); /* start sequence of results */
/* handle results */
emit_results_end(p); /* stop sequence of results */
}
To each of the emit_* function calls, the writer of the back-end can hook a custom function, that does the Appropiate Thing. What is appropiate, depends on the back-end. Some back-ends need to construct a data structure (AST) (for example the PBC backend would need this), others can just spit out what they get (like the PIR back-end).
Supported back-ends
Currently, there are the following back-end targets:
PAST - textual form of PAST using Data::Dumper format.
PIR - PIR output, which may change PIR syntax into PASM syntax.
JSON - JSON is extremely simple, and adding this back-end was pretty easy.
PBC - but this one is not implemented at all. Just a stub file.
See src/pirvtable.{c,h} for details.
Please note that none of the back-ends is complete.
VTable Methods
Now, you might ask yourself, who or what decides where these hooks, or vtable method calls are done. "Why is there a hook over here?" Well, that's done by figuring out at what moment a back-end might need to get some information of the parser. As the parser continues, the tokens being read are lost, if they're not stored anywhere. So, every once and a while the back-end needs to be able to store stuff, so it can do its job properly.
The vtable is not complete yet. There are a number of parsing routines that do not have associated vtable methods (invocations). Of course, we don't want the parser to do too much vtable invocations. On the other hand, if the parser does too few, constructing a back-end might be impossible. It's a bit of a trade-off.
OVERVIEW
This section gives an overview of what functionality is in what file:
src/pirlexer.{c,h} - implementation of the lexer
src/pirparser.{c,h} - implementation of the parser
src/pirvtable.{c,h} - constructor of an empty vtable
src/pirout.{c,h} - back-end that implements the vtable methods to output PIR
src/pastout.{c,h} - back-end that implements the vtable methods to output PAST (in Data::Dumper format)
src/pbcout.{c,h} - dummy back-end for PBC. This file only creates a vtable, but no implementation yet.
src/jsonout.{c,h} - back-end that implements the vtable methods to output JSON.
src/pirmain.c - main file for
pirc
. Execution starts here.
WHAT NEEDS TO BE DONE
There are some major TODOs:
Check whether an identifier is actually a Parrot op. In IMCC, this is done by calling Parrot_is_builtin(). However, for that, we need a Parrot_Interp. Currently I have problems getting things to link correctly.
Complete at least 1 back-end, to see what more vtable entries we need. And of course, to generate PBC in the end.
Complete the vtable structure with all needed vtable methods.
Memory management; not all memory is freed at this moment. Does it need to be done by the back-end, or by the parser?
AUTHOR
Klaas-Jan Stol <parrotcode at gmail dot com>