NAME
README.txt - Readme file for the pirc/new compiler, a fresh implementation of the PIR language using Bison and Flex.
AUTHOR
kjs
DESCRIPTION
pirc/new is a fresh implementation of the PIR language. Maintaining the current default implementation (IMCC) is a bit of a pain, and I wanted to see how far I could get with a fresh implementation. A lot of ugly things could be removed.
Of course, it is not finished yet. A lot of work is needed on the back-end before it can generate Parrot Byte Code files (PBC).
The current set-up is a three-phase compiler:
Heredoc pre-processor
The heredoc pre-processor takes the input, and converts all heredoc strings into normal strings. So, the following:
    .sub main
      foo(<<'HI', <<'BYE')
     hi there!
    HI
     bye for now!
    BYE
    .end
is converted into:
    .sub main
      foo(" hi there!\n", " bye for now!\n\n")
    .end
Currently there is a small issue with the 2nd and later heredoc arguments; they seem to get one newline character too many.
The heredoc pre-processor needs to know about POD comments, because the POD comment may contain a heredoc string, which should not be processed, as it is a comment. For that purpose, all comments (POD and line comments) are stripped in this phase.
The Heredoc pre-processor is located in compilers/pirc/heredoc.
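The conversion described above can be illustrated with a minimal Python sketch. This is not the code of hdocprep (which is generated with Flex); the names `HEREDOC_MARKER`, `escape` and `flatten_heredocs` are hypothetical, and the sketch assumes that only single-quoted markers such as <<'HI' occur and that the closing delimiter stands alone on its line. It does not strip comments, and it does not reproduce the extra-newline issue mentioned above.

```python
import re

# Assumption: a heredoc marker looks like <<'NAME' (single-quoted form only).
HEREDOC_MARKER = re.compile(r"<<'([A-Za-z_][A-Za-z0-9_]*)'")


def escape(text):
    """Escape text so it can be placed inside a double-quoted string."""
    return (text.replace("\\", "\\\\")
                .replace('"', '\\"')
                .replace("\n", "\\n"))


def flatten_heredocs(source):
    """Replace each heredoc marker by a normal string holding its body."""
    lines = source.split("\n")
    out = []
    i = 0
    while i < len(lines):
        line = lines[i]
        i += 1
        bodies = []
        # Bodies follow the marker line in the order the markers appear.
        for delimiter in HEREDOC_MARKER.findall(line):
            body = []
            while i < len(lines) and lines[i] != delimiter:
                body.append(lines[i] + "\n")
                i += 1
            i += 1  # skip the delimiter line itself
            bodies.append('"' + escape("".join(body)) + '"')
        replacements = iter(bodies)
        # Substitute the markers left to right with the collected strings.
        out.append(HEREDOC_MARKER.sub(lambda m: next(replacements), line))
    return "\n".join(out)
```

Because the bodies are consumed in marker order, a line with several markers (as in the foo call above) is handled by the same loop as a line with one.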
Macro pre-processor
The macro pre-processor takes the output of the heredoc pre-processor, and handles all macro definitions and expansions. The .include directive is handled here too. The output of the macro pre-processor is (in case of uses of the .include directive) one long big file with "pure" PIR code.
The macro pre-processor is located in compilers/pirc/macro.
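How .include directives end up as one long file can be sketched in Python as follows. This is only an illustration, not the macroparser code: the function `expand_includes`, the `read_file` callback and the assumption that an .include directive occupies a whole line with a quoted file name are all hypothetical. Macro definitions and expansions are not handled here.

```python
import re

# Assumption: an include directive fills one line: .include "file" or .include 'file'
INCLUDE_LINE = re.compile(r"""^\s*\.include\s+["']([^"']+)["']\s*$""")


def expand_includes(name, read_file, active=()):
    """Return the text of `name` with each .include line replaced by that file's text.

    `read_file` is a caller-supplied function mapping a file name to its contents;
    `active` tracks the files currently being expanded to detect cycles.
    """
    if name in active:
        raise ValueError("circular .include of " + name)
    out = []
    for line in read_file(name).splitlines():
        match = INCLUDE_LINE.match(line)
        if match:
            # Recursively inline the included file in place of the directive.
            out.append(expand_includes(match.group(1), read_file, active + (name,)))
        else:
            out.append(line)
    return "\n".join(out)
```

Passing the reader as a function keeps the sketch independent of the file system; for example, `expand_includes("main.pir", files.__getitem__)` works with a dictionary `files` of file names to contents.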
PIR parser
The third pass is done by the PIR parser, which takes the "pure" PIR code from the macro pre-processor. Currently, it's only a parser, but a future extension could be to generate PASM code from the PIR input. This way, it's easy to see what ops are actually executed when running the PIR file.
The PIR parser is located in compilers/pirc/new.
The new implementation also has some unique features with respect to IMCC:
Multiple heredoc arguments
In pirc/new (a new name is yet to be defined) it is allowed to use multiple heredocs as function arguments, like so:
    ...
    foo(<<'HI', <<'BYE')
    ...
    HI
    ...
    BYE
Heredoc arguments for macro expansions
As the heredoc pre-processor handles the input before the macro pre-processor, it is now possible to expand macros specifying heredoc arguments, like so:
    .macro foo(a)
      print .a
    .end
    .sub main
      .foo(<<'HI')
    Hello world!
    HI
    .end
Reentrant
The generated lexer and parser are fully re-entrant. (This still needs to be tested, though.)
Comments!
The code is provided with comments, so you can actually understand what it does.
Pre-processing option
Although IMCC does define the option '-E', it is not really working correctly. pirc has two pre-processing options: 1) running the heredoc parser only, 2) running both the heredoc and macro processors. The output of option 2 is the code that will be given to the PIR compiler.
Grammar cleanup
This is a nice opportunity to clean up the grammar of the PIR language. Hacking on IMCC's grammar is possible, but not for the faint of heart.
NOTES
Usage
Currently the different compilers/pre-processors are located in different directories. The different pre-processors are invoked from the main driver in pirc.c. The latter assumes all three processors are compiled, as the following executables:
heredoc pre-processor: hdocprep
macro pre-processor: macroparser
Running a file through the whole PIR compiler is then done as follows:
$ ./pirc test.pir
When you want to run the heredoc pre-processor only, do this:
$ ./pirc -H test.pir
When you want to pre-process the file only (heredoc + macro parsing), do this:
$ ./pirc -E test.pir
Cygwin processable lexer spec.
The file pir.l from which the lexer is generated is not processable by Cygwin's default version of Flex. In order to make a reentrant lexer, a newer version is needed, which can be downloaded from the link below.
Just do:
$ ./configure
$ make
Then make sure to overwrite the supplied flex binary.
BUGS
Having a look at this implementation would be greatly appreciated, and any resulting feedback even more :-)
Every heredoc argument except the first contains one newline character too many.
Memory management needs to be improved.
The three passes should be integrated into one C program. This is possible, because the generated lexers and parsers can be given a prefix other than "yy". So, although there are 3 lexers and 2 parsers, all generated by Flex/Bison, they can be linked together. This is only necessary if it hugely improves performance w.r.t. pipes. This needs further research.
Braced macro arguments need to be finished.
SEE ALSO
See also:
languages/PIR for a PGE based implementation.
compilers/pirc, a hand-written, recursive-descent PIR parser.
compilers/imcc, the current standard PIR implementation.
docs/imcc/syntax.pod for a description of PIR syntax.
docs/imcc/ for more documentation about the PIR language.