NAME
docs/pdds/draft/pdd29_compiler_tools.pod - Parrot Compiler Tools
VERSION
$Revision: 28231 $
MAINTAINER
Will "Coke" Coleda, Klaas-Jan Stol
DEFINITIONS
Compiler
In this document, when we speak of a compiler, we mean PCT-based compilers.
HLL
A High-Level Language. Examples are: Perl, Ruby, Python, Lua, Tcl, etc.
ABSTRACT
This PDD specifies the Parrot Compiler Tools (PCT).
SYNOPSIS
Creating a PCT-based compiler can be done as follows:
.sub 'onload' :anon :load :init
load_bytecode 'PCT.pbc'
$P0 = get_hll_global ['PCT'], 'HLLCompiler'
$P1 = $P0.'new'()
$P1.'language'('Foo')
$P1.'parsegrammar'('Foo::Grammar')
$P1.'parseactions'('Foo::Grammar::Actions')
.end
.sub 'main' :main
.param pmc args
$P0 = compreg 'Foo'
$P1 = $P0.'command_line'(args)
.end
{{ this is the most important part; is this enough? }}
The Parrot distribution contains a Perl script that generates a compiler stub containing all necessary files. The generated compiler compiles out of the box. Using this script is the recommended way to get started with the PCT. The script is located at tools/dev/mk_language_shell.pl.
{{ Not sure whether the mk_language_shell.pl script should be mentioned long term. In a sense, this script can also be considered part of the parrot compiler "tools", as it is used to create a compiler. }}
Parser Synopsis
grammar Foo is PCT::Grammar;
rule TOP {
<statement>*
{*}
}
rule statement {
<ident> '=' <expression>
{*}
}
rule expression is optable { ... }
proto infix:<+> is precedence('1') is pirop('n_add') { ... }
proto 'term:' is tighter(infix:<+>) is parsed(&term) { ... }
rule term {
| <ident> {*} #= ident
| <number> {*} #= number
}
Actions Synopsis
{{ Is this a good idea? }}
class Foo::Grammar::Actions;
method TOP($/) {
my $past := PAST::Block.new( :blocktype('declaration'), :node($/) );
for $<statement> {
$past.push( $( $_ ) );
}
make $past;
}
method statement($/) {
make PAST::Op.new( $( $<ident> ),
$( $<expression> ),
:pasttype('bind'),
:node($/) );
}
method expression($/, $key) {
...
}
method term($/, $key) {
make $( $/{$key} );
}
Running the compiler
Running the compiler is then done as follows:
$ parrot foo.pbc [--target=[parse|past|post|pir]] <file>
{{ other options? Maybe --target=pbc in the future, once PBC can be generated? }}
DESCRIPTION
The Parrot Compiler Tools are designed to make it easy to create a compiler targeting the Parrot Virtual Machine. The tools themselves run on Parrot, so no external programs are needed.
The PCT is a set of libraries and programs to:
- create a parser
- create an intermediate data structure (Abstract Syntax Tree)
- generate executable Parrot code
{{ Maybe just say it's used to create parrot-targeting compilers, not list these as =items }}
IMPLEMENTATION
The PCT is made up of the following libraries and programs:
- Parrot Grammar Engine (PGE)
- Parrot Abstract Syntax Tree (PAST) classes
- Parrot Opcode Syntax Tree (POST) classes
- PCT::HLLCompiler class
- PCT::Grammar class
Although strictly speaking it is not part of the PCT, the Not Quite Perl (6) (NQP) language is typically used in PCT-based compilers. NQP is a subset of the Perl 6 language and serves as a high-level alternative to PIR.
Compilation phases
By default, a PCT-based compiler has four compilation phases, or transformations, listed below. Phases can be removed and added through the API of the HLLCompiler class.
source to parse tree
The source is read, parsed and stored in a parse tree.
parse tree to PAST
The parse tree is converted into a Parrot Abstract Syntax Tree.
PAST to POST
The PAST is converted into a Parrot Opcode Syntax Tree.
POST to PIR
The POST is converted into executable Parrot Intermediate Representation.
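The staged design above, together with the --target option from the synopsis, can be pictured with a small language-neutral model. The following Python sketch is not the HLLCompiler API: the class ToyCompiler, its methods, and the data passed between stages are all hypothetical and chosen only to show an ordered list of transformations that can be cut short or edited.

```python
# Toy model of a staged compiler driver. This is NOT the HLLCompiler API;
# ToyCompiler, compile and remove_stage are hypothetical names.

class ToyCompiler:
    def __init__(self):
        # Ordered (name, function) pairs. Each function transforms the
        # output of the previous stage. The data shapes are placeholders.
        self.stages = [
            ("parse", lambda src: {"tree": src.split()}),
            ("past", lambda parsed: {"ast": [("word", w) for w in parsed["tree"]]}),
            ("post", lambda past: {"ops": [("emit", w) for _, w in past["ast"]]}),
            ("pir", lambda post: "\n".join("%s %s" % op for op in post["ops"])),
        ]

    def remove_stage(self, name):
        # Drop a stage by name; the remaining order is preserved.
        self.stages = [(n, f) for (n, f) in self.stages if n != name]

    def compile(self, source, target="pir"):
        # Run the stages in order and stop after the one named by `target`.
        names = [n for n, _ in self.stages]
        if target not in names:
            raise ValueError("unknown target: %s" % target)
        result = source
        for name, stage in self.stages:
            result = stage(result)
            if name == target:
                break
        return result


compiler = ToyCompiler()
parse_tree = compiler.compile("hello world", target="parse")  # stop early
pir_text = compiler.compile("hello world")                     # run all stages
```

In this model, asking for an earlier target simply returns the intermediate result of that stage, which is how an option such as --target=past can be understood. Removing a stage only makes sense when the neighbouring stages agree on the data they exchange; the toy stages here make no such guarantee.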
Source to Parse Tree
The first stage of a PCT-based compiler is performed by the parser. The parser is defined as a set of Perl 6 Rules, which the Perl 6 Rules compiler processes into a generated PIR file that implements the parser.
{{ Doesn't this make the Perl 6 rules compiler part of the PCT? }}
During the first stage, the source (input string) is parsed, resulting in a parse tree.
Parse tree to PAST
The second stage converts the parse tree into a Parrot Abstract Syntax Tree (PAST). PAST is a data structure consisting of PAST nodes, each of which represents a common HLL construct. While all languages differ in syntax, many constructs in different HLLs map to the same semantics. This second transformation is executed during the parse stage. The transformation of parse tree nodes into PAST nodes is done by so-called parse actions, which are methods of a class that is specified through the parseactions attribute of the HLLCompiler. Such classes are implemented in NQP.
{{ How do we say that this is not obligatory; you could also use PIR, and in the future maybe other languages. }}
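The idea of parse actions, one method per grammar rule that turns a match into a tree node, can be modelled outside of NQP. The Python sketch below is only an illustration: Match, Actions, and run_actions are hypothetical names, the node dictionaries merely imitate PAST nodes, and the actions are fired in a separate bottom-up walk, whereas in the PCT they run during the parse stage.

```python
# Hypothetical model of parse actions. Match, Actions and run_actions are
# illustrative names, not PCT classes.

class Match:
    """A parse tree node: the rule that matched, its text, named submatches."""
    def __init__(self, rule, text, **children):
        self.rule = rule
        self.text = text
        self.children = children
        self.ast = None  # set by the action for this rule (compare "make")


class Actions:
    # One method per grammar rule; each stores its result on the match.
    def ident(self, m):
        m.ast = {"type": "Var", "name": m.text}

    def number(self, m):
        m.ast = {"type": "Val", "value": int(m.text)}

    def statement(self, m):
        # Submatches were processed first, so their .ast is already set.
        m.ast = {
            "type": "Op",
            "pasttype": "bind",
            "children": [m.children["ident"].ast,
                         m.children["expression"].ast],
        }


def run_actions(match, actions):
    """Fire the action for every submatch, then for the match itself."""
    for child in match.children.values():
        run_actions(child, actions)
    getattr(actions, match.rule)(match)  # dispatch on the rule name
    return match.ast


tree = Match("statement", "x = 42",
             ident=Match("ident", "x"),
             expression=Match("number", "42"))
past = run_actions(tree, Actions())
```

The statement method corresponds loosely to the statement action in the Actions Synopsis: it combines the results already attached to its submatches into a single bind node.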
PAST to POST
The third transformation converts the PAST into a Parrot Opcode Syntax Tree (POST). PAST nodes represent HLL constructs, each of which is transformed into a set of low-level POST nodes. A POST node represents a single instruction, a label, or a subroutine. While a PAST is very close to an HLL program, a POST is much closer to PIR code.
POST to PIR
The last transformation generates PIR code from the POST.
The generated PIR is then fed into the Parrot executable, and processed into Parrot Byte Code (PBC) by the PIR compiler.
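The last two transformations can be sketched as a tree flattening followed by text emission. The Python model below is an assumption-laden illustration, not the PAST or POST implementation: the node dictionaries, the lower function, and the opcode names are hypothetical, and the emitted text only resembles PIR.

```python
# Hypothetical lowering of a PAST-like tree into flat, POST-like
# instructions. Node shapes and opcode names are illustrative only.
import itertools


def lower(node, ops, regs):
    """Append instructions for `node` to `ops`; return where its value lives."""
    kind = node["type"]
    if kind == "Val":
        return str(node["value"])
    if kind == "Var":
        return node["name"]
    if kind == "Op":
        left = lower(node["children"][0], ops, regs)
        right = lower(node["children"][1], ops, regs)
        dest = "$P%d" % next(regs)  # fresh temporary for the result
        ops.append((node["pirop"], dest, left, right))
        return dest
    raise ValueError("unknown node type: %s" % kind)


# a + (2 * b) as a nested tree
past = {"type": "Op", "pirop": "add", "children": [
    {"type": "Var", "name": "a"},
    {"type": "Op", "pirop": "mul", "children": [
        {"type": "Val", "value": 2},
        {"type": "Var", "name": "b"},
    ]},
]}

ops = []
result = lower(past, ops, itertools.count())

# Final step: one line of PIR-like text per instruction.
pir_like = "\n".join("%s %s, %s, %s" % op for op in ops)
```

The nested expression becomes two flat instructions, with the inner multiplication emitted first. This mirrors the statement above that a POST is much closer to PIR than a PAST is.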
Parrot Grammar Engine
The Parrot Grammar Engine (PGE) is a component that executes regular expressions. Besides classic regular expressions, it also understands Perl 6 Rules. Such rules are special regular expressions used to define a grammar.
The start symbol in a grammar is named TOP; this is the top-level rule that is executed when the parser is invoked.
Operator precedence parsing
{{ insert stuff about using an operator prec. table here }}
Parrot Abstract Syntax Tree
The PCT includes a set of PAST classes. PAST classes represent common language constructs, such as a while statement. These are described extensively in docs/pdds/pdd26_ast.pod.
Parrot Opcode Syntax Tree
POST::Node
POST::Node is the base class for all other POST classes.
POST::Op
POST::Ops
POST::Label
POST::Sub
PCT::Grammar
The class PCT::Grammar is a built-in grammar class that can be used as a parent class for a custom grammar. This class defines a number of rules and tokens that are inherited by child classes. Note that the concepts of class and grammar are equivalent.
The following rules are predefined:
{{ is this necessary, or just a reference to the file? }}
- ident
- ws
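Because a grammar is a class, inheriting predefined rules works like ordinary method inheritance. The Python sketch below illustrates this under stated assumptions: BaseGrammar and FooGrammar are hypothetical stand-ins, and the patterns used for ident and ws are guesses at typical definitions, not the actual PCT::Grammar rules.

```python
# Hypothetical model of rule inheritance between grammars. The patterns
# for ident and ws are assumptions, not the PCT::Grammar definitions.
import re


class BaseGrammar:
    """Stands in for a parent grammar that predefines common rules."""

    def ident(self, text, pos=0):
        # Assumed identifier shape: letter or underscore, then word chars.
        m = re.compile(r"[A-Za-z_]\w*").match(text, pos)
        return m.group(0) if m else None

    def ws(self, text, pos=0):
        # Optional whitespace; always matches, possibly the empty string.
        return re.compile(r"\s*").match(text, pos).group(0)


class FooGrammar(BaseGrammar):
    """A custom grammar: inherits ident and ws, adds one rule of its own."""

    def assignment(self, text):
        name = self.ident(text)
        if name is None:
            return None
        pos = len(name)
        pos += len(self.ws(text, pos))
        if not text.startswith("=", pos):
            return None
        pos += 1
        pos += len(self.ws(text, pos))
        return (name, text[pos:])


parsed = FooGrammar().assignment("answer  = 42")
```

FooGrammar never defines ident or ws itself; it relies on the parent class for both, which is the relationship a custom grammar has with PCT::Grammar.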
PCT::HLLCompiler
All PCT-based compilers use an HLLCompiler object as a compiler driver. It acts as a facade for the compiler and invokes the different compiler phases.
HLLCompiler API Methods
{{ TODO: complete this }}
- language
$P0.'language'('Foo')
- parsegrammar
$P0.'parsegrammar'('Foo::Grammar')
- parseactions
$P0.'parseactions'('Foo::Grammar::Actions')
- commandline_prompt
$P0.'commandline_prompt'($S0)
sets the string in $S0 as the command-line prompt of the compiler in $P0. The prompt is the text that is shown on the command line before a command is entered when the compiler is started in interactive mode.
- commandline_banner
$P0.'commandline_banner'($S0)
sets the string in $S0 as the command-line banner of the compiler in $P0. The banner is the first text that is shown when the compiler is started in interactive mode. This can be used for a copyright notice or other information.
ATTACHMENTS
None.
REFERENCES
docs/pdds/pdd26_ast.pod
http://dev.perl.org/perl6/doc/design/syn/S05.html