YAPC::EU 2010
The Perl Compiler
rurban - Reini Urban, Graz, Austria
See the screencast of this talk at http://vimeo.com/14058377
What's new?
- Fixed most bugs (in work) bytecode: 12=>0, c: 6=>1, cc: 9=>5, 5.14 CVs
- .plc platform compatible, almost version compatible (.plc header change)
- added testsuite
- more and better optimisations (in work)
- B::C::Flags (customised extra_cflags + extra_libs)
- removed B::Stash bloat from perlcc, -stash [optional]
Who am I
rurban maintains cygwin perl since 5.8.8 and 3-4 modules, guts, B::* => 5.10
Mostly doing LISP, Perl, C, bash and PHP, and support for custom HW, windows + linux + real-time systems in real-life. Coding in winter, surfing in summer.
1995 first on CPAN with the perl5.hlp file and converter for Windows, and the windows dll versioning.
Contents
The compiler was started in 1995 by Malcolm Beattie, abandoned in 2007 by p5p, and revived in 2008 by me.
Very dynamic language: magic; tie; eval "require $foo;" -> which packages to import?
- Overview
- Status
- Plans
Why use B::C / perlcc?
- Improved startup time, esp. significant with larger code.
- -fcog: less destruction time, -fno-destruct: no destruction time.
- Reduced memory usage (9% less memory with 25000 lines).
- Distribute binary only versions
- And with B::CC - Improve run-time
Overview
The Perl Compiler suite B::C contains three separate compilers:
- B::Bytecode / ByteLoader (freeze/thaw to .plc + .pmc)
- B::C (freeze/thaw to .c)
- B::CC (optimising to .c)
perl toke.c/op.c - B::C - perl op walker run.c
Eliminate the whole parsing and dynamic allocation time.
The Walker (Basics)
After compilation walk the "op tree" - run.c
Observation
1. The op tree is not a "tree"; it is reduced to a simple linked list of ops. Every "op" (a pp_<opname> function) returns the next op.
2. PERL_ASYNC_CHECK was called after every single op.
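The two observations above can be sketched with a toy run loop. This is a minimal illustration, not perl's run.c: the struct `toy_op`, the functions `toy_pp_add` / `toy_pp_mul`, and the counter `toy_async_checks` (a stand-in for PERL_ASYNC_CHECK) are all hypothetical names invented for this sketch.

```c
#include <assert.h>
#include <stddef.h>

/* Toy model only, NOT perl's real structs: an op holds a pointer to the
   function implementing it and a link to the next op in execution order. */
typedef struct toy_op toy_op;
typedef toy_op *(*toy_ppaddr)(toy_op *self, long *acc);

struct toy_op {
    toy_ppaddr ppaddr;   /* the "pp_" function for this op */
    toy_op    *next;     /* next op in the linked list */
    long       operand;  /* hypothetical immediate operand */
};

/* Counts how often the stand-in for PERL_ASYNC_CHECK runs. */
static int toy_async_checks = 0;

/* Each op does its work and returns the next op to execute. */
static toy_op *toy_pp_add(toy_op *self, long *acc) {
    *acc += self->operand;
    return self->next;
}

static toy_op *toy_pp_mul(toy_op *self, long *acc) {
    *acc *= self->operand;
    return self->next;
}

/* The walker: call through the function pointer until an op returns NULL,
   doing the async check after every single op. */
long toy_runops(toy_op *start) {
    long acc = 0;
    toy_op *op = start;
    while (op != NULL) {
        assert(op->ppaddr != NULL);
        op = op->ppaddr(op, &acc);  /* indirect call, one per op */
        toy_async_checks++;         /* stand-in for PERL_ASYNC_CHECK */
    }
    return acc;
}

/* Builds the list add(2) -> mul(3) and walks it: (0 + 2) * 3. */
long toy_demo(void) {
    toy_op mul = { toy_pp_mul, NULL, 3 };
    toy_op add = { toy_pp_add, &mul, 2 };
    return toy_runops(&add);
}
```

The point of the sketch is the cost structure: one indirect call and one check per op, however trivial the op is.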
Perl Phases - the "Perl Compiler"
- => Parse + Compile to op tree (in three phases, see perlguts and perloptree)
- BEGIN (use ...)
- CHECK (O modules)
- INIT (main phase)
- END (cleanup, perl destructors)
Normal Perl functions start at INIT, after BEGIN and CHECK. The O modules start at CHECK, and skip INIT.
Perl Phases - the "B Compilers"
- Parse + Compile to op tree (in three phases)
- BEGIN (use ...)
- => CHECK (O) => freeze
- compiled INIT (main phase)
- compiled END (cleanup, perl destructors)
The B::C compiler, invoked via O, freezes the state in CHECK and then invokes the walker.
$ perl -MO=C,-omyprog.c -e'print $a;'
$ cc_harness -o myprog myprog.c
$ ./myprog
B::C - Unoptimised / the walker
B::CC - The optimiser / unrolled (1)
B::CC - The optimiser / unrolled (2)
B::CC - The optimiser / unrolled (3)
- no CALL_FPTR - call by ref
- static direct function call
- prefetched into CPU cache!
- no unneeded stack handling
- PERL_ASYNC_CHECK only at certain ops
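The bullets above can be illustrated with a toy "unrolled" function. This is only a sketch of the idea, not code emitted by B::CC: `cc_add`, `cc_mul`, `cc_unrolled_main` and the counter `cc_async_checks` are hypothetical names, and placing the check at the loop back-edge is an assumption made for illustration.

```c
#include <assert.h>

/* Counts how often the stand-in for PERL_ASYNC_CHECK runs. */
static int cc_async_checks = 0;

/* Hypothetical op bodies as plain static functions: the C compiler can
   call them directly (or inline them), no function pointer involved. */
static long cc_add(long acc, long n) { return acc + n; }
static long cc_mul(long acc, long n) { return acc * n; }

/* The op sequence add(2), mul(3) inside a loop, written out as
   straight-line code instead of being walked through a linked list. */
long cc_unrolled_main(long n) {
    long acc = 0;
    long i;
    assert(n >= 0);
    for (i = 0; i < n; i++) {
        acc = cc_add(acc, 2);   /* static direct call */
        acc = cc_mul(acc, 3);   /* static direct call */
        cc_async_checks++;      /* check once per iteration, not once per op */
    }
    return acc;
}
```

Compared with a walker that dispatches through a function pointer and checks after every op, this version does half as many checks for the same two ops per iteration and leaves the calls visible to the C optimiser.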
Status
5.6.2 and 5.8.9 non-threaded B::C are quite usable and have the fewest known bugs, but 5.10 and 5.12 have also become pretty stable. 5.14 still has some CV problems.
Best, in this order: 5.6.2, 5.8.9, 5.10, 5.12 non-threaded.
Status Targets
- Bugfixes for B::C (magic, xsub detection)
- Test top100 CPAN modules (3-5 fail, all with magic)
- Isolate bugs into simple tests (45 cases)
- Test the perl core test suite (~20 fails). Estimated 3-4 more open bugs.
Status Summary
- 5.6.2, 5.8.9, 5.10, 5.12 non-threaded are almost bug free, with B::Bytecode and B::C
- B::C >=5.10 threaded (magic, pads) in work; 2-3 minor bugs with certain modules
- With debugging perls there seem to be fewer bugs than with releases (normally it's the other way round)
- B::CC has some limitations and some more known bugs
- See testsuite and STATUS
Projects
Which software is compiler critical?
Execution time is the same (sans B::CC)
Startup time is radically faster.
Web apps with fast response times: 1 sec more or less => good or bad software
New Optimisations
Optimise static initialization - strings and arrays
non-threaded ! +10-20% performance
ltrace reveals Gthr_key_ptr, gv_fetchpv, savepvn, av_extend and safesysmalloc as major culprits, the latter three at startup-time.
common constant strings with gcc -Os => automatically optimised
av_extend - run-time malloc => static arrays ?
Static arrays are impossible unless the array is Readonly: a static array cannot be extended at run-time, it would need to be realloc'ed into the heap.
But certain arrays can: -fro-inc (Readonly @INC), and compad names and symbols.
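Why a statically initialised array cannot simply be extended can be shown with a toy array type. This is a sketch under stated assumptions, not how B::C or perl's AVs are implemented: `toy_av`, its `is_static` flag, `toy_av_push` and the example paths in `toy_inc_static` are all hypothetical.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Toy array, NOT perl's AV. is_static marks storage that lives in the
   program's data segment and therefore must never reach realloc(). */
typedef struct {
    const char **items;
    size_t       len;
    size_t       cap;
    int          is_static;
} toy_av;

/* Hypothetical statically initialised contents, e.g. a frozen @INC. */
static const char *toy_inc_static[] = { "/usr/lib/perl5", "." };

/* Appends s, growing the storage if needed. Returns 0 on success,
   -1 if memory allocation fails. */
int toy_av_push(toy_av *av, const char *s) {
    assert(av != NULL);
    if (av->len == av->cap) {
        size_t ncap = av->cap ? av->cap * 2 : 4;
        const char **p;
        if (av->is_static) {
            /* Static storage cannot be realloc'ed: copy it to the heap. */
            p = malloc(ncap * sizeof *p);
            if (p == NULL) return -1;
            memcpy(p, av->items, av->len * sizeof *p);
            av->is_static = 0;
        } else {
            p = realloc(av->items, ncap * sizeof *p);
            if (p == NULL) return -1;
        }
        av->items = p;
        av->cap = ncap;
    }
    av->items[av->len++] = s;
    return 0;
}
```

A toy array set up as `{ toy_inc_static, 2, 2, 1 }` costs nothing at startup, but the first push moves it to the heap. If the array is known to be Readonly, that path is never taken and the static storage can be used as-is.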
pre-allocate faster with -fav-init or -O3 with independent_comalloc()
Same for hashes and strings (nyi).
Real Life Applications
cPanel has used B::C compiled 5.6 for a decade, and wants to switch to 5.8.9 (or later).
cPanel offers web hosting automation software that manages provider data, domains, emails, webspace. A typical large webapp. Perl startup time can be too slow for many AJAX calls which need fast initial response times.
Benchmarks (by cPanel)
Larger code base => more significant startup improvements
- 18.78x faster startup for large production applications. (~ 70000 lines)
- 3.52x faster startup on smaller applications. (~8000 lines)
- 3x faster startup on tiny applications < 1024 lines of code
- 2x faster startup for very tiny applications
- Guessed: 2x-10x faster run-time for CC optimised code, esp. arithmetic.
Benchmarks (by cPanel)
Web Service Daemon
- Resident Size (perlcc): 9072
- Resident Size (perl): 9756
DNS Settings Client
- Startup Time (perl): 0.074
- Startup Time (perlcc): 0.021
HTML Template Processor
- Startup Time (perlcc): 0.037
- Startup Time (perl): 0.695
Plans
2010: Find and fix all remaining bugs
2010: Faster testsuite (now 8 min - 40 min - 2 days)
2011: CC type and sub optimisations
2012: CC unrolling => jit within perl (perl -j), see my gist
B::C Limitations
run-time ops vs compile-time
BEGIN blocks only have compile-time side-effects.
BEGIN {
  use Package;    # okay
  chdir "dir";    # not okay:
                  # only done at compile-time, not when the user runs the binary
  print "stuff";  # okay, only at compile-time
  eval "what";    # hmm; depends
}
Move eval "require Package;" to BEGIN
B::CC Limitations
run-time ops vs compile-time +
dynamic range 1..$foo
goto/next/last $label
Undetected modules behind eval "require": use -uModule to enforce scanning these
B::CC Bugs
Custom sort BLOCK is buggy, wrong queue implementation, causing an endless loop
sort { $a <=> $b }                                  # is optimised away, ok
sort { $hash{$a} <=> $hash{$b} }                    # maybe?
sort { $hash{$a}->{field} <=> $hash{$b}->{field} }  # for sure not
Testsuite
user make test (via cpan):
45x (bytecode + c -O0 - O4 + cc -O0 - O2)
=> 8 min
Testsuite
author make test:
45x bytecode + c -O0 - O4 + cc -O0 - O2 (8 min)
modules.t top100 (16 min)
+ testcore.t (16 min)
=> ~40 min
Testsuite
author make test 40 min
for 5-10 perls (5.6, 5.8, 5.10, 5.12, 5.14 / threaded + non-threaded) 5*2=10
on 5 platforms (cygwin, debian, centos, solaris, freebsd)
=> 33 h (10*5*40 = 2000min) = 1-2 days, similar to the gcc testsuite.
Testsuite
top100 modules. See webpage or svn repo for results for all tested perls / modules
With 5.8 non-threaded 3 fails: File::Temp, B::Hooks::EndOfScope, YAML
With blead debugging + threaded 27 fails
log.modules-5.010001:pass MooseX::Types #TODO generally
log.modules-5.012001-nt:fail MooseX::Types #TODO generally
log.modules-5.013003-nt:pass MooseX::Types #TODO generally
log.modules-5.013003d:fail MooseX::Types #TODO generally
CC
- Sub calls - Opcodes (on CPAN)
- What can we statically leave out per pp_?
Now: argument passing, return values for 50% of ops
Planned: more + direct xsub calls.
- Types - understand declarations
- Now: Unroll pp_opname completely into simple arithmetic for known static types.
Known static types at compile-time? User declarations or Devel::TypeCheck
CC - User Type declarations
Currently:
my $<name>_i;   IV integer
my $<name>_ir;  IV integer in a pseudo register
my $<name>_d;   NV double
Future ideas are type qualifiers such as my (int $foo, double $foo_d);
or attributes such as my ($foo:Cint, $foo:Creg_int, $foo:Cdouble);
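What a declared type buys can be sketched by contrasting a generic add with a typed one. This is a toy model, not B::CC output and not perl's SV layout: `toy_sv`, `toy_generic_add`, `toy_typed_add_i` and `toy_typed_add_d` are hypothetical names for illustration.

```c
#include <assert.h>

/* Toy tagged value, NOT perl's SV: carries its type at run-time. */
typedef enum { TOY_IV, TOY_NV } toy_tag;
typedef struct {
    toy_tag tag;
    union { long iv; double nv; } u;
} toy_sv;

toy_sv toy_new_iv(long v)   { toy_sv s; s.tag = TOY_IV; s.u.iv = v; return s; }
toy_sv toy_new_nv(double v) { toy_sv s; s.tag = TOY_NV; s.u.nv = v; return s; }

/* Converts either kind of value to a double. */
static double toy_as_nv(toy_sv s) {
    assert(s.tag == TOY_IV || s.tag == TOY_NV);
    return s.tag == TOY_IV ? (double)s.u.iv : s.u.nv;
}

/* Untyped case: the add has to inspect both tags on every call. */
toy_sv toy_generic_add(toy_sv a, toy_sv b) {
    if (a.tag == TOY_IV && b.tag == TOY_IV)
        return toy_new_iv(a.u.iv + b.u.iv);
    return toy_new_nv(toy_as_nv(a) + toy_as_nv(b));
}

/* Typed case: with the types known at compile-time (as with a _i or _d
   suffix), the add collapses to plain C arithmetic. */
long   toy_typed_add_i(long a, long b)     { return a + b; }
double toy_typed_add_d(double a, double b) { return a + b; }
```

The typed variants need no tag checks and no boxing, which is the kind of saving the suffix declarations are meant to enable for arithmetic-heavy code.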
Links
http://search.cpan.org/dist/B-C/
http://code.google.com/p/perl-compiler/
mailto:perl-compiler@googlegroups.com