NAME
C::Blocks - embedding a fast C compiler directly into your Perl parser
SYNOPSIS
use strict;
use warnings;
use C::Blocks;
use C::Blocks::PerlAPI; # for printf
print "Before block\n";
cblock {
/* This is bare C code! */
printf("From C block\n");
int foo = 1;
printf("foo = %d, which is %s\n", foo,
(foo % 2 == 1 ? "odd (but not weird)" : "even"));
}
print "After first block\n";
clex {
/* This function will only be visible to other c code blocks in this
* lexical scope. */
void print_location(int block_numb) {
printf("From block number %d\n", block_numb);
}
}
cblock {
print_location(2);
}
# A function that sums all of the arguments
csub csum {
/* get "items" variable, and stack pointer
* variables used by ST() */
dXSARGS;
int i;
double sum = 0.;
/* Sum the given numeric values. */
for (i = 0; i < items; ++i) sum += SvNV( ST(i) );
/* Prepare stack to receive return values. */
XSprePUSH;
/* Push the sum onto the return stack */
mXPUSHn(sum);
/* Indicate we're returning a single value
* on the stack. */
XSRETURN(1);
}
my $limit = shift || 5;
my $return = csum(1 .. $limit);
print "sum of 1 to $limit is $return\n";
### In file My/Fastlib.pm
package My::Fastlib;
use C::Blocks;
use C::Blocks::PerlAPI;
cshare {
/* This function can be imported into other lexical scopes. */
void say_hi() {
printf("Hello from My::Fastlib\n");
}
}
1;
### Back in your main Perl script
# Pull say_hi into this scope
use My::Fastlib;
cblock {
say_hi();
}
### Use Perl to generate code at compile time
# Create a preprocessor string with the full
# path to our configuration file
use File::HomeDir;
use File::Spec;
clex {
#define CONF_FILE_NAME ${ '"' .
File::Spec->catfile(File::HomeDir->my_home, 'myconf.txt')
. '"' }
}
print "All done!\n";
PRE-BETA
This project is currently in pre-beta. It is known to compile and pass its test suite on a number of Windows, Linux, and Mac machines. Once the test suite has been expanded, it will move to Beta, v0.50. For more on goals and milestones, see the distribution's README.
DESCRIPTION
Perl is great, but sometimes I find myself reaching for C to do some of my computational heavy lifting. There are many tools that help you interface Perl and C. This module differs from most others out there by providing a way of inserting your C code directly where you want it called, rather than hooking up a function to C code written elsewhere. This module was also designed from the outset with an emphasis on easily sharing C functions and data structures spread across various packages and source files. Most importantly, the C code you see in your script and your modules is the C code that gets executed when you run your script. It gets compiled by the extremely fast Tiny C Compiler at script parse time.
C::Blocks achieves all of this by providing new keywords that demarcate blocks of C code. There are essentially three types of blocks: those that indicate a procedural chunk of C code that should be run, those that declare C functions, variables, etc., that are used by other blocks, and those which produce XS functions that get hooked into the presently compiling package.
Procedural Blocks
When you want to execute a block of procedural C code, use a cblock:
use C::Blocks;
print "1\n";
cblock {
printf("2\n");
}
print "3\n";
This produces the output
1
2
3
Code in a cblock has access to declarations contained in any clex or cshare blocks that precede it. These blocks are discussed in the next section.
You can also use sigiled variable names in your cblocks, and they will be mapped directly to the correct lexically scoped variables.
use C::Blocks;
my $message = 'Greetings!';
my @array;
cblock {
printf("The message variable contains: [%s]\n",
SvPVbyte_nolen($message));
sv_setnv($message, 5.938);
av_push(@array, newSViv(7));
}
print "After the cblock, message is [$message]\n";
print "and array contains @array\n";
This produces the output
The message variable contains: [Greetings!]
After the cblock, message is [5.938]
and array contains 7
An important low-level detail is that the actual variable name for the SV*, AV*, or HV* in your C code is based on the original variable name with some gentle mangling. This lets you use C-side variables with the "same" name (sans the sigil):
my $N = 100;
my $result;
cblock {
int i;
int result = 0;
int N = SvIV($N); /* notice "same" name N */
for (i = 1; i < N; i++) result += i;
sv_setiv($result, result);
}
print "The brute-force sum from 1 to ", $N - 1, " is $result\n";
print "Gauss would have said ", $N * ($N - 1) / 2, "\n";
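The loop bound deserves a second look: because the condition is i < N, the loop adds the integers from 1 to N - 1, which is exactly what the closed form N * (N - 1) / 2 computes. Here is a minimal plain-C sketch of the two calculations, outside of any cblock. The function names brute_force_sum and gauss_sum are my own and are not part of C::Blocks.

```c
/* Sum of the integers 1 .. n-1, mirroring the loop
 * `for (i = 1; i < N; i++) result += i;` from the cblock above. */
long brute_force_sum(long n) {
    long i;
    long total = 0;
    for (i = 1; i < n; i++) total += i;
    return total;
}

/* Closed form for the same range: n * (n - 1) / 2. */
long gauss_sum(long n) {
    return n * (n - 1) / 2;
}
```

For n = 100 both functions give 4950. If you want the sum to include N itself, change the loop condition to i <= n and the closed form to n * (n + 1) / 2.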
Private C Declarations
A great deal of C's power lies in your ability to define compact data structures and reusable chunks of code. When you wish to declare such data structures or functions, use a clex block. (If you wish to write a module full of functions and data structures for others to use, you will use a cshare block, which I'll explain shortly.) The declarations in such a block are available to any other cblocks, clexes, cshares, and csubs that appear later in the same lexical scope as the clex block.
Such a block might look like this:
use C::Blocks;
use C::Blocks::PerlAPI;
clex {
typedef struct _point_t {
double x;
double y;
} point;
double point_distance_from_origin (point * loc) {
/* Uncomment for debugging */
// printf("x is %f, y is %f\n", loc->x, loc->y);
return sqrt(loc->x * loc->x + loc->y * loc->y);
}
/* Assume they have an SV packed with a point struct */
point * _point_from_SV(pTHX_ SV * point_SV) {
return (point*)SvPV_nolen(point_SV);
}
#define point_from_SV(point_sv) _point_from_SV(aTHX_ point_sv)
}
Notice that I need to include C::Blocks::PerlAPI because I use structs and functions defined in the Perl C API (SV* and SvPV_nolen). The function sqrt is defined in the C math library, but that gets brought along with the Perl API, so we don't need to explicitly include it.
Later in your file, you could make use of the functions in a cblock such as:
# Generate some synthetic data;
my @pairs = map { rand() } 1 .. 10;
# Uncomment for debugging:
#print "Pairs are @pairs\n";
# Assume pairs is ($x1, $y1, $x2, $y2, $x3, $y3, ...)
# Create a C array of doubles, which is equivalent to an
# array of points with half as many array elements
my $points = pack 'd*', @pairs;
# Calculate the average distance to the origin:
my $avg_distance;
cblock {
point * points = point_from_SV($points);
/* av_len returns the highest index, so add 1 to get the element count */
int N_points = (av_len(@pairs) + 1) / 2;
int i;
double length_sum = 0;
for (i = 0; i < N_points; i++) {
length_sum += point_distance_from_origin(points + i);
}
sv_setnv($avg_distance, length_sum / N_points);
}
print "Average distance to origin is $avg_distance\n";
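The trick in this example is that pack 'd*' produces a flat buffer of doubles (x1, y1, x2, y2, ...), and a struct of two doubles has the same layout as one pair in that buffer. The following plain-C sketch shows the reinterpretation on its own, without any Perl API. It is an illustration under stated assumptions, not code from the module: the function name mean_squared_distance is my own, it assumes the struct has no padding (true on common platforms), and it averages the squared distance rather than the distance so that it needs nothing from the math library.

```c
#include <stddef.h>

typedef struct {
    double x;
    double y;
} point;

/* Treat a flat buffer (x1, y1, x2, y2, ...) as an array of points and
 * return the mean squared distance to the origin.  n_doubles is the
 * number of doubles in the buffer; a trailing unpaired double is
 * ignored.  Returns 0.0 for an empty buffer. */
double mean_squared_distance(const double *pairs, size_t n_doubles) {
    const point *points = (const point *)pairs;
    size_t n_points = n_doubles / 2;   /* two doubles per point */
    size_t i;
    double sum = 0.0;

    if (n_points == 0) return 0.0;
    for (i = 0; i < n_points; i++) {
        sum += points[i].x * points[i].x + points[i].y * points[i].y;
    }
    return sum / (double)n_points;
}
```

Note that the number of points is half the number of doubles, which is the count the cblock needs for its loop.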
How does this work? First, the code in the clex block gets compiled down to machine code immediately after it is encountered. Second, C::Blocks copies the C symbol table for the code in the clex and stores a reference to it in a lexically scoped location. Later blocks consult the symbol tables that are referenced in the current lexical scope, and copy individual symbols on an as-needed basis into their own symbol tables.
This code could be part of a module, but none of the C declarations would be available to modules that use this module. clex blocks let you declare private things to be used only within the lexical scope that encloses the clex block. If you want to share a C API for others to use in their own cblock, clex, cshare, and csub code, you should look into the next type of block: cshare.
Shared C Declarations
I mentioned that the symbol tables of clex blocks are copied and a lexically scoped reference is made to the copy. The same is true of cshare blocks, but a reference is also stored in the current package. Later, when somebody uses the module (or otherwise calls the package's import method in a BEGIN block), the references to all cshare symbol tables are copied into the caller's lexically scoped set of symbol tables.
For example, if the clex block given in the private declarations example were a cshare block in a module called My/Module.pm, others could use the functions and struct definition by saying
use My::Module;
They would then be able to call point_from_SV. Equally important, access to those declarations is lexically scoped. Thus:
{
use My::Module;
cblock {
point * p = point_from_SV($var); /* no problem */
}
}
cblock {
point * p = point_from_SV($var); /* dies: unknown type "point" */
}
The second cblock is outside of the block in which My::Module was used. This means that its set of symbol table references does not include the declarations from My::Module.
Breaking Sharing
How do shared C declarations work? When C::Blocks encounters a cshare, it injects a special import method into the package that's being compiled. This import method properly copies the symbol table references in a lexically scoped way so when some other code uses the package, the symbol tables are available for use in cblocks, etc. If your module provides its own import method, or has package-scoped variables such as our $import, C::Blocks will issue a warning and refrain from injecting the method.
If your module needs to provide its own import functionality, you can still get the code sharing with something like this:
no warnings 'C::Blocks::import';
sub import {
... your code here
C::Blocks::libloader::import(__PACKAGE__);
}
This will perform the requisite magic to make the code from your cshare blocks visible to whichever packages use your module, and avoid the warning.
WARNING: At least for now, be sure to declare your import method before any cshare blocks in your package. Declaring it after a cshare block causes Perl to crash, probably because I'm not doing something right.
XSUBs
Since the advent of Perl 5, the XS toolchain has made it possible to write C functions that can be called directly by Perl code. Such functions are referred to as XSUBs. C::Blocks provides similar functionality via the csub keyword. Just like Perl's sub keyword, this keyword takes the name of the function and a code block, and produces a function in the current package with the given name.
Writing a functional XSUB requires knowing a fair bit about the Perl argument stack and manipulation macros, so it will have to be discussed at greater depth somewhere else. For now, I hope the example in the "SYNOPSIS" is enough to get you started. For a more in-depth discussion, see http://blog.booking.com/native-extensions-for-perl-without-smoke-and-mirrors.html. Once you've gotten through that, check out perlapi.
Generating C Code
Because C::Blocks hooks on keywords, it is naturally invoked in cblock, cshare, clex, and csub blocks which are themselves contained within a string eval. However, string evals compile at runtime, not script parse time. So although it would be easy to generate C code using Perl in a string eval, writing useful clex and cshare blocks that way is tricky.
For this reason, C::Blocks provides a bit of notation for an "interpolation block." An interpolation block is a block of Perl code that is run as soon as it is extracted (i.e. during script compile time). The return value is then inserted directly into the text that gets compiled. Thus, these two cblocks end up doing the same thing:
cblock {
printf("Hello!\n");
}
cblock {
${ 'printf' } ("Hello!\n");
}
The example given in the "SYNOPSIS" is probably more meaningful. It also illustrates that the value returned by the Perl code has to be literal C code, including the left and right double quotes for strings. This arises because sigils (and interpolation blocks by extension, as they are delimited by a sigil) are ignored within strings and C comments.
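To make the "literal C code" requirement concrete, here is a sketch of the C text the compiler might see once the interpolation block from the "SYNOPSIS" has run. The path /home/alice/myconf.txt is a made-up example of what File::Spec->catfile could return, and the helper conf_file_name is a hypothetical function I added so the macro has something to do. The point is only that the double quotes are part of the generated text.

```c
/* Hypothetical result of the interpolation block: the Perl code
 * returned the path wrapped in double quotes, so the preprocessor
 * sees an ordinary C string literal.  The path itself is invented. */
#define CONF_FILE_NAME "/home/alice/myconf.txt"

/* Hypothetical helper that hands the configured path to C code. */
const char *conf_file_name(void) {
    return CONF_FILE_NAME;
}
```

Had the Perl code returned the bare path without the surrounding quotes, the macro would expand to something that is not valid C.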
Note: The current implementation is unpolished. In particular, it does not intelligently handle exceptions thrown during the evaluation of the Perl code. (Indeed, at the moment it suppresses them.)
For the most part, any side effects from the code contained in interpolation blocks behave exactly like side effects from BEGIN blocks. There is an exception, however, for Perls earlier than 5.18. In these older Perls, lexical variables become uninitialized after all interpolation blocks execute, but before any BEGIN blocks run. This only applies to lexically scoped variables, however. Changes to package-scoped variables (including lexically scoped names, i.e. our $package_var) persist, as would be expected if these variables were set in BEGIN blocks.
Configuring the Compiler
Sometimes you need to configure the compiler. The most common situation that arises involves linking against external libraries. You may also need to include traditional compiler command-line arguments which you obtain from the command line, or from a configuration module. The current means to do this is by setting special-purpose package variables which get examined during the compilation stage.
NOTE: THIS API IS UNDER DEVELOPMENT. UNTIL C::BLOCKS REACHES v1.0, THIS API IS SUBJECT TO CHANGE, LIKELY WITHOUT NOTICE.
To set most compiler settings, you simply treat $C::Blocks::compiler_options like the command line. For example, if you want to set a preprocessor definition at runtime (but don't want to use an interpolation block for some reason), you can use
BEGIN { $C::Blocks::compiler_options = '-DDEBUG' }
A very important aspect to remember is that the compiler options, and the shared libraries mentioned next, only apply to the first block they encounter. The process of compiling a block clears these variables.
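As an illustration of what a definition such as -DDEBUG is typically used for, here is a small plain-C sketch of conditionally compiled logging. The DEBUG_LOG macro and the halve function are hypothetical names of my own; the only thing taken from the text above is the DEBUG symbol itself.

```c
#include <stdio.h>

/* When the code is compiled with -DDEBUG, DEBUG_LOG prints to stderr.
 * Without it, the macro expands to a no-op and costs nothing. */
#ifdef DEBUG
#define DEBUG_LOG(msg) fprintf(stderr, "debug: %s\n", (msg))
#else
#define DEBUG_LOG(msg) ((void)0)
#endif

/* Hypothetical worker function that logs only in debug builds. */
int halve(int value) {
    DEBUG_LOG("halving a value");
    return value / 2;
}
```

Because the options are cleared after each compilation, a definition set this way would only be visible to the first block compiled after the BEGIN block.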
For C::Blocks, tcc does not handle linking to shared libraries, because it does not know how to open shared libraries on Mac systems. Instead, C::Blocks manages the shared libraries itself, loading libraries and looking up symbols using DynaLoader. For this reason, the shared libraries are not indicated with the typical -L and -l flags as compiler options. Instead, each library should be added to the package variable @C::Blocks::libraries_to_link. Each string in this list should be the full library name, including file extensions. If the library is located in an unconventional location, the full path should be specified.
Compiler and Linker Warnings
Compiler warnings (such as "assignment from incompatible pointer type") and linker warnings (need example...) can be turned on and off using the warnings pragma with categories C::Blocks::compiler and C::Blocks::linker. For example:
use warnings; # compiler and linker warnings ON
...
no warnings 'C::Blocks::compiler'; # compiler warnings OFF
...
use warnings 'C::Blocks::compiler'; # back ON
The warnings are handled using Perl's built-in warnings system, so as with all warnings, the reporting of compiler and linker warnings can be controlled lexically.
PERFORMANCE
C::Blocks is not a silver bullet for performance. Other Perl libraries more tailored to your goal may serve you better. Sometimes they will lead to fewer lines of code, or clearer code, than the corresponding C code. Other times they will be built on solid libraries which are blazing fast already. C::Blocks is implemented using the Tiny C Compiler, a compiler that compiles fast but produces machine code of mediocre quality. If you compiled the exact same code with a high-quality compiler such as gcc -O3, it would take longer to compile, but the resulting machine code would be more efficient. When can you expect a C::Blocks solution to be a good choice?
When not to use C::Blocks
Don't rewrite an existing XS module using C::Blocks. A C::Blocks API to your XS code might be useful, but don't rewrite mature XS code. C::Blocks can save you from the effort of producing a new XS distribution, but if you've already put in that effort, don't throw it away.
Don't replace a handful of Perl statements with their C-API equivalents. Perl's core has been pretty highly optimized and is compiled at high optimization levels. At best, you'll get incremental performance gains, and they will likely come at the expense of many additional lines of code. This probably isn't worth it.
Don't discount the cost of marshalling Perl data into C data. Obtaining C representations of your data will always cost you at least a few clock cycles, and it will usually add lines of code, too. You're likely to see the best performance benefits if you can marshall the data as early as possible and use that C-accessible data many times over. For example, if you have a data-parsing stage in which you build a complex data structure representing that data, try to build a C structure instead of a Perl structure at parse time. All future operations will have access to the C representation.
C::Blocks vs Perl and PDL
In what follows, I assume you have already marshalled your data into a C data structure, like an array or a struct.
C::Blocks outperforms Perl on O(N) numeric calculations on arrays, often by a factor greater than 10. (An O(N) calculation is any algorithm that only needs to examine each data point once, so the calculation should scale with the number of data points.) In fact, C::Blocks is competitive with PDL in such calculations. C::Blocks requires more lines of code, though. For a calculation of the average of a dataset, PDL uses only one line, a Perl implementation uses three, and C::Blocks uses 14. What you gain in speed you lose in lines of code.
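To give a feel for the kind of O(N) loop being discussed, here is a minimal plain-C sketch of an average over an array that has already been marshalled into doubles. This is not the benchmark code behind the numbers above; the function name average and its signature are my own.

```c
#include <stddef.h>

/* Arithmetic mean of n doubles.  Each element is visited exactly once,
 * so the running time is O(n).  Returns 0.0 for an empty array. */
double average(const double *data, size_t n) {
    double sum = 0.0;
    size_t i;

    if (n == 0) return 0.0;
    for (i = 0; i < n; i++) sum += data[i];
    return sum / (double)n;
}
```

Inside a cblock, the surrounding lines that fetch the array from an SV and store the result back account for much of the extra line count relative to Perl or PDL.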
Another interesting comparison between PDL and C::Blocks is the calculation of Euclidean distance for an N-dimensional vector, where N scales from very small to very large numbers. The calculation is always O(N), but is more complex than the simple average already discussed, and not explicitly implemented as a low-level PDL routine. The PDL implementation is only a single very readable line, highlighting PDL's expressiveness. The C::Blocks implementation is 14 lines of traditional C code, making it straightforward but lengthy. C::Blocks has the upper hand in execution rate (always faster than PDL, though never by more than a factor of two) and in predictable scaling (almost perfectly linear in system size, versus slightly nonlinear behavior in the PDL implementation). I'd say the number of lines of code is the primary deciding factor here, but the trade-off might fall differently for more complicated calculations.
The calculation of the Mandelbrot set provides a very interesting benchmark. The algorithm involves a loop that has a fixed maximum number of iterations, but which can exit early as soon as the iterate escapes and the outcome for that point is known. This exit-early algorithm knocks PDL out of the race. There's no good way to implement this in PDL short of writing a low-level implementation.
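For readers who have not seen it, the exit-early loop looks roughly like the following plain-C sketch of the standard escape-time iteration. It is an illustration, not the benchmark code; the function name mandelbrot_iterations is my own.

```c
/* Escape-time iteration for the point c = c_re + i*c_im.
 * Iterates z -> z*z + c starting from z = 0 and returns the number of
 * iterations completed before |z| exceeded 2, or max_iter if the
 * point never escaped within the allowed number of iterations. */
int mandelbrot_iterations(double c_re, double c_im, int max_iter) {
    double z_re = 0.0;
    double z_im = 0.0;
    int i;

    for (i = 0; i < max_iter; i++) {
        double re2 = z_re * z_re;
        double im2 = z_im * z_im;
        if (re2 + im2 > 4.0) return i;      /* early exit: |z| > 2 */
        z_im = 2.0 * z_re * z_im + c_im;    /* imaginary part of z*z + c */
        z_re = re2 - im2 + c_re;            /* real part of z*z + c */
    }
    return max_iter;
}
```

The data-dependent return inside the loop is what makes this hard to express as whole-array operations: each point needs a different number of iterations.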
The comparison between C::Blocks and PDL can best be summarized thus. If you have a very small dataset, fewer than 1000 elements, C::Blocks will out-perform PDL due to PDL's costly method launch mechanism. If you have multiple tightly nested for-loops, where operations within the for loops are based on the indices, then C::Blocks will likely give you a competitive computation rate, at the cost of many more lines of code. If those for-loops have the possibility of an early exit, PDL may run significantly slower than C::Blocks, and may even run slower than pure Perl. Finally, if you have image manipulations or calculations, PDL is almost certainly the better tool, as it has a lot of low-level image manipulation routines already.
C::Blocks vs Graph
I have not had the opportunity to write and run additional benchmarks for C::Blocks. The next obvious choice would be a comparison with Graph, but I have not yet endeavored to produce those calculations.
KEYWORDS
The way that C::Blocks provides these functionalities is through lexically scoped keywords: cblock, clex, cshare, and csub. These keywords precede a block of C code encapsulated in curly brackets. Because these use the Perl keyword API, they parse the C code during Perl's parse stage, so any code errors in your C code will be caught during parse time, not during run time.
In addition to these keywords, C::Blocks lets you indicate types and type conversion with cisa. Unlike the other keywords, this keyword is not followed by a block of code, but the type and a list of variables.
- cblock { code }
C code contained in a cblock gets wrapped into a special type of C function and compiled during the compilation stage of the surrounding Perl code. The resulting function is inserted into the Perl op tree at the precise location of the block and is called when the interpreter reaches this part of the code.
The code in a cblock is wrapped into a function, so function and struct declarations are not allowed. Also, variable declarations and preprocessor definitions are confined to the cblock and will not be present in later cblocks. For that sort of behavior, see clex.
Variables with $ sigils are interpreted as referring to the SV* representing the variable in the current lexical scope, unless otherwise specified with a cisa statement.
Note: If you need to leave a cblock early, you should use a return statement without any arguments. This will also bypass the data repacking provided by cisa types.
- clex { code }
clex blocks contain function, variable, struct, enum, union, and preprocessor declarations that you want to use in other cblock, clex, cshare, and csub blocks that follow. It is important to note that these are strictly declarations and definitions that are compiled at Perl's compile time and shared with other blocks.
Sigil variables in clex blocks are currently ignored.
- cshare { code }
cshare blocks are just like clex blocks except that the declarations can be shared with other modules when they use the current module.
- csub name { code }
C code contained in a csub block is wrapped into an xsub function definition. This means that after this code is compiled, it is accessible just like any other xsub.
Currently, csub does not work.
- cisa type variable-list
If you include sigil variables in your cblock blocks (not clex, cshare, or csub, just cblock), they will normally be resolved to the underlying SV data structure for that variable. Under many circumstances, you do not need to manipulate the SV itself, but merely need the data contained in the SV (or the object pointed to by the SV). A cisa statement tells C::Blocks that certain variables should be represented by a C data structure other than an SV. The package used for the type must have package variables that indicate the C type to use, and how to marshall the data at the beginning and end of your block.
cisa statements also have the runtime responsibility of validating the data in the variables. Failed validations should probably throw exceptions indicating which variables did not satisfy validation, and why they failed. Your validation code can make as much or as little noise as you deem appropriate, from quietly setting $@ to warning to throwing exceptions. Note that you could include validation code in the initialization function, but cisa validation is only called once per cisa statement, whereas the variable initialization code is called at the beginning of each cblock that uses the variable.
Packages that represent types must include the package variables $TYPE and $INIT. The first indicates the C type while the second indicates a C macro or function that accepts an SV and returns the data of type $TYPE. $CLEANUP is an optional macro or function that takes the original SV and the (presumably revised) data, and updates the contents of the SV. Runtime type checking is performed by the package method check_var_types, which gets key/value pairs of the variable name and the variable.
SEE ALSO
This module uses a special fork of the Tiny C Compiler. The fork is located at https://github.com/run4flat/tinycc, and is distributed through the Alien package provided by Alien::TinyCCx. To learn more about the Tiny C Compiler, see http://bellard.org/tcc/ and http://savannah.nongnu.org/projects/tinycc. The fork is a major extension to the compiler that provides extended symbol table support.
For other ways of compiling C code in your Perl scripts, check out Inline::C, FFI::TinyCC, C::TinyCompiler, and XS::TCC.
For mechanisms for calling C code from Perl, see FFI::Platypus and FFI::Raw.
If you just want to mess with C struct data from Perl, see Convert::Binary::C.
If you're just looking to write fast code with compact data structures, http://rperl.org/ may be just the ticket. It produces highly optimized code from a subset of the Perl language itself.
AUTHOR
David Mertens (dcmertens.perl@gmail.com)
BUGS
Please report any bugs or feature requests for the Alien bindings at the project's main github page: http://github.com/run4flat/C-Blocks/issues.
SUPPORT
You can find documentation for this module with the perldoc command.
perldoc C::Blocks
You can also look for information at:
The Github issue tracker (report bugs here)
AnnoCPAN: Annotated CPAN documentation
CPAN Ratings
Search CPAN
http://p3rl.org/C::Blocks http://search.cpan.org/dist/C-Blocks/
ACKNOWLEDGEMENTS
This would not be possible without the amazing Tiny C Compiler or the Perl pluggable keyword work. My thanks go out to the developers of both of these amazing pieces of technology.
LICENSE AND COPYRIGHT
Code copyright 2013-2015 Dickinson College. Documentation copyright 2013-2015 David Mertens.
This program is free software; you can redistribute it and/or modify it under the terms of either: the GNU General Public License as published by the Free Software Foundation; or the Artistic License.
See http://dev.perl.org/licenses/ for more information.