I have implemented a framework for efficient transformation of CPS code. The code is too big to be presented in a blog post; I have set up a github account to put it (https://github.com/mlemerre/l-lang/). It has been working for several months, but I have spent a long time to improve its structure, write commented interfaces, and document it to make it easily readable, as I have done with the previous modules. Do not hesitate to tell me about any comments you may have, on the code or documentation.
The code is based on the paper "Compiling with continuations, continued" by Andrew Kennedy (which is very well written and easy to read), itself inspired by "Shrinking lambda expressions in linear time", by Andrew W. Appel and Trevor Jim.
The CPS representation in Andrew Kennedy's paper provides many interesting features:
Shrinking reductions rules are very easy to understand, and can be used as a basis for expressing guaranteed optimizations. For instance, it should be easy to state that functions used only once are inlined, that tuples used only to pass information locally are not heap-allocated, etc.
The modules I have implemented provide means to access, print, or change terms in the CPS intermediary language. The main modules, are represented on the figure below.
The Base
module is the entry point for accessing the CPS
representation, and the first module if trying to understand my code.
It provides read-only direct access to the CPS representation, and to
the links between variables, occurrences, and their binding sites. It
also gives access to the other modules that implement the CPS
manipulation framework:
Ast
: Really a part of Base
representation, it provides access to
the CPS representation using simple algebraic datatypes of syntax
tree.
Check
: Allows checking for some invariants of terms in the CPS
representation. Some information in the representation is redundant,
which allows fast access; this module checks that redundant
information is in sync. For instance, if a term contains a variable,
it checks that the variable's uplink also points to that term.
Build
: Provides functions that allows to create new terms in the
CPS representation, without worrying about the complexities of the
representation. The API of this module is based on the idea of
higher-order abstract syntax, i.e. where binding creations in the
destination language (L) language correspond to creation of bindings
in the source language (OCaml).
Traverse
: Allows folding and iteration on the terms, variables,
and occurrences of the CPS representation. Using this module allows,
in particular, code to be independent of future changes to the CPS
data structures, which will be gradually improved.
Change
: Provides high-level functions to change CPS terms. This is
how transformation passes modify terms in the CPS representation.
Print
: Provides a human-readable representation of the CPS form. I
have tried to come up with a representation that is "easy" to read;
the representation looks more like SSA and/or assembly than
classical lambda-calculus (which make it much easier to read on huge
terms). This textual representation should also be easily parsable
(although I did not write the parser).
Def
: Implements and provides accessors and a first level of
abstraction to the CPS data structures. It relies on Var
, which
implements the relationships between variables and their occurrences
(itself based on the Union_find
data structure described earlier
on this blog).
The other modules on the figure are Union_find
and Unique
, which
are "support" modules; and Closure_conversion
and
Shrinking_reductions
, which are transformation passes on the CPS
representation. These two passes are not yet in the github repository.
I have a working closure conversion that gives me a complete basic working compiler for the L programming language, based on this CPS framework. I plan to document it and upload it to github very soon. I also have implemented some basic shrinking reductions.
Next I will concentrate on the parser and the L abstract syntax tree, and I will be will be covering the syntax and semantics of L in a future blog post. But here is already an excerpt of test L code (that uses first-class functions) that can be compiled to LLVM:
assert( { let true = { (x,y) -> x } let false = { (x,y) -> y } let pair = { (first,second) -> boolean -> boolean( first, second) } let first = { p -> p( true)} let second = { p -> p( false)} let p = pair( 7, 5) second( p) * (first( p) + second( p)) } == 60)