A framework for CPS transformation (and a Github account)

by: Matthieu Lemerre  tags: github, cps, and l  published: 30 December 2012

I have implemented a framework for efficient transformation of CPS code. The code is too big to be presented in a blog post; I have set up a github account to put it (https://github.com/mlemerre/l-lang/). It has been working for several months, but I have spent a long time to improve its structure, write commented interfaces, and document it to make it easily readable, as I have done with the previous modules. Do not hesitate to tell me about any comments you may have, on the code or documentation.

The code is based on the paper "Compiling with continuations, continued" by Andrew Kennedy (which is very well written and easy to read), itself inspired by "Shrinking lambda expressions in linear time", by Andrew W. Appel and Trevor Jim.

The CPS representation in Andrew Kennedy's paper provides many interesting features:

  • It is efficiently compilable and can use a stack; see the translation of this representation to the SSA form of LLVM.
  • The representation separates continuation from normal functions; this ensures that continuations do not require heap allocations and are compiled into jumps. This allows to express control flow, and control flow optimizations, in the representation.
  • Appel and Jim, and Kennedy have developed a representation that allows efficient (in-place) rewrite of terms while maintaining the links between variables, their occurrences, and their binding sites. This allows to implement shrinking reductions (and other transformations, such as closure conversion) in linear time.

    Shrinking reductions rules are very easy to understand, and can be used as a basis for expressing guaranteed optimizations. For instance, it should be easy to state that functions used only once are inlined, that tuples used only to pass information locally are not heap-allocated, etc.

The modules I have implemented provide means to access, print, or change terms in the CPS intermediary language. The main modules, are represented on the figure below.

Graph of CPS modules

The Base module is the entry point for accessing the CPS representation, and the first module if trying to understand my code. It provides read-only direct access to the CPS representation, and to the links between variables, occurrences, and their binding sites. It also gives access to the other modules that implement the CPS manipulation framework:

  • Ast: Really a part of Base representation, it provides access to the CPS representation using simple algebraic datatypes of syntax tree.
  • Check: Allows checking for some invariants of terms in the CPS representation. Some information in the representation is redundant, which allows fast access; this module checks that redundant information is in sync. For instance, if a term contains a variable, it checks that the variable's uplink also points to that term.
  • Build: Provides functions that allows to create new terms in the CPS representation, without worrying about the complexities of the representation. The API of this module is based on the idea of higher-order abstract syntax, i.e. where binding creations in the destination language (L) language correspond to creation of bindings in the source language (OCaml).
  • Traverse: Allows folding and iteration on the terms, variables, and occurrences of the CPS representation. Using this module allows, in particular, code to be independent of future changes to the CPS data structures, which will be gradually improved.
  • Change: Provides high-level functions to change CPS terms. This is how transformation passes modify terms in the CPS representation.
  • Print: Provides a human-readable representation of the CPS form. I have tried to come up with a representation that is "easy" to read; the representation looks more like SSA and/or assembly than classical lambda-calculus (which make it much easier to read on huge terms). This textual representation should also be easily parsable (although I did not write the parser).
  • Def: Implements and provides accessors and a first level of abstraction to the CPS data structures. It relies on Var, which implements the relationships between variables and their occurrences (itself based on the Union_find data structure described earlier on this blog).

The other modules on the figure are Union_find and Unique, which are "support" modules; and Closure_conversion and Shrinking_reductions, which are transformation passes on the CPS representation. These two passes are not yet in the github repository.

I have a working closure conversion that gives me a complete basic working compiler for the L programming language, based on this CPS framework. I plan to document it and upload it to github very soon. I also have implemented some basic shrinking reductions.

Next I will concentrate on the parser and the L abstract syntax tree, and I will be will be covering the syntax and semantics of L in a future blog post. But here is already an excerpt of test L code (that uses first-class functions) that can be compiled to LLVM:

assert( { let true = { (x,y) -> x }
	  let false = { (x,y) -> y }
	  let pair = { (first,second) -> boolean -> boolean( first, second) }
	  let first = { p -> p( true)}
	  let second = { p -> p( false)}
	  let p = pair( 7, 5)
	  second( p) * (first( p) + second( p)) } == 60)