I have finished implementing the compilation of pattern matching for
L. L source code can now use ML-style patterns, as used in the
match
, let
, and fun/function
operators of OCaml. See
http://caml.inria.fr/pub/docs/oreilly-book/html/book-ora016.html for
an example in OCaml if you don't know about pattern matching.
Compilation of pattern matching happens together with the CPS transformation phase of the compiler, which translates the "AST" language (used in particular to perform type inference) to the "CPS" language (in which jumps and order of computation is made explicit). CPS transformation fixes the order of evaluation, so it makes sense to perform pattern compilation during this phase.
As the rest of the compiler, this module has been been written in literate programming style (or at least, it is heavily commented, so that it should be understandable by someone with no prior knowledge of the subject). I have extracted the header of the module at the bottom of this post.
The next step is to finish cleaning the rest of the CPS transformation (which is much easier); with this the entire backend of the compiler will have been published.
The match(expression){rules}
expression matches an expression
against a set of rules; its behaviour is to evaluate the expression
once, and try to match the resulting value against the patterns in
rules
until one succeeds, in which case the corresponding body is
executed.
The order of the rules is thus important: a value can be matched by
the patterns of two rules, e.g. (5,6)
is matched by (5,_)
and
(_,6)
; but only the body of the first rule is executed. This
implies that two successive rules in a match
can be reordered if,
and only if, they are disjoint, i.e. they cannot match the
same value.
But when allowed, reordering and factorizing the compilation of rules matching lead to more compact, faster code. Trying to produce the most efficient code for matching against a set of rules is a NP-complete problem (the complexity arise when compiling multiple independent patterns, for instance tuple patterns). Rather than attempting to solve this problem, L specifies how pattern matching is compiled, which allows the developper to visualize the costs of its pattern matching.
The compiler maintains a list of patterns that remain to be matched for each rule, and a list of values against which each rule is matched. The list of the first pattern to be matched in each rules is the first column, and is matched against the first value. Several cases can occur (see in the book "The implementation of functional programming languages" by Simon Peyton Jones, the chapter 5: "Efficient compilation of pattern matching", by Philip Wadler):
When there are no remaining patterns to match, and no remaining values, it means that all the rules match. As pattern matching selects the first rule that matches, we execute the body of the first rule, and discard the other rules with a warning.
For instance in
match(){
() → body1
() → body2
}
body1
matches and is executed; body2
also matches, but is
superseded by body1
, and is just discarded.
In this case, in each rule the variable is bound to the value, and matching continues. For instance in
match(expr1,...){
(a,...) → body1
(b,...) → body2
}
a
is bound to v1
in the first rule, and b
in the second rule;
where v1
is a CPS variable representing the result of computing
expr1
in the condition of the match (computation in the condition
is thus not repeated). Matching then proceeds, starting from the
second column.
This rule can be extended to incorporate wildcard (_
) patterns
(where nothing is bound), and all irrefutable patterns.
A pattern is irrefutable if it does not contain any variant.
For instance, consider
match((expr1,expr2),...){
(a,...) → body1
(_,...) → body2
((c,d),...) → body3
}
The column contains only irrefutable patterns. Let v1
be a
CPS variable containing the evaluation of expr1
, and v2
containing the evaluation of expr2
. Then a
is bound to
(v1,v2)
, c
to v1
, and d
to v2
.
A
constructor is a specific version of a variant type; for
instance 3
is a constructor of Int
, True
a constructor of
Bool
, and Cons(1,Nil)
a constructor of List
<Int
>.
Note that if the column contain a variant, then all the constructors that it contains are of the same type: this is necessary for the pattern matching to typecheck.
When two contiguous rules have different constructors at the same place, they cannot match the same value simultaneously: they are thus disjoint, and can be swapped. This allows to group the rules according to the constructor in their first column (the order of the rules within a group being preserved).
For instance,
match(expr1, expr2){
(Cons(a,Cons(b,Nil)),Nil) → body1
(Nil,Nil) → body2
(Nil,c) → body3
(Cons(d,e),_) → body4
}
can be grouped as (preserving the order between rules 1 and 4, and 2 and 3) :
match(expr1, expr2){
(Cons(a,Cons(b,Nil)),Nil) → body1
(Cons(d,e),_) → body4
(Nil,Nil) → body2
(Nil,c) → body3
}
Then, the matching of contiguous rules with the same constructor can be factorized out, as follow:
let v1 = expr1 and v2 = expr2 in
match(v1){
Cons(hd,tl) → match((hd,tl),v2){
((a,Cons(b,Nil)), Nil) → body1
((d,e),_) → body4
}
Nil → match(v2){
Nil → body2
c → body3
}
}
Note that the L compiler matches the values in the constructor
before matching the other columns of the tuple, as was exemplified
in the Cons
rules.
The construct of matching only the constructors of a single variant
type can be transformed directly into the CPS case
expression. It
is generally compiled very efficiently into a jump table, and
dispatching to any rule is done in constant time. (Note that the
compiler may not generate a jump table if the list of constructors
to check is sparse).
If the first column contains both kind of patterns, the list of rules is split into groups such that the ordering between rules is preserved, and either all the rules in the group have their first pattern that is refutable, or they are all irrefutable.
For instance, the following match:
match((v1,v2),v3){
(_,1) → 1
((a,b),2) → a+b+2
((3,_),3) → ...
((4,_),_) → ...
((_,5),_) → ...
((a,b),c) → a+b+c
}
is split into three groups, with respectively rules 1-2 (_
and
(a,b)
are both irrefutable patterns), rules 3-5, and rule 6. Then
the groups are matched successively, the next group being matched
only if no rule matched in the first one. This amount to performing
the following transformation:
let c = ((v1,v2),v3) in
match(c){
(_,1) → 1
((a,b),2) → a+b+2
_ → match(c){
((3,_),3) → ...
((4,_),_) → ...
((_,5),_) → ...
_ → match(c){
((a,b),c) → a+b+c
}
}
}
Note that rules 3,4,5 also need to be split and transformed further using this method.
The order in which checking is made for a set of patterns is a choice, done by the compiler. L chooses to match the tuples from left to right, and the contents of the constructor as soon as they are matched; and to split rules according to the refutability of the pattern in their first remaining column. This choice may not be optimal in every case (but minimizing the number of matches is a NP-hard problem), but allows for a simple, visual analysis of the cost of pattern matching. The user is free to rearrange the set of patterns to improve performance (possibly guided by compiler hints).
At the beginning of a match, all the components, needed by at least one rule, that can be retrieved (i.e. components in a tuple etc., but not those that are under a specific constructor) are retrieved. When a constructor is matched, all the components that can be retrieved that were under this constructor are retrieved. This behaviour produces the most compact code (avoid duplicating retrieval of elements in the compilation of several rules), but maybe not the most efficient (sometimes elements are retrieved that are not used). Optimizations, such as shrinking reductions, are allowed to move down or even duplicate code performing retrieval of elements into the case.
Compilation of pattern matching is done during the CPS transformation, which transforms source code from the AST language to the CPS language. There are several reasons for that:
As a side note, it makes sense to keep pattern matching in the AST language, because patterns are easy to type, and any typing error can be more easily returned to the user.
&&
and ||
). For instance, compiling:match(v){ (4,5) → 1; _ → 2 }
gives:
let k_not4_5() = { kreturn(2) }
match(#0(v)){
4 → match(#1(v)){
5 → kreturn(1)
_ → k_not4_5()
}
_ → k_not4_5()
}
Matching against (4,5)
can fail at two different steps, and the
action to perform in these two cases are the same, so they should
be factorized using the same continuation.
The L compiler does not yet allow it, but "or-patterns" (i.e. in
match(l){ Cons(Nil∣Cons(_,Nil)) → 0 _ → 1 }
) also need join
points. Finally, there is also a joint point (in
expr_env.context
) to which the value of the bodies in each rule
is returned.
All the functions that involve building CPS code are themselves in
CPS style; see the Cps_transform_expression
module for an
explanation.
Here is a (contrived) exemple of a complete pattern matching:
This pattern is compiled as follows. We begin by creating a join continuation, which is where the result of the match is returned. This allows to factorize the following computation (the addition to 17 in our case).
let kfinal(x) = { let x17 = x + 17 in halt(x17) }
Then, the condition of the match e
is evaluated, and its result
stored in a temporary value.
let v = ... eval e ...
Then, analysis of the patterns show that v
contains a tuple.
let v.0 = #0(v)
let v.1 = #1(v)
Analysis of the patterns also show that v.0
contain a tuple.
v.1
is a variant type, so its elements cannot be retrieved yet.
let v.0.0 = #0(v.0)
let v.0.1 = #1(v.0)
We begin by analysis the whole pattern (i.e. column c0). All the rules are refutable, except the last one, so we split them into two contiguous blocks bi and bii; bii is executed if matching against all the rules in bi fail.
decl kb_ii
All the rules in bi are tuples, so we inspect them from left to right (i.e. we begin by column c1, then proceed with c4). Analysis of column c1 yields three contiguous blocks: the patterns in column c1 are all irrefutable for block bi.a, refutable for block bi.b, and irrefutable again for block bi.c.
decl kb_i.b, kb_i.c
As the patterns of column c1 in rules in bi.a are all
irrefutable, we just have to associate the variable x
to v.0.0
and a
to v.0.1
for the translation of the body of the rules.
(x
and a
are unused in the rules of the example).
We can then proceed with the analysis of column c2 (still in block bi.a). It is a variant, so we can regroup the rules according to the constructor, and perform a simple case analysis.
decl kcons
match(v.1){
Nil → { kfinal(2) }
Cons(x) → { kcons(x) }
}
For the Nil
constructor, we are already done. For Cons
, we have
to discriminate against the patterns inside the Cons
. But first,
we analyze these patterns to retrieve all the elements that are
needed:
let kcons(x) = {
let x.0 = #0(x)
let x.1 = #1(x)
let x.0.0 = #0(x.0)
let x.0.1 = #1(x.0)
There are two contiguous blocks: one with rule 1 and 3 (since rule
2 has been regrouped with the Nil
), and one with rule 4. We begin
with the 1-3 block:
decl knext
match(x.0.0){
1 → { kfinal(1)}
3 → { kfinal(3)}
_ → { knext()}
}
If matching against the 1-3 block fails, we match against rule 4. If this fails, then matching against all the rules in bi.a failed, and we try to match against the rules in bi.b.
let knext() = {
match(x.0.1){
4 → { kfinal(4)}
_ → { kb_i.b()}
}
}
}
The rest of the matching is very similar. In bi.b.1, the
matching against rules 5 and 7 is factorized, because there is a
common constructor. (Note that there is no factorization on Cons
between rules 4 and 5, because they are in different blocks). Then
blocks bi.b.2, bi.b.3, bi.c are tried successively.
In bi.c, the test of Cons
is factorized, but not the test
for Nil
, because testing Nil
is done after testing 10
.
Finally, the pattern matching always succeeds since rule 12 is
irrefutable, so there is no need to introduce code that perform
match_failure
in case nothing succeeds in matching.
Note that the presentation would have been clearer if the patterns had been regrouped differently; in particular, grouping rules who share a constructor matched as the same time (e.g. exchanging rules 1 and 2, and rule 5 and 6) would improve the presentation.