elle
binary; for a
shorter introduction to the main features of the language,
click here. Note that the
language is not yet frozen, so minor elements (especially syntax)
in the following are subject to change; this document will be
updated together with the code.
The L compiler is licensed under the Cecill-C license, a copyleft and LGPL-compatible license that allows linking without restriction.
The source code of the compiler is has been specially written to make it as readable as a book and welcome new contributors, or anyone that would like to study its internals.
Great languages have not only a nice (simple and powerful) semantic, but also a good (simple and readable) syntax. However it is easy to just accumulate syntax extensions as new features are added. In contrast, there has been many iterations that led to the current syntax, leading to a more consistent (i.e. simpler and less surprising) language. Actually, the syntax is not frozen yet, and minor changes can still happen before 1.0.
The syntax looks similar than C's, except that there is no distinction between statement and expressions: everything is an expression. This makes the language simpler and more uniform. However, sequences and introduction of variables is done using brace ({}) characters, which provide the nice alignment property and view of sequential execution found in imperative languages.
{ let a = if( x == 0) 3 else { let u = 5; u*(u-1) } let b = { let y = 5 let z = 6 y*y + z*z } print( a) print( b) a + b }
In L, type annotations are entirely optional, which contributes
to the lightness of the syntax. let
introduces
read-only bindings (preferred whenever possible),
and var
mutable ones. L preferred style is to use
let
bindings whenever possible (in particular to hold
intermediary results), and use var
only when mutation
is necessary. The use of mutable variables makes code more
difficult to understand, in particular when reusing a variable for
different purposes; limiting mutable variables and marking them
explicitly encourages a better style.
{ var x := 0 var y:Int := 5 // Explicit declaration: no runtime difference if (a == 3) { x := x + y } x } // Equivalent program without mutable variables { let x = 0 let y:Int = 5 // Explicit declaration: no runtime difference let x_possibly_incremented = if (a == 3) x else x+y x_possibly_incremented }
Delimiting statements in C-like languages is necessary to
disambiguate the syntax; for instance it allows to
differentiatef(x) - f(y)
from f(x); -f(y)
, or f(x+3)
from f; (x+3)
. Often, ';' is used as a statement
separator or terminator.
Instead, L relies on the idea of token separation to disambiguate such cases. There are four degrees of separation, that depend on the characters between two tokens; namely (from stronger to weaker):
a+\nb
contains one statement, while a\n+b
contains
two.
f(x)
is f
calling x
; f (x)
does not mean
anything; f\n(x)
and f;(x)
is f
followed by x
. Forcing stuck
calls enforce a single indentation style for all L programs,
and help identifying calls visually.
f(a, computation) .another_one(a_param) .final_computation
L syntax contributes to maintaining a correct alignment and indentation. When performing pattern matching, the start of a match is identified using indentation. Nested function calls are aligned, and provide visually a tree structure. Blocks, introduced with {}, allows to maintain the same level of indentation when writing a sequential computation.
{ let f = { 0 -> let a = 3 let b = 5 a + b 1 -> function1( a_function( 4, 5) another_one( an_argument, another_arg)) n -> 8 + n } f(4) }
Pattern matching is a construct that allows to extract several components of an object at once. For instance:
{ let x = (1,2,3) let (a,_,c) = x a + c // 4 }The
x
variable contains a tuple with three elements.
Later, we use pattern matching to extract the first and third
component out of the tuple. This code would be even more useful if
the tuple construction and pattern matching were more separated,
e.g. were in different functions.
Patterns appear in many different places in the code. As we saw,
The let
construct introduces a pattern (note:
the var
construct does not). But the arguments to a
function are also a tuple pattern:
def f(a,(b,c),d) = a + b + c + d { let u = (1,2) f(1,u,3) }
You can define your own datatypes (record and sum types), and
match them in patterns. Here is an example of a simple record
datatype. Note: the x:
and y:
parts are
optional, but they allow the use of the p.x
notation
to extract an individual component.
data Point(x:Int, y:Int) { let p = Point(1,2) let Point(a,b) = p a+b // 3 }
A more complex form allows to define sum types, a generalized form of enum that can carry data. Here is a simple example of the use of sum types to define the type of mathematical expressions, and a small calculator written in L:
data Expression { Number(Int) Add(Expression,Expression) Sub(Expression,Expression) Mul(Expression,Expression) Div(Expression,Expression) } def eval(e) = e.{ Number(n) -> n Add(e1,e2) -> eval(e1) + eval(e2) Sub(e1,e2) -> eval(e1) - eval(e2) Mul(e1,e2) -> eval(e1) * eval(e2) Div(e1,e2) -> eval(e1) / eval(e2) } eval( Add( 3, Mul( 2, 4))) // 11
The expression.{ pattern_1 -> body_1 ... pattern_n ->
body_n }
construct matches each pattern against the given
expression in turn, selects the first that matches, assigns the
variables to the corresponding components in the expression, and
executes the corresponding body. This construct allows to eliminate
a lot of boilerplate code.
The form data Point(Int,Int)
is actually a shorthand
for the case where the sum type has only one element: data
Point { Point(Int,Int) }
Sum types are ubiquitous when defining complex data structures; but it is often implicit. Its most common form in many other languages is the use of the special "null" pointer (this is Tony Hoare's billion dollar mistake, avoided by the L language.
With L one can write data structures and algorithms which are
parametrically polymorphic, i.e. that
work for any type.
Note that the length
and append
functions below are polymorphic (they work on both lists of Int
and Bool), without requiring any annotation to make them so.
data List<t> = { Cons(head:t, tail:List<t>) Nil } def length = { Cons( _, tail) -> 1+length(tail) Nil -> 0 } def list_int = Cons(11, Cons(22, Cons(33, Nil))) list_int.length // => 3 def list_bool = Cons(true, Cons(false, Nil)) lits_bool.length // => 2 def append(l1,l2) = l1.{ Nil -> l2 Cons(x,tail) = Cons(x, tail.append( l2)) } append( list_int, Cons(44, Nil)) // Cons(11, Cons(22, Cons(33, Cons( 44, Nil)))) append( Cons( true, Cons( true, Nil)), list_bool) // Cons( true, Cons( true, Cons( true, Cons( false, Nil))))
In L, functions are first-class citizens, meaning that they can
be passed around at will. Functions can be created simply using
the { arg1,arg2,...,argn -> body}
syntax.
{ let f = { a,b -> { x -> a * x + b }} let g = f(2,3) g(1) // 5 }
Observe that multiple arguments are passed as tuples, instead of being curried as in most functional languages. This allows to determine more easily when closures are created.
L features a special syntax for creating simple functions. Blocks that contain an '_' character are transformed into functions, each _ being replaced with an argument. This way of performing partial application avoids hard-to-find type errors arising when forgetting some of the arguments of a curried function.
def plus = {_+_} // equivalent to { a,b -> a + b } def three_times = { 3 * _ } // equivalent to { x -> 3 * x }
Functions argument are actually patterns and can be matched
directly. Actually, the arg.{ pattern1 -> body1; ...
patternn -> bodyn
syntax presented above is a special case
of function pattern matching, where arg
is matched
by applying it to a function.
def fact = { 0 -> 1 n -> n * fact(n -1) } fact(5) // 120
Passing functions downward is perfect for parameterizing an algorithm, like the traversal of a data structure or a sorting function.
// Splits l according to the predicate p. Stable. def partition(l:List<t>, p:t->Bool) = l.{ Nil -> Nil Cons(head,tail) -> let (part_true, part_false) = partition(tail,p) if p(head) (Cons(head,part_true), part_false) else (part_true, Cons(head, part_false)) } // Sorts l. def quicksort(l, greater?:t->t->Bool) = l.{ Nil -> NIl Cons(x,tail) -> let (great_ones,small_ones) = tail.partition{ _.greater?(x) } small_ones.quicksort(greater?).append( Cons( x, great_ones.quicksort(greater?))) } // Sort a list of integers using reversed order Cons( 2, Cons( 3, Cons( 1, Nil))).quicksort{ x,y -> x <= y } // => Cons( 3, Cons( 2, Cons( 1, Nil))) // Sort a list of pairs of integers using lexical order Cons( (2,2), Cons( (3,1), Cons( (2,1), Nil))).quicksort{ (x1,x2),(y1,y2) -> x1 > x2 || x1 = x2 && y1 >= y2 } // => Cons( (2,1), Cons( (2,2), Cons( (3,1), Nil)))Passing functions upward also has many uses. One is dynamic compilation, e.g. dynamically creating a function that matches a regular expression.
def id? = Regexp.compile("[A-Za-z_]+") // => id?: String -> Bool "hello".id? // => true "23".id? // => false
L is statically typed, which means that many simple errors are detected at compile time.
def applyn( f, n, x) = if( n == 0) x else f( applyn( f,n-1, x)) applyn( 3, { _ + 2 }, 10) // => Type error: expected t -> t, got Int applyn( { _ + 2 }, 3, 10) // => 16 applyn( { _ * 2 }, 10., 3) // => Type error: exected Int, got Float applyn( { _ * 2 }, 3, 10.) // => 80. append( list_int, list_bool) // => Type error: expected List<Int>, got List<Bool>
In addition, static typing guarantees that the program executes safely, e.g. it will not cause an unexpected segmentation fault; without causing the overhead of the runtime checks done in a dynamically typed language.
L features complete type inference, which means that the whole program can be written without a single type annotation. However, you can write type annotations if you want to.
def applyn( f:t -> t, n:Int, x:t) -> t = if( n == 0) x else f( applyn( f,n-1, x)) applyn( { _ + 2 }, 3, 10:Int) // => 16 applyn( { _ * 2 }:Float -> Float, 3, 10.) // => 80This combination combines the benefits of a lightweight syntax, as found in dynamic languages, with the strong error detection and efficiency of statically typed languages.
L provides several extension points. First, L offers a way to dynamically extend the concrete syntax of the language. Second, Lisp-like macros can perform changes in the parse tree. These extension points allow to add domain specific languages to L, for instance to mix XML and L, SQL and L, adding a nice syntax for the active record pattern...
One of the examples of the standard library where these macro
extensions are used is the print
macro. print
is followed by a list of expressions
between braces. This makes the printing code much more readable (as
compared to, for instance, C-like format strings).
print{ "Hello, " user.name "!\n" "Today you are " user.age + 1 ".\n" }
L features an interactive toplevel (or REPL) that you can use to try the language or debug a program.
$ elle interactive Welcome to L! > 4+5 9:Int
L can be used as an interpreter:
#!/usr/bin/env elle run print{ "Hello, world!\n"}
For more substantive development comprising several files, each file has to be compiled individually. The current supported workflow is to compile to a LLVM bitcode file, and then link the different modules with the LLVM linker. In the future L will support several compilation options, which allows choosing between faster compile time (during development) or faster resulting program (for production).
$ elle compile -to-llvm program.l -o program.bc $ file program.bc program.bc: LLVM bitcode $ clang -O2 -c -emit-llvm other.c -o other.bc $ llvm-ld program.bc other.bc -native program.run $ elle compile program.l -o program.run $ file program.run program.run: ELF 32-bit LSB executable, Intel 80386, ...
L is a functional programming language, but does not discourage the use of imperative features: in-place update is sometimes the way to write the most efficient algorithm.
L features mutable variables, introduced by the var
keyword. Assignment is done using the :=
assignment
operator. The only restriction regarding mutable variables is that
they cannot hold a polymorphic value (this would be unsound). L
preferred style is to avoid mutable variables when they can be
replaced by immutable ones (introduced by let
), because
they are harder to optimize (and can lead to less efficient code),
and their use can make code harder to understand.
def fact(n) = { var i := n var result := 1 while(i < n){ result := result * i; i := i+1 } result }
Note that variables must be given an initial value; in the future
we may assign variables to the undef
special value,
provided we can prove that the variable is not read before being
assigned.
To accommodate mix functional/imperative programming, L is a strict language with a completely defined evaluation order; i.e. arguments are evaluated from left to right before calling the function.
(f(3))(g(4),h(5)) // equivalent to: { let a = f(3); let b = g(4); let c = h(5); a(b,c) }
Delayed and lazy executions are possible, by making them explicit.
{ let delayed = { () -> print{ "Delayed hello world!"}} print{ "Before delayed "} delayed() } // Displays: Before delayed Delayed hello world!
What makes an optimization guaranteed is just the fact that it appears in the L standard, not just in some L compiler (The L compiler will also perform optimizations that will not be guaranteed). This may seem to be a tiny difference. In fact, being in the standard means that the code can rely on the optimization being performed for whatever version of the compiler being used, or even for a different implementation of L.
One use is to write modular code that may look inefficient as is, but that you know the compiler will make efficient by the sole use of the guaranteed optimizations. For instance, you can write code such as:
def an_option = true { var res := 0 [0..1000].each{ i -> [0..800].each{ j -> if( an_option) res := res + i*j else res := res + i+j } } }This code is logically clean, because the choice introduced by
if
is close to the different alternatives. But
as-is, it is horribly inefficient because the test is done inside
the loop. But a guaranteed optimization says that conditionals to
constants are removed away. Thanks to guaranteed optimization, you
know that every compiler will remove the test, so you can leave
the code as it is knowing that this will never cause a performance
issue. Actually, this guaranteed optimization remove the need for
preprocessing conditional in the language.
Another use of guaranteed optimization allows to control memory allocation. By guaranteeing that some allocations are always made on the stack, and by writing the functions carefully (and checking that all allocation is indeed done on the stack), you can writing real-time code, system code such as a garbage collector, or performance-critical code, using L's high-level language and constructs; instead of rewriting code in C. A guaranteed optimization that falls into this category is the classical tail-call optimization, that allows to bound stack allocation when functions are carefully written.
The list of guaranteed optimizations is not definite yet, and will grew as the need arise and the optimizations themselves are implemented in the prototype compiler. But they will comprehend at least:
goto
s), inlining of
functions used once, simplification of if(
constant)