The L parser has been cleaned up and is now commited on github. As always, it is written in a literate programming style, and you can have a look at the generated documents corresponding to its interface and implementation.
For the implementation I wrote a big diagram explaining how TDOP parsing works in practice. I also wrote a more interactive version of this diagram below for this blog post.
To understand this diagram, you should have a look at the tdop.mli file, that defines the interface of the TDOP parser. For people in a hurry, here is a summary of the main points:
parse
function that will, in our case, parse an
expression.
aEb
, where a
and b
are tokens and E
is an expression, then the right binding power of
a
is compared to the left binding power of b
. E
is associated
with the the token with the highest binding power.
This scheme is more powerful than classical operator precedence, and in particular allows to deal with associativity.
parse
function.
The following diagram will explain both how the parser works, and how it is used to parse a simple mathematical expression.
This explanation is probably a bit short if you never encountered a TDOP parser before, so I really encourage you to read tdop.mli, see how it's used in the L parser, or have a look at other nice presentations of TDOP on the internet (or of their implementation in other languages than OCaml).
I found that TDOP parsers were easy to implement, easy to use, and in particular match well the thought process. I believe that human beings distinguish "-" by this notion of "prefix" versus "infix" position, for instance, and mentally parse strings using relative priority between infix tokens; they are not expanding parsing rules to see if one matches.
In the future, I would like to experiment with TDOP-based parser generation. The extension to EBNF notation that you can see in the documentation of the parser allows to see what it could look like.
A future blog article will present the syntax of the L language in more details.