The standard academic literature is most useful for the extreme frontend (parsing) and the extreme backend (SSA, instruction selection and code generation), but the middle-end is ignored. This is fine if you want to learn how to build, e.g., the next LLVM: a fat backend with a very thin frontend. By Fernando Borretti.
Imagine a compiler as a 2D grid. The rows are the stages of compilation: from parsing, through the middleend, to the backend. The columns are the different facets of the programming language being implemented. For a smaller language like C, the different facets are: global variable declarations, enum declarations, type declarations, function declarations, and nothing else. More featureful languages add extra facets: interface and class definitions in C++, trait and impl definition in Rust, etc.
This article contains some of the lessons I learned writing the compiler for Austral, a new systems programming language with linear types that I’ve been working on for a while. The first few sections are high-level, the rest most specific to using OCaml to write a compiler.
The article then dives into:
- Implementation strategies
- Just use OCaml
- Just use Menhir
- What tests are useful
- Compilation model
- Code organization
- Bootstrapping issues
- The environment
- Identifying declarations
- Errors
- Remarks
There are, broadly, two ways to implement a compiler. In the waterfall strategy, each stage is implemented one after the other: the complete parser, then the complete semantic analysis pass, then the complete type checking pass, and on to the backend stages. Excellent read!
[Read More]