Simon Willison on compilers

16 posts tagged “compilers”

2025

llvm: InstCombine: improve optimizations for ceiling division with no overflow—a PR by Alex Gaynor and Claude Code. Alex Gaynor maintains rust-asn1, and recently spotted a missing LLVM compiler optimization while hacking on it, with the assistance of Claude (Alex works for Anthropic).

He describes how he confirmed that optimization in So you want to serialize some DER?, taking advantage of a tool called Alive2 to automatically verify that the potential optimization resulted in the same behavior.

Alex filed a bug, and then...

Obviously the next move is to see if I can send a PR to LLVM, but it’s been years since I was doing compiler development or was familiar with the LLVM internals and I wasn’t really prepared to invest the time and energy necessary to get back up to speed. But as a friend pointed out… what about Claude?

At this point my instinct was, "Claude is great, but I'm not sure if I'll be able to effectively code review any changes it proposes, and I'm not going to be the asshole who submits an untested and unreviewed PR that wastes a bunch of maintainer time". But excitement got the better of me, and I asked claude-code to see if it could implement the necessary optimization, based on nothing more than the test cases.

Alex reviewed the resulting code very carefully to ensure he wasn't wasting anyone's time, then submitted the PR and had Claude Code help implement the various changes requested by the reviewers. The optimization landed two weeks ago.

Alex's conclusion (emphasis mine):

I am incredibly leery about over-generalizing how to understand the capacity of the models, but at a minimum it seems safe to conclude that sometimes you should just let the model have a shot at a problem and you may be surprised -- particularly when the problem has very clear success criteria. This only works if you have the capacity to review what it produces, of course. [...]

This echoes Ethan Mollick's advice to "always invite AI to the table". For programming tasks the "very clear success criteria" is extremely important, as it helps fit the tools-in-a-loop pattern implemented by coding agents such as Claude Code.

LLVM have a policy on AI-assisted contributions which is compatible with Alex's work here:

[...] the LLVM policy is that contributors are permitted to use artificial intelligence tools to produce contributions, provided that they have the right to license that code under the project license. Contributions found to violate this policy will be removed just like any other offending contribution.

While the LLVM project has a liberal policy on AI tool use, contributors are considered responsible for their contributions. We encourage contributors to review all generated code before sending it for review to verify its correctness and to understand it so that they can answer questions during code review.

Back in April Ben Evans put out a call for concrete evidence that LLM tools were being used to solve non-trivial problems in mature open source projects:

I keep hearing #AI boosters / talking heads claiming that #LLMs have transformed software development [...] Share some AI-derived pull requests that deal with non-obvious corner cases or non-trivial bugs from mature #opensource projects.

I think this LLVM optimization definitely counts!

(I also like how this story supports the idea that AI tools amplify existing human expertise rather than replacing it. Alex had previous experience with LLVM, albeit rusty, and could lean on that knowledge to help direct and evaluate Claude's work.)

# 30th June 2025, 4:44 pm / alex-gaynor, compilers, llvm, open-source, ai-assisted-programming, anthropic, claude, coding-agents, claude-code

2024

Figma’s journey to TypeScript: Compiling away our custom programming language (via) I love a good migration story. Figma had their own custom language that compiled to JavaScript, called Skew. As WebAssembly support in browsers emerged and improved the need for Skew’s performance optimizations reduced, and TypeScript’s maturity and popularity convinced them to switch.

Rather than doing a stop-the-world rewrite they built a transpiler from Skew to TypeScript, enabling a multi-year migration without preventing their product teams from continuing to make progress on new features.

# 4th May 2024, 2:08 pm / compilers, javascript, typescript, webassembly

Ruff v0.4.0: a hand-written recursive descent parser for Python. The latest release of Ruff—a Python linter and formatter, written in Rust—includes a complete rewrite of the core parser. Previously Ruff used a parser borrowed from RustPython, generated using the LALRPOP parser generator. Victor Hugo Gomes contributed a new parser written from scratch, which provided a 2x speedup and also added error recovery, allowing parsing of invalid Python—super-useful for a linter.

I tried Ruff 0.4.0 just now against Datasette—a reasonably large Python project—and it ran in less than 1/10th of a second. This thing is Fast.

# 19th April 2024, 5 am / compilers, python, rust, ruff, astral

2023

Lark parsing library JSON tutorial (via) A very convincing tutorial for a new-to-me parsing library for Python called Lark.

The tutorial covers building a full JSON parser from scratch, which ends up being just 19 lines of grammar definition code and 15 lines for the transformer to turn that tree into the final JSON.

It then gets into the details of optimization—the default Earley algorithm is quite slow, but swapping that out for a LALR parser (a one-line change) provides a 5x speedup for this particular example.

# 13th August 2023, 9:50 pm / compilers, json, parsing, python

Mapping Python to LLVM (via) Codon is a fascinating new entry in the “compile Python code to something else” world—this time targeting LLVM. Ariya Shajii describes in great detail how it pulls this off, including tricks such as transforming Python generators to LLVM coroutines. Codon doesn’t promise that all Python code will work—it’s best thought of as a Python-like language which can be used to create compiled modules which can then be imported back into regular Python projects.

# 10th January 2023, 2:08 am / compilers, llvm, python

2022

A CGo-free port of SQLite. Fascinating Go version of SQLite, which uses Go code that has been translated from the original SQLite C using ccgo, a package by the same author which “translates cc ASTs to Go source code”. It claims to pass the full public SQLite test suite, which is very impressive.

# 30th January 2022, 10:25 pm / compilers, go, sqlite

Writing a minimal Lua implementation with a virtual machine from scratch in Rust. Phil Eaton implements a subset of Lua in a Rust in this detailed tutorial.

# 15th January 2022, 6:29 pm / compilers, lua, rust, phil-eaton

2021

Introducing stack graphs (via) GitHub launched “precise code navigation” for Python today—the first language to get support for this feature. Click on any Python symbol in GitHub’s code browsing views and a box will show you exactly where that symbol was defined—all based on static analysis by a custom parser written in Rust as opposed to executing any Python code directly. The underlying computer science uses a technique called stack graphs, based on scope graphs research from Eelco Visser’s research group at TU Delft.

# 9th December 2021, 11:07 pm / compilers, github, python, rust

lex.go in json5-go. This archived GitHub repository has a beautifully clean and clear example of a hand-written lexer in Go, for the JSON5 format (JSON + comments + multi-line strings). parser.go is worth a look too.

# 19th August 2021, 8:15 pm / compilers, go, json

2020

A hands-on introduction to static code analysis. Useful tutorial on using the Python standard library tokenize and ast modules to find specific patterns in Python source code, using the visitor pattern.

# 5th May 2020, 12:15 am / compilers, python, staticanalysis

A Compiler Writing Journey (via) Warren Toomey has been writing a self-compiling compiler for a subset of C, and extensively documenting every step of the journey here on GitHub. The result is an extremely high quality free textbook on compiler construction.

# 8th January 2020, 3:33 am / c, compilers

2010

grammar.coffee (via) The annotated grammar for CoffeeScript, a new language that compiles to JavaScript developed by DocumentCloud’s Jeremy Ashkenas. The linked page is generated using Jeremy’s Docco tool for literate programming, also written in CoffeeScript. CoffeeScript itself is implemented in CoffeeScript, using a bootstrap compiler originally written in Ruby.

# 8th March 2010, 7:27 pm / coffeescript, compilers, docco, documentcloud, javascript, jeremy-ashkenas, literateprogramming, programming, ruby, selfhosting

2009

Developing for the Apple iPhone using Flash. A brilliant feat of engineering: Adobe worked around Apple’s “no runtime allowed” rules by writing a compiler front end for LLVM that compiles ActionScript 3 to ARM assembly code, and apparently ported the regular Flash drawing APIs as well.

# 5th October 2009, 9:15 pm / actionscript, adobe, compilers, flash, hacking, iphone, llvm

2008

Simple Top-Down Parsing in Python. Eye-opening tutorial on building a recursive descent parser for Python, in Python that uses top-down operator precedence.

# 19th July 2008, 11:37 pm / compilers, effbot, fredrik-lundh, parsing, python, recursivedescent

python4ply tutorial. python4ply is a parser for Python written in Python using the PLY toolkit, which compiles to Python bytecode using the built-in compiler module. The tutorial shows how to use it to add support for Perl-style 1_000_000 readable numbers.

# 11th March 2008, 5:49 am / compilers, lexing, parsing, python, python4ply

2007

NestedVM. Provides binary translation from a GCC compiled MIPS binary to a Java class file, letting you run anything supported by GCC on the JVM with no source changes.

# 11th July 2007, 2:52 pm / compilers, gcc, java, nestedvm

Simon Willison’s Weblog