Simon Willison’s Weblog

Subscribe

Wednesday, 17th December 2025

firefox parser/html/java/README.txt (via) TIL (or TIR - Today I was Reminded) that the HTML5 Parser used by Firefox is maintained as Java code (commit history here) and converted to C++ using a custom translation script.

You can see that in action by checking out the ~8GB Firefox repository and running:

cd parser/html/java
make sync
make translate

Here's a terminal session where I did that, including the output of git diff showing the updated C++ files.

I did some digging and found that the code that does the translation work lives, weirdly, in the Nu Html Checker repository on GitHub which powers the W3C's validator.w3.org/nu/ validation service!

Here's a snippet from htmlparser/cpptranslate/CppVisitor.java showing how a class declaration is converted into C++:

    protected void startClassDeclaration() {
        printer.print("#define ");
        printer.print(className);
        printer.printLn("_cpp__");
        printer.printLn();

        for (int i = 0; i < Main.H_LIST.length; i++) {
            String klazz = Main.H_LIST[i];
            if (!klazz.equals(javaClassName)) {
                printer.print("#include \"");
                printer.print(cppTypes.classPrefix());
                printer.print(klazz);
                printer.printLn(".h\"");
            }
        }

        printer.printLn();
        printer.print("#include \"");
        printer.print(className);
        printer.printLn(".h\"");
        printer.printLn();
    }

Here's a fascinating blog post from John Resig explaining how validator author Henri Sivonen introduced the new parser into Firefox in 2009.

# 1:48 am / c-plus-plus, firefox2, henri-sivonen, java, john-resig, mozilla

Tool Codex Timeline Viewer — View Mozilla Codex rollout timeline events by uploading or pasting a JSONL file to explore the sequence of messages, function calls, and system events with interactive filtering and search capabilities. Each event displays its timestamp in your local timezone or UTC, with detailed views showing formatted content, tool parameters, and reasoning blocks for supported event types.
Research Litestream S3 Replication Experiments — Experiments in this project evaluate Litestream’s robustness when SQLite writes occur while Litestream is stopped and later restarted, with focus on replication to S3. Both the simple restart and the scenario where the WAL is checkpointed (truncated) while Litestream is offline confirm no data loss: Litestream either streams pending WAL changes upon restart or detects a database change and uploads a new full snapshot (“generation”).
Tool Claude Code Timeline Viewer — View Claude Code session `.jsonl` files as an interactive timeline with customizable filtering and search capabilities. This tool displays events chronologically, extracting conversation messages, tool calls, and file snapshots with formatted previews of text content, code blocks, and embedded images. Use the file picker, drag-and-drop, paste input, or URL fetch to load your session data and explore it with timezone switching, content-type filters, and easy JSON export.
Release llm-gemini 0.28 — LLM plugin to access Google's Gemini family of models

Gemini 3 Flash

Visit Gemini 3 Flash

It continues to be a busy December, if not quite as busy as last year. Today’s big news is Gemini 3 Flash, the latest in Google’s “Flash” line of faster and less expensive models.

[... 1,271 words]

AoAH Day 15: Porting a complete HTML5 parser and browser test suite (via) Anil Madhavapeddy is running an Advent of Agentic Humps this year, building a new useful OCaml library every day for most of December.

Inspired by Emil Stenström's JustHTML and my own coding agent port of that to JavaScript he coined the term vibespiling for AI-powered porting and transpiling of code from one language to another and had a go at building an HTML5 parser in OCaml, resulting in html5rw which passes the same html5lib-tests suite that Emil and myself used for our projects.

Anil's thoughts on the copyright and ethical aspects of this are worth quoting in full:

The question of copyright and licensing is difficult. I definitely did some editing by hand, and a fair bit of prompting that resulted in targeted code edits, but the vast amount of architectural logic came from JustHTML. So I opted to make the LICENSE a joint one with Emil Stenström. I did not follow the transitive dependency through to the Rust one, which I probably should.

I'm also extremely uncertain about every releasing this library to the central opam repository, especially as there are excellent HTML5 parsers already available. I haven't checked if those pass the HTML5 test suite, because this is wandering into the agents vs humans territory that I ruled out in my groundrules. Whether or not this agentic code is better or not is a moot point if releasing it drives away the human maintainers who are the source of creativity in the code!

I decided to credit Emil in the same way for my own vibespiled project.

# 11:23 pm / definitions, functional-programming, ai, generative-ai, llms, ai-assisted-programming, ai-ethics, vibe-coding, ocaml

Tuesday, 16th December 2025
Thursday, 18th December 2025