Simon Willison’s Weblog

Subscribe

Wednesday, 17th December 2025

firefox parser/html/java/README.txt (via) TIL (or TIR - Today I was Reminded) that the HTML5 Parser used by Firefox is maintained as Java code (commit history here) and converted to C++ using a custom translation script.

You can see that in action by checking out the ~8GB Firefox repository and running:

cd parser/html/java
make sync
make translate

Here's a terminal session where I did that, including the output of git diff showing the updated C++ files.

I did some digging and found that the code that does the translation work lives, weirdly, in the Nu Html Checker repository on GitHub which powers the W3C's validator.w3.org/nu/ validation service!

Here's a snippet from htmlparser/cpptranslate/CppVisitor.java showing how a class declaration is converted into C++:

    protected void startClassDeclaration() {
        printer.print("#define ");
        printer.print(className);
        printer.printLn("_cpp__");
        printer.printLn();

        for (int i = 0; i < Main.H_LIST.length; i++) {
            String klazz = Main.H_LIST[i];
            if (!klazz.equals(javaClassName)) {
                printer.print("#include \"");
                printer.print(cppTypes.classPrefix());
                printer.print(klazz);
                printer.printLn(".h\"");
            }
        }

        printer.printLn();
        printer.print("#include \"");
        printer.print(className);
        printer.printLn(".h\"");
        printer.printLn();
    }

Here's a fascinating blog post from John Resig explaining how validator author Henri Sivonen introduced the new parser into Firefox in 2009.

# 1:48 am / c-plus-plus, firefox2, henri-sivonen, java, john-resig, mozilla

2025 » December

MTWTFSS
1234567
891011121314
15161718192021
22232425262728
293031