Wednesday, 17th December 2025
firefox parser/html/java/README.txt (via) TIL (or TIR - Today I was Reminded) that the HTML5 Parser used by Firefox is maintained as Java code (commit history here) and converted to C++ using a custom translation script.
You can see that in action by checking out the ~8GB Firefox repository and running:
cd parser/html/java
make sync
make translate
Here's a terminal session where I did that, including the output of git diff showing the updated C++ files.
I did some digging and found that the code that does the translation work lives, weirdly, in the Nu Html Checker repository on GitHub which powers the W3C's validator.w3.org/nu/ validation service!
Here's a snippet from htmlparser/cpptranslate/CppVisitor.java showing how a class declaration is converted into C++:
protected void startClassDeclaration() { printer.print("#define "); printer.print(className); printer.printLn("_cpp__"); printer.printLn(); for (int i = 0; i < Main.H_LIST.length; i++) { String klazz = Main.H_LIST[i]; if (!klazz.equals(javaClassName)) { printer.print("#include \""); printer.print(cppTypes.classPrefix()); printer.print(klazz); printer.printLn(".h\""); } } printer.printLn(); printer.print("#include \""); printer.print(className); printer.printLn(".h\""); printer.printLn(); }
Here's a fascinating blog post from John Resig explaining how validator author Henri Sivonen introduced the new parser into Firefox in 2009.