Simon Willison’s Weblog

Subscribe
Atom feed for henri-sivonen

6 posts tagged “henri-sivonen”

2025

firefox parser/html/java/README.txt (via) TIL (or TIR - Today I was Reminded) that the HTML5 Parser used by Firefox is maintained as Java code (commit history here) and converted to C++ using a custom translation script.

You can see that in action by checking out the ~8GB Firefox repository and running:

cd parser/html/java
make sync
make translate

Here's a terminal session where I did that, including the output of git diff showing the updated C++ files.

I did some digging and found that the code that does the translation work lives, weirdly, in the Nu Html Checker repository on GitHub which powers the W3C's validator.w3.org/nu/ validation service!

Here's a snippet from htmlparser/cpptranslate/CppVisitor.java showing how a class declaration is converted into C++:

    protected void startClassDeclaration() {
        printer.print("#define ");
        printer.print(className);
        printer.printLn("_cpp__");
        printer.printLn();

        for (int i = 0; i < Main.H_LIST.length; i++) {
            String klazz = Main.H_LIST[i];
            if (!klazz.equals(javaClassName)) {
                printer.print("#include \"");
                printer.print(cppTypes.classPrefix());
                printer.print(klazz);
                printer.printLn(".h\"");
            }
        }

        printer.printLn();
        printer.print("#include \"");
        printer.print(className);
        printer.printLn(".h\"");
        printer.printLn();
    }

Here's a fascinating blog post from John Resig explaining how validator author Henri Sivonen introduced the new parser into Firefox in 2009.

# 17th December 2025, 1:48 am / c-plus-plus, firefox2, henri-sivonen, java, john-resig, mozilla

2010

Firefox 4: the HTML5 parser—inline SVG, speed and more. A complete replacement for the oldest part of Gecko (the HTML parser dates back to 1998) headed up by HTML5 validator author Henri Sivonen, using the parsing algorithm defined in the HTML5 specification. Improvements include parsing taking place off the main UI thread and the ability to embed SVG and MathML directly inline in HTML pages.

# 12th May 2010, 8:56 am / firefox, gecko, henri-sivonen, html5, mathml, svg, recovered, firefox4, parser

2009

HTML 5 Parsing. Firefox nightlies include a new parser that implements the HTML5 parsing algorithm (disabled by default), which uses C++ code automatically generated from Henri Sivonen’s Java parser first used in the HTML5 validator.

# 11th July 2009, 11:36 pm / browsers, firefox, henri-sivonen, html5, john-resig, mozilla, parsing, validator

There are two meanings to XHTML: technical and marketing. The technical kind (XHTML served using the application/xhtml xml MIME type) is a formulation of HTML as an XML vocabulary. The marketing kind (XHTML served using the text/html MIME type) is processed just like HTML by browsers but the authors attempt to observe slightly different syntax rules in order to make it seem that they are doing something newer and shinier compared to HTML.

Henri Sivonen

# 6th July 2009, 12:46 pm / buzzwords, henri-sivonen, xhtml, xml

2008

Draconian failure on error is not the answer problems of Postel's law. Draconian error handling creates an unstable equilibrium in Game Theory terms - it only lasts until one player breaks the rule. One non-Draconian XML5 implementation in key client product and the Draconian XML ranks would break. Well-specified error recovery is the right way to implement the liberal part of Postel's law.

Henri Sivonen

# 20th March 2008, 2:43 pm / draconian, henri-sivonen, html5, law, postelslaw, xml