Simon Willison’s Weblog

Weeknotes: Distracted by Playwright

My goal for this week was to unblock progress on Datasette by finally finishing the dash encoding implementation I described last week. I was getting close, and then I got very distracted by Playwright.

Dash encoding v2

In Why I invented “dash encoding”, a new encoding scheme for URL paths, I described a mechanism I had invented for handling the gnarly problem of including table names with / characters in the URL path on Datasette. The very short version: you can’t use URL encoding in a path, because common proxies (including Apache and Nginx) will decode the encoded characters before the request gets to your application.

Thanks to feedback on that post I actually changed my design: I’m now using a variant of percent encoding that uses - as the escape character instead of %. More details in the issue, and I’ll write this up fully once I’ve finished landing the change.
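To make the idea concrete, here is a minimal sketch of percent encoding with - swapped in for %. This is not Datasette's actual implementation: the function names and the choice of safe characters (ASCII letters, digits and underscore) are assumptions made for illustration.

```python
import string
from urllib.parse import unquote

# Assumption: only these characters pass through unescaped.
# "-" is deliberately excluded, so every "-" in the output starts an escape.
SAFE = frozenset(string.ascii_letters + string.digits + "_")


def dash_encode(text: str) -> str:
    """Encode text like percent encoding, but with "-" as the escape character."""
    out = []
    for byte in text.encode("utf-8"):
        ch = chr(byte)
        if ch in SAFE:
            out.append(ch)
        else:
            # Two uppercase hex digits per byte, as in percent encoding
            out.append("-{:02X}".format(byte))
    return "".join(out)


def dash_decode(text: str) -> str:
    """Reverse dash_encode by swapping "-" back to "%" and percent-decoding."""
    return unquote(text.replace("-", "%"))
```

With this sketch, dash_encode("foo/bar") gives foo-2Fbar, which contains nothing a proxy would try to decode, and dash_decode reverses it.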

shot-scraper and Playwright

I thoroughly nerd-sniped myself with this one. I started investigating possibilities for automatically generating screenshots for documentation, and realized that Playwright made this substantially easier than it has been in the past.

The result was shot-scraper—a new command-line utility for taking screenshots of web pages, or portions of web pages—and for running through a set of screenshots defined in a YAML file.

I still can’t quite believe how quickly this came together.

Every now and then a tool comes along which adds a fundamental new set of capabilities to your toolbox, and can be multiplied against other tools to open up a huge range of possibilities.

Playwright feels like one of those tools.

A quick pip install playwright is all it takes to start writing robust browser automation tools, using dedicated standalone headless instances of multiple browsers that are installed for you using playwright install.
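As a rough idea of what that looks like, here is a minimal sketch using Playwright's synchronous Python API to screenshot a page, or a single element on it. It is not the shot-scraper source code: take_screenshot and its parameters are hypothetical names, and it assumes pip install playwright and playwright install have already been run.

```python
from pathlib import Path
from typing import Optional


def take_screenshot(url: str, output: str, selector: Optional[str] = None) -> Path:
    """Save a PNG of a page, or of the first element matching a CSS selector."""
    # Imported here so this module can be loaded before Playwright is installed.
    from playwright.sync_api import sync_playwright

    with sync_playwright() as p:
        browser = p.chromium.launch()  # headless by default
        page = browser.new_page()
        page.goto(url)
        if selector:
            # Screenshot just the first matching element
            page.locator(selector).first.screenshot(path=output)
        else:
            # Screenshot the whole scrollable page
            page.screenshot(path=output, full_page=True)
        browser.close()
    return Path(output)
```

Calling take_screenshot("https://datasette.io/", "datasette.png") would write a full-page screenshot to datasette.png.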

It’s easy to run in CI—getting it working in GitHub Actions was trivial.

shot-scraper is my first project built on Playwright, but there will definitely be more.

shot-scraper accessibility

I started a Twitter conversation asking for ways to write automated tests that exercise screen readers—not just running audit rules, but actually simulating what happens when a screen reader user attempts to navigate through a specific flow within an application.

The most interesting answer I had was from Ben Mustill-Rose, who built a system for automating tests against an Android screen reader while working on BBC iPlayer—demo here.

@fardarter pointed me back to Playwright again, which turns out to have an Accessibility snapshot mechanism that can dump out the current state of the Chromium accessibility tree.

I couldn’t resist adding that to shot-scraper—so now you can run the following to see the accessibility tree for a web page:

~ % shot-scraper accessibility
{
    "role": "WebArea",
    "name": "Datasette: An open source multi-tool for exploring and publishing data",
    "children": [
        {
            "role": "link",
            "name": "Uses"
        },
        {
            "role": "link",
            "name": "Documentation"
        },

Full output here.

As a really fun bonus trick: since the output is JSON, you can pipe it into sqlite-utils insert to get a SQLite database:

shot-scraper accessibility \
    | jq .children | sqlite-utils insert \
    /tmp/accessibility.db nodes - --alter

And then open it in Datasette Desktop and start faceting by role and heading level!

Datasette Desktop browsing the nodes table - it has text, link, heading, button and textbox roles and four different heading levels.
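The same kind of faceting by role can be approximated in plain Python. The sketch below walks an accessibility tree and counts nodes per role. The sample data is an abbreviated version of the output shown earlier, and iter_nodes and count_roles are hypothetical helper names, not part of shot-scraper.

```python
import json
from collections import Counter


def iter_nodes(node):
    """Yield every node in an accessibility tree, depth first."""
    yield node
    for child in node.get("children", []):
        yield from iter_nodes(child)


def count_roles(tree):
    """Count how many nodes in the tree have each role."""
    return Counter(node.get("role") for node in iter_nodes(tree))


# Abbreviated sample, not the full output of the command
sample = json.loads("""
{
    "role": "WebArea",
    "name": "Datasette: An open source multi-tool for exploring and publishing data",
    "children": [
        {"role": "link", "name": "Uses"},
        {"role": "link", "name": "Documentation"}
    ]
}
""")

for role, count in count_roles(sample).most_common():
    print(role, count)
```

Unlike the jq .children pipeline, which inserts only the top-level children as rows, this walker also descends into nested children.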

sqlite-utils documentation improvements

I complained on Twitter that the way type information was displayed in the Sphinx sqlite-utils API reference documentation was ugly:

Really long ugly type signatures

Adam Johnson pointed me to the autodoc_typehints = "description" option which fixes this. I spent a while tidying up the documentation to work better with this, mainly by adding a whole bunch of :param name: description tags that I had previously omitted. That work happened in this issue. I think it looks much better now:

Type signatures are much easier to read now, and there's a detailed list of parameters with descriptions.
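For reference, a minimal sketch of the two pieces involved: the Sphinx setting that goes in conf.py, and a docstring written with :param tags. The truncate function is a made-up example and is not part of sqlite-utils.

```python
# --- conf.py (Sphinx configuration) ---
extensions = ["sphinx.ext.autodoc"]
# Move type hints out of the signature and into the parameter descriptions
autodoc_typehints = "description"


# --- a documented function in your package (hypothetical example) ---
def truncate(text: str, length: int = 20) -> str:
    """
    Shorten text to at most length characters.

    :param text: The string to shorten
    :param length: Maximum number of characters to keep
    """
    return text[:length]
```

With this setting, autodoc should render the signature without the inline annotations and list each parameter with its type and description underneath.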

Releases this week

TIL this week