Weeknotes: Distracted by Playwright
12th March 2022
My goal for this week was to unblock progress on Datasette by finally finishing the dash encoding implementation I described last week. I was getting close, and then I got very distracted by Playwright.
Dash encoding v2
In my post Why I invented “dash encoding”, a new encoding scheme for URL paths, I described a new mechanism I had invented for handling the gnarly problem of including table names with / characters in the URL path on Datasette. The very short version: you can’t use URL encoding in a path, because common proxies (including Apache and Nginx) will decode it before it gets to your application.
Thanks to feedback on that post I actually changed my design: I’m now using a variant of percent encoding that uses - instead of %. More details are in the issue, and I’ll write this up fully once I’ve finished landing the change.
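To make the idea concrete, here is a minimal sketch of what a percent encoding variant that uses - in place of % could look like. It is an illustration only, not necessarily what lands in Datasette: the function names dash_encode and dash_decode are hypothetical, and the sketch leans on urllib.parse.quote from the standard library for the underlying escaping.

```python
from urllib.parse import quote, unquote


def dash_encode(value: str) -> str:
    """Percent-encode value, then use "-" where "%" would appear (hypothetical helper)."""
    # safe="" makes quote() escape "/" too; a literal "%" becomes "%25"
    encoded = quote(value, safe="")
    # quote() never escapes "-", so escape literal dashes by hand before
    # "-" takes over the role of the escape character
    encoded = encoded.replace("-", "%2D")
    return encoded.replace("%", "-")


def dash_decode(value: str) -> str:
    """Reverse dash_encode(): turn "-" back into "%" and percent-decode."""
    return unquote(value.replace("-", "%"))


print(dash_encode("table/with/slashes"))      # table-2Fwith-2Fslashes
print(dash_encode("already-dashed"))          # already-2Ddashed
print(dash_decode("table-2Fwith-2Fslashes"))  # table/with/slashes
```

Because the result contains no % characters, a proxy that decodes percent escapes has nothing to decode. Characters such as . and _ pass through unescaped in this sketch; a real scheme may choose to escape more.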
shot-scraper and Playwright
I thoroughly nerd-sniped myself with this one. I started investigating possibilities for automatically generating screenshots for documentation, and realized that Playwright made this substantially easier than it has been in the past.
The result was shot-scraper—a new command-line utility for taking screenshots of web pages, or portions of web pages—and for running through a set of screenshots defined in a YAML file.
I still can’t quite believe how quickly this came together.
Every now and then a tool comes along which adds a fundamental new set of capabilities to your toolbox, and can be multiplied against other tools to open up a huge range of possibilities.
Playwright feels like one of those tools.
A quick pip install playwright is all it takes to start writing robust browser automation tools, using dedicated standalone headless instances of multiple browsers that are installed for you using playwright install.
It’s easy to run in CI—getting it working in GitHub Actions was trivial.
shot-scraper is my first project built on Playwright, but there will definitely be more.
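For a sense of how little code a screenshot needs, here is a rough sketch using Playwright's synchronous Python API. It is not the shot-scraper source: the take_screenshot function, its parameters and the default viewport size are assumptions made for illustration. It needs pip install playwright followed by playwright install before it will run.

```python
def take_screenshot(url, output, selector=None, width=1280, height=780):
    """Save a PNG of url to output (hypothetical helper, not shot-scraper's code)."""
    # Imported here so this module still loads where Playwright is not installed
    from playwright.sync_api import sync_playwright

    with sync_playwright() as p:
        # Launch a headless Chromium installed by "playwright install"
        browser = p.chromium.launch()
        page = browser.new_page(viewport={"width": width, "height": height})
        page.goto(url)
        if selector:
            # Capture only the element matching the CSS selector
            page.locator(selector).screenshot(path=output)
        else:
            # Capture the whole scrollable page
            page.screenshot(path=output, full_page=True)
        browser.close()
```

Calling take_screenshot("https://datasette.io", "datasette.png") would write a full-page screenshot, and passing a selector would capture just that portion of the page.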
shot-scraper accessibility
I started a Twitter conversation asking for ways to write automated tests that exercise screen readers—not just running audit rules, but actually simulating what happens when a screen reader user attempts to navigate through a specific flow within an application.
The most interesting answer I had was from Ben Mustill-Rose, who built a system for automating tests against an Android screen reader while working on BBC iPlayer—demo here.
@fardarter pointed me back to Playwright again, which turns out to have an Accessibility snapshot mechanism that can dump out the current state of the Chromium accessibility tree.
I couldn’t resist adding that to shot-scraper—so now you can run the following to see the accessibility tree for a web page:
~ % shot-scraper accessibility https://datasette.io
{
    "role": "WebArea",
    "name": "Datasette: An open source multi-tool for exploring and publishing data",
    "children": [
        {
            "role": "link",
            "name": "Uses"
        },
        {
            "role": "link",
            "name": "Documentation"
        },
As a really fun bonus trick: since the output is JSON, you can pipe it into sqlite-utils insert to get a SQLite database:
shot-scraper accessibility https://datasette.io \
| jq .children | sqlite-utils insert \
/tmp/accessibility.db nodes - --alter
And then open it in Datasette Desktop and start faceting by role and heading level!
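The same kind of faceting can be roughed out in plain Python. This sketch walks a tree shaped like the output above and counts nodes by role. The sample data is hand-written for illustration (the heading node in particular is made up), and walk is a hypothetical helper name.

```python
import json
from collections import Counter


def walk(node):
    """Yield every node in an accessibility tree, depth first."""
    yield node
    for child in node.get("children", []):
        yield from walk(child)


# Hand-written sample in the same shape as the shot-scraper output;
# the heading node is invented for this example
tree = json.loads("""
{
  "role": "WebArea",
  "name": "Datasette: An open source multi-tool for exploring and publishing data",
  "children": [
    {"role": "link", "name": "Uses"},
    {"role": "link", "name": "Documentation"},
    {"role": "heading", "name": "Example heading", "level": 2}
  ]
}
""")

# Count how many nodes there are for each role, like a facet
role_counts = Counter(node["role"] for node in walk(tree))
print(role_counts.most_common())
```

Unlike jq .children, which only pulls out the top-level children, the walk visits nested children too. Swapping the sample for json.load(sys.stdin) would let the script read piped output from shot-scraper accessibility.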
sqlite-utils documentation improvements
I complained on Twitter that the way type information was displayed in the Sphinx sqlite-utils API reference documentation was ugly.
Adam Johnson pointed me to the autodoc_typehints = "description" option which fixes this. I spent a while tidying up the documentation to work better with this, mainly by adding a whole bunch of :param name: description tags that I had previously omitted. That work happened in this issue. I think it looks much better now.
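As a rough illustration of the combination, here is a Sphinx conf.py fragment alongside a docstring written in that style. The insert_rows function is hypothetical and not part of sqlite-utils; only the autodoc_typehints setting and the :param name: description format come from the work described above.

```python
# docs/conf.py (fragment)
extensions = ["sphinx.ext.autodoc"]
# Move type hints out of the signature and into the parameter descriptions
autodoc_typehints = "description"


# Hypothetical function, not part of sqlite-utils, showing the docstring style
def insert_rows(table: str, rows: list, alter: bool = False) -> int:
    """
    Insert rows into a table and return how many were inserted.

    :param table: Name of the table to insert into
    :param rows: List of dictionaries, one per row
    :param alter: Add any missing columns automatically
    """
    # Placeholder body: a real implementation would write to the database
    return len(rows)
```

With this setting the rendered signature stays short, and each parameter's type appears next to its description in the parameter list.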
Releases this week
- image-diff: 0.2.1 (3 releases total), 2022-03-11
  CLI tool for comparing images
- sqlite-utils: 3.25.1 (98 releases total), 2022-03-11
  Python CLI utility and library for manipulating SQLite databases
- shot-scraper: 0.4 (5 releases total), 2022-03-10
  Automated website screenshots using GitHub Actions
- django-sql-dashboard: 1.0.2 (34 releases total), 2022-03-08
  Django app for building dashboards using raw SQL queries
- geojson-to-sqlite: 1.0 (8 releases total), 2022-03-04
  CLI tool for converting GeoJSON files to SQLite (with SpatiaLite)
- xml-analyser: 1.3 (4 releases total), 2022-03-01
  Simple command line tool for quickly analysing the structure of an arbitrary XML file
- datasette-dateutil: 0.3 (4 releases total), 2022-03-01
  dateutil functions for Datasette