Simon Willison’s Weblog

Subscribe

Weeknotes: datasette-auth-passwords, a Datasette logo and a whole lot more

17th July 2020

All sorts of project updates this week.

datasette-auth-passwords

Datasette 0.44 added authentication support as a core concept, but left the actual implementation details up to the plugins.

I released datasette-auth-passwords on Monday. It’s an implementation of the most obvious form of authentication (as opposed to GitHub SSO or bearer tokens or existing domain cookies): usernames and passwords, typed into a form.

Implementing passwords responsibly is actually pretty tricky, due to the need to effectively hash them. After some research I ended up mostly copying how Django does it (never a bad approach): I’m using 260,000 salted pbkdf2_hmac iterations, taking advantage of the Python standard library. I wrote this up in a TIL.

The plugin currently only supports hard-coded password hashes that are fed to Datasette via an environment variable—enough to set up a password-protected Datasette instance with a couple of users, but not really good for anything more complex than that. I have an open issue for implementing database-backed password accounts, although again the big challenge is figuring out how to responsible store those password hashes.

I’ve set up a live demo of the password plugin at datasette-auth-passwords-demo.datasette.io—you can sign into it to reveal a private database that’s only available to authenticated users.

Datasette website and logo

I’m finally making good progress on a website for Datasette. As part of that I’ve been learning to use Figma, which I used to create a Datasette logo.

Datasette

Figma is really neat: it’s an entirely web-based vector image editor, aimed at supporting the kind of design work that goes into websites and apps. It has full collaborative editing for teams but it’s free for single users. Most importantly it has extremely competent SVG exports.

I’ve added the logo to the latest version of the Datasette docs, and I have an open pull request to sphinx_rtd_theme to add support for setting a custom link target on the logo so I can link back to the rest of the official site, when it goes live.

TIL search snippet highlighting

My TIL site has a search engine, but it didn’t do snippet highlighting. I reused the pattern I described in Fast Autocomplete Search for Your Website—implemented server-side rather than client-side this time—to add that functionality. The implementation is here—here’s a demo of it in action.

SRCCON schedule

I’m attending (virtually) the SRCCON 2020 journalism conference this week, and Datasette is part of the Projects, Products, & Research track.

As a demo, I set up a Datasette powered copy of the conference schedule at srccon-2020.datasette.io—it’s running the datasette-ics plugin which means it can provide a URL that can be subscribed to in Google or Apple Calendar.

The site runs out of the simonw/srccon-2020-datasette repository, which uses a GitHub Action to download the schedule JSON, modify it a little (mainly to turn the start and end dates into ISO datestamps), save it to a SQLite database with sqlite-utils and publish it to Vercel.

Covid 19 population data

My Covid-19 tracker publishes updated numbers of cases and deaths from the New York Times, the LA Times and Johns Hopkins university on an hourly basis.

One thing that was missing was county population data. US counties are identified in the data by their FIPS codes, which offers a mechanism for joining against population estimates pulled from the US Census.

Thanks to Aaron King I’ve now incorporated that data into the site, as a new us_census_county_populations_2019 table.

I used that data to define a SQL view—latest_ny_times_counties_with_populations—which shows the latest New York Times county data with new derived cases_per_million and deaths_per_million columns.

Tweaks to this blog

For many years this blog’s main content has sat on the left of the page—which looks increasingly strange as screens get wider and wider. As of this commit the main layout is centered, which I think looks much nicer.

I also ran a data migration to fix some old internal links.

Miscellaneous

I gave a (virtual) talk at Django London on Monday about Datasette. I’ve taken to sharing a Google Doc for this kind of talk, which I prepare before the talk with notes and then update afterwards to reflect additional material from the Q&A. Here’s the document from Monday’s talk.

San Francisco Public Works maintain a page of tree removal notifications showing trees that are scheduled for removal. I like those trees. They don’t provide an archive of notifications from that page, so I’ve set up a git scraping GitHub repository that scrapes the page daily and maintains a history of its contents in the commit log.

I updated datasette-publish-fly for compatibility with Datasette 0.44 and Python 3.6.

I made a few tweaks to my GitHub profile README, which is now Apache 2 licensed so people know they can adapt it for their own purposes.

I released github-to-sqlite 2.3 with a new option for fetching information for just specific repositories.

The Develomentor podcast published an interview with me about my career, and how it’s been mostly defined by side-projects.

TIL this week