Simon Willison’s Weblog

Items in 2020

Filters: Year: 2020 ×


Datasette 0.45: The annotated release notes

Datasette 0.45, out today, features magic parameters for canned queries, a log out feature, improved plugin documentation and four new plugin hooks.

[... 863 words]

entr: rerun your build when files change. “WHY DID NOBODY TELL ME ABOUT THIS BEFORE?!?!” is one of my favourite genres of blog post. # 1st July 2020, 3:58 pm

Unlocking value with durable teams (via) Anna Shipman describes the FT’s experience switching from project-based teams to “durable” teams—teams which own a specific area of the product. Lots of really smart organizational design thinking in this. I’ve seen how much of a difference it makes to have every inch of a complex system “owned” by a specific team. I also like how Anna uses the term “technical estate” to describe the entirety of the FT’s systems. # 29th June 2020, 9:33 pm

Data Science is a lot like Harry Potter, except there’s no magic, it’s just math, and instead of a sorting hat you just sort the data with a Python script.

GPT-3, shepherded by Max Woolf # 29th June 2020, 4:45 am

Here’s a common piece of advice from people who create things: to make better things, make more things. Not only does it give you constant practice at making things, but it gives you more chances at lucking into making a good thing.

Ned Batchelder # 28th June 2020, 2:29 pm

Reducing search indexing latency to one second. Really detailed dive into the nuts and bolts of Twitter’s latest iteration of search indexing technology, including a great explanation of skip lists. # 26th June 2020, 5:06 pm

Weeknotes: cookiecutter templates, better plugin documentation, sqlite-generate

I spent this week spreading myself between a bunch of smaller projects, and finally getting familiar with cookiecutter. I wrote about my datasette-plugin cookiecutter template earlier in the week; here’s what else I’ve been working on.

[... 703 words]

How CDNs Generate Certificates. Thomas Ptacek (now at Fly) describes in intricate detail the challenges faced by large-scale hosting providers that want to securely issue LetsEncrypt certificates for customer domains. Lots of detail here on the different ACME challenges supported by LetsEncrypt and why the new tls-alpn-01 challenge is the right option for operating at scale. # 26th June 2020, 12:03 am

If you have to choose between engineering and ML, choose engineering. It’s easier for great engineers to pick up ML knowledge, but it’s a lot harder for ML experts to become great engineers.

Chip Huyen # 24th June 2020, 5:24 am

datasette-block-robots. Another little Datasette plugin: this one adds a /robots.txt page with “Disallow: /” to block all indexing of a Datasette instance from respectable search engine crawlers. I built this in less than ten minutes from idea to deploy to PyPI thanks to the datasette-plugin cookiecutter template. # 23rd June 2020, 3:28 am

click-app. While working on sqlite-generate today I built a cookiecutter template for building the skeleton for Click command-line utilities. It’s based on datasette-plugin so it automatically sets up GitHub Actions for running tests and deploying packages to PyPI. # 23rd June 2020, 2:21 am

sqlite-generate (via) I wrote this tool today to generate arbitrarily large SQLite databases, for testing purposes. You tell it how many tables, columns and rows you want and it will use the Faker Python library to generate random data and populate the tables with it. # 23rd June 2020, 2:19 am

A cookiecutter template for writing Datasette plugins

Datasette’s plugin system is one of the most interesting parts of the entire project. As I explained to Matt Asay in this interview, the great thing about plugins is that Datasette can gain new functionality overnight without me even having to review a pull request. I just need to get more people to write them!

[... 914 words]

Without touching upon the question of who’s right and who’s wrong in the specific case of Basecamp’s Hey app, or the broader questions of what, if anything, ought to change in Apple’s App Store policies, an undeniable and important undercurrent to this story is that the business model policies of the App Store have resulted in a tremendous amount of resentment. This spans the entire gamut from one-person indies all the way up to the handful of large corporations that can be considered Apple’s peers or near-peers.

John Gruber # 20th June 2020, 5:48 am

Weeknotes: Datasette alphas for testing new plugin hooks

A relatively quiet week this week, compared to last week’s massive push to ship Datasette 0.44 with authentication, permissions and writable canned queries. I can now ship alpha releases, such as today’s Datasette 0.45a1, which means I can preview new plugin features before they are completely ready and stable.

[... 728 words]

Datasette: A Developer, a Shower and a Data-Inspired Moment (via) Matt Asay interviewed me over Zoom last month. This captures a lot of my thinking around open source really well: “Datasette is aggressively open source for a bunch of reasons. Most of them are very selfish reasons.” # 18th June 2020, 11:32 pm

Refactoring optional chaining into a large codebase: lessons learned (via) JavaScript now supports foo?.bar?.baz?.() optional chaining syntax across all major browsers. Lea Verou provides the definitive guide to using it to refactor code. # 18th June 2020, 3:23 pm

Happy Birthday Sea Lions! (via) Today, June 15th, is Sea Lion birthday—half of all California Sea Lions are born today thanks to clever co-ordinated delayed implantation by Sea Lion females. Natalie has started making nature videos and I’ve been tagging along as her camera-person—this three minute video, shot at Pier 39 in San Francisco, celebrates Sea Lion birthday and explains how it works. # 15th June 2020, 7:08 pm

How much can you learn from just two columns?

Derek Willis shared an intriguing dataset this morning: a table showing every Twitter account followed by an official GOP congressional Twitter account.

[... 951 words]

Datasette 0.44: The annotated release notes

I just released Datasette 0.44 to PyPI. With 128 commits since 0.43 this is the biggest release in a long time—and likely the last major release of new features before Datasette 1.0.

[... 1648 words]

Tip for changing cookie subdomains: change the cookie name too. This is a really useful tip I hadn’t encountered before. If you make a change to the way cookies are configured—changing the cookie domain or path for example—it’s a good idea to change the name of the cookie as well. If you don’t change the cookie name you’ll see weird behaviour for users who have previously had the cookie set using the older configuration. This definitely explains bugs I’ve seen in the past. Filing this tip away for future cookie-related development work. # 9th June 2020, 6:41 pm

Apple password-manager-resources (via) Apple maintain on open source repository full of heuristics for implementing smart password managers. It lists password rules for different sites (e.g. min/max length, special characters required), change password URLs for different services and sites that share credential backends—like icloud.com and apple.com. They accept pull requests! # 9th June 2020, 4:21 am

A List of Hacker News’s Undocumented Features and Behaviors (via) If you’re interested in community software design this is a neat insight into the many undocumented features of Hacker News, collated by Max Woolf. # 6th June 2020, 5:36 pm

Weeknotes, I guess

What a week. Hard to work up the enthusiasm to write about what I’ve been working on.

[... 314 words]

Working Backwards: A New Version Of Amazon’s “Press Release” Approach To Plan Customer-Centric Projects (via) I’ve long wanted to give the Amazon “future press release” trick a go—start a project by writing the imaginary press release that would announce that project to the world, in order to focus on understanding what the project is for and how it will deliver value. Jeff Gothelf has put a lot of thought into this and constructed a thorough looking template for writing one of these that covers a number of different important project aspects. # 2nd June 2020, 3:54 pm

Get Started—Materialize. Materialize is a really interesting new database—“a streaming SQL materialized view engine”. It builds materialized views on top of streaming data sources (such as Kafka)—you define the view using a SQL query, then it figures out how to keep that view up-to-date automatically as new data streams in. It speaks the PostgreSQL protocol so you can talk to it using the psql tool or any PostgreSQL client library. The “get started” guide is particularly impressive: it uses a curl stream of the Wikipedia recent changes API, parsed using a regular expression. And it’s written in Rust, so installing it is as easy as downloading and executing a single binary (though I used Homebrew). # 1st June 2020, 10:11 pm

Practical Python Programming (via) David Beazley has been developing and presenting this three day Python course (aimed at people with some prior programming experience) for over thirteen years, and he’s just released the course materials under a Creative Commons license for the first time. # 29th May 2020, 1:15 pm

Deno is a Browser for Code (via) One of the most interesting ideas in Deno is that code imports are loaded directly from URLs—which can themselves depend on other URL-based packages. On first encounter it feels wrong—obviously insecure. Deno contributor Kitson Kelly provides a deeper exploration of the idea, and explains how the combination of caching and lock files makes it no less secure than code installed from npm or PyPI. # 29th May 2020, 2:36 am

Weeknotes: Datasette 0.43

My main achievement this week was shipping Datasette 0.43, with a collection of smaller improvements and one big one: a redesign of the register_output_renderer plugin hook.

[... 475 words]

Advice on specifying more granular permissions with Google Cloud IAM (via) My single biggest frustration working with both Google Cloud and AWS is permissions: more specifically, figuring out what the smallest set of permissions are that I need to assign in order to achieve different goals. Katie McLaughlin’s new series aims to address exactly that problem. I learned a ton from this that I’ve previously missed, and there’s plenty of actionable advice on tooling that can be used to help figure this stuff out. # 28th May 2020, 10:44 pm