Simon Willison’s Weblog

Subscribe

June 2020

June 1, 2020

Get Started—Materialize. Materialize is a really interesting new database—“a streaming SQL materialized view engine”. It builds materialized views on top of streaming data sources (such as Kafka)—you define the view using a SQL query, then it figures out how to keep that view up-to-date automatically as new data streams in. It speaks the PostgreSQL protocol so you can talk to it using the psql tool or any PostgreSQL client library. The “get started” guide is particularly impressive: it uses a curl stream of the Wikipedia recent changes API, parsed using a regular expression. And it’s written in Rust, so installing it is as easy as downloading and executing a single binary (though I used Homebrew).

# 10:11 pm / sql, rust, postgresql, databases, kafka

June 2, 2020

Working Backwards: A New Version Of Amazon’s “Press Release” Approach To Plan Customer-Centric Projects (via) I’ve long wanted to give the Amazon “future press release” trick a go—start a project by writing the imaginary press release that would announce that project to the world, in order to focus on understanding what the project is for and how it will deliver value. Jeff Gothelf has put a lot of thought into this and constructed a thorough looking template for writing one of these that covers a number of different important project aspects.

# 3:54 pm / product-management

June 4, 2020

Weeknotes, I guess

What a week. Hard to work up the enthusiasm to write about what I’ve been working on.

[... 314 words]

June 6, 2020

A List of Hacker News’s Undocumented Features and Behaviors (via) If you’re interested in community software design this is a neat insight into the many undocumented features of Hacker News, collated by Max Woolf.

# 5:36 pm / hacker-news, community, max-woolf

June 9, 2020

Apple password-manager-resources (via) Apple maintain on open source repository full of heuristics for implementing smart password managers. It lists password rules for different sites (e.g. min/max length, special characters required), change password URLs for different services and sites that share credential backends—like icloud.com and apple.com. They accept pull requests!

# 4:21 am / apple, passwords

Tip for changing cookie subdomains: change the cookie name too. This is a really useful tip I hadn’t encountered before. If you make a change to the way cookies are configured—changing the cookie domain or path for example—it’s a good idea to change the name of the cookie as well. If you don’t change the cookie name you’ll see weird behaviour for users who have previously had the cookie set using the older configuration. This definitely explains bugs I’ve seen in the past. Filing this tip away for future cookie-related development work.

# 6:41 pm / cookies

June 12, 2020

Datasette 0.44: The annotated release notes

I just released Datasette 0.44 to PyPI. With 128 commits since 0.43 this is the biggest release in a long time—and likely the last major release of new features before Datasette 1.0.

[... 1,648 words]

June 15, 2020

How much can you learn from just two columns?

Visit How much can you learn from just two columns?

Derek Willis shared an intriguing dataset this morning: a table showing every Twitter account followed by an official GOP congressional Twitter account.

[... 951 words]

Happy Birthday Sea Lions! (via) Today, June 15th, is Sea Lion birthday—half of all California Sea Lions are born today thanks to clever co-ordinated delayed implantation by Sea Lion females. Natalie has started making nature videos and I’ve been tagging along as her camera-person—this three minute video, shot at Pier 39 in San Francisco, celebrates Sea Lion birthday and explains how it works.

# 7:08 pm / youtube, wildlife, natalie-downe

June 18, 2020

Refactoring optional chaining into a large codebase: lessons learned (via) JavaScript now supports foo?.bar?.baz?.() optional chaining syntax across all major browsers. Lea Verou provides the definitive guide to using it to refactor code.

# 3:23 pm / javascript

Datasette: A Developer, a Shower and a Data-Inspired Moment (via) Matt Asay interviewed me over Zoom last month. This captures a lot of my thinking around open source really well: “Datasette is aggressively open source for a bunch of reasons. Most of them are very selfish reasons.”

# 11:32 pm / interview, open-source, datasette

June 19, 2020

Weeknotes: Datasette alphas for testing new plugin hooks

A relatively quiet week this week, compared to last week’s massive push to ship Datasette 0.44 with authentication, permissions and writable canned queries. I can now ship alpha releases, such as today’s Datasette 0.45a1, which means I can preview new plugin features before they are completely ready and stable.

[... 728 words]

June 20, 2020

Without touching upon the question of who’s right and who’s wrong in the specific case of Basecamp’s Hey app, or the broader questions of what, if anything, ought to change in Apple’s App Store policies, an undeniable and important undercurrent to this story is that the business model policies of the App Store have resulted in a tremendous amount of resentment. This spans the entire gamut from one-person indies all the way up to the handful of large corporations that can be considered Apple’s peers or near-peers.

John Gruber

# 5:48 am / basecamp, apple, appstore, john-gruber

A cookiecutter template for writing Datasette plugins

Datasette’s plugin system is one of the most interesting parts of the entire project. As I explained to Matt Asay in this interview, the great thing about plugins is that Datasette can gain new functionality overnight without me even having to review a pull request. I just need to get more people to write them!

[... 914 words]

June 23, 2020

sqlite-generate (via) I wrote this tool today to generate arbitrarily large SQLite databases, for testing purposes. You tell it how many tables, columns and rows you want and it will use the Faker Python library to generate random data and populate the tables with it.

# 2:19 am / projects, sqlite

click-app. While working on sqlite-generate today I built a cookiecutter template for building the skeleton for Click command-line utilities. It’s based on datasette-plugin so it automatically sets up GitHub Actions for running tests and deploying packages to PyPI.

# 2:21 am / cookiecutter, projects, python

datasette-block-robots. Another little Datasette plugin: this one adds a /robots.txt page with Disallow: / to block all indexing of a Datasette instance from respectable search engine crawlers. I built this in less than ten minutes from idea to deploy to PyPI thanks to the datasette-plugin cookiecutter template.

# 3:28 am / projects, robots-txt, plugins, seo, datasette

June 24, 2020

If you have to choose between engineering and ML, choose engineering. It’s easier for great engineers to pick up ML knowledge, but it’s a lot harder for ML experts to become great engineers.

Chip Huyen

# 5:24 am / machine-learning, careers

June 26, 2020

How CDNs Generate Certificates. Thomas Ptacek (now at Fly) describes in intricate detail the challenges faced by large-scale hosting providers that want to securely issue LetsEncrypt certificates for customer domains. Lots of detail here on the different ACME challenges supported by LetsEncrypt and why the new tls-alpn-01 challenge is the right option for operating at scale.

# 12:03 am / tls, fly, thomas-ptacek, acme, certificates

Weeknotes: cookiecutter templates, better plugin documentation, sqlite-generate

I spent this week spreading myself between a bunch of smaller projects, and finally getting familiar with cookiecutter. I wrote about my datasette-plugin cookiecutter template earlier in the week; here’s what else I’ve been working on.

[... 703 words]

Reducing search indexing latency to one second. Really detailed dive into the nuts and bolts of Twitter’s latest iteration of search indexing technology, including a great explanation of skip lists.

# 5:06 pm / twitter, data-structures, scaling, search, lucene

June 28, 2020

Here’s a common piece of advice from people who create things: to make better things, make more things. Not only does it give you constant practice at making things, but it gives you more chances at lucking into making a good thing.

Ned Batchelder

# 2:29 pm / ned-batchelder

June 29, 2020

Data Science is a lot like Harry Potter, except there's no magic, it's just math, and instead of a sorting hat you just sort the data with a Python script.

GPT-3, shepherded by Max Woolf

# 4:45 am / machine-learning, data-science, max-woolf

Unlocking value with durable teams (via) Anna Shipman describes the FT’s experience switching from project-based teams to “durable” teams—teams which own a specific area of the product. Lots of really smart organizational design thinking in this. I’ve seen how much of a difference it makes to have every inch of a complex system “owned” by a specific team. I also like how Anna uses the term “technical estate” to describe the entirety of the FT’s systems.

# 9:33 pm / annashipman, management

2020 » June

MTWTFSS
1234567
891011121314
15161718192021
22232425262728
2930