Simon Willison’s Weblog

Items in 2020

Filters: Year: 2020 ×


Weeknotes: Datasette Writes

As discussed previously, the biggest hole in Datasette’s feature set at the moment involves writing to the database.

[... 604 words]

I’ve really come to appreciate that performance isn’t just some property of a tool independent from its functionality or its feature set. Performance — in particular, being notably fast — is a feature in and of its own right, which fundamentally alters how a tool is used and perceived.

Nelson Elhage # 24th February 2020, 2:32 pm

Why Google invested in providing Google Fonts for free. Fascinating comment from former Google Fonts team member Raph Levien. In short: text rendered as PNGs hurt a Google Search, fonts were a delay in the transition from Flash, Google Docs needed them to better compete with Office and anything that helps create better ads is easy to find funding for. # 23rd February 2020, 2:13 pm

So next time someone is giving you feedback about something you made, think to yourself that to win means getting two or three insights, ideas, or suggestions that you are excited about, and that you couldn’t think up on your own.

Juliette Cezzar # 21st February 2020, 1:04 am

Things I learned about shapefiles building shapefile-to-sqlite

The latest in my series of x-to-sqlite tools is shapefile-to-sqlite. I learned a whole bunch of things about the ESRI shapefile format while building it.

[... 1073 words]

pup. This is a great idea: a command-line tool for parsing HTML on stdin using CSS selectors. It’s like jq but for HTML. Supports a sensible collection of selectors and has a number of output options for the selected nodes, including plain text and JSON. It also works as a simple pretty-printer for HTML. # 14th February 2020, 4:25 pm

A group of software engineers gathered around a whiteboard are a joint cognitive system. The scrawls on the board are spatial cues for building a shared model of a complex system.

Eric Dobbs # 13th February 2020, 6:48 pm

How to cheat at unit tests with pytest and Black

I’ve been making a lot of progress on Datasette Cloud this week. As an application that provides private hosted Datasette instances (initially targeted at data journalists and newsrooms) the majority of the code I’ve written deals with permissions: allowing people to form teams, invite team members, promote and demote team administrators and suchlike.

[... 885 words]

We write a lot of JavaScript at Basecamp, but we don’t use it to create “JavaScript applications” in the contemporary sense. All our applications have server-side rendered HTML at their core, then add sprinkles of JavaScript to make them sparkle. [...] It allows us to party with productivity like days of yore. A throwback to when a single programmer could make rapacious progress without getting stuck in layers of indirection or distributed systems. A time before everyone thought the holy grail was to confine their server-side application to producing JSON for a JavaScript-based client application.

David Heinemeier Hansson # 8th February 2020, 8:10 am

Deep learning isn’t hard anymore. This article does a great job of explaining how transfer learning is unlocking a new wave of innovation around deep learning. Previously if you wanted to train a model you needed vast amounts if data and thousands of dollars of compute time. Thanks to transfer learning you can now take an existing model (such as GPT2) and train something useful on top of it that’s specific to a new domain in just minutes it hours, with only a few hundred or a few thousand new labeled samples. # 7th February 2020, 8:47 am

Weeknotes: Shaving yaks for Datasette Cloud

I’ve been shaving a lot of yaks, but I’m finally ready to for other people to start kicking the tires on the MVP of Datasette Cloud.

[... 392 words]

geojson-to-sqlite (via) I just put out the first release of geojson-to-sqlite—a CLI tool that can convert GeoJSON files (consisting of a Feature or a set of features in a FeatureCollection) into a table in a SQLite database. If you use the --spatialite option it will initalize the table with SpatiaLite and store the geometries in a spacially indexed geometry field—without that option it stores them as GeoJSON. # 31st January 2020, 6:40 am

I used to tolerate and expect complexity. Working on Go the past 10 years has changed my perspective, though. I now value simplicity above almost all else and tolerate complexity only when it’s well isolated, well documented, well tested, and necessary to make things simpler overall at other layers for most people.

Brad Fitzpatrick # 30th January 2020, 5:17 pm

Weeknotes: datasette-auth-existing-cookies and datasette-sentry

Work on Datasette Cloud continues—I’m tantalizingly close to having a MVP I can start to invite people to try out.

[... 701 words]

2020 Web Milestones (via) A lot of stuff is happening in 2020! Mike Sherov rounds it up—highlights include the release of Chromium Edge (Microsoft’s Chrome-powered browser for Windows 7+), Web Components supported in every major browser, Deno 1.x, SameSite Cookies turned on by default (which should dramatically reduce CSRF exposure) and Python 2 and Flash EOLs. # 24th January 2020, 4:43 am

Generated Columns in SQLite (via) SQLite 3.31.0 released today, and generated columns are the single most notable new feature. PostgreSQL 12 added these in October 2019, and MySQL has had them since 5.7 in October 2015. MySQL and SQLite both offer either “stored” or “virtual” generated columns, with virtual columns being calculated at runtime. PostgreSQL currently only supports stored columns. # 24th January 2020, 4:20 am

Weeknotes: Datasette Cloud and zero downtime deployments

Yesterday’s piece on Deploying a data API using GitHub Actions and Cloud Run was originally intended to be my weeknotes, but ended up getting a bit too involved.

[... 1428 words]

Deploying a data API using GitHub Actions and Cloud Run

I’m using the combination of GitHub Actions and Google Cloud Run to retrieve data from the U.S. Department of Justice FARA website and deploy it as a queryable API using Datasette.

[... 1598 words]

Ray Bandar’s Bone Palace (via) The hundredth museum I’ve posted to Niche Museums, and this one is my absolute favourite. Ray Bandar spent sixty years collecting over 7,000 animal skulls and putting them on display in his house and basement. Natalie managed to score us a tour a few weeks before the entire collection was donated to the California Academy of Sciences. It was the most amazing room I’ve ever been in. # 18th January 2020, 7:05 am

Code is made of pain, lies, and bad ideas, all so we can pretend that electrified sand will do what we tell it to

Yoz Grahame # 18th January 2020, 7:04 am

Your own hosted blog, the easy, free, open way (even if you’re not a computer expert) (via) Jeremy Howard and the fast.ai team have released fast_template—a GitHub repository designed to be used as a template to create new repositories with a complete Jekyll blog configured for use with GitHub pages. GitHub’s official document recommends you install Ruby on your machine to do this, but Jeremy points out that with the right repository setup you can run a blog entirely by editing files through the GitHub web interface. # 17th January 2020, 1:12 am

How to do Zero Downtime Deployments of Docker Containers. I’m determined to get reliable zero-downtime deploys working for a new project, because I know from experience that even a few seconds of downtime during a deploy changes the project mentality from “deploy any time you want” to “don’t deploy too often”. I’m using Docker containers behind Traefik, which means new containers should have traffic automatically balanced to them by Traefik based on their labels. After much fiddling around the pattern described by this article worked best for me: it lets me start a new container, then stop the old one and have Traefik’s “retry” mechanism send any requests to the stopped container over to the new one instead. # 16th January 2020, 11:12 pm

Demos, Prototypes, and MVPs (via) I really like how Jacob describes the difference between a demo and a prototype: a demo is externally facing and helps explain a concept to a customer; a prototype is internally facing and helps prove that something can be built. # 16th January 2020, 8:24 pm

There is enough wood; Green estimates that it takes about 13 minutes for 20 North American forests to collectively grow enough wood for a 20-story building

David Roberts # 15th January 2020, 4:03 pm

Weeknotes: Improv at Stanford, planning Datasette Cloud

Last week was the first week of the quarter at Stanford—which is called “shopping week” here because students are expected to try different classes to see which ones they are going to stick with.

[... 806 words]

Serving 100µs reads with 100% availability (via) Fascinating use-case for SQLite from Segment: they needed a massively replicated configuration database across all of their instances that process streaming data. They chose to make the configuration available as a ~50GB SQLite database file mirrored to every instance, meaning lookups against that data could complete in microseconds. Changes to the central MySQL configuration store are pulled every 2-3 seconds, resulting in a trade-off of consistency for availability which fits their use-case just fine. # 10th January 2020, 5:15 am

A visual introduction to machine learning. Beautiful interactive essay explaining how a decision tree machine learning module is constructed, and using that to illustrate the concept of overfitting. This is one of the best explanations of machine learning fundamentals I’ve seen anywhere. # 10th January 2020, 5:12 am

Snowpack (via) Really interesting new twist on build systems for JavaScript. Modern browsers (everything since IE11) support JavaScript modules, but actually working with them is tricky since so much of the JavaScript ecosystem expects you to be using a bundler like Webpack. Snowpack is a tool for converting npm dependencies into JavaScript modules which can then be loaded directly by the browser, taking advantage of HTTP/2 to efficiently load the resulting larger number of files. # 10th January 2020, 5:06 am

Portable Cloud Functions with the Python Functions Framework (via) The new functions-framework library on PyPI lets you run Google Cloud Functions written in Python in other environments—on your local developer machine or bundled in a Docker container for example. I have real trouble trusting serverless platforms that lock you into a single provider (AWS Lambda makes me very uncomfortable) so this is a breath of fresh air. # 10th January 2020, 4:58 am

Async Support—HTTPX (via) HTTPX is the new async-friendly HTTP library for Python spearheaded by Tom Christie. It works in both async and non-async mode with an API very similar to requests. The async support is particularly interesting—it’s a really clean API, and now that Jupyter supports top-level await you can run ’(await httpx.AsyncClient().get(url)).text’ directly in a cell and get back the response. Most excitingly the library lets you pass an ASGI app directly to the client and then perform requests against it—ideal for unit tests. # 10th January 2020, 4:49 am