Simon Willison’s Weblog

Items in Jan, 2020

Filters: Year: 2020 × Month: Jan ×

geojson-to-sqlite (via) I just put out the first release of geojson-to-sqlite—a CLI tool that can convert GeoJSON files (consisting of a Feature or a set of features in a FeatureCollection) into a table in a SQLite database. If you use the --spatialite option it will initalize the table with SpatiaLite and store the geometries in a spacially indexed geometry field—without that option it stores them as GeoJSON. # 31st January 2020, 6:40 am

I used to tolerate and expect complexity. Working on Go the past 10 years has changed my perspective, though. I now value simplicity above almost all else and tolerate complexity only when it’s well isolated, well documented, well tested, and necessary to make things simpler overall at other layers for most people.

Brad Fitzpatrick # 30th January 2020, 5:17 pm

Weeknotes: datasette-auth-existing-cookies and datasette-sentry

Work on Datasette Cloud continues—I’m tantalizingly close to having a MVP I can start to invite people to try out.

[... 701 words]

2020 Web Milestones (via) A lot of stuff is happening in 2020! Mike Sherov rounds it up—highlights include the release of Chromium Edge (Microsoft’s Chrome-powered browser for Windows 7+), Web Components supported in every major browser, Deno 1.x, SameSite Cookies turned on by default (which should dramatically reduce CSRF exposure) and Python 2 and Flash EOLs. # 24th January 2020, 4:43 am

Generated Columns in SQLite (via) SQLite 3.31.0 released today, and generated columns are the single most notable new feature. PostgreSQL 12 added these in October 2019, and MySQL has had them since 5.7 in October 2015. MySQL and SQLite both offer either “stored” or “virtual” generated columns, with virtual columns being calculated at runtime. PostgreSQL currently only supports stored columns. # 24th January 2020, 4:20 am

Weeknotes: Datasette Cloud and zero downtime deployments

Yesterday’s piece on Deploying a data API using GitHub Actions and Cloud Run was originally intended to be my weeknotes, but ended up getting a bit too involved.

[... 1428 words]

Tracking FARA by deploying a data API using GitHub Actions and Cloud Run

I’m using the combination of GitHub Actions and Google Cloud Run to retrieve data from the U.S. Department of Justice FARA website and deploy it as a queryable API using Datasette.

[... 1599 words]

Ray Bandar’s Bone Palace (via) The hundredth museum I’ve posted to Niche Museums, and this one is my absolute favourite. Ray Bandar spent sixty years collecting over 7,000 animal skulls and putting them on display in his house and basement. Natalie managed to score us a tour a few weeks before the entire collection was donated to the California Academy of Sciences. It was the most amazing room I’ve ever been in. # 18th January 2020, 7:05 am

Code is made of pain, lies, and bad ideas, all so we can pretend that electrified sand will do what we tell it to

Yoz Grahame # 18th January 2020, 7:04 am

Your own hosted blog, the easy, free, open way (even if you’re not a computer expert) (via) Jeremy Howard and the team have released fast_template—a GitHub repository designed to be used as a template to create new repositories with a complete Jekyll blog configured for use with GitHub pages. GitHub’s official document recommends you install Ruby on your machine to do this, but Jeremy points out that with the right repository setup you can run a blog entirely by editing files through the GitHub web interface. # 17th January 2020, 1:12 am

How to do Zero Downtime Deployments of Docker Containers. I’m determined to get reliable zero-downtime deploys working for a new project, because I know from experience that even a few seconds of downtime during a deploy changes the project mentality from “deploy any time you want” to “don’t deploy too often”. I’m using Docker containers behind Traefik, which means new containers should have traffic automatically balanced to them by Traefik based on their labels. After much fiddling around the pattern described by this article worked best for me: it lets me start a new container, then stop the old one and have Traefik’s “retry” mechanism send any requests to the stopped container over to the new one instead. # 16th January 2020, 11:12 pm

Demos, Prototypes, and MVPs (via) I really like how Jacob describes the difference between a demo and a prototype: a demo is externally facing and helps explain a concept to a customer; a prototype is internally facing and helps prove that something can be built. # 16th January 2020, 8:24 pm

There is enough wood; Green estimates that it takes about 13 minutes for 20 North American forests to collectively grow enough wood for a 20-story building

David Roberts # 15th January 2020, 4:03 pm

Weeknotes: Improv at Stanford, planning Datasette Cloud

Last week was the first week of the quarter at Stanford—which is called “shopping week” here because students are expected to try different classes to see which ones they are going to stick with.

[... 806 words]

Serving 100µs reads with 100% availability (via) Fascinating use-case for SQLite from Segment: they needed a massively replicated configuration database across all of their instances that process streaming data. They chose to make the configuration available as a ~50GB SQLite database file mirrored to every instance, meaning lookups against that data could complete in microseconds. Changes to the central MySQL configuration store are pulled every 2-3 seconds, resulting in a trade-off of consistency for availability which fits their use-case just fine. # 10th January 2020, 5:15 am

A visual introduction to machine learning. Beautiful interactive essay explaining how a decision tree machine learning module is constructed, and using that to illustrate the concept of overfitting. This is one of the best explanations of machine learning fundamentals I’ve seen anywhere. # 10th January 2020, 5:12 am

Snowpack (via) Really interesting new twist on build systems for JavaScript. Modern browsers (everything since IE11) support JavaScript modules, but actually working with them is tricky since so much of the JavaScript ecosystem expects you to be using a bundler like Webpack. Snowpack is a tool for converting npm dependencies into JavaScript modules which can then be loaded directly by the browser, taking advantage of HTTP/2 to efficiently load the resulting larger number of files. # 10th January 2020, 5:06 am

Portable Cloud Functions with the Python Functions Framework (via) The new functions-framework library on PyPI lets you run Google Cloud Functions written in Python in other environments—on your local developer machine or bundled in a Docker container for example. I have real trouble trusting serverless platforms that lock you into a single provider (AWS Lambda makes me very uncomfortable) so this is a breath of fresh air. # 10th January 2020, 4:58 am

Async Support—HTTPX (via) HTTPX is the new async-friendly HTTP library for Python spearheaded by Tom Christie. It works in both async and non-async mode with an API very similar to requests. The async support is particularly interesting—it’s a really clean API, and now that Jupyter supports top-level await you can run ’(await httpx.AsyncClient().get(url)).text’ directly in a cell and get back the response. Most excitingly the library lets you pass an ASGI app directly to the client and then perform requests against it—ideal for unit tests. # 10th January 2020, 4:49 am

I’ve found, in my 20 years of running the site, that whenever you ban an ironic Nazi, suddenly they become actual Nazis

Richard “Lowtax” Kyanka # 8th January 2020, 4:11 pm

A Compiler Writing Journey (via) Warren Toomey has been writing a self-compiling compiler for a subset of C, and extensively documenting every step of the journey here on GitHub. The result is an extremely high quality free textbook on compiler construction. # 8th January 2020, 3:33 am

Better Python Object Serialization. TIL about functions.singledispatch, a decorator which makes it easy to create Python functions with implementations that vary based on the type of their arguments and which can have additional implementations registered after the fact—great for things like custom JSON serialization. # 7th January 2020, 8:35 pm

Building a sitemap.xml with a one-off Datasette plugin

One of the fun things about launching a new website is re-learning what it takes to promote a website from scratch on the modern web. I’ve been thoroughly enjoying using Niche Museums as an excuse to explore 2020-era SEO.

[... 1078 words]

Come version 80, any cookie without a SameSite attribute will be treated as “Lax” by Chrome. This is really important to understand because put simply, it’ll very likely break a bunch of stuff. [...] The fix is easy, all it needs is for everyone responsible for maintaining any system that uses cookies that might be passed from an external origin to understand what’s going on. Can’t be that hard, right? Hello? Oh...

Troy Hunt # 3rd January 2020, 4:22 pm

How the Digg team was acquihired. (via) Useful insight into how a talent acquisition can play out from Will Larson, who was an engineering leader at Digg when they negotiated their acquihire exit. # 3rd January 2020, 2:27 am

What do you call the parts of a story? Or: why can’t journalists spell “lead”? (via) Carl M. Johnson’s analysis of what journalists call different elements of a story, useful for data modeling a CMS for a news organization. # 3rd January 2020, 1:13 am

How we use “ship small” to rapidly build new features at GitHub (via) Useful insight into how GitHub develop new features. They make aggressive use of feature flags, shipping a rough skeleton of a new feature to production as early as possible and actively soliciting feedback from other employees as they iterate on the feature. They static JSON mocks of APIs to unblock their frontend engineers and iterate on the necessary data structures while the real backend is bring implemented. # 2nd January 2020, 4:30 am