Simon Willison’s Weblog

Subscribe

Items in Jul, 2021

Filters: Year: 2021 × Month: Jul × Sorted by date


Weeknotes: datasette-remote-metadata, sqlite-transform --multi

I mentioned Project Pelican (still a codename until the end of the embargo) last week. This week it inspired a new plugin, datasette-remote-metadata.

[... 595 words]

Clickhouse on Cloud Run (via) Alex Reid figured out how to run Clickhouse against read-only baked data on Cloud Run last year, and wrote up some comprehensive notes. # 29th July 2021, 6:07 am

The Baked Data architectural pattern

I’ve been exploring an architectural pattern for publishing websites over the past few years that I call the “Baked Data” pattern. It provides many of the advantages of static site generators while avoiding most of their limitations. I think it deserves to be used more widely.

[... 1896 words]

How the Python import system works (via) Remarkably detailed and thorough dissection of how exactly import, modules and packages work in Python—eventually digging right down into the C code. Part of Victor Skvortsov’s excellent “Python behind the scenes” series. # 24th July 2021, 8:12 pm

The Tyranny of Spreadsheets (via) In discussing the notorious Excel incident last year when the UK lost track of 16,000 Covid cases due to a .xls row limit, Tim Harford presents a history of the spreadsheet, dating all the way back to Francesco di Marco Datini and double-entry bookkeeping in 1396. A delightful piece of writing. # 23rd July 2021, 3:57 am

Datasette—an ecosystem of tools for working with small data

This is the transcript and video from a talk I gave at PyGotham 2020 about using SQLite, Datasette and Dogsheep to work with small data.

[... 4654 words]

Weeknotes: sqlite-transform 1.1, Datasette 0.58.1, datasette-graphql 1.5

Work on Project Pelican inspires new features and improvements across a number of different projects.

[... 1419 words]

Launch HN Instructions (via) The instructions for YC companies that are posting their launch announcement on Hacker News are really interesting to read. “As founders, you’re used to talking to users, customers, and investors. HN readers are not any of those—what they are is peers, and using any of those styles with peers feels clueless and entitled. [...] To interest HN, write in a factual, personal, and modest way about what problem you solve, why it matters, how you solve it, and how you got there.” # 19th July 2021, 1:05 am

toyDB: references. toyDB is a “distributed SQL database in Rust, written as a learning project”, with its own implementations of SQL, raft, ACID transactions, B+trees and more. toyDB author Erik Grinaker has assembled a detailed set of references that he used to learn how to build a database—I’d love to see more projects do this, it’s really useful. # 19th July 2021, 12:18 am

Inserting One Billion Rows in SQLite Under a Minute (via) Avinash Sajjanshetty experiments with accelerating writes to a test table in SQLite, using various SQLite pragmas to accelerate inserts followed by a rewrite of Python code to Rust. Also of note: running the exact same code in PyPy saw a 3.5x speed-up! # 19th July 2021, 12:13 am

Organize and Index Your Screenshots (OCR) on macOS (via) Alexandru Nedelcu has a very neat recipe for creating an archive of searchable screenshots on macOS: set the default save location for screenshots to a Dropbox folder, then create a launch agent that runs a script against new files in that folder to run tesseract OCR to convert them into a searchable PDF. # 18th July 2021, 4:11 pm

I’ve always believed that a book, even a technical book, should try to tell a cohesive story. The challenge is that as Python has grown in popularity, it has really turned into three different languages--each with their own story. There is a whimsical Python for scripting and tinkering, a quirky Python for asynchronous programming and networking, and a serious Python for enterprise applications. Sometimes these stories intersect. Sometimes not.

David Beazley # 18th July 2021, 2:53 pm

It doesn’t take much public creativity to stand out as a job candidate

I’ve spent nearly twenty years blogging, giving talks and releasing open source code. It’s been fantastic for my career, and a huge amount of work. But here’s a useful secret: you don’t have to put very much work at all into public creativity in order to stand out as a job candidate.

[... 495 words]

Datasette downloads per day (with Observable Plot) (via) I built an Observable notebook that imports PyPI package download data from datasette.io (itself scraped from pypistats.org using a scheduled GitHub Action) and plots it using Observable Plot. Datasette downloads from PyPI apparently jumped from ~800/day in May to ~4,000/day in July—would love to know why! # 17th July 2021, 5:01 pm

The Digital Antiquarian: Sam and Max Hit the Road. Delightful history and retrospective review of 1993’s Sam and Max Hit the Road. I didn’t know Sam and Max happened because the independent comic’s creator worked for LucasArts and the duo had embedded themselves in LucasArts culture through their use in the internal educational materials prepared for SCUMM University. # 17th July 2021, 3:12 am

Last Mile Redis (via) Fly.io article about running a local redis cache in each of their geographic regions—“Cache data overlaps a lot less than you assume it will. For the most part, people in Singapore will rely on a different subset of data than people in Amsterdam or São Paulo or New Jersey.” But then they note that Redis has the ability to act as both a replica of a primary AND a writable server at the same time (“replica-read-only no”), which actually makes sense for a cache—it lets you cache local data but send out cluster-wide cache purges if necessary. # 17th July 2021, 2:44 am

The Untold Story of SQLite With Richard Hipp. This is a really interesting interview with SQLite creator D. Richard Hipp—it covers all sorts of aspects of the SQLite story I hadn’t heard before, from its inspiration by a software challenge on a battleship to the first income from clients such as AOL and Symbian to the formation of the SQLite Consortium (based on advice from Mozilla’s Mitchell Baker) and more. # 16th July 2021, 8:12 pm

Datasette 0.58: The annotated release notes

I released Datasette 0.58 last night, with new plugin hooks, Unix domain socket support, a major faceting performance fix and a few other improvements. Here are the annotated release notes.

[... 1062 words]

Beginning in M94, Chrome will offer HTTPS-First Mode, which will attempt to upgrade all page loads to HTTPS and display a full-page warning before loading sites that don’t support it. Users who enable this mode gain confidence that Chrome is connecting them to sites over HTTPS whenever possible, and that they will see a warning before connecting to sites over HTTP. Based on ecosystem feedback, we’ll explore making HTTPS-First mode the default for all users in the future.

Chromium Blog # 14th July 2021, 7:14 pm

Dropbox: Sharing our Engineering Career Framework with the world (via) Dropbox have published their engineering career framework, with detailed descriptions of the different levels of the engineering (as opposed to management) career track and what is expected for each one. I’m fascinated by how different companies handle the challenge of keeping career progression working for engineers without pushing them into people management, and this as a particularly detailed and well thought-out implementation of that. # 13th July 2021, 11:31 pm

RabbitMQ Streams Overview. New in RabbitMQ 3.9: streams are a persisted, replicated append-only log with non-destructive consuming semantics. Sounds like it fits the same hole as Kafka and Redis Streams, an extremely useful pattern. # 13th July 2021, 11:29 pm

Weeknotes: Fun with Unix domain sockets

A small enhancement to Datasette this week: I’ve added support for proxying via Unix domain sockets.

[... 809 words]

Behind the scenes, AWS Lambda (via) Bruno Schaatsbergen pulled together details about how AWS Lambda works under the hood from a detailed review of the AWS documentation, the Firecracker paper and various talks at AWS re:Invent. # 10th July 2021, 7:40 pm

The data team: a short story (via) Erik Bernhardsson’s fictional account (“I guess I should really call this a parable”) of a new data team leader successfully growing their team and building a data-first culture in a medium-sized technology company. His depiction of the initial state of the company (data in many different places, frustrated ML researchers who can’t get their research into production, confusion over what the data team is actually for) definitely rings true to me. # 8th July 2021, 11:12 pm

Probably Are Gonna Need It: Application Security Edition (via) Jacob Kaplan-Moss shares his PAGNIs for application security: “basic security mitigations that are easy to do at the beginning, but get progressively harder the longer you put them off”. Plenty to think about in here—I particularly like Jacob’s recommendation to build a production-to-staging database mirroring solution that works from an allow-list of columns, to avoid the risk of accidentally exposing new private data as the product continues to evolve. # 8th July 2021, 6:31 pm

Temporal: getting started with JavaScript’s new date time API. Axel Rauschmayer explores the new proposed API for handling dates, times and timezones in JavaScript., which is under development by Ecma TC39 at the moment (and made available as a Polyfill which you are recommended not to run in production since the API is still being figured out). This is a notoriously difficult problem so it’s always interesting to see the latest thinking on how to best address it. # 7th July 2021, 10:29 pm

Django SQL Dashboard 1.0

Earlier this week I released Django SQL Dashboard 1.0. I also just released 1.0.1, with a bug fix for PostgreSQL 10 contributed by Ryan Cheley.

[... 629 words]

The art of asking nicely (via) CLIP+VQGAN Is a GAN that generates images based on some text input—you can run it on Google Collab notebooks, there are instructions linked at the bottom of this post. Janelle Shane of AI Weirdness explores tricks for getting the best results out of it for “a herd of sheep grazing on a lush green hillside”—various modifiers like “amazing awesome and epic” produce better images, but the one with the biggest impact, quite upsettingly, is “ultra high definition free desktop wallpaper”. # 2nd July 2021, 3:02 pm

Smooth sailing with Kubernetes (via) Scott McCloud (of Understanding Comics) authored this comic introduction to Kubernetes, and it’s a really good explanation of the core concepts. I’d love to have something like this for Datasette—I still feel like I’m a long way from being able to explain the project with anything like this amount of clarity. # 1st July 2021, 11:30 pm

PAGNIs: Probably Are Gonna Need Its

Luke Page has a great post up with his list of YAGNI exceptions.

[... 1289 words]