Simon Willison’s Weblog

July 2021

July 1, 2021

Django SQL Dashboard 1.0 (via) As part of my ongoing attempt to be braver about 1.0 releases (crucial if you want to do semantic versioning properly) I’ve released version 1.0 of Django SQL Dashboard, my Datasette-inspired app for Django that adds an interface for running read-only, bookmarkable SQL queries against a PostgreSQL database. The new version adds a column cog menu providing shortcuts for changing the sort order, counting distinct values and performing a group-by/count against column values.

# 5:44 pm / projects, django-sql-dashboard, sql, django

YAGNI exceptions (via) Luke Plant provides his collection of things that you probably ARE going to need in a project, where adding them later is painful enough that it’s worth the up-front investment. I really like these as a concept, and I’m coining the term PAGNI—for Probably Are Gonna Need It—to describe them.

# 6:30 pm / pagni, software-engineering, yagni, lukeplant

PAGNIs: Probably Are Gonna Need Its

Luke Plant has a great post up with his list of YAGNI exceptions.

[... 1,289 words]

Smooth sailing with Kubernetes (via) Scott McCloud (of Understanding Comics) authored this comic introduction to Kubernetes, and it’s a really good explanation of the core concepts. I’d love to have something like this for Datasette—I still feel like I’m a long way from being able to explain the project with anything like this amount of clarity.

# 11:30 pm / kubernetes, comics

July 2, 2021

The art of asking nicely (via) CLIP+VQGAN is a GAN that generates images based on some text input. You can run it in Google Colab notebooks; there are instructions linked at the bottom of this post. Janelle Shane of AI Weirdness explores tricks for getting the best results out of it for “a herd of sheep grazing on a lush green hillside”. Various modifiers like “amazing awesome and epic” produce better images, but the one with the biggest impact, quite upsettingly, is “ultra high definition free desktop wallpaper”.

# 3:02 pm / machine-learning, ai

July 6, 2021

Django SQL Dashboard 1.0

Earlier this week I released Django SQL Dashboard 1.0. I also just released 1.0.1, with a bug fix for PostgreSQL 10 contributed by Ryan Cheley.

[... 629 words]

July 7, 2021

Temporal: getting started with JavaScript’s new date time API. Axel Rauschmayer explores the proposed new API for handling dates, times and timezones in JavaScript, which is under development by Ecma TC39 at the moment (and is available as a polyfill, which you are advised not to run in production since the API is still being figured out). This is a notoriously difficult problem so it’s always interesting to see the latest thinking on how to best address it.

# 10:29 pm / javascript, timezones, datetime

July 8, 2021

Probably Are Gonna Need It: Application Security Edition (via) Jacob Kaplan-Moss shares his PAGNIs for application security: “basic security mitigations that are easy to do at the beginning, but get progressively harder the longer you put them off”. Plenty to think about in here—I particularly like Jacob’s recommendation to build a production-to-staging database mirroring solution that works from an allow-list of columns, to avoid the risk of accidentally exposing new private data as the product continues to evolve.
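
A minimal sketch of the allow-list idea, assuming hypothetical table and column names (`users`, `orders` and their columns are invented for illustration, not taken from Jacob’s post). The point is that a column added to production later is never copied to staging until someone explicitly lists it:

```python
# Hypothetical allow-list: only these columns may ever be copied to staging.
ALLOWED_COLUMNS = {
    "users": ["id", "created_at", "plan"],
    "orders": ["id", "user_id", "total_cents"],
}


def build_copy_query(table, allow_list=ALLOWED_COLUMNS):
    """Return a SELECT that reads only the allow-listed columns of a table."""
    if table not in allow_list:
        # Tables that are not listed are never mirrored at all.
        raise KeyError(f"table {table!r} is not on the allow-list")
    columns = ", ".join(f'"{name}"' for name in allow_list[table])
    return f'SELECT {columns} FROM "{table}"'


print(build_copy_query("users"))
```

A deny-list would fail in the opposite direction: a new private column would be mirrored by default until someone remembered to exclude it.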

# 6:31 pm / jacob-kaplan-moss, security, pagni

The data team: a short story (via) Erik Bernhardsson’s fictional account (“I guess I should really call this a parable”) of a new data team leader successfully growing their team and building a data-first culture in a medium-sized technology company. His depiction of the initial state of the company (data in many different places, frustrated ML researchers who can’t get their research into production, confusion over what the data team is actually for) definitely rings true to me.

# 11:12 pm / data-science, data, leadership

July 10, 2021

Behind the scenes, AWS Lambda (via) Bruno Schaatsbergen pulled together details about how AWS Lambda works under the hood from a detailed review of the AWS documentation, the Firecracker paper and various talks at AWS re:Invent.

# 7:40 pm / aws, software-architecture, lambda, firecracker

July 13, 2021

Weeknotes: Fun with Unix domain sockets

A small enhancement to Datasette this week: I’ve added support for proxying via Unix domain sockets.

[... 809 words]
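
For readers unfamiliar with Unix domain sockets, here is a rough standard-library sketch of a client talking HTTP to a server over a socket file instead of a TCP port. It is not Datasette’s implementation; the socket filename `demo.sock` and the function names are assumptions made up for this example, and it only runs on Unix-like systems.

```python
import os
import socket
import tempfile
import threading


def serve_once(path, ready):
    """Accept one connection on a Unix domain socket and send a tiny HTTP response."""
    with socket.socket(socket.AF_UNIX, socket.SOCK_STREAM) as server:
        server.bind(path)  # creates the socket file at `path`
        server.listen(1)
        ready.set()  # tell the client it is safe to connect
        conn, _ = server.accept()
        with conn:
            conn.recv(1024)  # read (and ignore) the request
            conn.sendall(b"HTTP/1.0 200 OK\r\nContent-Length: 2\r\n\r\nok")


def fetch_over_uds(path):
    """Send a minimal HTTP request over the socket file and return the raw response."""
    with socket.socket(socket.AF_UNIX, socket.SOCK_STREAM) as client:
        client.connect(path)
        client.sendall(b"GET / HTTP/1.0\r\n\r\n")
        chunks = []
        while True:
            data = client.recv(1024)
            if not data:  # server closed the connection
                break
            chunks.append(data)
        return b"".join(chunks)


def demo():
    with tempfile.TemporaryDirectory() as directory:
        path = os.path.join(directory, "demo.sock")
        ready = threading.Event()
        thread = threading.Thread(target=serve_once, args=(path, ready))
        thread.start()
        ready.wait()
        response = fetch_over_uds(path)
        thread.join()
        return response


if __name__ == "__main__":
    print(demo().decode())
```

A reverse proxy in front of an application does essentially what `fetch_over_uds` does here: it connects to a path on disk rather than to a host and port.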

RabbitMQ Streams Overview. New in RabbitMQ 3.9: streams are a persisted, replicated append-only log with non-destructive consuming semantics. Sounds like it fits the same hole as Kafka and Redis Streams, an extremely useful pattern.

# 11:29 pm / rabbitmq, redis, message-queues, kafka

Dropbox: Sharing our Engineering Career Framework with the world (via) Dropbox have published their engineering career framework, with detailed descriptions of the different levels of the engineering (as opposed to management) career track and what is expected for each one. I’m fascinated by how different companies handle the challenge of keeping career progression working for engineers without pushing them into people management, and this is a particularly detailed and well thought-out implementation of that.

# 11:31 pm / kellan-elliott-mccrea, careers, dropbox

July 14, 2021

Beginning in M94, Chrome will offer HTTPS-First Mode, which will attempt to upgrade all page loads to HTTPS and display a full-page warning before loading sites that don’t support it. Users who enable this mode gain confidence that Chrome is connecting them to sites over HTTPS whenever possible, and that they will see a warning before connecting to sites over HTTP. Based on ecosystem feedback, we’ll explore making HTTPS-First mode the default for all users in the future.

Chromium Blog

# 7:14 pm / browsers, chrome, security, https

July 16, 2021

Datasette 0.58: The annotated release notes

I released Datasette 0.58 last night, with new plugin hooks, Unix domain socket support, a major faceting performance fix and a few other improvements. Here are the annotated release notes.

[... 1,062 words]

The Untold Story of SQLite With Richard Hipp. This is a really interesting interview with SQLite creator D. Richard Hipp—it covers all sorts of aspects of the SQLite story I hadn’t heard before, from its inspiration by a software challenge on a battleship to the first income from clients such as AOL and Symbian to the formation of the SQLite Consortium (based on advice from Mozilla’s Mitchell Baker) and more.

# 8:12 pm / sqlite, d-richard-hipp, podcasts

July 17, 2021

Last Mile Redis (via) Fly.io article about running a local Redis cache in each of their geographic regions: “Cache data overlaps a lot less than you assume it will. For the most part, people in Singapore will rely on a different subset of data than people in Amsterdam or São Paulo or New Jersey.” But then they note that Redis has the ability to act as both a replica of a primary AND a writable server at the same time (“replica-read-only no”), which actually makes sense for a cache: it lets you cache local data but send out cluster-wide cache purges if necessary.

# 2:44 am / fly, caching, redis

The Digital Antiquarian: Sam and Max Hit the Road. Delightful history and retrospective review of 1993’s Sam and Max Hit the Road. I didn’t know Sam and Max happened because the independent comic’s creator worked for LucasArts and the duo had embedded themselves in LucasArts culture through their use in the internal educational materials prepared for SCUMM University.

# 3:12 am / history, games, game-design

Datasette downloads per day (with Observable Plot) (via) I built an Observable notebook that imports PyPI package download data from datasette.io (itself scraped from pypistats.org using a scheduled GitHub Action) and plots it using Observable Plot. Datasette downloads from PyPI apparently jumped from ~800/day in May to ~4,000/day in July—would love to know why!

# 5:01 pm / observable, datasette, observable-plot

It doesn’t take much public creativity to stand out as a job candidate

I’ve spent nearly twenty years blogging, giving talks and releasing open source code. It’s been fantastic for my career, and a huge amount of work. But here’s a useful secret: you don’t have to put very much work at all into public creativity in order to stand out as a job candidate.

[... 495 words]

July 18, 2021

I've always believed that a book, even a technical book, should try to tell a cohesive story. The challenge is that as Python has grown in popularity, it has really turned into three different languages--each with their own story. There is a whimsical Python for scripting and tinkering, a quirky Python for asynchronous programming and networking, and a serious Python for enterprise applications. Sometimes these stories intersect. Sometimes not.

David Beazley

# 2:53 pm / david-beazley, python

Organize and Index Your Screenshots (OCR) on macOS (via) Alexandru Nedelcu has a very neat recipe for creating an archive of searchable screenshots on macOS: set the default save location for screenshots to a Dropbox folder, then create a launch agent that runs a script against new files in that folder to run tesseract OCR to convert them into a searchable PDF.

# 4:11 pm / ocr, macosx

July 19, 2021

Inserting One Billion Rows in SQLite Under a Minute (via) Avinash Sajjanshetty experiments with accelerating writes to a test table in SQLite, using various SQLite pragmas to accelerate inserts followed by a rewrite of Python code to Rust. Also of note: running the exact same code in PyPy saw a 3.5x speed-up!
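
A minimal sketch of the pragma side of this kind of experiment, using Python’s standard `sqlite3` module. The table name, columns and choice of pragmas are illustrative assumptions, not necessarily the ones used in the post. Turning off the journal and synchronous writes trades crash safety for speed, so this only suits data that can be regenerated.

```python
import sqlite3


def bulk_insert(path, rows):
    """Insert (area, age) rows in one transaction and return the resulting row count."""
    # isolation_level=None means autocommit mode, so transactions are explicit below.
    conn = sqlite3.connect(path, isolation_level=None)
    # Speed-over-safety settings: a crash mid-write can corrupt the database.
    conn.execute("PRAGMA journal_mode = OFF")
    conn.execute("PRAGMA synchronous = OFF")
    conn.execute("PRAGMA temp_store = MEMORY")
    conn.execute(
        "CREATE TABLE IF NOT EXISTS test_rows "
        "(id INTEGER PRIMARY KEY, area TEXT, age INTEGER)"
    )
    # One big transaction avoids paying the commit cost for every row.
    conn.execute("BEGIN")
    conn.executemany("INSERT INTO test_rows (area, age) VALUES (?, ?)", rows)
    conn.execute("COMMIT")
    count = conn.execute("SELECT count(*) FROM test_rows").fetchone()[0]
    conn.close()
    return count


if __name__ == "__main__":
    print(bulk_insert(":memory:", [("north", 30)] * 1000))
```

Batching the inserts into a single transaction is usually the first thing to try; the pragmas matter more once the database lives on disk.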

# 12:13 am / rust, sqlite, pypy

toyDB: references. toyDB is a “distributed SQL database in Rust, written as a learning project”, with its own implementations of SQL, Raft, ACID transactions, B+trees and more. toyDB author Erik Grinaker has assembled a detailed set of references that he used to learn how to build a database. I’d love to see more projects do this; it’s really useful.

# 12:18 am / rust, databases

Launch HN Instructions (via) The instructions for YC companies that are posting their launch announcement on Hacker News are really interesting to read. “As founders, you’re used to talking to users, customers, and investors. HN readers are not any of those—what they are is peers, and using any of those styles with peers feels clueless and entitled. [...] To interest HN, write in a factual, personal, and modest way about what problem you solve, why it matters, how you solve it, and how you got there.”

# 1:05 am / marketing, y-combinator, hacker-news

July 21, 2021

Weeknotes: sqlite-transform 1.1, Datasette 0.58.1, datasette-graphql 1.5

Work on Project Pelican inspires new features and improvements across a number of different projects.

[... 1,419 words]

July 22, 2021

Datasette—an ecosystem of tools for working with small data

This is the transcript and video from a talk I gave at PyGotham 2020 about using SQLite, Datasette and Dogsheep to work with small data.

[... 4,655 words]

July 23, 2021

The Tyranny of Spreadsheets (via) In discussing the notorious Excel incident last year when the UK lost track of 16,000 Covid cases due to a .xls row limit, Tim Harford presents a history of the spreadsheet, dating all the way back to Francesco di Marco Datini and double-entry bookkeeping in 1396. A delightful piece of writing.

# 3:57 am / spreadsheets, covid19, history

July 24, 2021

How the Python import system works (via) Remarkably detailed and thorough dissection of how exactly import, modules and packages work in Python—eventually digging right down into the C code. Part of Victor Skvortsov’s excellent “Python behind the scenes” series.
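
A small sketch of the machinery from the Python side, using only documented standard-library calls (`importlib.util.find_spec`, `importlib.import_module` and `sys.modules`). The helper name `describe_import` is made up for this example.

```python
import importlib
import importlib.util
import sys


def describe_import(name):
    """Report where a top-level module would be loaded from, then import it."""
    # find_spec asks the import system's finders to locate the module.
    spec = importlib.util.find_spec(name)
    if spec is None:
        return None  # no finder could locate the module
    module = importlib.import_module(name)
    return {
        "origin": spec.origin,  # usually a file path for pure-Python modules
        "loader": type(spec.loader).__name__,
        # After import, the module object is cached in sys.modules.
        "cached": sys.modules[name] is module,
    }


if __name__ == "__main__":
    print(describe_import("json"))
    print(describe_import("no_such_module_for_this_example"))
```

The `sys.modules` cache is why importing the same module a second time is cheap and returns the same object.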

# 8:12 pm / python

July 28, 2021

The Baked Data architectural pattern

I’ve been exploring an architectural pattern for publishing websites over the past few years that I call the “Baked Data” pattern. It provides many of the advantages of static site generators while avoiding most of their limitations. I think it deserves to be used more widely.

[... 1,896 words]
