Simon Willison’s Weblog


Filters: Type: blogmark ×

unasync (via) Today I started wondering out loud if one could write code that takes an asyncio Python library and transforms it into the synchronous equivalent by using some regular expressions to strip out the “await ...” keywords and suchlike. Turns out that can indeed work, and Ratan Kulshreshtha built it! unasync uses the standard library tokenize module to run some transformations against an async library and spit out the sync version automatically. I’m now considering using this for sqlite-utils. # 27th February 2021, 10:20 pm

cosmopolitan libc (via) “Cosmopolitan makes C a build-once run-anywhere language, similar to Java, except it doesn’t require interpreters or virtual machines be installed beforehand. [...] Instead, it reconfigures stock GCC to output a POSIX-approved polyglot format that runs natively on Linux + Mac + Windows + FreeBSD + OpenBSD + BIOS with the best possible performance and the tiniest footprint imaginable.” This is a spectacular piece of engineering. # 27th February 2021, 6:02 am

How to secure an Ubuntu server using Tailscale and UFW. This is the Tailscale tutorial I’ve always wanted: it explains in detail how you can run an Ubuntu server (from any cloud provider) such that only devices on your personal Tailscale network can access it. # 26th February 2021, 8:31 pm

Fuzzy Name Matching in Postgres. Paul Ramsey describes how to implement fuzzy name matching in PostgreSQL using the fuzzystrmatch extension and its levenshtein() and soundex() functions, plus functional indexes to query against indexed soundex first and then apply slower Levenshtein. The same tricks should also work against SQLite using the datasette-jellyfish plugin. # 22nd February 2021, 9:16 pm

Blazing fast CI with pytest-split and GitHub Actions (via) pytest-split is a neat looking variant on the pattern of splitting up a test suite to run different parts of it in parallel on different machines. It involves maintaining a periodically updated JSON file in the repo recording the average runtime of different tests, to enable them to be more fairly divided among test runners. Includes a recipe for running as a matrix in GitHub Actions. # 22nd February 2021, 7:06 pm

People, processes, priorities. Twitter thread from Adrienne Porter Felt outlining her model for thinking about engineering management. I like this trifecta of “people, processes, priorities” a lot. # 22nd February 2021, 5:21 pm

trustme (via) This looks incredibly useful. Run “python -m trustme” and it will create three files for you: server.pem, server.key and a client.pem client certificate, providing a certificate for “localhost” (or another host you spefict) using a fake certificate authority. Looks like it should be the easiest way to test TLS locally. # 11th February 2021, 8 pm

Why I Built Litestream. Litestream is a really exciting new piece of technology by Ben Johnson, who previously built BoltDB, the key-value store written in Go that is used by etcd. It adds replication to SQLite by running a process that converts the SQLite WAL log into a stream that can be saved to another folder or pushed to S3. The S3 option is particularly exciting—Ben estimates that keeping a full point-in-time recovery log of a high write SQLite database should cost in the order of a few dollars a month. I think this could greatly expand the set of use-cases for which SQLite is sensible choice. # 11th February 2021, 7:25 pm

Dependency Confusion: How I Hacked Into Apple, Microsoft and Dozens of Other Companies (via) Alex Birsan describes a new category of security vulnerability he discovered in the npm, pip and gem packaging ecosystems: if a company uses a private repository with internal package names, uploading a package with the same name to the public repository can often result in an attacker being able to execute their own code inside the networks of their target. Alex scored over $130,000 in bug bounties from this one, from a number of name-brand companies. Of particular note for Python developers: the --extra-index-url argument to pip will consult both public and private registries and install the package with the highest version number! # 10th February 2021, 8:42 pm

Cleaning Up Your Postgres Database (via) Craig Kerstiens provides some invaluable tips on running an initial check of the health of a PostgreSQL database, by using queries against the pg_statio_user_indexes table to find the memory cache hit ratio and the pg_stat_user_tables table to see what percentage of queries to your tables are using an index. # 3rd February 2021, 7:32 am

JMeter Result Analysis using Datasette (via) NaveenKumar Namachivayam wrote a detailed tutorial on using Datasette (on Windows) and csvs-to-sqlite to analyze the results of JMeter performance test runs and then publish them online using Vercel. # 1st February 2021, 4:42 am

Making GitHub’s new homepage fast and performant. A couple of really clever tricks in this article by Tobias Ahlin. The first is using IntersectionObserver in conjunction with the video preload=“none” attribute to lazily load a video when it scrolls into view. The second is an ingenious trick to create an efficiently encoded transparent JPEG image: embed the image in a SVG file twice, once as the image and once as a transparency mask. # 29th January 2021, 7:05 pm

Culture is the Behavior You Reward and Punish (via) Jocelyn Goldfein describes an intriguing exercise for discovering your company culture: imagine a new hire asking for advice on what makes people successful there, and use that to review what behavior is rewarded and discouraged. # 12th January 2021, 6:09 am

datasette-css-properties (via) My new Datasette plugin defines a “.css” output format which returns the data from the query as a valid CSS stylesheet defining custom properties for each returned column. This means you can build a page using just HTML and CSS that consumes API data from Datasette, no JavaScript required! Whether this is a good idea or not is left as an exercise for the reader. # 7th January 2021, 7:42 pm

Custom Properties as State. Fascinating thought experiment by Chris Coyier: since CSS custom properties can be defined in an external stylesheet, we can APIs that return stylesheets defining dynamically server-side generated CSS values for things like time-of-day colour schemes or even strings that can be inserted using ::after { content: var(--my-property). This gave me a very eccentric idea for a Datasette plugin... # 7th January 2021, 7:39 pm (via) I did not know this trick: by defining multiple box-shadow values as a comma separated list you can create much more finely tuned shadow effects. This tool by Philipp Brumm provides a very smart UI for designing shadows. # 6th January 2021, 4:12 pm

DALL·E: Creating Images from Text (via) “DALL·E is a 12-billion parameter version of GPT-3 trained to generate images from text descriptions, using a dataset of text–image pairs.”. The examples in this paper are astonishing—“an illustration of a baby daikon radish in a tutu walking a dog” generates exactly that. # 5th January 2021, 8:31 pm

Everything You Always Wanted To Know About GitHub (But Were Afraid To Ask) (via) ClickHouse by Yandex is an open source column-oriented data warehouse, designed to run analytical queries against TBs of data. They’ve loaded the full GitHub Archive of events since 2011 into a public instance, which is a great way of both exploring GitHub activity and trying out ClickHouse. Here’s a query I just ran that shows number of watch events per year, for example: SELECT toYear(created_at) as yyyy, count() FROM github_events WHERE event_type = ’WatchEvent’ group by yyyy # 5th January 2021, 1:02 am

hooks-in-a-nutshell.js (via) Neat, heavily annotated implementation of React-style hooks in pure JavaScript, really useful for understanding how they work. # 4th January 2021, 6:36 pm

sqlite-utils 3.2 (via) As discussed in my weeknotes yesterday, this is the release of sqlite-utils that adds the new “cached table counts via triggers” mechanism. # 3rd January 2021, 9:25 pm

Replicating SQLite with rqlite (via) I’ve been trying out rqlite, a “lightweight, distributed relational database, which uses SQLite as its storage engine”. It’s written in Go and uses the Raft consensus algorithm to allow a cluster of nodes to elect a leader and replicate SQLite statements between them. By default it uses in-memory SQLite databases with an on-disk Raft replication log—here are my notes on running it in “on disk” mode as a way to run multiple Datasette processes against replicated SQLite database files. # 28th December 2020, 7:51 pm

How Shopify Uses WebAssembly Outside of the Browser (via) I’m fascinated by applications of WebAssembly outside the browser. As a Python programmer I’m excited to see native code libraries herring compiled to WASM in a way that lets me call them from Python code via a bridge, but the other interesting application is executing untrusted code in a sandbox. Shopify are doing exactly that—they are building a kind-of plugin mechanism where partner code compiled to WASM runs inside their architecture using Fastly’s Lucet. The performance numbers are in the same ballpark as native code. Also interesting: they’re recommending AssemblyScript, a TypeScript-style language designed to compile directly to WASM without needing any additional interpreter support, as required by dynamic languages such as JavaScript, Python or Ruby. # 19th December 2020, 4:46 pm

Commits are snapshots, not diffs (via) Useful, clearly explained revision of some Git fundamentals. # 17th December 2020, 10:01 pm (via) Datasette finally has an official project website, three years after the first release of the software. I built it using Datasette, with custom templates to define the various pages. The site includes news, latest releases, example sites and a new searchable plugin directory. # 11th December 2020, 4:11 am

Deno 1.6 Release Notes. Two signature features in Deno 1.6 worth paying attention to: a built-in language server for code editors like VS Code, and the “deno compile” command which can build Deno JavaScript/TypeScript projects into standalone binaries. The ability to build binaries has turned out to be a killer feature of both Go and Rust, so seeing it ship as a default capability of a interpreted dynamic language is fascinating. I would love it if Python followed Deno’s example. # 10th December 2020, 1:25 am

The case against client certificates (via) Colm MacCárthaigh provides a passionately argued Twitter thread about client certificates and why they should be avoided. I tried using them as an extra layer of protection fir my personal Dogsheep server and ended up abandoning them—certificate management across my devices was too fiddly. # 9th December 2020, 2:41 pm

Cameras and Lenses (via) Fabulous explotable interactive essay by Bartosz Ciechanowski explaining how cameras and lenses work. # 8th December 2020, 3:38 am

The secrets of Monkey Island’s source code (via) To celebrate the thirty year anniversary of the Secret of Monkey Island the Video Game History Foundation interviewed developer Rod Gilbert and produced this comprehensive collection of cut content and material showing how the game was originally constructed. # 5th December 2020, 4:32 pm

Command Line Interface Guidelines (via) Aanand Prasad, Ben Firshman, Carl Tashian and Eva Parish provide the missing manual for designing CLI tools in 2020. Deeply researched and clearly presented—I picked up a bunch of useful tips and ideas from reading this, and I’m looking forward to applying them to my own CLI projects. # 4th December 2020, 8:44 pm