Simon Willison’s Weblog

Blogmarks

typesplainer (via) A Python module that produces human-readable English descriptions of Python type definitions—also available as a web interface.

# 15th March 2022, 6:18 am / python, mypy

Bugs in Hello World. If a Unix program attempts to send its standard output to /dev/full it should return an error code. Many classic “hello world” programs fail to correctly handle this case.
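
Here’s a minimal Python sketch of what handling this case can look like. The function name `say_hello` is my own invention. The key detail is that buffered output only reaches the device on flush, so the error has to be caught there and turned into a non-zero status.

```python
import sys


def say_hello(stream=sys.stdout) -> int:
    """Write a greeting and return a process exit status (0 means success)."""
    try:
        stream.write("Hello, world!\n")
        # Buffered text may not reach the device until flush(), so a full
        # device such as /dev/full is typically reported here.
        stream.flush()
    except OSError as exc:
        print(f"hello: write error: {exc}", file=sys.stderr)
        return 1
    return 0
```

A script would pass the return value to `sys.exit()`, so that running it with standard output redirected to /dev/full produces a non-zero exit status.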

# 15th March 2022, 6:14 am / unix

Contributing to Complex Projects (via) Mitchell Hashimoto describes in detail his process for understanding and eventually contributing to a complex new codebase. I picked up a whole bunch of useful tips from this.

# 15th March 2022, 6:09 am / open-source, software-engineering, mitchell-hashimoto

@newshomepages (via) Ben Welsh used my shot-scraper tool and GitHub Actions to launch a Twitter bot which tweets screenshots of newspaper homepages on a scheduled basis. Ben says: “The tech is so easy, I was able to pull it off in a couple hours at zero cost. A decade ago I ran a similar project using the cloud resources of the day. [...] It cost thousands of dollars and the screenshots were of much lower quality. Incredible progress!”

# 12th March 2022, 7:21 pm / twitter, github-actions, playwright, shot-scraper, ben-welsh

curlconverter.com (via) This is pretty magic: paste in a “curl” command (including the ones you get from browser devtools using copy-as-curl) and this will convert that into code for making the same HTTP request... using Python, JavaScript, PHP, R, Go, Rust, Elixir, Java, MATLAB, Ansible URI, Strest, Dart or JSON.
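
To give a feel for the kind of translation involved, here’s a hand-written sketch (not actual curlconverter output) of a curl command and an equivalent request built with Python’s standard library. The api.example.com URL and the payload are made up for illustration.

```python
import json
import urllib.request

# Hypothetical command being translated:
#   curl -X POST https://api.example.com/items \
#        -H 'Content-Type: application/json' \
#        -d '{"name": "demo"}'
request = urllib.request.Request(
    "https://api.example.com/items",
    data=json.dumps({"name": "demo"}).encode("utf-8"),
    headers={"Content-Type": "application/json"},
    method="POST",
)

# urllib.request.urlopen(request) would send it; that call is left out
# so the sketch runs without network access.
```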

# 10th March 2022, 8:12 pm / curl, http

Postgres Auditing in 150 lines of SQL (via) I’ve run up against the problem of tracking changes made to rows within a database table so many times, and I still don’t have a preferred solution. This approach to it looks very neat: it uses PostgreSQL triggers to populate a single audit table (as opposed to one audit table per tracked table) and records the previous and current column values for the row using jsonb.
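
Here’s a rough sketch of the single-audit-table pattern. To keep it runnable without a database server I’ve swapped in SQLite (via Python’s sqlite3 module) and its `json_object()` function in place of PostgreSQL and jsonb, so this is not the article’s SQL. The `account` and `audit_log` tables are hypothetical, and it assumes a SQLite build with the JSON functions available.

```python
import json
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE account (id INTEGER PRIMARY KEY, email TEXT);

-- One shared audit table, rather than one per tracked table.
CREATE TABLE audit_log (
    id INTEGER PRIMARY KEY,
    table_name TEXT NOT NULL,
    op TEXT NOT NULL,
    old_record TEXT,  -- JSON snapshot of the row before, NULL for inserts
    new_record TEXT   -- JSON snapshot of the row after
);

CREATE TRIGGER account_audit_insert AFTER INSERT ON account
BEGIN
    INSERT INTO audit_log (table_name, op, old_record, new_record)
    VALUES ('account', 'INSERT', NULL,
            json_object('id', NEW.id, 'email', NEW.email));
END;

CREATE TRIGGER account_audit_update AFTER UPDATE ON account
BEGIN
    INSERT INTO audit_log (table_name, op, old_record, new_record)
    VALUES ('account', 'UPDATE',
            json_object('id', OLD.id, 'email', OLD.email),
            json_object('id', NEW.id, 'email', NEW.email));
END;
""")

conn.execute("INSERT INTO account (email) VALUES ('a@example.com')")
conn.execute("UPDATE account SET email = 'b@example.com' WHERE id = 1")

rows = conn.execute(
    "SELECT op, old_record, new_record FROM audit_log ORDER BY id"
).fetchall()
# Decode the JSON snapshots so they can be compared as dictionaries.
history = [
    (op, json.loads(old) if old else None, json.loads(new))
    for op, old, new in rows
]
```

After the insert and update, `history` holds one entry per change, each with the previous and current values of the row.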

# 9th March 2022, 7:19 pm / databases, postgresql

lite-youtube-embed (via) Handy Web Component wrapper around the standard YouTube iframe embed which knocks over 500KB of JavaScript off the initial page load—I just added this to the datasette.io homepage and increased the Lighthouse performance score from 51 to 93!

# 8th March 2022, 9:13 pm / iframes, paul-irish, youtube, web-performance, web-components

migra (via) This looks like a very handy tool to have around: run “migra postgresql:///a postgresql:///b” and it will detect and output the SQL alter statements needed to modify the first PostgreSQL database schema to match the second. It’s written in Python, running on top of SQLAlchemy.

# 26th February 2022, 11:23 pm / databases, migrations, postgresql

[history] When I tried this in 1996 (via) “I removed the GIL back in 1996 from Python 1.4...” is the start of a fascinating (supportive) comment by Greg Stein on the promising nogil Python fork that Sam Gross has been putting together. Greg provides some historical context that I’d never heard before, relating to an embedded Python for Microsoft IIS.

# 21st February 2022, 10:43 pm / gil, history, python

Fullmoon (via) A “fast and minimalistic web framework” written in Lua, based on Redbean. The documentation for this is fantastic, and because it uses Redbean the development experience is to download the Redbean executable (which runs on every platform) and then drop your own Lua scripts into it using zip.

# 18th February 2022, 6:41 pm / lua, redbean, cosmopolitan

redbean (via) “redbean makes it possible to share web applications that run offline as a single-file αcτµαlly pδrταblε εxεcµταblε zip archive which contains your assets. All you need to do is download the redbean.com program below, change the filename to .zip, add your content in a zip editing tool, and then change the extension back to .com”.

redbean is implemented as a single C file with a dazzling array of clever tricks—most impressively, the single executable works on Linux, macOS, Windows and various BSDs!

It embeds Lua, and in June last year added SQLite too—so self-contained distributable web applications built with Redbean can now use Lua and SQLite for dynamic scripting. Performance sounds incredible: “redbean can serve 1 million+ gzip encoded responses per second on a cheap personal computer”.

# 17th February 2022, 6:01 am / c, lua, sqlite, redbean, cosmopolitan

Datasette table diagram using Mermaid (via) Mermaid is a DSL for generating diagrams from plain text, designed to be embedded in Markdown. GitHub just added support for Mermaid to their Markdown pipeline, which inspired me to try it out. Here’s an Observable Notebook I built which uses Mermaid to visualize the relationships between Datasette tables based on their foreign keys.

# 14th February 2022, 7:43 pm / dsl, github, visualization, datasette, observable, mermaid

Running C unit tests with pytest (via) Brilliant, detailed tutorial by Gabriele Tornetta on testing C code using pytest, which also doubles up as a ctypes tutorial. There’s a lot of depth here—in addition to exercising C code through ctypes, Gabriele shows how to run each test in a separate process so that segmentation faults don’t fail the entire suite, then adds code to run the compiler as part of the pytest run, and then shows how to use gdb trickery to generate more useful stack traces.
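
The separate-process idea can be sketched in a few lines. This is my own simplified illustration using `subprocess`, not the code from the tutorial, and `run_isolated` is a hypothetical helper name. The crash is provoked by asking ctypes to read from address 0.

```python
import subprocess
import sys


def run_isolated(source: str, timeout: float = 30.0) -> int:
    """Run Python source in a child interpreter and return its exit status.

    On POSIX systems a negative value means the child was killed by a
    signal (for example SIGSEGV), while the calling process carries on.
    """
    result = subprocess.run(
        [sys.executable, "-c", source],
        capture_output=True,
        timeout=timeout,
    )
    return result.returncode


ok = run_isolated("assert 1 + 1 == 2")
# Reading from address 0 through ctypes crashes the child, not the caller.
crashed = run_isolated("import ctypes; ctypes.string_at(0)")
```

The parent only sees a non-zero status for the crashing child, so a test runner built this way can report that one test as failed and move on to the next.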

# 12th February 2022, 5:14 pm / c, ctypes, testing, pytest

jless (via) A really nice new command-line JSON viewer, written in Rust, created by Paul Julius Martinez. It provides a terminal interface for navigating through large JSON files, including expanding and contracting nested objects and searching for strings or a modified form of regular expressions.

# 12th February 2022, 3:17 am / json, rust

lon lat lon lat lon. Tom MacWright’s definitive guide to the (latitude, longitude) vs. (longitude, latitude) debate. The answer is frustrating: both orders are used by significant software, so there’s no single answer that will satisfy everyone. I’ve recently been mostly won over to the longitude, latitude side, mainly because that’s a better fit for the non-geospatial x, y pattern.
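
One way to avoid mixing the two orders up is to name the fields and convert explicitly at the boundary. Here’s a small sketch: GeoJSON is one format that uses [longitude, latitude] positions, and the `LatLon` type and helper names are my own.

```python
from typing import NamedTuple


class LatLon(NamedTuple):
    """A position with named fields, so the order is never implicit."""

    lat: float
    lon: float


def to_geojson_point(position: LatLon) -> dict:
    """Return a GeoJSON Point; GeoJSON positions are [longitude, latitude]."""
    return {"type": "Point", "coordinates": [position.lon, position.lat]}


def from_geojson_point(point: dict) -> LatLon:
    """Read the first two coordinates of a GeoJSON Point as (lon, lat)."""
    lon, lat = point["coordinates"][:2]
    return LatLon(lat=lat, lon=lon)


san_francisco = LatLon(lat=37.7749, lon=-122.4194)
point = to_geojson_point(san_francisco)
```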

# 10th February 2022, 4:32 pm / geospatial, tom-macwright

GitHub Burndown (via) Neat Observable notebook by Tom MacWright—give it a GitHub access token and the name of a repo and it pulls the details of every issue and plots a burndown chart over time, showing how long issues stay open for. The code is worth spending some time with—the way it fetches data from the paginated JSON API is a really great example of using generators with Observable, and the chart itself is a lovely clear example of Observable Plot.
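
The notebook itself is JavaScript, but the generator pattern translates directly. Here’s a Python sketch of it under some assumptions of my own: the page shape (`items` plus a `next` cursor) and the `fake_fetch` stand-in are invented for illustration and are not the GitHub API.

```python
from typing import Callable, Iterator, Optional


def iter_items(fetch_page: Callable[[Optional[str]], dict]) -> Iterator[dict]:
    """Yield items one at a time, fetching further pages only when needed."""
    cursor = None
    while True:
        page = fetch_page(cursor)
        yield from page["items"]
        cursor = page["next"]
        if cursor is None:
            break


# Hypothetical two-page response, keyed by cursor.
FAKE_PAGES = {
    None: {"items": [{"number": 1}, {"number": 2}], "next": "page-2"},
    "page-2": {"items": [{"number": 3}], "next": None},
}


def fake_fetch(cursor: Optional[str]) -> dict:
    return FAKE_PAGES[cursor]


numbers = [issue["number"] for issue in iter_items(fake_fetch)]
```

The caller just loops over issues and never has to think about page boundaries.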

# 10th February 2022, 4:29 pm / github, observable, tom-macwright, observable-plot

Single dependency stacks (via) Brandur Leach notes that the core services at Crunchy (admittedly a PostgreSQL hosting and consultancy company) have only one stateful dependency – Postgres. No Redis, ElasticSearch or anything else. This means that problems like rate limiting and search, which are often farmed out to external services, are all handled using either PostgreSQL or in-memory mechanisms on their servers.
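
As an illustration of what an in-memory mechanism for rate limiting might look like, here’s a token bucket sketch. This is a generic technique I’ve picked for the example, not something the article says Crunchy uses, and the class and parameter names are my own.

```python
import time


class TokenBucket:
    """Allow bursts up to `capacity`, refilling at `rate` tokens per second."""

    def __init__(self, rate: float, capacity: float, clock=time.monotonic):
        self.rate = rate
        self.capacity = capacity
        self.clock = clock
        self.tokens = capacity
        self.updated = clock()

    def allow(self, cost: float = 1.0) -> bool:
        """Spend `cost` tokens if available and report whether the call may proceed."""
        now = self.clock()
        elapsed = now - self.updated
        # Refill for the time that has passed, never beyond capacity.
        self.tokens = min(self.capacity, self.tokens + elapsed * self.rate)
        self.updated = now
        if self.tokens >= cost:
            self.tokens -= cost
            return True
        return False
```

State like this lives in a single process, so each server enforces its own limit unless the counters are kept somewhere shared.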

# 9th February 2022, 6:43 pm / postgresql, brandur-leach

Sha256 Algorithm Explained (via) Absolutely beautiful interactive animated explanation by Domingo Martin of the SHA256 hashing algorithm.

# 7th February 2022, 7:27 pm / algorithms, explorables

webvm.io (via) This is one heck of a tech demo: it’s a full copy of Debian, compiled to WebAssembly and running in your browser. It’s fully stocked with Python, Perl, Ruby, Node.js and even a working gcc compiler! The underlying technology, CheerpX, is a closed-source WebAssembly virtualization platform.

# 2nd February 2022, 2:29 am / debian, webassembly

A CGo-free port of SQLite. Fascinating Go version of SQLite, which uses Go code that has been translated from the original SQLite C using ccgo, a package by the same author which “translates cc ASTs to Go source code”. It claims to pass the full public SQLite test suite, which is very impressive.

# 30th January 2022, 10:25 pm / compilers, go, sqlite

Mypyc (via) Spotted this in the Black release notes: “Black is now compiled with mypyc for an overall 2x speed-up”. Mypyc is a tool that compiles Python modules (written in a subset of Python) to C extensions—similar to Cython but using just Python syntax, taking advantage of type annotations to perform type checking and type inference. It’s part of the mypy type checking project, which has been using it since 2019 to gain a 4x performance improvement over regular Python.
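
Here’s the kind of code that benefits: ordinary Python with type annotations. The file name `fib.py` is my assumption. Saved under that name, it can be compiled with `mypyc fib.py` and then imported as usual, and it also runs unchanged under the regular interpreter.

```python
def fib(n: int) -> int:
    """Return the nth Fibonacci number using plain, annotated Python."""
    if n <= 1:
        return n
    return fib(n - 2) + fib(n - 1)
```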

# 30th January 2022, 1:31 am / c, performance, python, mypy

Black 22.1.0 (via) Black, the uncompromising code formatter for Python, has had its first stable non-beta release after almost four years of releases. I adopted Black a few years ago for all of my projects and I wouldn’t release Python code without it now—the productivity boost I get from not spending even a second thinking about code formatting and indentation is huge.

I know Django has been holding off on adopting it until a stable release was announced, so hopefully that will happen soon.

# 30th January 2022, 1:23 am / django, python, lukasz-langa, black

The baseline for web development in 2022 (via) “TL;DR: The baseline for web development in 2022 is: low-spec Android devices in terms of performance, Safari from two years before in terms of Web Standards, and 4G in terms of networks. The web in general is not answering those needs properly, especially in terms of performance where factors such as an over-dependence on JavaScript are hindering our sites’ performance.”

# 27th January 2022, 8:09 pm / alex-russell, web-performance

Two reasons Kubernetes is so complex (via) I like how this article proposes that Kubernetes isn’t trying to be a tool for deploying containers—it’s more like an operating system for a cluster of machines, responsible for the same kind of goals as a regular operating system such as resource sharing and portability. And since everything is built as control loops which attempt to modify actual state to fit the declarative desired state, errors can occur asynchronously seconds or even minutes after the desired state has been updated.
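
A toy version of a control loop makes the idea concrete. This sketch is nothing like real Kubernetes code: the dictionaries of replica counts and the `reconcile` and `apply_actions` helpers are invented for illustration.

```python
def reconcile(desired: dict, actual: dict) -> list:
    """Return the actions needed to move actual state towards desired state."""
    actions = []
    for name, replicas in desired.items():
        have = actual.get(name, 0)
        if have < replicas:
            actions.append(("start", name, replicas - have))
        elif have > replicas:
            actions.append(("stop", name, have - replicas))
    # Anything running that is no longer declared gets stopped.
    for name, have in actual.items():
        if name not in desired and have > 0:
            actions.append(("stop", name, have))
    return actions


def apply_actions(actions: list, actual: dict) -> None:
    """Pretend to carry out each action by updating the actual state."""
    for verb, name, count in actions:
        delta = count if verb == "start" else -count
        actual[name] = actual.get(name, 0) + delta


desired = {"web": 3, "worker": 1}
actual = {"web": 1, "cron": 2}

passes = 0
while True:
    actions = reconcile(desired, actual)
    if not actions:
        break
    apply_actions(actions, actual)
    passes += 1
```

Updating `desired` returns immediately. Any failure only shows up later, when some pass of the loop tries to act on the difference, which is why errors surface asynchronously.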

# 27th January 2022, 6:25 pm / kubernetes

Observable Plot Cheatsheets (via) Beautiful new set of cheatsheets by Mike Freeman for the Observable Plot charting library. This is really top notch documentation—the cheatsheets are available as printable PDFs but the real value here is in the interactive versions of them, which include Observable-powered sliders to tweak the different examples and copy out the resulting generated code.

# 25th January 2022, 10:12 pm / visualization, observable, observable-plot

Roblox Return to Service 10/28-10/31 2021 (via) A particularly good example of a public postmortem on an outage. Roblox was down for 73 hours last year, as a result of an extremely complex set of circumstances which took a lot of effort to uncover. It’s interesting to think through what kind of monitoring you would need to have in place to help identify the root cause of this kind of issue.

# 21st January 2022, 4:41 pm / ops, observability, postmortem

How to Add a Favicon to Your Django Site (via) Adam Johnson did the research on the best way to handle favicons. Safari still doesn’t handle SVG icons, so the best solution today is a PNG served from the /favicon.ico path. This article inspired me to finally add a proper favicon to Datasette.

# 20th January 2022, 7:03 am / django, favicons, adam-johnson

Tricking Postgres into using an insane – but 200x faster – query plan. Jacob Martin talks through a PostgreSQL query optimization they implemented at Spacelift, showing in detail how to interpret the results of EXPLAIN (FORMAT JSON, ANALYZE) using the explain.dalibo.com visualization tool.

# 18th January 2022, 8:53 pm / performance, postgresql, optimization

SQLime: SQLite Playground (via) Anton Zhiyanov built this useful mobile-friendly online playground for trying things out in SQLite. It uses the sql.js library which compiles SQLite to WebAssembly, so it runs everything in the browser, but it also supports saving your work to Gists via the GitHub API. The JavaScript source code is fun to read: the site doesn’t use npm or Webpack or similar, opting instead to implement everything library-free using modern JavaScript modules and Web Components.

# 17th January 2022, 7:08 pm / javascript, sqlite, web-components, webassembly, anton-zhiyanov

Abusing AWS Lambda to make an Aussie Search Engine (via) Ben Boyter built a search engine that only indexes .au Australian websites, with the novel approach of directly compiling the search index into 250 different ~40MB large lambda functions written in Go, then running searches across 12 million pages by farming them out to all of the lambdas and combining the results. His write-up includes all sorts of details about how he built this, including how he ran the indexer and how he solved the surprisingly hard problem of returning good-enough text snippets for the results.
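
The fan-out-and-combine step can be sketched like this. It’s a toy under heavy assumptions: threads stand in for Lambda invocations, each “shard” is a tiny made-up dictionary rather than a compiled-in index, and the scores are invented.

```python
import heapq
from concurrent.futures import ThreadPoolExecutor

# Hypothetical shards: each maps a term to (url, score) pairs.
SHARDS = [
    {"sydney": [("a.example.au", 0.9)], "beach": [("b.example.au", 0.4)]},
    {"sydney": [("c.example.au", 0.7), ("d.example.au", 0.2)]},
]


def search_shard(shard: dict, term: str) -> list:
    """Search one shard; in the real system this would be one function call."""
    return shard.get(term, [])


def search(term: str, limit: int = 10) -> list:
    """Query every shard concurrently, then keep the best-scoring hits."""
    with ThreadPoolExecutor(max_workers=len(SHARDS)) as pool:
        partials = pool.map(lambda shard: search_shard(shard, term), SHARDS)
        hits = [hit for partial in partials for hit in partial]
    return heapq.nlargest(limit, hits, key=lambda hit: hit[1])


top = search("sydney", limit=2)
```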

# 16th January 2022, 8:52 pm / aws, go, lambda, search
