Simon Willison’s Weblog

Items in Jul, 2022

Cleaning data with sqlite-utils and Datasette (via) I wrote a new tutorial for the Datasette website, showing how to use sqlite-utils to import a CSV file, clean up the resulting schema, fix date formats and extract some of the columns into a separate table. It’s accompanied by a ten minute video originally recorded for the HYTRADBOI conference. # 31st July 2022, 7:57 pm

GAS-ICS-Sync (via) Google Calendar can subscribe to ICS calendar feeds... but polls for updates less than once every 24 hours (as far as I can tell), greatly limiting their usefulness. Derek Antrican wrote a script using Google Apps Script which fixes this by polling calendar URLs more often and writing them to your calendar via the write API. # 30th July 2022, 11:47 pm

GPSJam (via) John Wiseman’s “Daily maps of GPS interference”—a beautiful interactive globe (powered by Mapbox GL) which you can use to see points of heaviest GPS interference over a 24 hour period, using data collected from commercial airline radios by ADS-B Exchange. “From what I can tell the most common reason for aircraft GPS systems to have degraded accuracy is jamming by military systems. At least, the vast majority of aircraft that I see with bad GPS accuracy are flying near conflict zones where GPS jamming is known to occur.” # 30th July 2022, 7:51 pm

Introducing sqlite-lines—a SQLite extension for reading files line-by-line (via) Alex Garcia wrote a brilliant C module for SQLite which adds functions (and a table-valued function) for efficiently reading newline-delimited text into SQLite. When combined with SQLite’s built-in JSON features this means you can read a huge newline-delimited JSON file into SQLite in a streaming fashion, so a large file doesn’t exhaust memory. Alex also compiled the extension to WebAssembly, and his post is an Observable notebook that lets you exercise the code directly. # 30th July 2022, 7:18 pm
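
The streaming idea can be approximated with Python's standard library alone. This sketch does not use sqlite-lines: it reads newline-delimited JSON one line at a time and feeds the rows to SQLite through a generator, so the whole file is never held in memory. The `records` table, its single `body` column, and the function names are assumptions made for illustration.

```python
import json
import sqlite3
from typing import IO, Iterator, Tuple


def iter_json_lines(lines: IO[str]) -> Iterator[Tuple[str]]:
    """Yield one normalized JSON document per non-blank line."""
    for line in lines:
        line = line.strip()
        if not line:
            continue
        # json.loads validates the line; json.dumps stores a normalized copy.
        yield (json.dumps(json.loads(line)),)


def load_ndjson(conn: sqlite3.Connection, lines: IO[str]) -> int:
    """Insert each JSON line into a hypothetical `records` table.

    Returns the total number of rows in the table afterwards.
    """
    conn.execute("CREATE TABLE IF NOT EXISTS records (body TEXT NOT NULL)")
    with conn:
        # executemany consumes the generator lazily, one row at a time.
        conn.executemany(
            "INSERT INTO records (body) VALUES (?)", iter_json_lines(lines)
        )
    return conn.execute("SELECT count(*) FROM records").fetchone()[0]
```

Any iterable of text lines works as input, including an open file object, so a large file can be passed in directly without reading it first.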

Weeknotes: Joining the board of the Python Software Foundation

A few weeks ago I was elected to the board of directors for the Python Software Foundation.

[... 2081 words]

Packaging Python Projects with pyproject.toml. I decided to finally figure out how packaging with pyproject.toml works—all of my existing projects use setup.py. The official tutorial from the Python Packaging Authority (PyPA) had everything I needed. # 29th July 2022, 11:18 pm

Fastest way to turn HTML into text in Python (via) A light benchmark of the new-to-me selectolax Python library shows it performing extremely well for tasks such as extracting just the text from an HTML string, after first manipulating the DOM. selectolax is a Python binding over the Modest and Lexbor HTML parsing engines, which are written in no-outside-dependency C. # 27th July 2022, 5:55 pm

SQLite Internals: Pages & B-trees (via) Ben Johnson provides a delightfully clear introduction to SQLite internals, describing the binary format used to store rows on disk and how SQLite uses 4KB pages for both row storage and for the b-trees used to look up records. # 27th July 2022, 2:57 pm
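
The page size is recorded in the database file itself. As a small illustration, this sketch reads it from the 100-byte file header, where the SQLite file format stores it as a big-endian 16-bit integer at byte offset 16 (with the value 1 standing for 65536). The function name `read_page_size` is made up for this example.

```python
import struct

SQLITE_MAGIC = b"SQLite format 3\x00"


def read_page_size(path: str) -> int:
    """Return the page size stored in the header of a SQLite database file."""
    with open(path, "rb") as f:
        header = f.read(100)
    if header[:16] != SQLITE_MAGIC:
        raise ValueError(f"{path} does not look like a SQLite database")
    # Bytes 16-17 hold the page size as a big-endian unsigned 16-bit integer.
    (raw,) = struct.unpack(">H", header[16:18])
    # 65536 does not fit in 16 bits, so the format encodes it as 1.
    return 65536 if raw == 1 else raw
```

The result should match what `PRAGMA page_size` reports for the same database.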

Cosmopolitan: Compiling Python. Cosmopolitan is Justine Tunney’s “build-once run-anywhere C library”—part of the αcτµαlly pδrταblε εxεcµταblε effort, which produces wildly clever binary executable files that work on multiple different platforms, and is the secret sauce behind redbean. I hadn’t realized this was happening but there’s an active project to get Python to work in this format, producing a new way of running Python applications as standalone executables, only these ones have the potential to run unmodified on Windows, Linux and macOS. # 26th July 2022, 8:43 pm

viewport-preview (via) I built a tiny tool which lets you preview a URL in a bunch of different common browser viewport widths, using iframes. # 26th July 2022, 12 am

Reduce Friction. Outstanding essay on software engineering friction and development team productivity by C J Silverio: it explains the concept of “friction” (and gives great definitions of “process”, “ceremony” and “formality” in the process) as it applies to software engineering, lays out the challenges involved in getting organizations to commit to reducing it and then provides actionable advice on how to get consensus and where to invest your efforts in order to make things better. # 25th July 2022, 10:25 pm

Sqitch tutorial for SQLite (via) Sqitch is an interesting implementation of database migrations: it’s a command-line tool written in Perl with an interface similar to Git, providing commands to create, run, revert and track migration scripts. The scripts themselves are written as SQL in whichever database engine you are using. The tutorial for SQLite gives a good idea as to how the whole system works. # 24th July 2022, 11:44 pm

You should take more screenshots (via) Alex Chan suggests saving screenshots of your work, since they may well last a lot longer than the projects themselves. I try to do that these days but I have SO many projects from the past that I didn’t capture in this way, and that I really regret not keeping a better visual record of. # 24th July 2022, 9:03 pm

I discovered a while ago that all those errors and bugs that only appear when you demo something to an audience also magically appear when you record yourself demoing it to nobody. Maybe narrating a feature to a pretend audience takes the blinders off enough that you notice little mistakes you wouldn’t have otherwise.

karaterobot # 24th July 2022, 8:59 pm

Promise Maps. Egbert Teeselink describes a neat JavaScript caching pattern: instead of caching key:value, cache key:promise-that-resolves-to-value—doing this gives you dog piling prevention for free, because the first lookup of a value triggers the computation to fetch it while subsequent lookups wait on the same promise to resolve—or resolve instantly if the computation has completed. # 22nd July 2022, 3:52 pm
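
The same idea carries over to other languages with awaitable values. Here is a rough Python analogue using asyncio, where a Task plays the role of the promise. It is not the JavaScript from the linked post; the `PromiseMap` class and the `compute` callback are hypothetical names chosen for this sketch.

```python
import asyncio
from typing import Any, Awaitable, Callable, Dict, Hashable


class PromiseMap:
    """Cache of key -> asyncio.Task instead of key -> value."""

    def __init__(self, compute: Callable[[Hashable], Awaitable[Any]]) -> None:
        self._compute = compute
        self._tasks: Dict[Hashable, "asyncio.Task[Any]"] = {}

    def get(self, key: Hashable) -> "asyncio.Task[Any]":
        """Return the task for this key, starting the computation only once.

        Must be called while an event loop is running.
        """
        task = self._tasks.get(key)
        if task is None:
            # First lookup: start the computation and cache the task itself,
            # so concurrent lookups share it rather than starting their own.
            task = asyncio.ensure_future(self._compute(key))
            self._tasks[key] = task
        return task
```

Awaiting `cache.get(key)` from several coroutines at once runs `compute` a single time, and awaiting a task that has already finished returns its result immediately. One design point to watch: a task that failed stays in the map, so a real cache would probably evict entries whose task raised.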

How John Wiseman tracks worldwide GPS interference (via) Part of the ADS-B signals broadcast by commercial aircraft include a measure of GPS accuracy. By collecting global data every day, John is able to generate a map of areas that are experiencing higher than expected GPS interference, which generally corresponds to military jamming technology. He sometimes posts the resulting maps on Twitter—he just picked up increasing jamming activity around Moscow. # 20th July 2022, 11:45 pm

Weeknotes: Datasette, sqlite-utils, Datasette Desktop

A flurry of releases this week, including a new Datasette alpha and a fixed Datasette Desktop.

[... 1113 words]

Visual Studio Code: Development Process (via) A detailed description of the development process used by VS Code: a 6-12 month high level roadmap, then month long iterations that each result in a new version that is shipped to users. Includes details of how the four weeks of each iteration are spent too. # 20th July 2022, 4:34 pm

The Checkered Flag Diagram for visualizing SQL joins. I really like this alternative to Venn diagrams for showing the difference between different types of SQL join (left join, right join, cross join etc). # 20th July 2022, 1:16 pm
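
To see the difference between join types as row counts rather than diagrams, here is a small sketch using Python's sqlite3 module. The `author` and `book` tables and their contents are invented for the example.

```python
import sqlite3
from typing import List, Tuple

JOIN_QUERIES = {
    # Only authors that have at least one matching book.
    "inner": "SELECT author.name, book.title FROM author "
             "JOIN book ON book.author_id = author.id",
    # Every author; title is NULL where no book matches.
    "left": "SELECT author.name, book.title FROM author "
            "LEFT JOIN book ON book.author_id = author.id",
    # Every author paired with every book, no matching condition.
    "cross": "SELECT author.name, book.title FROM author CROSS JOIN book",
}


def build_demo() -> sqlite3.Connection:
    """Create an in-memory database with three authors and three books."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE author (id INTEGER PRIMARY KEY, name TEXT);
        CREATE TABLE book (id INTEGER PRIMARY KEY, author_id INTEGER, title TEXT);
        INSERT INTO author (id, name) VALUES (1, 'Ada'), (2, 'Grace'), (3, 'Linus');
        INSERT INTO book (author_id, title)
            VALUES (1, 'Notes'), (1, 'Letters'), (2, 'Compilers');
        """
    )
    return conn


def run_join(conn: sqlite3.Connection, kind: str) -> List[Tuple]:
    """Run one of the named join queries and return all rows."""
    return conn.execute(JOIN_QUERIES[kind]).fetchall()
```

With this data the inner join drops the author who has no books, the left join keeps that author with a NULL title, and the cross join returns every author and book combination.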

Soft Deletion Probably Isn’t Worth It. Brandur argues that soft deletion—where you delete records by populating an “is_deleted” or “deleted_at” column in your table—isn’t worth the additional complexity and risk it adds to other database queries. Instead, he suggests having a separate deleted records table which records the deleted data in a JSON blob—allowing you to review and recover it manually if necessary, and giving you an easy way to expire deleted records that have exceeded your retention policy. # 19th July 2022, 8:40 pm
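
A minimal sketch of the deleted records table idea, written against SQLite with Python's sqlite3 module rather than taken from the linked article. The `customer` and `deleted_record` schemas, the column names, and the function names are all assumptions for illustration.

```python
import json
import sqlite3

SCHEMA = """
CREATE TABLE customer (id INTEGER PRIMARY KEY, name TEXT, email TEXT);
CREATE TABLE deleted_record (
    id INTEGER PRIMARY KEY,
    original_table TEXT NOT NULL,
    original_id INTEGER NOT NULL,
    data TEXT NOT NULL,  -- the deleted row serialized as JSON
    deleted_at TEXT NOT NULL DEFAULT (datetime('now'))
);
"""


def connect_demo() -> sqlite3.Connection:
    """Create an in-memory database with the hypothetical schema."""
    conn = sqlite3.connect(":memory:")
    conn.row_factory = sqlite3.Row
    conn.executescript(SCHEMA)
    return conn


def delete_customer(conn: sqlite3.Connection, customer_id: int) -> bool:
    """Archive the row as JSON, then really delete it, in one transaction."""
    with conn:
        row = conn.execute(
            "SELECT * FROM customer WHERE id = ?", (customer_id,)
        ).fetchone()
        if row is None:
            return False
        conn.execute(
            "INSERT INTO deleted_record (original_table, original_id, data) "
            "VALUES (?, ?, ?)",
            ("customer", customer_id, json.dumps(dict(row))),
        )
        conn.execute("DELETE FROM customer WHERE id = ?", (customer_id,))
    return True


def expire_deleted(conn: sqlite3.Connection, days: int) -> int:
    """Drop archived rows older than the retention window; return the count."""
    with conn:
        cur = conn.execute(
            "DELETE FROM deleted_record WHERE deleted_at < datetime('now', ?)",
            (f"-{int(days)} days",),
        )
    return cur.rowcount
```

Because the live table no longer contains deleted rows, ordinary queries need no extra filter, and recovery becomes a manual step of reading the JSON back out of `deleted_record`.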

Datasette Discord community (via) I started a Discord chat community for Datasette. 57 people have joined up already! # 15th July 2022, 3:17 am

The DALL·E 2 Prompt Book (via) This is effectively DALL-E: The Missing Manual: an 81 page PDF book that goes into exhaustive detail about how to get the most out of DALL-E through creative prompt design. # 14th July 2022, 11:26 pm

Bringing page transitions to the web (via) Jake Archibald’s 13 minute Google I/O talk demonstrating the page transitions API that’s now available in Chrome Canary. This is a fascinating piece of API design—it works by effectively creating a static image screenshot of the before and after states of the transition, then letting you define CSS animations that animate a transition between the two static images. By default the screenshot encompasses the full viewport, but you can instead define multiple elements within the page and apply separate transitions to them. It’s only available for SPAs right now but the final design should include support for multi-page applications as well—which means transitions with no JavaScript needed at all! # 13th July 2022, 4:26 pm

sqlite-comprehend: run AWS entity extraction against content in a SQLite database

I built a new tool this week: sqlite-comprehend, which passes text from a SQLite database through the AWS Comprehend entity extraction service and stores the returned entities.

[... 1146 words]

GPT-3 prompt for spotting nonsense questions (via) In response to complaints that GPT-3 will happily provide realistic sounding answers to nonsense questions, rictic recommends the following prompt: “I’ll ask a series of questions. If the questions are nonsense, answer ‘yo be real’, if they’re a question about something that actually happened, answer them.” # 10th July 2022, 4:33 am

Using GPT-3 to explain how code works

One of my favourite uses for the GPT-3 AI language model is generating explanations of how code works. It’s shockingly effective at this: its training set clearly includes a vast amount of source code.

[... 1983 words]

SOC2 is about the security of the company, not the company’s products. A SOC2 audit would tell you something about whether the customer support team could pop a shell on production machines; it wouldn’t tell you anything about whether an attacker could pop a shell with a SQL Injection vulnerability.

Thomas Ptacek # 7th July 2022, 8:31 pm

How to Temporarily Disable Face ID or Touch ID, and Require a Passcode to Unlock Your iPhone or iPad. Hold down the power and volume up buttons for a couple of seconds, and your iPhone will no longer allow you to use Face ID to unlock it without first entering your passcode. # 6th July 2022, 5:38 pm

Helpful 404s for Jekyll (and GitHub Pages). Neat trick from Ben Balter: JavaScript that runs on your 404 page, fetches the sitemap.xml, parses all of the URLs out of it and then uses a Levenshtein edit-distance comparison to find the closest URL to the one that you landed on and suggests that as a “Perhaps you’re looking for?”. # 6th July 2022, 5:31 pm
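
The matching step is easy to sketch outside the browser. This is a Python illustration of the idea, not Ben Balter's JavaScript: it pulls the `<loc>` entries out of sitemap XML and picks the one with the smallest Levenshtein distance to the requested URL. The function names are invented for the example.

```python
import xml.etree.ElementTree as ET
from typing import Iterable, List

# Namespace used by the sitemaps.org protocol for sitemap.xml files.
SITEMAP_LOC = "{http://www.sitemaps.org/schemas/sitemap/0.9}loc"


def urls_from_sitemap(xml_text: str) -> List[str]:
    """Extract every <loc> URL from a sitemap document."""
    root = ET.fromstring(xml_text)
    return [el.text.strip() for el in root.iter(SITEMAP_LOC) if el.text]


def levenshtein(a: str, b: str) -> int:
    """Edit distance: minimum insertions, deletions and substitutions."""
    if len(a) < len(b):
        a, b = b, a
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            current.append(
                min(
                    previous[j] + 1,         # delete ca
                    current[j - 1] + 1,      # insert cb
                    previous[j - 1] + cost,  # substitute, or match at no cost
                )
            )
        previous = current
    return previous[-1]


def closest_url(requested: str, known_urls: Iterable[str]) -> str:
    """Return the known URL with the smallest edit distance to `requested`."""
    return min(known_urls, key=lambda url: levenshtein(requested, url))
```

A production version would likely also set a maximum distance, so that a URL with no plausible match produces no suggestion at all.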

Bun. “Bun is a fast all-in-one JavaScript runtime”—this is very interesting. It’s the first project I’ve seen written using the Zig language, which I see as somewhat equivalent to Rust. Bun provides a full Node.js-style JavaScript environment plus a host of packaged tools—an npm install client, a TypeScript transpiler, bundling tools—all wrapped up in a single binary. The JavaScript engine itself extends JavaScriptCore. Bun also ships with its own wrapper for SQLite. # 6th July 2022, 5:24 pm