Simon Willison’s Weblog

May 2022

49 posts: 6 entries, 20 links, 23 beats

May 1, 2022

Release s3-credentials 0.11 — A tool for creating credentials for accessing S3 buckets

May 2, 2022

Release sqlite-utils 3.26.1 — Python CLI utility and library for manipulating SQLite databases

sqlite-utils 3.26.1 (via) I released sqlite-utils 3.26.1 with one tiny but exciting feature: I fixed its one dependency that wasn’t published as a pure Python wheel, which means it can now be used with Pyodide: Python compiled to WebAssembly, running in your browser!

# 6:43 pm / python, webassembly, sqlite-utils, pyodide

Release datasette 0.62a0 — An open source multi-tool for exploring and publishing data

May 3, 2022

Web Scraping via Javascript Runtime Heap Snapshots (via) This is an absolutely brilliant scraping trick. Adrian Cooney figured out a way to use Puppeteer and the Chrome DevTools protocol to take a heap snapshot of all of the JavaScript running on a web page, then recursively crawl through the heap looking for any JavaScript objects that have a specified selection of properties. This allows him to scrape data from arbitrarily complex client-side web applications. He built a JavaScript library and command line tool that implements the pattern.
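The core of the pattern, recursively walking a graph of objects and keeping the ones that have a chosen set of properties, can be sketched in Python. This is only an illustration of the search step: it works on plain nested dicts and lists rather than a real Chrome heap snapshot, and `find_objects` is a hypothetical name, not part of Adrian Cooney's library.

```python
def find_objects(node, required_keys, _seen=None):
    """Yield every dict reachable from node that has all of required_keys.

    Hypothetical helper: it walks plain Python data, not a heap snapshot.
    """
    required = set(required_keys)
    seen = set() if _seen is None else _seen
    # Object graphs can contain cycles, so remember what was already visited
    if id(node) in seen:
        return
    if isinstance(node, dict):
        seen.add(id(node))
        if required.issubset(node):
            yield node
        children = node.values()
    elif isinstance(node, (list, tuple)):
        seen.add(id(node))
        children = node
    else:
        # Strings, numbers and other leaves have nothing to crawl into
        return
    for child in children:
        yield from find_objects(child, required, seen)


# Made-up application state standing in for objects found on a page
page_state = {
    "user": {"name": "example"},
    "items": [
        {"id": 1, "title": "A", "price": 5},
        {"id": 2, "title": "B"},
        {"wrapper": {"id": 3, "title": "C", "price": 7}},
    ],
}
products = list(find_objects(page_state, {"id", "price"}))
```

Here `products` holds the two dicts that carry both an `id` and a `price`, wherever they sit in the structure.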

# 12:51 am / javascript, scraping

Simple declarative schema migration for SQLite (via) This is an interesting, clearly explained approach to the database migration problem. Create a new in-memory database and apply the current schema, then run some code to compare that with the previous schema—which tables are new, and which tables have had columns added. Then apply those changes.

I’d normally be cautious of running something like this because I can think of ways it could go wrong—but SQLite backups are so quick and cheap (just copy the file) that I could see this being a relatively risk-free way to apply migrations.
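A minimal sketch of the idea using Python's standard `sqlite3` module, assuming the only changes to handle are new tables and new columns. The `migrate`, `table_sql` and `table_columns` helpers are hypothetical names for illustration and this is not the linked article's implementation: it ignores column constraints, defaults, indexes, and dropped or renamed columns.

```python
import sqlite3


def table_sql(conn):
    """Map each user table name to its CREATE TABLE statement."""
    rows = conn.execute(
        "SELECT name, sql FROM sqlite_master "
        "WHERE type = 'table' AND name NOT LIKE 'sqlite_%'"
    )
    return dict(rows.fetchall())


def table_columns(conn, table):
    """Map column name to declared type, using PRAGMA table_info."""
    # In each row, index 1 is the column name and index 2 its declared type
    return {row[1]: row[2] for row in conn.execute(f'PRAGMA table_info("{table}")')}


def migrate(conn, target_schema):
    """Bring conn up to target_schema by adding missing tables and columns."""
    # Build the desired schema in a throwaway in-memory database
    pristine = sqlite3.connect(":memory:")
    pristine.executescript(target_schema)
    wanted = table_sql(pristine)
    existing = table_sql(conn)
    applied = []
    for name, create_sql in wanted.items():
        if name not in existing:
            conn.execute(create_sql)
            applied.append(f"created table {name}")
            continue
        current = table_columns(conn, name)
        for col, col_type in table_columns(pristine, name).items():
            if col not in current:
                # Only the declared type is carried over (an assumption of this sketch)
                conn.execute(f'ALTER TABLE "{name}" ADD COLUMN "{col}" {col_type}')
                applied.append(f"added column {name}.{col}")
    conn.commit()
    pristine.close()
    return applied


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE dogs (id INTEGER PRIMARY KEY, name TEXT)")
target = """
CREATE TABLE dogs (id INTEGER PRIMARY KEY, name TEXT, age INTEGER);
CREATE TABLE cats (id INTEGER PRIMARY KEY, name TEXT);
"""
changes = migrate(conn, target)
```

Running `migrate` a second time with the same schema should apply nothing, since the live database already matches the in-memory one. With a database on disk, copying the file before calling it gives the cheap backup mentioned above.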

# 6:07 pm / migrations, sqlite

May 4, 2022

Datasette Lite: a server-side Python web application running in a browser

Datasette Lite is a new way to run Datasette: entirely in a browser, taking advantage of the incredible Pyodide project which provides Python compiled to WebAssembly plus a whole suite of useful extras.

[... 4,800 words]

TIL Setting up a custom subdomain for a GitHub Pages site — This is so much easier than I was expecting it to be.

SIARD: Software Independent Archiving of Relational Databases (via) I hadn’t heard of this before but it looks really interesting: the Federal Archives of Switzerland developed a standard for archiving any relational database as a zip file full of XML which “is used in over 50 countries around the globe”.

# 10:40 pm / archives, databases, xml

May 6, 2022

Weeknotes: Datasette Lite, nogil Python, HYTRADBOI

My big project this week was Datasette Lite, a new way to run Datasette directly in a browser, powered by WebAssembly and Pyodide. I also continued my research into running SQL queries in parallel, described last week. Plus I spoke at HYTRADBOI.

[... 1,434 words]

May 7, 2022

Museum Paso Robles Pioneer Museum — 2010 Riverside Ave, Paso Robles, CA 93446

May 13, 2022

sqlite-utils: a nice way to import data into SQLite for analysis (via) Julia Evans on my sqlite-utils Python library and CLI tool.

# 6:17 pm / cli, sqlite, julia-evans, sqlite-utils

May 14, 2022

TIL Efficiently copying a file — **TLDR:** Use `shutil.copyfileobj(fsrc, fdst)`
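A short sketch of that call in context. The `copy_file` wrapper and its `chunk_size` default are assumptions for illustration; `shutil.copyfileobj` itself takes the two open file objects plus an optional buffer length.

```python
import shutil


def copy_file(src_path, dst_path, chunk_size=1024 * 1024):
    """Copy src_path to dst_path in chunks, without loading it all into memory."""
    # Binary mode so the bytes are copied exactly as they are
    with open(src_path, "rb") as fsrc, open(dst_path, "wb") as fdst:
        shutil.copyfileobj(fsrc, fdst, chunk_size)
```

Because it works on file objects rather than paths, the same call can copy from any readable binary stream, not just a file on disk.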

May 15, 2022

Why Rust’s postfix await syntax is good (via) C J Silverio explains postfix await in Rust—where you can write a line like this, with the ? causing any errors to be caught and turned into an error return from your function:

let count = fetch_all_animals().await?.filter_for_hedgehogs().len();

# 2:27 pm / async, rust

How Materialize and other databases optimize SQL subqueries. Jamie Brandon offers a survey of the state-of-the-art in optimizing correlated subqueries, across a number of different database engines.

# 8:24 pm / sql

May 16, 2022

Heroku: Core Impact (via) Ex-Heroku engineer Brandur Leach pulls together some of the background information circulating about the Heroku security incident, now more than a month long, and provides some ex-insider commentary on what went right and what went wrong with a platform that left a huge, if somewhat underappreciated, impact on the technology industry at large.

# 4:24 am / heroku, brandur-leach

Weeknotes: Camping, a road trip and two new museums

Natalie and I took a week-long road trip and camping holiday. The plan was to camp on Santa Rosa Island in the California Channel Islands, but the boat to the island was cancelled due to bad weather. We treated ourselves to a Central Californian road trip instead.

[... 872 words]

Release datasette-insert 0.7 — Datasette plugin for inserting and updating data

Supercharging GitHub Actions with Job Summaries (via) GitHub Actions workflows can now generate a rendered Markdown summary of, well, anything that you can think to generate as part of the workflow execution. I particularly like the way this is designed: they provide a filename in a $GITHUB_STEP_SUMMARY environment variable which you can then append data to from each of your steps.
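As a rough sketch, a Python script run from a workflow step could append to that file like this. `append_step_summary` is a hypothetical helper, and the `env` parameter exists only so the function can be tried outside of GitHub Actions.

```python
import os


def append_step_summary(markdown, env=os.environ):
    """Append Markdown to the file named by GITHUB_STEP_SUMMARY, if it is set."""
    path = env.get("GITHUB_STEP_SUMMARY")
    if not path:
        # Not running inside GitHub Actions, so there is nowhere to write
        return False
    # Append mode, so earlier output written to the same file is kept
    with open(path, "a", encoding="utf-8") as f:
        f.write(markdown.rstrip("\n") + "\n")
    return True
```

Calling it with something like `append_step_summary("## Test results\n\n- 12 passed")` adds a heading and a list to the rendered summary for that step.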

# 11:02 pm / github-actions

May 17, 2022

Release datasette-upload-dbs 0.1 — Upload SQLite database files to Datasette
Release datasette-upload-dbs 0.1.1 — Upload SQLite database files to Datasette
TIL GitHub Actions job summaries — New feature [announced here](https://github.blog/2022-05-09-supercharging-github-actions-with-job-summaries/). Here's the [full documentation](https://docs.github.com/en/actions/using-workflows/workflow-commands-for-github-actions#adding-a-job-summary).

simonw/datasette-screenshots (via) I started a new GitHub repository to automate taking screenshots of Datasette for marketing purposes, using my shot-scraper browser automation tool.

# 5:56 pm / projects, datasette, github-actions, shot-scraper

May 18, 2022

Comby (via) Describes itself as “Structural search and replace for any language”. Lets you execute search and replace patterns that look a little bit like simplified regular expressions, but with some deep OCaml-powered magic that makes them aware of comment, string and nested parenthesis rules for different languages. This means you can use it to construct scripts that automate common refactoring or code upgrade tasks.

# 5:47 am / parsing, refactoring, ocaml

May 19, 2022

TIL Optimizing PNGs in GitHub Actions using Oxipng — My [datasette-screenshots](https://github.com/simonw/datasette-screenshots) repository generates screenshots of [Datasette](https://datasette.io/) using my [shot-scraper](https://github.com/simonw/shot-scraper) tool, for people who need them for articles or similar.
Release shot-scraper 0.14 — A command-line utility for taking automated screenshots of websites

May 21, 2022

Release google-calendar-to-sqlite 0.1a0 — Create a SQLite database containing your data from Google Calendar

GOV.UK Guidance: Documenting APIs (via) Characteristically excellent guide from GOV.UK on writing great API documentation. “Task-based guidance helps users complete the most common integration tasks, based on the user needs from your research.”

# 11:31 pm / documentation, gov-uk

May 22, 2022

TIL Using the Fly Docker registry — [Fly.io](https://fly.io/) lets you deploy Docker containers that will be compiled as a Firecracker VM and run in locations around the world.
