Simon Willison’s Weblog

Subscribe

Items in Nov, 2021

Filters: Year: 2021 × Month: Nov × Sorted by date


PHP 8.1 release notes (via) PHP is gaining “Fibers” for lightweight cooperative concurrency—very similar to Python asyncio. Interestingly you don’t need to use separate syntax like “await fn()” to call them—calls to non-blocking functions are visually indistinguishable from calls to blocking functions. Considering how much additional library complexity has emerged in Python world from having these two different colours of functions it’s noteworthy that PHP has chosen to go in a different direction here. # 25th November 2021, 7:53 pm

htmlspecialchars was a very early function. Back when PHP had less than 100 functions and the function hashing mechanism was strlen(). In order to get a nice hash distribution of function names across the various function name lengths names were picked specifically to make them fit into a specific length bucket. This was circa late 1994 when PHP was a tool just for my own personal use and I wasn’t too worried about not being able to remember the few function names.

Rasmus Lerdorf # 22nd November 2021, 7:23 pm

Introduction to heredocs in Dockerfiles (via) This is a fantastic upgrade to Dockerfile syntax, enabled by BuildKit and a new frontend for executing the Dockerfile that can be specified with a #syntax= directive. I often like to create a standalone Dockerfile that works without needing other files from a directory, so being able to use <<EOF syntax to populate configure files from inline blocks of code is really handy. # 22nd November 2021, 5:01 pm

Weeknotes: Apache proxies in Docker containers, refactoring Datasette

Updates to six major projects this week, plus finally some concrete progress towards Datasette 1.0.

[... 1630 words]

Hurl (via) Hurl is “a command line tool that runs HTTP requests defined in a simple plain text format”—written in Rust on top of curl, it lets you run HTTP requests and then execute assertions against the response, defined using JSONPath or XPath for HTML. It can even assert that responses were returned within a specified duration. # 22nd November 2021, 3:32 am

Many Web3 boost­ers see them­selves as disruptors, but “tokenize all the things” is noth­ing if not an obe­di­ent con­tin­u­a­tion of “market-ize all the things”, the cam­paign started in the 1970s, hugely suc­cessful, ongoing. I think the World Wide Web was the real rupture — “Where … is the money?”—which Web 2.0 smoothed over and Web3 now attempts to seal totally.

Robin Sloan # 18th November 2021, 9:55 pm

Cookiecutter Data Science (via) Some really solid thinking in this documentation for the DrivenData cookiecutter template. They emphasize designing data science projects for repeatability, such that just the src/ and data/ folders can be used to recreate all of the other analysis from scratch. I like the suggestion to give each project a dedicated S3 bucket for keeping immutable copies of the original raw data that might be too large for GitHub. # 18th November 2021, 3:21 pm

Weeknotes: git-history, created for a Git scraping workshop

My main project this week was a 90 minute workshop I delivered about Git scraping at Coda.Br 2021, a Brazilian data journalism conference, on Friday. This inspired the creation of a brand new tool, git-history, plus smaller improvements to a range of other projects.

[... 1239 words]

Datasette is four years old today. I marked the occasion with a short Twitter thread about the project so far. # 13th November 2021, 6:14 pm

One could never price a thirty year mortgage in bitcoin because its volatility makes it completely unpredictable and no sensible bank could calculate the risk of covering that debt. A world in which Elon Musk can tweet two emojis and your home depreciates 80% in value is a dystopia.

Stephen Diehl # 10th November 2021, 7:45 am

Deno Deploy Beta 3 (via) I missed Deno Deploy when it first came out back in June: it’s a really interesting new hosting environment for scripts written in Deno, Node.js creator Ryan Dahl’s re-imagining of Node.js. Deno Deploy runs your code using v8 isolates running in 28 regions worldwide, with a clever BroadcastChannel mechanism (inspired by the browser API of the same name) that allows instances of the server-side code running in different regions to send each other messages. See the “via” link for my annotated version of a demo by Ondřej Žára that got me excited about what it can do. # 7th November 2021, 1:51 am

AWS IAM definitions in Datasette (via) As part of my ongoing quest to conquer IAM permissions, I built myself a Datasette instance that lets me run queries against all 10,441 permissions across 280 AWS services. It’s deployed by a build script running in GitHub Actions which downloads a 8.9MB JSON file from the Salesforce policy_sentry repository—policy_sentry itself creates that JSON file by running an HTML scraper against the official AWS documentation! # 6th November 2021, 3:47 am

A half-hour to learn Rust. I haven’t tried to write any Rust yet but I occasionally find myself wanting to read it, and I find some of the syntax really difficult to get my head around. This article helped a lot: it provides a quick but thorough introduction to most of Rust’s syntax, with clearly explained snippet examples for each one. # 5th November 2021, 5:21 am

An oral history of Bank Python (via) Fascinating description of a very custom Python environment inside a large investment bank—where all of the code lives inside the Python environment itself, everything can be imported into the same process and a directed acyclic graph engine implements Excel-style reactive dependencies. Plenty of extra flavour from people who’ve worked with this (and related) Python systems in the Hacker News comments. # 5th November 2021, 5:18 am

Weeknotes: datasette-jupyterlite, s3-credentials and a Python packaging talk

My big project this week was s3-credentials, described yesterday—but I also put together a fun expermiental Datasette plugin bundling JupyterLite and wrote up my PyGotham talk on Python packaging.

[... 476 words]

How to build, test and publish an open source Python library

At PyGotham this year I presented a ten minute workshop on how to package up a new open source Python library and publish it to the Python Package Index. Here is the video and accompanying notes, which should make sense even without watching the talk.

[... 2055 words]

s3-credentials: a tool for creating credentials for S3 buckets

I’ve built a command-line tool called s3-credentials to solve a problem that’s been frustrating me for ages: how to quickly and easily create AWS credentials (an access key and secret key) that have permission to read or write from just a single S3 bucket.

[... 1618 words]