Simon Willison’s Weblog

Subscribe

Items in Sep, 2020

Filters: Year: 2020 × Month: Sep × Sorted by date


I was wrong. CRDTs are the future (via) Joseph Gentle has been working on collaborative editors since being a developer on Google Wave back in 2010, later building ShareJS. He’s used Operational Transforms throughout, due to their performance and memory benefits over CRDTs (Conflict-free replicated data types)—but the latest work in that space from Martin Kleppmann and other researchers has seen him finally switch allegiance to these newer algorithms. As a long-time fan of collaborative editing (ever since the Hydra/SubEthaEdit days) I thoroughly enjoyed this as an update on how things have evolved over the past decade. # 28th September 2020, 9:03 pm

elite-source.asm—annotated source code for Elite on the BBC Micro (via) Mark Moxon has annotated every single line of the source code for Elite on the BBC Micro, and his annotations are so clear and in-depth that I can follow it despite knowing next to nothing about assembly code (and certainly nothing about writing it for the BBC). # 28th September 2020, 2:30 am

datasette-dateutil (via) New Datasette plugin exposing date/time parsing custom SQL functions powered by the classic dateutil Python library. # 28th September 2020, 12:33 am

Inevitably we got round to talking about async.

As much of an unneeded complication as it is for so many day-to-day use-cases, it’s important for Python because, if and when you do need the high throughput handling of these io-bound use-cases, you don’t want to have to switch language.

The same for Django: most of what you’re doing has no need of async but you don’t want to have to change web framework just because you need a sprinkling of non-blocking IO.

Carlton Gibson # 27th September 2020, 3:09 pm

The Bias-for-Building Fallacy is most common in orgs that worship speed. That’s fine, but if you go speedily in the wrong direction, you will end up in the wrong place. That’s why teams should value velocity much more than speed: velocity being a combo of speed & direction.

Shreyas Doshi # 26th September 2020, 2:07 pm

Weeknotes: software carpentry, compiling modules for SQLite

This week I completed the Software Carpentry instructor training course, added two foundational features to sqlite-utils and learned how to compile modules for SQLite.

[... 805 words]

Refactoring databases with sqlite-utils extract

Yesterday I described the new sqlite-utils transform mechanism for applying SQLite table transformations that go beyond those supported by ALTER TABLE. The other new feature in sqlite-utils 2.20 builds on that capability to allow you to refactor a database table by extracting columns into separate tables. I’ve called it sqlite-utils extract.

[... 1345 words]

Executing advanced ALTER TABLE operations in SQLite

SQLite’s ALTER TABLE has some significant limitations: it can’t drop columns (UPDATE: that was fixed in SQLite 3.35.0 in March 2021), it can’t alter NOT NULL status, it can’t change column types. Since I spend a lot of time with SQLite these days I’ve written some code to fix this—both from Python and as a command-line utility.

[... 689 words]

One academic who interviewed attendees of a flat-earth convention found that, almost to a person, they’d discovered the subculture via YouTube recommendations.

YouTube’s Plot to Silence Conspiracy Theories # 20th September 2020, 1:27 am

DuckDB (via) This is a really interesting, relatively new database. It’s kind of a weird hybrid between SQLite and PostgreSQL: it uses the PostgreSQL parser but models itself after SQLite in that databases are a single file and the code is designed for use as an embedded library, distributed in a single amalgamation C++ file (SQLite uses a C amalgamation). It features a “columnar-vectorized query execution engine” inspired by MonetDB (also by the DuckDB authors) and is hence designed to run analytical queries really quickly. You can install it using “pip install duckdb”—the resulting module feels similar to Python’s sqlite3, and follows roughly the same DBAPI pattern. # 19th September 2020, 11:43 pm

Weeknotes: datasette-seaborn, fivethirtyeight-polls

This week I released Datasette 0.49 and tinkered with datasette-seaborn, dogsheep-beta and polling data from FiveThirtyEight.

[... 951 words]

Array programming with NumPy—the NumPy paper (via) The NumPy paper is out, published in Nature. I found this enlightening: for an academic paper it’s very understandable, and it filled in quite a few gaps in my mental model of what NumPy is and which problems it addresses, as well as its relationship to the many other tools in the scientific Python stack. # 17th September 2020, 4:34 pm

Datasette 0.49: The annotated release notes

Datasette 0.49 is out. Some notes on what’s new.

[... 1234 words]

A manager on Strategic Response mused to myself that most of the world outside the West was effectively the Wild West with myself as the part-time dictator – he meant the statement as a compliment, but it illustrated the immense pressures upon me.

Sophie Zhang # 15th September 2020, 9:21 pm

“I Have Blood on My Hands”: A Whistleblower Says Facebook Ignored Global Political Manipulation (via) Sophie Zhang worked as the data scientist for the Facebook Site Integrity fake engagement team. She gave up her severance package in order to speak out internally about what she saw there, and someone leaked her memo to BuzzFeed News. It’s a hell of a story: she saw bots and coordinated manual accounts used to influence politics in countries all around the world, and found herself constantly making moderation decisions that had lasting political impact. “With no oversight whatsoever, I was left in a situation where I was trusted with immense influence in my spare time". This sounds like a nightmare—imagine taking on responsibility for protecting democracy in so many different places. # 15th September 2020, 9:11 pm

Weeknotes: datasette-dump, sqlite-backup, talks

I spent some time this week digging into Python’s sqlite3 internals. I also gave two talks and recorded a third, due to air at PyGotham in October.

[... 928 words]

Stories of reaching Staff-plus engineering roles (via) Extremely useful collection of career stories from staff-level engineers at a variety of different companies, collected by Will Larson. # 11th September 2020, 3:30 am

15 rules for blogging, and my current streak (via) Matt Webb is on a 24 week streak of blogging multiple posts a week and shares his rules on how he’s doing this. These are really good rules. A rule of thumb that has helped me a lot is to fight back against the temptation to make a post as good as I can before I publish it— because that way lies a giant drafts folder and no actual published content. “Perfect is the enemy of shipped”. # 10th September 2020, 6:09 pm

AVIF has landed. AVIF support landed in Chrome 85 a few weeks ago. It’s a new lossy royalty-free image format derived from AV1 video and it’s really impressive—it can achieve similar results to JPEG using a quarter of the file size! Jake digs into AVIF in detail, providing lots of illustrative examples created using the Squoosh online compressor, which now supports AVIF encoding. Jake used the same WebAssembly encoder from Squoosh to decode AVIF images in a web worker so that the demos in his article would work even for browsers that don’t yet support AVIF natively. # 9th September 2020, 4:49 pm

Weeknotes: airtable-export, generating screenshots in GitHub Actions, Dogsheep!

This week I figured out how to populate Datasette from Airtable, wrote code to generate social media preview card page screenshots using Puppeteer, and made a big breakthrough with my Dogsheep project.

[... 1461 words]

Render Markdown tool (via) I wrote a quick JavaScript tool for rendering Markdown via the GitHub Markdown API—which includes all of their clever extensions like tables and syntax highlighting—and then stripping out some extraneous HTML to give me back the format I like using for my blog posts. # 3rd September 2020, 12:08 am

The “await me maybe” pattern for Python asyncio

I’ve identified a pattern for handling potentially-asynchronous callback functions in Python which I’m calling the “await me maybe” pattern. It works by letting you return a value, a callable function that returns a value OR an awaitable function that returns that value.

[... 787 words]

Simply put, if you’re in a position of power at work, you’re unlikely to see workplace harassment in front of you. That’s because harassment and bullying are attempts to exert power over people with less of it. People who behave improperly don’t tend to do so with people they perceive as having power already.

Sarah Milstein # 1st September 2020, 3:10 pm