Simon Willison’s Weblog


Items in Aug, 2021



Unshipped work is inventory and it costs you money as it spoils

Avery Pennarun # 30th August 2021, 5:46 am

Building a desktop application for Datasette (and weeknotes)

This week I started experimenting with a desktop application version of Datasette—with the goal of providing people who aren’t comfortable with the command-line the ability to get Datasette up and running on their own personal computers.

[... 1423 words]

Dynamic content for GitHub repository templates using cookiecutter and GitHub Actions

GitHub repository templates were introduced a couple of years ago to provide a mechanism for creating a brand new GitHub repository starting with an initial set of files.

[... 1413 words]

Allo shows the ultimate failure of Google’s Minimum Viable Product strategy. MVP works when you have almost no competition, or if you are taking a radically different approach to what’s on the market, but it completely falls on its face when you are just straight-up cloning an established competitor. There’s no reason to use a half-baked WhatsApp clone when regular WhatsApp exists.

Ron Amadeo # 25th August 2021, 10:28 pm

API Tokens: A Tedious Survey. Thomas Ptacek reviews different approaches to implementing secure API tokens, from simple random strings stored in a database through various categories of signed token to exotic formats like Macaroons and Biscuits, both new to me.

Macaroons carry a signed list of restrictions with them, but combine it with a mechanism where a client can add their own additional restrictions, sign the combination and pass the token on to someone else.
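The chained-signature idea can be sketched with nothing but HMAC. This is a rough illustration and not the real Macaroons wire format or any library's API: the function names, the dictionary layout and the "key = value" caveat syntax are all assumptions made for the example.

```python
import hashlib
import hmac


def _sign(key: bytes, message: bytes) -> bytes:
    return hmac.new(key, message, hashlib.sha256).digest()


def mint(root_key: bytes, identifier: str) -> dict:
    """Issuer creates a token with no restrictions yet."""
    return {
        "id": identifier,
        "caveats": [],
        "sig": _sign(root_key, identifier.encode()),
    }


def attenuate(token: dict, caveat: str) -> dict:
    """Any holder can add a restriction without knowing the root key:
    the previous signature becomes the HMAC key for the new caveat."""
    return {
        "id": token["id"],
        "caveats": token["caveats"] + [caveat],
        "sig": _sign(token["sig"], caveat.encode()),
    }


def _caveat_holds(caveat: str, context: dict) -> bool:
    # Hypothetical caveat syntax for this sketch: "key = value".
    key, _, value = caveat.partition(" = ")
    return context.get(key) == value


def verify(root_key: bytes, token: dict, context: dict) -> bool:
    """Issuer replays the HMAC chain, then checks every caveat."""
    sig = _sign(root_key, token["id"].encode())
    for caveat in token["caveats"]:
        sig = _sign(sig, caveat.encode())
    if not hmac.compare_digest(sig, token["sig"]):
        return False
    return all(_caveat_holds(c, context) for c in token["caveats"])
```

A holder can call `attenuate(token, "action = read")` and hand the result on. Removing that caveat later would invalidate the signature, because only the issuer can recompute the chain from the root key.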

Biscuits are similar, but “embed Datalog programs to evaluate whether a token allows an operation”. # 25th August 2021, 12:12 am

SQLModel. A new project by FastAPI creator Sebastián Ramírez: SQLModel builds on top of both SQLAlchemy and the Pydantic validation library to provide a new ORM that’s designed around Python 3’s optional typing. The real brilliance here is that a SQLModel subclass is simultaneously a valid SQLAlchemy ORM model AND a valid Pydantic validation model, saving on duplicate code by allowing the same class to be used both for form/API validation and for interacting with the database. # 24th August 2021, 11:16 pm

How Discord Stores Billions of Messages (via) Fascinating article from 2017 describing how Discord migrated their primary message store to Cassandra (from MongoDB, but I could easily see them making the same decision if they had started with PostgreSQL or MySQL). The trick with scalable NoSQL databases like Cassandra is that you need to have a very deep understanding of the kinds of queries you will need to answer—and Discord had exactly that. In the article they talk about their desire to eventually migrate to Scylla (a compatible Cassandra alternative written in C++)—in the Hacker News comments they confirm that in 2021 they are using Scylla for a few things but they still have their core messages in Cassandra. # 24th August 2021, 9:31 pm

It’s perhaps a very English thing to find it hard to accept kind words about oneself. If anyone praised me in my early days as a comedy performer I would say, “Oh, nonsense. Shut up. No really, I was dreadful.” I remember going through this red-faced shuffle in the presence of the mighty John Cleese who upbraided me the moment we were alone. ‘You genuinely think you’re being polite and modest, don’t you?’ ‘Well, you know …’ ‘Don’t you see that when someone hears their compliments contradicted they naturally assume that you must think them a fool? [..] ‘It’s so simple. You just say thank you. You just thank them. How hard is that?’

Stephen Fry # 23rd August 2021, 10:33 pm

At critical moments in time, you can raise the aspirations of other people significantly, especially when they are relatively young, simply by suggesting they do something better or more ambitious than what they might have in mind.  It costs you relatively little to do this, but the benefit to them, and to the broader world, may be enormous.

Tyler Cowen # 23rd August 2021, 8:02 pm

The rapid increase of COVID-19 cases among kids has shattered last year’s oft-repeated falsehood that kids don’t get COVID-19, and if they do, it’s not that bad. It was a convenient lie that was easy to believe in part because we kept most of our kids home. With remote learning not an option now, this year we’ll find out how dangerous this virus is for children in the worst way possible.

Dan Sinker # 23rd August 2021, 1:59 am

MDN: Subdomain takeovers (via) MDN have a page about subdomain takeover attacks that focuses more on CNAME records: if you have a CNAME pointing to a common delegated hosting provider but haven’t yet provisioned your virtual host there, someone else might beat you to it and use it for an XSS attack.

“Preventing subdomain takeovers is a matter of order of operations in lifecycle management for virtual hosts and DNS.”

I now understand why Google Cloud make you “prove” your ownership of a domain before they’ll let you configure it to host e.g. a Cloud Run instance. # 22nd August 2021, 5:31 am

I stumbled across a nasty XSS hole involving DNS A records. Found out today that an old subdomain that I had assigned an IP address to via a DNS A record was serving unexpected content—turned out I’d shut down the associated VPS and the IP had been recycled to someone else, so their content was now appearing under my domain. It strikes me that if you got really unlucky this could turn into an XSS hole—and that new server could even use Let’s Encrypt to obtain an HTTPS certificate for your subdomain.

I’ve added “audit your A records” to my personal security checklist. # 22nd August 2021, 5:27 am
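One way such an audit might look, as a minimal sketch: compare what each hostname resolves to against the addresses you know you still control. The `owned` mapping, the function names and the hostnames are hypothetical. The resolver is injectable so the check can be exercised without touching real DNS.

```python
import socket
from typing import Callable, Dict, List, Set


def resolve_ipv4(hostname: str) -> Set[str]:
    """Return the IPv4 addresses a hostname currently resolves to."""
    try:
        _, _, addresses = socket.gethostbyname_ex(hostname)
    except OSError:
        # Covers lookup failures such as socket.gaierror.
        return set()
    return set(addresses)


def audit_a_records(
    owned: Dict[str, Set[str]],
    resolver: Callable[[str], Set[str]] = resolve_ipv4,
) -> List[str]:
    """Report every resolved address that is not in the expected set."""
    problems = []
    for hostname, expected_ips in sorted(owned.items()):
        for ip in sorted(resolver(hostname) - expected_ips):
            problems.append(f"{hostname} -> {ip} is not an address you control")
    return problems
```

Calling `audit_a_records({"old.example.com": {"203.0.113.10"}})` would flag the subdomain if it now resolves to anything else. Keeping the expected addresses up to date is the part that still has to be done by hand.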

Weeknotes: Getting my personal Dogsheep up and running again

I gave a talk about Dogsheep at Noisebridge’s Five Minutes of Fame on Thursday. Just one problem: my regular Dogsheep demo was broken, so I ended up building it from scratch again. In doing so I fixed a few bugs in some Dogsheep tools.

[... 1311 words]

The Diátaxis documentation framework. Daniele Procida’s model of four types of technical documentation—tutorials, how-to guides, technical reference and explanation—now has a name: Diátaxis. # 21st August 2021, 10:59 pm

SQLite: STRICT Tables (draft). Draft documentation for a feature that sounds like it could be arriving in SQLite 3.37 (the next release)—adding a “STRICT” table-option keyword to a CREATE TABLE statement will cause the table to strictly enforce typing rules for data in that table, rejecting inserts that fail to match the column’s datatypes.
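A small experiment along these lines shows the difference, assuming the SQLite build linked into Python understands the STRICT keyword as drafted. The table name, column name and helper function are made up for the example. Older builds will fail at CREATE TABLE, which the sketch reports as "unsupported".

```python
import sqlite3


def insert_text_into_integer_column(strict: bool) -> str:
    """Try to store a non-numeric string in an INTEGER column."""
    option = " STRICT" if strict else ""
    conn = sqlite3.connect(":memory:")
    try:
        try:
            conn.execute(f"CREATE TABLE t (n INTEGER){option}")
        except sqlite3.OperationalError:
            # This SQLite build does not recognise the STRICT keyword.
            return "unsupported"
        try:
            conn.execute("INSERT INTO t (n) VALUES ('not a number')")
        except sqlite3.Error:
            return "rejected"
        return "accepted"
    finally:
        conn.close()
```

An ordinary table accepts the string and stores it as text. On a build with STRICT support the same insert is refused.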

I’ve seen many programmers dismiss SQLite due to its loose typing, so this feature is really exciting to me: it will hopefully remove a common objection to embracing SQLite for projects. # 21st August 2021, 7:05 pm

lex.go in json5-go. This archived GitHub repository has a beautifully clean and clear example of a hand-written lexer in Go, for the JSON5 format (JSON + comments + multi-line strings). parser.go is worth a look too. # 19th August 2021, 8:15 pm

The way you motivate someone who doesn’t need the money is the same way you should motivate people who do need the money: by giving them meaningful roles with real responsibility where they can see how their efforts contribute to a larger whole, giving them an appropriate amount of ownership over their work and input into decisions that involve that work, providing useful feedback, recognizing their contributions, helping them feel they’re making progress toward things that matter to them, and — importantly — not doing things that de-motivate people (like yelling or constantly shifting goals or generally being a jerk).

Alison Green (Ask a Manager) # 17th August 2021, 11:01 pm

Product Hunt Engineering Principles (via) Product Hunt implement “Collaborative Single Player Mode”, which they define as “A developer should be able to execute a feature from start to finish -- from the database to the backend, API, frontend, and CSS. The goal is never to get blocked.” I’ve encountered this principle applied to teams before (which I really like) but not for individual developers, which I imagine is more likely to work well for smaller organizations. Intriguing approach.

They also practice trunk driven development with feature flags: “Always start a feature with a feature flag and try to get something to production on day 1.”

And “If a product decision is missing, try to make this decision yourself—it’s better to ask for forgiveness rather than permission.” # 16th August 2021, 3:35 pm

There’s three ways to handle work assigned to you. If you say you’ll do it, do it. If you say you can’t, that’s ok. But if you sign up for work and drop the ball, the team fails. Learn to say no.

Chris Jones, original Internet Explorer team # 14th August 2021, 8:16 pm

Datasette on Codespaces, sqlite-utils API reference documentation and other weeknotes

This week I broke my streak of not sending out the Datasette newsletter, figured out how to use Sphinx for Python class documentation, worked out how to run Datasette on GitHub Codespaces, implemented Datasette column metadata and got tantalizingly close to a solution for an elusive Datasette feature.

[... 2164 words]

Re-assessing the automatic charset decoding policy in HTTPX (via) Tom Christie ran an analysis of the top 1,000 most accessed websites (according to an older extract from Google’s Ad Planner service) and found that a full 5% of them both omitted a charset parameter and failed to decode as UTF-8. As a result, HTTPX will be depending on the charset-normalizer Python library to handle those cases. # 13th August 2021, 10:07 pm
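To make the problem concrete, here is a rough standard-library sketch of a decoding policy for such responses. It is not how HTTPX or charset-normalizer work: charset-normalizer detects the likely encoding from the bytes, whereas this sketch uses a fixed `cp1252` fallback, which is purely an assumption for illustration, as are the function names.

```python
from email.message import Message
from typing import Optional


def declared_charset(content_type: Optional[str]) -> Optional[str]:
    """Extract the charset parameter from a Content-Type header, if any."""
    if not content_type:
        return None
    message = Message()
    message["Content-Type"] = content_type
    return message.get_content_charset()


def decode_body(
    body: bytes,
    content_type: Optional[str] = None,
    fallback: str = "cp1252",
) -> str:
    """Prefer the declared charset, then UTF-8, then a lossy fallback."""
    charset = declared_charset(content_type)
    if charset:
        try:
            return body.decode(charset)
        except (LookupError, UnicodeDecodeError):
            pass  # Unknown or wrong declared charset: keep going.
    try:
        return body.decode("utf-8")
    except UnicodeDecodeError:
        return body.decode(fallback, errors="replace")
```

The last branch is the one the 5% figure is about: no charset parameter, and bytes that are not valid UTF-8.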

GitHub’s Engineering Team has moved to Codespaces. My absolute dream development environment is one where I can spin up a new, working development environment in seconds, either to try something new on a branch or because I broke something and don’t want to spend time figuring out how to fix it. This article from GitHub explains how they got there: from a half-day setup to a 45-minute bootstrap in a codespace, then to five minutes through shallow cloning and a nightly pre-built Docker image, and finally to 10 seconds by setting up “pools of codespaces, fully cloned and bootstrapped, waiting to be connected with a developer who wants to get to work”. # 11th August 2021, 4:53 pm

Adding Sphinx autodoc to a project, and configuring Read The Docs to build it. My TIL notes from figuring out how to use sphinx-autodoc for the sqlite-utils reference documentation today. # 11th August 2021, 1:21 am

sqlite-utils API reference (via) I released sqlite-utils 3.15.1 today with just one change, but it’s a big one: I’ve added docstrings and type annotations to nearly every method in the library, and I’ve started using sphinx-autodoc to generate an API reference page in the documentation directly from those docstrings. I’ve deliberately avoided building this kind of documentation in the past because I so often see projects where the class reference is the ONLY documentation, which I find makes it really hard to figure out how to actually use it. sqlite-utils already has extensive narrative prose documentation so in this case I think it’s a useful enhancement—especially since the docstrings and type hints can help improve the usability of the library in IDEs and Jupyter notebooks. # 11th August 2021, 1:03 am

Bare columns in an aggregate query. This is a really nice SQL tweak implemented in SQLite: If you run a query like “SELECT a, b, max(c) FROM tab1 GROUP BY a” SQLite will find the row with the highest value for c and use the columns of that row as the returned values for the other columns mentioned in the query. # 10th August 2021, 1:29 am
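The behaviour is easy to try from Python's built-in sqlite3 module. The table and query below follow the example in the text. The sample rows and the helper function name are invented for the demonstration, and an ORDER BY is added so the output order is predictable.

```python
import sqlite3


def rows_with_max_c():
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE tab1 (a TEXT, b TEXT, c INTEGER)")
    conn.executemany(
        "INSERT INTO tab1 VALUES (?, ?, ?)",
        [("x", "low", 1), ("x", "high", 9), ("x", "mid", 4), ("y", "only", 5)],
    )
    # b is a "bare" column: neither aggregated nor in the GROUP BY clause.
    # SQLite takes it from the row that supplied max(c) for each group.
    rows = conn.execute(
        "SELECT a, b, max(c) FROM tab1 GROUP BY a ORDER BY a"
    ).fetchall()
    conn.close()
    return rows


print(rows_with_max_c())  # [('x', 'high', 9), ('y', 'only', 5)]
```

Many other databases reject this query outright because b is not in the GROUP BY clause, so this is SQLite-specific behaviour rather than portable SQL.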

Everything new in Datasette since January, plus Django SQL Dashboard. I sent out the first Datasette newsletter since late January this year, covering everything that’s new in Datasette and sqlite-utils this year and introducing Django SQL Dashboard. # 10th August 2021, 1:28 am

The World of CSS Transforms. Comprehensive, clearly explained tutorial on CSS transforms by Josh W. Comeau, with some very neat interactive demos. I hadn’t understood how useful it is that the translate() transform treats percentages as applying to the dimensions of the element being transformed, not its parent. This means you can use expressions like transform: translateX(calc(100% + 4px)); to shift an element by its entire width plus a few more pixels. # 9th August 2021, 2:30 pm

Stanford School Enrollment Project (via) This is Project Pelican: I’ve been working with the Big Local News team at Stanford helping bundle up and release the data they’ve been collecting on school enrollment statistics around the USA. This Datasette instance has data from 33 states for every year since 2015—3.3m rows total. Be sure to check out the accompanying documentation! # 8th August 2021, 12:23 am

Running GitHub on Rails 6.0. Back in 2019 Eileen M. Uchitelle explained how GitHub upgraded everything in production to Rails 6.0 within 1.5 weeks of the stable release. There’s a trick in here I really like: they have an automated weekly job which fetches the latest Rails main branch and runs the full GitHub test suite against it, giving them super-early warnings about anything that might break and letting them provide feedback to upstream about unintended regressions. # 6th August 2021, 4:30 pm

The thing about semver major version numbers are that they don’t mean new stuff, they’re a permanent reminder of how many times you got the API wrong. Semver doesn’t mean MAJOR.MINOR.PATCH, it means FAILS.FEATURES.BUGS

Will McGugan # 6th August 2021, 4:17 pm