Simon Willison’s Weblog

Subscribe

August 2021

Aug. 2, 2021

OkCupid had a CSRF vulnerability (via) Good write-up of a (now fixed) CSRF vulnerability on OkCupid. Their site worked by POSTing JSON objects to an API. JSON POSTs are usually protected against CSRF because they can only be sent using fetch() or XMLHttpRequest, which are protected by the same-origin policy. Yan Zhu notes that you can use the enctype="text/plain" attribute on a form (introduced in HTML5) and a crafty hidden input element with name='{"foo":"' value='bar"}' to construct JSON in an off-site form, which enabled CSRF attacks.

# 10:12 pm / csrf, security

Aug. 3, 2021

Exploring the SameSite cookie attribute for preventing CSRF

Visit Exploring the SameSite cookie attribute for preventing CSRF

In reading Yan Zhu’s excellent write-up of the JSON CSRF vulnerability she found in OkCupid one thing puzzled me: I was under the impression that browsers these days default to treating cookies as SameSite=Lax, so I would expect attacks like the one Yan described not to work in modern browsers.

[... 2,198 words]

Aug. 6, 2021

Apply conversion functions to data in SQLite columns with the sqlite-utils CLI tool

Visit Apply conversion functions to data in SQLite columns with the sqlite-utils CLI tool

Earlier this week I released sqlite-utils 3.14 with a powerful new command-line tool: sqlite-utils convert, which applies a conversion function to data stored in a SQLite column.

[... 1,941 words]

Breaking Changes to the Web Platform (via) “Over the years there have been necessary changes to the web platform that caused legacy websites to break.”—this list is thankfully very short, only 11 items so far. Let’s hope it stays that way!

# 6:32 am / web

The thing about semver major version numbers are that they don't mean new stuff, they're a permanent reminder of how many times you got the API wrong. Semver doesn't mean MAJOR.MINOR.PATCH, it means FAILS.FEATURES.BUGS

Will McGugan

# 4:17 pm / versioning, semver, will-mcgugan

Running GitHub on Rails 6.0. Back in 2019 Eileen M. Uchitelle explained how GitHub upgraded everything in production to Rails 6.0 within 1.5 weeks of the stable release. There’s a trick in here I really like: they have an automated weekly job which fetches the latest Rails main branch and runs the full GitHub test suite against it, giving them super-early warnings about anything that might break and letting them provide feedback to upstream about unintended regressions.

# 4:30 pm / rails, continuous-integration, github

Aug. 8, 2021

Stanford School Enrollment Project (via) This is Project Pelican: I’ve been working with the Big Local News team at Stanford helping bundle up and release the data they’ve been collecting on school enrollment statistics around the USA. This Datasette instance has data from 33 states for every year since 2015—3.3m rows total. Be sure to check out the accompanying documentation!

# 12:23 am / journalism, datasette

Aug. 9, 2021

The World of CSS Transforms. Comprehensive, clearly explained tutorial on CSS transforms by Josh W. Comeau, with some very neat interactive demos. I hadn’t understood how useful it is that the translate() transform treats percentages as applying to the dimensions of the element being transformed, not its parent. This means you can use expressions like transform: translateX(calc(100% + 4px)); to shift an element by its entire width plus a few more pixels.

# 2:30 pm / css, josh-comeau

Aug. 10, 2021

Everything new in Datasette since January, plus Django SQL Dashboard. I sent out the first Datasette newsletter since late January this year, covering everything that’s new in Datasette and sqlite-utils this year and introducing Django SQL Dashboard.

# 1:28 am / sqlite-utils, django-sql-dashboard, datasette

Bare columns in an aggregate queries. This is a really nice SQL tweak implemented in SQLite: If you run a query like “SELECT a, b, max(c) FROM tab1 GROUP BY a” SQLite will find the row with the highest value for c and use the columns of that row as the returned values for the other columns mentioned in the query.

# 1:29 am / sql, sqlite

Aug. 11, 2021

sqlite-utils API reference (via) I released sqlite-utils 3.15.1 today with just one change, but it’s a big one: I’ve added docstrings and type annotations to nearly every method in the library, and I’ve started using sphinx-autodoc to generate an API reference page in the documentation directly from those docstrings. I’ve deliberately avoided building this kind of documentation in the past because I so often see projects where the class reference is the ONLY documentation, which I find makes it really hard to figure out how to actually use it. sqlite-utils already has extensive narrative prose documentation so in this case I think it’s a useful enhancement—especially since the docstrings and type hints can help improve the usability of the library in IDEs and Jupyter notebooks.

# 1:03 am / sqlite-utils, documentation, python, sphinx-docs

Adding Sphinx autodoc to a project, and configuring Read The Docs to build it. My TIL notes from figuring out how to use sphinx-autodoc for the sqlite-utils reference documentation today.

# 1:21 am / sqlite-utils, documentation, sphinx-docs, read-the-docs

GitHub’s Engineering Team has moved to Codespaces. My absolute dream development environment is one where I can spin up a new, working development environment in seconds—to try something new on a branch, or because I broke something and don’t want to spend time figuring out how to fix it. This article from GitHub explains how they got there: from a half-day setup to a 45 minute bootstrap in a codespace, then to five minutes through shallow cloning and a nightly pre-built Docker image and finally to 10 seconds be setting up “pools of codespaces, fully cloned and bootstrapped, waiting to be connected with a developer who wants to get to work”.

# 4:53 pm / docker, github, github-codespaces

Aug. 13, 2021

Re-assessing the automatic charset decoding policy in HTTPX (via) Tom Christie ran an analysis of the top 1,000 most accessed websites (according to an older extract from Google’s Ad Planner service) and found that a full 5% of them both omitted a charset parameter and failed to decode as UTF-8. As a result, HTTPX will be depending on the charset-normalizer Python library to handle those cases.

# 10:07 pm / unicode, tom-christie, httpx

Aug. 14, 2021

Datasette on Codespaces, sqlite-utils API reference documentation and other weeknotes

Visit Datasette on Codespaces, sqlite-utils API reference documentation and other weeknotes

This week I broke my streak of not sending out the Datasette newsletter, figured out how to use Sphinx for Python class documentation, worked out how to run Datasette on GitHub Codespaces, implemented Datasette column metadata and got tantalizingly close to a solution for an elusive Datasette feature.

[... 2,164 words]

There’s three ways to handle work assigned to you. If you say you’ll do it, do it. If you say you can’t, that’s ok. But if you sign up for work and drop the ball, the team fails. Learn to say no.

Chris Jones, original Internet Explorer team

# 8:16 pm / management

Aug. 16, 2021

Product Hunt Engineering Principles (via) Product Hunt implement “Collaborative Single Player Mode”, which they define as “A developer should be able to execute a feature from start to finish -- from the database to the backend, API, frontend, and CSS. The goal is never to get blocked.” I’ve encountered this principle applied to teams before (which I really like) but not for individual developers, which I imagine is more likely to work well for smaller organizations. Intriguing approach.

They also practice trunk driven development with feature flags: “Always start a feature with a feature flag and try to get something to production on day 1.”

And “If a product decision is missing, try to make this decision yourself—it’s better to ask for forgiveness rather than permission.”

# 3:35 pm / management, feature-flags

Aug. 17, 2021

The way you motivate someone who doesn’t need the money is the same way you should motivate people who do need the money: by giving them meaningful roles with real responsibility where they can see how their efforts contribute to a larger whole, giving them an appropriate amount of ownership over their work and input into decisions that involve that work, providing useful feedback, recognizing their contributions, helping them feel they’re making progress toward things that matter to them, and — importantly — not doing things that de-motivate people (like yelling or constantly shifting goals or generally being a jerk).

Alison Green, Ask a Manager

# 11:01 pm / motivation, management

Aug. 19, 2021

lex.go in json5-go. This archived GitHub repository has a beautifully clean and clear example of a hand-written lexer in Go, for the JSON5 format (JSON + comments + multi-line strings). parser.go is worth a look too.

# 8:15 pm / compilers, go, json

Aug. 21, 2021

SQLite: STRICT Tables (draft). Draft documentation for a feature that sounds like it could be arriving in SQLite 3.37 (the next release)—adding a “STRICT” table-option keyword to a CREATE TABLE statement will cause the table to strictly enforce typing rules for data in that table, rejecting inserts that fail to match the column’s datatypes.

I’ve seen many programmers dismiss SQLite due to its loose typing, so this feature is really exciting to me: it will hopefully remove a common objection to embracing SQLite for projects.

# 7:05 pm / sqlite

The Diátaxis documentation framework. Daniele Procida’s model of four types of technical documentation—tutorials, how-to guides, technical reference and explanation—now has a name: Diátaxis.

# 10:59 pm / diataxis, documentation

Aug. 22, 2021

Weeknotes: Getting my personal Dogsheep up and running again

Visit Weeknotes: Getting my personal Dogsheep up and running again

I gave a talk about Dogsheep at Noisebridge’s Five Minutes of Fame on Thursday. Just one problem: my regular Dogsheep demo was broken, so I ended up building it from scratch again. In doing so I fixed a few bugs in some Dogsheep tools.

[... 1,311 words]

I stumbled across a nasty XSS hole involving DNS A records. Found out today that an old subdomain that I had assigned an IP address to via a DNS A record was serving unexpected content—turned out I’d shut down the associated VPS and the IP had been recycled to someone else, so their content was now appearing under my domain. It strikes me that if you got really unlucky this could turn into an XSS hole—and that new server could even use Let’s Encrypt to obtain an HTTPS certificate for your subdomain.

I’ve added “audit your A records” to my personal security checklist.

# 5:27 am / xss, dns, security

MDN: Subdomain takeovers (via) MDN have a page about subdomain takeover attacks that focuses more on CNAME records: if you have a CNAME pointing to a common delegated hosting provider but haven’t yet provisioned your virtual host there, someone else might beat you to it and use it for an XSS attack.

“Preventing subdomain takeovers is a matter of order of operations in lifecycle management for virtual hosts and DNS.”

I now understand why Google Cloud make your “prove” your ownership of a domain before they’ll let you configure it to host e.g. a Cloud Run instance.

# 5:31 am / dns, security

Aug. 23, 2021

The rapid increase of COVID-19 cases among kids has shattered last year’s oft-repeated falsehood that kids don’t get COVID-19, and if they do, it’s not that bad. It was a convenient lie that was easy to believe in part because we kept most of our kids home. With remote learning not an option now, this year we’ll find out how dangerous this virus is for children in the worst way possible.

Dan Sinker

# 1:59 am / covid19

At critical moments in time, you can raise the aspirations of other people significantly, especially when they are relatively young, simply by suggesting they do something better or more ambitious than what they might have in mind.  It costs you relatively little to do this, but the benefit to them, and to the broader world, may be enormous.

Tyler Cowen

# 8:02 pm / inspiring, management

It’s perhaps a very English thing to find it hard to accept kind words about oneself. If anyone praised me in my early days as a comedy performer I would say, “Oh, nonsense. Shut up. No really, I was dreadful.” I remember going through this red-faced shuffle in the presence of the mighty John Cleese who upbraided me the moment we were alone. ‘You genuinely think you’re being polite and modest, don’t you?’ ‘Well, you know …’ ‘Don’t you see that when someone hears their compliments contradicted they naturally assume that you must think them a fool? [..] ‘It’s so simple. You just say thank you. You just thank them. How hard is that?’

Stephen Fry

# 10:33 pm / communication, stephenfry

Aug. 24, 2021

How Discord Stores Billions of Messages (via) Fascinating article from 2017 describing how Discord migrated their primary message store to Cassandra (from MongoDB, but I could easily see them making the same decision if they had started with PostgreSQL or MySQL). The trick with scalable NoSQL databases like Cassandra is that you need to have a very deep understanding of the kinds of queries you will need to answer—and Discord had exactly that. In the article they talk about their desire to eventually migrate to Scylla (a compatible Cassandra alternative written in C++)—in the Hacker News comments they confirm that in 2021 they are using Scylla for a few things but they still have their core messages in Cassandra.

# 9:31 pm / discord, scaling, nosql, cassandra

SQLModel. A new project by FastAPI creator Sebastián Ramírez: SQLModel builds on top of both SQLAlchemy and Sebastián’s Pydantic validation library to provide a new ORM that’s designed around Python 3’s optional typing. The real brilliance here is that a SQLModel subclass is simultaneously a valid SQLAlchemy ORM model AND a valid Pydantic validation model, saving on duplicate code by allowing the same class to be used both for form/API validation and for interacting with the database.

# 11:16 pm / orm, sql, sqlalchemy, python

Aug. 25, 2021

API Tokens: A Tedious Survey. Thomas Ptacek reviews different approaches to implementing secure API tokens, from simple random strings stored in a database through various categories of signed token to exotic formats like Macaroons and Biscuits, both new to me.

Macaroons carry a signed list of restrictions with them, but combine it with a mechanism where a client can add their own additional restrictions, sign the combination and pass the token on to someone else.

Biscuits are similar, but “embed Datalog programs to evaluate whether a token allows an operation”.

# 12:12 am / fly, thomas-ptacek, apis, security

2021 » August

MTWTFSS
      1
2345678
9101112131415
16171819202122
23242526272829
3031