Simon Willison’s Weblog

Subscribe

Blogmarks

Filters: Sorted by date

The Diátaxis documentation framework. Daniele Procida’s model of four types of technical documentation—tutorials, how-to guides, technical reference and explanation—now has a name: Diátaxis.

# 21st August 2021, 10:59 pm / documentation, diataxis

SQLite: STRICT Tables (draft). Draft documentation for a feature that sounds like it could be arriving in SQLite 3.37 (the next release)—adding a “STRICT” table-option keyword to a CREATE TABLE statement will cause the table to strictly enforce typing rules for data in that table, rejecting inserts that fail to match the column’s datatypes.

I’ve seen many programmers dismiss SQLite due to its loose typing, so this feature is really exciting to me: it will hopefully remove a common objection to embracing SQLite for projects.

# 21st August 2021, 7:05 pm / sqlite

lex.go in json5-go. This archived GitHub repository has a beautifully clean and clear example of a hand-written lexer in Go, for the JSON5 format (JSON + comments + multi-line strings). parser.go is worth a look too.

# 19th August 2021, 8:15 pm / compilers, go, json

Product Hunt Engineering Principles (via) Product Hunt implement “Collaborative Single Player Mode”, which they define as “A developer should be able to execute a feature from start to finish -- from the database to the backend, API, frontend, and CSS. The goal is never to get blocked.” I’ve encountered this principle applied to teams before (which I really like) but not for individual developers, which I imagine is more likely to work well for smaller organizations. Intriguing approach.

They also practice trunk driven development with feature flags: “Always start a feature with a feature flag and try to get something to production on day 1.”

And “If a product decision is missing, try to make this decision yourself—it’s better to ask for forgiveness rather than permission.”

# 16th August 2021, 3:35 pm / management, feature-flags

Re-assessing the automatic charset decoding policy in HTTPX (via) Tom Christie ran an analysis of the top 1,000 most accessed websites (according to an older extract from Google’s Ad Planner service) and found that a full 5% of them both omitted a charset parameter and failed to decode as UTF-8. As a result, HTTPX will be depending on the charset-normalizer Python library to handle those cases.

# 13th August 2021, 10:07 pm / unicode, tom-christie, httpx

GitHub’s Engineering Team has moved to Codespaces. My absolute dream development environment is one where I can spin up a new, working development environment in seconds—to try something new on a branch, or because I broke something and don’t want to spend time figuring out how to fix it. This article from GitHub explains how they got there: from a half-day setup to a 45 minute bootstrap in a codespace, then to five minutes through shallow cloning and a nightly pre-built Docker image and finally to 10 seconds be setting up “pools of codespaces, fully cloned and bootstrapped, waiting to be connected with a developer who wants to get to work”.

# 11th August 2021, 4:53 pm / github, docker, github-codespaces

Adding Sphinx autodoc to a project, and configuring Read The Docs to build it. My TIL notes from figuring out how to use sphinx-autodoc for the sqlite-utils reference documentation today.

# 11th August 2021, 1:21 am / documentation, sqlite-utils, sphinx-docs, read-the-docs

sqlite-utils API reference (via) I released sqlite-utils 3.15.1 today with just one change, but it’s a big one: I’ve added docstrings and type annotations to nearly every method in the library, and I’ve started using sphinx-autodoc to generate an API reference page in the documentation directly from those docstrings. I’ve deliberately avoided building this kind of documentation in the past because I so often see projects where the class reference is the ONLY documentation, which I find makes it really hard to figure out how to actually use it. sqlite-utils already has extensive narrative prose documentation so in this case I think it’s a useful enhancement—especially since the docstrings and type hints can help improve the usability of the library in IDEs and Jupyter notebooks.

# 11th August 2021, 1:03 am / documentation, python, sqlite-utils, sphinx-docs

Bare columns in an aggregate queries. This is a really nice SQL tweak implemented in SQLite: If you run a query like “SELECT a, b, max(c) FROM tab1 GROUP BY a” SQLite will find the row with the highest value for c and use the columns of that row as the returned values for the other columns mentioned in the query.

# 10th August 2021, 1:29 am / sql, sqlite

Everything new in Datasette since January, plus Django SQL Dashboard. I sent out the first Datasette newsletter since late January this year, covering everything that’s new in Datasette and sqlite-utils this year and introducing Django SQL Dashboard.

# 10th August 2021, 1:28 am / datasette, sqlite-utils, django-sql-dashboard

The World of CSS Transforms. Comprehensive, clearly explained tutorial on CSS transforms by Josh W. Comeau, with some very neat interactive demos. I hadn’t understood how useful it is that the translate() transform treats percentages as applying to the dimensions of the element being transformed, not its parent. This means you can use expressions like transform: translateX(calc(100% + 4px)); to shift an element by its entire width plus a few more pixels.

# 9th August 2021, 2:30 pm / css, josh-comeau

Stanford School Enrollment Project (via) This is Project Pelican: I’ve been working with the Big Local News team at Stanford helping bundle up and release the data they’ve been collecting on school enrollment statistics around the USA. This Datasette instance has data from 33 states for every year since 2015—3.3m rows total. Be sure to check out the accompanying documentation!

# 8th August 2021, 12:23 am / data-journalism, journalism, datasette

Running GitHub on Rails 6.0. Back in 2019 Eileen M. Uchitelle explained how GitHub upgraded everything in production to Rails 6.0 within 1.5 weeks of the stable release. There’s a trick in here I really like: they have an automated weekly job which fetches the latest Rails main branch and runs the full GitHub test suite against it, giving them super-early warnings about anything that might break and letting them provide feedback to upstream about unintended regressions.

# 6th August 2021, 4:30 pm / continuous-integration, github, rails

Breaking Changes to the Web Platform (via) “Over the years there have been necessary changes to the web platform that caused legacy websites to break.”—this list is thankfully very short, only 11 items so far. Let’s hope it stays that way!

# 6th August 2021, 6:32 am / web

OkCupid had a CSRF vulnerability (via) Good write-up of a (now fixed) CSRF vulnerability on OkCupid. Their site worked by POSTing JSON objects to an API. JSON POSTs are usually protected against CSRF because they can only be sent using fetch() or XMLHttpRequest, which are protected by the same-origin policy. Yan Zhu notes that you can use the enctype="text/plain" attribute on a form (introduced in HTML5) and a crafty hidden input element with name='{"foo":"' value='bar"}' to construct JSON in an off-site form, which enabled CSRF attacks.

# 2nd August 2021, 10:12 pm / csrf, security

Clickhouse on Cloud Run (via) Alex Reid figured out how to run Clickhouse against read-only baked data on Cloud Run last year, and wrote up some comprehensive notes.

# 29th July 2021, 6:07 am / cloudrun, baked-data, clickhouse

How the Python import system works (via) Remarkably detailed and thorough dissection of how exactly import, modules and packages work in Python—eventually digging right down into the C code. Part of Victor Skvortsov’s excellent “Python behind the scenes” series.

# 24th July 2021, 8:12 pm / python

The Tyranny of Spreadsheets (via) In discussing the notorious Excel incident last year when the UK lost track of 16,000 Covid cases due to a .xls row limit, Tim Harford presents a history of the spreadsheet, dating all the way back to Francesco di Marco Datini and double-entry bookkeeping in 1396. A delightful piece of writing.

# 23rd July 2021, 3:57 am / history, spreadsheets, covid19

Launch HN Instructions (via) The instructions for YC companies that are posting their launch announcement on Hacker News are really interesting to read. “As founders, you’re used to talking to users, customers, and investors. HN readers are not any of those—what they are is peers, and using any of those styles with peers feels clueless and entitled. [...] To interest HN, write in a factual, personal, and modest way about what problem you solve, why it matters, how you solve it, and how you got there.”

# 19th July 2021, 1:05 am / hacker-news, marketing, y-combinator

toyDB: references. toyDB is a “distributed SQL database in Rust, written as a learning project”, with its own implementations of SQL, raft, ACID transactions, B+trees and more. toyDB author Erik Grinaker has assembled a detailed set of references that he used to learn how to build a database—I’d love to see more projects do this, it’s really useful.

# 19th July 2021, 12:18 am / databases, rust

Inserting One Billion Rows in SQLite Under a Minute (via) Avinash Sajjanshetty experiments with accelerating writes to a test table in SQLite, using various SQLite pragmas to accelerate inserts followed by a rewrite of Python code to Rust. Also of note: running the exact same code in PyPy saw a 3.5x speed-up!

# 19th July 2021, 12:13 am / pypy, sqlite, rust

Organize and Index Your Screenshots (OCR) on macOS (via) Alexandru Nedelcu has a very neat recipe for creating an archive of searchable screenshots on macOS: set the default save location for screenshots to a Dropbox folder, then create a launch agent that runs a script against new files in that folder to run tesseract OCR to convert them into a searchable PDF.

# 18th July 2021, 4:11 pm / macos, ocr

Datasette downloads per day (with Observable Plot) (via) I built an Observable notebook that imports PyPI package download data from datasette.io (itself scraped from pypistats.org using a scheduled GitHub Action) and plots it using Observable Plot. Datasette downloads from PyPI apparently jumped from ~800/day in May to ~4,000/day in July—would love to know why!

# 17th July 2021, 5:01 pm / pypi, datasette, observable, observable-plot

The Digital Antiquarian: Sam and Max Hit the Road. Delightful history and retrospective review of 1993’s Sam and Max Hit the Road. I didn’t know Sam and Max happened because the independent comic’s creator worked for LucasArts and the duo had embedded themselves in LucasArts culture through their use in the internal educational materials prepared for SCUMM University.

# 17th July 2021, 3:12 am / game-design, games, history

Last Mile Redis (via) Fly.io article about running a local redis cache in each of their geographic regions—“Cache data overlaps a lot less than you assume it will. For the most part, people in Singapore will rely on a different subset of data than people in Amsterdam or São Paulo or New Jersey.” But then they note that Redis has the ability to act as both a replica of a primary AND a writable server at the same time (“replica-read-only no”), which actually makes sense for a cache—it lets you cache local data but send out cluster-wide cache purges if necessary.

# 17th July 2021, 2:44 am / caching, redis, fly

The Untold Story of SQLite With Richard Hipp. This is a really interesting interview with SQLite creator D. Richard Hipp—it covers all sorts of aspects of the SQLite story I hadn’t heard before, from its inspiration by a software challenge on a battleship to the first income from clients such as AOL and Symbian to the formation of the SQLite Consortium (based on advice from Mozilla’s Mitchell Baker) and more.

# 16th July 2021, 8:12 pm / podcasts, sqlite, d-richard-hipp

Dropbox: Sharing our Engineering Career Framework with the world (via) Dropbox have published their engineering career framework, with detailed descriptions of the different levels of the engineering (as opposed to management) career track and what is expected for each one. I’m fascinated by how different companies handle the challenge of keeping career progression working for engineers without pushing them into people management, and this as a particularly detailed and well thought-out implementation of that.

# 13th July 2021, 11:31 pm / kellan-elliott-mccrea, careers, dropbox

RabbitMQ Streams Overview. New in RabbitMQ 3.9: streams are a persisted, replicated append-only log with non-destructive consuming semantics. Sounds like it fits the same hole as Kafka and Redis Streams, an extremely useful pattern.

# 13th July 2021, 11:29 pm / message-queues, rabbitmq, redis, kafka

Behind the scenes, AWS Lambda (via) Bruno Schaatsbergen pulled together details about how AWS Lambda works under the hood from a detailed review of the AWS documentation, the Firecracker paper and various talks at AWS re:Invent.

# 10th July 2021, 7:40 pm / aws, lambda, software-architecture, firecracker

The data team: a short story (via) Erik Bernhardsson’s fictional account (“I guess I should really call this a parable”) of a new data team leader successfully growing their team and building a data-first culture in a medium-sized technology company. His depiction of the initial state of the company (data in many different places, frustrated ML researchers who can’t get their research into production, confusion over what the data team is actually for) definitely rings true to me.

# 8th July 2021, 11:12 pm / data, data-science, leadership

Years

Tags