Items in Jul, 2020
Why Your Christian Friends and Family Members Are So Easily Fooled by Conspiracy Theories (via) I think the title undersells this: this is a really great piece of writing on conspiracy theories, why people fall for them and why it’s so hard to dig people back out again—regardless of any particular religion, despite being written for a Christian audience. # 31st July 2020, 6:28 pm
This week I’ve been mainly experimenting with binary data storage in SQLite. sqlite-utils can now insert data from binary files, and datasette-media can serve content over HTTP that originated as binary BLOBs in a database file.[... 947 words]
Sandboxing and Workload Isolation (via) Fly.io run other people’s code in containers, so workload isolation is a Big Deal for them. This blog post goes deep into the history of isolation and the various different approaches you can take, and fills me with confidence that the team at Fly.io know their stuff. I got to the bottom and found it had been written by Thomas Ptacek, which didn’t surprise me in the slightest. # 30th July 2020, 10:19 pm
Some SQL Tricks of an Application DBA (via) This post taught me so many PostgreSQL tricks that I hadn’t seen before. Did you know you can start a transaction, drop an index, run explain and then rollback the transaction (cancelling the index drop) to see what explain would look like without that index? Among other things I also learned what the “correlation” database statistic does: it’s a measure of how close-to-sorted the values in a specific column are, which helps PostgreSQL decide if it should do an index scan or a bitmap scan when making use of an index. # 29th July 2020, 7:04 pm
datasette-media 0.4. datasette-media is my Datasette plugin for serving media (e.g. images) directly from Datasette. The first version used file paths saved in a column and served the data from disk—this new version adds the ability to serve content from BLOB columns, such as those created by the new “sqlite-utils insert-files” command. It also adds configurable support for resizing images based on querystring parameters like ?w=100. # 28th July 2020, 2:22 am
sqlite-utils 2.12 (via) I’ve been experimenting with ways of improving BLOB support in Datasette and sqlite-utils. This new version of sqlite-utils includes a “sqlite-utils insert-files” command, which can recursively crawl directories for files and add their contents to SQLite with configurable columns containing their metadata. I was inspired by Paul Ford who has been creating multi-GB SQLite databases of images and PDFs. It turns out that when disk space is cheap this is a pretty effective way of working with interesting corpuses of documents and images. # 27th July 2020, 7:36 am
Organizations adopt microservices when the logistical overhead of coordinating teams against a monolith becomes so large that it starts affecting product velocity. Microservices are a way to release that organizational/logistical friction, at great technical cost.
pypi-rename. I wanted to rename a PyPI package (renaming datasette-insert-api to datasette-insert as it’s about to grow some non-API features). PyPI recommend uploading a final release under the old name which points to (and depends on) the new name. I’ve built a cookiecutter template to codify that pattern. # 25th July 2020, 11:07 pm
PostgreSQL full-text search in the Django Admin. Today I figured out how to use PostgreSQL full-text search in the Django admin for my blog, using the get_search_results method on a subclass of ModelAdmin. # 25th July 2020, 11:05 pm
Doing Stupid Stuff with GitHub Actions (via) I love the idea here of running a scheduled action once a year that deliberately fails, causing GitHub to send you a “Happy New Year” failure email! # 25th July 2020, 9:19 pm
If you have to repeat yourself, you weren’t clear enough the first time. However, if you’re talking about something brand new, you may have to repeat yourself for years before you’re heard. Pick your repeats wisely.
Two new Datasette plugins this week: datasette-copyable, helping users copy-and-paste data from Datasette into other places, and datasette-insert-api, providing a JSON API for inserting and updating data and creating tables.[... 953 words]
The unofficial Google Cloud Run FAQ. This is really useful: a no-fluff, content rich explanation of Google Cloud Run hosted as a GitHub repo that actively accepts pull requests from the community. It’s maintained by Ahmet Alp Balkan, a Cloud Run engineer who states “Googlers: If you find this repo useful, you should recognize the work internally, as I actively fight for alternative forms of content like this”. One of the hardest parts of working with AWS and GCP is digging through the marketing materials to figure out what the product actually does, so the more alternative forms of documentation like this the better. # 22nd July 2020, 5:20 pm
You always get the name of the dog, the editor explained. The dog is a character in your story, and names tell readers a lot about your characters. It’s a crucial storytelling detail, and if you’re alert and inquisitive enough to ask for the name of the dog, you’ll surely not miss any other important details.
Quite simply, it’s the product manager’s job to articulate two simple things: What game are we playing? How do we keep score? Do these two things right, and all of a sudden a collection of brilliant individual contributors with talents in engineering, operations, quality, design and marketing will start running in the same direction. Without it, no amount of prioritization or execution management will save you.
22 Principles for Great Product Managers (via) By Alex Reeve, a PM at LinkedIn. These are really strong—I particularly liked the “leading your team” section which emphasizes ensuring your team understand the goal and the path to reach it, and that you know what winning will look like and how to tell. # 20th July 2020, 8:17 pm
Instead of seeing instrumentation as a last-ditch effort of strings and metrics, we must think about propagating the full context of a request and emitting it at regular pulses. No pull request should ever be accepted unless the engineer can answer the question, “How will I know if this breaks?”
When I was curating my generated tweets, I estimated 30-40% of the tweets were usable comedically, a massive improvement over the 5-10% usability from my GPT-2 tweet generation. However, a 30-40% success rate implies a 60-70% failure rate, which is patently unsuitable for a production application.
Tempering Expectations for GPT-3 and OpenAI’s API. Insightful commentary on GPT-3 (which is producing some ridiculously cool demos at the moment thanks to the invite-only OpenAI API) from Max Woolf. # 18th July 2020, 7:29 pm
All sorts of project updates this week.[... 913 words]
Develomentor podcast: Simon Willison – Data Journalism, The Importance of Side Projects (via) Grant Ingersoll interviewed me for the Develomentor podcast. We talked about my career so far, and how much of it was driven by side-projects that I’ve worked on individually or with Natalie. # 17th July 2020, 1:33 am
datasette-auth-passwords. My latest plugin: datasette-auth-passwords provides a mechanism for signing into Datasette using a username and password (which is verified in order to set a ds_actor authentication cookie). So far it only supports passwords that are hard-coded into Datasette’s configuration via environment variables, but I plan to add database-backed user accounts in the future. # 13th July 2020, 11:39 pm
zhiiiyang/zhiiiyang profile README (via) This is a brilliant hack: a GitHub profile README that uses an action to retrieve the author’s latest tweet (using R), render it as a PNG screenshot in headless Chrome via rstudio/webshot2 and embed that image in their profile. # 11th July 2020, 5:47 pm
GitHub quietly released a new feature at some point in the past few days: profile READMEs. Create a repository with the same name as your GitHub account (in my case that’s github.com/simonw/simonw), add a
README.md to it and GitHub will render the contents at the top of your personal profile page—for me that’s github.com/simonw
This week I’ve mainly been exploring Small Business Administration Covid-19 loans data, pitching some talks and upgrading some plugins for compatibility with Datasette 0.44+.[... 524 words]
When data is messy. I love this story: a neural network trained on images was asked what the most significant pixels in pictures of tench (a kind of fish) were: it returned pictures of fingers on a green background, because most of the tench photos it had seen were fisherfolk showing off their catch. # 7th July 2020, 7:03 pm
GitHub Actions: Manual triggers with workflow_dispatch (via) New GitHub Actions feature which fills a big gap in the offering: you can now create “workflow dispatch” events which provide a button for manually triggering an action—and you can specify extra UI form fields that can customize how that action runs. This turns Actions into an interactive automation engine for any code that can be wrapped in a Docker container. # 7th July 2020, 4:33 am
sba-loans-covid-19-datasette (via) The treasury department released a bunch of data on the Covid-19 SBA Paycheck Protection Program Loan recipients today—I’ve loaded the most interesting data (the $150,000+ loans) into a Datasette instance. # 7th July 2020, 2:42 am
The future will not be like the past. The comfortable Victorian and Georgian world complete with grand country houses, a globe-spanning British empire, and lords and commoners each knowing their place, was swept away by the events that began in the summer of 1914 (and that with Britain on the “winning” side of both world wars.) So too, our comfortable “American century” of conspicuous consumer consumption, global tourism, and ever-increasing stock and home prices may be gone forever.