Items in Feb, 2021
This week I got involved with the VaccinateCA effort. We are trying to end the pandemic a little earlier, by building the most accurate database possible of vaccination locations and availability in California.[... 1154 words]
unasync (via) Today I started wondering out loud if one could write code that takes an asyncio Python library and transforms it into the synchronous equivalent by using some regular expressions to strip out the “await ...” keywords and suchlike. Turns out that can indeed work, and Ratan Kulshreshtha built it! unasync uses the standard library tokenize module to run some transformations against an async library and spit out the sync version automatically. I’m now considering using this for sqlite-utils. # 27th February 2021, 10:20 pm
cosmopolitan libc (via) “Cosmopolitan makes C a build-once run-anywhere language, similar to Java, except it doesn’t require interpreters or virtual machines be installed beforehand. [...] Instead, it reconfigures stock GCC to output a POSIX-approved polyglot format that runs natively on Linux + Mac + Windows + FreeBSD + OpenBSD + BIOS with the best possible performance and the tiniest footprint imaginable.” This is a spectacular piece of engineering. # 27th February 2021, 6:02 am
How to secure an Ubuntu server using Tailscale and UFW. This is the Tailscale tutorial I’ve always wanted: it explains in detail how you can run an Ubuntu server (from any cloud provider) such that only devices on your personal Tailscale network can access it. # 26th February 2021, 8:31 pm
When building a tool, it’s easy to forget how much you’ve internalized: how much knowledge and context you’ve assumed. Your tool can feel familiar or even obvious to you while being utterly foreign to everyone else. If your goal is for other people to use the darn thing — meaning you’re not just building for yourself, or tinkering for its own sake (which are totally valid reasons) — you gotta help people use it! It doesn’t matter what’s possible or what you intended; all that matters is whether people actually succeed in practice.
I strongly suspect that the single most impactful thing I did during my 5+ years at Linden Lab was shortly before I left: set up a weekly meeting between a couple of leads from Support and Engineering to go over the top 10 support issues.
Fuzzy Name Matching in Postgres. Paul Ramsey describes how to implement fuzzy name matching in PostgreSQL using the fuzzystrmatch extension and its levenshtein() and soundex() functions, plus functional indexes to query against indexed soundex first and then apply slower Levenshtein. The same tricks should also work against SQLite using the datasette-jellyfish plugin. # 22nd February 2021, 9:16 pm
Business rules engines are li’l Conway’s Law devices: a manifestation of the distrust between stakeholders, client and contractor. We require BREs so that separate business units need not talk to each other to solve problems. They are communication and organizational dysfunction made silicon.
Blazing fast CI with pytest-split and GitHub Actions (via) pytest-split is a neat looking variant on the pattern of splitting up a test suite to run different parts of it in parallel on different machines. It involves maintaining a periodically updated JSON file in the repo recording the average runtime of different tests, to enable them to be more fairly divided among test runners. Includes a recipe for running as a matrix in GitHub Actions. # 22nd February 2021, 7:06 pm
People, processes, priorities. Twitter thread from Adrienne Porter Felt outlining her model for thinking about engineering management. I like this trifecta of “people, processes, priorities” a lot. # 22nd February 2021, 5:21 pm
[... 719 words]
Back in December I decided to try something new for my Datasette open source project: Datasette Office Hours. The idea is simple: anyone can book a 25 minute conversation with me on a Friday to talk about the project. I’m interested in talking to people who are using Datasette, or who are considering using it, or who just want to have a chat.[... 786 words]
One of the hardest things I’ve had to learn is that humans aren’t pure functions: an input that works one day and gets one result, then again another day and get an entirely different result.
My big project this week was this Video introduction to Datasette and sqlite-utils. I recorded the video a few weeks ago in advance of FOSDEM, but this week I put together the annotated version. I’m really happy with it, and I’ve added it to the datasette.io homepage as a starting point for helping people understand the project.[... 690 words]
Litestream runs continuously on a test server with generated load and streams backups to S3. It uses physical replication so it’ll actually restore the data from S3 periodically and compare the checksum byte-for-byte with the current database.
trustme (via) This looks incredibly useful. Run “python -m trustme” and it will create three files for you: server.pem, server.key and a client.pem client certificate, providing a certificate for “localhost” (or another host you spefict) using a fake certificate authority. Looks like it should be the easiest way to test TLS locally. # 11th February 2021, 8 pm
Why I Built Litestream. Litestream is a really exciting new piece of technology by Ben Johnson, who previously built BoltDB, the key-value store written in Go that is used by etcd. It adds replication to SQLite by running a process that converts the SQLite WAL log into a stream that can be saved to another folder or pushed to S3. The S3 option is particularly exciting—Ben estimates that keeping a full point-in-time recovery log of a high write SQLite database should cost in the order of a few dollars a month. I think this could greatly expand the set of use-cases for which SQLite is sensible choice. # 11th February 2021, 7:25 pm
Dependency Confusion: How I Hacked Into Apple, Microsoft and Dozens of Other Companies (via) Alex Birsan describes a new category of security vulnerability he discovered in the npm, pip and gem packaging ecosystems: if a company uses a private repository with internal package names, uploading a package with the same name to the public repository can often result in an attacker being able to execute their own code inside the networks of their target. Alex scored over $130,000 in bug bounties from this one, from a number of name-brand companies. Of particular note for Python developers: the --extra-index-url argument to pip will consult both public and private registries and install the package with the highest version number! # 10th February 2021, 8:42 pm
Finally, remember that whatever choice is made, you’re going to need to get behind it! You should be able to make a compelling positive case for any of the options you present. If there’s an option you can’t support, don’t present it.
I put together a 17 minute video introduction to Datasette and sqlite-utils for FOSDEM 2021, showing how you can use Datasette to explore data, and demonstrating using the
sqlite-utils command-line tool to convert a CSV file into a SQLite database, and then publish it using
datasette publish. Here’s the video, plus annotated screen captures with further links and commentary.
Most of what I worked on this week was covered in Serving map tiles from SQLite with MBTiles and datasette-tiles. I built two new plugins: datasette-tiles for serving map tiles, and datasette-basemap which bundles map tiles for zoom levels 0-6 of OpenStreetMap. I also released download-tiles for downloading tiles and bundling them into an MBTiles database.[... 350 words]
Working on datasette-leaflet last week re-kindled my interest in using Datasette as a GIS (Geographic Information System) platform. SQLite already has strong GIS functionality in the form of SpatiaLite and datasette-cluster-map is currently the most downloaded plugin. Most importantly, maps are fun![... 1334 words]
Cleaning Up Your Postgres Database (via) Craig Kerstiens provides some invaluable tips on running an initial check of the health of a PostgreSQL database, by using queries against the pg_statio_user_indexes table to find the memory cache hit ratio and the pg_stat_user_tables table to see what percentage of queries to your tables are using an index. # 3rd February 2021, 7:32 am