Simon Willison’s Weblog

Items in Feb

Filters: Month: Feb ×


Trying to end the pandemic a little earlier with VaccinateCA

This week I got involved with the VaccinateCA effort. We are trying to end the pandemic a little earlier, by building the most accurate database possible of vaccination locations and availability in California.

[... 1154 words]

unasync (via) Today I started wondering out loud if one could write code that takes an asyncio Python library and transforms it into the synchronous equivalent by using some regular expressions to strip out the “await ...” keywords and suchlike. Turns out that can indeed work, and Ratan Kulshreshtha built it! unasync uses the standard library tokenize module to run some transformations against an async library and spit out the sync version automatically. I’m now considering using this for sqlite-utils. # 27th February 2021, 10:20 pm

cosmopolitan libc (via) “Cosmopolitan makes C a build-once run-anywhere language, similar to Java, except it doesn’t require interpreters or virtual machines be installed beforehand. [...] Instead, it reconfigures stock GCC to output a POSIX-approved polyglot format that runs natively on Linux + Mac + Windows + FreeBSD + OpenBSD + BIOS with the best possible performance and the tiniest footprint imaginable.” This is a spectacular piece of engineering. # 27th February 2021, 6:02 am

How to secure an Ubuntu server using Tailscale and UFW. This is the Tailscale tutorial I’ve always wanted: it explains in detail how you can run an Ubuntu server (from any cloud provider) such that only devices on your personal Tailscale network can access it. # 26th February 2021, 8:31 pm

When building a tool, it’s easy to forget how much you’ve internalized: how much knowledge and context you’ve assumed. Your tool can feel familiar or even obvious to you while being utterly foreign to everyone else. If your goal is for other people to use the darn thing — meaning you’re not just building for yourself, or tinkering for its own sake (which are totally valid reasons) — you gotta help people use it! It doesn’t matter what’s possible or what you intended; all that matters is whether people actually succeed in practice.

Mike Bostock # 23rd February 2021, 10:55 pm

I strongly suspect that the single most impactful thing I did during my 5+ years at Linden Lab was shortly before I left: set up a weekly meeting between a couple of leads from Support and Engineering to go over the top 10 support issues.

Yoz Grahame # 23rd February 2021, 4:49 am

Fuzzy Name Matching in Postgres. Paul Ramsey describes how to implement fuzzy name matching in PostgreSQL using the fuzzystrmatch extension and its levenshtein() and soundex() functions, plus functional indexes to query against indexed soundex first and then apply slower Levenshtein. The same tricks should also work against SQLite using the datasette-jellyfish plugin. # 22nd February 2021, 9:16 pm

Business rules engines are li’l Conway’s Law devices: a manifestation of the distrust between stakeholders, client and contractor. We require BREs so that separate business units need not talk to each other to solve problems. They are communication and organizational dysfunction made silicon.

Paul Smith # 22nd February 2021, 8:34 pm

Blazing fast CI with pytest-split and GitHub Actions (via) pytest-split is a neat looking variant on the pattern of splitting up a test suite to run different parts of it in parallel on different machines. It involves maintaining a periodically updated JSON file in the repo recording the average runtime of different tests, to enable them to be more fairly divided among test runners. Includes a recipe for running as a matrix in GitHub Actions. # 22nd February 2021, 7:06 pm

People, processes, priorities. Twitter thread from Adrienne Porter Felt outlining her model for thinking about engineering management. I like this trifecta of “people, processes, priorities” a lot. # 22nd February 2021, 5:21 pm

Cross-database queries in SQLite (and weeknotes)

I released Datasette 0.55 and sqlite-utils 3.6 this week with a common theme across both releases: supporting cross-database joins.

[... 719 words]

Open source projects should run office hours

Back in December I decided to try something new for my Datasette open source project: Datasette Office Hours. The idea is simple: anyone can book a 25 minute conversation with me on a Friday to talk about the project. I’m interested in talking to people who are using Datasette, or who are considering using it, or who just want to have a chat.

[... 691 words]

One of the hardest things I’ve had to learn is that humans aren’t pure functions: an input that works one day and gets one result, then again another day and get an entirely different result.

Sarah Drasner # 19th February 2021, 12 am

Weeknotes: Finally, an intro video for Datasette

My big project this week was this Video introduction to Datasette and sqlite-utils. I recorded the video a few weeks ago in advance of FOSDEM, but this week I put together the annotated version. I’m really happy with it, and I’ve added it to the datasette.io homepage as a starting point for helping people understand the project.

[... 690 words]

Litestream runs continuously on a test server with generated load and streams backups to S3. It uses physical replication so it’ll actually restore the data from S3 periodically and compare the checksum byte-for-byte with the current database.

Ben Johnson # 11th February 2021, 8:50 pm

trustme (via) This looks incredibly useful. Run “python -m trustme” and it will create three files for you: server.pem, server.key and a client.pem client certificate, providing a certificate for “localhost” (or another host you spefict) using a fake certificate authority. Looks like it should be the easiest way to test TLS locally. # 11th February 2021, 8 pm

Why I Built Litestream. Litestream is a really exciting new piece of technology by Ben Johnson, who previously built BoltDB, the key-value store written in Go that is used by etcd. It adds replication to SQLite by running a process that converts the SQLite WAL log into a stream that can be saved to another folder or pushed to S3. The S3 option is particularly exciting—Ben estimates that keeping a full point-in-time recovery log of a high write SQLite database should cost in the order of a few dollars a month. I think this could greatly expand the set of use-cases for which SQLite is sensible choice. # 11th February 2021, 7:25 pm

Dependency Confusion: How I Hacked Into Apple, Microsoft and Dozens of Other Companies (via) Alex Birsan describes a new category of security vulnerability he discovered in the npm, pip and gem packaging ecosystems: if a company uses a private repository with internal package names, uploading a package with the same name to the public repository can often result in an attacker being able to execute their own code inside the networks of their target. Alex scored over $130,000 in bug bounties from this one, from a number of name-brand companies. Of particular note for Python developers: the --extra-index-url argument to pip will consult both public and private registries and install the package with the highest version number! # 10th February 2021, 8:42 pm

Finally, remember that whatever choice is made, you’re going to need to get behind it! You should be able to make a compelling positive case for any of the options you present. If there’s an option you can’t support, don’t present it.

Jacob Kaplan-Moss # 8th February 2021, 3:21 pm

Video introduction to Datasette and sqlite-utils

I put together a 17 minute video introduction to Datasette and sqlite-utils for FOSDEM 2021, showing how you can use Datasette to explore data, and demonstrating using the sqlite-utils command-line tool to convert a CSV file into a SQLite database, and then publish it using datasette publish. Here’s the video, plus annotated screen captures with further links and commentary.

[... 1969 words]

Weeknotes: Mostly messing around with map tiles

Most of what I worked on this week was covered in Serving map tiles from SQLite with MBTiles and datasette-tiles. I built two new plugins: datasette-tiles for serving map tiles, and datasette-basemap which bundles map tiles for zoom levels 0-6 of OpenStreetMap. I also released download-tiles for downloading tiles and bundling them into an MBTiles database.

[... 350 words]

Serving map tiles from SQLite with MBTiles and datasette-tiles

Working on datasette-leaflet last week re-kindled my interest in using Datasette as a GIS (Geographic Information System) platform. SQLite already has strong GIS functionality in the form of SpatiaLite and datasette-cluster-map is currently the most downloaded plugin. Most importantly, maps are fun!

[... 1334 words]

Cleaning Up Your Postgres Database (via) Craig Kerstiens provides some invaluable tips on running an initial check of the health of a PostgreSQL database, by using queries against the pg_statio_user_indexes table to find the memory cache hit ratio and the pg_stat_user_tables table to see what percentage of queries to your tables are using an index. # 3rd February 2021, 7:32 am

JMeter Result Analysis using Datasette (via) NaveenKumar Namachivayam wrote a detailed tutorial on using Datasette (on Windows) and csvs-to-sqlite to analyze the results of JMeter performance test runs and then publish them online using Vercel. # 1st February 2021, 4:42 am

Wildcard: Spreadsheet-Driven Customization of Web Applications (via) What a fascinating collection of ideas. Wildcard is a browser extension (currently using Tampermonkey and sadly not yet available to try out) which lets you add “spreadsheet-driven customization” to any web application. Watching the animated screenshots in the videos helps explain what this mean—essentially it’s a two-way scraping trick, where content on the page (e.g. Airbnb listings) are extracted into a spreadsheet-like table interface using JavaScript—but then interactions you make in that spreadsheet like filtering and sorting are reflected back on the original page. It even has the ability to serve editable cells by mapping them to form inputs on the page. Lots to think about here. # 28th February 2020, 7:39 pm

Weeknotes: Datasette Writes

As discussed previously, the biggest hole in Datasette’s feature set at the moment involves writing to the database.

[... 604 words]

I’ve really come to appreciate that performance isn’t just some property of a tool independent from its functionality or its feature set. Performance — in particular, being notably fast — is a feature in and of its own right, which fundamentally alters how a tool is used and perceived.

Nelson Elhage # 24th February 2020, 2:32 pm

Why Google invested in providing Google Fonts for free. Fascinating comment from former Google Fonts team member Raph Levien. In short: text rendered as PNGs hurt Google Search, fonts were a delay in the transition from Flash, Google Docs needed them to better compete with Office and anything that helps create better ads is easy to find funding for. # 23rd February 2020, 2:13 pm

So next time someone is giving you feedback about something you made, think to yourself that to win means getting two or three insights, ideas, or suggestions that you are excited about, and that you couldn’t think up on your own.

Juliette Cezzar # 21st February 2020, 1:04 am

Things I learned about shapefiles building shapefile-to-sqlite

The latest in my series of x-to-sqlite tools is shapefile-to-sqlite. I learned a whole bunch of things about the ESRI shapefile format while building it.

[... 1073 words]

Types

Years

Tags