Simon Willison’s Weblog

Blogmarks in Dec

Filters: Type: blogmark × Month: Dec ×

Replicating SQLite with rqlite (via) I’ve been trying out rqlite, a “lightweight, distributed relational database, which uses SQLite as its storage engine”. It’s written in Go and uses the Raft consensus algorithm to allow a cluster of nodes to elect a leader and replicate SQLite statements between them. By default it uses in-memory SQLite databases with an on-disk Raft replication log—here are my notes on running it in “on disk” mode as a way to run multiple Datasette processes against replicated SQLite database files. # 28th December 2020, 7:51 pm

How Shopify Uses WebAssembly Outside of the Browser (via) I’m fascinated by applications of WebAssembly outside the browser. As a Python programmer I’m excited to see native code libraries getting compiled to WASM in a way that lets me call them from Python code via a bridge, but the other interesting application is executing untrusted code in a sandbox. Shopify are doing exactly that—they are building a kind-of plugin mechanism where partner code compiled to WASM runs inside their architecture using Fastly’s Lucet. The performance numbers are in the same ballpark as native code. Also interesting: they’re recommending AssemblyScript, a TypeScript-style language designed to compile directly to WASM without needing any additional interpreter support, as required by dynamic languages such as JavaScript, Python or Ruby. # 19th December 2020, 4:46 pm

Commits are snapshots, not diffs (via) Useful, clearly explained revision of some Git fundamentals. # 17th December 2020, 10:01 pm (via) Datasette finally has an official project website, three years after the first release of the software. I built it using Datasette, with custom templates to define the various pages. The site includes news, latest releases, example sites and a new searchable plugin directory. # 11th December 2020, 4:11 am

Deno 1.6 Release Notes. Two signature features in Deno 1.6 worth paying attention to: a built-in language server for code editors like VS Code, and the “deno compile” command which can build Deno JavaScript/TypeScript projects into standalone binaries. The ability to build binaries has turned out to be a killer feature of both Go and Rust, so seeing it ship as a default capability of a interpreted dynamic language is fascinating. I would love it if Python followed Deno’s example. # 10th December 2020, 1:25 am

The case against client certificates (via) Colm MacCárthaigh provides a passionately argued Twitter thread about client certificates and why they should be avoided. I tried using them as an extra layer of protection fir my personal Dogsheep server and ended up abandoning them—certificate management across my devices was too fiddly. # 9th December 2020, 2:41 pm

Cameras and Lenses (via) Fabulous explotable interactive essay by Bartosz Ciechanowski explaining how cameras and lenses work. # 8th December 2020, 3:38 am

The secrets of Monkey Island’s source code (via) To celebrate the thirty year anniversary of the Secret of Monkey Island the Video Game History Foundation interviewed developer Rod Gilbert and produced this comprehensive collection of cut content and material showing how the game was originally constructed. # 5th December 2020, 4:32 pm

Command Line Interface Guidelines (via) Aanand Prasad, Ben Firshman, Carl Tashian and Eva Parish provide the missing manual for designing CLI tools in 2020. Deeply researched and clearly presented—I picked up a bunch of useful tips and ideas from reading this, and I’m looking forward to applying them to my own CLI projects. # 4th December 2020, 8:44 pm

Scaling Datastores at Slack with Vitess (via) Slack spent three years migrating 99% of their MySQL query load to run against Vitess, the open source MySQL sharding system originally built by YouTube. “Today, we serve 2.3 million QPS at peak. 2M of those queries are reads and 300K are writes. Our median query latency is 2 ms, and our p99 query latency is 11 ms.” # 1st December 2020, 9:30 pm

New for AWS Lambda – Container Image Support. “You can now package and deploy Lambda functions as container images of up to 10 GB in size”—can’t wait to try this out with Datasette. # 1st December 2020, 5:34 pm

Scaling React Server-Side Rendering (via) Outstanding, detailed essay from 2017 on challenges and solutions for scaling React server-side rendering at Kijiji, Canada’s largest classified site (owned by eBay). There’s a lot of great stuff in here, including a detailed discussion of different approaches to load balancing, load shedding, component caching, client-side rendering fallbacks and more. # 30th December 2019, 10:26 pm

Guide To Using Reverse Image Search For Investigations (via) Detailed guide from Bellingcat’s Aric Toler on using reverse image search for investigative reporting. Surprisingly Google Image Search isn’t the state of the art: Russian search engine Yandex offers a much more powerful solution, mainly because it’s the largest public-facing image search engine to integrate scary levels of face recognition. # 30th December 2019, 10:23 pm

Machine Learning on Mobile and at the Edge: 2019 industry year-in-review (via) This is a fantastic detailed overview of advances made in the field of machine learning on the edge (primarily on mobile devices) over 2019. I’m really excited about this trend: I love the improved privacy implications of running models on my phone without uploading data to a server, and it’s great to see techniques like Federated Learning (from Google Labs) which enable devices to privately train models in a distributed way without having to upload their training data. # 30th December 2019, 10:17 pm (via) It’s pretty amazing how much you can build on free tiers these days—perfect for experimenting with side-projects. collects free SaaS tools for developers via pull request, and has had contributions from over 500 people. # 26th December 2019, 10:03 am

The Guardian’s nifty old-article trick is a reminder of how news organizations can use metadata to limit misinformation (via) The Guardian displays prominent banners on news stories from more than a year ago warning that it is an older article to help prevent accidental or intentional spread of misinformation using their content as ammunition. Impressively they also display the year prominently on the card images they serve as social media previews fir older articles. # 23rd December 2019, 9:36 am

Building tools to bring data-driven reporting to more newsrooms. I wrote about my fellowship project so far and my goals for the next few months for the JSK Medium publication. My next priority: an invite-only hosted version for newsrooms so that figuring out how to install and manage the software isn’t the biggest barrier to entry. # 20th December 2019, 11:17 am

athena-sqlite (via) Amazon Athena is the AWS tool for querying data stored in S3—as CSV, JSON or Apache Parquet files—using SQL. It’s an interesting way of buliding a very cheap data warehouse on top of S3 without having to run any additional services. Athena recently added a query federation SDK which lets you define additional custom data sources using Lambda functions. Damon Cortesi used this to write a custom connector for SQLite, which lets you run queries against data stored in SQLite files that you have uploaded to S3. You can then run joins between that data and other Athena sources. # 18th December 2019, 9:05 am

GitHub Actions ci.yml for deno. Spotted this today: it’s one of the cleanest examples I’ve seen of a complex CI configuration for GitHub Actions, testing, linting, benchmarking and building Ryan Dahl’s deno JavaScript runtime. # 18th December 2019, 8:49 am

Microbrowsers are Everywhere (via) Colin Bendell introduces a new-to-me term, “microbrowsers”, to describe the user-agents which hit websites to generate unfurled link previews in messenger apps. Twitter and Facebook first popularized them, but today you’re likely getting far more preview-generating traffic from chat clients such as iMessage, WhatsApp and Slack (which won’t execute script and ignore cookies, and hence won’t show up in Google Analytics). Lots of great tips here—one example: if you provide three og:image meta tags iMessage will render them as a collage. # 18th December 2019, 8:32 am

Monarch Bear Grove on Niche Museums (via) Monarch Bear Grove is my favourite hidden corner of Golden Gate Park in San Francisco. It has stone circles formed from pieces of a Spanish monastery that was exported to the USA by press baron William Randolph Hearst. And there are druids. You should read the whole thing. (I added paragraph breaks for this using datasette-render-markdown—Niche Museums is basically a full-blown blog now.) # 16th December 2019, 9:19 pm

London Silver Vaults on Niche Museums. I’m keeping up my streak of posting a new museum I’ve visited to daily—today’s entry is the London Silver Vaults, which I think are one of London’s best kept secrets: 30 specialist silver merchants in a network of vaults five storeys below Chancery Lane. # 12th December 2019, 2:40 am

The Blue Tape List (via) I’ve often thought there’s something magical about your first month at a new job—you can meet anyone and ask any question, taking advantage of your “newbie” status. I like this suggestion by Michael Lopp to encourage your new hires to take notes on things that they think are broken but reserve acting on them for long enough to gain fuller context of how the new organization works. # 10th December 2019, 6:09 pm

Two malicious Python libraries caught stealing SSH and GPG keys. Nasty. Two typosquatting libraries were spotted on PyPI—targetting dateutil and jellyfish but with tricky variants of their names. They attempted to exfiltrate SSH and GPG keys and send them to an IP address defined server. npm has seen this kind of activity too—it’s important to consider this when installing packages. # 5th December 2019, 6:07 am

flk: A LISP that runs wherever Bash is (via) This is a heck of a project: an implementation of LISP written entirely in Bash, meaning you can run it as a script on any machine that has a Bash installation. # 4th December 2019, 5:19 am

The Friendship That Made Google Huge. The New Yorker profiles Jeff Dean and Sanjay Ghemawat, Google’s first and only level 11 Senior Fellows. This is some of the best writing on complex software engineering topics (map-reduce, Tensor Flow and the like) aimed at a general audience that I’ve ever seen. Also a very compelling case study in pair programming. # 31st December 2018, 3:56 am

benfred/py-spy (via) A Python port of Julia Evans’ rbspy profiler, which she describes as “probably better” than the original. I just tried it out and it’s really impressive: it’s written in Rust but has precompiled binaries so you can just run “pip install py-spy” to install it. Shows live output in the terminal while your program is running and also includes the option to generate neat SVG flame graphs. # 29th December 2018, 5:18 am

Fast Autocomplete Search for Your Website (via) I wrote a tutorial for the 24 ways advent calendar on building fast autocomplete search for a website on top of Datasette and SQLite. I built the demo against 24 ways itself—I used wget to recursively fetch all 330 articles as HTML, then wrote code in a Jupyter notebook to extract the raw data from them (with BeautifulSoup) and load them into SQLite using my sqlite-utils Python library. I deployed the resulting database using Datasette, then wrote some vanilla JavaScript to implement autocomplete using fast SQL queries against the Datasette JSON API. # 19th December 2018, 12:26 am

Develop Your Naturalist Superpowers with Observable Notebooks and iNaturalist (via) Natalie’s article for this year’s 24 ways advent calendar shows how you can use Observable notebooks to quickly build interactive visualizations against web APIs. She uses the iNaturalist API to show species of Nudibranchs that you might see in a given month, plus a Vega-powered graph of sightings over the course of the year. This really inspired me to think harder about how I can use Observable to solve some of my API debugging needs, and I’ve already spun up a couple of private Notebooks to exercise new APIs that I’m building at work. It’s a huge productivity boost. # 18th December 2018, 10:39 pm