Blogmarks
Compiling Black with mypyc (via) Richard Si is a Black contributor who recently obtained a 2x performance boost by compiling Black using the mypyc tool from the mypy project, which uses Python type annotations to generate a compiled C version of the Python logic. He wrote up this fantastic three-part series describing in detail how he achieved this, including plenty of tips on Python profiling and clever optimization tricks.
Lesser Known Features of ClickHouse (via) I keep hearing positive noises about ClickHouse. I learned about a whole bunch of capabilities from this article—including that ClickHouse can directly query tables that are stored in SQLite or PostgreSQL.
Dragonfly: A modern replacement for Redis and Memcached (via) I was initially pretty skeptical of the tagline: does Redis really need a “modern” replacement? But the Background section of the README makes this look like a genuinely interesting project. It re-imagines Redis to have its keyspace partitioned across multiple threads, and uses the VLL lock manager described in a 2014 paper to “compose atomic multi-key operations without using mutexes or spinlocks”. The initial benchmarks show up to a 25x increase in throughput compared to Redis. It’s written in C++.
Architecture Notes: Datasette (via) I was interviewed for the first edition of Architecture Notes, a new publication (website and newsletter) about software architecture created by Mahdi Yusuf. We covered a bunch of topics in detail: ASGI, SQLite and asyncio, Baked Data, plugin hook design, Python in WebAssembly, Python in an Electron app and more. Mahdi also turned my scrappy diagrams into beautiful illustrations for the piece.
upptime (via) “Open-source uptime monitor and status page, powered entirely by GitHub Actions, Issues, and Pages.” This is a very creative (ab)use of GitHub Actions: it runs a scheduled action to check the availability of sites that you specify, records the results in a YAML file (with the commit history tracking them over time) and can automatically open a GitHub issue for you if it detects a new incident.
Benjamin “Zags” Zagorsky: Handling Timezones in Python. The talks from PyCon US have started appearing on YouTube. I found this one really useful for shoring up my Python timezone knowledge: it reminds us that if your code calls datetime.now(), datetime.utcnow() or date.today(), you have timezone bugs, because you’ve been working with ambiguous representations of instants in time that could span a 26 hour interval from UTC-12 to UTC+14. date.today() represents a 24 hour period and hence is prone to timezone surprises as well. My code has a lot of timezone bugs!
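A minimal sketch of the ambiguity described here, using only the standard library. The example date is arbitrary, and the two fixed offsets are simply the extremes mentioned above; this illustrates the idea and is not code from the talk.

```python
from datetime import datetime, timedelta, timezone

# Naive: no tzinfo attached, so the instant it refers to is ambiguous.
naive = datetime(2022, 5, 1, 12, 0)

# The same wall-clock reading pinned to the two most extreme UTC offsets.
first = naive.replace(tzinfo=timezone(timedelta(hours=14)))   # UTC+14
last = naive.replace(tzinfo=timezone(timedelta(hours=-12)))   # UTC-12

# Subtracting aware datetimes compares the underlying instants.
spread = last - first  # 26 hours between the two readings of "noon"

# An aware "now" avoids the ambiguity of datetime.now() / datetime.utcnow().
now_utc = datetime.now(timezone.utc)
```

Passing a timezone explicitly, as in the last line, gives a value that identifies a single instant no matter where the code runs.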
Paint Holding—reducing the flash of white on same-origin navigations. I missed this when it happened back in 2019: Chrome (and apparently Safari too—not sure about Firefox) implemented a feature where rather than showing a blank screen in between page navigations Chrome “waits briefly before starting to paint, especially if the page is fast enough”. As a result, fast loading multi-page applications become almost indistinguishable from SPAs (single-page apps). It’s a really neat feature, and now that I know how it works I realize that it explains why page navigations have felt a lot snappier to me over the past few years.
The balance has shifted away from SPAs (via) “There’s a feeling in the air. A zeitgeist. SPAs are no longer the cool kids they once were 10 years ago.” Nolan Lawson offers some opinions on why the pendulum seems to be swinging back in favour of server-side rendering over rendering every page entirely on the client. He argues that paint holding, back-forward caching and service workers have made the benefits of SPAs over MPAs much less apparent. I’m inclined to agree.
GOV.UK Guidance: Documenting APIs (via) Characteristically excellent guide from GOV.UK on writing great API documentation. “Task-based guidance helps users complete the most common integration tasks, based on the user needs from your research.”
Comby (via) Describes itself as “Structural search and replace for any language”. Lets you execute search and replace patterns that look a little bit like simplified regular expressions, but with some deep OCaml-powered magic that makes them aware of comment, string and nested parenthesis rules for different languages. This means you can use it to construct scripts that automate common refactoring or code upgrade tasks.
simonw/datasette-screenshots (via) I started a new GitHub repository to automate taking screenshots of Datasette for marketing purposes, using my shot-scraper browser automation tool.
Supercharging GitHub Actions with Job Summaries (via) GitHub Actions workflows can now generate a rendered Markdown summary of, well, anything that you can think to generate as part of the workflow execution. I particularly like the way this is designed: they provide a filename in a $GITHUB_STEP_SUMMARY environment variable which you can then append data to from each of your steps.
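As a rough sketch of that design from a Python step: the helper name append_summary is hypothetical, and the only thing taken from GitHub is the GITHUB_STEP_SUMMARY environment variable holding a file path.

```python
import os


def append_summary(markdown: str) -> None:
    """Append Markdown to the file named by $GITHUB_STEP_SUMMARY, if set."""
    path = os.environ.get("GITHUB_STEP_SUMMARY")
    if not path:
        return  # Not running inside GitHub Actions: nothing to write to.
    # Open in append mode so earlier content in the file is preserved.
    with open(path, "a", encoding="utf-8") as fp:
        fp.write(markdown + "\n")
```

A step could then call append_summary("## Test results") followed by append_summary("- 12 passed") to build up the rendered summary a line at a time.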
Heroku: Core Impact (via) Ex-Heroku engineer Brandur Leach pulls together some of the background information circulating concerning the now more than a month long Heroku security incident and provides some ex-insider commentary on what went right and what went wrong with a platform that left a huge, if somewhat underappreciated impact on the technology industry at large.
How Materialize and other databases optimize SQL subqueries. Jamie Brandon offers a survey of the state-of-the-art in optimizing correlated subqueries, across a number of different database engines.
Why Rust’s postfix await syntax is good (via) C J Silverio explains postfix await in Rust—where you can write a line like this, with the ? causing any errors to be caught and turned into an error return from your function:
let count = fetch_all_animals().await?.filter_for_hedgehogs().len();
sqlite-utils: a nice way to import data into SQLite for analysis (via) Julia Evans on my sqlite-utils Python library and CLI tool.
SIARD: Software Independent Archiving of Relational Databases (via) I hadn’t heard of this before but it looks really interesting: the Federal Archives of Switzerland developed a standard for archiving any relational database as a zip file full of XML which “is used in over 50 countries around the globe”.
Simple declarative schema migration for SQLite (via) This is an interesting, clearly explained approach to the database migration problem. Create a new in-memory database and apply the current schema, then run some code to compare that with the previous schema—which tables are new, and which tables have had columns added. Then apply those changes.
I’d normally be cautious of running something like this because I can think of ways it could go wrong—but SQLite backups are so quick and cheap (just copy the file) that I could see this being a relatively risk-free way to apply migrations.
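The approach described above can be sketched with Python's sqlite3 module. This is an illustration under simplifying assumptions, not the linked author's implementation: the function names are hypothetical, and it only handles new tables and added columns, carrying over each column's declared type but not its constraints or defaults.

```python
import sqlite3


def tables(db):
    """Map table name -> CREATE TABLE statement, skipping SQLite internals."""
    rows = db.execute("SELECT name, sql FROM sqlite_master WHERE type = 'table'")
    return {name: sql for name, sql in rows if not name.startswith("sqlite_")}


def columns(db, table):
    """Map column name -> declared type, using PRAGMA table_info."""
    # Rows are (cid, name, type, notnull, dflt_value, pk).
    return {row[1]: row[2] for row in db.execute(f'PRAGMA table_info("{table}")')}


def migrate(db, schema_sql):
    """Bring db up to schema_sql; return a list describing changes applied."""
    pristine = sqlite3.connect(":memory:")
    pristine.executescript(schema_sql)  # the desired schema, freshly applied
    existing = tables(db)
    applied = []
    for name, create_sql in tables(pristine).items():
        if name not in existing:
            db.execute(create_sql)
            applied.append(f"create table {name}")
            continue
        have = columns(db, name)
        for col, col_type in columns(pristine, name).items():
            if col not in have:
                db.execute(f'ALTER TABLE "{name}" ADD COLUMN "{col}" {col_type}')
                applied.append(f"add column {name}.{col}")
    pristine.close()
    return applied
```

Running migrate() a second time with the same schema should find nothing to do, which is what makes the declarative style convenient.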
Web Scraping via Javascript Runtime Heap Snapshots (via) This is an absolutely brilliant scraping trick. Adrian Cooney figured out a way to use Puppeteer and the Chrome DevTools protocol to take a heap snapshot of all of the JavaScript running on a web page, then recursively crawl through the heap looking for any JavaScript objects that have a specified selection of properties. This allows him to scrape data from arbitrarily complex client-side web applications. He built a JavaScript library and command line tool that implements the pattern.
sqlite-utils 3.26.1 (via) I released sqlite-utils 3.26.1 with one tiny but exciting feature: I fixed its one dependency that wasn’t published as a pure Python wheel, which means it can now be used with Pyodide (Python compiled to WebAssembly running in your browser)!
PyScript demos (via) PyScript was announced at PyCon this morning. It’s a new open source project that provides Web Components built on top of Pyodide, allowing you to use Python directly within your HTML pages in a way that is executed using a WebAssembly copy of Python running in your browser. These demos really help illustrate what it can do—it’s a fascinating new piece of the Python web ecosystem.
Testing Datasette parallel SQL queries in the nogil/python fork. As part of my ongoing research into whether Datasette can be sped up by running SQL queries in parallel I’ve been growing increasingly suspicious that the GIL is holding me back. I know the sqlite3 module releases the GIL and was hoping that would give me parallel queries, but it looks like there’s still a ton of work going on in Python GIL land creating Python objects representing the results of the query.
Sam Gross has been working on a nogil fork of Python and I decided to give it a go. It’s published as a Docker image and it turns out trying it out really did just take a few commands... and it produced the desired results: my parallel code started beating my serial code, where previously the two had produced effectively the same performance numbers.
I’m pretty stunned by this. I had no idea how far along the nogil fork was. It’s amazing to see it in action.
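For a sense of what such a serial versus parallel comparison might look like, here is a small sketch using a thread pool. It is not the benchmark code from the experiment: the function names are hypothetical, and it assumes a database file on disk with one connection opened per query so that no connection is shared across threads.

```python
import sqlite3
from concurrent.futures import ThreadPoolExecutor


def run_query(db_path, sql):
    # One connection per call, so connections never cross threads.
    conn = sqlite3.connect(db_path)
    try:
        return conn.execute(sql).fetchall()
    finally:
        conn.close()


def run_serial(db_path, queries):
    return [run_query(db_path, sql) for sql in queries]


def run_parallel(db_path, queries, workers=4):
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # map() returns results in the same order as the input queries.
        return list(pool.map(lambda sql: run_query(db_path, sql), queries))
```

Both functions return identical results; whether the parallel version is actually faster depends on how much of the time is spent holding the GIL, which is the question the nogil fork helps answer.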
HTML event handler attributes: down the rabbit hole (via) onclick="myfunction(event)" is an idiom for passing the click event to a function - but how does it work? It turns out the answer is buried deep in the HTML spec - the browser wraps that string of code in a function(event) { ... that string ... } function and makes the event available to its local scope that way.
Mac OS 8 emulated in WebAssembly (via) Absolutely incredible project by Mihai Parparita. This is a full, working copy of Mac OS 8 (from 1997) running in your browser via WebAssembly—and it’s fully loaded with games and applications too. I played with Photoshop 3.0 and Civilization and there’s so much more on there to explore too—I finally get to try out HyperCard!
Learn Go with tests. I really like this approach to learning a new language: start by learning to write tests (which gets you through hello world, environment setup and test running right from the beginning) and use them to explore the language. I also really like how modern Go development no longer depends on the GOPATH, which I always found really confusing.
jq language description (via) I love jq but I’ve always found it difficult to remember how to use it, and the manual hasn’t helped me as much as I would hope. It turns out the jq wiki on GitHub offers an alternative, more detailed description of the language which fits the way my brain works a lot better.
A tiny CI system (via) Christian Ştefănescu shares a recipe for building a tiny self-hosted CI system using Git and Redis. A post-receive hook runs when a commit is pushed to the repo and uses redis-cli to push jobs to a list. Then a separate bash script runs a loop with a blocking “redis-cli blpop jobs” operation which waits for new jobs and then executes the CI job as a shell script.
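The shape of that worker loop can be sketched in Python. To keep the example self-contained, an in-process queue.Queue stands in for the Redis list (its blocking get() plays the role of BLPOP), jobs are argument lists rather than shell scripts, and the names push_job, worker and the None shutdown sentinel are all hypothetical.

```python
import queue
import subprocess
import threading

# Stand-in for the Redis list: Queue.get() blocks until an item arrives.
jobs = queue.Queue()


def push_job(argv):
    # The role the post-receive hook plays in the original recipe.
    jobs.put(argv)


def worker(results):
    while True:
        argv = jobs.get()      # blocks, like a blocking pop on the list
        if argv is None:       # hypothetical sentinel to stop the loop
            break
        completed = subprocess.run(argv, capture_output=True, text=True)
        results.append((completed.returncode, completed.stdout.strip()))
```

The worker would typically run in its own thread or process, with each job's exit code recorded as the pass or fail result.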
WebAIM guide to using iOS VoiceOver to evaluate web accessibility (via) I asked for pointers on learning to use VoiceOver on my iPhone for accessibility testing today and Matt Hobbs pointed me to this tutorial from the WebAIM group at Utah State University.
Web Components as Progressive Enhancement (via) I think this is a key aspect of Web Components I had been missing: since they default to rendering their contents, you can use them as a wrapper around regular HTML elements that can then be progressively enhanced once the JavaScript has loaded.
Glue code to quickly copy data from one Postgres table to another (via) The Python script that Retool used to migrate 4TB of data between two PostgreSQL databases. I find the structure of this script really interesting—it uses Python to spin up a queue full of ID ranges to be transferred and then starts some threads, but then each thread shells out to a command that runs “psql COPY (SELECT ...) TO STDOUT” and pipes the result to “psql COPY xxx FROM STDIN”. Clearly this works really well (“saturate the database’s hardware capacity” according to a comment on HN), and neatly sidesteps any issues with Python’s GIL.
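The overall structure might look something like the sketch below. It is not Retool's script: the function names, the id column, the batch size and the connection strings are all assumptions for illustration, and the run callable is injected so the example never invokes psql itself.

```python
import queue
import shlex
import threading


def id_ranges(min_id, max_id, batch_size):
    """Yield half-open (start, end) ranges covering min_id..max_id inclusive."""
    start = min_id
    while start <= max_id:
        end = min(start + batch_size, max_id + 1)
        yield start, end
        start = end


def copy_command(table, start, end, source_dsn, dest_dsn):
    """Build a shell pipeline: COPY ... TO STDOUT piped into COPY ... FROM STDIN."""
    select = f"COPY (SELECT * FROM {table} WHERE id >= {start} AND id < {end}) TO STDOUT"
    load = f"COPY {table} FROM STDIN"
    return (
        f"psql {shlex.quote(source_dsn)} -c {shlex.quote(select)}"
        f" | psql {shlex.quote(dest_dsn)} -c {shlex.quote(load)}"
    )


def transfer(ranges, make_command, run, threads=4):
    """Fill a queue with ID ranges, then let worker threads drain it."""
    work = queue.Queue()
    for id_range in ranges:
        work.put(id_range)

    def worker():
        while True:
            try:
                start, end = work.get_nowait()
            except queue.Empty:
                return  # queue drained: this thread is done
            run(make_command(start, end))

    pool = [threading.Thread(target=worker) for _ in range(threads)]
    for thread in pool:
        thread.start()
    for thread in pool:
        thread.join()
```

In real use, run could be something like lambda cmd: subprocess.run(cmd, shell=True, check=True). Because the heavy lifting happens in the psql child processes, the Python threads spend their time waiting rather than competing for the GIL.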