1,471 posts tagged “datasette”
Datasette is an open source tool for exploring and publishing data.
2026
The most popular blogs of Hacker News in 2025 (via) Michael Lynch maintains HN Popularity Contest, a site that tracks personal blogs on Hacker News and scores them based on how well they perform on that platform.
The engine behind the project is the domain-meta.csv CSV on GiHub, a hand-curated list of known personal blogs with author and bio and tag metadata, which Michael uses to separate out personal blog posts from other types of content.
I came top of the rankings in 2023, 2024 and 2025 but I'm listed in third place for all time behind Paul Graham and Brian Krebs.
I dug around in the browser inspector and was delighted to find that the data powering the site is served with open CORS headers, which means you can easily explore it with external services like Datasette Lite.
Here's a convoluted window function query Claude Opus 4.5 wrote for me which, for a given domain, shows where that domain ranked for each year since it first appeared in the dataset:
with yearly_scores as ( select domain, strftime('%Y', date) as year, sum(score) as total_score, count(distinct date) as days_mentioned from "hn-data" group by domain, strftime('%Y', date) ), ranked as ( select domain, year, total_score, days_mentioned, rank() over (partition by year order by total_score desc) as rank from yearly_scores ) select r.year, r.total_score, r.rank, r.days_mentioned from ranked r where r.domain = :domain and r.year >= ( select min(strftime('%Y', date)) from "hn-data" where domain = :domain ) order by r.year desc
(I just noticed that the last and r.year >= ( clause isn't actually needed here.)
My simonwillison.net results show me ranked 3rd in 2022, 30th in 2021 and 85th back in 2007 - though I expect there are many personal blogs from that year which haven't yet been manually added to Michael's list.
Also useful is that every domain gets its own CORS-enabled CSV file with details of the actual Hacker News submitted from that domain, e.g. https://hn-popularity.cdn.refactoringenglish.com/domains/simonwillison.net.csv. Here's that one in Datasette Lite.
2025
Under the hood of Canada Spends with Brendan Samek
I talked to Brendan Samek about Canada Spends, a project from Build Canada that makes Canadian government financial data accessible and explorable using a combination of Datasette, a neat custom frontend, Ruby ingestion scripts, sqlite-utils and pieces of LLM-powered PDF extraction.
[... 561 words]Highlights from my appearance on the Data Renegades podcast with CL Kao and Dori Wilson
I talked with CL Kao and Dori Wilson for an episode of their new Data Renegades podcast titled Data Journalism Unleashed with Simon Willison.
[... 2,964 words]Nano Banana Pro aka gemini-3-pro-image-preview is the best available image generation model
Hot on the heels of Tuesday’s Gemini 3 Pro release, today it’s Nano Banana Pro, also known as Gemini 3 Pro Image. I’ve had a few days of preview access and this is an astonishingly capable image generation model.
[... 1,641 words]How I automate my Substack newsletter with content from my blog
I sent out my weekly-ish Substack newsletter this morning and took the opportunity to record a YouTube video demonstrating my process and describing the different components that make it work. There’s a lot of digital duct tape involved, taking the content from Django+Heroku+PostgreSQL to GitHub Actions to SQLite+Datasette+Fly.io to JavaScript+Observable and finally to Substack.
[... 1,345 words]Datasette 1.0a22. New Datasette 1.0 alpha, adding some small features we needed to properly integrate the new permissions system with Datasette Cloud:
datasette serve --default-denyoption for running Datasette configured to deny all permissions by default. (#2592)datasette.is_client()method for detecting if code is executing inside a datasette.client request. (#2594)
Plus a developer experience improvement for plugin authors:
datasette.pmproperty can now be used to register and unregister plugins in tests. (#2595)
I've been upgrading a ton of Datasette plugins recently for compatibility with the Datasette 1.0a20 release from last week - 35 so far.
A lot of the work is very repetitive so I've been outsourcing it to Codex CLI. Here's the recipe I've landed on:
codex exec --dangerously-bypass-approvals-and-sandbox \
'Run the command tadd and look at the errors and then
read ~/dev/datasette/docs/upgrade-1.0a20.md and apply
fixes and run the tests again and get them to pass.
Also delete the .github directory entirely and replace
it by running this:
cp -r ~/dev/ecosystem/datasette-os-info/.github .
Run a git diff against that to make sure it looks OK
- if there are any notable differences e.g. switching
from Twine to the PyPI uploader or deleting code that
does a special deploy or configures something like
playwright include that in your final report.
If the project still uses setup.py then edit that new
test.yml and publish.yaml to mention setup.py not pyproject.toml
If this project has pyproject.toml make sure the license
line in that looks like this:
license = "Apache-2.0"
And remove any license thing from the classifiers= array
Update the Datasette dependency in pyproject.toml or
setup.py to "datasette>=1.0a21"
And make sure requires-python is >=3.10'I featured a simpler version of this prompt in my Datasette plugin upgrade video, but I've expanded it quite a bit since then.
At one point I had six terminal windows open running this same prompt against six different repos - probably my most extreme case of parallel agents yet.

Here are the six resulting commits from those six coding agent sessions:



