datasette-atom: Define an Atom feed using a custom SQL query
3rd December 2019
I’ve been having a ton of fun iterating on www.niche-museums.com. I put together some notes on how the site works last week, and I’ve been taking advantage of the Thanksgiving break to continue exploring ways in which Datasette can be used to quickly build database-backed static websites.
I post a new museum to the site every day, so it was inevitable that someone would ask for a feed. And here it is: an Atom feed for Niche Museums.
This means Niche Museums is effectively a blog now, which is fitting: it’s a universal truth that any sufficiently advanced backend technology will evolve to the point where it can power a blog with an Atom feed.
datasette-atom
I built the feed by wrapping up work on the first version of a new Datasette plugin: datasette-atom. It takes advantage of the register_output_renderer plugin hook, which was contributed by Russ Garrett back in May.
The idea with the plugin is to make it possible to construct an Atom feed from an arbitrary SQL query.
This is a really powerful ability. It means that a user with sufficent knowledge of SQL can subscribe to an arbitrary feed of data from any Datasette instance that is running the plugin.
Defining an Atom feed with a SQL query
The plugin works by requiring you to provide a SQL query that produces the following columns in its output:
atom_id
, atom_title
and atom_updated
.
These correspond to the required entry elements defined by the Atom specification.
The plugin can then render the results of the query as an Atom feed.
You can also produce an atom_link
column, which will become a link.
And finally, you can produce either an atom_content
column which will be treated as text and used as the feed entry body, or an atom_content_html
column which will be treated as HTML.
(The HTML from atom_content_html
is sanitized through Mozilla’s Bleach library to ensure the plugin doesn’t act as an XSS vector.)
This means we can define a custom Atom feed by crafting a SQL query! Here’s the query I’m using on the Niche Museums website:
select
'tag:niche-museums.com,' || substr(created, 0, 11) || ':' || id as atom_id,
name as atom_title,
created as atom_updated,
'https://www.niche-museums.com/browse/museums/' || id as atom_link,
coalesce(
'<img src="' || photo_url || '?w=800&h=400&fit=crop&auto=compress">',
''
) || '<p>' || description || '</p>' as atom_content_html
from
museums
order by
created desc
limit
15
I’m using a couple of extra tricks here.
The atom_id
is defined as a tag:uri following this advice from Mark Pilgrim—since created
is an ISO 8601 timestamp substr(created, 0, 11)
returns the YYYY-MM-DD
component.
Not every museum has a photo, which means photo_url
is sometimes null
. In SQL, concatenating a null value to something else (using the ||
concatenation operator) produces another null. So this entire expression evaluates to null if photo_url
is null:
'<img src="' || photo_url ||
'?w=800&h=400&fit=crop&auto=compress">',
The coalesce(x...)
function returns the first argument passed to it. So coalesce('<img ...>' || photo_url || '...', '')
returns the empty string if the photo is not available.
Deriving created/updated fields from Git history
For the atom feed to work, I need an atom_updated
value. This should be a timestamp representing “the last time the entry was modified in a significant way”—so it’s actually more like a created
timestamp for my museums website.
My museum data is defined in a YAML file—museums.yaml—which doesn’t include created and updated timestamps. So where can I get them from?
Since the YAML file is stored in the site’s GitHub repository, I’m deriving those timestamps from the git history. I repurposed code I wrote for my PG&E outages project for this—the full script is annotate _timestamps.py in the museums repo.
It works by looping through the entire history of the museums.yaml
file comparing the list of museums in each version to the previous iteration. If a museum is new (it has an ID not seen before) we use the commit date as its created
date. If the JSON serialization of the museum differs from the previous version we reset its updated
date.
Then at the end the script uses sqlite-utils to update each record with the derived timestamps:
# updated/created are dicts of {"id": "timestamp"}
db = sqlite_utils.Database("browse.db")
for id, ts in created.items():
db["museums"].update(id, {
"created": ts,
"updated": updated[id]
}, alter=True)
The alter=True
parameter to the .update()
method causes sqlite-utils
to automatically add any missing columns that are referenced in the update.
Defining a feed as a canned query
Now that we’ve defined the feed as a SQL query, we can assign it a more pleasing URL using Datasette’s canned queries feature.
I encoded the query as a JSON string using JSON Escape Text, then added it to the metadata.json configuration file for Niche Museums. I named the query feed
, resulting in a URL of www.niche-museums.com/browse/feed
.
There’s just one catch: Atom feeds should have a name. As a quick and nasty hack I allow the name to be set using ?_feed_name=Niche+Museums
. I have an open issue to come up with a less nasty way of defining this.
Also this week
I added a simple search engine to Niche Museums! Here’s an example search for “model”: www.niche-museums.com/browse/search?q=model.
I used sqlite-utils to configure SQLite FTS (here’s the line that calls it in the CI build script), defined another canned query in metadata.json
that executes the query and built a custom template to render the results page.
I added press coverage to Niche Museums. Many of the listings now link to articles in the local media about them.
Museum pages now link to other nearby museums. Here’s the commit that implemented that feature—it works by embedding a new SQL query in the template page, using datasette-template-sql.
I identified a bug in yaml-to-sqlite
and shipped a fix in version 0.3.
I added the following seven museums to the site:
- The Comic Rock Star’s Toilet Seat Museum in San Francisco
- Fabergé Museum, Baden-Baden in Baden-Baden, Germany
- jAdis in Santa Monica
- Lundy in the Bristol Channel
- Neon Works in Oakland
- Maison des Johnnies et de l’Oignon de Roscoff in Roscoff
- Bourton-on-the-Water Model Village in the Cotswolds
Goal for the next week: make some progress on projects that aren’t related to niche museums!
More recent articles
- ChatGPT Canvas can make API requests now, but it's complicated - 10th December 2024
- I can now run a GPT-4 class model on my laptop - 9th December 2024
- Prompts.js - 7th December 2024