Simon Willison’s Weblog


Weeknotes: python_requires, documentation SEO

25th January 2022

Fixed Datasette on Python 3.6 for the last time. Worked on documentation infrastructure improvements. Spent some time with Fly Volumes.

Datasette 0.60.1 for Python 3.6

I got a report that users of Python 3.6 were seeing errors when they tried to install Datasette.

I actually dropped support for 3.6 a few weeks ago, but that shouldn’t have affected the already released Datasette 0.60—so something was clearly wrong.

This led me to finally get my head around how pip install handles Python version support. It’s actually a very neat system, which I hadn’t previously taken the time to understand.

Python packages can (and should!) provide a python_requires= line in their setup.py. That line for Datasette currently looks like this:

python_requires=">=3.7"

But in the 0.60 release it was still this:

python_requires=">=3.6"

When you run pip install package, this becomes part of pip’s resolution mechanism: by default it will try to install the highest available version of the package that supports your version of Python.
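Here is a toy model of that selection step. It assumes every release declares only a simple ">=X.Y" lower bound. Real python_requires values can be any PEP 440 specifier set, and pip evaluates them with far more care. The release history and the helper names (supports, pick_release) are hypothetical.

```python
# Toy model of how an installer uses python_requires to choose a release.
# Assumption: every release declares a simple ">=X.Y" bound; real specifiers
# can be arbitrary PEP 440 specifier sets.

# Hypothetical release history: package version -> python_requires string
RELEASES = {
    "0.59": ">=3.6",
    "0.60": ">=3.6",
    "0.60.1": ">=3.6",
    "0.61": ">=3.7",
}


def parse(version):
    """Turn "0.60.1" into (0, 60, 1) so versions compare numerically."""
    return tuple(int(part) for part in version.split("."))


def supports(python_requires, python_version):
    """Check a ">=X.Y" bound against a (major, minor) tuple."""
    if not python_requires.startswith(">="):
        raise ValueError("toy model only handles '>=' bounds")
    return python_version >= parse(python_requires[2:])


def pick_release(releases, python_version):
    """Return the highest release whose python_requires allows this Python."""
    candidates = [v for v, req in releases.items() if supports(req, python_version)]
    if not candidates:
        return None
    return max(candidates, key=parse)


print(pick_release(RELEASES, (3, 6)))   # 0.60.1
print(pick_release(RELEASES, (3, 10)))  # 0.61
```

A release that carries no python_requires at all gives a filter like this nothing to exclude it on, which is exactly the situation with Uvicorn described below.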

So why did pip install datasette break? It turned out that one of Datasette’s dependencies, Uvicorn, had dropped support for Python 3.6 but did not have a python_requires indicator that pip could use to resolve the correct version.

Coincidentally, Uvicorn actually added python_requires just a few weeks ago—but it wasn’t out in a release yet, so pip install couldn’t take it into account.

I raised this issue with the Uvicorn development team and they turned around a fix really promptly—0.17.0.post1.

But before I saw how fast the Uvicorn team could move, I figured out how to fix the issue myself, thanks to a tip from Sam Hames on Twitter.

The key to fixing it was environment markers, a feature of Python’s dependency resolution system that allows you to provide extra rules for when a dependency should be used.

Here’s an install_requires= example showing these in action:

install_requires=[
    "uvicorn~=0.11",
    'uvicorn<=0.16.0;python_version<="3.6"'
]

This will install a Uvicorn version compatible with 0.11 (~=0.11 means >=0.11 and <1.0), and adds a second constraint that it must also be <=0.16.0 if the user’s Python version is 3.6 or lower.
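To make the marker logic concrete, here is a deliberately tiny evaluator that only understands python_version comparisons. It is a sketch for illustration only: real tools evaluate the full PEP 508 marker grammar (the packaging library’s Marker class does this). The helper names marker_applies and active_requirements are hypothetical.

```python
import operator

# Comparison operators a python_version marker may use, longest first so
# that "<=" is matched before "<".
OPERATORS = [
    ("<=", operator.le),
    (">=", operator.ge),
    ("==", operator.eq),
    ("!=", operator.ne),
    ("<", operator.lt),
    (">", operator.gt),
]


def parse_version(text):
    return tuple(int(part) for part in text.split("."))


def marker_applies(marker, python_version):
    """Evaluate a marker such as python_version<="3.6" for a (major, minor) tuple."""
    marker = marker.strip()
    if not marker.startswith("python_version"):
        raise ValueError("toy evaluator only handles python_version markers")
    rest = marker[len("python_version"):].strip()
    for symbol, compare in OPERATORS:
        if rest.startswith(symbol):
            value = rest[len(symbol):].strip().strip("\"'")
            return compare(python_version, parse_version(value))
    raise ValueError(f"unsupported marker: {marker}")


def active_requirements(install_requires, python_version):
    """Drop requirements whose marker is false; strip markers from the rest."""
    active = []
    for requirement in install_requires:
        spec, _, marker = requirement.partition(";")
        if not marker or marker_applies(marker, python_version):
            active.append(spec.strip())
    return active


install_requires = [
    "uvicorn~=0.11",
    'uvicorn<=0.16.0;python_version<="3.6"',
]
print(active_requirements(install_requires, (3, 6)))  # ['uvicorn~=0.11', 'uvicorn<=0.16.0']
print(active_requirements(install_requires, (3, 9)))  # ['uvicorn~=0.11']
```

On Python 3.6 both surviving requirements apply together, so the resolver has to find a Uvicorn release that satisfies ~=0.11 and <=0.16.0 at the same time.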

Since Datasette 0.60.1 will be the last version of Datasette to support Python 3.6, I decided to play it safe and pin every dependency to the most recent version that I have tested against Python 3.6 myself. Here’s the setup.py file I constructed for that.
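One possible way to produce pins like that is to capture pip freeze output from a Python 3.6 environment where the tests passed, then turn it into exact requirements. This is a hedged sketch, not how the file above was actually built: pins_from_freeze is a hypothetical helper, and the version numbers in the example input are illustrative only.

```python
import re


def normalize(name):
    """PEP 503 name normalization, so "Jinja2" and "jinja2" match."""
    return re.sub(r"[-_.]+", "-", name).lower()


def pins_from_freeze(freeze_output, dependencies):
    """Map each dependency to an exact "name==version" pin from pip freeze output."""
    installed = {}
    for line in freeze_output.splitlines():
        line = line.strip()
        if "==" not in line or line.startswith("#"):
            continue  # skip comments, editable installs and other non-pinned lines
        name, version = line.split("==", 1)
        installed[normalize(name)] = version
    pins = []
    for dependency in dependencies:
        key = normalize(dependency)
        if key not in installed:
            raise KeyError(f"{dependency} is not installed in this environment")
        pins.append(f"{dependency}=={installed[key]}")
    return pins


# Illustrative freeze output (hypothetical versions)
freeze = "Jinja2==3.0.3\nuvicorn==0.16.0\n-e .\n"
print(pins_from_freeze(freeze, ["jinja2", "uvicorn"]))
```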

This ties into a larger open question for me about Datasette’s approach to pinning dependencies.

The rule of thumb I’ve heard is that you should pin dependencies for standalone released tools but leave dependencies loose for libraries that people will use as dependencies in their own projects—ensuring those users can run with different dependency versions if their projects require them.

Datasette is mostly a standalone tool—but it can also be used as a library. I’m actually planning to make its use as a library more obvious through improvements to the documentation in the future.

As such, pinning exact versions doesn’t feel quite right to me.

Maybe the solution here is to split the reusable library parts of Datasette out into a separate package—maybe datasette-core—and have the datasette CLI package depend on exact pinned dependencies while the datasette-core library uses loose dependencies instead.

Still thinking about this.

Datasette documentation tweaks

Datasette uses Read The Docs to host its documentation. Among other benefits, this makes it easy to host multiple documentation versions:

  • docs.datasette.io/en/latest/ is the latest version of the documentation, continuously deployed from the main branch on GitHub
  • docs.datasette.io/en/stable/ is the documentation for the most recent stable (non-alpha, non-beta) release, currently 0.60.1. This is the version you get when you run pip install datasette.
  • docs.datasette.io/en/0.59/ is the documentation for version 0.59—and every version back to 0.22.1 is hosted under similar URLs, currently covering 73 different releases.

All of those previous versions automatically show a note at the top of the page warning that the documentation is outdated and linking back to stable, a feature that Read The Docs provides out of the box.

But... I noticed that /en/latest/ didn’t do this. I wanted a warning banner to let people know that they were looking at the in-development version of the documentation.

After some digging around, I fixed it using a little bit of extra JavaScript added to my documentation template. Here’s the key implementation detail:

jQuery(function ($) {
  // Show banner linking to /stable/ if this is a /latest/ page
  if (!/\/latest\//.test(location.pathname)) {
    return;
  }
  var stableUrl = location.pathname.replace("/latest/", "/stable/");
  // Check it's not a 404
  fetch(stableUrl, { method: "HEAD" }).then((response) => {
    if (response.status === 200) {
      // Page exists, insert a warning banner linking to it
      // (banner markup omitted from this excerpt)
    }
  });
});

This uses fetch() to make an HTTP HEAD request for the stable documentation page, and inserts a warning banner only if that page isn’t a 404. This avoids linking to a non-existent documentation page if it has been created in development but not yet released as part of a stable release. Example here.

Screenshot of the documentation page with a banner that says: This documentation covers the development version of Datasette. See this page for the current stable release.
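The same check can also be run outside the browser, for example to find /latest/ pages that have no /stable/ counterpart yet. This is a hedged sketch using only the standard library; stable_url and stable_page_exists are hypothetical helper names, and the opener parameter exists so the network call can be swapped out in tests.

```python
import urllib.error
import urllib.request


def stable_url(url):
    """Rewrite a /latest/ documentation URL to its /stable/ equivalent, or None."""
    if "/latest/" not in url:
        return None
    # Replace only the first occurrence, matching the JavaScript version
    return url.replace("/latest/", "/stable/", 1)


def stable_page_exists(url, opener=urllib.request.urlopen):
    """Send a HEAD request for the stable page; True only on HTTP 200."""
    target = stable_url(url)
    if target is None:
        return False
    request = urllib.request.Request(target, method="HEAD")
    try:
        with opener(request) as response:
            return response.status == 200
    except urllib.error.HTTPError:
        # urlopen raises HTTPError for 4xx/5xx responses such as a 404
        return False
```

Network failures (URLError) are deliberately left to propagate here, so a broken connection is not mistaken for a missing page.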

This problem got me thinking about SEO.

A problem I’ve had with other projects that host multiple versions of their documentation is that sometimes I’ll search on Google and end up landing on a page covering a much older version of their project. I think I’ve had this happen for both PostgreSQL and Python in the past.

What’s best practice for avoiding this? I asked on Twitter and also started digging around for answers. “If in doubt, imitate Django” is a good general rule of thumb, so I had a look at how Django did this and spotted the following in the HTML of one of their prior version pages:

<link rel="canonical" href="https://docs.djangoproject.com/en/4.0/topics/db/">

So Django are using the rel=canonical tag to point crawlers towards their most recent release.

I decided to implement that myself... and then discovered that the Datasette documentation was doing it already! Read The Docs implement this piece of SEO best practice out of the box.
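To check which URL a given documentation page nominates as canonical (for example, to confirm that an old versioned page points crawlers at the current release), the standard library’s html.parser is enough. A minimal sketch; CanonicalFinder and find_canonical are hypothetical names.

```python
from html.parser import HTMLParser


class CanonicalFinder(HTMLParser):
    """Collect the href of the first <link rel="canonical"> tag."""

    def __init__(self):
        super().__init__()
        self.canonical = None

    def handle_starttag(self, tag, attrs):
        if tag != "link" or self.canonical is not None:
            return
        attributes = dict(attrs)
        # rel may hold several space-separated values, compared case-insensitively
        rels = (attributes.get("rel") or "").lower().split()
        if "canonical" in rels:
            self.canonical = attributes.get("href")


def find_canonical(html):
    finder = CanonicalFinder()
    finder.feed(html)
    finder.close()
    return finder.canonical


page = '<head><link rel="canonical" href="https://docs.djangoproject.com/en/4.0/topics/db/"></head>'
print(find_canonical(page))  # https://docs.djangoproject.com/en/4.0/topics/db/
```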

Datasette on Fly volumes

This one isn’t released yet, but I made some good progress on it this week.

Fly.io announced this week that they would be providing 3GB of volume storage to accounts on their free tier. They called this announcement Free Postgres Databases, but tucked away in the blog post was this:

The lede is “free Postgres” because that’s what matters to full stack apps. You don’t have to use these for Postgres. If SQLite is more your jam, mount up to 3GB of volumes and use “free SQLite.” Yeah, we’re probably underselling that.

(There is evidence that they may have been nerd sniping me with that paragraph.)

I have a plugin called datasette-publish-fly which publishes Datasette instances to Fly. Obviously that needs to grow support for configuring volumes!

So far I’ve completed the research on how that feature should work. The next step is to implement it.

sqlite-utils --help

I pushed out minor release sqlite-utils 3.22.1 today with one notable improvement: every single one of the 39 commands in the CLI tool now includes an example of usage as part of the --help text.

This feature was inspired by the new CLI reference page in the documentation, which shows the help output for every command on one page—making it much easier to spot potential improvements.
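sqlite-utils is built on Click, so its real implementation differs, but both ideas (a usage example in every command’s help, and one page that collects all of that help text) can be sketched with the standard library’s argparse. The mytool program and its two commands are hypothetical stand-ins, not the sqlite-utils command set.

```python
import argparse


def build_parser():
    parser = argparse.ArgumentParser(prog="mytool")
    subcommands = parser.add_subparsers(dest="command")

    # RawDescriptionHelpFormatter keeps the epilog's line breaks and
    # indentation, so each example renders exactly as written.
    list_tables = subcommands.add_parser(
        "list-tables",
        help="List tables in a database",
        description="List tables in a database",
        epilog="Example:\n\n    mytool list-tables data.db",
        formatter_class=argparse.RawDescriptionHelpFormatter,
    )
    list_tables.add_argument("path", help="Path to the database file")

    count_rows = subcommands.add_parser(
        "count-rows",
        help="Count rows in a table",
        description="Count rows in a table",
        epilog="Example:\n\n    mytool count-rows data.db dogs",
        formatter_class=argparse.RawDescriptionHelpFormatter,
    )
    count_rows.add_argument("path", help="Path to the database file")
    count_rows.add_argument("table", help="Name of the table")
    return parser, subcommands


def reference_page(subcommands):
    """Concatenate every subcommand's --help output into one document."""
    sections = []
    # .choices maps each subcommand name to its ArgumentParser
    for name, subparser in subcommands.choices.items():
        sections.append(f"{name}\n{'=' * len(name)}\n\n{subparser.format_help()}")
    return "\n".join(sections)


parser, subcommands = build_parser()
print(reference_page(subcommands))
```

Seeing every command’s help on one page makes a missing example easy to spot, which is the point the reference page makes above.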

Releases this week

TIL this week