Simon Willison’s Weblog

Subscribe

Weeknotes: Datasette 0.41, photos breakthroughs

7th May 2020

Shorter weeknotes this week, because my main project for the week warrants a detailed write-up on its own (coming soon... update 21st May here it is).

Datasette 0.41

I released Datasette 0.41 this morning. Highlights include:

  • Custom pages and configuration directory mode, both described here last week
  • A new NOT LIKE table filter expression, ?colname__notlike=X#750
  • A CSS pattern portfolio! I’ve been meaning to ship one of these for over two years now—you can see it at latest.datasette.io/-/patterns#151
  • Various minor fixes and documentation improvements
  • PRAGMA function support—#761

A late addition to the release—thought of and implemented just this morning, as the result of this TIL, is the ability to use various SQLite PRAGMA functions.

Datasette blocks user-provided SQL queries containing the term PRAGMA because it can be used to dynamically update various database configuration settings, and Datasette likes to keep queries read-only.

But... SQLite provides access to a number of useful introspection capabilities through a collection of PRAGMA functions. Things like this:

select * from pragma_foreign_key_list('complex_foreign_keys')

These are harmless—the SQLite documentation specifically states that “Table-valued functions exist only for PRAGMAs that return results and that have no side-effects”.

SO I enabled them using a negative lookahead assertion in the regular expression that prevents PRAGMA from being used:

allowed_pragmas = (
    "database_list",
    "foreign_key_list",
    "function_list",
    "index_info",
    "index_list",
    "index_xinfo",
    "page_count",
    "max_page_count",
    "page_size",
    "schema_version",
    "table_info",
    "table_xinfo",
)
disallawed_sql_res = [
    (
        re.compile("pragma(?!_({}))".format("|".join(allowed_pragmas))),
        "Statement may not contain PRAGMA",
    )
]

The regular expression here is dynamically composed from the list of allowed PRAGMA functions—it looks something like this:

re.compile("pragma(?!_(database_list|foreign_key_list|...))")

In Python regular expressions pragme(?!_foo) means "pragma but only if not followed by _foo".

Plotting foreign key relationships with Observable and D3

With PRAGMA functions support shipped in Datasette 0.41 it’s now possible to make use of them in queries. Here’s an example query which finds all foreign key relationships in a database:

select
    sqlite_master.name as table_from,
    fk_info.[from] as column_from,
    fk_info.[table] as table_to,
    fk_info.[to] as column_to
from
    sqlite_master
join
    pragma_foreign_key_list(sqlite_master.name) as fk_info
order by
    sqlite_master.name

Here’s that query run against my github-to-sqlite demo database. Exported as JSON, the results look like this:

[
    {
        "table_from": "commits",
        "column_from": "committer",
        "table_to": "users",
        "column_to": "id"
    },
    {
        "table_from": "commits",
        "column_from": "author",
        "table_to": "users",
        "column_to": "id"
    }
    ...
]

I used this to put together an Observable Notebook which takes this data (from any public Datasette instance if you provide it with a URL) and renders those relationships as a force-directed graph using D3.

Force-directed D3 graph of foreign keys

Apple Photos breakthrough

I’ll write about this more in a separate post, but I had a huge breakthrough with my photos-to-sqlite project this week.

The goal of the project is to give my SQL access to the photos I’ve taken. I use Apple Photos for this, and my holy grail has always been the metadata that Apple generate around my photos using machine learning.

Rhet Turnbull’s fantastic osxphotos project gave me my first breakthrough, allowing me to easily start importing albums, people (from on-device facial recognition) and places.

Apple run sophisticated label classification against my photos on device—so if I search for “dog” they show me all of the photos I’ve taken of dogs. I was certain that this information was in a SQLite database somewhere, and this week I finally figured out where: the ~/Pictures/Photos\ Library.photoslibrary/database/search/psi.sqlite file.

After a whole lot of digging around I worked out how to join it against my photos database, and as a result I can now run SQL queries that select all of my photos of dogs! Or seals! Or pelicans!

Combine that with some intriguing automated photo quality scores I found (order by ZPLEASANTCOMPOSITIONSCORE is a thing) and my Dogsheep photos solution is beginning to take shape. I’ll write it up in more detail over the next few days.

Other projects

  • conditional-get 0.2a now streams large downloads to disk and shows a progress bar in -v mode.
  • datasette-atom 0.5 adds support for the Atom author element, so you can add author information to feed entries.
  • sqlite-utils went through versions 2.7, 2.7.1, 2.7.2 and 2.8—see release notes.
  • github-to-sqlite 2.1 and 2.2 introduced two new commands:
    • github-to-sqlite scrape-dependents scrapes the GitHub dependents page (which isn’t yet in an API) to gather details of repos that depend on your repos
    • github-to-sqlite stargazers fetches the users who have starred a specific list of repositories
    • I also added several new SQL views for better exploring the retrieved data

My github-to-sqlite demo is updated daily via a scheduled GitHub Action and retrieves all kinds of data about my various Dogsheep projects plus sqlite-utils and Datasette. Being able to see things like the most recent dependent repos across my projects in one place is extremely useful.

TIL this week