Weeknotes: software carpentry, compiling modules for SQLite
26th September 2020
This week I completed the Software Carpentry instructor training course, added two foundational features to
sqlite-utils and learned how to compile modules for SQLite.
This week I took the two day instructor training course put on by Software Carpentry. I’ve been wanting to do this for over a year now, so I’m excited to have finally found the time to attend the workshop.
The Carpentries is a really interesting non-profit organization. Their mission is to “teach foundational coding and data science skills to researchers worldwide”—but I’ve always thought of them as teaching software engineering fundamentals to scientists—things like version control!
It turns out I’ve been following the carpentries project for fifteen years now, ever since their founder Greg Wilson started writing about his efforts to bring software engineering to scientists on his blog.
As you would expect from an organization that specializes in workshop education, the workshop they put on teaching you how to run workshops is top notch. Their curriculum materials, forged by a process of continuing tweaks over hundreds if not thousands of presentations, are superb. I aspire to produce educational documentation this good some day.
An interesting aspect of Carpentries is that every workshop is presented using live coding—no slides! The instructor works through the same material as the learners, talking through each line of code as they run it. The kind of thing you would normally use slides for (diagrams etc) instead lives in the workshop handouts. Take a look through their Databases and SQL course to get an idea of how that works.
I have to complete a couple more steps, but once I’ve done that I’ll be a certified instructor for the program—which means I can volunteer to help run their workshops in the future.
I had two encounters with compiled SQLite modules this week.
One of my favourite lesser-known features of PostgreSQL is trigram indexes. The short version is: they make
where text like '%something%' queries run crazy-fast. Normal indexes just speed up
prefix% like searches, but trigrams work for strings in the middle of other strings as well.
I started a thread on the official SQLite forum asking about trigram index support, and proposed that maybe it could be achieveable using a custom SQLite FTS tokenizer. Core SQLite maintainer Dan Kennedy replied with a complete, working implementation of exactly that, written in C!
With Dan’s permission I’ve published his code in a sqlite-fts5-trigram GitHub repository. In doing so I figured out how to build it for macOS, and then hooked that build script up to a GitHub Actions workflow that produces a compiled
ftstri.so module ready for anyone on macOS to run themselves.
Just one problem: it turns out binary modules like that need to be signed for macOS before anyone can run them. That’s a step beyond me at the moment, but I may revisit that in the future.
The other complied module I learned to use this week is
spellfix.c, see my TIL Compiling the SQLite spellfix.c module on macOS. I needed these because I’ve started expereminting with Plex as a home media server, and it turns out Plex runs on SQLite! The catch is that it uses the
spellfix module, which means you can’t open the Plex database in Datasette without first compiling that module for use with
--load-extension—see the TIL for details.
sqlite-utils 2.20 and 2.21, and wrote about the big new features in two blog entries here:
- Refactoring databases with sqlite-utils extract
Executing advanced ALTER TABLE operations in SQLite using
The extract operation in sqlite-utils 2.20 was taking 12 minutes to execute against a table with 680,000 rows. I had a rethink of how it worked and got that down to just 4 seconds in sqlite-utils 2.21!
I worked on these as part of my ongoing exploration of Datasette and
sqlite-utils as tools for cleaning up and refactoring data. They fill some key holes in the tooling I’ve built so far.
I’m already using transform to build out the next phase of my
datasette-edit-tables plugin. See issue #11 for notes on work-in-progress support for editing table schemas directly in Datasette.
TIL this week
- Turning on Jinja autoescaping when using Template() directly
- Compiling the SQLite spellfix.c module on macOS
- Understanding option names in Click
Releases this week
More recent articles
- Datasette Enrichments: a new plugin framework for augmenting your data - 1st December 2023
- llamafile is the new best way to run a LLM on your own computer - 29th November 2023
- Prompt injection explained, November 2023 edition - 27th November 2023
- I'm on the Newsroom Robots podcast, with thoughts on the OpenAI board - 25th November 2023
- Weeknotes: DevDay, GitHub Universe, OpenAI chaos - 22nd November 2023
- Deciphering clues in a news article to understand how it was reported - 22nd November 2023
- Exploring GPTs: ChatGPT in a trench coat? - 15th November 2023
- Financial sustainability for open source projects at GitHub Universe - 10th November 2023
- ospeak: a CLI tool for speaking text in the terminal via OpenAI - 7th November 2023
- DALL-E 3, GPT4All, PMTiles, sqlite-migrate, datasette-edit-schema - 30th October 2023