22nd October 2023
I’ve had a bit of a slow two weeks in terms of building things and writing code, thanks mainly to a couple of conference appearances. I did review and land a couple of major contributions to Datasette though.
I gave a talk at PyBay 2023 called “Embeddings: What they are and why they matter”, digging into the weird and fun world of word embeddings (see previous posts). I’ll be posting detailed notes from that talk tomorrow.
A couple of days after that I gave the closing keynote at the AI Engineer Summit, where I tried to do justice both to the summit and the previous year of developments in AI—no small challenge!
I’ve published detailed slides and an annotated transcript to accompany that talk: Open questions for AI engineering.
Alex Garcia has been driving a major improvement to Datasette in preparation for the 1.0 release: cleaning up Datasette’s ungainly metadata system.
Metadata in Datasette was originally meant to support adding data about data—the license, source and description of data exposed through a Datasette instance.
Over time it grew in weird and unintuitive directions. Today, metadata can also be used to configure plugins, provide table-level settings, define canned queries and even control how Datasette’s authentication system works.
The name no longer fits!
Alex is fixing this by splitting all of those non-metadata parts of metadata out into a new, separate configuration file, which we’ve agreed should be called datasette.yaml.
This week we landed a big piece of this: Move permissions, allow blocks, canned queries and more out of metadata.yaml and into datasette.yaml.
There’s a bit more work to do on this: in particular, I need to upgrade the datasette publish command to support deploying instances with the new configuration file. I’ll be shipping an alpha release as soon as that work is complete.
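As a rough illustration of the idea behind the split (not Datasette's actual migration code), here is a minimal Python sketch that partitions an old-style metadata dictionary into "data about data" and configuration. The `split_metadata` helper and the `CONFIG_KEYS` set are hypothetical names invented for this example, and the sketch only looks at top-level keys, so nested items such as canned queries inside a database block are not handled.

```python
from typing import Any, Dict, Tuple

# Assumption for illustration only: these top-level keys are treated as
# configuration rather than metadata. This is not Datasette's real logic.
CONFIG_KEYS = {"plugins", "allow", "permissions"}


def split_metadata(old: Dict[str, Any]) -> Tuple[Dict[str, Any], Dict[str, Any]]:
    """Partition a metadata-style dict into (metadata, config) by top-level key."""
    metadata: Dict[str, Any] = {}
    config: Dict[str, Any] = {}
    for key, value in old.items():
        if key in CONFIG_KEYS:
            config[key] = value
        else:
            metadata[key] = value
    return metadata, config


# A made-up example of a file that mixes both kinds of content
old_metadata = {
    "title": "My data",
    "license": "CC BY 4.0",
    "source": "Example source",
    "plugins": {"example-plugin": {"option": 1}},
    "allow": {"id": "root"},
}

metadata, config = split_metadata(old_metadata)
print("metadata:", metadata)
print("config:", config)
```

After the split, the license, source and title stay in the metadata, while the plugin settings and the allow block end up in the configuration.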
The other major contribution this week was authored by Cameron Yick.
I hope to get documentation and some example plugins working on top of this in time for the next Datasette alpha release.
Releases this week
Datasette plugin adding a llm_embed(model_id, text) SQL function
An open source multi-tool for exploring and publishing data
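To show what a SQL function with the shape llm_embed(model_id, text) looks like from the caller's side, here is a small sketch using Python's built-in sqlite3 module. The `toy_embed` function is a stand-in invented for this example: it does not call any real embedding model, and the way it packs the vector into a blob is an assumption of this sketch rather than a description of the plugin's behavior.

```python
import sqlite3
import struct


def toy_embed(model_id, text):
    # Stand-in for a real embedding model (assumption for illustration):
    # the "vector" is just the text length and a rough word count.
    vector = [float(len(text)), float(text.count(" ") + 1)]
    # Pack the floats as little-endian 32-bit values so SQLite stores a BLOB
    return struct.pack("<" + "f" * len(vector), *vector)


conn = sqlite3.connect(":memory:")
# Register a two-argument SQL function under the name llm_embed
conn.create_function("llm_embed", 2, toy_embed)

blob = conn.execute(
    "select llm_embed(?, ?)", ("toy-model", "hello world")
).fetchone()[0]

# Decode the blob back into a tuple of floats
vector = struct.unpack("<2f", blob)
print(vector)
```

Registering the function on the connection means any SQL query on that connection can call it, which is the general mechanism that makes a function like this usable from ordinary select statements.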
TIL this week
- Deploying the CLIP embedding model on Fly—2023-10-18