Simon Willison’s Weblog

Weeknotes: asynchronous LLMs, synchronous embeddings, and I kind of started a podcast

22nd November 2024

These past few weeks I’ve been bringing Datasette and LLM together and distracting myself with a new sort-of-podcast crossed with a live streaming experiment.

Project: interviewing people about their projects

My response to the recent US election was to stress-code, and then to stress-podcast. On the morning after the election I started a video series called Project (I guess you could call it a “vlog”?) where I interview people about their interesting data projects. The first episode was with Rajiv Sinclair talking about his project VERDAD, tracking misinformation on US broadcast radio. The second was with Philip James talking about Civic Band, his project to scrape and search PDF meeting minutes and agendas from US local municipalities.

I was a guest on another podcast-like thing too: an Ars Technica Live session with Benj Edwards, which I wrote about in Notes from Bing Chat—Our First Encounter With Manipulative AI.

Datasette Public Office Hours

I also started a new thing with Alex Garcia called Datasette Public Office Hours, which we plan to run approximately once every two weeks as a live-streamed Friday conversation about Datasette and related projects. I wrote up our first session in Visualizing local election results with Datasette, Observable and MapLibre GL. The Civic Band interview was part of our second session—I still need to write about the rest of that session about sqlite-vec, embeddings and some future Datasette AI features, but you can watch the full video on YouTube.

Async LLM

I need to write this up in full, but last weekend I quietly released LLM 0.18 with a huge new feature: plugins can now provide asynchronous versions of their models, ready to be used with Python’s asyncio. I built this for Datasette, which is built entirely around ASGI and needs to be able to run LLM models asynchronously to enable all sorts of interesting AI features.

LLM provides async OpenAI models, and I’ve also released versions of the llm-gemini, llm-claude-3 and llm-mistral plugins that enable async models as well.

Here’s the documentation, but the short version is that you can now do this:

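import asyncio

import llm


async def main():
    # Needs a plugin installed that provides an async version of this model
    model = llm.get_async_model("claude-3.5-sonnet")

    # Stream the response, printing each chunk as it arrives
    async for chunk in model.prompt(
        "Five surprising names for a pet pelican"
    ):
        print(chunk, end="", flush=True)


asyncio.run(main())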
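To illustrate why this matters for an ASGI application like Datasette, here is a minimal sketch of two streaming prompts running concurrently on one event loop. It does not call LLM at all: `FakeAsyncModel`, `collect` and `run_both` are hypothetical names invented for this example, and the fake model only mimics the `async for chunk in model.prompt(...)` shape shown above.

```python
import asyncio


class FakeAsyncModel:
    """Hypothetical stand-in for an async model; not part of LLM."""

    def __init__(self, chunks, delay=0.01):
        self.chunks = chunks
        self.delay = delay

    async def prompt(self, text):
        # Async generator: yield one chunk at a time, pausing in between
        # so other tasks on the event loop get a chance to run.
        for chunk in self.chunks:
            await asyncio.sleep(self.delay)
            yield chunk


async def collect(model, text):
    # Gather a streamed response into a single string.
    parts = []
    async for chunk in model.prompt(text):
        parts.append(chunk)
    return "".join(parts)


async def run_both():
    model = FakeAsyncModel(["Captain ", "Beak"])
    # Both prompts stream concurrently rather than one after the other.
    return await asyncio.gather(
        collect(model, "Name a pelican"),
        collect(model, "Name another pelican"),
    )


results = asyncio.run(run_both())
```

With a real async model the pauses would be network waits, which is exactly the time an ASGI server can spend handling other requests.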

I’ve also been working on adding token accounting to LLM, to keep track of how many input and output tokens a prompt has used across multiple different models. I have an alpha release with that but it’s not yet fully stable.

I need this for both Datasette and Datasette Cloud. I want the ability to track token usage and grant users a free daily allowance of tokens that gets cut off once they’ve exhausted it. That’s an active project right now. More on that once it’s ready to ship in a release.
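The daily allowance idea can be sketched in a few lines of Python. This is only an illustration under assumptions, not how LLM, Datasette or Datasette Cloud implement it: the `DailyTokenAllowance` class, its method names and the in-memory dictionary are all hypothetical, and a real system would persist usage in a database.

```python
from collections import defaultdict
from datetime import date


class DailyTokenAllowance:
    """Hypothetical in-memory tracker for per-user daily token usage."""

    def __init__(self, daily_limit):
        self.daily_limit = daily_limit
        # Maps (user_id, day) -> total tokens used on that day
        self.used = defaultdict(int)

    def record(self, user_id, input_tokens, output_tokens, day=None):
        # Count both input and output tokens against the allowance.
        day = day or date.today()
        self.used[(user_id, day)] += input_tokens + output_tokens

    def remaining(self, user_id, day=None):
        day = day or date.today()
        return max(0, self.daily_limit - self.used[(user_id, day)])

    def can_prompt(self, user_id, day=None):
        # Users are cut off once the day's allowance is exhausted.
        return self.remaining(user_id, day) > 0


allowance = DailyTokenAllowance(daily_limit=1000)
today = date(2024, 11, 22)
tomorrow = date(2024, 11, 23)

allowance.record("alice", input_tokens=600, output_tokens=300, day=today)
remaining_after_first = allowance.remaining("alice", day=today)

allowance.record("alice", input_tokens=50, output_tokens=200, day=today)
allowed_today = allowance.can_prompt("alice", day=today)
allowed_tomorrow = allowance.can_prompt("alice", day=tomorrow)
```

Keying usage on the calendar day means the allowance resets on its own when the date changes, with no cleanup job needed for the check itself.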

Various embedding models

LLM doesn’t yet offer asynchronous embeddings (see issue #628) but I’ve found myself hacking on a few different embeddings plugins anyway:

  • llm-gguf now supports embedding models distributed as GGUF files. This means you can use the excitingly small (just 30.8MB) mxbai-embed-xsmall-v1 with LLM.
  • llm-nomic-api-embed added support for the Nomic Embed Vision models. These work like CLIP in that you can embed both images and text in the same space, allowing you to do similarity search of a text string against a collection of images.
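As a rough sketch of what "the same space" makes possible: once a text string and a set of images are all embedded as vectors, similarity search is a ranking by cosine similarity. The three-dimensional vectors and file names below are invented for illustration (real embedding vectors have hundreds of dimensions), and `rank_images` is a hypothetical helper rather than part of any plugin.

```python
import math


def cosine_similarity(a, b):
    # Dot product divided by the product of the two vector lengths.
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def rank_images(query_vector, image_vectors):
    # Score every image against the text query, most similar first.
    scored = [
        (name, cosine_similarity(query_vector, vector))
        for name, vector in image_vectors.items()
    ]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Made-up image embeddings, standing in for vision model output
image_vectors = {
    "pelican.jpg": [0.9, 0.1, 0.0],
    "bicycle.jpg": [0.1, 0.9, 0.1],
    "sunset.jpg": [0.2, 0.2, 0.9],
}

# Made-up embedding of a text query such as "a large seabird"
query_vector = [0.8, 0.2, 0.1]

ranking = rank_images(query_vector, image_vectors)
best_match = ranking[0][0]
```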

This is Weeknotes: asynchronous LLMs, synchronous embeddings, and I kind of started a podcast by Simon Willison, posted on 22nd November 2024.