W̶e̶e̶k̶n̶o̶t̶e̶s̶ Monthnotes for October

30th October 2024

I try to publish weeknotes at least once every two weeks. It’s been four since the last entry, so I guess this one counts as monthnotes instead.

In my defense, the reason I’ve fallen behind on weeknotes is that I’ve been publishing a lot of long-form blog entries this month.

Plentiful LLM vendor news

A lot of LLM stuff happened. OpenAI had their DevDay, which I used as an opportunity to try out live blogging for the first time. I figured out video scraping with Google Gemini and generally got excited about how incredibly inexpensive the Gemini models are. Anthropic launched Computer Use and JavaScript analysis, and the month ended with GitHub Universe.

My big achievement of the month was finally shipping multi-modal support for my LLM tool. This has been almost a year in the making: GPT-4 vision kicked off the new era of vision LLMs at OpenAI DevDay last November and I’ve been watching the space with keen interest ever since.

I had a couple of false starts on the feature. It was difficult to design because LLM acts as a cross-model abstraction layer, and it’s hard to design those effectively without plenty of examples from different models.

Initially I thought the feature would just be for images, but then Google Gemini launched the ability to feed in PDFs, audio files and videos as well. That’s why I renamed it from -i/--image to -a/--attachment—I’m glad I hadn’t committed to the image UI before realizing that file attachments could be so much more.
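A cross-model attachment option only works if the abstraction layer knows what each model can accept. Here is a minimal hypothetical sketch in Python of that idea. It assumes each model declares the MIME types it supports, and that types are guessed from file extensions using the standard library’s `mimetypes` module. The names `Attachment`, `Model`, `VisionModel` and `MultiModalModel` are illustrative only; this is not LLM’s actual plugin API.

```python
import mimetypes
from dataclasses import dataclass


@dataclass
class Attachment:
    """A file sent alongside a prompt (hypothetical class, for illustration)."""
    path: str

    def mime_type(self) -> str:
        # Guess the type from the file extension; fall back to generic binary.
        guessed, _ = mimetypes.guess_type(self.path)
        return guessed or "application/octet-stream"


class Model:
    """Hypothetical base class: each model declares which MIME types it accepts."""
    model_id = "base"
    attachment_types: frozenset = frozenset()

    def check_attachments(self, attachments: list) -> None:
        # Reject any attachment whose type this model does not declare.
        for attachment in attachments:
            mime = attachment.mime_type()
            if mime not in self.attachment_types:
                raise ValueError(
                    f"{self.model_id} does not accept {mime} attachments"
                )


class VisionModel(Model):
    # An image-only model (hypothetical).
    model_id = "example-vision"
    attachment_types = frozenset({"image/png", "image/jpeg"})


class MultiModalModel(Model):
    # A model that also takes PDFs, audio and video (hypothetical).
    model_id = "example-multimodal"
    attachment_types = frozenset(
        {"image/png", "image/jpeg", "application/pdf", "audio/mpeg", "video/mp4"}
    )


if __name__ == "__main__":
    files = [Attachment("chart.png"), Attachment("report.pdf")]
    MultiModalModel().check_attachments(files)  # accepted
    try:
        VisionModel().check_attachments(files)
    except ValueError as error:
        print(error)
```

Because the check is keyed on MIME type rather than on "is this an image?", the same option naturally covers PDFs, audio and video for models that support them, which is the reasoning behind a general attachment flag over an image-only one.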

I’m really happy with how the feature turned out. The one missing piece at the moment is local models: I prototyped some incomplete local model plugins to verify the API design would work, but I’ve not yet pushed any of them to a state where I think they’re ready to release. My research into mistral.rs was part of that process.

Now that attachments have landed I’m free to start thinking about the next major LLM feature. I’m leaning towards tool usage: enough models have tool use / structured output capabilities now that I think I can design an abstraction layer that works across all of them. The combination of tool use with LLM’s plugin system is really fun to think about.
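One common approach to a cross-model tool abstraction is to describe ordinary Python functions once, then let each model plugin translate that description into its vendor’s request format. The sketch below is hypothetical and is not how LLM implements tool use: `tool_spec`, `call_tool` and the `multiply` example are illustrative names. It assumes models accept a JSON Schema-style parameter description and return tool calls as a function name plus a JSON string of arguments.

```python
import inspect
import json
from typing import Callable, get_type_hints

# Map a small subset of Python annotations to JSON Schema type names.
PYTHON_TO_JSON_TYPES = {str: "string", int: "integer", float: "number", bool: "boolean"}


def tool_spec(function: Callable) -> dict:
    """Describe a Python function as a model-agnostic tool definition."""
    hints = get_type_hints(function)
    properties = {}
    required = []
    for name, parameter in inspect.signature(function).parameters.items():
        # Unannotated or unknown types fall back to "string".
        json_type = PYTHON_TO_JSON_TYPES.get(hints.get(name, str), "string")
        properties[name] = {"type": json_type}
        # Parameters without defaults must be supplied by the model.
        if parameter.default is inspect.Parameter.empty:
            required.append(name)
    return {
        "name": function.__name__,
        "description": inspect.getdoc(function) or "",
        "parameters": {
            "type": "object",
            "properties": properties,
            "required": required,
        },
    }


def call_tool(functions: dict, name: str, arguments_json: str):
    """Dispatch a model's tool call (name plus JSON arguments) to a Python function."""
    arguments = json.loads(arguments_json)
    return functions[name](**arguments)


def multiply(a: int, b: int = 2) -> int:
    """Multiply two integers."""
    return a * b


if __name__ == "__main__":
    print(json.dumps(tool_spec(multiply), indent=2))
    print(call_tool({"multiply": multiply}, "multiply", '{"a": 3, "b": 4}'))
```

A plugin for each vendor would then reshape this dictionary into that vendor’s request format; that per-vendor translation is the part a cross-model abstraction layer has to get right.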

Blog entries

Releases

llm-mistral 0.7—2024-10-29
LLM plugin providing access to Mistral models using the Mistral API
llm-claude-3 0.6—2024-10-29
LLM plugin for interacting with the Claude 3 family of models
llm-gemini 0.3—2024-10-29
LLM plugin to access Google’s Gemini family of models
llm 0.17—2024-10-29
Access large language models from the command-line
llm-whisper-api 0.1.1—2024-10-27
Run transcriptions using the OpenAI Whisper API
llm-jq 0.1.1—2024-10-27
Write and execute jq programs with the help of LLM
claude-to-sqlite 0.2—2024-10-21
Convert a Claude.ai export to SQLite
files-to-prompt 0.4—2024-10-16
Concatenate a directory full of files into a single prompt for use with LLMs
datasette-examples 0.1a0—2024-10-08
Load example SQL scripts into Datasette on startup
datasette 0.65—2024-10-07
An open source multi-tool for exploring and publishing data

TILs

Installing flash-attn without compiling it—2024-10-25
Using uv to develop Python command-line applications—2024-10-24
Setting cache-control: max-age=31536000 with a Cloudflare Transform Rule—2024-10-24
Running prompts against images, PDFs, audio and video with Google Gemini—2024-10-23
The most basic possible Hugo site—2024-10-23
Livestreaming a community election event on YouTube—2024-10-10
Upgrading Homebrew and avoiding the failed to verify attestation error—2024-10-09
Collecting replies to tweets using JavaScript—2024-10-09
Compiling and running sqlite3-rsync—2024-10-04
Building an automatically updating live blog in Django—2024-10-02

Posted 30th October 2024 at 4:20 am

Simon Willison’s Weblog