Simon Willison’s Weblog

W̶e̶e̶k̶n̶o̶t̶e̶s̶ Monthnotes for October

30th October 2024

I try to publish weeknotes at least once every two weeks. It’s been four weeks since the last entry, so I guess this one counts as monthnotes instead.

In my defense, the reason I’ve fallen behind on weeknotes is that I’ve been publishing a lot of long-form blog entries this month.

Plentiful LLM vendor news

A lot of LLM stuff happened. OpenAI had their DevDay, which I used as an opportunity to try out live blogging for the first time. I figured out video scraping with Google Gemini and generally got excited about how incredibly inexpensive the Gemini models are. Anthropic launched Computer Use and JavaScript analysis, and the month ended with GitHub Universe.

My LLM tool goes multi-modal

My big achievement of the month was finally shipping multi-modal support for my LLM tool. This has been almost a year in the making: GPT-4 vision kicked off the new era of vision LLMs at OpenAI DevDay last November and I’ve been watching the space with keen interest ever since.

I had a couple of false starts on the feature, which was difficult at first because LLM acts as a cross-model abstraction layer, and it’s hard to design an abstraction like that effectively without plenty of examples of different models to work from.

Initially I thought the feature would just be for images, but then Google Gemini launched the ability to feed in PDFs, audio files and videos as well. That’s why I renamed it from -i/--image to -a/--attachment—I’m glad I hadn’t committed to the image UI before realizing that file attachments could be so much more.
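
Here’s roughly what the feature looks like from the Python API side, based on the documented attachments support in LLM 0.17; the model ID and file name are just examples:

    import llm

    # Any model that declares support for the attachment's type can accept it
    model = llm.get_model("gpt-4o-mini")

    # Attachments can be local file paths or URLs - images here, but
    # PDFs, audio and video work too with models that support them
    response = model.prompt(
        "Describe this image",
        attachments=[llm.Attachment(path="pelican.jpg")],
    )
    print(response.text())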

I’m really happy with how the feature turned out. The one missing piece at the moment is local models: I prototyped some incomplete local model plugins to verify the API design would work, but I’ve not yet pushed any of them to a state where I think they’re ready to release. My research into mistral.rs was part of that process.
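
A local model plugin takes the same shape as any other LLM model plugin. Here’s a rough sketch based on the documented plugin hooks; the model class and its output are placeholders, not one of the actual prototypes:

    import llm

    @llm.hookimpl
    def register_models(register):
        register(LocalVisionModel())

    class LocalVisionModel(llm.Model):
        model_id = "local-vision"  # hypothetical model ID
        # As of LLM 0.17, models declare the attachment types they accept
        attachment_types = {"image/png", "image/jpeg"}

        def execute(self, prompt, stream, response, conversation):
            # A real plugin would hand prompt.prompt and prompt.attachments
            # to a local inference library here
            yield f"(stub) received {len(prompt.attachments)} attachment(s)"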

Now that attachments have landed I’m free to start thinking about the next major LLM feature. I’m leaning towards tool usage: enough models have tool use / structured output capabilities now that I think I can design an abstraction layer that works across all of them. The combination of tool use with LLM’s plugin system is really fun to think about.
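
For a sense of the shape that abstraction might take, here’s a purely hypothetical sketch; none of these names exist in LLM today:

    import json
    from dataclasses import dataclass
    from typing import Callable

    @dataclass
    class Tool:
        # Hypothetical: describe a tool once, let each model plugin
        # translate it into that vendor's function-calling format
        name: str
        description: str
        parameters: dict  # JSON schema - the common denominator across vendors
        implementation: Callable

    def lookup_weather(city: str) -> str:
        return json.dumps({"city": city, "forecast": "sunny"})  # stub data

    weather_tool = Tool(
        name="lookup_weather",
        description="Get the forecast for a city",
        parameters={
            "type": "object",
            "properties": {"city": {"type": "string"}},
            "required": ["city"],
        },
        implementation=lookup_weather,
    )

    # A cross-model prompt() could then accept tools=[weather_tool] and
    # loop: send the prompt, execute any tool calls the model requests,
    # and feed results back until the model produces a final answer.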

Blog entries

Releases

  • llm-mistral 0.7—2024-10-29
    LLM plugin providing access to Mistral models using the Mistral API
  • llm-claude-3 0.6—2024-10-29
    LLM plugin for interacting with the Claude 3 family of models
  • llm-gemini 0.3—2024-10-29
    LLM plugin to access Google’s Gemini family of models
  • llm 0.17—2024-10-29
    Access large language models from the command-line
  • llm-whisper-api 0.1.1—2024-10-27
    Run transcriptions using the OpenAI Whisper API
  • llm-jq 0.1.1—2024-10-27
    Write and execute jq programs with the help of LLM
  • claude-to-sqlite 0.2—2024-10-21
    Convert a Claude.ai export to SQLite
  • files-to-prompt 0.4—2024-10-16
    Concatenate a directory full of files into a single prompt for use with LLMs
  • datasette-examples 0.1a0—2024-10-08
    Load example SQL scripts into Datasette on startup
  • datasette 0.65—2024-10-07
    An open source multi-tool for exploring and publishing data

TILs
