W̶e̶e̶k̶n̶o̶t̶e̶s̶ Monthnotes for October
30th October 2024
I try to publish weeknotes at least once every two weeks. It’s been four since the last entry, so I guess this one counts as monthnotes instead.
In my defense, the reason I’ve fallen behind on weeknotes is that I’ve been publishing a lot of long-form blog entries this month.
Plentiful LLM vendor news
A lot of LLM stuff happened. OpenAI had their DevDay, which I used as an opportunity to try out live blogging for the first time. I figured out video scraping with Google Gemini and generally got excited about how incredibly inexpensive the Gemini models are. Anthropic launched Computer Use and JavaScript analysis, and the month ended with GitHub Universe.
My LLM tool goes multi-modal
My big achievement of the month was finally shipping multi-modal support for my LLM tool. This has been almost a year in the making: GPT-4 vision kicked off the new era of vision LLMs at OpenAI DevDay last November and I’ve been watching the space with keen interest ever since.
I had a couple of false starts at the feature. It was difficult at first because LLM acts as a cross-model abstraction layer, and it’s hard to design those effectively without plenty of examples of different models.
Initially I thought the feature would just be for images, but then Google Gemini launched the ability to feed in PDFs, audio files and videos as well. That’s why I renamed it from -i/--image to -a/--attachment. I’m glad I hadn’t committed to the image UI before realizing that file attachments could be so much more.
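To illustrate why a generic attachment is easier to abstract across models than an image-only option, here is a minimal Python sketch. It is not LLM's actual implementation: the Attachment, Model, ImageOnlyModel and MultiModalModel names, and the specific MIME type lists, are all hypothetical and chosen for illustration only. The idea is that each model declares which content types it accepts, and one shared check rejects anything it can't handle.

```python
import mimetypes
from dataclasses import dataclass
from typing import Optional


@dataclass
class Attachment:
    """Hypothetical attachment: a local path or a URL, plus an optional explicit type."""

    path: Optional[str] = None
    url: Optional[str] = None
    mime_type: Optional[str] = None

    def resolve_type(self) -> str:
        # Prefer an explicitly supplied type, otherwise guess from the name.
        if self.mime_type:
            return self.mime_type
        guessed, _ = mimetypes.guess_type(self.path or self.url or "")
        if guessed is None:
            raise ValueError("Could not determine attachment type")
        return guessed


class Model:
    """Hypothetical base class: each model declares the types it accepts."""

    model_id = "base"
    attachment_types: frozenset = frozenset()

    def check(self, attachments):
        # Reject any attachment whose type this model has not declared.
        for attachment in attachments:
            kind = attachment.resolve_type()
            if kind not in self.attachment_types:
                raise ValueError(f"{self.model_id} does not accept {kind}")


class ImageOnlyModel(Model):
    model_id = "image-only"
    attachment_types = frozenset({"image/jpeg", "image/png"})


class MultiModalModel(Model):
    model_id = "multi-modal"
    attachment_types = frozenset(
        {"image/jpeg", "image/png", "application/pdf", "audio/mpeg", "video/mp4"}
    )


if __name__ == "__main__":
    pdf = Attachment(path="report.pdf")
    MultiModalModel().check([pdf])  # accepted, no exception
    try:
        ImageOnlyModel().check([pdf])
    except ValueError as error:
        print(error)
```

With this shape, supporting a new kind of file is a matter of a model adding another type to its declared set, rather than adding a new command-line option per media type.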
I’m really happy with how the feature turned out. The one missing piece at the moment is local models: I prototyped some incomplete local model plugins to verify the API design would work, but I’ve not yet pushed any of them to a state where I think they’re ready to release. My research into mistral.rs was part of that process.
Now that attachments have landed I’m free to start thinking about the next major LLM feature. I’m leaning towards tool usage: enough models have tool use / structured output capabilities now that I think I can design an abstraction layer that works across all of them. The combination of tool use with LLM’s plugin system is really fun to think about.
Blog entries
- You can now run prompts against images, audio and video in your terminal using LLM
- Run a prompt to generate and execute jq programs using llm-jq
- Notes on the new Claude analysis JavaScript code execution tool
- Initial explorations of Anthropic’s new Computer Use capability
- Everything I built with Claude Artifacts this week
- Running Llama 3.2 Vision and Phi-3.5 Vision on a Mac with mistral.rs
- Experimenting with audio input and output for the OpenAI Chat Completion API
- Video scraping: extracting JSON data from a 35 second screen capture for less than 1/10th of a cent
- ChatGPT will happily write you a thinly disguised horoscope
- OpenAI DevDay: Let’s build developer tools, not digital God
- OpenAI DevDay 2024 live blog
Releases
- llm-mistral 0.7—2024-10-29: LLM plugin providing access to Mistral models using the Mistral API
- llm-claude-3 0.6—2024-10-29: LLM plugin for interacting with the Claude 3 family of models
- llm-gemini 0.3—2024-10-29: LLM plugin to access Google’s Gemini family of models
- llm 0.17—2024-10-29: Access large language models from the command-line
- llm-whisper-api 0.1.1—2024-10-27: Run transcriptions using the OpenAI Whisper API
- llm-jq 0.1.1—2024-10-27: Write and execute jq programs with the help of LLM
- claude-to-sqlite 0.2—2024-10-21: Convert a Claude.ai export to SQLite
- files-to-prompt 0.4—2024-10-16: Concatenate a directory full of files into a single prompt for use with LLMs
- datasette-examples 0.1a0—2024-10-08: Load example SQL scripts into Datasette on startup
- datasette 0.65—2024-10-07: An open source multi-tool for exploring and publishing data
TILs
- Installing flash-attn without compiling it—2024-10-25
- Using uv to develop Python command-line applications—2024-10-24
- Setting cache-control: max-age=31536000 with a Cloudflare Transform Rule—2024-10-24
- Running prompts against images, PDFs, audio and video with Google Gemini—2024-10-23
- The most basic possible Hugo site—2024-10-23
- Livestreaming a community election event on YouTube—2024-10-10
- Upgrading Homebrew and avoiding the failed to verify attestation error—2024-10-09
- Collecting replies to tweets using JavaScript—2024-10-09
- Compiling and running sqlite3-rsync—2024-10-04
- Building an automatically updating live blog in Django—2024-10-02