Simon Willison’s Weblog


377 items tagged “projects”


Datasette Cloud and the Datasette 1.0 alphas. I sent out the Datasette Newsletter for the first time in quite a while, with updates on Datasette Cloud, the Datasette 1.0 alphas, a note about the security vulnerability in those alphas and a summary of some of my research into combining LLMs with Datasette. # 22nd August 2023, 7:56 pm

Datasette Cloud, Datasette 1.0a3, llm-mlc and more

Datasette Cloud is now a significant step closer to general availability. The Datasette 1.03 alpha release is out, with a mostly finalized JSON format for 1.0. Plus new plugins for LLM and sqlite-utils and a flurry of things I’ve learned.

[... 1690 words]

Welcome to Datasette Cloud. We launched the Datasette Cloud blog today! The SaaS hosted version of Datasette is ready to start onboarding more users—this post describes what it can do so far and hints at what’s planned to come next. # 16th August 2023, 1:46 am

llm-mlc (via) My latest plugin for LLM adds support for models that use the MLC Python library—which is the first library I’ve managed to get to run Llama 2 with GPU acceleration on my M2 Mac laptop. # 12th August 2023, 5:33 am

Datasette 1.0a3. A new Datasette alpha release. This one previews the new default JSON API design that’s coming in 1.0—the single most significant change in the 1.0 milestone, since I plan to keep that API stable for many years to come. # 9th August 2023, 8:49 pm

How I make annotated presentations

Giving a talk is a lot of work. I go by a rule of thumb I learned from Damian Conway: a minimum of ten hours of preparation for every one hour spent on stage.

[... 2122 words]

Weeknotes: Plugins for LLM, sqlite-utils and Datasette

The principle theme for the past few weeks has been plugins.

[... 1203 words]

Run Llama 2 on your own Mac using LLM and Homebrew

Llama 2 is the latest commercially usable openly licensed Large Language Model, released by Meta AI a few weeks ago. I just released a new plugin for my LLM utility that adds support for Llama 2 and many other llama-cpp compatible models.

[... 1423 words]

asgi-replay. As part of submitting LLM to Homebrew core I needed an automated test that demonstrated that the tool was working—but I couldn’t test against the live OpenAI API because I didn’t want to have to reveal my API token as part of the test. I solved this by creating a dummy HTTP endpoint that simulates a hit to the OpenAI API, then configuring the Homebrew test to hit that instead. As part of THAT I ended up building this tiny tool which uses my asgi-proxy-lib package to intercept and log the details of hits made to a service, then provides a mechanism to replay that traffic. # 24th July 2023, 7:51 pm

LLM can now be installed directly from Homebrew (via) I spent a bunch of time on this at the weekend: my LLM tool for interacting with large language models from the terminal has now been accepted into Homebrew core, and can be installed directly using “brew install llm”. I was previously running my own separate tap, but having it in core means that it benefits from Homebrew’s impressive set of build systems—each release of LLM now has Bottles created for it automatically across a range of platforms, so “brew install llm” should quickly download binary assets rather than spending several minutes installing dependencies the slow way. # 24th July 2023, 5:16 pm

sqlite-utils now supports plugins

sqlite-utils 3.34 is out with a major new feature: support for plugins.

[... 1327 words]

Accessing Llama 2 from the command-line with the llm-replicate plugin

The big news today is Llama 2, the new openly licensed Large Language Model from Meta AI. It’s a really big deal:

[... 1206 words]

Weeknotes: Self-hosted language models with LLM plugins, a new Datasette tutorial, a dozen package releases, a dozen TILs

A lot of stuff to cover from the past two and a half weeks.

[... 1742 words]

My LLM CLI tool now supports self-hosted language models via plugins

LLM is my command-line utility and Python library for working with large language models such as GPT-4. I just released version 0.5 with a huge new feature: you can now install plugins that add support for additional models to the tool, including models that can run on your own hardware.

[... 1656 words]

Weeknotes: symbex, LLM prompt templates, a bit of a break

I had a holiday to the UK for a family wedding anniversary and mostly took the time off... except for building symbex, which became one of those projects that kept on inspiring new features.

[... 1120 words]

Symbex: search Python code for functions and classes, then pipe them into a LLM

I just released a new Python CLI tool called Symbex. It’s a search tool, loosely inspired by ripgrep, which lets you search Python code for functions and classes by name or wildcard, then see just the source code of those matching entities.

[... 1183 words]

LLM 0.4. I released a major update to my LLM CLI tool today—version 0.4, which adds conversation mode and prompt templates so you can store and re-use interesting prompts, plus a whole bunch of other large and small improvements.

I also released 0.4.1 with some minor fixes and the ability to install the tool using Hombrew: brew install simonw/llm/llm # 17th June 2023, 10:58 pm

Understanding GPT tokenizers

Large language models such as GPT-3/4, LLaMA and PaLM work in terms of tokens. They take text, convert it into tokens (integers), then predict which tokens should come next.

[... 1570 words]

Weeknotes: Parquet in Datasette Lite, various talks, more LLM hacking

I’ve fallen a bit behind on my weeknotes. Here’s a catchup for the last few weeks.

[... 769 words]

Trogon (via) The latest project from the Textualize/Rich crew, Trogon provides a Python decorator—@tui—which, when applied to a Click CLI application, adds a new interactive TUI mode which introspects the available subcommands and their options and creates a full Text User Interface—with keyboard and mouse support—for assembling invocations of those various commands.

I just shipped sqlite-utils 3.32 with support for this—it uses an optional dependency, so you’ll need to run “sqlite-utils install trogon” and then “sqlite-utils tui” to try it out. # 21st May 2023, 9:39 pm

llm, ttok and strip-tags—CLI tools for working with ChatGPT and other LLMs

I’ve been building out a small suite of command-line tools for working with ChatGPT, GPT-4 and potentially other language models in the future.

[... 1317 words]

Weeknotes: sqlite-utils 3.31, download-esm, Python in a sandbox

A couple of speaking appearances last week—one planned, one unplanned. Plus sqlite-utils 3.31, download-esm and a new TIL.

[... 608 words]

download-esm: a tool for downloading ECMAScript modules

I’ve built a new CLI tool, download-esm, which takes the name of an npm package and will attempt to download the ECMAScript module version of that package, plus all of its dependencies, directly from the jsDelivr CDN—and then rewrite all of the import statements to point to those local copies.

[... 1240 words]

Weeknotes: Miscellaneous research into Rye, ChatGPT Code Interpreter and openai-to-sqlite

I gave myself some time off stressing about my core responsibilities this week after PyCon, which meant allowing myself to be distracted by some miscellaneous research projects.

[... 891 words]

Enriching data with GPT3.5 and SQLite SQL functions

I shipped openai-to-sqlite 0.3 yesterday with a fun new feature: you can now use the command-line tool to enrich data in a SQLite database by running values through an OpenAI model and saving the results, all in a single SQL query.

[... 1219 words]

GPT-3 token encoder and decoder. I built an Observable notebook with an interface to encode, decode and search through GPT-3 tokens, building on top of a notebook by EJ Fox and Ian Johnson. # 27th April 2023, 11:48 pm

sqlite-history: tracking changes to SQLite tables using triggers (also weeknotes)

In between blogging about ChatGPT rhetoric, micro-benchmarking with ChatGPT Code Interpreter and Why prompt injection is an even bigger problem now I managed to ship the beginnings of a new project: sqlite-history.

[... 1680 words]

image-to-jpeg (via) I built a little JavaScript app that accepts an image, then displays that image as a JPEG with a slider to control the quality setting, plus a copy and paste textarea to copy out that image with a data-uri. I didn’t actually write a single line of code for this: I got ChatGPT/GPT-4 to generate the entire thing with some prompts (transcript in the via link). # 5th April 2023, 10:10 pm

Weeknotes: A new llm CLI tool, plus automating my weeknotes and newsletter

I started publishing weeknotes in 2019 partly as a way to hold myself accountable but mainly as a way to encourage myself to write more.

[... 830 words]

Semi-automating a Substack newsletter with an Observable notebook

I recently started sending out a weekly-ish email newsletter consisting of content from my blog. I’ve mostly automated that, using an Observable Notebook to generate the HTML. Here’s how that system works.

[... 2520 words]