Weeknotes: a staging environment, a Datasette alpha and a bunch of new LLMs

6th August 2024

My big achievement for the last two weeks was finally wrapping up work on the Datasette Cloud staging environment. I also shipped a new Datasette 1.0 alpha and added support to the LLM ecosystem for a bunch of newly released models.

A staging environment for Datasette Cloud
Datasette 1.0a14
Llama 3.1 GGUFs and Mistral for LLM
Blog entries
Releases
TILs

A staging environment for Datasette Cloud

I’m a big believer in investing in projects to help accelerate future work. Having a productive development environment is critical for me: it’s why most of my projects start with templates that give me unit tests, continuous integration and a deployment pipeline from the start.

Datasette Cloud runs Datasette in containers hosted on Fly.io. When I was first putting the system together I got a little lazy—while it still had minimal user activity I could get away with iterating on the production environment directly.

That’s no longer a responsible thing to do, and as a result I found my speed of iteration dropping dramatically. Deploying new user-facing Datasette features remained productive because I could test those locally, but the systems that interacted with Fly.io in order to launch and update containers were a different story.

It was time to invest in a staging environment—which turns out to be one of those things that gets harder to set up the longer you leave it. I should add it to my list of PAGNIs—Probably Are Gonna Need Its. There ended up being all sorts of assumptions baked into the system that hard-coded production domains and endpoints.
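As a rough illustration of what removing a hard-coded production domain can look like (this is not the actual Datasette Cloud code), the sketch below reads the base domain from an environment variable and derives the site URL from it. The variable name `APP_BASE_DOMAIN`, the `load_settings` helper and the `staging.example` domain are all hypothetical.

```python
import os

# Hypothetical default: production keeps working when the variable is unset.
DEFAULT_BASE_DOMAIN = "datasette.cloud"


def load_settings(environ=None):
    """Build per-environment settings instead of hard-coding production values.

    `environ` defaults to os.environ; passing a dict makes this easy to test.
    The APP_BASE_DOMAIN variable name is an assumption for this sketch.
    """
    environ = os.environ if environ is None else environ
    base_domain = environ.get("APP_BASE_DOMAIN", DEFAULT_BASE_DOMAIN)
    return {
        "base_domain": base_domain,
        # Every derived URL is computed from the one configurable value.
        "site_url": "https://www." + base_domain,
    }


production_settings = load_settings({})
staging_settings = load_settings({"APP_BASE_DOMAIN": "staging.example"})
print(production_settings["site_url"])
print(staging_settings["site_url"])
```

Deriving every URL from a single setting means a second environment only needs one value changed, which is the kind of assumption that is cheap to get right early and expensive to untangle later.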

It took longer than expected, but the staging environment is now in place. I’m really happy with it.

It’s a full clone of the production environment, replicating all aspects of production in a separate Fly organization with its own domain names, API keys, S3 buckets and other configuration.
Continuous integration and continuous deployment continue to work. Any code pushed to the main branch of either of the core repositories for Datasette Cloud will be deployed to both production and staging... unless staging is configured to deploy from a branch instead, in which case I can push experimental code to that branch and see it running in the staging environment without affecting production.
I added a feature to help me iterate on the end-user Datasette containers as well: I can now launch a new space and configure that to deploy changes made to a specific branch. This means I can rapidly test end-user changes in a safe, isolated environment that otherwise exactly mirrors how production works.
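The branch-based deployment rule described above can be sketched as a small decision function. This is a minimal illustration under stated assumptions, not the real deployment pipeline: the `Environment` class, the `deploy_branch` field and the `experiment` branch name are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Environment:
    """One deployment target. The field names are hypothetical."""

    name: str
    deploy_branch: str = "main"  # assumed default: deploy from main


def environments_to_deploy(pushed_branch, environments):
    """Return the names of environments that should deploy for this push."""
    return [env.name for env in environments if env.deploy_branch == pushed_branch]


production = Environment("production")
staging = Environment("staging")

# Default setup: a push to main goes to both environments.
default_targets = environments_to_deploy("main", [production, staging])

# Staging reconfigured to track a (hypothetical) experimental branch.
staging_experiment = Environment("staging", deploy_branch="experiment")
main_targets = environments_to_deploy("main", [production, staging_experiment])
experiment_targets = environments_to_deploy(
    "experiment", [production, staging_experiment]
)

print(default_targets)
print(main_targets)
print(experiment_targets)
```

With staging pointed at a branch, pushes to main reach only production and pushes to the experimental branch reach only staging, which matches the behaviour described in the list above.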

There are three key components to how Datasette Cloud works:

A router application, written in Go, which handles ALL traffic to *.datasette.cloud and decides which underlying container it should be routed to. Each Datasette Cloud team gets its own dedicated container under that team’s selected subdomain. Fly.io can scale containers to zero, so routed requests can cause a container to be started up if it’s not already running.
A Django application responsible for the www.datasette.cloud site. This is the site where users sign in and manage their Datasette Cloud spaces. It also offers several different APIs that the individual Datasette containers can consult for things like permission checks and configuring additional features.
The Datasette containers themselves. Each space (my term for a private team instance) gets their own container with their own encrypted volume, to minimize the chance of accidental leakage of data between different teams and ensure that performance problems in one space don’t affect others. These containers are launched and updated by the Django application.
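To make the router's job concrete, here is a minimal sketch of subdomain-based routing. The real router is written in Go and its logic is not shown in this post, so everything here is an assumption for illustration: the `route` function, the `space:` and `django` labels, and the choice to send `www` to the Django application.

```python
BASE_DOMAIN = "datasette.cloud"

# Assumption: www is served by the Django application, not a team container.
RESERVED_SUBDOMAINS = {"www"}


def route(host):
    """Map a Host header value to a hypothetical backend label.

    Returns "django" for the main site, "space:<team>" for a team
    container, or None when the host is not a single-level subdomain
    of the base domain.
    """
    # Normalize case, then drop any port and trailing dot.
    hostname = host.lower().split(":")[0].rstrip(".")
    suffix = "." + BASE_DOMAIN
    if not hostname.endswith(suffix):
        return None
    subdomain = hostname[: -len(suffix)]
    if not subdomain or "." in subdomain:
        return None
    if subdomain in RESERVED_SUBDOMAINS:
        return "django"
    return "space:" + subdomain


print(route("www.datasette.cloud"))
print(route("simon.datasette.cloud"))
print(route("example.com"))
```

A real router would go on to look up the container for that team and, since containers can scale to zero, start it if it is not already running. That lookup is left out of this sketch.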

The staging environment means that any of these three can now be aggressively iterated on without any fear of breaking production. I expect it to dramatically increase my velocity in iterating on improvements to how everything fits together.

Datasette 1.0a14

I published some annotated release notes for this yesterday. It represents several months of accumulated work, much of it by Alex Garcia. It’s already running on Datasette Cloud, which is a useful testing ground for driving improvements to Datasette itself.

Llama 3.1 GGUFs and Mistral for LLM

Llama 3.1 came out on the 23rd of July and represented a huge milestone in openly licensed LLMs: the first openly licensed model that can credibly claim to compete with GPT-4.

It came in three sizes: 405B (the GPT-4 competitor), 70B and 8B.

LLM’s llm-llama-cpp plugin has been in desperate need of an upgrade for quite a while: the plugin has some bad design decisions baked in that coupled it to the Llama 2 era.

Rather than fix that up I ended up building a brand new plugin: llm-gguf, which takes advantage of the latest features in the llama.cpp ecosystem.

Most importantly, this means LLM can run Llama 3.1 8B. The following commands install the plugin, download a 4.6GB model file and use it to run a prompt:

llm install llm-gguf
llm gguf download-model \
  https://huggingface.co/lmstudio-community/Meta-Llama-3.1-8B-Instruct-GGUF/resolve/main/Meta-Llama-3.1-8B-Instruct-Q4_K_M.gguf \
  --alias llama-3.1-8b-instruct --alias l31i
llm -m l31i "five great names for a pet lemur"

I ran that just now and got:

Here are five great name suggestions for a pet lemur:

Loki: A mischievous and playful name that suits a lemur’s curious and energetic nature. In Norse mythology, Loki is known for his cunning and trickster ways, which makes him a fitting name for a lemur.

Zuzu: A whimsical and exotic name that captures the lemur’s unique charm. Zuzu is a nod to the magical and mysterious, evoking the idea of a lemur as a tiny, furry sorcerer.

Finnley: A charming and adventurous name that suits a lemur’s bold and curious personality. Finnley is a playful name that suggests a sense of fun and mischief, perfect for a pet lemur.

Kiko: A sweet and endearing name that means “little one” in Swahili. Kiko is a fitting name for a pet lemur, which is already small and adorable.

Jinx: A fun and mischievous name that suits a lemur’s playful and energetic nature. Jinx is a nod to the idea of a lemur as a tiny, furry troublemaker, always getting into mischief and mayhem.

I hope these suggestions help inspire you to find the perfect name for your pet lemur!

Great job my laptop!

Mistral AI also released a new model which looks to be edging into that GPT-4 territory: Mistral Large 2. This one is available as open weights under a non-commercial “research” license, or you can access it for commercial purposes via the Mistral API.

The llm-mistral plugin talks to that API, and doesn’t need to be updated to handle new models since it can refresh the list of models directly from Mistral:

llm install llm-mistral
llm mistral refresh # if already installed
llm -m mistral/mistral-large-latest 'five names for a pet mountain lion'

It reminded me to get a license:

Here are five names suitable for a pet mountain lion. Remember that big cats like mountain lions need special care, licensing, and experience, so it’s important to be responsible and ensure their well-being.

Shadow: This name represents the stealthy and elusive nature of mountain lions.

Tawny: A name inspired by the beautiful tawny coat color of mountain lions.

Koda: A Native American name meaning “friend” or “ally”, which is fitting for a beloved pet.

Sierra: After the Sierra Nevada mountains, a habitat for mountain lions, this name has a wild and majestic feel.

Leon: Derived from the Latin word for “lion”, this name signifies strength and power.

Blog entries

Datasette 1.0a14: The annotated release notes

Plus dozens of links and quotations. My link descriptions are indistinguishable from regular blog posts now, especially since I’ve started including inline images and even videos for some of them. Here’s everything in July.

Releases

datasette-remote-metadata 0.2a0—2024-08-05
Periodically refresh Datasette metadata from a remote URL
datasette 1.0a14—2024-08-05
An open source multi-tool for exploring and publishing data
fetch-github-issues 0.1.2—2024-07-29
Fetch all GitHub issues for a repository
datasette-extract 0.1a8—2024-07-26
Import unstructured data (text and images) into structured tables
llm-mistral 0.5—2024-07-24
LLM plugin providing access to Mistral models using the Mistral API
llm-gguf 0.1a0—2024-07-23
Run models distributed as GGUF files using LLM

TILs

Assistance with release notes using GitHub Issues—2024-08-05
Back-dating Git commits based on file modification dates—2024-08-01
HTML video with subtitles—2024-07-31

Posted 6th August 2024 at 3:41 pm

Simon Willison’s Weblog