Simon Willison's Weblog: projects

Running prompts against images and PDFs with Google Gemini

2024-10-23T18:25:07+00:00

New TIL. I've been experimenting with the Google Gemini APIs for running prompts against images and PDFs (in preparation for finally adding multi-modal support to LLM) - here are my notes on how to send images or PDF files to their API using curl and the base64 -i macOS command.

I figured out the curl incantation first and then got Claude to build me a Bash script that I can execute like this:

prompt-gemini 'extract text' example-handwriting.jpg

Playing with this is really fun. The Gemini models charge less than 1/10th of a cent per image, so it's really inexpensive to try them out.
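The request body behind that curl call can be sketched in Python. This is a minimal sketch, not the Bash script from the TIL: `build_gemini_payload` is a hypothetical helper name, and the `contents` / `parts` / `inline_data` shape is an assumption about the Gemini REST API that should be checked against Google's documentation before use.

```python
import base64
import json


def build_gemini_payload(prompt: str, file_bytes: bytes, mime_type: str = "image/jpeg") -> dict:
    """Build a request body pairing a text prompt with one base64 encoded file.

    The field names below are an assumption about the Gemini REST API.
    """
    encoded = base64.b64encode(file_bytes).decode("ascii")
    return {
        "contents": [
            {
                "parts": [
                    {"text": prompt},
                    {"inline_data": {"mime_type": mime_type, "data": encoded}},
                ]
            }
        ]
    }


# Placeholder bytes stand in for a real image file here.
payload = build_gemini_payload("extract text", b"placeholder image bytes")
print(json.dumps(payload, indent=2)[:80])
```

A PDF would use the same structure with a different `mime_type` value, again assuming the API accepts it in that position.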

Tags: vision-llms, gemini, llm, bash, ai, llms, ai-assisted-programming, google, generative-ai, ocr, projects

Everything I built with Claude Artifacts this week

2024-10-21T14:32:57+00:00

I'm a huge fan of Claude's Artifacts feature, which lets you prompt Claude to create an interactive Single Page App (using HTML, CSS and JavaScript) and then view the result directly in the Claude interface, iterating on it further with the bot and then, if you like, copying out the resulting code.

I was digging around in my Claude activity export (I built a claude-to-sqlite tool to convert it to SQLite so I could explore it in Datasette) and decided to see how much I'd used artifacts in the past week. It was more than I expected!

Being able to spin up a full interactive application - sometimes as an illustrative prototype, but often as something that directly solves a problem - is a remarkably useful tool.

Here's most of what I've used Claude Artifacts for in the past seven days. I've provided prompts or a full transcript for nearly all of them.

URL to Markdown with Jina Reader

I got frustrated at how hard it was to copy and paste the entire text of a web page into an LLM while using Mobile Safari. So I built a simple web UI that lets me enter a URL, calls the Jina Reader API to generate Markdown (which uses Puppeteer under the hood) and gives me that Markdown with a convenient "Copy" button.

Try it out: https://tools.simonwillison.net/jina-reader (Code)

I wrote more about that project here.

SQLite in WASM demo

A Hacker News conversation about SQLite's WASM build led me to the @sqlite.org/sqlite-wasm package on NPM, and I decided to knock together a quick interactive demo.

Try it out here: tools.simonwillison.net/sqlite-wasm

Code, Claude transcript

Extract URLs

I found myself wanting to extract all of the underlying URLs that were linked to from a chunk of text on a web page. I realized the fastest way to do that would be to spin up an artifact that could accept rich-text HTML pastes and use an HTML parser to extract those links.

https://tools.simonwillison.net/extract-urls

Code, Claude transcript

Clipboard viewer

Messing around with a tool that lets you paste in rich text reminded me that the browser clipboard API is a fascinating thing. I decided to build a quick debugging tool that would let me copy and paste different types of content (plain text, rich text, files, images etc) and see what information was available to me in the browser.

https://tools.simonwillison.net/clipboard-viewer

Code, Claude transcript

Pyodide REPL

I didn't put a lot of effort into this one. While poking around with Claude Artifacts in the browser DevTools I spotted this CSP header:

content-security-policy: default-src https://www.claudeusercontent.com; script-src 'unsafe-eval' 'unsafe-inline' https://www.claudeusercontent.com https://cdnjs.cloudflare.com https://cdn.jsdelivr.net/pyodide/; connect-src https://cdn.jsdelivr.net/pyodide/; worker-src https://www.claudeusercontent.com blob:; style-src 'unsafe-inline' https://www.claudeusercontent.com https://cdnjs.cloudflare.com https://fonts.googleapis.com; img-src blob: data: https://www.claudeusercontent.com; font-src data: https://www.claudeusercontent.com; object-src 'none'; base-uri https://www.claudeusercontent.com; form-action https://www.claudeusercontent.com; frame-ancestors https://www.claudeusercontent.com https://claude.ai https://preview.claude.ai https://claude.site https://feedback.anthropic.com; upgrade-insecure-requests; block-all-mixed-content

The https://cdn.jsdelivr.net/pyodide/ in there caught my eye, because it suggested that the Anthropic development team had deliberately set it up so Pyodide - Python compiled to WebAssembly - could be loaded in an artifact.

I got Claude to spin up a very quick demo to prove that this worked:

https://claude.site/artifacts/a3f85567-0afc-4854-b3d3-3746dd1a37f2

I've not bothered to extract this one to my own tools.simonwillison.net site yet because it's purely a proof of concept that Pyodide can load correctly in that environment.

Photo Camera Settings Simulator

I was out on a photo walk and got curious about whether or not JavaScript could provide a simulation of camera settings. I didn't get very far with this one (prompting on my phone while walking along the beach) - the result was buggy and unimpressive and I quickly lost interest. It did expose me to the Fabric.js library for manipulating canvas elements though.

https://claude.site/artifacts/e645c231-8c13-4374-bb7d-271c8dd73825

LLM pricing calculator

This one I did finish. I built this pricing calculator as part of my experiments with Video scraping using Google Gemini, because I didn't trust my own calculations for how inexpensive Gemini was! Here are detailed notes on how I built that.

https://tools.simonwillison.net/llm-prices

YAML to JSON converter

I wanted to remind myself how certain aspects of YAML syntax worked, so I spun up a quick YAML to JSON converter tool that shows the equivalent JSON live as you type YAML.

https://claude.site/artifacts/ffeb439c-fc95-428a-9224-434f5f968d51

Claude transcript

OpenAI Audio

This is my most interesting artifact of the week. I was exploring OpenAI's new Audio APIs and decided to see if I could get Claude to build me a web page that could request access to my microphone, record a snippet of audio, then base64 encode that and send it to the OpenAI API.

Here are the full details on how I built this tool.

https://tools.simonwillison.net/openai-audio

Claude Artifacts can't make API requests to external hosts directly, but it can still spin up enough of a working version that it's easy to take that, move it to different hosting and finish getting it working.

I wrote more about this API pattern in Building a tool showing how Gemini Pro can return bounding boxes for objects in images.

QR Code Decoder

I was in a meeting earlier this week where one of the participants shared a slide with a QR code (for joining a live survey tool). I didn't have my phone with me, so I needed a way to turn that QR code into a regular URL.

https://tools.simonwillison.net/qr

Knocking up this QR decoder in Claude Artifacts took just a few seconds:

Build an artifact (no react) that lets me paste in a QR code and displays the decoded information, with a hyperlink if necessary

[ ... ]

have a file open box that also lets you drag and drop and add a onpaste handler to the page that catches pasted images as well

Full conversation here.

Image Converter and Page Downloader

Another very quick prototype. On Hacker News someone demonstrated a neat idea for a tool that let you drop photos onto a page and it would bake them into the page as base64 URLs such that you could "save as HTML" and get a self-contained page with a gallery.

I suggested they could add a feature that generated a "Download link" with the new page baked in - useful on mobile phones that don't let you "Save as HTML" - and got Claude to knock up a quick prototype.

In this case I shared the code in a Gist and then used the new-to-me https://gistpreview.github.io/?GIST_ID_GOES_HERE trick to render the result:

https://gistpreview.github.io/?14a2c3ef508839f26377707dbf5dd329

gistpreview turns out to be a really quick way to turn an LLM-generated demo into a page people can view.

Code, Claude transcript

HTML Entity Escaper

Another example of on-demand software: I needed to escape the HTML entities in a chunk of text on my phone, so I got Claude to build me a tool for that:

https://claude.site/artifacts/46897436-e06e-4ccc-b8f4-3df90c47f9bc

Here's the prompt I used:

Build an artifact (no react) where I can paste text into a textarea and it will return that text with all HTML entities - single and double quotes and less than greater than ampersand - correctly escaped. The output should be in a textarea accompanied by a "Copy to clipboard" button which changes text to "Copied!" for 1.5s after you click it. Make it mobile friendly

Claude transcript

text-wrap-balance-nav

Inspired by Terence Eden I decided to do a quick experiment with the text-wrap: balance CSS property. I got Claude to build me an example nav bar with a slider and a checkbox. I wrote about that here.

https://tools.simonwillison.net/text-wrap-balance-nav

ARES Phonetic Alphabet Converter

I was volunteering as a HAM radio communications operator for the Half Moon Bay Pumpkin Run and got nervous that I'd mess up using the phonetic alphabet - so I had Claude build me this tool:

https://claude.site/artifacts/aaadab20-968a-4291-8ce9-6435f6d53f4c

Claude transcript here. Amusingly it built it in Python first, then switched to JavaScript after I reminded it that I wanted "an interactive web app".
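For a sense of how small the underlying logic is, here is a minimal Python sketch of a phonetic alphabet converter. It is not the code Claude produced: `to_phonetic` is a hypothetical name, and the word list is the standard NATO alphabet rather than anything taken from the artifact.

```python
# Standard NATO phonetic alphabet, keyed by the first letter of each word.
NATO_WORDS = (
    "Alpha Bravo Charlie Delta Echo Foxtrot Golf Hotel India Juliett Kilo Lima "
    "Mike November Oscar Papa Quebec Romeo Sierra Tango Uniform Victor Whiskey "
    "X-ray Yankee Zulu"
).split()
PHONETIC = {word[0]: word for word in NATO_WORDS}


def to_phonetic(text: str) -> list[str]:
    """Spell out letters phonetically; digits and punctuation pass through unchanged."""
    return [PHONETIC.get(ch.upper(), ch) for ch in text if not ch.isspace()]


print(" ".join(to_phonetic("K6 ab")))  # Kilo 6 Alpha Bravo
```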

This is so useful, and so much fun!

As you can see, I'm a heavy user of this feature - I just described 14 projects produced in a single week. I've been using artifacts since they were released on 20th June (alongside the excellent Claude 3.5 Sonnet, still my daily-driver LLM) and I'm now at a point where I fire up a new interactive artifact several times a day.

I'm using artifacts for idle curiosity, rapid prototyping, library research and to spin up tools that solve immediate problems.

Most of these tools took less than five minutes to build. A few of the more involved ones took longer than that, but even the OpenAI Audio one took 11:55am to 12:07pm for the first version and 12:18pm to 12:27pm for the second iteration - so 21 minutes total.

Take a look at my claude-artifacts tag for even more examples, including SVG to JPG/PNG, Markdown and Math Live Renderer and Image resize and quality comparison.

I also have a dashboard of every post that links to my tools.simonwillison.net site, and the underlying simonw/tools GitHub repo includes more unlisted tools, most of which link to their Claude conversation transcripts in their commit history.

I'm beginning to get a little frustrated at their limitations - in particular the way artifacts are unable to make API calls, submit forms or even link out to other pages. I'll probably end up spinning up my own tiny artifacts alternative based on everything I've learned about them so far.

If you're not using artifacts, I hope I've given you a sense of why they're one of my current favourite LLM-based tools.

Tags: anthropic, claude, ai, llms, claude-artifacts, javascript, ai-assisted-programming, generative-ai, projects, claude-3-5-sonnet, tools

Dashboard: Tools

2024-10-21T03:33:41+00:00

I used Django SQL Dashboard to spin up a dashboard that shows all of the URLs to my tools.simonwillison.net site that I've shared on my blog so far. It uses this (Claude assisted) regular expression in a PostgreSQL SQL query:

select distinct on (tool_url)
    unnest(regexp_matches(
        body,
        '(https://tools\.simonwillison\.net/[^<"\s)]+)',
        'g'
    )) as tool_url,
    'https://simonwillison.net/' || left(type, 1) || '/' || id as blog_url,
    title,
    date(created) as created
from content
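The same pattern can be tried outside PostgreSQL. This is a minimal Python sketch of the extraction step only, assuming the regular expression behaves the same way in Python's `re` module; `extract_tool_urls` is a hypothetical helper, and de-duplicating in order of first appearance is a loose stand-in for `distinct on`, not an exact equivalent.

```python
import re

# Same pattern as the SQL above: stop at <, ", whitespace or a closing parenthesis.
TOOL_URL = re.compile(r'(https://tools\.simonwillison\.net/[^<"\s)]+)')


def extract_tool_urls(body: str) -> list[str]:
    """Return each distinct tool URL found in body, in order of first appearance."""
    seen: dict[str, None] = {}
    for url in TOOL_URL.findall(body):
        seen.setdefault(url, None)
    return list(seen)


sample = (
    '<a href="https://tools.simonwillison.net/qr">QR</a> '
    "(https://tools.simonwillison.net/ocr) "
    "https://tools.simonwillison.net/qr again"
)
print(extract_tool_urls(sample))
```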

I've been really enjoying having a static hosting platform (it's GitHub Pages serving my simonw/tools repo) that I can use to quickly deploy little HTML+JavaScript interactive tools and demos.

Tags: django-sql-dashboard, ai-assisted-programming, tools, projects, postgresql, sql, javascript

Experimenting with audio input and output for the OpenAI Chat Completion API

2024-10-18T15:17:40+00:00

OpenAI promised this at DevDay a few weeks ago and now it's here: their Chat Completion API can now accept audio as input and return it as output. OpenAI still recommend their WebSocket-based Realtime API for audio tasks, but the Chat Completion API is a whole lot easier to write code against.

Generating audio

For the moment you need to use the new gpt-4o-audio-preview model. OpenAI tweeted this example:

curl https://api.openai.com/v1/chat/completions \
  -H "Authorization: Bearer $OPENAI_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "gpt-4o-audio-preview",
    "modalities": ["text", "audio"],
    "audio": {
      "voice": "alloy",
      "format": "wav"
    },
    "messages": [
      {
        "role": "user",
        "content": "Recite a haiku about zeros and ones."
      }
    ]
  }' | jq > response.json

I tried running that and got back JSON with a HUGE base64 encoded block in it:

{
  "id": "chatcmpl-AJaIpDBFpLleTUwQJefzs1JJE5p5g",
  "object": "chat.completion",
  "created": 1729231143,
  "model": "gpt-4o-audio-preview-2024-10-01",
  "choices": [
    {
      "index": 0,
      "message": {
        "role": "assistant",
        "content": null,
        "refusal": null,
        "audio": {
          "id": "audio_6711f92b13a081908e8f3b61bf18b3f3",
          "data": "UklGRsZr...AA==",
          "expires_at": 1729234747,
          "transcript": "Digits intertwine,  \nIn dance of noughts and unity,  \nCode's whispers breathe life."
        }
      },
      "finish_reason": "stop",
      "internal_metrics": []
    }
  ],
  "usage": {
    "prompt_tokens": 17,
    "completion_tokens": 181,
    "total_tokens": 198,
    "prompt_tokens_details": {
      "cached_tokens": 0,
      "cached_tokens_internal": 0,
      "text_tokens": 17,
      "image_tokens": 0,
      "audio_tokens": 0
    },
    "completion_tokens_details": {
      "reasoning_tokens": 0,
      "text_tokens": 33,
      "audio_tokens": 148
    }
  },
  "system_fingerprint": "fp_6e2d124157"
}

The full response is here - I've truncated that data field since the whole thing is 463KB long!

Next I used jq and base64 to save the decoded audio to a file:

cat response.json | jq -r '.choices[0].message.audio.data' \
  | base64 -D > decoded.wav
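The same extraction can be done in Python if jq and base64 aren't to hand. A minimal sketch, assuming the response has the shape shown above; `extract_audio` is a hypothetical helper name.

```python
import base64
import json


def extract_audio(response_text: str) -> bytes:
    """Pull the base64 encoded audio out of a chat completion response and decode it."""
    response = json.loads(response_text)
    encoded = response["choices"][0]["message"]["audio"]["data"]
    return base64.b64decode(encoded)


# Tiny stand-in response; a real one carries hundreds of KB in the data field.
fake_response = json.dumps(
    {"choices": [{"message": {"audio": {"data": base64.b64encode(b"RIFFdemo").decode()}}}]}
)
audio_bytes = extract_audio(fake_response)
print(audio_bytes[:4])  # b'RIFF'
```

To save a real response, read `response.json` as text, pass it to `extract_audio()` and write the returned bytes to `decoded.wav` in binary mode.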

That gave me a 7 second, 347K WAV file. I converted that to MP3 with the help of llm cmd and ffmpeg:

llm cmd ffmpeg convert decoded.wav to code-whispers.mp3
> ffmpeg -i decoded.wav -acodec libmp3lame -b:a 128k code-whispers.mp3

That gave me a 117K MP3 file.

The "usage" field above shows that the output used 148 audio tokens. OpenAI's pricing page says audio output tokens are $200/million, so I plugged that into my LLM pricing calculator and got back a cost of 2.96 cents.
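That arithmetic is easy to reproduce. A minimal sketch using only the two figures quoted above (148 audio tokens, $200 per million); `token_cost_cents` is a hypothetical helper, not part of the pricing calculator.

```python
def token_cost_cents(tokens: int, dollars_per_million: float) -> float:
    """Cost in cents for a token count at a price quoted per million tokens."""
    return tokens * dollars_per_million / 1_000_000 * 100


audio_output_cents = token_cost_cents(148, 200)
print(f"{audio_output_cents:.2f} cents")  # 2.96 cents
```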

Audio input via a Bash script

Next I decided to try the audio input feature. You can now embed base64 encoded WAV files in the list of messages you send to the model, similar to how image inputs work.

I started by pasting a curl example of audio input into Claude and getting it to write me a Bash script wrapper. Here's the full audio-prompt.sh script. The part that does the work (after some argument parsing) looks like this:

# Base64 encode the audio file
AUDIO_BASE64=$(base64 < "$AUDIO_FILE" | tr -d '\n')

# Construct the JSON payload
JSON_PAYLOAD=$(jq -n \
    --arg model "gpt-4o-audio-preview" \
    --arg text "$TEXT_PROMPT" \
    --arg audio "$AUDIO_BASE64" \
    '{
        model: $model,
        modalities: ["text"],
        messages: [
            {
                role: "user",
                content: [
                    {type: "text", text: $text},
                    {
                        type: "input_audio",
                        input_audio: {
                            data: $audio,
                            format: "wav"
                        }
                    }
                ]
            }
        ]
    }')

# Make the API call
curl -s "https://api.openai.com/v1/chat/completions" \
    -H "Content-Type: application/json" \
    -H "Authorization: Bearer $OPENAI_API_KEY" \
    -d "$JSON_PAYLOAD" | jq .

From the documentation it looks like you can send an "input_audio"."format" of either "wav" or "mp3".
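For comparison, the same payload can be assembled in Python. This is a minimal sketch mirroring the jq template above, not a replacement for the script; `build_audio_payload` is a hypothetical name and the sketch stops short of making the HTTP request.

```python
import base64


def build_audio_payload(text_prompt: str, audio_bytes: bytes, audio_format: str = "wav") -> dict:
    """Mirror the jq template: one user message with a text part and an input_audio part."""
    return {
        "model": "gpt-4o-audio-preview",
        "modalities": ["text"],
        "messages": [
            {
                "role": "user",
                "content": [
                    {"type": "text", "text": text_prompt},
                    {
                        "type": "input_audio",
                        "input_audio": {
                            "data": base64.b64encode(audio_bytes).decode("ascii"),
                            "format": audio_format,
                        },
                    },
                ]
            }
        ],
    }


# Placeholder bytes stand in for the contents of a real WAV file.
payload = build_audio_payload("describe this audio", b"placeholder audio bytes")
print(payload["messages"][0]["content"][1]["type"])  # input_audio
```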

You can run it like this:

./audio-prompt.sh 'describe this audio' decoded.wav

This dumps the raw JSON response to the console. Here's what I got for that sound clip I generated above, which gets a little creative:

The audio features a spoken phrase that is poetic in nature. It discusses the intertwining of "digits" in a coordinated and harmonious manner, as if engaging in a dance of unity. It mentions "codes" in a way that suggests they have an almost life-like quality. The tone seems abstract and imaginative, possibly metaphorical, evoking imagery related to technology or numbers.

A web app for recording and prompting against audio

I decided to turn this into a tiny web application. I started by asking Claude to create a prototype with a "record" button, just to make sure that was possible:

Build an artifact - no React - that lets me click a button to start recording, shows a counter running up, then lets me click again to stop. I can then play back the recording in an audio element. The recording should be a WAV

Then I pasted in one of my curl experiments from earlier and told it:

Now add a textarea input called "prompt" and a button which, when clicked, submits the prompt and the base64 encoded audio file using fetch() to this URL

The JSON that comes back should be displayed on the page, pretty-printed

The API key should come from localStorage - if localStorage does not have it ask the user for it with prompt()

I iterated through a few error messages and got to a working application! I then did one more round with Claude to add a basic pricing calculator showing how much the prompt had cost to run.

You can try the finished application here:

tools.simonwillison.net/openai-audio

Here's the finished code. It uses all sorts of APIs I've never used before: AudioContext().createMediaStreamSource(...) and a DataView() to build the WAV file from scratch, plus a trick with FileReader() .. readAsDataURL() for in-browser base64 encoding.
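The browser code itself is JavaScript, but the idea of building a WAV file and base64 encoding it can be illustrated in a few lines of Python. This is an analogue under stated assumptions, not a port of the tool: it uses the standard library `wave` module instead of writing the header by hand, and `make_wav` and `sine_wave` are hypothetical names.

```python
import base64
import io
import math
import struct
import wave


def sine_wave(frequency: float, seconds: float, sample_rate: int = 16000) -> list[float]:
    """Generate a test tone as floats between -1 and 1."""
    count = int(seconds * sample_rate)
    return [math.sin(2 * math.pi * frequency * n / sample_rate) for n in range(count)]


def make_wav(samples: list[float], sample_rate: int = 16000) -> bytes:
    """Pack float samples as 16-bit mono PCM inside a WAV container."""
    buffer = io.BytesIO()
    with wave.open(buffer, "wb") as wav:
        wav.setnchannels(1)
        wav.setsampwidth(2)  # two bytes per sample = 16-bit audio
        wav.setframerate(sample_rate)
        frames = b"".join(
            struct.pack("<h", int(max(-1.0, min(1.0, s)) * 32767)) for s in samples
        )
        wav.writeframes(frames)
    return buffer.getvalue()


wav_bytes = make_wav(sine_wave(440, 0.5))
encoded = base64.b64encode(wav_bytes).decode("ascii")
print(encoded[:5])  # UklGR
```

Every WAV file starts with the bytes `RIFF`, which is why the base64 data in the API response shown earlier also begins with `UklGR`.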

Audio inputs are charged at $100/million tokens, and processing 5 seconds of audio here cost 0.6 cents.

The problem is the price

Audio tokens are currently charged at $100/million for input and $200/million for output. Tokens are hard to reason about, but a note on the pricing page clarifies that:

Audio input costs approximately 6¢ per minute; Audio output costs approximately 24¢ per minute

Translated to price-per-hour, that's $3.60 per hour of input and $14.40 per hour of output. I think the Realtime API pricing is about the same. These are not cheap APIs.

Meanwhile, Google's Gemini models price audio at 25 tokens per second (for input only, they don't yet handle audio output). That means that for their three models:

Gemini 1.5 Pro is $1.25/million input tokens, so $0.11 per hour
Gemini 1.5 Flash is $0.075/million, so $0.00675 per hour (that's less than a cent)
Gemini 1.5 Flash 8B is $0.0375/million, so $0.003375 per hour (a third of a cent!)

This means even Google's most expensive Pro model is still 32 times less costly than OpenAI's gpt-4o-audio-preview model when it comes to audio input, and Flash 8B is 1,066 times cheaper.
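The per-hour figures can be reproduced from the numbers quoted above: 25 tokens per second for Gemini audio input, the three per-million prices, and OpenAI's 6¢ per minute. A minimal sketch; the constant and function names are hypothetical.

```python
TOKENS_PER_SECOND = 25
SECONDS_PER_HOUR = 3600

# Dollars per million input tokens, as quoted above.
GEMINI_INPUT_PRICES = {
    "Gemini 1.5 Pro": 1.25,
    "Gemini 1.5 Flash": 0.075,
    "Gemini 1.5 Flash 8B": 0.0375,
}

# OpenAI audio input at roughly 6 cents per minute.
OPENAI_INPUT_PER_HOUR = 0.06 * 60


def gemini_cost_per_hour(dollars_per_million: float) -> float:
    """Dollar cost of one hour of audio input at 25 tokens per second."""
    tokens = TOKENS_PER_SECOND * SECONDS_PER_HOUR
    return tokens * dollars_per_million / 1_000_000


for name, price in GEMINI_INPUT_PRICES.items():
    hourly = gemini_cost_per_hour(price)
    ratio = OPENAI_INPUT_PER_HOUR / hourly
    print(f"{name}: ${hourly:.6f} per hour, OpenAI costs {ratio:.1f}x as much")
```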

(I really hope I got those numbers right. I had ChatGPT double-check them. I keep finding myself pricing out Gemini and not believing the results.)

I'm going to cross my fingers and hope for an OpenAI price drop in the near future, because it's hard to justify building anything significant on top of these APIs at the current price point, especially given the competition.

Tags: claude, audio, openai, gpt-4, ai, llms, ai-assisted-programming, generative-ai, projects

files-to-prompt 0.4

2024-10-16T23:29:08+00:00

New release of my files-to-prompt tool adding an option for filtering just for files with a specific extension.

The following command will output Claude XML-style markup for all Python and Markdown files in the current directory, and copy that to the macOS clipboard ready to be pasted into an LLM:

files-to-prompt . -e py -e md -c | pbcopy

Tags: projects, python, llms

My Jina Reader tool

2024-10-14T16:47:56+00:00

I wanted to feed the Cloudflare Durable Objects SQLite documentation into Claude, but I was on my iPhone so copying and pasting was inconvenient. Jina offer a Reader API which can turn any URL into LLM-friendly Markdown and it turns out it supports CORS, so I got Claude to build me this tool (second iteration, third iteration, final source code).

Paste in a URL to get the Jina Markdown version, along with an all-important "Copy to clipboard" button.

Tags: projects, markdown, ai-assisted-programming, jina, claude-3-5-sonnet, claude, generative-ai, ai, llms

Datasette 0.65

2024-10-07T18:07:03+00:00

Python 3.13 was released today, which broke compatibility with the Datasette 0.x series due to an issue with an underlying dependency. I've fixed that problem by vendoring and fixing the dependency and the new 0.65 release works on Python 3.13 (but drops support for Python 3.8, which is EOL this month). Datasette 1.0a16 added support for Python 3.13 last month.

Tags: projects, datasette, python

django-plugin-datasette

2024-09-26T21:57:52+00:00

I did some more work on my DJP plugin mechanism for Django at the DjangoCon US sprints today. I added a new plugin hook, asgi_wrapper(), released in DJP 0.3 and inspired by the similar hook in Datasette.

The hook only works for Django apps that are served using ASGI. It allows plugins to add their own wrapping ASGI middleware around the Django app itself, which means they can do things like attach entirely separate ASGI-compatible applications outside of the regular Django request/response cycle.
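The wrapping pattern can be sketched without Django or Datasette installed. This is an illustrative sketch of generic ASGI wrapping, not the plugin's code: `mount_at`, `django_like_app` and `datasette_like_app` are hypothetical names, and how DJP collects and applies the wrapper is not shown here.

```python
import asyncio


def mount_at(prefix, mounted_app):
    """Return a wrapper that routes HTTP requests under prefix to mounted_app."""

    def wrapper(app):
        async def wrapped(scope, receive, send):
            if scope["type"] == "http" and scope["path"].startswith(prefix):
                await mounted_app(scope, receive, send)
            else:
                await app(scope, receive, send)

        return wrapped

    return wrapper


async def django_like_app(scope, receive, send):
    # Stand-in for the Django ASGI application.
    await send({"type": "http.response.start", "status": 200, "headers": []})
    await send({"type": "http.response.body", "body": b"django"})


async def datasette_like_app(scope, receive, send):
    # Stand-in for a separate ASGI application such as Datasette.
    await send({"type": "http.response.start", "status": 200, "headers": []})
    await send({"type": "http.response.body", "body": b"datasette"})


async def call(app, path):
    """Send one fake HTTP request through app and return the response body."""
    sent = []

    async def receive():
        return {"type": "http.request", "body": b"", "more_body": False}

    async def send(message):
        sent.append(message)

    await app({"type": "http", "path": path}, receive, send)
    return sent[-1]["body"]


combined = mount_at("/-/datasette/", datasette_like_app)(django_like_app)
print(asyncio.run(call(combined, "/-/datasette/")))  # b'datasette'
print(asyncio.run(call(combined, "/blog/")))  # b'django'
```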

Datasette is one of those ASGI-compatible applications!

django-plugin-datasette uses that new hook to configure a new URL, /-/datasette/, which serves a full Datasette instance that scans through Django’s settings.DATABASES dictionary and serves an explore interface on top of any SQLite databases it finds there.

It doesn’t support authentication yet, so this will expose your entire database contents - probably best used as a local debugging tool only.

I did borrow some code from the datasette-mask-columns plugin to ensure that the password column in the auth_user table is reliably redacted. That column contains a heavily salted hashed password so exposing it isn’t necessarily a disaster, but I like to default to keeping hashes safe.

Tags: projects, sqlite, plugins, djp, datasette, django

DJP: A plugin system for Django

2024-09-25T14:00:42+00:00

DJP is a new plugin mechanism for Django, built on top of Pluggy. I announced the first version of DJP during my talk yesterday at DjangoCon US 2024, How to design and implement extensible software with plugins. I'll post a full write-up of that talk once the video becomes available - this post describes DJP and how to use what I've built so far.

Why plugins?

Django already has a thriving ecosystem of third-party apps and extensions. What can a plugin system add here?

If you've ever installed a Django extension - such as django-debug-toolbar or django-extensions - you'll be familiar with the process. You pip install the package, then add it to your list of INSTALLED_APPS in settings.py - and often configure other pieces, like adding something to MIDDLEWARE or updating your urls.py with new URL patterns.

This isn't exactly a huge burden, but it's added friction. It's also the exact kind of thing plugin systems are designed to solve.

DJP addresses this. You configure DJP just once, and then any additional DJP-enabled plugins you pip install can automatically register and configure themselves within your Django project.

Setting up DJP

There are three steps to adding DJP to an existing Django project:

pip install djp - or add it to your requirements.txt or similar.

Modify your settings.py to add these two lines:

# Can be at the start of the file:
import djp

# This MUST be the last line:
djp.settings(globals())

Modify your urls.py to contain the following:

import djp

urlpatterns = [
    # Your existing URL patterns
] + djp.urlpatterns()

That's everything. The djp.settings(globals()) line is a little bit of magic - it gives djp an opportunity to make any changes it likes to your configured settings.

You can see what that does here. Short version: it adds "djp" and any other apps from plugins to INSTALLED_APPS, modifies MIDDLEWARE for any plugins that need to do that and gives plugins a chance to modify any other settings they need to.
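To make the "little bit of magic" less mysterious, here is a rough sketch of the kind of in-place mutation involved. It is not DJP's implementation, which gathers these values from plugin hooks via Pluggy: `apply_plugin_settings` and its arguments are hypothetical, and the sketch assumes INSTALLED_APPS and MIDDLEWARE are lists.

```python
def apply_plugin_settings(current_settings: dict, plugin_apps: list, plugin_middleware: list) -> None:
    """Mutate a settings namespace in place, the way a call on globals() can."""
    installed = current_settings.setdefault("INSTALLED_APPS", [])
    for app in ["djp", *plugin_apps]:
        if app not in installed:
            installed.append(app)

    middleware = current_settings.setdefault("MIDDLEWARE", [])
    for item in plugin_middleware:
        if item not in middleware:
            middleware.append(item)


# A plain dict stands in for the module namespace that globals() would return.
settings = {"INSTALLED_APPS": ["django.contrib.admin"], "MIDDLEWARE": []}
apply_plugin_settings(
    settings,
    plugin_apps=["django_plugin_blog"],
    plugin_middleware=["django_plugin_django_header.middleware.DjangoHeaderMiddleware"],
)
print(settings["INSTALLED_APPS"])
```

Because the dictionary is modified in place, the call has to come after every other setting has been defined, which fits the requirement that it be the last line of settings.py.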

One of my personal rules of plugin system design is that you should never ship a plugin hook (a customization point) without releasing at least one plugin that uses it. This validates the design and provides executable documentation in the form of working code.

I've released three plugins for DJP so far.

django-plugin-django-header

django-plugin-django-header is a very simple initial example. It registers a Django middleware class that adds a Django-Composition: HTTP header to every response with the name of a random composition by Django Reinhardt (thanks, Wikipedia).

pip install django-plugin-django-header

Then try it out with curl:

curl -I http://localhost:8000/

You should get back something like this:

...
Django-Composition: Nuages
...

I'm running this on my blog right now! Try this command to see it in action:

curl -I https://simonwillison.net/

The plugin is very simple. Its __init__.py registers middleware like this:

import djp

@djp.hookimpl
def middleware():
    return [
        "django_plugin_django_header.middleware.DjangoHeaderMiddleware"
    ]

That string references the middleware class in this file.
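For readers who haven't written Django middleware before, a class of that kind might look roughly like this. It is a sketch of the standard middleware shape, not a copy of the plugin's file: the three titles in `COMPOSITIONS` are an illustrative sample, and the demo uses a plain dict in place of a Django response object.

```python
import random

# Illustrative sample of Django Reinhardt compositions (assumed list, not the plugin's).
COMPOSITIONS = ["Nuages", "Minor Swing", "Djangology"]


class DjangoHeaderMiddleware:
    """Add a Django-Composition header to every response."""

    def __init__(self, get_response):
        # Django passes in the next layer of the request/response chain.
        self.get_response = get_response

    def __call__(self, request):
        response = self.get_response(request)
        # Django responses support item assignment for setting headers.
        response["Django-Composition"] = random.choice(COMPOSITIONS)
        return response


# Demo without Django: the "view" returns a dict acting as the response.
middleware = DjangoHeaderMiddleware(lambda request: {})
print(middleware(None))
```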

django-plugin-blog

django-plugin-blog is a much bigger example. It implements a full blog system for your Django application, with bundled models and templates and views and a URL configuration.

You'll need to have configured auth and the Django admin already (those are already there by default in the django-admin startproject template). Now install the plugin:

pip install django-plugin-blog

And run migrations to create the new database tables:

python manage.py migrate

That's all you need to do. Navigating to /blog/ will present the index page of the blog, including a link to a working Atom feed.

You can add entries and tags through the Django admin (configured for you by the plugin) and those will show up on /blog/, get their own URLs at /blog/2024/<slug>/ and be included in the Atom feed, the /blog/archive/ list and the /blog/2024/ year-based index too.

The default design is very basic, but you can customize that by providing your own base template or providing custom templates for each of the pages. There are details on the templates in the README.

The blog implementation is directly adapted from my Building a blog in Django TIL.

The primary goal of this plugin is to demonstrate what a plugin with views, templates, models and a URL configuration looks like. Here's the full __init__.py for the plugin:

from django.urls import path
from django.conf import settings
import djp

@djp.hookimpl
def installed_apps():
    return ["django_plugin_blog"]

@djp.hookimpl
def urlpatterns():
    from .views import index, entry, year, archive, tag, BlogFeed

    blog = getattr(settings, "DJANGO_PLUGIN_BLOG_URL_PREFIX", None) or "blog"
    return [
        path(f"{blog}/", index, name="django_plugin_blog_index"),
        path(f"{blog}/<int:year>/<slug:slug>/", entry, name="django_plugin_blog_entry"),
        path(f"{blog}/archive/", archive, name="django_plugin_blog_archive"),
        path(f"{blog}/<int:year>/", year, name="django_plugin_blog_year"),
        path(f"{blog}/tag/<slug:slug>/", tag, name="django_plugin_blog_tag"),
        path(f"{blog}/feed/", BlogFeed(), name="django_plugin_blog_feed"),
    ]

It still only needs to implement two hooks: one to add django_plugin_blog to the INSTALLED_APPS list and another to add the necessary URL patterns to the project.

The from .views import ... line is nested inside the urlpatterns() hook because I was hitting circular import issues with those imports at the top of the module.

django-plugin-database-url

django-plugin-database-url is the smallest of my example plugins. It exists mainly to exercise the settings() plugin hook, which allows plugins to further manipulate settings in any way they like.

Quoting the README:

Once installed, any DATABASE_URL environment variable will be automatically used to configure your Django database setting, using dj-database-url.

Here's the full implementation of that plugin, most of which is copied straight from the dj-database-url documentation:

import djp
import dj_database_url

@djp.hookimpl
def settings(current_settings):
    current_settings["DATABASES"]["default"] = dj_database_url.config(
        conn_max_age=600,
        conn_health_checks=True,
    )

If DJP gains traction, I expect that a lot of plugins will look like this - thin wrappers around existing libraries where the only added value is that they configure those libraries automatically once the plugin is installed.

Writing a plugin

A plugin is a Python package bundling a module that implements one or more of the DJP plugin hooks.

As I've shown above, the Python code for plugins can be very short. The larger challenge is correctly packaging and distributing the plugin - plugins are discovered using Entry Points which are defined in a pyproject.toml file, and you need to get those exactly right for your plugin to be discovered.

DJP includes documentation on creating a plugin, but to make it as frictionless as possible I've released a new django-plugin cookiecutter template.

This means you can start a new plugin like this:

pip install cookiecutter
cookiecutter gh:simonw/django-plugin

Then answer the questions:

  [1/6] plugin_name (): django-plugin-example
  [2/6] description (): A simple example plugin
  [3/6] hyphenated (django-plugin-example):
  [4/6] underscored (django_plugin_example):
  [5/6] github_username (): simonw
  [6/6] author_name (): Simon Willison

And you'll get a django-plugin-example directory with a fully configured plugin ready to be published to PyPI.

The template includes a .github/workflows directory with actions that can run tests, and an action that publishes your plugin to PyPI any time you create a new release on GitHub.

I've used that pattern myself for hundreds of plugin projects for Datasette and LLM, so I'm confident this is an effective way to release plugins.

The workflows use PyPI's Trusted Publishers mechanism (see my TIL), which means you don't need to worry about API keys or PyPI credentials - configure the GitHub repo once using the PyPI UI and everything should just work.

Writing tests for plugins

Writing tests for plugins can be a little tricky, especially if they need to spin up a full Django environment in order to run the tests.

I previously published a TIL about that, showing how to have tests with their own tests/test_project project that can be used by pytest-django.

I've baked that pattern into the simonw/django-plugin cookiecutter template as well, plus a single default test which checks that a hit to the / index page returns a 200 status code - still a valuable default test since it confirms the plugin hasn't broken everything!

The tests for django-plugin-django-header and for django-plugin-blog should provide a useful starting point for writing tests for your own plugins.

Why call it DJP?

Because django-plugins already existed on PyPI, and I like my three letter acronyms there!

What's next for DJP?

I presented this at DjangoCon US 2024 yesterday afternoon. Initial response seemed positive, and I'm going to be attending the conference sprints on Thursday morning to see if anyone wants to write their own plugin or help extend the system further.

Is this a good idea? I think so. Plugins have been transformative for both Datasette and LLM, and I think Pluggy provides a mature, well-designed foundation for this kind of system.

I'm optimistic about plugins as a natural extension of Django's existing ecosystem. Let's see where this goes.

Tags: djp, projects, django, plugins

simonw/docs cookiecutter template

2024-09-23T21:45:15+00:00

Over the last few years I’ve settled on the combination of Sphinx, the Furo theme and the myst-parser extension (enabling Markdown in place of reStructuredText) as my documentation toolkit of choice, maintained in GitHub and hosted using ReadTheDocs.

My LLM and shot-scraper projects are two examples of that stack in action.

Today I wanted to spin up a new documentation site so I finally took the time to construct a cookiecutter template for my preferred configuration. You can use it like this:

pipx install cookiecutter
cookiecutter gh:simonw/docs

Or with uv:

uv tool run cookiecutter gh:simonw/docs

Answer a few questions:

[1/3] project (): shot-scraper
[2/3] author (): Simon Willison
[3/3] docs_directory (docs):

And it creates a docs/ directory ready for you to start editing docs:

cd docs
pip install -r requirements.txt
make livehtml

Tags: uv, markdown, sphinx-docs, cookiecutter, read-the-docs, python, projects, documentation

LLM 0.16

2024-09-12T23:20:59+00:00

New release of LLM adding support for the o1-preview and o1-mini OpenAI models that were released today.

Tags: llm, projects, generative-ai, openai, ai, llms, o1

files-to-prompt 0.3

2024-09-09T05:57:35+00:00

New version of my files-to-prompt CLI tool for turning a bunch of files into a prompt suitable for piping to an LLM, described here previously.

It now has a -c/--cxml flag for outputting the files in Claude XML-ish notation (XML-ish because it's not actually valid XML) using the format Anthropic describe as recommended for long context:

files-to-prompt llm-*/README.md --cxml | llm -m claude-3.5-sonnet \
  --system 'return an HTML page about these plugins with usage examples' \
  > /tmp/fancy.html

Here's what that gave me.

The format itself looks something like this:

<documents>
<document index="1">
<source>llm-anyscale-endpoints/README.md</source>
<document_content>
# llm-anyscale-endpoints
...
</document_content>
</document>
</documents>
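
Here's a minimal Python sketch of how that layout could be produced from a list of (path, content) pairs. This is not the actual files-to-prompt code: the to_cxml name is made up for illustration, and it only covers the elements visible in the example above.

```python
def to_cxml(files):
    """Render (path, content) pairs in the XML-ish layout shown above."""
    lines = ["<documents>"]
    for index, (path, content) in enumerate(files, start=1):
        lines.append(f'<document index="{index}">')
        lines.append(f"<source>{path}</source>")
        lines.append("<document_content>")
        # This sketch inserts the content as-is, with no XML escaping
        lines.append(content)
        lines.append("</document_content>")
        lines.append("</document>")
    lines.append("</documents>")
    return "\n".join(lines)


print(to_cxml([("llm-anyscale-endpoints/README.md", "# llm-anyscale-endpoints\n...")]))
```

Numbering each document with an index attribute gives the model something short to refer back to when the context contains many files.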

Tags: anthropic, claude, tools, projects, generative-ai, ai, llms, prompt-engineering

json-flatten, now with format documentation

2024-09-07T05:43:01+00:00

json-flatten is a fun little Python library I put together a few years ago for converting JSON data into a flat key-value format, suitable for inclusion in an HTML form or query string. It lets you take a structure like this one:

{"foo": {"bar": [1, True, None]}}

And convert it into key-value pairs like this:

foo.bar.[0]$int=1
foo.bar.[1]$bool=True
foo.bar.[2]$none=None

The flatten(dictionary) function converts to that format, and unflatten(dictionary) converts back again.
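
To make the key format concrete, here's a rough sketch that reproduces the three example pairs above. It is not the library's implementation: flatten_sketch is a hypothetical name, it only handles dicts, lists, ints, booleans and None, and storing strings without a type suffix is my assumption here.

```python
def flatten_sketch(value, prefix=""):
    """Flatten nested dicts and lists into {key: string} pairs."""
    pairs = {}
    if isinstance(value, dict):
        for key, child in value.items():
            child_prefix = f"{prefix}.{key}" if prefix else key
            pairs.update(flatten_sketch(child, child_prefix))
    elif isinstance(value, list):
        for index, child in enumerate(value):
            child_prefix = f"{prefix}.[{index}]" if prefix else f"[{index}]"
            pairs.update(flatten_sketch(child, child_prefix))
    elif value is None:
        pairs[f"{prefix}$none"] = "None"
    elif isinstance(value, bool):
        # bool must be checked before int, because bool is a subclass of int
        pairs[f"{prefix}$bool"] = str(value)
    elif isinstance(value, int):
        pairs[f"{prefix}$int"] = str(value)
    else:
        # Assumption: anything else is stored as a plain string with no suffix
        pairs[prefix] = str(value)
    return pairs


for key, value in flatten_sketch({"foo": {"bar": [1, True, None]}}).items():
    print(f"{key}={value}")
```

The type suffix on each key is what lets the reverse operation tell the integer 1 apart from the string "1".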

I was considering the library for a project today and realized that the 0.3 README was a little thin - it showed how to use the library but didn't provide full details of the format it used.

On a hunch, I decided to see if files-to-prompt plus LLM plus Claude 3.5 Sonnet could write that documentation for me. I ran this command:

files-to-prompt *.py | llm -m claude-3.5-sonnet --system 'write detailed documentation in markdown describing the format used to represent JSON and nested JSON as key/value pairs, include a table as well'

That *.py picked up both json_flatten.py and test_json_flatten.py - I figured the test file had enough examples in that it should act as a good source of information for the documentation.

This worked really well! You can see the first draft it produced here.

It included before and after examples in the documentation. I didn't fully trust these to be accurate, so I gave it this follow-up prompt:

llm -c "Rewrite that document to use the Python cog library to generate the examples"

I'm a big fan of Cog for maintaining examples in READMEs that are generated by code. Cog has been around for a couple of decades now so it was a safe bet that Claude would know about it.

This almost worked - it produced valid Cog syntax like the following:

[[[cog
example = {
"fruits": ["apple", "banana", "cherry"]
}

cog.out("```json\n")
cog.out(str(example))
cog.out("\n```\n")
cog.out("Flattened:\n```\n")
for key, value in flatten(example).items():
    cog.out(f"{key}: {value}\n")
cog.out("```\n")
]]]
[[[end]]]

But that wasn't entirely right, because it forgot to include the Markdown comments that would hide the Cog syntax, which should have looked like this:

<!-- [[[cog -->
...
<!-- ]]] -->
...
<!-- [[[end]]] -->

I could have prompted it to correct itself, but at this point I decided to take over and edit the rest of the documentation by hand.

The end result was documentation that I'm really happy with, and that I probably wouldn't have bothered to write if Claude hadn't got me started.

Tags: claude-3-5-sonnet, llm, anthropic, claude, ai, llms, ai-assisted-programming, generative-ai, projects, json

Datasette 1.0a16

2024-09-06T05:55:28+00:00

This latest release focuses mainly on performance, as discussed here in Optimizing Datasette a couple of weeks ago.

It also includes some minor CSS changes that could affect plugins, and hence need to be included before the final 1.0 release. Those are outlined in detail in issues #2415 and #2420.

Tags: projects, datasette

New improved commit messages for scrape-hacker-news-by-domain

2024-09-06T05:40:01+00:00

My simonw/scrape-hacker-news-by-domain repo has a very specific purpose. Once an hour it scrapes the Hacker News /from?site=simonwillison.net page (and the equivalent for datasette.io) using my shot-scraper tool and stashes the parsed links, scores and comment counts in JSON files in that repo.

It does this mainly so I can subscribe to GitHub's Atom feed of the commit log - visit simonw/scrape-hacker-news-by-domain/commits/main and add .atom to the URL to get that.

NetNewsWire will inform me within about an hour if any of my content has made it to Hacker News, and the repo will track the score and comment count for me over time. I wrote more about how this works in Scraping web pages from the command line with shot-scraper back in March 2022.

Prior to the latest improvement, the commit messages themselves were pretty uninformative. The message had the date, and to actually see which Hacker News post it was referring to, I had to click through to the commit and look at the diff.

I built my csv-diff tool a while back to help address this problem: it can produce a slightly more human-readable version of a diff between two CSV or JSON files, ideally suited for including in a commit message attached to a git scraping repo like this one.

I got that working, but there was still room for improvement. I recently learned that any Hacker News thread has an undocumented URL at /latest?id=x which displays the most recently added comments at the top.

I wanted that in my commit messages, so I could quickly click a link to see the most recent comments on a thread.

So... I added one more feature to csv-diff: a new --extra option lets you specify a Python format string to be used to add extra fields to the displayed difference.

My GitHub Actions workflow now runs this command:

csv-diff simonwillison-net.json simonwillison-net-new.json \
  --key id --format json \
  --extra latest 'https://news.ycombinator.com/latest?id={id}' \
  >> /tmp/commit.txt

This generates the diff between the two versions, using the id property in the JSON to tie records together. It adds a latest field linking to that URL.
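
The idea behind that option can be sketched in a few lines of Python. This is an illustration rather than the csv-diff source: add_extras is a hypothetical helper and the record is made-up data, but it shows how a format string like the one above turns a record's id into a link.

```python
def add_extras(record, extras):
    """Return a copy of record with one extra field per format-string template."""
    enriched = dict(record)
    for name, template in extras.items():
        # Each {placeholder} in the template is filled from the record's own keys
        enriched[name] = template.format(**record)
    return enriched


# Made-up record, for illustration only
row = {"id": "12345", "title": "Example post", "points": 10}
print(add_extras(row, {"latest": "https://news.ycombinator.com/latest?id={id}"}))
```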

The commits now look like this:

Tags: shot-scraper, github-actions, projects, hacker-news, git-scraping, json

Calling LLMs from client-side JavaScript, converting PDFs to HTML + weeknotes

2024-09-06T02:28:38+00:00

I've been having a bunch of fun taking advantage of CORS-enabled LLM APIs to build client-side JavaScript applications that access LLMs directly. I also spun up a new Datasette plugin for advanced permission management.

LLMs from client-side JavaScript

Anthropic recently added CORS support to their Claude APIs. It's a little hard to use - you have to add anthropic-dangerous-direct-browser-access: true to your request headers to enable it - but once you know the trick you can start building web applications that talk to Anthropic's LLMs directly, without any additional server-side code.

I later found out that both OpenAI and Google Gemini have this capability too, without needing the special header.

The problem with this approach is security: it's very important not to embed an API key attached to your billing account in client-side HTML and JavaScript for anyone to see!

For my purposes though that doesn't matter. I've been building tools which prompt() a user for their own API key (sadly restricting their usage to the tiny portion of people who both understand API keys and have created API accounts with one of the big providers) - then I stash that key in localStorage and start using it to make requests.

My simonw/tools repository is home to a growing collection of pure HTML+JavaScript tools, hosted at tools.simonwillison.net using GitHub Pages. I love not having to even think about hosting server-side code for these tools.

I've published three tools there that talk to LLMs directly so far:

haiku is a fun demo that requests access to the user's camera and then writes a Haiku about what it sees. It uses Anthropic's Claude 3 Haiku model for this - the whole project is one terrible pun. Haiku source code here.
gemini-bbox uses the Gemini 1.5 Pro (or Flash) API to prompt those models to return bounding boxes for objects in an image, then renders those bounding boxes. Gemini Pro is the only one of the vision LLMs that I've tried that has reliable support for bounding boxes. I wrote about this in Building a tool showing how Gemini Pro can return bounding boxes for objects in images.
Gemini Chat App is a more traditional LLM chat interface that again talks to Gemini models (including the new super-speedy gemini-1.5-flash-8b-exp-0827). I built this partly to try out those new models and partly to experiment with implementing a streaming chat interface against the Gemini API directly in a browser. I wrote more about how that works in this post.

Here's that Gemini Bounding Box visualization tool:

All three of these tools made heavy use of AI-assisted development: Claude 3.5 Sonnet wrote almost every line of the last two, and the Haiku one was put together a few months ago using Claude 3 Opus.

My personal style of HTML and JavaScript apps turns out to be highly compatible with LLMs: I like using vanilla HTML and JavaScript and keeping everything in the same file, which makes it easy to paste the entire thing into the model and ask it to make some changes for me. This approach also works really well with Claude Artifacts, though I have to tell it "no React" to make sure I get an artifact I can hack on without needing to configure a React build step.

Converting PDFs to HTML and Markdown

I have a long-standing vendetta against PDFs for sharing information. They're painful to read on a mobile phone, they have poor accessibility, and even things like copying and pasting text from them can be a pain.

Complaining without doing something about it isn't really my style. Twice in the past few weeks I've taken matters into my own hands:

Google Research released a PDF paper describing their new pipe syntax for SQL. I ran it through Gemini 1.5 Pro to convert it to HTML (prompts here) and got this - a pretty great initial result for the first prompt I tried!
Nous Research released a preliminary report PDF about their DisTro technology for distributed training of LLMs over low-bandwidth connections. I ran a prompt to use Gemini 1.5 Pro to convert that to this Markdown version, which even handled tables.

Within six hours of posting it my Pipe Syntax in SQL conversion was ranked third on Google for the title of the paper, at which point I set it to <meta name="robots" content="noindex"> to try and keep the unverified clone out of search. Yet more evidence that HTML is better than PDF!

I've spent less than a total of ten minutes on using Gemini to convert PDFs in this way and the results have been very impressive. If I were to spend more time on this I'd target figures: I have a hunch that getting Gemini to return bounding boxes for figures on the PDF pages could be the key here, since then each figure could be automatically extracted as an image.

I bet you could build that whole thing as a client-side app against the Gemini Pro API, too...

Adding some class to Datasette forms

I've been working on a new Datasette plugin for permissions management, datasette-acl, which I'll write about separately soon.

I wanted to integrate Choices.js with it, to provide a nicer interface for adding permissions to a user or group.

My first attempt at integrating Choices ended up looking like this:

The weird visual glitches are caused by Datasette's core CSS, which included the following rule:

form input[type=submit], form button[type=button] {
    font-weight: 400;
    cursor: pointer;
    text-align: center;
    vertical-align: middle;
    border-width: 1px;
    border-style: solid;
    padding: .5em 0.8em;
    font-size: 0.9rem;
    line-height: 1;
    border-radius: .25rem;
}

These style rules apply to any submit button or button-button that occurs inside a form!

I'm glad I caught this before Datasette 1.0. I've now started the process of fixing that, by ensuring these rules only apply to elements with class="core" (or that class on a wrapping element). This ensures plugins can style these elements without being caught out by Datasette's defaults.

The problem is... there are a whole bunch of existing plugins that currently rely on that behaviour. I have a tracking issue about that, which identified 28 plugins that need updating. I've worked my way through 8 of those so far, hence the flurry of releases listed at the bottom of this post.

This is also an excuse to revisit a bunch of older plugins, some of which had partially complete features that I've been finishing up.

datasette-write for example now has a neat row action menu item for updating a selected row using a pre-canned UPDATE query. Here's an animated demo of my first prototype of that feature:

On the blog

anthropic

Claude's API now supports CORS requests, enabling client-side applications - 2024-08-23
Explain ACLs by showing me a SQLite table schema for implementing them - 2024-08-23
Musing about OAuth and LLMs on Mastodon - 2024-08-24
Building a tool showing how Gemini Pro can return bounding boxes for objects in images - 2024-08-26
Long context prompting tips - 2024-08-26
Anthropic Release Notes: System Prompts - 2024-08-26
Alex Albert: We've read and heard that you'd appreciate more t... - 2024-08-26
Gemini Chat App - 2024-08-27
System prompt for val.town/townie - 2024-08-28
How Anthropic built Artifacts - 2024-08-28
Anthropic's Prompt Engineering Interactive Tutorial - 2024-08-30
llm-claude-3 0.4.1 - 2024-08-30

ai-assisted-programming

Andy Jassy, Amazon CEO: [...] here’s what we found when we integrated [Am... - 2024-08-24
AI-powered Git Commit Function - 2024-08-26
OpenAI: Improve file search result relevance with chunk ranking - 2024-08-30
Forrest Brazeal: I think that AI has killed, or is about to kill, ... - 2024-08-31

gemini

SQL Has Problems. We Can Fix Them: Pipe Syntax In SQL - 2024-08-24
NousResearch/DisTrO - 2024-08-27

python

uvtrick - 2024-09-01
Anatomy of a Textual User Interface - 2024-09-02
Why I Still Use Python Virtual Environments in Docker - 2024-09-02
Python Developers Survey 2023 Results - 2024-09-03

security

Top companies ground Microsoft Copilot over data governance concerns - 2024-08-23
Frederik Braun: In 2021 we [the Mozilla engineering team] found “... - 2024-08-26
OAuth from First Principles - 2024-09-05

projects

My @covidsewage bot now includes useful alt text - 2024-08-25

armin-ronacher

MiniJinja: Learnings from Building a Template Engine in Rust - 2024-08-27

ethics

John Gruber: Everyone alive today has grown up in a world wher... - 2024-08-27

open-source

Debate over “open source AI” term brings new push to formalize definition - 2024-08-27
Elasticsearch is open source, again - 2024-08-29

performance

Cerebras Inference: AI at Instant Speed - 2024-08-28

sqlite

D. Richard Hipp: My goal is to keep SQLite relevant and viable thr... - 2024-08-28

aws

Leader Election With S3 Conditional Writes - 2024-08-30

javascript

Andreas Giammarchi: whenever you do this: `el.innerHTML += HTML` ... - 2024-08-31

openai

OpenAI says ChatGPT usage has doubled since last year - 2024-08-31

art

Ted Chiang: Art is notoriously hard to define, and so are the... - 2024-08-31

llm

anjor: `history | tail -n 2000 | llm -s "Write aliases f... - 2024-09-03

vision-llms

Qwen2-VL: To See the World More Clearly - 2024-09-04

Releases

datasette-import 0.1a5 - 2024-09-04
Tools for importing data into Datasette
datasette-search-all 1.1.3 - 2024-09-04
Datasette plugin for searching all searchable tables at once
datasette-write 0.4 - 2024-09-04
Datasette plugin providing a UI for executing SQL writes against the database
datasette-debug-events 0.1a0 - 2024-09-03
Print Datasette events to standard error
datasette-auth-passwords 1.1.1 - 2024-09-03
Datasette plugin for authentication using passwords
datasette-enrichments 0.4.3 - 2024-09-03
Tools for running enrichments against data stored in Datasette
datasette-configure-fts 1.1.4 - 2024-09-03
Datasette plugin for enabling full-text search against selected table columns
datasette-auth-tokens 0.4a10 - 2024-09-03
Datasette plugin for authenticating access using API tokens
datasette-edit-schema 0.8a3 - 2024-09-03
Datasette plugin for modifying table schemas
datasette-pins 0.1a4 - 2024-09-01
Pin databases, tables, and other items to the Datasette homepage
datasette-acl 0.4a2 - 2024-09-01
Advanced permission management for Datasette
llm-claude-3 0.4.1 - 2024-08-30
LLM plugin for interacting with the Claude 3 family of models

TILs

Testing HTML tables with Playwright Python - 2024-09-04
Using namedtuple for pytest parameterized tests - 2024-08-31

Tags: css, claude-3-5-sonnet, gemini, anthropic, claude, cors, ai, llms, pdf, javascript, datasette, projects, generative-ai, weeknotes

llm-claude-3 0.4.1

2024-08-30T23:28:54+00:00

New minor release of my LLM plugin that provides access to the Claude 3 family of models. Claude 3.5 Sonnet was recently upgraded to an 8,192 token output limit (up from 4,096 for the Claude 3 family of models). LLM can now respect that.

The hardest part of building this was convincing Claude to return a long enough response to prove that it worked. At one point I got into an argument with it, which resulted in this fascinating hallucination:

I eventually got a 6,162 token output using:

cat long.txt | llm -m claude-3.5-sonnet-long --system 'translate this document into french, then translate the french version into spanish, then translate the spanish version back to english. actually output the translations one by one, and be sure to do the FULL document, every paragraph should be translated correctly. Seriously, do the full translations - absolutely no summaries!'

Tags: llm, anthropic, claude, generative-ai, projects, ai, llms, prompt-engineering, claude-3-5-sonnet

Gemini Chat App

2024-08-27T22:48:56+00:00

Google released three new Gemini models today: improved versions of Gemini 1.5 Pro and Gemini 1.5 Flash plus a new model, Gemini 1.5 Flash-8B, which is significantly faster (and will presumably be cheaper) than the regular Flash model.

The Flash-8B model is described in the Gemini 1.5 family of models paper in section 8:

By inheriting the same core architecture, optimizations, and data mixture refinements as its larger counterpart, Flash-8B demonstrates multimodal capabilities with support for context window exceeding 1 million tokens. This unique combination of speed, quality, and capabilities represents a step function leap in the domain of single-digit billion parameter models.

While Flash-8B’s smaller form factor necessarily leads to a reduction in quality compared to Flash and 1.5 Pro, it unlocks substantial benefits, particularly in terms of high throughput and extremely low latency. This translates to affordable and timely large-scale multimodal deployments, facilitating novel use cases previously deemed infeasible due to resource constraints.

The new models are available in AI Studio, but since I built my own custom prompting tool against the Gemini CORS-enabled API the other day I figured I'd build a quick UI for these new models as well.

Building this with Claude 3.5 Sonnet took literally ten minutes from start to finish - you can see that from the timestamps in the conversation. Here's the deployed app and the finished code.

The feature I really wanted to build was streaming support. I started with this example code showing how to run streaming prompts in a Node.js application, then told Claude to figure out what the client-side code for that should look like based on a snippet from my bounding box interface hack. My starting prompt:

Build me a JavaScript app (no react) that I can use to chat with the Gemini model, using the above strategy for API key usage

I still keep hearing from people who are skeptical that AI-assisted programming like this has any value. It's honestly getting a little frustrating at this point - the gains for things like rapid prototyping are so self-evident now.

Tags: claude-3-5-sonnet, gemini, ai-assisted-programming, javascript, generative-ai, ai, llms, projects, anthropic, claude

Building a tool showing how Gemini Pro can return bounding boxes for objects in images

2024-08-26T04:55:28+00:00

I was browsing through Google's Gemini documentation while researching how different multi-modal LLM APIs work when I stumbled across this note in the vision documentation:

You can ask the model for the coordinates of bounding boxes for objects in images. For object detection, the Gemini model has been trained to provide these coordinates as relative widths or heights in range [0,1], scaled by 1000 and converted to an integer. Effectively, the coordinates given are for a 1000x1000 version of the original image, and need to be converted back to the dimensions of the original image.

This is a pretty neat capability! OpenAI's GPT-4o and Anthropic's Claude 3 and Claude 3.5 models can't do this (yet).

I tried a few prompts using Google's Python library and got back what looked like bounding boxes!

>>> import google.generativeai as genai
>>> genai.configure(api_key="...")
>>> model = genai.GenerativeModel(model_name="gemini-1.5-pro-latest")
>>> import PIL.Image
>>> goats = PIL.Image.open("/tmp/goats.jpeg")
>>> prompt = 'Return bounding boxes around every goat, for each one return [ymin, xmin, ymax, xmax]'
>>> response = model.generate_content([goats, prompt])
>>> print(response.text)
- [200, 90, 745, 527]
- [300, 610, 904, 937]
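
The conversion that the documentation describes is simple arithmetic. Here's a minimal sketch of it, assuming a hypothetical 800x600 pixel image (I'm not using the real dimensions of the goat photo) and a made-up scale_box helper:

```python
def scale_box(box, image_width, image_height):
    """Convert [ymin, xmin, ymax, xmax] on the 0-1000 scale to pixel coordinates.

    Returns (left, top, right, bottom) in pixels for the original image.
    """
    ymin, xmin, ymax, xmax = box
    return (
        round(xmin * image_width / 1000),
        round(ymin * image_height / 1000),
        round(xmax * image_width / 1000),
        round(ymax * image_height / 1000),
    )


# First box from the output above, against an assumed 800x600 image
print(scale_box([200, 90, 745, 527], 800, 600))  # (72, 120, 422, 447)
```

Note that the model returns y before x, so the values need reordering for most drawing APIs.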

But how to verify that these were useful co-ordinates? I fired up Claude 3.5 Sonnet and started iterating on Artifacts there to try and visualize those co-ordinates against the original image.

After some fiddling around, I built an initial debug tool that I could paste co-ordinates into and select an image and see that image rendered.

A tool for prompting with an image and rendering the bounding boxes

I wrote the other day about Anthropic's new support for CORS headers, enabling direct browser access to their APIs.

Google Gemini supports CORS as well! So do OpenAI, which means that all three of the largest LLM providers can now be accessed directly from the browser.

I decided to build a combined tool that could prompt Gemini 1.5 Pro with an image directly from the browser, then render the returned bounding boxes on that image.

The new tool lives here: https://tools.simonwillison.net/gemini-bbox

The first time you run a prompt it will ask you for a Gemini API key, which it stores in your browser's localStorage. I promise not to add code that steals your keys in the future, but if you don't want to trust that you can grab a copy of the code, verify it and then run it yourself.

Building this tool with Claude 3.5 Sonnet

This is yet another example of a tool that I mostly built by prompting Claude 3.5 Sonnet. Here are some more.

I started out with this lengthy conversation (transcript exported with this tool) to help build the original tool for opening an image and pasting in those bounding box coordinates. That sequence started like this:

Build an artifact where I can open an image from my browser and paste the following style of text into a textarea:
- [488, 945, 519, 999]
- [460, 259, 487, 307]
- [472, 574, 498, 612]
(The hyphens may not be there, so scan with a regex for [ num, num, num, num ])

Each of those represent [ymin, xmin, ymax, xmax] coordinates on the image - but they are numbers between 0 and 1000 so they correspond to the image is if it had been resized to 1000x1000

As soon as the coords are pasted the corresponding boxes should be drawn on the images, corrected for its actual dimensions

The image should be show with a width of 80% of the page

The boxes should be in different colours, and hovering over each box should show the original bounding box coordinates below the image
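
The regex scan mentioned in that prompt might look something like this in Python. This is only a sketch: the tool itself is JavaScript written by Claude, and BOX_PATTERN and parse_boxes are names I've made up here.

```python
import re

# Matches [ num, num, num, num ] with optional whitespace, ignoring any leading hyphen
BOX_PATTERN = re.compile(r"\[\s*(\d+)\s*,\s*(\d+)\s*,\s*(\d+)\s*,\s*(\d+)\s*\]")


def parse_boxes(text):
    """Return every [ymin, xmin, ymax, xmax] group found in text as a tuple of ints."""
    return [
        tuple(int(number) for number in match.groups())
        for match in BOX_PATTERN.finditer(text)
    ]


pasted = """- [488, 945, 519, 999]
- [460, 259, 487, 307]
- [472, 574, 498, 612]"""
print(parse_boxes(pasted))
```

Scanning with finditer rather than parsing line by line means the surrounding text can be anything, which matters when the model wraps its answer in extra prose.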

Once that tool appeared to be doing the right thing (I had to muck around with how the coordinates were processed a bunch) I used my favourite prompting trick to build the combined tool that called the Gemini API. I found this example that calls the @google/generative-ai API from a browser, pasted the full example into Claude along with my previous bounding box visualization tool and had it combine them to achieve the desired result:

Based on that example text, build me an HTML page with Vanilla JS that loads the Gemini API from esm.run - it should have a file input and a textarea and a submit button - you attach an image, enter a prompt and then click the button and it does a Gemini prompt with that image and prompt and injects the returned result into a div on the page

Then this follow-up prompt:

now incorporate the logic from this tool (I pasted in that HTML too), such that when the response is returned from the prompt the image is displayed with any rendered bounding boxes

Dealing with image orientation bugs

Bounding boxes are fiddly things. The code I had produced above seemed to work... but in some of my testing the boxes didn't show up in quite the right place. Was this just Gemini 1.5 Pro being unreliable in how it returned the boxes? That seemed likely, but I had some nagging doubts.

On a hunch, I took an image that was behaving strangely, took a screenshot of it and tried that screenshot as a JPEG. The bounding boxes that came back were different - they appeared rotated!

I've seen this kind of thing before with photos taken on an iPhone. There's an obscure piece of JPEG metadata which can set the orientation on a photo, and some software fails to respect that.

Was that affecting my bounding box tool? I started digging into those photos to figure that out, using a combination of ChatGPT Code Interpreter (since that can read JPEG binary data using Python) and Claude Artifacts (to build me a visible UI for exploring my photos).

My hunch turned out to be correct: my iPhone photos included TIFF orientation metadata which the Gemini API appeared not to respect. As a result, some photos taken by my phone would return bounding boxes that were rotated 180 degrees.

My eventual fix was to take the image provided by the user, render it to a <canvas> element and then export it back out as a JPEG again - code here. I got Claude to add that for me based on code I pasted in from my earlier image resize quality tool, also built for me by Claude.

As part of this investigation I built another tool, which can read orientation TIFF data from a JPEG entirely in JavaScript and help show what's going on:

https://tools.simonwillison.net/tiff-orientation

Here's the source code for that. The source code is a great example of the kind of thing that LLMs can do much more effectively than I can - here's an illustrative snippet:

// Determine endianness
const endian = view.getUint16(tiffStart, false);
const isLittleEndian = (endian === 0x4949);  // 'II' in ASCII
debugInfo += `Endianness: ${isLittleEndian ? 'Little Endian' : 'Big Endian'}\n`;

// Check TIFF header validity
const tiffMagic = view.getUint16(tiffStart + 2, isLittleEndian);
if (tiffMagic !== 42) {
    throw Object.assign(new Error('Not a valid TIFF header'), { debugInfo });
}
debugInfo += 'Valid TIFF header\n';

// Get offset to first IFD
const ifdOffset = view.getUint32(tiffStart + 4, isLittleEndian);
const ifdStart = tiffStart + ifdOffset;
debugInfo += `IFD start: ${ifdStart}\n`;

LLMs know their binary file formats, so I frequently find myself asking them to write me custom binary processing code like this.
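
For comparison, here's a rough Python sketch of the same header checks using the standard library struct module, extended to look up the orientation tag (0x0112) in the first IFD. It is my own illustration, not the code from either conversation. It assumes the bytes passed in start at the TIFF header (the tiffStart position in the JavaScript above) and that read_tiff_orientation is a made-up name.

```python
import struct

ORIENTATION_TAG = 0x0112


def read_tiff_orientation(tiff):
    """Return the orientation value from the first IFD, or None if it is absent."""
    # 'II' means little endian, 'MM' means big endian
    if tiff[:2] == b"II":
        prefix = "<"
    elif tiff[:2] == b"MM":
        prefix = ">"
    else:
        raise ValueError("Unknown byte order marker")

    # The two bytes after the byte order marker must be the magic number 42
    (magic,) = struct.unpack_from(prefix + "H", tiff, 2)
    if magic != 42:
        raise ValueError("Not a valid TIFF header")

    # Offset to the first IFD, relative to the start of the TIFF header
    (ifd_offset,) = struct.unpack_from(prefix + "I", tiff, 4)
    (entry_count,) = struct.unpack_from(prefix + "H", tiff, ifd_offset)

    # Each IFD entry is 12 bytes: tag, type, count, then a 4-byte value field
    for i in range(entry_count):
        entry_start = ifd_offset + 2 + i * 12
        tag, field_type, count = struct.unpack_from(prefix + "HHI", tiff, entry_start)
        if tag == ORIENTATION_TAG:
            (value,) = struct.unpack_from(prefix + "H", tiff, entry_start + 8)
            return value
    return None


# Hand-built little-endian example: header, then one IFD entry with orientation 6
sample = (
    struct.pack("<2sHI", b"II", 42, 8)
    + struct.pack("<H", 1)
    + struct.pack("<HHIHH", ORIENTATION_TAG, 3, 1, 6, 0)
    + struct.pack("<I", 0)
)
print(read_tiff_orientation(sample))  # 6
```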

Here's the Claude conversation I had to build that tool. After failing to get it to work several times I pasted in Python code that I'd built using ChatGPT Code Interpreter and prompted:

Here's Python code that finds it correctly:

Which turned out to provide the missing details to help it build me the JavaScript version I could run in my browser. Here's the ChatGPT conversation that got me that Python code.

Mixing up a whole bunch of models

This whole process was very messy, but it's a pretty accurate representation of my workflow when using these models. I used three different tools here:

Gemini 1.5 Pro and the Gemini API to take images and a prompt and return bounding boxes
Claude 3.5 Sonnet and Claude Artifacts to write code for working against that API and build me interactive tools for visualizing the results
GPT-4o and ChatGPT Code Interpreter to write and execute Python code to try and help me figure out what was going on with my weird JPEG image orientation bugs

I copied code between models a bunch of times too - pasting Python code written by GPT-4o into Claude 3.5 Sonnet to help it write the correct JavaScript for example.

How good is the code that I produced by the end of this all? It honestly doesn't matter very much to me: this is a very low-stakes project, where the goal was a single web page tool that can run a prompt through a model and visualize the response.

If I was writing code "for production" - for a long-term project, or code that I intended to package up and release as an open source library - I would sweat the details a whole lot more. But for this kind of exploratory and prototyping work I'm increasingly comfortable hacking away at whatever the models spit out until it achieves the desired effect.

Tags: vision-llms, claude-3-5-sonnet, gemini, anthropic, claude, cors, ai, llms, claude-artifacts, ai-assisted-programming, google, generative-ai, projects

My @covidsewage bot now includes useful alt text

2024-08-25T16:09:49+00:00

I've been running a @covidsewage Mastodon bot for a while now, posting daily screenshots (taken with shot-scraper) of the Santa Clara County COVID in wastewater dashboard.

Prior to today the screenshot was accompanied by the decidedly unhelpful alt text "Screenshot of the latest Covid charts".

I finally fixed that today, closing issue #2 more than two years after I first opened it.

The screenshot is of a Microsoft Power BI dashboard. I hoped I could scrape the key information out of it using JavaScript, but the weirdness of their DOM proved insurmountable.

Instead, I'm using GPT-4o - specifically, this Python code (run using a python -c block in the GitHub Actions YAML file):

import base64, openai
client = openai.OpenAI()
with open('/tmp/covid.png', 'rb') as image_file:
    encoded_image = base64.b64encode(image_file.read()).decode('utf-8')
messages = [
    {'role': 'system',
     'content': 'Return the concentration levels in the sewersheds - single paragraph, no markdown'},
    {'role': 'user', 'content': [
        {'type': 'image_url', 'image_url': {
            'url': 'data:image/png;base64,' + encoded_image
        }}
    ]}
]
completion = client.chat.completions.create(model='gpt-4o', messages=messages)
print(completion.choices[0].message.content)

I'm base64 encoding the screenshot and sending it with this system prompt:

Return the concentration levels in the sewersheds - single paragraph, no markdown

Given this input image:

Here's the text that comes back:

The concentration levels of SARS-CoV-2 in the sewersheds from collected samples are as follows: San Jose Sewershed has a high concentration, Palo Alto Sewershed has a high concentration, Sunnyvale Sewershed has a high concentration, and Gilroy Sewershed has a medium concentration.

The full implementation can be found in the GitHub Actions workflow, which runs on a schedule at 7am Pacific time every day.

Tags: shot-scraper, openai, covid19, gpt-4, ai, llms, generative-ai, projects, alt-attribute, accessibility

Claude's API now supports CORS requests, enabling client-side applications

2024-08-23T02:29:08+00:00

Anthropic have enabled CORS support for their JSON APIs, which means it's now possible to call the Claude LLMs directly from a user's browser.

This massively significant new feature is tucked away in this pull request: anthropic-sdk-typescript: add support for browser usage, via this issue.

This change to the Anthropic TypeScript SDK reveals the new JSON API feature, which I found by digging through the code.

You can now add the following HTTP request header to enable CORS support for the Anthropic API, which means you can make calls to Anthropic's models directly from a browser:

anthropic-dangerous-direct-browser-access: true

Anthropic had been resistant to adding this feature because it can encourage a nasty anti-pattern: if you embed your API key in your client code, anyone with access to that site can steal your API key and use it to make requests on your behalf.

Despite that, there are legitimate use cases for this feature. It's fine for internal tools exposed to trusted users, or you can implement a "bring your own API key" pattern where users supply their own key to use with your client-side app.

As it happens, I've built one of those apps myself! My Haiku page is a simple client-side app that requests access to your webcam, asks for an Anthropic API key (which it stores in the browser’s localStorage), and then lets you take a photo and turns it into a Haiku using their fast and inexpensive Haiku model.

Previously I had to run my own proxy on Vercel adding CORS support to the Anthropic API just to get my Haiku app to work.

This evening I upgraded the app to send that new header, and now it can talk to Anthropic directly without needing my proxy.

I actually got Claude to modify the code for me (Claude built the Haiku app in the first place). Amusingly Claude first argued against it:

I must strongly advise against making direct API calls from a browser, as it exposes your API key and violates best practices for API security.

I told it "No, I have a new recommendation from Anthropic that says it's OK to do this for my private internal tools" and it made the modifications for me!

The full source code can be seen here. Here's a simplified JavaScript snippet illustrating how to call their API from the browser using the new header:

fetch("https://api.anthropic.com/v1/messages", {
  method: "POST",
  headers: {
    "x-api-key": apiKey,
    "anthropic-version": "2023-06-01",
    "content-type": "application/json",
    "anthropic-dangerous-direct-browser-access": "true",
  },
  body: JSON.stringify({
    model: "claude-3-haiku-20240307",
    max_tokens: 1024,
    messages: [
      {
        role: "user",
        content: [
          { type: "text", text: "Return a haiku about how great pelicans are" },
        ],
      },
    ],
  }),
})
  .then((response) => response.json())
  .then((data) => {
    const haiku = data.content[0].text;
    alert(haiku);
  });

Tags: anthropic, claude, security, javascript, projects, cors, apis, ai-assisted-programming, llms, ai, generative-ai

Fix @covidsewage bot to handle a change to the underlying website

2024-08-18T17:26:32+00:00

Fix @covidsewage bot to handle a change to the underlying website

I've been running @covidsewage on Mastodon since February last year, posting a daily screenshot of the Santa Clara County charts showing Covid levels in wastewater.

A few days ago the county changed their website, breaking the bot. The chart now lives on their new COVID in wastewater page.

It's still a Microsoft Power BI dashboard in an <iframe>, but my initial attempts to scrape it didn't quite work. Eventually I realized that Cloudflare protection was blocking my attempts to access the page, but thankfully sending a Firefox user-agent fixed that problem.

The new recipe I'm using to screenshot the chart involves a delightfully messy nested set of calls to shot-scraper - first using shot-scraper javascript to extract the URL attribute for that <iframe>, then feeding that URL to a separate shot-scraper call to generate the screenshot:

shot-scraper -o /tmp/covid.png $(
  shot-scraper javascript \
    'https://publichealth.santaclaracounty.gov/health-information/health-data/disease-data/covid-19/covid-19-wastewater' \
    'document.querySelector("iframe").src' \
    -b firefox \
    --user-agent 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10.15; rv:128.0) Gecko/20100101 Firefox/128.0' \
    --raw
) --wait 5000 -b firefox --retina

Tags: projects, covid19, shot-scraper

Upgrading my cookiecutter templates to use python -m pytest

2024-08-17T05:12:47+00:00

Upgrading my cookiecutter templates to use python -m pytest

Every now and then I get caught out by weird test failures when I run pytest and it turns out I'm running the wrong installation of that tool, so my tests fail because that pytest is executing in a different virtual environment from the one needed by the tests.

The fix for this is easy: run python -m pytest instead, which guarantees that you will run pytest in the same environment as your currently active Python.
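For scripts that launch the test suite themselves, the same idea can be sketched in Python: building the command from sys.executable ties pytest to the interpreter that is currently running. Both helper names below are hypothetical, and the PATH comparison is only a rough heuristic for spotting a mismatched pytest.

```python
import os
import shutil
import sys


def pytest_command(*args):
    """Build a command that runs pytest with the interpreter running this script."""
    return [sys.executable, "-m", "pytest", *args]


def pytest_on_path_matches_python():
    """Heuristic: is the `pytest` found on PATH installed next to this Python?

    Returns None if there is no pytest executable on PATH at all.
    """
    found = shutil.which("pytest")
    if found is None:
        return None
    return os.path.dirname(os.path.abspath(found)) == os.path.dirname(
        os.path.abspath(sys.executable)
    )
```

Passing pytest_command("-x") to subprocess.run() should then use the active environment even when the bare pytest on PATH belongs to a different one.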

Yesterday I went through and updated every one of my cookiecutter templates (python-lib, click-app, datasette-plugin, sqlite-utils-plugin, llm-plugin) to use this pattern in their READMEs and generated repositories instead, to help spread that better recipe a little bit further.

Tags: cookiecutter, projects, python, pytest

datasette-checkbox

2024-08-16T21:28:09+00:00

datasette-checkbox

I built this fun little Datasette plugin today, inspired by a conversation I had in Datasette Office Hours.

If a user has the update-row permission and the table they are viewing has any integer columns with names that start with is_ or should_ or has_, the plugin adds interactive checkboxes to that table which can be toggled to update the underlying rows.

This makes it easy to quickly spin up an interface that allows users to review and update boolean flags in a table.
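The column rule described above can be sketched with nothing but the standard library. This is not the plugin's actual code: the checkbox_columns() helper, the prefix tuple and the example table are assumptions made for illustration, and the plugin may treat column types differently.

```python
import sqlite3

# Prefixes named in the description above
CHECKBOX_PREFIXES = ("is_", "should_", "has_")


def checkbox_columns(conn, table):
    """Return names of integer columns whose names look like boolean flags."""
    # PRAGMA table_info rows are (cid, name, type, notnull, dflt_value, pk)
    rows = conn.execute(f'PRAGMA table_info("{table}")').fetchall()
    return [
        name
        for _cid, name, declared_type, *_rest in rows
        if declared_type.upper() in ("INTEGER", "INT")
        and name.startswith(CHECKBOX_PREFIXES)
    ]


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE tasks (id INTEGER PRIMARY KEY, title TEXT, "
    "is_done INTEGER, should_review INTEGER, has_notes TEXT)"
)
print(checkbox_columns(conn, "tasks"))  # ['is_done', 'should_review']
```

Here has_notes is skipped because it is a TEXT column, and id is skipped because its name does not start with one of the prefixes.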

I have ambitions for a much more advanced version of this, where users can do things like add or remove tags from rows directly in that table interface - but for the moment this is a neat starting point, and it only took an hour to build (thanks to help from Claude to build an initial prototype, chat transcript here).

Tags: projects, claude-3-5-sonnet, datasette, plugins

Datasette 1.0a15

2024-08-16T05:06:51+00:00

Datasette 1.0a15

Mainly bug fixes, but a couple of minor new features:

Datasette now defaults to hiding SQLite "shadow" tables, as seen in extensions such as SQLite FTS and sqlite-vec. Virtual tables that it makes sense to display, such as FTS core tables, are no longer hidden. Thanks, Alex Garcia. (#2296)
The Datasette homepage is now duplicated at /-/, using the default index.html template. This ensures that the information on that page is still accessible even if the Datasette homepage has been customized using a custom index.html template, for example on sites like datasette.io. (#2393)

Datasette also now serves more user-friendly CSRF pages, an improvement which required me to ship asgi-csrf 0.10.

Tags: releases, datasette, projects, csrf

Share Claude conversations by converting their JSON to Markdown

2024-08-08T20:40:20+00:00

Share Claude conversations by converting their JSON to Markdown

Anthropic's Claude is missing one key feature that I really appreciate in ChatGPT: the ability to create a public link to a full conversation transcript. You can publish individual artifacts from Claude, but I often find myself wanting to publish the whole conversation.

Before ChatGPT added that feature I solved it myself with this ChatGPT JSON transcript to Markdown Observable notebook. Today I built the same thing for Claude.

Here's how to use it:

The key is to load a Claude conversation on their website with your browser DevTools network panel open and then filter URLs for chat_. You can use the Copy -> Response right click menu option to get the JSON for that conversation, then paste it into that new Observable notebook to get a Markdown transcript.

I like sharing these by pasting them into a "secret" Gist - that way they won't be indexed by search engines (adding more AI generated slop to the world) but can still be shared with people who have the link.

Here's an example transcript from this morning. I started by asking Claude:

I want to breed spiders in my house to get rid of all of the flies. What spider would you recommend?

When it suggested that this was a bad idea because it might attract pests, I asked:

What are the pests might they attract? I really like possums

It told me that possums are attracted by food waste, but "deliberately attracting them to your home isn't recommended" - so I said:

Thank you for the tips on attracting possums to my house. I will get right on that! [...] Once I have attracted all of those possums, what other animals might be attracted as a result? Do you think I might get a mountain lion?

It emphasized how bad an idea that would be and said "This would be extremely dangerous and is a serious public safety risk.", so I said:

OK. I took your advice and everything has gone wrong: I am now hiding inside my house from the several mountain lions stalking my backyard, which is full of possums

Claude has quite a preachy tone when you ask it for advice on things that are clearly a bad idea, which makes winding it up with increasingly ludicrous questions a lot of fun.

Tags: anthropic, claude, markdown, ai, llms, tools, generative-ai, projects, json, observable

django-http-debug, a new Django app mostly written by Claude

2024-08-08T15:26:27+00:00

Yesterday I finally developed something I’ve been casually thinking about building for a long time: django-http-debug. It’s a reusable Django app - something you can pip install into any Django project - which provides tools for quickly setting up a URL that returns a canned HTTP response and logs the full details of any incoming request to a database table.

This is ideal for any time you want to start developing against some external API that sends traffic to your own site - a webhooks provider like Stripe, or an OAuth or OpenID connect integration (my task yesterday morning).

You can install it right now in your own Django app: add django-http-debug to your requirements (or just pip install django-http-debug), then add the following to your settings.py:

INSTALLED_APPS = [
    # ...
    'django_http_debug',
    # ...
]

MIDDLEWARE = [
    # ...
    "django_http_debug.middleware.DebugMiddleware",
    # ...
]

You'll need to have the Django Admin app configured as well. The result will be two new models managed by the admin - one for endpoints:

And a read-only model for viewing logged requests:

It’s possible to disable logging for an endpoint, which means django-http-debug doubles as a tool for adding things like a robots.txt to your site without needing to deploy any additional code.

How it works

The key to how this works is this piece of middleware:

class DebugMiddleware:
    def __init__(self, get_response):
        self.get_response = get_response

    def __call__(self, request):
        response = self.get_response(request)
        if response.status_code == 404:
            path = request.path.lstrip("/")
            debug_response = debug_view(request, path)
            if debug_response:
                return debug_response
        return response

This dispatches to the default get_response() function, then intercepts the result and checks if it's a 404. If so, it gives the debug_view() function an opportunity to respond instead - which might return None, in which case that original 404 is returned to the client.

That debug_view() function looks like this:

@csrf_exempt
def debug_view(request, path):
    try:
        endpoint = DebugEndpoint.objects.get(path=path)
    except DebugEndpoint.DoesNotExist:
        return None  # Allow normal 404 handling to continue

    if endpoint.logging_enabled:
        log_entry = RequestLog(
            endpoint=endpoint,
            method=request.method,
            query_string=request.META.get("QUERY_STRING", ""),
            headers=dict(request.headers),
        )
        log_entry.set_body(request.body)
        log_entry.save()

    content = endpoint.content
    if endpoint.is_base64:
        content = base64.b64decode(content)

    response = HttpResponse(
        content=content,
        status=endpoint.status_code,
        content_type=endpoint.content_type,
    )
    for key, value in endpoint.headers.items():
        response[key] = value

    return response

It checks the database for an endpoint matching the incoming path, then logs the request (if the endpoint has logging_enabled set) and returns a canned response based on the endpoint configuration.

Here are the models:

from django.db import models
import base64


class DebugEndpoint(models.Model):
    path = models.CharField(max_length=255, unique=True)
    status_code = models.IntegerField(default=200)
    content_type = models.CharField(max_length=64, default="text/plain; charset=utf-8")
    headers = models.JSONField(default=dict, blank=True)
    content = models.TextField(blank=True)
    is_base64 = models.BooleanField(default=False)
    logging_enabled = models.BooleanField(default=True)

    def __str__(self):
        return self.path

    def get_absolute_url(self):
        return f"/{self.path}"


class RequestLog(models.Model):
    endpoint = models.ForeignKey(DebugEndpoint, on_delete=models.CASCADE)
    method = models.CharField(max_length=10)
    query_string = models.CharField(max_length=255, blank=True)
    headers = models.JSONField()
    body = models.TextField(blank=True)
    is_base64 = models.BooleanField(default=False)
    timestamp = models.DateTimeField(auto_now_add=True)

    def __str__(self):
        return f"{self.method} {self.endpoint.path} at {self.timestamp}"

    def set_body(self, body):
        try:
            # Try to decode as UTF-8
            self.body = body.decode("utf-8")
            self.is_base64 = False
        except UnicodeDecodeError:
            # If that fails, store as base64
            self.body = base64.b64encode(body).decode("ascii")
            self.is_base64 = True

    def get_body(self):
        if self.is_base64:
            return base64.b64decode(self.body.encode("ascii"))
        return self.body

The admin screens are defined in admin.py.
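The set_body() and get_body() pair above can be tried out without Django. Here is a minimal standalone sketch of the same round trip; encode_body() and decode_body() are hypothetical names for this example, not part of django-http-debug.

```python
import base64


def encode_body(body: bytes):
    """Return (text, is_base64) suitable for storing in a text column."""
    try:
        # Valid UTF-8 is stored as-is
        return body.decode("utf-8"), False
    except UnicodeDecodeError:
        # Anything else is stored as base64 text, with the flag set
        return base64.b64encode(body).decode("ascii"), True


def decode_body(text: str, is_base64: bool):
    """Reverse encode_body(): bytes for base64 rows, str otherwise."""
    if is_base64:
        return base64.b64decode(text.encode("ascii"))
    return text


# The \x80 byte is not valid UTF-8, so this takes the base64 path
gif_header = b"GIF89a\x01\x00\x01\x00\x80\x00\x00\xff\xff\xff"
stored, flag = encode_body(gif_header)
```

As in the model, the decoded value is a str for text bodies and bytes for binary ones, so callers need to check the type before rendering it.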

Claude built the first version of this for me

This is a classic example of a project that I couldn’t quite justify building without assistance from an LLM. I wanted it to exist, but I didn't want to spend a whole day building it.

Claude 3.5 Sonnet got me 90% of the way to a working first version. I had to make a few tweaks to how the middleware worked, but having done that I had a working initial prototype within a few minutes of starting the project.

Here’s the full sequence of prompts I used, each linking to the code that was produced for me (as a Claude artifact):

I want a Django app I can use to help create HTTP debugging endpoints. It should let me configure a new path e.g. /webhooks/receive/ that the Django 404 handler then hooks into - if one is configured it can be told which HTTP status code, headers and content to return.

ALL traffic to those endpoints is logged to a Django table - full details of incoming request headers, method and body. Those can be browsed read-only in the Django admin (and deleted)

Produced Claude v1

make it so I don't have to put it in the urlpatterns because it hooks ito Django's 404 handling mechanism instead

Produced Claude v2

Suggestions for how this could handle request bodies that don't cleanly decode to utf-8

Produced Claude v3

don't use a binary field, use a text field but still store base64 data in it if necessary and have a is_base64 boolean column that gets set to true if that happens

Produced Claude v4

I took that code and ran with it - I fired up a new skeleton library using my python-lib cookiecutter template, copied the code into it, made some tiny changes to get it to work and shipped it as an initial alpha release - mainly so I could start exercising it on a couple of sites I manage.

Using it in the wild for a few minutes quickly identified changes I needed to make. I filed those as issues:

Then I worked through fixing each of those one at a time. I did most of this work myself, though GitHub Copilot helped me out by typing some of the code for me.

Adding the base64 preview

There was one slightly tricky feature I wanted to add that didn’t justify spending much time on but was absolutely a nice-to-have.

The logging mechanism supports binary data: if incoming request data doesn’t cleanly decode as UTF-8 it gets stored as Base 64 text instead, with the is_base64 flag set to True (see the set_body() method in the RequestLog model above).

I asked Claude for a curl one-liner to test this and it suggested:

curl -X POST http://localhost:8000/foo/ \
  -H "Content-Type: multipart/form-data" \
  -F "image=@pixel.gif"

I do this a lot - knocking out quick curl commands is an easy prompt, and you can tell it the URL and headers you want to use, saving you from having to edit the command yourself later on.

I decided to have the Django Admin view display a decoded version of that Base 64 data. But how to render that, when things like binary file uploads may not be cleanly renderable as text?

This is what I came up with:

The trick I'm using here is to display the decoded data as a mix between renderable characters and hex byte pairs, with those pairs rendered using a different font to make it clear that they are part of the binary data.

This is achieved using a body_display() method on the RequestLogAdmin admin class, which is then listed in readonly_fields. The full code is here; this is that method:

    def body_display(self, obj):
        body = obj.get_body()
        if not isinstance(body, bytes):
            return format_html("<pre>{}</pre>", body)

        # Attempt to guess filetype
        suggestion = None
        match = filetype.guess(body[:1000])
        if match:
            suggestion = "{} ({})".format(match.extension, match.mime)

        encoded = repr(body)
        # Ditch the b' and trailing '
        if encoded.startswith("b'") and encoded.endswith("'"):
            encoded = encoded[2:-1]

        # Split it into sequences of octets and characters
        chunks = sequence_re.split(encoded)
        html = []
        if suggestion:
            html.append(
                '<p style="margin-top: 0; font-family: monospace; font-size: 0.8em;">Suggestion: {}</p>'.format(
                    suggestion
                )
            )
        for chunk in chunks:
            if sequence_re.match(chunk):
                octets = octet_re.findall(chunk)
                octets = [o[2:] for o in octets]
                html.append(
                    '<code style="color: #999; font-family: monospace">{}</code>'.format(
                        " ".join(octets).upper()
                    )
                )
            else:
                html.append(chunk.replace("\\\\", "\\"))

        return mark_safe(" ".join(html).strip().replace("\\r\\n", "<br>"))

I got Claude to write that using one of my favourite prompting tricks. I'd solved this problem once before in the past, in my datasette-render-binary project. So I pasted that code into Claude, told it:

With that code as inspiration, modify the following Django Admin code to use that to display decoded base64 data:

And then pasted in my existing Django admin class. You can see my full prompt here.

Claude replied with this code, which almost worked exactly as intended - I had to make one change, swapping out the last line for this:

        return mark_safe(" ".join(html).strip().replace("\\r\\n", "<br>"))

I love this pattern: "here's my existing code, here's some other code I wrote, combine them together to solve this problem". I wrote about this previously when I described how I built my PDF OCR JavaScript tool a few months ago.
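For readers who want to play with the mixed text-and-hex display on its own, here is a simplified standalone sketch. It walks the bytes directly rather than using the repr() and regular expression approach from body_display(), so it is a different technique that aims for a similar result; mixed_chunks() and to_html() are hypothetical names.

```python
import html
import string

# Characters shown as-is; every other byte becomes a hex pair
PRINTABLE = set(string.ascii_letters + string.digits + string.punctuation + " ")


def mixed_chunks(data: bytes):
    """Split bytes into ("text", str) and ("hex", str) runs."""
    chunks = []
    for byte in data:
        char = chr(byte)
        kind = "text" if char in PRINTABLE else "hex"
        value = char if kind == "text" else f"{byte:02X}"
        if chunks and chunks[-1][0] == kind:
            # Extend the current run; hex pairs are separated by spaces
            sep = "" if kind == "text" else " "
            chunks[-1] = (kind, chunks[-1][1] + sep + value)
        else:
            chunks.append((kind, value))
    return chunks


def to_html(data: bytes):
    """Render text runs escaped and hex runs wrapped in <code> tags."""
    parts = []
    for kind, value in mixed_chunks(data):
        if kind == "hex":
            parts.append(f"<code>{value}</code>")
        else:
            parts.append(html.escape(value))
    return " ".join(parts)


print(mixed_chunks(b"GIF89a\x01\x00;"))
# [('text', 'GIF89a'), ('hex', '01 00'), ('text', ';')]
```

Escaping the text runs with html.escape() matters here because the result would be marked safe before being handed to the template.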

Adding automated tests

The final challenge was the hardest: writing automated tests. This was difficult because Django tests need a full Django project configured for them, and I wasn’t confident about the best pattern for doing that in my standalone django-http-debug repository since it wasn’t already part of an existing Django project.

I decided to see if Claude could help me with that too, this time using my files-to-prompt and LLM command-line tools:

files-to-prompt . --ignore LICENSE | \
  llm -m claude-3.5-sonnet -s \
  'step by step advice on how to implement automated tests for this, which is hard because the tests need to work within a temporary Django project that lives in the tests/ directory somehow. Provide all code at the end.'

Here's Claude's full response. It almost worked! It gave me a minimal test project in tests/test_project and an initial set of quite sensible tests.

Sadly it didn’t quite solve the most fiddly problem for me: configuring it so running pytest would correctly set the Python path and DJANGO_SETTINGS_MODULE in order to run the tests. I saw this error instead:

django.core.exceptions.ImproperlyConfigured: Requested setting INSTALLED_APPS, but settings are not configured. You must either define the environment variable DJANGO_SETTINGS_MODULE or call settings.configure() before accessing settings.

I spent some time with the relevant pytest-django documentation and figured out a pattern that worked. Short version: I added this to my pyproject.toml file:

[tool.pytest.ini_options]
DJANGO_SETTINGS_MODULE = "tests.test_project.settings"
pythonpath = ["."]

For the longer version, take a look at my full TIL: Using pytest-django with a reusable Django application.

Test-supported cleanup

The great thing about having comprehensive tests in place is it makes iterating on the project much faster. Claude had used some patterns that weren’t necessary. I spent a few minutes seeing if the tests still passed if I deleted various pieces of code, and cleaned things up quite a bit.

Was Claude worth it?

This entire project took about two hours - just within a tolerable amount of time for what was effectively a useful sidequest from my intended activity for the day.

Claude didn't implement the whole project for me. The code it produced didn't quite work - I had to tweak just a few lines of code, but knowing which code to tweak took a development environment and manual testing and benefited greatly from my 20+ years of Django experience!

This is yet another example of how LLMs don't replace human developers: they augment us.

The end result is a tool that I'm already using to solve real-world problems, and a code repository that I'm proud to put my name to. Without LLM assistance this project would have stayed on my ever-growing list of "things I'd love to build one day".

I'm also really happy to have my own documented solution to the challenge of adding automated tests to a standalone reusable Django application. I was tempted to skip this step entirely, but thanks to Claude's assistance I was able to break that problem open and come up with a solution that I'm really happy with.

Last year I wrote about how AI-enhanced development makes me more ambitious with my projects. It's also helping me be more diligent in not taking shortcuts like skipping setting up automated tests.

Tags: anthropic, claude, webhooks, ai, django, llms, ai-assisted-programming, python, generative-ai, projects, claude-3-5-sonnet

Datasette 1.0a14: The annotated release notes

2024-08-05T23:20:01+00:00

Released today: Datasette 1.0a14. This alpha includes significant contributions from Alex Garcia, including some backwards-incompatible changes in the run-up to the 1.0 release.

Metadata now lives in a database
datasette-remote-metadata 0.2a0
SQLite isolation_level="IMMEDIATE"
Updating the URLs
Everything else
Tricks to help construct the release notes

Metadata now lives in a database

The biggest change in the alpha concerns how Datasette's metadata system works.

Datasette can record and serve metadata about the databases, tables and columns that it is serving. This includes things like the source of the data, the license it is made available under and descriptions of the tables and columns.

Historically this has been powered by a metadata.json file. Over time, this file grew to include all sorts of things that weren't strictly metadata - things like plugin configuration. Cleaning this up is a major breaking change for Datasette 1.0, and Alex has been working on this across several alphas.

The latest alpha adds a new upgrade guide describing changes plugin authors will need to make to support the new metadata system.

The big change in 1.0a14 is that metadata now lives in Datasette's hidden _internal SQLite database, in four new tables called metadata_instance, metadata_databases, metadata_resources and metadata_columns. The schema for these is now included in the documentation (updated using this Cog code), but rather than accessing those tables directly plugins are encouraged to use the new set_*_metadata() and get_*_metadata() methods on the Datasette class.

I plan to use these new tables to build a new performant, paginated homepage that shows all of the databases and tables that Datasette is serving, complete with their metadata - without needing to make potentially hundreds of calls to the now-removed get_metadata() plugin hook.

datasette-remote-metadata 0.2a0

When introducing new plugin internals like this it's always good to accompany them with a plugin that exercises them. datasette-remote-metadata is a few years old now, and provides a mechanism for hosting the metadata for a Datasette instance at a separate URL. This means you can deploy a stateless Datasette instance with a large database and later update the attached metadata without having to re-deploy the whole thing.

I released a new alpha of that plugin which switches over to the new metadata mechanism. The core code ended up looking like this, imitating code Alex wrote for Datasette Core:

async def apply_metadata(datasette, metadata_dict):
    for key in metadata_dict or {}:
        if key == "databases":
            continue
        await datasette.set_instance_metadata(key, metadata_dict[key])
    # database-level
    for dbname, db in metadata_dict.get("databases", {}).items():
        for key, value in db.items():
            if key == "tables":
                continue
            await datasette.set_database_metadata(dbname, key, value)
        # table-level
        for tablename, table in db.get("tables", {}).items():
            for key, value in table.items():
                if key == "columns":
                    continue
                await datasette.set_resource_metadata(dbname, tablename, key, value)
            # column-level
            for columnname, column_description in table.get("columns", {}).items():
                await datasette.set_column_metadata(
                    dbname, tablename, columnname, "description", column_description
                )

SQLite isolation_level="IMMEDIATE"

Sylvain Kerkour wrote about the benefits of IMMEDIATE transactions back in February. The key issue here is that SQLite defaults to starting transactions in DEFERRED mode, which can lead to SQLITE_BUSY errors if a transaction is upgraded to a write transaction mid-flight. Starting in IMMEDIATE mode for Datasette's dedicated write connection should help avoid this.

Frustratingly I failed to replicate the underlying problem in my own tests, despite having anecdotally seen it happen in the past.

After spending more time than I had budgeted for on this, I decided to ship it as an alpha to get it properly exercised before the 1.0 stable release.
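The difference between the two modes can be seen with Python's sqlite3 module. The sketch below is not Datasette's code and does not reproduce the mid-flight upgrade failure; it only shows that a DEFERRED transaction takes no lock when it begins, while an IMMEDIATE one claims the write lock straight away. It uses explicit BEGIN statements on connections opened with isolation_level=None to make that visible, whereas passing isolation_level="IMMEDIATE" makes the module issue BEGIN IMMEDIATE implicitly before writes. The function names and the temporary database are assumptions for illustration.

```python
import os
import sqlite3
import tempfile


def can_begin_immediate(path):
    """Return True if a fresh connection can start a write transaction right now."""
    # timeout=0 means fail straight away instead of waiting for the lock
    conn = sqlite3.connect(path, timeout=0, isolation_level=None)
    try:
        conn.execute("BEGIN IMMEDIATE")
        conn.execute("ROLLBACK")
        return True
    except sqlite3.OperationalError:
        return False
    finally:
        conn.close()


def demo():
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "demo.db")
        first = sqlite3.connect(path, isolation_level=None)
        first.execute("CREATE TABLE t (id INTEGER)")

        first.execute("BEGIN")  # DEFERRED: no lock is taken yet
        while_deferred = can_begin_immediate(path)
        first.execute("ROLLBACK")

        first.execute("BEGIN IMMEDIATE")  # write lock is taken up front
        while_immediate = can_begin_immediate(path)
        first.execute("ROLLBACK")

        first.close()
        return while_deferred, while_immediate


print(demo())  # (True, False)
```

With IMMEDIATE, contention shows up at the start of the transaction, where it can be waited on or retried, rather than partway through it.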

Updating the URLs

Here's another change that was important to get out before 1.0.

Datasette's URL design had a subtle blemish. The following page had two potential meanings:

/databasename - list all of the tables in the specified database
/databasename?sql= - execute an arbitrary SQL query against that database

This also meant that the JSON structure returned by /database.json vs. /database.json?sql= was different.

Alex and I decided to fix that. Alex laid out the new design in issue #2360 - there are quite a few other changes, but the big one is that we are splitting out the SQL query interface to a new URL: /databasename/-/query?sql= - or /databasename/-/query.json?sql= for the JSON API.

We've added redirects from the old URLs to the new ones, so existing links should continue to work.
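The mapping from the old query URLs to the new ones can be sketched as a small pure function. This is an illustration of the URL design described above, not Datasette's redirect implementation; new_query_url() is a hypothetical helper, and it ignores edge cases such as database names that themselves end in .json.

```python
from urllib.parse import parse_qs, urlsplit, urlunsplit


def new_query_url(old_url):
    """Map /db?sql=... to /db/-/query?sql=... (and the .json variant)."""
    parts = urlsplit(old_url)
    # Only URLs carrying a ?sql= parameter were query pages
    if "sql" not in parse_qs(parts.query, keep_blank_values=True):
        return old_url
    segments = parts.path.strip("/").split("/")
    # Only a bare /databasename path had the double meaning
    if len(segments) != 1 or not segments[0]:
        return old_url
    name, suffix = segments[0], ""
    if name.endswith(".json"):
        name, suffix = name[: -len(".json")], ".json"
    return urlunsplit(parts._replace(path=f"/{name}/-/query{suffix}"))


print(new_query_url("/fixtures?sql=select+1"))
# /fixtures/-/query?sql=select+1
```

URLs without a sql parameter, such as /fixtures, are returned unchanged because they still mean "list the tables in this database".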

Everything else

Fix for a bug where canned queries with named parameters could fail against SQLite 3.46. (#2353)

This reflects a bug fix that went out in Datasette 0.64.7.

Datasette now serves E-Tag headers for static files. Thanks, Agustin Bacigalup. (#2306)

There's still more to be done making Datasette play well with caches, but this is a great, low-risk start.

Dropdown menus now use a z-index that should avoid them being hidden by plugins. (#2311)

A cosmetic bug that showed up on Datasette Cloud when using the datasette-cluster-map plugin.

Incorrect table and row names are no longer reflected back on the resulting 404 page. (#2359)

This was reported as a potential security issue. The table names were correctly escaped, so this wasn't an XSS, but there was still potential for confusion if an attacker constructed a URL along the lines of /database-does-not-exist-visit-www.attacker.com-for-more-info. A similar fix went out in Datasette 0.64.8.

Improved documentation for async usage of the track_event(datasette, event) hook. (#2319)

Fixed some HTTPX deprecation warnings. (#2307)

Datasette now serves a <html lang="en"> attribute. Thanks, Charles Nepote. (#2348)

Datasette's automated tests now run against the minimum and maximum supported versions of SQLite: 3.25 (from September 2018) and 3.46 (from May 2024). Thanks, Alex Garcia. (#2352)

Fixed an issue where clicking twice on the URL output by datasette --root produced a confusing error. (#2375)

Tricks to help construct the release notes

I still write the Datasette release notes entirely by hand (aside from a few words auto-completed by GitHub Copilot) - I find the process of writing them to be really useful as a way to construct a final review of everything before it goes out.

I used a couple of tricks to help this time. I always start my longer release notes with an issue. The GitHub diff view is useful for seeing what's changed since the last release, but I took it a step further this time with the following shell command:

git log --pretty=format:"- %ad: %s %h" --date=short --reverse 1.0a13...81b68a14

This outputs a summary of each commit in the range, looking like this (truncated):

- 2024-03-12: Added two things I left out of the 1.0a13 release notes 8b6f155b
- 2024-03-15: Fix httpx warning about app=self.app, refs #2307 5af68377
- 2024-03-15: Fixed cookies= httpx warning, refs #2307 54f5604c
...

Crucially, the syntax of this output is GitHub Flavored Markdown, so pasting it into an issue comment causes both the issue references and the commit hashes to be expanded into links.

It's a neat way to get a quick review of what's changed, and also means that those issues will automatically link back to the new issue where I'm constructing the release notes.
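A related sketch, not part of my original workflow: the same summary can be filtered down to the distinct issue numbers it mentions, which makes a handy checklist while writing the notes. The helper name issue_refs is hypothetical, and it assumes issue references are always written as # followed by digits.

```shell
#!/usr/bin/env bash
# Hypothetical helper: read commit summary lines on stdin and print each
# distinct issue reference (e.g. #2307) once, sorted.
issue_refs() {
  grep -oE '#[0-9]+' | sort -u
}

# Sample input copied from the truncated output above.
issue_refs <<'EOF'
- 2024-03-12: Added two things I left out of the 1.0a13 release notes 8b6f155b
- 2024-03-15: Fix httpx warning about app=self.app, refs #2307 5af68377
- 2024-03-15: Fixed cookies= httpx warning, refs #2307 54f5604c
EOF
```

Against a real checkout the input would come from the git log command shown earlier, piped straight into issue_refs.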

I wrote this up in a TIL here, along with another trick where I used LLM to get Claude 3.5 Sonnet to summarize my changes for me:

curl 'https://github.com/simonw/datasette/compare/1.0a13...2ad51baa3.diff' \
  | llm -m claude-3.5-sonnet --system \
  'generate a short summary of these changes, then a bullet point list of detailed release notes'

Tags: llm, releases, sqlite, datasette, projects, annotated-release-notes

Image resize and quality comparison

2024-07-26T13:20:16+00:00

Image resize and quality comparison

Another tiny tool I built with Claude 3.5 Sonnet and Artifacts. This one lets you select an image (or drag-drop one onto an area) and then displays that same image as a JPEG at 1, 0.9, 0.7, 0.5 and 0.3 quality settings, then again at half the width. Each image shows its size in KB and can be downloaded directly from the page.

I'm trying to use more images on my blog (example 1, example 2) and I like to reduce their file size and quality while keeping them legible.

The prompt sequence I used for this was:

Build an artifact (no React) that I can drop an image onto and it presents that image resized to different JPEG quality levels, each with a download link

Claude produced this initial artifact. I followed up with:

change it so that for any image it provides it in the following:

original width, full quality

original width, 0.9 quality

original width, 0.7 quality

original width, 0.5 quality

original width, 0.3 quality

half width - same array of qualities

For each image clicking it should toggle its display to full width and then back to max-width of 80%

Images should show their size in KB

Claude produced this v2.
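The grid that prompt asks for is two widths crossed with five quality levels, ten variants in total. Here is a minimal Bash sketch that just enumerates those pairs; list_variants is a hypothetical helper and the 1200 pixel width is an assumed example. It does no image processing, since the tool itself does the JPEG encoding in the browser.

```shell
#!/usr/bin/env bash
# Hypothetical helper: list the width/quality pairs requested in the prompt
# for an image of the given pixel width. Enumeration only, no encoding.
list_variants() {
  local width="$1"
  local w q
  # Original width first, then half width (integer division).
  for w in "$width" "$(( width / 2 ))"; do
    for q in 1 0.9 0.7 0.5 0.3; do
      printf 'width=%s quality=%s\n' "$w" "$q"
    done
  done
}

list_variants 1200
```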

I tweaked it a tiny bit (modifying how full-width images are displayed) - the final source code is available here. I'm hosting it on my own site which means the Download links work correctly - when hosted on claude.site Claude's CSP headers prevent those from functioning.

Tags: ai-assisted-programming, claude, tools, projects, generative-ai, ai, llms, claude-artifacts, claude-3-5-sonnet

llm-gguf

2024-07-23T22:18:40+00:00

llm-gguf

I just released a new alpha plugin for LLM which adds support for running models from Meta's new Llama 3.1 family that have been packaged as GGUF files - it should work for other GGUF chat models too.

If you've already installed LLM the following set of commands should get you set up with Llama 3.1 8B:

llm install llm-gguf
llm gguf download-model \
  https://huggingface.co/lmstudio-community/Meta-Llama-3.1-8B-Instruct-GGUF/resolve/main/Meta-Llama-3.1-8B-Instruct-Q4_K_M.gguf \
  --alias llama-3.1-8b-instruct --alias l31i

This will download a 4.92GB GGUF from lmstudio-community/Meta-Llama-3.1-8B-Instruct-GGUF on Hugging Face and save it (at least on macOS) to your ~/Library/Application Support/io.datasette.llm/gguf/models folder.

Once installed like that, you can run prompts through the model like so:

llm -m l31i "five great names for a pet lemur"

Or use the llm chat command to keep the model resident in memory and run an interactive chat session with it:

llm chat -m l31i

I decided to ship a new alpha plugin rather than update my existing llm-llama-cpp plugin because that older plugin has some design decisions baked in from the Llama 2 release which no longer make sense, and having a fresh plugin gave me a fresh slate to adopt the latest features from the excellent underlying llama-cpp-python library by Andrei Betlen.

Tags: meta, llm, generative-ai, llama, projects, ai, llms