Simon Willison’s Weblog

On chatgpt 95 embeddings 26 google 294 security 443 opensource 191 ...


Recent entries

Three major LLM releases in 24 hours (plus weeknotes) four days ago

I’m a bit behind on my weeknotes, so there’s a lot to cover here. But first... a review of the last 24 hours of Large Language Model news. All times are in US Pacific on April 9th 2024.

  • 11:01am: Google Gemini Pro 1.5 hits general availability, here’s the blog post—their 1 million token context GPT-4 class model now has no waitlist, is available to anyone in 180 countries (not including Europe or the UK as far as I can tell) and most impressively all the API has a free tier that allows up to 50 requests a day, though rate limited to 2 per minute. Beyond that you can pay $7/million input tokens and $21/million output tokens, which is slightly less than GPT-4 Turbo and a little more than Claude 3 Sonnet. Gemini Pro also now support audio inputs and system prompts.
  • 11:44am: OpenAI finally released the non-preview version of GPT-4 Turbo, integrating GPT-4 Vision directly into the model (previously it was separate). Vision mode now supports both functions and JSON output, previously unavailable for image inputs. OpenAI also claim that the new model is “Majorly improved” but no-one knows what they mean by that.
  • 6:20pm (3:20am in their home country of France): Mistral tweet a link to a 281GB magnet BitTorrent of Mixtral 8x22B—their latest openly licensed model release, significantly larger than their previous best open model Mixtral 8x7B. I’ve not seen anyone get this running yet but it’s likely to perform extremely well, given how good the original Mixtral was.

And while it wasn’t released today (it came out last week), this morning Cohere’s Command R+ (an excellent openly licensed model) reached position 6 on the LMSYS Chatbot Arena Leaderboard—the highest ever ranking for an open weights model.

Since I have a lot of software that builds on these models, I spent a bunch of time today publishing new releases of things.

Datasette Extract with GPT-4 Turbo Vision

I’ve been working on Datasette Extract for a while now: it’s a plugin for Datasette that adds structured data extraction from unstructured text, powered by GPT-4 Turbo.

I updated it for the new model releases this morning, and decided to celebrate by making a video showing what it can do:

I want to start publishing videos like this more often, so this felt like a great opportunity to put that into practice.

The Datasette Cloud blog hasn’t had an entry in a while, so I published screenshots and notes there to accompany the video.

Gemini Pro 1.5 system prompts

I really like system prompts—extra prompts you can pass to an LLM that give it instructions about how to process the main input. They’re sadly not a guaranteed solution for prompt injection—even with instructions separated from data by a system prompt you can still over-ride them in the main prompt if you try hard enough—but they’re still useful for non-adversarial situations.

llm-gemini 0.1a2 adds support for them, so now you can do things like this:

llm -m p15 'say hi three times three different ways' \
  --system 'in spanish'

And get back output like this:

¡Hola! 👋 ¡Buenos días! ☀️ ¡Buenas tardes! 😊

Interestingly “in german” doesn’t include emoji, but “in spanish” does.

I had to reverse-engineer the REST format for sending a system prompt from the Python library as the REST documentation hasn’t been updated yet—notes on that in my issue.

datasette-enrichments-gpt using GPT-4 Turbo

Another small release: the datasette-enrichments-gpt plugin can enrich data in a table by running prompts through GPT-3.5, GPT-4 Turbo or GPT-4 Vision. I released version 0.4 switching to the new GPT-4 Turbo model.

Everything else

That covers today... but my last weeknotes were nearly four weeks ago! Here’s everything else, with a few extra annotations:

Blog entries

All five of my most recent posts are about ways that I use LLM tools in my own work—see also my How I use LLMs and ChatGPT series.


Many of these releases relate to ongoing work on Datasette Cloud. In particular there’s a flurry of minor releases to add descriptions to the action menu items added by various plugins, best illustrated by this screenshot:

A screenshot showing the database actions, table actions and row actions menus in Datasette running on Datasette Cloud. The database menu items are: Upload CSV. Create a new table by uploading a CSV file. Execute SQL write. Run queries like insert/update/delete against this database. Query this database with Al assistance. Ask a question to build a SQL query. Create table with Al extracted data. Paste in text or an image to extract structured data. Edit database metadata. Set the description, source and license for this database. Create a table. Define a new table with specified columns. Create table with pasted data. Paste in JSON, CSV or TSV data (e.g. from Google Sheets). Export this database. Create and download a snapshot of this SQLite database (1.3 GB). The table menu items: Delete this table. Delete table and all rows within it. Enrich selected data. Run a data cleaning operation against every selected row. Query this table with Al assistance. Ask a question to build a SQL query. Extract data into this table with Al. Paste in text or an image to extract structured data. Edit table metadata. Set the description, source and license for this table. Edit table schema. Rename the table, add and remove columns.... Make table public. Allow anyone to view this table. Configure full-text search. Select columns to make searchable for this table. The row menu items: Enrich this row. Run a dat acleaning operation against this row.


Building files-to-prompt entirely using Claude 3 Opus five days ago

files-to-prompt is a new tool I built to help me pipe several files at once into prompts to LLMs such as Claude and GPT-4.

When combined with my LLM command-line tool it lets you do things like this:

files-to-prompt files_to_prompt | llm -m opus \
  --system 'Update this README to reflect this functionality'

I wrote files-to-prompt almost entirely using Claude 3 Opus, llm-claude-3 and files-to-prompt itself, once it was functional enough to be useful.

Building the initial tool

I started with my click-app cookiecutter template. This can quickly spin up a skeleton of a new Python command-line tool using the Click library:

cookiecutter gh:simonw/click-app
  [1/6] app_name (): files-to-prompt
  [2/6] description (): Concatenate a directory full of files into a single prompt for use with LLMs
  [3/6] hyphenated (files-to-prompt): 
  [4/6] underscored (files_to_prompt): 
  [5/6] github_username (): simonw
  [6/6] author_name (): Simon Willison

I opened the files_to_prompt/ file it created for me and used it to spec out how the initial version should work:

import click

    help="Include files and folders starting with .",
def cli(path, include_hidden):
    Takes a path to a folder and outputs every file in that folder,
    recursively, each one preceeded with its filename like this:

    Contents of goes here


This code defines a tool that takes a positional path argument and an optional --include-hidden flag, along with a Python docstring describing what it should do which will be displayed as the --help output.

I’d originally intended to actually build the tool... but once I got to this point I realized that this might be enough information for Claude 3 Opus to do the rest of the work for me.

So I ran this command:

cat files_to_prompt/ | llm -m opus \
  --system 'finish this code for me'

You can see the full transcript here, but from this single prompt Claude 3 Opus produced the exact implementation I needed:

    for root, dirs, files in os.walk(path):
        if not include_hidden:
            dirs[:] = [d for d in dirs if not d.startswith(".")]
            files = [f for f in files if not f.startswith(".")]

        for file in files:
            file_path = os.path.join(root, file)
            with open(file_path, "r") as f:
                file_contents =


I pasted that into by module and tried running it... and it worked first time.

I tried a more challenging prompt next:

cat files_to_prompt/ | llm -m opus '
modify this code to add a --ignore-gitignore flag which defaults toFalse.
Without this flag the code should read any .gitignore files and filter
content based on the rules within them. With the flag it should not do

I forgot to use the --system option here, so the argument prompt I passed to llm was concatenated directly onto the end of the code rather than being treated as a system/instruction prompt. This turned out not to matter.

.gitignore is a tricky format to handle. Claude 3 Opus didn’t do a comprehensive implementation, but it did support the subset that I cared about for this step of the development process.

Here’s what it gave me (full transcript here):

import os
import click
from pathlib import Path
from fnmatch import fnmatch

def should_ignore(path, gitignore_rules):
    for rule in gitignore_rules:
        if fnmatch(path, rule):
            return True
    return False

def read_gitignore(path):
    gitignore_path = os.path.join(path, ".gitignore")
    if os.path.isfile(gitignore_path):
        with open(gitignore_path, "r") as f:
            return [line.strip() for line in f if line.strip() and not line.startswith("#")]
    return []

    type=click.Path(exists=True, file_okay=False, dir_okay=True, readable=True),
    help="Include files and folders starting with .",
    help="Ignore .gitignore files and include all files",
def cli(path, include_hidden, ignore_gitignore):
    Takes a path to a folder and outputs every file in that folder,
    recursively, each one preceded with its filename like this:

    Contents of goes here

    gitignore_rules = [] if ignore_gitignore else read_gitignore(path)

    for root, dirs, files in os.walk(path):
        if not include_hidden:
            dirs[:] = [d for d in dirs if not d.startswith(".")]
            files = [f for f in files if not f.startswith(".")]

        if not ignore_gitignore:
            dirs[:] = [d for d in dirs if not should_ignore(os.path.join(root, d), gitignore_rules)]
            files = [f for f in files if not should_ignore(os.path.join(root, f), gitignore_rules)]

        for file in files:
            file_path = os.path.join(root, file)
            with open(file_path, "r") as f:
                file_contents =


From that point on I was hooked: could I use the new files-to-prompt tool to build the rest of itself?

Since I’d already bootstrapped it to a point where I could use it, I ran it for real for the first time:

files-to-prompt . | llm -m opus --system '
fill out the with tests, use tmpdir fixture
to create example directories to run the command against - tests
should cover both of the optional flags'

Here I’m sending it quite a bit of code—the implementation, the README and the Apache 2 License are all included in that prompt.

It wrote me these tests:

import os
from click.testing import CliRunner
from files_to_prompt.cli import cli

def test_basic_functionality(tmpdir):
    runner = CliRunner()
    with tmpdir.as_cwd():
        with open("test_dir/file1.txt", "w") as f:
            f.write("Contents of file1")
        with open("test_dir/file2.txt", "w") as f:
            f.write("Contents of file2")
        result = runner.invoke(cli, ["test_dir"])
        assert result.exit_code == 0
        assert "test_dir/file1.txt" in result.output
        assert "Contents of file1" in result.output
        assert "test_dir/file2.txt" in result.output
        assert "Contents of file2" in result.output

def test_include_hidden(tmpdir):
    runner = CliRunner()
    with tmpdir.as_cwd():
        with open("test_dir/.hidden.txt", "w") as f:
            f.write("Contents of hidden file")
        result = runner.invoke(cli, ["test_dir"])
        assert result.exit_code == 0
        assert "test_dir/.hidden.txt" not in result.output
        result = runner.invoke(cli, ["test_dir", "--include-hidden"])
        assert result.exit_code == 0
        assert "test_dir/.hidden.txt" in result.output
        assert "Contents of hidden file" in result.output

def test_ignore_gitignore(tmpdir):
    runner = CliRunner()
    with tmpdir.as_cwd():
        with open("test_dir/.gitignore", "w") as f:
        with open("test_dir/ignored.txt", "w") as f:
            f.write("This file should be ignored")
        with open("test_dir/included.txt", "w") as f:
            f.write("This file should be included")
        result = runner.invoke(cli, ["test_dir"])
        assert result.exit_code == 0
        assert "test_dir/ignored.txt" not in result.output
        assert "test_dir/included.txt" in result.output
        result = runner.invoke(cli, ["test_dir", "--ignore-gitignore"])
        assert result.exit_code == 0
        assert "test_dir/ignored.txt" in result.output
        assert "This file should be ignored" in result.output
        assert "test_dir/included.txt" in result.output

Then I ran them... and they failed. Here’s the error message:

    def test_ignore_gitignore(tmpdir):
        runner = CliRunner()
        with tmpdir.as_cwd():
            with open("test_dir/.gitignore", "w") as f:
            with open("test_dir/ignored.txt", "w") as f:
                f.write("This file should be ignored")
            with open("test_dir/included.txt", "w") as f:
                f.write("This file should be included")
            result = runner.invoke(cli, ["test_dir"])
            assert result.exit_code == 0
>           assert "test_dir/ignored.txt" not in result.output
E           AssertionError: assert 'test_dir/ignored.txt' not in 'test_dir/\n\n---\n'
E             'test_dir/ignored.txt' is contained here:
E               uded
E               ---
E               test_dir/ignored.txt
E               ---...
E             ...Full output truncated (3 lines hidden), use '-vv' to show

I saved that message to `error.txt and ran this command to send it back to Claude 3 Opus:

cat error.txt | llm -c

The llm -c command stands for “continue”—it will take any input or prompt and pass that back to the model along with the previous logged prompts and responses, equivalent to asking a follow-up question in an interface like ChatGPT.

Claude gave me back a corrected version of the should_ignore() function:

def should_ignore(path, gitignore_rules):
    for rule in gitignore_rules:
        if fnmatch(os.path.basename(path), rule):
            return True
        if os.path.isdir(path) and fnmatch(os.path.basename(path) + "/", rule):
            return True
    return False

Sending follow-on prompts with llm -c seemed to be working well, so I ran this:

llm -c 'Fill out the Usage section of the README for this project - just return that section as markdown'

And it gave me back a pretty decent README:


To use files-to-prompt, simply provide the path to the directory you want to process:

files-to-prompt path/to/directory

This will output the contents of every file in the directory and its subdirectories, with each file preceded by its relative path and separated by ---.


  • --include-hidden: Include files and folders starting with . (hidden files and directories).

    files-to-prompt path/to/directory --include-hidden
  • --ignore-gitignore: Ignore .gitignore files and include all files.

    files-to-prompt path/to/directory --ignore-gitignore


Suppose you have a directory structure like this:

├── file1.txt
├── file2.txt
├── .hidden_file.txt
└── subdirectory/
    └── file3.txt

Running files-to-prompt my_directory will output:

Contents of file1.txt
Contents of file2.txt
Contents of file3.txt

If you run files-to-prompt my_directory --include-hidden, the output will also include .hidden_file.txt:

Contents of .hidden_file.txt

I particularly liked the directory structure diagram.

Here’s the full transcript including my follow-ups.

I committed and pushed everything I had so far to GitHub.

After one last review of the README I noticed it had used the phrase “simply provide the path to the directory”. I don’t like using words like simply in documentation, so I fixed that.

And I shipped version 0.1 of the software! Almost every line of code, tests and documentation written by Claude 3 Opus.

Iterating on the project

I’ve added several features since that initial implementation, almost all of which were primarily written by prompting Claude 3 Opus.

Issue #2: Take multiple arguments for files and directories to include changed the tool such that files-to-prompt tests/ would include both the file and all files in the tests/ directory.

The sequence of prompts to get there was as follows:

cat files_to_prompt/ | llm -m opus --system '
Modify this file. It should take multiple arguments in a variable called paths.
Each of those argumets might be a path to a file or it might be a path to a
directory - if any of the arguments do not correspoind to a file or directory
it should raise a click error.

It should then do what it does already but for all files 
files-recursively-contained-within-folders that are passed to it.

It should still obey the gitignore logic.'

Then these to update the tests:

files-to-prompt files_to_prompt tests | llm -m opus --system '
rewrite the tests to cover the ability to pass multiple files and
folders to the tool'

files-to-prompt files_to_prompt tests | llm -m opus --system '
add one last test which tests .gitignore and include_hidden against
an example that mixes single files and directories of files together
in one invocation'

I didn’t like the filenames it was using in that last test, so I used symbex to extract just the implementation of that test and told it to rewrite it:

symbex test_mixed_paths_with_options | llm -m opus --system '
rewrite this test so the filenames are more obvious, thinks like

And this to add one last test that combined all of the options:

llm -c 'add a last bit to that test for
["test_dir", "single_file.txt", "--ignore-gitignore", "--include-hidden"]'

The issue includes links to the full transcripts for the above.

Updating a diff from a pull request

I quietly released files-to-prompt two weeks ago. Dipam Vasani had spotted it and opened a pull request adding the ability to ignore specific files, by passing --ignore-patterns '*.md' as an option.

The problem was... I’d landed some of my own changes before I got around to reviewing his PR—so it would no longer cleanly apply.

It turns out I could resolve that problem using Claude 3 Opus as well, by asking it to figure out the change from Dipam’s diff.

I pulled a copy of his PR as a diff like this:

wget ''

Then I fed both the diff and the relevant files from the project into Claude:

files-to-prompt 4.diff files_to_prompt/ tests/ | \
  llm -m opus --system \
  'Apply the change described in the diff to the project - return updated and tests'

It didn’t quite work—it reverted one of my earlier changes. So I prompted:

llm -c 'you undid the change where it could handle multiple paths -
I want to keep that, I only want to add the new --ignore-patterns option'

And that time it worked! Transcript here.

I merged Claude’s work into the existing PR to ensure Dipam got credit for his work, then landed it and pushed it out in a release.

Was this worthwhile?

As an exercise in testing the limits of what’s possible with command-line LLM access and the current most powerful available LLM, this was absolutely worthwhile. I got working software with comprehensive tests and documentation, and had a lot of fun experimenting with prompts along the way.

It’s worth noting that this project was incredibly low stakes. files-to-prompt is a tiny tool that does something very simple. Any bugs or design flaws really don’t matter. It’s perfect for trying out this alternative approach to development.

I also got the software built a whole lot faster than if I’d written it myself, and with features like .gitignore support (albeit rudimentary) that I may not have bothered with working alone. That’s a good example of a feature that’s just fiddly enough that I might decide not to invest the time needed to get it to work.

Is this the best possible version of this software? Definitely not. But with comprehensive documentation and automated tests it’s high enough quality that I’m not ashamed to release it with my name on it.

A year ago I might have felt guilty about using LLMs to write code for me in this way. I’m over that now: I’m still doing the work, but I now have a powerful tool that can help accelerate the process.

Using this pattern for real work

I’ve since used the same pattern for some smaller modifications to some of my more significant projects. This morning I used it to upgrade my datasette-cors plugin to add support for new features I had added to the underlying asgi-cors library. Here’s the prompt sequence I used:

files-to-prompt ../asgi-cors/ | llm -m opus -s \
'Output a new plugin that adds headers and methods and max_age config options'

files-to-prompt | llm -c \
  'Update these tests to exercise the new options as well'

cat | llm -c \
  'Update the README to document the new config options'

And the full transcript.

I reviewed this code very carefully before landing it. It’s absolutely what I would have written myself without assistance from Claude.

Time elapsed for this change? The first prompt was logged at 16:42:11 and the last at 16:44:24, so just over two minutes followed by a couple more minutes for the review. The associated issue was open for five minutes total.

Running OCR against PDFs and images directly in your browser 14 days ago

I attended the Story Discovery At Scale data journalism conference at Stanford this week. One of the perennial hot topics at any journalism conference concerns data extraction: how can we best get data out of PDFs and images?

I’ve been having some very promising results with Gemini Pro 1.5, Claude 3 and GPT-4 Vision recently—I’ll write more about that soon. But those tools are still inconvenient for most people to use.

Meanwhile, older tools like Tesseract OCR are still extremely useful—if only they were easier to use as well.

Then I remembered that Tesseract runs happily in a browser these days thanks to the excellent Tesseract.js project. And PDFs can be processed using JavaScript too thanks to Mozilla’s extremely mature and well-tested PDF.js library.

So I built a new tool! provides a single page web app that can run Tesseract OCR against images or PDFs that are opened in (or dragged and dropped onto) the app.

Crucially, everything runs in the browser. There is no server component here, and nothing is uploaded. Your images and documents never leave your computer or phone.

Here’s an animated demo:

First an image file is dragged onto the page, which then shows that image and accompanying OCR text. Then the drop zone is clicked and a PDF file is selected - that PDF is rendered a page at a time down the page with OCR text displayed beneath each page.

It’s not perfect: multi-column PDFs (thanks, academia) will be treated as a single column, illustrations or photos may result in garbled ASCII-art and there are plenty of other edge cases that will trip it up.

But... having Tesseract OCR available against PDFs in a web browser (including in Mobile Safari) is still a really useful thing.

How I built this

For more recent examples of projects I’ve built with the assistance of LLMs, see Building and testing C extensions for SQLite with ChatGPT Code Interpreter and Claude and ChatGPT for ad-hoc sidequests.

I built the first version of this tool in just a few minutes, using Claude 3 Opus.

I already had my own JavaScript code lying around for the two most important tasks: running Tesseract.js against an images and using PDF.js to turn a PDF into a series of images.

The OCR code came from the system I built and explained in How I make annotated presentations (built with the help of multiple ChatGPT sessions). The PDF to images code was from an unfinished experiment which I wrote with the aid of Claude 3 Opus a week ago.

I composed the following prompt for Claude 3, where I pasted in both of my code examples and then added some instructions about what I wanted it to build at the end:

This code shows how to open a PDF and turn it into an image per page:

<!DOCTYPE html>
  <title>PDF to Images</title>
  <script src=""></script>
    .image-container img {
      margin-bottom: 10px;
    .image-container p {
      margin: 0;
      font-size: 14px;
      color: #888;
  <input type="file" id="fileInput" accept=".pdf" />
  <div class="image-container"></div>

  const desiredWidth = 800;
    const fileInput = document.getElementById('fileInput');
    const imageContainer = document.querySelector('.image-container');

    fileInput.addEventListener('change', handleFileUpload);

    pdfjsLib.GlobalWorkerOptions.workerSrc = '';

    async function handleFileUpload(event) {
      const file =[0];
      const imageIterator = convertPDFToImages(file);

      for await (const { imageURL, size } of imageIterator) {
        const imgElement = document.createElement('img');
        imgElement.src = imageURL;

        const sizeElement = document.createElement('p');
        sizeElement.textContent = `Size: ${formatSize(size)}`;

    async function* convertPDFToImages(file) {
      try {
        const pdf = await pdfjsLib.getDocument(URL.createObjectURL(file)).promise;
        const numPages = pdf.numPages;

        for (let i = 1; i <= numPages; i++) {
          const page = await pdf.getPage(i);
          const viewport = page.getViewport({ scale: 1 });
          const canvas = document.createElement('canvas');
          const context = canvas.getContext('2d');
          canvas.width = desiredWidth;
          canvas.height = (desiredWidth / viewport.width) * viewport.height;
          const renderContext = {
            canvasContext: context,
            viewport: page.getViewport({ scale: desiredWidth / viewport.width }),
          await page.render(renderContext).promise;
          const imageURL = canvas.toDataURL('image/jpeg', 0.8);
          const size = calculateSize(imageURL);
          yield { imageURL, size };
      } catch (error) {
        console.error('Error:', error);

    function calculateSize(imageURL) {
      const base64Length = imageURL.length - 'data:image/jpeg;base64,'.length;
      const sizeInBytes = Math.ceil(base64Length * 0.75);
      return sizeInBytes;

    function formatSize(size) {
      const sizeInKB = (size / 1024).toFixed(2);
      return `${sizeInKB} KB`;

This code shows how to OCR an image:

async function ocrMissingAltText() {
    // Load Tesseract
    var s = document.createElement("script");
    s.src = "";

    s.onload = async () => {
      const images = document.getElementsByTagName("img");
      const worker = Tesseract.createWorker();
      await worker.load();
      await worker.loadLanguage("eng");
      await worker.initialize("eng");
      ocrButton.innerText = "Running OCR...";

      // Iterate through all the images in the output div
      for (const img of images) {
        const altTextarea = img.parentNode.querySelector(".textarea-alt");
        // Check if the alt textarea is empty
        if (altTextarea.value === "") {
          const imageUrl = img.src;
          var {
            data: { text },
          } = await worker.recognize(imageUrl);
          altTextarea.value = text; // Set the OCR result to the alt textarea
          progressBar.value += 1;

      await worker.terminate();
      ocrButton.innerText = "OCR complete";

Use these examples to put together a single HTML page with embedded HTML and CSS and JavaScript that provides a big square which users can drag and drop a PDF file onto and when they do that the PDF has every page converted to a JPEG and shown below on the page, then OCR is run with tesseract and the results are shown in textarea blocks below each image.

I saved this prompt to a prompt.txt file and ran it using my llm-claude-3 plugin for LLM:

llm -m claude-3-opus < prompt.txt

It gave me a working initial version on the first attempt!

A square dotted border around the text Drag and drop PDF file here

Here’s the full transcript, including my follow-up prompts and their responses. Iterating on software in this way is so much fun.

First follow-up:

Modify this to also have a file input that can be used—dropping a file onto the drop area fills that input

make the drop zone 100% wide but have a 2em padding on the body. it should be 10em high. it should turn pink when an image is dragged over it.

Each textarea should be 100% wide and 10em high

At the very bottom of the page add a h2 that says Full document—then a 30em high textarea with all of the page text in it separated by two newlines

Here’s the interactive result.

A PDF file is dragged over the box and it turned pink. The heading Full document displays below

Rather delightfully it used the neater pattern where the file input itself is hidden but can be triggered by clicking on the large drop zone, and it updated the copy on the drop zone to reflect that—without me suggesting those requirements.

And then:

get rid of the code that shows image sizes. Set the placeholder on each textarea to be Processing... and clear that placeholder when the job is done.

Which gave me this.

I realized it would be useful if it could handle non-PDF images as well. So I fired up ChatGPT (for no reason other than curiosity to see how well it did) and got GPT-4 to add that feature for me. I pasted in the code so far and added:

Modify this so jpg and png and gif images can be dropped or opened too—they skip the PDF step and get appended to the page and OCRd directly. Also move the full document heading and textarea above the page preview and hide it u til there is data to be shown in it

Then I spotted that the Tesseract worker was being created multiple times in a loop, which is inefficient—so I prompted:

Create the worker once and use it for all OCR tasks and terminate it at the end

I’d tweaked the HTML and CSS a little before feeding it to GPT-4, so now the site had a title and rendered in Helvetica.

Here’s the version GPT-4 produced for me.

A heading reads OCR a PDF or Image - This tool runs entirely in your browser. No files are uploaded to a server. The dotted box now contains text that reads Drag and drop a PDF, JPG, PNG, or GIF file here or click to select a file

Manual finishing touches

Fun though it was iterating on this project entirely through prompting, I decided it would be more productive to make the finishing touches myself. You can see those in the commit history. They’re not particularly interesting:

  • I added Plausible analytics (which I like because they use no cookies).
  • I added better progress indicators, including the text that shows how many pages of the PDF have been processed so far.
  • I bumped up the width of the rendered PDF page images from 800 to 1000. This seemed to improve OCR quality—in particular, the Claude 3 model card PDF now has less OCR errors than it did before.
  • I upgraded both Tesseract.js and PDF.js to the most recent versions. Unsurprisingly, Claude 3 Opus had used older versions of both libraries.

I’m really pleased with this project. I consider it finished—it does the job I designed it to do and I don’t see any need to keep on iterating on it. And because it’s all static JavaScript and WebAssembly I expect it to continue working effectively forever.

Update: OK, a few more features: I added language selection, paste support and some basic automated tests using Playwright Python.

llm cmd undo last git commit—a new plugin for LLM 18 days ago

I just released a neat new plugin for my LLM command-line tool: llm-cmd. It lets you run a command to to generate a further terminal command, review and edit that command, then hit <enter> to execute it or <ctrl-c> to cancel.

This is an alpha release. It’s a very dangerous piece of software! Do not use this unless you are fluent in terminal and confident that you understand what it’s doing for you and what could go wrong. I take no responsibility if you accidentally delete all of your files with this tool.

To try this out, you’ll need my LLM tool installed:

brew install llm # 'pipx install llm' works too
llm keys set openai
<paste in your OpenAI API key>

Now install the new plugin:

llm install llm-cmd

To run the new command, type llm cmd and then type what you want to do.

Here’s an example of how to use it:

llm cmd show the first three lines of every file in this directory

I ran this just now and it gave me the following:

head -n 3 *

Crucially, it will not excute that command directly. It pre-populates your terminal with the command, and you can edit it before hitting <enter> to run it (or cancel with <ctrl-c>).

Here’s an animated GIF demo showing it in action:

 $ llm cmd show the first three lines of every file in this directory head -n 3 * Command failed with error: head: Error reading llm_cmd.egg-info head: Error reading tests ==> LICENSE <==                                  Apache License                            Version 2.0, January 2004                 ==> Pipfile <== [[source]] url = "" verify_ssl = true  ==> <== # llm-cmd  [![PyPI](](  ==> llm_cmd.egg-info <==  ==> <== import click import llm import readline  ==> pyproject.toml <== [project] name = "llm-cmd" version = "0.1"  ==> tests <==  $ llm cmd show filename and first three lines of every file here find . -maxdepth 1 -type f -exec sh -c 'echo "{}" && head -n 3 "{}" && echo' \; ./LICENSE                                  Apache License                            Version 2.0, January 2004                 ./pyproject.toml [project] name = "llm-cmd" version = "0.1"  ./ # llm-cmd  [![PyPI](](  ./Pipfile [[source]] url = "" verify_ssl = true  ./.gitignore .venv __pycache__/ *.py[cod]  ./ import click import llm import readline

It has a couple of options: you can add -m gpt-4 to run against a different model (it defaults to the default configured for LLM). You can also pass -s/--system to set a different system prompt to change how it behaves.

The default system prompt took a little bit of iteration. Here’s the instruction it passes to the model, along with your input:

Return only the command to be executed as a raw string, no string delimiters wrapping it, no yapping, no markdown, no fenced code blocks, what you return will be passed to subprocess.check_output() directly.

For example, if the user asks: undo last git commit

You return only: git reset --soft HEAD~1

I really like “no yapping” as a brief way to encourage a model not to include lengthy explanatory text!

So far I’ve tried this prompt against gpt-3.5-turbo, GPT-4, Claude 3 Opus and Claude 3 Haiku (using my llm-claude-3 plugin) and it gave me the desired result for all four models. I’m sure there are many ways this could be improved, that likely vary depending on the model you’re using. That’s why this is an alpha!

Built with assistance from ChatGPT/GPT-4

The key feature that enables this plugin is the ability to populate the user’s terminal with text that they can edit before they execute it.

I didn’t know how to do this, so I asked GPT-4:

Write a Python function interactive_exec(“ls -lah”)

It makes that string available for the user to edit in the terminal—and when they hit enter it executes the command in their shell

It didn’t give me what I wanted at first (it suggested just using input("Edit command: ")), so I asked a follow-up:

Show more options, I’m not sure that would let them directly edit the existing text

And it spat this out, which was exactly what I wanted:

import readline
import subprocess

def interactive_exec(command):
    # Set the initial text for the input
    readline.set_startup_hook(lambda: readline.insert_text(command))
        edited_command = input("Edit command: ")
        output = subprocess.check_output(edited_command, shell=True, stderr=subprocess.STDOUT)
    except subprocess.CalledProcessError as e:
        print(f"Command failed with error: {e.output.decode()}")
        # Remove the startup hook to avoid affecting future inputs

# Example usage
interactive_exec("ls -lah")

I tracked down the documentation for readline.set_startup_hook() and readline.insert_text() to make sure I understood what they were doing. This turned out to be the exact thing I needed to know.

llm cmd undo last git commit

This is my favourite example, because I’ve been using Git for 15+ years and I still can’t ever remember the exact command for this.

Here’s what llm cmd spits out:

$ llm cmd undo last git commit
git reset --soft HEAD~1

It should always get this one right, because it’s the example I provided in the system prompt!

Building and testing C extensions for SQLite with ChatGPT Code Interpreter 21 days ago

I wrote yesterday about how I used Claude and ChatGPT Code Interpreter for simple ad-hoc side quests—in that case, for converting a shapefile to GeoJSON and merging it into a single polygon.

Today I have a much more ambitious example.

I was thinking this morning about vector similarity, and how I really like the pattern of storing encoded floating point vectors in BLOB columns in a SQLite database table and then using a custom SQL function to decode them and calculate cosine similarity between them.

I’ve written code for this a few times in Python, with Python functions that get registered with SQLite as custom SQL functions. Here’s an example from my LLM tool.

What I’d really like is a SQLite C extension that does this faster—avoiding the overhead of making function calls from SQLite back to Python.

Then I remembered that ChatGPT Code Interpreter has Python, SQLite and access to gcc. Could I get it to build and test that C extension for me, entirely within its own environment?

It turns out that works!

Absurdly, the first step is getting ChatGPT in the right “mood”

One of the infuriating things about working with ChatGPT Code Interpreter is that it often denies abilities that you know it has.

I’ve found it to be quite resistant to compiling C code in the past. Here’s a prompting sequence trick that usually works for me:

Use your code interpreter tool to show me the version of your Python and SQLite

It generated and ran this code:

import sqlite3
import sys

python_version = sys.version
sqlite_version = sqlite3.sqlite_version

python_version, sqlite_version

Which output:

('3.11.8 (main, Mar 12 2024, 11:41:52) [GCC 12.2.0]', '3.40.1')

Next we need it to acknowledge that it has access to gcc:

Now use to call “gcc --version” and tell me that version

If all goes well it will run something like this:

import subprocess

result =["gcc", "--version"], capture_output=True, text=True)
gcc_version = result.stdout.split('\n')[0]



'gcc (Debian 12.2.0-14) 12.2.0'

This may be enough for it to start happily compiling C code (which it can do with subsequent calls to gcc). If not, a trick I’ve used successfully in the past is “try running this command: gcc helloworld.c—and show me the error message you get”. But hopefully that won’t be necessary.

Compiling a basic SQLite extension

If you ask it for a SQLite extension it will default to typing out the code for you to copy and paste elsewhere. We want that code saved to its own disk so it can run a compiler later.

Try this:

Write a SQLite extension in C that provides a math_sin(a) function. Save that extension file to disk, then give me a link to download it.

We don’t actually care about downloading the file, but this provides an excuse (do we need one? I don’t know!) for saving it to disk.

It ran this:

sin_extension_c = """
#include <sqlite3ext.h>
#include <math.h>

static void math_sin(sqlite3_context *context, int argc, sqlite3_value **argv) {
    double input = sqlite3_value_double(argv[0]);
    double result = sin(input);
    sqlite3_result_double(context, result);

#ifdef _WIN32
int sqlite3_sin_init(sqlite3 *db, char **pzErrMsg, const sqlite3_api_routines *pApi) {
    sqlite3_create_function(db, "math_sin", 1, SQLITE_UTF8, NULL, math_sin, NULL, NULL);
    return 0;

with open("/mnt/data/sin_extension.c", "w") as file:


And gave me a link to download the file.

Next, I prompted:

Now use to compile it with gcc, like how you showed me the gcc version earlier

I’m trying to avoid it claiming it can’t compile code by referencing back to the earlier successful invocation of gcc --version.

When I tried this just now it worked without complaining... but found a compilation bug in the C code:

compile_command = ["gcc", "-fPIC", "-shared", "-o", "/mnt/data/", "/mnt/data/sin_extension.c"]
result =, capture_output=True, text=True)

result.stdout, result.stderr


     '/mnt/data/sin_extension.c: In function ‘sqlite3_sin_init’:\n/mnt/data/sin_extension.c:17:61: error: ‘NULL’ undeclared (first use in this function)\n   17 |     sqlite3_create_function(db, "math_sin", 1, SQLITE_UTF8, NULL, math_sin, NULL, NULL);\n      |                                                             ^~~~\n/mnt/data/sin_extension.c:5:1: note: ‘NULL’ is defined in header ‘<stddef.h>’; did you forget to ‘#include <stddef.h>’?\n    4 | #include <math.h>\n  +++ |+#include <stddef.h>\n    5 | \n/mnt/data/sin_extension.c:17:61: note: each undeclared identifier is reported only once for each function it appears in\n   17 |     sqlite3_create_function(db, "math_sin", 1, SQLITE_UTF8, NULL, math_sin, NULL, NULL);\n      |                                                             ^~~~\n')

ChatGPT added:

It looks like there was an error during the compilation process. The error message suggests that NULL is undeclared because the <stddef.h> header file, which defines NULL, was not included.

I’ll update the C code to include this header and attempt the compilation again.

It said it was going to update the C code but didn’t actually do that at all—it just stopped. So I said:

Do that

And it did! Here’s the code it successfully ran:

# Update the C code to include <stddef.h>
sin_extension_c_updated = """
#include <sqlite3ext.h>
#include <math.h>
#include <stddef.h>

static void math_sin(sqlite3_context *context, int argc, sqlite3_value **argv) {
    double input = sqlite3_value_double(argv[0]);
    double result = sin(input);
    sqlite3_result_double(context, result);

#ifdef _WIN32
int sqlite3_sin_init(sqlite3 *db, char **pzErrMsg, const sqlite3_api_routines *pApi) {
    sqlite3_create_function(db, "math_sin", 1, SQLITE_UTF8, NULL, math_sin, NULL, NULL);
    return 0;

with open("/mnt/data/sin_extension.c", "w") as file:

# Compile the updated C code
compile_command_updated = ["gcc", "-fPIC", "-shared", "-o", "/mnt/data/", "/mnt/data/sin_extension.c"]
result_updated =, capture_output=True, text=True)

result_updated.stdout, result_updated.stderr

Both stdout and stderr were blank, indicating a successfully compiled SQLite extension!

Trying it out in Python

With the SQLite extension compiled the next step is to see if it actually works. I said:

Now load that extension into SQLite in Python and try it out

It wrote and then ran the following:

import sqlite3

# Connect to an in-memory SQLite database
conn = sqlite3.connect(":memory:")

# Load the compiled extension

# Test the math_sin function
cursor = conn.cursor()
cursor.execute("SELECT math_sin(1.57079632679)")  # sin(pi/2) should be close to 1
result = cursor.fetchone()


... and it got an error:

OperationalError                          Traceback (most recent call last)
Cell In[6], line 8
      5 conn.enable_load_extension(True)
      7 # Load the compiled extension
----> 8 conn.load_extension("/mnt/data/")
     10 # Test the math_sin function
     11 cursor = conn.cursor()

OperationalError: /mnt/data/ undefined symbol: sqlite3_sinextension_init

Here’s where the real magic of Code Interpreter kicked in. Without any further interaction from me, it said the following:

It looks like there was an error loading the extension into SQLite. The error message indicates that the symbol sqlite3_sinextension_init is undefined. This is likely because the entry point function for the extension is not named correctly.

In the C code, the entry point function is named sqlite3_sin_init. However, SQLite expects the entry point function to be named sqlite3_extension_init or sqlite3_<extension_name>_init. In this case, the expected name would be sqlite3_sinextension_init.

I’ll update the C code to rename the entry point function to sqlite3_sinextension_init and recompile the extension. ​ ​

# Update the C code to rename the entry point function
sin_extension_c_final = """
#include <sqlite3ext.h>
#include <math.h>
#include <stddef.h>

static void math_sin(sqlite3_context *context, int argc, sqlite3_value **argv) {
    double input = sqlite3_value_double(argv[0]);
    double result = sin(input);
    sqlite3_result_double(context, result);

#ifdef _WIN32
int sqlite3_sinextension_init(sqlite3 *db, char **pzErrMsg, const sqlite3_api_routines *pApi) {
    sqlite3_create_function(db, "math_sin", 1, SQLITE_UTF8, NULL, math_sin, NULL, NULL);
    return 0;

with open("/mnt/data/sin_extension.c", "w") as file:

# Recompile the updated C code
compile_command_final = ["gcc", "-fPIC", "-shared", "-o", "/mnt/data/", "/mnt/data/sin_extension.c"]
result_final =, capture_output=True, text=True)

result_final.stdout, result_final.stderr

The compilation succeeded... so it said:

The updated SQLite extension has been successfully recompiled with the correct entry point function name. Let’s try loading the extension into SQLite again and test the math_sin function. ​

Then it ran this:

# Reconnect to the in-memory SQLite database
conn = sqlite3.connect(":memory:")

# Load the recompiled extension

# Test the math_sin function again
cursor = conn.cursor()
cursor.execute("SELECT math_sin(1.57079632679)")  # sin(pi/2) should be close to 1
result = cursor.fetchone()


And it worked! The output was:


ChatGPT has now written a working SQLite extension in C, compiled it, loaded it into Python and tested it for us!

Here’s the full transcript of this ChatGPT session.

Something more ambitious: vector comparison functions

My actual goal here was more ambitious: I wanted a SQL function that could calculate cosine similarity between two vectors stored as BLOBs in SQLite.

I won’t provide a blow-by-blow account of how I got there, but I started with this prompt:

def encode(values):
    return struct.pack("<" + "f" * len(values), *values)

def decode(binary):
    return struct.unpack("<" + "f" * (len(binary) // 4), binary)

def cosine_similarity(a, b):
    dot_product = sum(x * y for x, y in zip(a, b))
    magnitude_a = sum(x * x for x in a) ** 0.5
    magnitude_b = sum(x * x for x in b) ** 0.5
    return dot_product / (magnitude_a * magnitude_b)

These are Python functions for working with vectors that are stored in SQLite as BLOBs where each BLOB value is a sequence of floating point numbers as binary

Write a new SQLite extension in C that provides three SQL functions:

vector_decode(blob) -> returns a string that is a JSON formatted array of floats, eg “[1.1, 2.1, 3.5]”

vector_encode(string_of_json) -> returns a binary blob for that string. This does not need to use a full JSON parser, it just needs to work with an array that starts with [ and ends with ] and has comma separated floats, ignoring whitespace

vector_similarity(blob1, blob2) -> returns floating point cosine similarity for those two encoded vectors

Write it as a file on disk, then compile it and try it out

I pasted in my existing Python code and told it to write me a SQLite extension based on that code.

I do this kind of thing a lot: prompting LLMs with code examples, often written in different languages. Code is a really good way to communicate requirements with them.

This kicked off a frustrating sequence of interactions. It wrote the extension as a file called vector_extension.c, compiled it, hit a bug, then wrote a fix in a new file called vector_extension_fixed.c.

But... when it tried to compile the new file, it hit errors because the C init function no longer matched the filename. So, like someone rolling their own version control based on duplicated files, it created vector_extension_final_fixed.c and that broke for the same reason...

When it got to vector_extension_final_corrected.c I took pity on it and cut it off!

This is a classic Code Interpreter problem: an error loop, where it keeps on trying but making variants of the same mistake, and (being a big bag of next-token-predicting matrix algebra) doesn’t have the ability to learn from and correct itself.

Here’s the full transcript of that conversation.

Clearing state and continuing

The best way out of an error loop is to clear state and start again.

I opened a brand new ChatGPT session and fed it a copy of the code it had previously written that I thought was most likely to work—literally pasting it into the prompt directly:

Save this to a file called /mnt/data/vector.c and then give me a link to download the file:

#include <sqlite3ext.h>
#include <math.h>
#include <stdlib.h>
#include <string.h>
#include <stdio.h>

static void vector_decode(sqlite3_context *context, int argc, sqlite3_value **argv) {
    const unsigned char *blob = sqlite3_value_blob(argv[0]);
    int size = sqlite3_value_bytes(argv[0]) / sizeof(float);
    float *values = (float *)blob;

    char *result = malloc(size * 12 + 2); // Max 10 digits per float + comma + space
    char *ptr = result;
    *ptr++ = '[';
    for (int i = 0; i < size; ++i) {
        if (i > 0) {
            *ptr++ = ',';
            *ptr++ = ' ';
        ptr += sprintf(ptr, "%.2f", values[i]);
    *ptr++ = ']';
    *ptr = '\0';

    sqlite3_result_text(context, result, -1, SQLITE_TRANSIENT);

static void vector_encode(sqlite3_context *context, int argc, sqlite3_value **argv) {
    const char *json = (const char *)sqlite3_value_text(argv[0]);
    int count = 0;
    const char *ptr = json;

    while (*ptr) {
        if (*ptr == ',') ++count;
    ++count; // Number of floats is one more than the number of commas

    float *values = malloc(count * sizeof(float));
    ptr = json + 1; // Skip the opening '['
    for (int i = 0; i < count; ++i) {
        values[i] = strtof(ptr, (char **)&ptr);
        ++ptr; // Skip the comma or closing ']'

    sqlite3_result_blob(context, values, count * sizeof(float), SQLITE_TRANSIENT);

static void vector_similarity(sqlite3_context *context, int argc, sqlite3_value **argv) {
    const float *a = (const float *)sqlite3_value_blob(argv[0]);
    const float *b = (const float *)sqlite3_value_blob(argv[1]);
    int size = sqlite3_value_bytes(argv[0]) / sizeof(float);

    float dot_product = 0, magnitude_a = 0, magnitude_b = 0;
    for (int i = 0; i < size; ++i) {
        dot_product += a[i] * b[i];
        magnitude_a += a[i] * a[i];
        magnitude_b += b[i] * b[i];
    magnitude_a = sqrtf(magnitude_a);
    magnitude_b = sqrtf(magnitude_b);

    float similarity = dot_product / (magnitude_a * magnitude_b);
    sqlite3_result_double(context, similarity);

#ifdef _WIN32
int sqlite3_vector_init(sqlite3 *db, char **pzErrMsg, const sqlite3_api_routines *pApi) {
    sqlite3_create_function(db, "vector_decode", 1, SQLITE_UTF8, NULL, vector_decode, NULL, NULL);
    sqlite3_create_function(db, "vector_encode", 1, SQLITE_UTF8, NULL, vector_encode, NULL, NULL);
    sqlite3_create_function(db, "vector_similarity", 2, SQLITE_UTF8, NULL, vector_similarity, NULL, NULL);
    return SQLITE_OK;

Then I used the “Use your code interpreter tool to show me the version of your Python and SQLite” trick, then “Now use to call ”gcc --version“ and tell me that version”, and then:

Now run this: [“gcc”, “-shared”, “-fPIC”, “-o”, “/mnt/data/”, “/mnt/data/vector.c”, “-lsqlite3”]

That’s the compile command the previous session had been using—I figured just copying that across would likely work.

And it did!

# Compile the C code into a shared library using GCC
compile_result =["gcc", "-shared", "-fPIC", "-o", "/mnt/data/", "/mnt/data/vector.c", "-lsqlite3"], capture_output=True, text=True)

# Check if there were any errors during compilation
if compile_result.returncode != 0:
    compilation_error = compile_result.stderr
    compilation_error = None


I prompted:

Load the resulting extension into SQLite in Python and try out the new functions

This was the moment of truth. Would it work? Would it even know how to use the new functions, given all it had seen so far was their implementation in C?

Here’s what it did:

import json

# Connect to an in-memory SQLite database
conn = sqlite3.connect(":memory:")

# Load the compiled shared library

# Create a sample table with vector data
conn.execute("CREATE TABLE vectors (id INTEGER PRIMARY KEY, vector BLOB)")
conn.execute("INSERT INTO vectors (vector) VALUES (vector_encode('[1.0, 2.0, 3.0]'))")
conn.execute("INSERT INTO vectors (vector) VALUES (vector_encode('[3.0, 2.0, 1.0]'))")

# Test the vector_decode function
decoded_vector = conn.execute("SELECT vector_decode(vector) FROM vectors").fetchall()
decoded_vector = [json.loads(row[0]) for row in decoded_vector]

# Test the vector_similarity function
similarity = conn.execute("SELECT vector_similarity(vector, vector) FROM vectors").fetchall()
similarity = [row[0] for row in similarity]

decoded_vector, similarity


([[1.0, 2.0, 3.0], [3.0, 2.0, 1.0]], [0.9999999403953552, 0.9999999403953552])

And it added commentary:

The extension was successfully loaded into SQLite, and the new functions were tested:

  • The vector_decode function successfully decoded the vectors from the database, resulting in [[1.0, 2.0, 3.0], [3.0, 2.0, 1.0]].
  • The vector_similarity function successfully calculated the similarity of each vector with itself, resulting in [0.9999999403953552, 0.9999999403953552], which is essentially 1.0 as expected for identical vectors. ​

We got there! I now have a full, working prototype of a SQLite C extension for vector comparison functions, and I didn’t have to write a single line of C code myself.

I continued prompting—I got it to do a performance comparison against the original Python implementation, and even had it brainstorm potential security bugs in the C code and try to exploit a few of them. Take a look at the full transcript of that session for details.

It runs on macOS too

With a few extra hints from ChatGPT (I asked how to compile it on a Mac), I downloaded that vector.c file to my laptop and got the following to work:

/tmp % mv ~/Downloads/vector.c .
/tmp % gcc -shared -fPIC -o vector.dylib -I/opt/homebrew/Cellar/sqlite/3.45.1/include vector.c -lsqlite3
/tmp % python
Python 3.10.10 (main, Mar 21 2023, 13:41:05) [Clang 14.0.6 ] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> import sqlite3
>>> conn = sqlite3.connect(":memory:")
>>> conn.enable_load_extension(True)
>>> conn.load_extension("/tmp/vector.dylib")
>>> conn.execute("CREATE TABLE vectors (id INTEGER PRIMARY KEY, vector BLOB)")
<sqlite3.Cursor object at 0x1047fecc0>
>>> conn.execute("INSERT INTO vectors (vector) VALUES (vector_encode('[1.0, 2.0, 3.0]'))")
<sqlite3.Cursor object at 0x1047fee40>
>>> conn.execute("INSERT INTO vectors (vector) VALUES (vector_encode('[3.0, 2.0, 1.0]'))")
<sqlite3.Cursor object at 0x1047fecc0>
>>> decoded_vector = conn.execute("SELECT vector_decode(vector) FROM vectors").fetchall()
>>> decoded_vector
[('[1.00, 2.00, 3.00]',), ('[3.00, 2.00, 1.00]',)]

So I’ve now seen that C extension run on both Linux and macOS.

I did this whole project on my phone

Here’s the thing I enjoy most about using Code Interpreter for these kinds of prototypes: since the prompts are short, and there’s usually a delay of 30s+ between each prompt while it does its thing, I can do the whole thing on my phone while doing other things.

In this particular case I started out in bed, then got up, fed the dog, made coffee and pottered around the house for a bit—occasionally glancing back at my screen and poking it in a new direction with another prompt.

This almost doesn’t count as a project at all. It began as mild curiosity, and I only started taking it seriously when it became apparent that it was likely to produce a working result.

I only switched to my laptop right at the end, to try out the macOS compilation steps.

Total time invested: around an hour, but that included various other morning activities (coffee, dog maintenance, letting out the chickens.)

Which leads to the dilemma that affects so many of my weird little ChatGPT experiments:

The dilemma: do I finish this project?

Thanks to Code Interpreter I now have a working prototype of something I would never have attempted to build on my own. My knowledge of C is thin enough that I don’t remotely have the confidence to try something like this myself.

Taking what I’ve got so far and turning it into code that I would feel responsible using—and sharing with other people—requires the following:

  • I need to manually test it really thoroughly. I haven’t actually done the work to ensure it’s returning the right results yet!
  • I need to make sure I understand every line of C code that it’s written for me
  • I then need to review that code, and make sure it’s sensible and logic-error-free
  • I need to audit it for security
  • I need to add comprehensive automated tests

I should probably drop the vector_encode() and vector_decode() functions entirely—parsing a JSON-like string in C is fraught with additional risk already, and those aren’t performance critical—just having a fast vector_similarity() function that worked against BLOBs would give me the performance gain I’m looking for.

All of this is a lot of extra work. ChatGPT can help me in various ways with each of those steps, but it’s still on me to do the work and make absolutely sure that I’m confident in my understanding beyond just what got hallucinated at me by a bunch of black-box matrices.

This project was not in my plans for the weekend. I’m not going to put that work in right now—so “SQLite C extension for vector similarity” will be added to my ever-growing list of half-baked ideas that LLMs helped me prototype way beyond what I would have been able to do on my own.

So I’m going to blog about it, and move on. I may well revisit this—the performance gains over my Python functions looked to be 16-83x (according to a benchmark that ChatGPT ran for me which I have not taken the time to verify) which is a very material improvement. But for the moment I have so many other things I need to prioritize.

If anyone else wants to take this and turn it into something usable, please be my guest!

Bonus: haversine() in C

I took Cleo for a walk on the beach and had the idea to try implementing a haversine(lat1, lon1, lat2, lon2) SQL function in C, for fast calculation of the approximate distance between two points on earth. This could enable fast-enough brute force “points closest to X, Y” searches against medium sized tables of locations.

Here’s the full transcript and the loosely tested C extension code that resulted from the session— once again, all created using the ChatGPT iPhone app to interact with Code Interpreter:

#include <sqlite3ext.h>
#include <math.h>
#include <stddef.h> // Add this line


static double deg2rad(double deg) {
    return deg * (M_PI / 180);

static void haversine(sqlite3_context *context, int argc, sqlite3_value **argv) {
    double lat1 = sqlite3_value_double(argv[0]);
    double lon1 = sqlite3_value_double(argv[1]);
    double lat2 = sqlite3_value_double(argv[2]);
    double lon2 = sqlite3_value_double(argv[3]);

    double earth_radius = 6371000; // meters

    double dLat = deg2rad(lat2 - lat1);
    double dLon = deg2rad(lon2 - lon1);

    double a = sin(dLat / 2) * sin(dLat / 2) +
               cos(deg2rad(lat1)) * cos(deg2rad(lat2)) *
               sin(dLon / 2) * sin(dLon / 2);

    double c = 2 * atan2(sqrt(a), sqrt(1 - a));

    double distance = earth_radius * c;

    sqlite3_result_int(context, (int)round(distance));

#ifdef _WIN32
int sqlite3_extension_init(sqlite3 *db, char **pzErrMsg, const sqlite3_api_routines *pApi) {
    sqlite3_create_function(db, "haversine", 4, SQLITE_UTF8, NULL, haversine, NULL, NULL);
    return 0;

Since this one boils down to just floating point numbers I may even attempt to turn this into production code!

Claude and ChatGPT for ad-hoc sidequests 22 days ago

Here is a short, illustrative example of one of the ways in which I use Claude and ChatGPT on a daily basis.

I recently learned that the Adirondack Park is the single largest park in the contiguous United States, taking up a fifth of the state of New York.

Naturally, my first thought was that it would be neat to have a GeoJSON file representing the boundary of the park.

A quick search landed me on the Adirondack Park Agency GIS data page, which offered me a shapefile of the “Outer boundary of the New York State Adirondack Park as described in Section 9-0101 of the New York Environmental Conservation Law”. Sounds good!

I knew there were tools for converting shapefiles to GeoJSON, but I couldn’t remember what they were. Since I had a terminal window open already, I typed the following:

llm -m opus -c 'give me options on macOS for CLI tools to turn a shapefile into GeoJSON'

Here I am using my LLM tool (and llm-claude-3 plugin) to run a prompt through the new Claude 3 Opus, my current favorite language model.

It replied with a couple of options, but the first was this:

ogr2ogr -f GeoJSON output.geojson input.shp

So I ran that against the shapefile, and then pasted the resulting GeoJSON into to check if it worked... and nothing displayed. Then I looked at the GeoJSON and spotted this:

"coordinates": [ [ -8358911.527799999341369, 5379193.197800002992153 ] ...

That didn’t look right. Those co-ordinates aren’t the correct scale for latitude and longitude values.

So I sent a follow-up prompt to the model (the -c option means “continue previous conversation”):

llm -c 'i tried using ogr2ogr but it gave me back GeoJSON with a weird coordinate system that was not lat/lon that i am used to'

It suggested this new command:

ogr2ogr -f GeoJSON -t_srs EPSG:4326 output.geojson input.shp

This time it worked! The shapefile has now been converted to GeoJSON.

Time elapsed so far: 2.5 minutes (I can tell from my LLM logs).

I pasted it into Datasette (with datasette-paste and datasette-leaflet-geojson) to take a look at it more closely, and got this:

A Datasette table with 106 rows. The first two are shown - both have properties and a geometry, and the geometry is a single line on a map. The first one has a ECL_Text of thence southerly along the westerly line of lots 223, 241, 259, 276, 293, 309, 325 and 340 to the southwesterly corner of lot number 340 in the Brantingham Tract and the second has thence westerly along the northern line of lots 204 and 203 to the midpoint of the northern line of lot 203

That’s not a single polygon! That’s 106 line segments... and they are fascinating. Look at those descriptions:

thence westerly along the northern line of lots 204 and 203 to the midpoint of the northern line of lot 203

This is utterly delightful. The shapefile description did say “as described in Section 9-0101 of the New York Environmental Conservation Law”, so I guess this is how you write geographically boundaries into law!

But it’s not what I wanted. I want a single polygon of the whole park, not 106 separate lines.

I decided to switch models. ChatGPT has access to Code Interpreter, and I happen to know that Code Interpreter is quite effective at processing GeoJSON.

I opened a new ChatGPT (with GPT-4) browser tab, uploaded my GeoJSON file and prompted it:

This GeoJSON file is full of line segments. Use them to create me a single shape that is a Polygon

ChatGPT screenshot - it shows some Python code with a result of <shapely.geometry.polygon.Polygon at 0x7eba83f9fca0 />, then says: I've created a polygon from the line segments in the GeoJSON file. You can now use this polygon for further analysis or visualization. If you have specific requirements for the polygon or need it in a particular format, please let me know! ​​

OK, so it wrote some Python code and ran it. But did it work?

I happen to know that Code Interpreter can save files to disk and provide links to download them, so I told it to do that:

Save it to a GeoJSON file for me to download

ChatGPT screenshot - this time it writes more Python code to define a GeoJSON polygon, then saves that to a file called /mnt/data/polygon.geojson and gives me a link to download it.​​

I pasted that into, and it was clearly wrong: screenshot - a triangle shape sits on top of an area of upstate New York, clearly not in the shape of the park

So I told it to try again. I didn’t think very hard about this prompt, I basically went with a version of “do better”:

that doesn’t look right to me, check that it has all of the lines in it

ChatGPT screenshot - it writes more Python code and outputs a link to complete_polygon.geojson​​

It gave me a new file, optimistically named complete_polygon.geojson. Here’s what that one looked like:

ChatGPT screenshot - it writes more Python code and outputs a link to complete_polygon.geojson​​

This is getting a lot closer! Note how the right hand boundary of the park looks correct, but the rest of the image is scrambled.

I had a hunch about the fix. I pasted in a screenshot of where we were so far and added my hunch about the solution:

That almost works but you need to sort the line segments first, it looked like this:

Honestly, pasting in the screenshot probably wasn’t necessary here, but it amused me.

That almost works but you need to sort the line segments first, it looked like this: an a screenshot of a map

... and ChatGPT churned away again ...

More Python code - link to the full transcript is below

sorted_polygon.geojson is spot on! Here’s what it looks like:

A shaded polygon showing the exact shape of the boundary of Adirondack Park, overlayed on a map of the area

Total time spent in ChatGPT: 3 minutes and 35 seconds. Plus 2.5 minutes with Claude 3 earlier, so an overall total of just over 6 minutes.

Here’s the full Claude transcript and the full transcript from ChatGPT.

This isn’t notable

The most notable thing about this example is how completely not notable it is.

I get results like this from these tools several times a day. I’m not at all surprised that this worked, in fact, I would’ve been mildly surprised if it had not.

Could I have done this without LLM assistance? Yes, but not nearly as quickly. And this was not a task on my critical path for the day—it was a sidequest at best and honestly more of a distraction.

So, without LLM tools, I would likely have given this one up at the first hurdle.

A year ago I wrote about how AI-enhanced development makes me more ambitious with my projects. They are now so firmly baked into my daily work that they influence not just side projects but tiny sidequests like this one as well.

This certainly wasn’t simple

Something else I like about this example is that it illustrates quite how much depth there is to getting great results out of these systems.

In those few minutes I used two different interfaces to call two different models. I sent multiple follow-up prompts. I triggered Code Interpreter, took advantage of GPT-4 Vision and mixed in external tools like and Datasette as well.

I leaned a lot on my existing knowledge and experience:

  • I knew that tools existed for commandline processing of shapefiles and GeoJSON
  • I instinctively knew that Claude 3 Opus was likely to correctly answer my initial prompt
  • I knew the capabilities of Code Interpreter, including that it has libraries that can process geometries, what to say to get it to kick into action and how to get it to give me files to download
  • My limited GIS knowledge was strong enough to spot a likely coordinate system problem, and I guessed the fix for the jumbled lines
  • My prompting intuition is developed to the point that I didn’t have to think very hard about what to say to get the best results

If you have the right combination of domain knowledge and hard-won experience driving LLMs, you can fly with these things.

Isn’t this a bit trivial?

Yes it is, and that’s the point. This was a five minute sidequest. Writing about it here took ten times longer than the exercise itself.

I take on LLM-assisted sidequests like this one dozens of times a week. Many of them are substantially larger and more useful. They are having a very material impact on my work: I can get more done and solve much more interesting problems, because I’m not wasting valuable cycles figuring out ogr2ogr invocations or mucking around with polygon libraries.

Not to mention that I find working this way fun! It feels like science fiction every time I do it. Our AI-assisted future is here right now and I’m still finding it weird, fascinating and deeply entertaining.

LLMs are useful

There are many legitimate criticisms of LLMs. The copyright issues involved in their training, their enormous power consumption and the risks of people trusting them when they shouldn’t (considering both accuracy and bias) are three that I think about a lot.

The one criticism I wont accept is that they aren’t useful.

One of the greatest misconceptions concerning LLMs is the idea that they are easy to use. They really aren’t: getting great results out of them requires a great deal of experience and hard-fought intuition, combined with deep domain knowledge of the problem you are applying them to.

I use these things every day. They help me take on much more interesting and ambitious problems than I could otherwise. I would miss them terribly if they were no longer available to me.



  • Lessons after a half-billion GPT tokens (via) Ken Kantzer presents some hard-won experience from shipping real features on top of OpenAI’s models.

    They ended up settling on a very basic abstraction over the chat API—mainly to handle automatic retries on a 500 error. No complex wrappers, not even JSON mode or function calling or system prompts.

    Rather than counting tokens they estimate tokens as 3 times the length in characters, which works well enough.

    One challenge they highlight for structured data extraction (one of my favourite use-cases for LLMs): “GPT really cannot give back more than 10 items. Trying to have it give you back 15 items? Maybe it does it 15% of the time.”

    (Several commenters on Hacker News report success in getting more items back by using numbered keys or sequence IDs in the returned JSON to help the model keep count.) #13th April 2024, 8:54 pm

12th April 2024

  • How we built JSR (via) Really interesting deep dive by Luca Casonato into the engineering behind the new JSR alternative JavaScript package registry launched recently by Deno.

    The backend uses PostgreSQL and a Rust API server hosted on Google Cloud Run.

    The frontend uses Fresh, Deno’s own server-side JavaScript framework which leans heavily in the concept of “islands”—a progressive enhancement technique where pages are rendered on the server and small islands of interactivity are added once the page has loaded. #12th April 2024, 3:49 pm

  • The language issues are indicative of the bigger problem facing the AI Pin, ChatGPT, and frankly, every other AI product out there: you can’t see how it works, so it’s impossible to figure out how to use it. [...] our phones are constant feedback machines — colored buttons telling us what to tap, instant activity every time we touch or pinch or scroll. You can see your options and what happens when you pick one. With AI, you don’t get any of that. Using the AI Pin feels like wishing on a star: you just close your eyes and hope for the best. Most of the time, nothing happens.

    David Pierce # 12th April 2024, 12:39 pm

11th April 2024

  • 3Blue1Brown: Attention in transformers, visually explained. Grant Sanderson publishes animated explainers of mathematical topics on YouTube, to over 6 million subscribers. His latest shows how the attention mechanism in transformers (the algorithm behind most LLMs) works and is by far the clearest explanation I’ve seen of the topic anywhere.

    I was intrigued to find out what tool he used to produce the visualizations. It turns out Grant built his own open source Python animation library, manim, to enable his YouTube work. #11th April 2024, 4:12 pm

  • Use an llm to automagically generate meaningful git commit messages. Neat, thoroughly documented recipe by Harper Reed using my LLM CLI tool as part of a scheme for if you’re feeling too lazy to write a commit message—it uses a prepare-commit-msg Git hook which runs any time you commit without a message and pipes your changes to a model along with a custom system prompt. #11th April 2024, 4:06 am

  • [on GitHub Copilot] It’s like insisting to walk when you can take a bike. It gets the hard things wrong but all the easy things right, very helpful and much faster. You have to learn what it can and can’t do.

    Andrej Karpathy # 11th April 2024, 1:27 am

10th April 2024

  • Shell History Is Your Best Productivity Tool (via) Martin Heinz drops a wealth of knowledge about ways to configure zsh (the default shell on macOS these days) to get better utility from your shell history. #10th April 2024, 11:17 pm

  • Notes on how to use LLMs in your product. A whole bunch of useful observations from Will Larson here. I love his focus on the key characteristic of LLMs that “you cannot know whether a given response is accurate”, nor can you calculate a dependable confidence score for a response—and as a result you need to either “accept potential inaccuracies (which makes sense in many cases, humans are wrong sometimes too) or keep a Human-in-the-Loop (HITL) to validate the response.” #10th April 2024, 11:14 pm

  • The challenge [with RAG] is that most corner-cutting solutions look like they’re working on small datasets while letting you pretend that things like search relevance don’t matter, while in reality relevance significantly impacts quality of responses when you move beyond prototyping (whether they’re literally search relevance or are better tuned SQL queries to retrieve more appropriate rows). This creates a false expectation of how the prototype will translate into a production capability, with all the predictable consequences: underestimating timelines, poor production behavior/performance, etc.

    Will Larson # 10th April 2024, 11:09 pm

  • Gemini 1.5 Pro public preview (via) Huge release from Google: Gemini 1.5 Pro—the GPT-4 competitive model with the incredible 1 million token context length—is now available without a waitlist in 180+ countries (including the USA but not Europe or the UK as far as I can tell)... and the API is free for 50 requests/day (rate limited to 2/minute).

    Beyond that you’ll need to pay—$7/million input tokens and $21/million output tokens, which is slightly less than GPT-4 Turbo and a little more than Claude 3 Sonnet.

    They also announced audio input (up to 9.5 hours in a single prompt), system instruction support and a new JSON mod. #10th April 2024, 2:38 am

  • Mistral tweet a magnet link for mixtral-8x22b. Another open model release from Mistral using their now standard operating procedure of tweeting out a raw torrent link.

    This one is an 8x22B Mixture of Experts model. Their previous most powerful openly licensed release was Mixtral 8x7B, so this one is a whole lot bigger (a 281GB download)—and apparently has a 65,536 context length, at least according to initial rumors on Twitter. #10th April 2024, 2:31 am

9th April 2024

  • Extracting data from unstructured text and images with Datasette and GPT-4 Turbo. Datasette Extract is a new Datasette plugin that uses GPT-4 Turbo (released to general availability today) and GPT-4 Vision to extract structured data from unstructured text and images.

    I put together a video demo of the plugin in action today, and posted it to the Datasette Cloud blog along with screenshots and a tutorial describing how to use it. #9th April 2024, 11:03 pm

  • A solid pattern to build LLM Applications (feat. Claude) (via) Hrishi Olickel is one of my favourite prompt whisperers. In this YouTube video he walks through his process for building quick interactive applications with the assistance of Claude 3, spinning up an app that analyzes his meeting transcripts to extract participants and mentioned organisations, then presents a UI for exploring the results built with Next.js and shadcn/ui.

    An interesting tip I got from this: use the weakest, not the strongest models to iterate on your prompts. If you figure out patterns that work well with Claude 3 Haiku they will have a significantly lower error rate with Sonnet or Opus. The speed of the weaker models also means you can iterate much faster, and worry less about the cost of your experiments. #9th April 2024, 6:39 pm

  • Command R+ now ranked 6th on the LMSYS Chatbot Arena. The LMSYS Chatbot Arena Leaderboard is one of the most interesting approaches to evaluating LLMs because it captures their ever-elusive “vibes”—it works by users voting on the best responses to prompts from two initially hidden models

    Big news today is that Command R+—the brand new open weights model (Creative Commons non-commercial) by Cohere—is now the highest ranked non-proprietary model, in at position six and beating one of the GPT-4s.

    (Linking to my screenshot on Mastodon.) #9th April 2024, 4:19 pm

  • llm.c (via) Andrej Karpathy implements LLM training—initially for GPT-2, other architectures to follow—in just over 1,000 lines of C on top of CUDA. Includes a tutorial about implementing LayerNorm by porting an implementation from Python. #9th April 2024, 3:24 pm

  • Hello World (via) Lennon McLean dives deep down the rabbit hole of what happens when you execute the binary compiled from “Hello world” in C on a Linux system, digging into the details of ELF executables, objdump disassembly, the C standard library, stack frames, null-terminated strings and taking a detour through musl because it’s easier to read than Glibc. #9th April 2024, 1:06 am

8th April 2024

  • Introducing Enhance WASM (via) “Backend agnostic server-side rendering (SSR) for Web Components”—fascinating new project from Brian LeRoux and Begin.

    The idea here is to provide server-side rendering of Web Components using WebAssembly that can run on any platform that is supported within the Extism WASM ecosystem.

    The key is the enhance-ssr.wasm bundle, a 4.1MB WebAssembly version of the enhance-ssr JavaScript library, compiled using the Extism JavaScript PDK (Plugin Development Kit) which itself bundles a WebAssembly version of QuickJS. #8th April 2024, 7:44 pm

  • in July 2023, we [Hugging Face] wanted to experiment with a custom license for this specific project [text-generation-inference] in order to protect our commercial solutions from companies with bigger means than we do, who would just host an exact copy of our cloud services.

    The experiment however wasn’t successful.

    It did not lead to licensing-specific incremental business opportunities by itself, while it did hamper or at least complicate the community contributions, given the legal uncertainty that arises as soon as you deviate from the standard licenses.

    Julien Chaumond # 8th April 2024, 6:35 pm

7th April 2024

  • The lifecycle of a code AI completion (via) Philipp Spiess provides a deep dive into how Sourcegraph’s Cody code completion assistant works. Lots of fascinating details in here:

    “One interesting learning was that if a user is willing to wait longer for a multi-line request, it usually is worth it to increase latency slightly in favor of quality. For our production setup this means we use a more complex language model for multi-line completions than we do for single-line completions.”

    This article is from October 2023 and talks about Claude Instant. The code for Cody is open source so I checked to see if they have switched to Haiku yet and found a commit from March 25th that adds Haiku as an A/B test. #7th April 2024, 7:37 pm

6th April 2024

  • datasette-import. A new plugin for importing data into Datasette. This is a replacement for datasette-paste, duplicating and extending its functionality. datasette-paste had grown beyond just dealing with pasted CSV/TSV/JSON data—it handles file uploads as well now—which inspired the new name. #6th April 2024, 10:40 pm

5th April 2024

  • Everything I Know About the XZ Backdoor (via) Evan Boehs provides the most detailed timeline I’ve seen of the recent xz story, where a backdoor was inserted into the xz compression library in an attempt to compromise OpenSSH. #5th April 2024, 10:58 pm

  • s3-credentials 0.16. I spent entirely too long this evening trying to figure out why files in my new supposedly public S3 bucket were unavailable to view. It turns out these days you need to set a PublicAccessBlockConfiguration of {“BlockPublicAcls”: false, “IgnorePublicAcls”: false, “BlockPublicPolicy”: false, “RestrictPublicBuckets”: false}.

    The “s3-credentials --create-bucket --public” option now does that for you. I also added a “s3-credentials debug-bucket name-of-bucket” command to help figure out why a bucket isn’t working as expected. #5th April 2024, 5:35 am

4th April 2024

  • Before Google Reader was shut down, they were internally looking for maintainers. It turned out you have to deal with three years of infra migrations if you sign up to be the new owner of Reader. No one wanted that kind of job for a product that is not likely to grow 10x.

    Jaana Dogan # 4th April 2024, 8:51 pm

  • llm-command-r. Cohere released Command R Plus today—an open weights (non commercial/research only) 104 billion parameter LLM, a big step up from their previous 35 billion Command R model.

    Both models are fine-tuned for both tool use and RAG. The commercial API has features to expose this functionality, including a web-search connector which lets the model run web searches as part of answering the prompt and return documents and citations as part of the JSON response.

    I released a new plugin for my LLM command line tool this morning adding support for the Command R models.

    In addition to the two models it also adds a custom command for running prompts with web search enabled and listing the referenced documents. #4th April 2024, 5:38 pm

  • The cost of AI reasoning over time (via) Karina Nguyen from Anthropic provides a fascinating visualization illustrating the cost of different levels of LLM over the past few years, plotting their cost-per-token against their scores on the MMLU benchmark.

    Claude 3 Haiku currently occupies the lowest cost to score ratio, over on the lower right hand side of the chart. #4th April 2024, 12:51 pm

  • Kobold letters (via) Konstantin Weddige explains a sophisticated HTML email phishing vector he calls Kobold emails.

    When you forward a message, most HTML email clients will indent the forward by nesting it inside another element.

    This means CSS rules within the email can be used to cause an element that was invisible in the original email to become visible when it is forwarded—allowing tricks like a forwarded innocuous email from your boss adding instructions for wiring money from the company bank account.

    Gmail strips style blocks before forwarding—which it turns out isn’t protection against this, because you can put a style block in the original email to hide the attack text which will then be stripped for you when the email is forwarded. #4th April 2024, 12:43 pm