Simon Willison on tools

50 posts tagged “tools”

Small tools I have built, many of which are hosted on my tools.simonwillison.net GitHub Pages site backed by my simonw/tools repository.

2026

Unicode Explorer using binary search over fetch() HTTP range requests. Here's a little prototype I built this morning from my phone as an experiment in HTTP range requests, and a general example of using LLMs to satisfy curiosity.

I've been collecting HTTP range tricks for a while now, and I decided it would be fun to build something with them myself that used binary search against a large file to do something useful.

So I brainstormed with Claude. The challenge was coming up with a use case for binary search where the data could be naturally sorted in a way that would benefit from binary search.

One of Claude's suggestions was looking up information about unicode codepoints, which means searching through many MBs of metadata.

I had Claude write me a spec to feed to Claude Code - visible here - then kicked off an asynchronous research project with Claude Code for web against my simonw/research repo to turn that into working code.

Here's the resulting report and code. One interesting thing I learned is that Range request tricks aren't compatible with HTTP compression because they mess with the byte offset calculations. I added 'Accept-Encoding': 'identity' to the fetch() calls but this isn't actually necessary because Cloudflare and other CDNs automatically skip compression if a content-range header is present.

I deployed the result to my tools.simonwillison.net site, after first tweaking it to query the data via range requests against a CORS-enabled 76.6MB file in an S3 bucket fronted by Cloudflare.

The demo is fun to play with - type in a single character like ø or a hexadecimal codepoint indicator like 1F99C and it will binary search its way through the large file and show you the steps it takes along the way:

Animated demo of a web tool called Unicode Explore. I enter the ampersand character and hit Search. A box below shows a sequence of HTTP binary search requests made, finding in 17 steps with 3,864 bytes transferred and telling me that ampersand is U+0026 in Punctuation other, Basic Latin

# 27th February 2026, 5:50 pm / algorithms, http, research, tools, unicode, ai, generative-ai, llms, ai-assisted-programming, vibe-coding, http-range-requests

2025

TIL: Downloading archived Git repositories from archive.softwareheritage.org (via) Back in February I blogged about a neat Python library called sqlite-s3vfs for accessing SQLite databases hosted in an S3 bucket, released as MIT licensed open source by the UK government's Department for Business and Trade.

I went looking for it today and found that the github.com/uktrade/sqlite-s3vfs repository is now a 404.

Since this is taxpayer-funded open source software I saw it as my moral duty to try and restore access! It turns out a full copy had been captured by the Software Heritage archive, so I was able to restore the repository from there. My copy is now archived at simonw/sqlite-s3vfs.

The process for retrieving an archive was non-obvious, so I've written up a TIL and also published a new Software Heritage Repository Retriever tool which takes advantage of the CORS-enabled APIs provided by Software Heritage. Here's the Claude Code transcript from building that.

# 30th December 2025, 11:51 pm / archives, git, github, open-source, tools, ai, til, generative-ai, llms, ai-assisted-programming, claude-code

Cooking with Claude

I’ve been having an absurd amount of fun recently using LLMs for cooking. I started out using them for basic recipes, but as I’ve grown more confident in their culinary abilities I’ve leaned into them for more advanced tasks. Today I tried something new: having Claude vibe-code up a custom application to help with the timing for a complicated meal preparation. It worked really well!

[... 1,313 words]

5:01 am / 23rd December 2025 / cooking, devfort, localstorage, tools, ai, generative-ai, llms, anthropic, claude, vision-llms, vibe-coding

Useful patterns for building HTML tools

I’ve started using the term HTML tools to refer to HTML applications that I’ve been building which combine HTML, JavaScript, and CSS in a single file and use them to provide useful functionality. I have built over 150 of these in the past two years, almost all of them written by LLMs. This article presents a collection of useful patterns I’ve discovered along the way.

[... 4,231 words]

9 pm / 10th December 2025 / definitions, github, html, javascript, localstorage, projects, tools, ai, webassembly, generative-ai, llms, ai-assisted-programming, vibe-coding, coding-agents, claude-code

Bluesky Thread Viewer thread by @simonwillison.net. I've been having a lot of fun hacking on my Bluesky Thread Viewer JavaScript tool with Claude Code recently. Here it renders a thread (complete with demo video) talking about the latest improvements to the tool itself.

I've been mostly vibe-coding this thing since April, now spanning 15 commits with contributions from ChatGPT, Claude, Claude Code for Web and Claude Code on my laptop. Each of those commits links to the transcript that created the changes in the commit.

Bluesky is a lot of fun to build tools like this against because the API supports CORS (so you can talk to it from an HTML+JavaScript page hosted anywhere) and doesn't require authentication.

# 28th November 2025, 11:57 pm / projects, tools, ai, generative-ai, llms, cors, bluesky, vibe-coding, coding-agents, claude-code

Video: Building a tool to copy-paste share terminal sessions using Claude Code for web

This afternoon I was manually converting a terminal session into a shared HTML file for the umpteenth time when I decided to reduce the friction by building a custom tool for it—and on the spur of the moment I fired up Descript to record the process. The result is this new 11 minute YouTube video showing my workflow for vibe-coding simple tools from start to finish.

[... 1,338 words]

4:14 am / 23rd October 2025 / github, tools, youtube, ai, cloudflare, generative-ai, llms, ai-assisted-programming, anthropic, claude, vibe-coding, coding-agents, claude-code, async-coding-agents

SLOCCount in WebAssembly. This project/side-quest got a little bit out of hand.

I remembered an old tool called SLOCCount which could count lines of code and produce an estimate for how much they would cost to develop. I thought it would be fun to play around with it again, especially given how cheap it is to generate code using LLMs these days.

Here's the homepage for SLOCCount by David A. Wheeler. It dates back to 2001!

I figured it might be fun to try and get it running on the web. Surely someone had compiled Perl to WebAssembly...?

WebPerl by Hauke Dämpfling is exactly that, even adding a neat <script type="text/perl"> tag.

I told Claude Code for web on my iPhone to figure it out and build something, giving it some hints from my initial research:

Build sloccount.html - a mobile friendly UI for running the Perl sloccount tool against pasted code or against a GitHub repository that is provided in a form field

It works using the webperl webassembly build of Perl, plus it loads Perl code from this exact commit of this GitHub repository https://github.com/licquia/sloccount/tree/7220ff627334a8f646617fe0fa542d401fb5287e - I guess via the GitHub API, maybe using the https://github.com/licquia/sloccount/archive/7220ff627334a8f646617fe0fa542d401fb5287e.zip URL if that works via CORS

Test it with playwright Python - don’t edit any file other than sloccount.html and a tests/test_sloccount.py file

Since I was working on my phone I didn't review the results at all. It seemed to work so I deployed it to static hosting... and then when I went to look at it properly later on found that Claude had given up, cheated and reimplemented it in JavaScript instead!

So I switched to Claude Code on my laptop where I have more control and coached Claude through implementing the project for real. This took way longer than the project deserved - probably a solid hour of my active time, spread out across the morning.

I've shared some of the transcripts - one, two, and three - as terminal sessions rendered to HTML using my rtf-to-html tool.

At one point I realized that the original SLOCCount project wasn't even entirely Perl as I had assumed, it included several C utilities! So I had Claude Code figure out how to compile those to WebAssembly (it used Emscripten) and incorporate those into the project (with notes on what it did.)

The end result (source code here) is actually pretty cool. It's a web UI with three tabs - one for pasting in code, a second for loading code from a GitHub repository and a third that lets you open a Zip file full of code that you want to analyze. Here's an animated demo:

I enter simonw/llm in the GitHub repository field. It loads 41 files from GitHub and displays a report showing the number of lines and estimated cost.

The cost estimates it produces are of very little value. By default it uses the original method from 2001. You can also twiddle the factors - bumping up the expected US software engineer's annual salary from its 2000 estimate of $56,286 is a good start!

I had ChatGPT take a guess at what those figures should be for today and included those in the tool, with a very prominent warning not to trust them in the slightest.

# 22nd October 2025, 6:12 am / javascript, perl, projects, tools, ai, webassembly, generative-ai, llms, ai-assisted-programming, vibe-coding, claude-code

gpt-image-1-mini. OpenAI released a new image model today: gpt-image-1-mini, which they describe as "A smaller image generation model that’s 80% less expensive than the large model."

They released it very quietly - I didn't hear about this in the DevDay keynote but I later spotted it on the DevDay 2025 announcements page.

It wasn't instantly obvious to me how to use this via their API. I ended up vibe coding a Python CLI tool for it so I could try it out.

I dumped the plain text diff version of the commit to the OpenAI Python library titled feat(api): dev day 2025 launches into ChatGPT GPT-5 Thinking and worked with it to figure out how to use the new image model and build a script for it. Here's the transcript and the the openai_image.py script it wrote.

I had it add inline script dependencies, so you can run it with uv like this:

export OPENAI_API_KEY="$(llm keys get openai)"
uv run https://tools.simonwillison.net/python/openai_image.py "A pelican riding a bicycle"

It picked this illustration style without me specifying it:

A nice illustration of a pelican riding a bicycle, both pelican and bicycle are exactly as you would hope. Looks sketched, maybe colored pencils? The pelican's two legs are on the pedals but it also has a weird sort of paw on an arm on the handlebars.

(This is a very different test from my normal "Generate an SVG of a pelican riding a bicycle" since it's using a dedicated image generator, not having a text-based model try to generate SVG code.)

My tool accepts a prompt, and optionally a filename (if you don't provide one it saves to a filename like /tmp/image-621b29.png).

It also accepts options for model and dimensions and output quality - the --help output lists those, you can see that here.

OpenAI's pricing is a little confusing. The model page claims low quality images should cost around half a cent and medium quality around a cent and a half. It also lists an image token price of $8/million tokens. It turns out there's a default "high" quality setting - most of the images I've generated have reported between 4,000 and 6,000 output tokens, which costs between 3.2 and 4.8 cents.

One last demo, this time using --quality low:

 uv run https://tools.simonwillison.net/python/openai_image.py \
  'racoon eating cheese wearing a top hat, realistic photo' \
  /tmp/racoon-hat-photo.jpg \
  --size 1024x1024 \
  --output-format jpeg \
  --quality low

This saved the following:

It's a square photo of a raccoon eating cheese and wearing a top hat. It looks pretty realistic.

And reported this to standard error:

{
  "background": "opaque",
  "created": 1759790912,
  "generation_time_in_s": 20.87331541599997,
  "output_format": "jpeg",
  "quality": "low",
  "size": "1024x1024",
  "usage": {
    "input_tokens": 17,
    "input_tokens_details": {
      "image_tokens": 0,
      "text_tokens": 17
    },
    "output_tokens": 272,
    "total_tokens": 289
  }
}

This took 21s, but I'm on an unreliable conference WiFi connection so I don't trust that measurement very much.

272 output tokens = 0.2 cents so this is much closer to the expected pricing from the model page.

# 6th October 2025, 10:54 pm / python, tools, ai, openai, generative-ai, uv, text-to-image, pelican-riding-a-bicycle, vibe-coding

Recreating the Apollo AI adoption rate chart with GPT-5, Python and Pyodide

Apollo Global Management’s “Chief Economist” Dr. Torsten Sløk released this interesting chart which appears to show a slowdown in AI adoption rates among large (>250 employees) companies:

[... 2,673 words]

6:47 am / 9th September 2025 / census, data-journalism, javascript, python, tools, visualization, ai, pyodide, openai, generative-ai, chatgpt, llms, ai-assisted-programming, code-interpreter, llm-reasoning, vibe-coding, ai-assisted-search, gpt-5

Any time I share my collection of tools built using vibe coding and AI-assisted development (now at 124, here's the definitive list) someone will inevitably complain that they're mostly trivial.

A lot of them are! Here's a list of some that I think are genuinely useful and worth highlighting:

OCR PDFs and images directly in your browser. This is the tool that started the collection, and I still use it on a regular basis. You can open any PDF in it (even PDFs that are just scanned images with no embedded text) and it will extract out the text so you can copy-and-paste it. It uses PDF.js and Tesseract.js to do that entirely in the browser. I wrote about how I originally built that here.
Annotated Presentation Creator - this one is so useful. I use it to turn talks that I've given into full annotated presentations, where each slide is accompanied by detailed notes. I have 29 blog entries like that now and most of them were written with the help of this tool. Here's how I built that, plus follow-up prompts I used to improve it.
Image resize, crop, and quality comparison - I use this for every single image I post to my blog. It lets me drag (or paste) an image onto the page and then shows me a comparison of different sizes and quality settings, each of which I can download and then upload to my S3 bucket. I recently added a slightly janky but mobile-accessible cropping tool as well. Prompts.
Social Media Card Cropper - this is an even more useful image tool. Bluesky, Twitter etc all benefit from a 2x1 aspect ratio "card" image. I built this custom tool for creating those - you can paste in an image and crop and zoom it to the right dimensions. I use this all the time. Prompts.
SVG to JPEG/PNG - every time I publish an SVG of a pelican riding a bicycle I use this tool to turn that SVG into a JPEG or PNG. Prompts.
Encrypt / decrypt message - I often run workshops where I want to distribute API keys to the workshop participants. This tool lets me encrypt a message with a passphrase, then share the resulting URL to the encrypted message and tell people (with a note on a slide) how to decrypt it. Prompt.
Jina Reader - enter a URL, get back a Markdown version of the page. It's a thin wrapper over the Jina Reader API, but it's useful because it adds a "copy to clipboard" button which means it's one of the fastest way to turn a webpage into data on a clipboard on my mobile phone. I use this several times a week. Prompts.
llm-prices.com - a pricing comparison and token pricing calculator for various hosted LLMs. This one started out as a tool but graduated to its own domain name. Here's the prompting development history.
Open Sauce 2025 - an unofficial schedule for the Open Sauce conference, complete with option to export to ICS plus a search tool and now-and-next. I built this entirely on my phone using OpenAI Codex, including scraping the official schedule - full details here.
Hacker News Multi-Term Histogram - compare search terms on Hacker News to see how their relative popularity changed over time. Prompts.
Passkey experiment - a UI for trying out the Passkey / WebAuthn APIs that are built into browsers these days. Prompts.
Incomplete JSON Pretty Printer - do you ever find yourself staring at a screen full of JSON that isn't completely valid because it got truncated? This tool will pretty-print it anyway. Prompts.
Bluesky WebSocket Feed Monitor - I found out Bluesky has a Firehose API that can be accessed directly from the browser, so I vibe-coded up this tool to try it out. Prompts.

In putting this list together I realized I wanted to be able to link to the prompts for each tool... but those were hidden inside a collapsed <details><summary> element for each one. So I fired up OpenAI Codex and prompted:

Update the script that builds the colophon.html page such that the generated page has a tiny bit of extra JavaScript - when the page is loaded as e.g. https://tools.simonwillison.net/colophon#jina-reader.html it should notice the #jina-reader.html fragment identifier and ensure that the Development history details/summary for that particular tool is expanded when the page loads.

It authored this PR for me which fixed the problem.

# 4th September 2025, 9:58 pm / projects, tools, ai, generative-ai, llms, ai-assisted-programming, vibe-coding, coding-agents

1KB JS Numbers Station. Terence Eden built a neat and weird 1023 byte JavaScript demo that simulates a numbers station using the browser SpeechSynthesisUtterance, which I hadn't realized is supported by every modern browser now.

This inspired me to vibe code up this playground interface for that API using Claude:

Screenshot of a speech synthesis tester web interface showing: Speech synthesis tester, Text to speak:, Hello, this is a test of the speech synthesis API!, Voice:, Default voice, Rate: 1, Pitch: 1, Volume: 1, Speak, Stop, Ready to speak

# 23rd July 2025, 4 pm / javascript, text-to-speech, tools, ai, generative-ai, llms, terence-eden, vibe-coding

Vibe scraping and vibe coding a schedule app for Open Sauce 2025 entirely on my phone

This morning, working entirely on my phone, I scraped a conference website and vibe coded up an alternative UI for interacting with the schedule using a combination of OpenAI Codex and Claude Artifacts.

[... 2,189 words]

7:38 pm / 17th July 2025 / definitions, github, icalendar, mobile, scraping, tools, ai, playwright, openai, generative-ai, chatgpt, llms, ai-assisted-programming, claude, claude-artifacts, ai-agents, vibe-coding, coding-agents, async-coding-agents, prompt-to-app

Annotated Presentation Creator. I've released a new version of my tool for creating annotated presentations. I use this to turn slides from my talks into posts like this one - here are a bunch more examples.

I wrote the first version in August 2023 making extensive use of ChatGPT and GPT-4. That older version can still be seen here.

This new edition is a design refresh using Claude 3.7 Sonnet (thinking). I ran this command:

llm \
  -f https://til.simonwillison.net/tools/annotated-presentations \
  -s 'Improve this tool by making it respnonsive for mobile, improving the styling' \
  -m claude-3.7-sonnet -o thinking 1

That uses -f to fetch the original HTML (which has embedded CSS and JavaScript in a single page, convenient for working with LLMs) as a prompt fragment, then applies the system prompt instructions "Improve this tool by making it respnonsive for mobile, improving the styling" (typo included).

Here's the full transcript (generated using llm logs -cue) and a diff illustrating the changes. Total cost 10.7781 cents.

There was one visual glitch: the slides were distorted like this:

The slide is distorted by being too high for its width

I decided to try o4-mini to see if it could spot the problem (after fixing this LLM bug):

llm o4-mini \
  -a bug.png \
  -f https://tools.simonwillison.net/annotated-presentations \
  -s 'Suggest a minimal fix for this distorted image'

It suggested adding align-items: flex-start; to my .bundle class (it quoted the @media (min-width: 768px) bit but the solution was to add it to .bundle at the top level), which fixed the bug.

# 15th May 2025, 2:41 pm / css, tools, ai, openai, generative-ai, llms, ai-assisted-programming, claude, annotated-talks, vibe-coding

Create and edit images with Gemini 2.0 in preview (via) Gemini 2.0 Flash has had image generation capabilities for a while now, and they're now available via the paid Gemini API - at 3.9 cents per generated image.

According to the API documentation you need to use the new gemini-2.0-flash-preview-image-generation model ID and specify {"responseModalities":["TEXT","IMAGE"]} as part of your request.

Here's an example that calls the API using curl (and fetches a Gemini key from the llm keys get store):

curl -s -X POST \
  "https://generativelanguage.googleapis.com/v1beta/models/gemini-2.0-flash-preview-image-generation:generateContent?key=$(llm keys get gemini)" \
  -H "Content-Type: application/json" \
  -d '{
    "contents": [{
      "parts": [
        {"text": "Photo of a raccoon in a trash can with a paw-written sign that says I love trash"}
      ]
    }],
    "generationConfig":{"responseModalities":["TEXT","IMAGE"]}
  }' > /tmp/raccoon.json

Here's the response. I got Gemini 2.5 Pro to vibe-code me a new debug tool for visualizing that JSON. If you visit that tool and click the "Load an example" link you'll see the result of the raccoon image visualized:

Render JSON from Gemini Image Generation tool. Paste Gemini JSON here: a bunch of JSON with a base64 encoded PNG. Then buttons to Load an example, or a really big (40MB) example or Render JSON. The Rendered Content shows a photograph of a raccoon in an open top bin holding a sign that says I heart trash.

The other prompt I tried was this one:

Provide a vegetarian recipe for butter chicken but with chickpeas not chicken and include many inline illustrations along the way

The result of that one was a 41MB JSON file(!) containing 28 images - which presumably cost over a dollar since images are 3.9 cents each.

Some of the illustrations it chose for that one were somewhat unexpected:

Text reads: "* ½ teaspoon Kashmiri chili powder (or paprika for milder flavor)" followed by a group photo of people in formal attire with black suits and light blue ties standing in rows outdoors, then "* ½ cup heavy cream (or coconut cream for vegan option)" followed by a close-up image of dried cumin seeds or similar brown spice.

If you want to see that one you can click the "Load a really big example" link in the debug tool, then wait for your browser to fetch and render the full 41MB JSON file.

The most interesting feature of Gemini (as with GPT-4o images) is the ability to accept images as inputs. I tried that out with this pelican photo like this:

cat > /tmp/request.json << EOF
{
  "contents": [{
    "parts":[
      {"text": "Modify this photo to add an inappropriate hat"},
      {
        "inline_data": {
          "mime_type":"image/jpeg",
          "data": "$(base64 -i pelican.jpg)"
        }
      }
    ]
  }],
  "generationConfig": {"responseModalities": ["TEXT", "IMAGE"]}
}
EOF

# Execute the curl command with the JSON file
curl -X POST \
  'https://generativelanguage.googleapis.com/v1beta/models/gemini-2.0-flash-preview-image-generation:generateContent?key='$(llm keys get gemini) \
  -H 'Content-Type: application/json' \
  -d @/tmp/request.json \
  > /tmp/out.json

And now the pelican is wearing a hat:

A pelican with its wings outstretched wearing an inappropriate pink bowler hat. The hat looks a little bit pasted on.

# 7th May 2025, 10:49 pm / tools, ai, generative-ai, llms, gemini, vision-llms, text-to-image, vibe-coding

Image segmentation using Gemini 2.5

Max Woolf pointed out this new feature of the Gemini 2.5 series (here’s my coverage of 2.5 Pro and 2.5 Flash) in a comment on Hacker News:

[... 1,428 words]

1:26 pm / 18th April 2025 / google, tools, ai, max-woolf, generative-ai, llms, ai-assisted-programming, gemini, vision-llms, llm-pricing, vibe-coding, image-segmentation

LLM pricing calculator (updated). I updated my LLM pricing calculator this morning (Claude transcript) to show the prices of various hosted models in a sorted table, defaulting to lowest price first.

Amazon Nova and Google Gemini continue to dominate the lower end of the table. The most expensive models currently are still OpenAI's o1-Pro ($150/$600 and GPT-4.5 ($75/$150).

# 10th April 2025, 7:56 pm / tools, ai, generative-ai, llms, ai-assisted-programming, claude, llm-pricing, vibe-coding

Incomplete JSON Pretty Printer. Every now and then a log file or a tool I'm using will spit out a bunch of JSON that terminates unexpectedly, meaning I can't copy it into a text editor and pretty-print it to see what's going on.

The other day I got frustrated with this and had the then-new GPT-4.5 build me a pretty-printer that didn't mind incomplete JSON, using an OpenAI Canvas. Here's the chat and here's the resulting interactive.

I spotted a bug with the way it indented code today so I pasted it into Claude 3.7 Sonnet Thinking mode and had it make a bunch of improvements - full transcript here. Here's the finished code.

Animated GIF demo - as I type JSON it is pretty printed below, at the end I click the Load Pelican Example button.

In many ways this is a perfect example of vibe coding in action. At no point did I look at a single line of code that either of the LLMs had written for me. I honestly don't care how this thing works: it could not be lower stakes for me, the worst a bug could do is show me poorly formatted incomplete JSON.

I was vaguely aware that some kind of state machine style parser would be needed, because you can't parse incomplete JSON with a regular JSON parser. Building simple parsers is the kind of thing LLMs are surprisingly good at, and also the kind of thing I don't want to take on for a trivial project.

At one point I told Claude "Try using your code execution tool to check your logic", because I happen to know Claude can write and then execute JavaScript independently of using it for artifacts. That helped it out a bunch.

I later dropped in the following:

modify the tool to work better on mobile screens and generally look a bit nicer - and remove the pretty print JSON button, it should update any time the input text is changed. Also add a "copy to clipboard" button next to the results. And add a button that says "example" which adds a longer incomplete example to demonstrate the tool, make that example pelican themed.

It's fun being able to say "generally look a bit nicer" and get a perfectly acceptable result!

# 28th March 2025, 12:18 am / json, tools, ai, generative-ai, chatgpt, llms, claude, vibe-coding

deepseek-ai/DeepSeek-V3-0324. Chinese AI lab DeepSeek just released the latest version of their enormous DeepSeek v3 model, baking the release date into the name DeepSeek-V3-0324.

The license is MIT (that's new - previous DeepSeek v3 had a custom license), the README is empty and the release adds up a to a total of 641 GB of files, mostly of the form model-00035-of-000163.safetensors.

The model only came out a few hours ago and MLX developer Awni Hannun already has it running at >20 tokens/second on a 512GB M3 Ultra Mac Studio ($9,499 of ostensibly consumer-grade hardware) via mlx-lm and this mlx-community/DeepSeek-V3-0324-4bit 4bit quantization, which reduces the on-disk size to 352 GB.

I think that means if you have that machine you can run it with my llm-mlx plugin like this, but I've not tried myself!

llm mlx download-model mlx-community/DeepSeek-V3-0324-4bit
llm chat -m mlx-community/DeepSeek-V3-0324-4bit

The new model is also listed on OpenRouter. You can try a chat at openrouter.ai/chat?models=deepseek/deepseek-chat-v3-0324:free.

Here's what the chat interface gave me for "Generate an SVG of a pelican riding a bicycle":

There's a pelican, and a bicycle, but both of them look disassembled.

I have two API keys with OpenRouter - one of them worked with the model, the other gave me a No endpoints found matching your data policy error - I think because I had a setting on that key disallowing models from training on my activity. The key that worked was a free key with no attached billing credentials.

For my working API key the llm-openrouter plugin let me run a prompt like this:

llm install llm-openrouter
llm keys set openrouter
# Paste key here
llm -m openrouter/deepseek/deepseek-chat-v3-0324:free "best fact about a pelican"

Here's that "best fact" - the terminal output included Markdown and an emoji combo, here that's rendered.

One of the most fascinating facts about pelicans is their unique throat pouch, called a gular sac, which can hold up to 3 gallons (11 liters) of water—three times more than their stomach!

Here’s why it’s amazing:
- Fishing Tool: They use it like a net to scoop up fish, then drain the water before swallowing.
- Cooling Mechanism: On hot days, pelicans flutter the pouch to stay cool by evaporating water.
- Built-in "Shopping Cart": Some species even use it to carry food back to their chicks.

Bonus fact: Pelicans often fish cooperatively, herding fish into shallow water for an easy catch.

Would you like more cool pelican facts? 🐦🌊

In putting this post together I got Claude to build me this new tool for finding the total on-disk size of a Hugging Face repository, which is available in their API but not currently displayed on their website.

Update: Here's a notable independent benchmark from Paul Gauthier:

DeepSeek's new V3 scored 55% on aider's polyglot benchmark, significantly improving over the prior version. It's the #2 non-thinking/reasoning model, behind only Sonnet 3.7. V3 is competitive with thinking models like R1 & o3-mini.

# 24th March 2025, 3:04 pm / tools, ai, generative-ai, local-llms, llms, ai-assisted-programming, hugging-face, llm, mlx, pelican-riding-a-bicycle, deepseek, llm-release, openrouter, ai-in-china

Adding AI-generated descriptions to my tools collection

The /colophon page on my tools.simonwillison.net site lists all 78 of the HTML+JavaScript tools I’ve built (with AI assistance) along with their commit histories, including links to prompting transcripts. I wrote about how I built that colophon the other day. It now also includes a description of each tool, generated using Claude 3.7 Sonnet.

[... 741 words]

11:48 pm / 13th March 2025 / projects, tools, ai, generative-ai, llms, ai-assisted-programming, llm, claude, slop

Here’s how I use LLMs to help me write code

Online discussions about using Large Language Models to help write code inevitably produce comments from developers who’s experiences have been disappointing. They often ask what they’re doing wrong—how come some people are reporting such great results when their own experiments have proved lacking?

[... 5,178 words]

2:09 pm / 11th March 2025 / tools, ai, github-actions, openai, generative-ai, llms, ai-assisted-programming, anthropic, claude, gemini, claude-artifacts, vibe-coding, files-to-prompt, coding-agents, claude-code, prompt-to-app

Anthropic’s new Citations API

Here’s a new API-only feature from Anthropic that requires quite a bit of assembly in order to unlock the value: Introducing Citations on the Anthropic API. Let’s talk about what this is and why it’s interesting.

[... 1,319 words]

4:22 am / 24th January 2025 / tools, ai, openai, prompt-engineering, generative-ai, llms, ai-assisted-programming, llm, anthropic, claude, rag, claude-artifacts, prompt-to-app

2024

openai/openai-openapi. Seeing as the LLM world has semi-standardized on imitating OpenAI's API format for a whole host of different tools, it's useful to note that OpenAI themselves maintain a dedicated repository for a OpenAPI YAML representation of their current API.

(I get OpenAI and OpenAPI typo-confused all the time, so openai-openapi is a delightfully fiddly repository name.)

The openapi.yaml file itself is over 26,000 lines long, defining 76 API endpoints ("paths" in OpenAPI terminology) and 284 "schemas" for JSON that can be sent to and from those endpoints. A much more interesting view onto it is the commit history for that file, showing details of when each different API feature was released.

Browsing 26,000 lines of YAML isn't pleasant, so I got Claude to build me a rudimentary YAML expand/hide exploration tool. Here's that tool running against the OpenAI schema, loaded directly from GitHub via a CORS-enabled fetch() call: https://tools.simonwillison.net/yaml-explorer#.eyJ1c... - the code after that fragment is a base64-encoded JSON for the current state of the tool (mostly Claude's idea).

Screenshot of the YAML explorer, showing a partially expanded set of sections from the OpenAI API specification.

The tool is a little buggy - the expand-all option doesn't work quite how I want - but it's useful enough for the moment.

Update: It turns out the petstore.swagger.io demo has an (as far as I can tell) undocumented ?url= parameter which can load external YAML files, so here's openai-openapi/openapi.yaml in an OpenAPI explorer interface.

The Swagger API browser showing the OpenAI API

# 22nd December 2024, 10:59 pm / apis, tools, yaml, ai, openai, generative-ai, llms, ai-assisted-programming, claude-3-5-sonnet

OpenAI WebRTC Audio demo. OpenAI announced a bunch of API features today, including a brand new WebRTC API for setting up a two-way audio conversation with their models.

They tweeted this opaque code example:

async function createRealtimeSession(inStream, outEl, token) { const pc = new RTCPeerConnection(); pc.ontrack = e => outEl.srcObject = e.streams[0]; pc.addTrack(inStream.getTracks()[0]); const offer = await pc.createOffer(); await pc.setLocalDescription(offer); const headers = { Authorization: Bearer ${token}, 'Content-Type': 'application/sdp' }; const opts = { method: 'POST', body: offer.sdp, headers }; const resp = await fetch('https://api.openai.com/v1/realtime', opts); await pc.setRemoteDescription({ type: 'answer', sdp: await resp.text() }); return pc; }

So I pasted that into Claude and had it build me this interactive demo for trying out the new API.

My demo uses an OpenAI key directly, but the most interesting aspect of the new WebRTC mechanism is its support for ephemeral tokens.

This solves a major problem with their previous realtime API: in order to connect to their endpoint you need to provide an API key, but that meant making that key visible to anyone who uses your application. The only secure way to handle this was to roll a full server-side proxy for their WebSocket API, just so you could hide your API key in your own server. cloudflare/openai-workers-relay is an example implementation of that pattern.

Ephemeral tokens solve that by letting you make a server-side call to request an ephemeral token which will only allow a connection to be initiated to their WebRTC endpoint for the next 60 seconds. The user's browser then starts the connection, which will last for up to 30 minutes.

# 17th December 2024, 11:50 pm / api, audio, security, tools, ai, cloudflare, openai, generative-ai, llms, ai-assisted-programming, claude, multi-modal-output

3 shell scripts to improve your writing, or “My Ph.D. advisor rewrote himself in bash.” (via) Matt Might in 2010:

The hardest part of advising Ph.D. students is teaching them how to write.

Fortunately, I've seen patterns emerge over the past couple years.

So, I've decided to replace myself with a shell script.

In particular, I've created shell scripts for catching three problems:

abuse of the passive voice,

weasel words, and

lexical illusions.

"Lexical illusions" here refers to the thing where you accidentally repeat a word word twice without realizing, which is particularly hard to spot if the repetition spans a line break.

Matt shares Bash scripts that he added to a LaTeX build system to identify these problems.

I pasted his entire article into Claude and asked it to build me an HTML+JavaScript artifact implementing the rules from those scripts. After a couple more iterations (I pasted in some feedback comments from Hacker News) I now have an actually quite useful little web tool:

tools.simonwillison.net/writing-style

Screnshot of the Writing Style Analyzer tool. I have pasted in the post you are reading now, it found a weasel word "quite" in: "actually quite useful little web tool" and duplicate word "word" in: "word word twice without realizing, which is"

Here's the source code and commit history.

# 14th December 2024, 6:20 pm / bash, tools, writing, ai, generative-ai, llms, ai-assisted-programming, claude-artifacts, prompt-to-app

GitHub OAuth for a static site using Cloudflare Workers. Here's a TIL covering a Thanksgiving AI-assisted programming project. I wanted to add OAuth against GitHub to some of the projects on my tools.simonwillison.net site in order to implement "Save to Gist".

That site is entirely statically hosted by GitHub Pages, but OAuth has a required server-side component: there's a client_secret involved that should never be included in client-side code.

Since I serve the site from behind Cloudflare I realized that a minimal Cloudflare Workers script may be enough to plug the gap. I got Claude on my phone to build me a prototype and then pasted that (still on my phone) into a new Cloudflare Worker and it worked!

... almost. On later closer inspection of the code it was missing error handling... and then someone pointed out it was vulnerable to a login CSRF attack thanks to failure to check the state= parameter. I worked with Claude to fix those too.

Useful reminder here that pasting code AI-generated code around on a mobile phone isn't necessarily the best environment to encourage a thorough code review!

# 29th November 2024, 6:13 pm / csrf, github, oauth, projects, security, tools, ai, cloudflare, generative-ai, llms, ai-assisted-programming

QuickTime video script to capture frames and bounding boxes. An update to an older TIL. I'm working on the write-up for my DjangoCon US talk on plugins and I found myself wanting to capture individual frames from the video in two formats: a full frame capture, and another that captured just the portion of the screen shared from my laptop.

I have a script for the former, so I got Claude to update my script to add support for one or more --box options, like this:

capture-bbox.sh ../output.mp4  --box '31,17,100,87' --box '0,0,50,50'

Open output.mp4 in QuickTime Player, run that script and then every time you hit a key in the terminal app it will capture three JPEGs from the current position in QuickTime Player - one for the whole screen and one each for the specified bounding box regions.

Those bounding box regions are percentages of the width and height of the image. I also got Claude to build me this interactive tool on top of cropperjs to help figure out those boxes:

Screenshot of the tool. A frame from a video of a talk I gave at DjangoCon US is shown, with a crop region on it using drag handles for the different edges of the crop. Below that is a box showing --bbox '31,17,99,86'

# 14th November 2024, 7 pm / ffmpeg, projects, tools, ai, generative-ai, llms, ai-assisted-programming, claude, claude-artifacts, prompt-to-app

Claude Token Counter. Anthropic released a token counting API for Claude a few days ago.

I built this tool for running prompts, images and PDFs against that API to count the tokens in them.

The API is free (albeit rate limited), but you'll still need to provide your own API key in order to use it.

Screenshot of a Claude Token Counter interface showing: Title Claude Token Counter, system prompt this counts tokens, user message You can attach images and PDFs too, file upload area with llm-jq-card.jpg and dxweb.pdf attached (both with Remove buttons), a Count Tokens button, and JSON output showing input_tokens: 3320

Here's the source code. I built this using two sessions with Claude - one to build the initial tool and a second to add PDF and image support. That second one is a bit of a mess - it turns out if you drop an HTML file onto a Claude conversation it converts it to Markdown for you, but I wanted it to modify the original HTML source.

The API endpoint also allows you to specify a model, but as far as I can tell from running some experiments the token count was the same for Haiku, Opus and Sonnet 3.5.

# 2nd November 2024, 6:52 pm / tools, ai, generative-ai, llms, ai-assisted-programming, anthropic, claude, claude-artifacts, llm-pricing, prompt-to-app

Prompt GPT-4o audio. A week and a half ago I built a tool for experimenting with OpenAI's new audio input. I just put together the other side of that, for experimenting with audio output.

Once you've provided an API key (which is saved in localStorage) you can use this to prompt the gpt-4o-audio-preview model with a system and regular prompt and select a voice for the response.

I built it with assistance from Claude: initial app, adding system prompt support.

You can preview and download the resulting wav file, and you can also copy out the raw JSON. If you save that in a Gist you can then feed its Gist ID to https://tools.simonwillison.net/gpt-4o-audio-player?gist=GIST_ID_HERE (Claude transcript) to play it back again.

You can try using that to listen to my French accented pelican description.

There's something really interesting to me here about this form of application which exists entirely as HTML and JavaScript that uses CORS to talk to various APIs. GitHub's Gist API is accessible via CORS too, so it wouldn't take much more work to add a "save" button which writes out a new Gist after prompting for a personal access token. I prototyped that a bit here.

# 28th October 2024, 4:38 am / audio, github, javascript, tools, ai, openai, generative-ai, gpt-4, llms, ai-assisted-programming, claude, claude-artifacts, claude-3-5-sonnet, cors, multi-modal-output, prompt-to-app

Everything I built with Claude Artifacts this week

I’m a huge fan of Claude’s Artifacts feature, which lets you prompt Claude to create an interactive Single Page App (using HTML, CSS and JavaScript) and then view the result directly in the Claude interface, iterating on it further with the bot and then, if you like, copying out the resulting code.

[... 2,273 words]

2:32 pm / 21st October 2024 / javascript, projects, tools, ai, pyodide, generative-ai, llms, ai-assisted-programming, anthropic, claude, claude-artifacts, claude-3-5-sonnet, prompt-to-app

Dashboard: Tools. I used Django SQL Dashboard to spin up a dashboard that shows all of the URLs to my tools.simonwillison.net site that I've shared on my blog so far. It uses this (Claude assisted) regular expression in a PostgreSQL SQL query:

select distinct on (tool_url)
    unnest(regexp_matches(
        body,
        '(https://tools\.simonwillison\.net/[^<"\s)]+)',
        'g'
    )) as tool_url,
    'https://simonwillison.net/' || left(type, 1) || '/' || id as blog_url,
    title,
    date(created) as created
from content

I've been really enjoying having a static hosting platform (it's GitHub Pages serving my simonw/tools repo) that I can use to quickly deploy little HTML+JavaScript interactive tools and demos.

# 21st October 2024, 3:33 am / javascript, postgresql, projects, sql, tools, django-sql-dashboard, ai-assisted-programming

page 1 / 2 next »