88 items tagged “ai-assisted-programming”
Using AI tools such as Large Language Models to help write code.
2024
In the past, these decisions were so consequential, they were basically one-way doors, in Amazon language. That’s why we call them ‘architectural decisions!’ You basically have to live with your choice of database, authentication, JavaScript UI framework, almost forever.
But that’s changing with LLMs, because you can explore, investigate, and even prototype each one so quickly. Even technology migrations are becoming so much easier/cheaper/faster.
These are all examples of increasing optionality.
— Steve Yegge, via Gene Kim
GitHub OAuth for a static site using Cloudflare Workers. Here's a TIL covering a Thanksgiving AI-assisted programming project. I wanted to add OAuth against GitHub to some of the projects on my tools.simonwillison.net site in order to implement "Save to Gist".
That site is entirely statically hosted by GitHub Pages, but OAuth has a required server-side component: there's a client_secret
involved that should never be included in client-side code.
Since I serve the site from behind Cloudflare I realized that a minimal Cloudflare Workers script may be enough to plug the gap. I got Claude on my phone to build me a prototype and then pasted that (still on my phone) into a new Cloudflare Worker and it worked!
... almost. On later closer inspection of the code it was missing error handling... and then someone pointed out it was vulnerable to a login CSRF attack thanks to failure to check the state=
parameter. I worked with Claude to fix those too.
Useful reminder here that pasting code AI-generated code around on a mobile phone isn't necessarily the best environment to encourage a thorough code review!
Leaked system prompts from Vercel v0. v0 is Vercel's entry in the increasingly crowded LLM-assisted development market - chat with a bot and have that bot build a full application for you.
They've been iterating on it since launching in October last year, making it one of the most mature products in this space.
Somebody leaked the system prompts recently. Vercel CTO Malte Ubl said this:
When @v0 first came out we were paranoid about protecting the prompt with all kinds of pre and post processing complexity.
We completely pivoted to let it rip. A prompt without the evals, models, and especially UX is like getting a broken ASML machine without a manual
Ask questions of SQLite databases and CSV/JSON files in your terminal
I built a new plugin for my sqlite-utils CLI tool that lets you ask human-language questions directly of SQLite databases and CSV/JSON files on your computer.
[... 723 words]open-interpreter (via) This "natural language interface for computers" open source ChatGPT Code Interpreter alternative has been around for a while, but today I finally got around to trying it out.
Here's how I ran it (without first installing anything) using uv
:
uvx --from open-interpreter interpreter
The default mode asks you for an OpenAI API key so it can use gpt-4o
- there are a multitude of other options, including the ability to use local models with interpreter --local
.
It runs in your terminal and works by generating Python code to help answer your questions, asking your permission to run it and then executing it directly on your computer.
I pasted in an API key and then prompted it with this:
find largest files on my desktop
Here's the full transcript.
Since code is run directly on your machine there are all sorts of ways things could go wrong if you don't carefully review the generated code before hitting "y". The team have an experimental safe mode in development which works by scanning generated code with semgrep. I'm not convinced by that approach, I think executing code in a sandbox would be a much more robust solution here - but sandboxing Python is still a very difficult problem.
They do at least have an experimental Docker integration.
Foursquare Open Source Places: A new foundational dataset for the geospatial community (via) I did not expect this!
[...] we are announcing today the general availability of a foundational open data set, Foursquare Open Source Places ("FSQ OS Places"). This base layer of 100mm+ global places of interest ("POI") includes 22 core attributes (see schema here) that will be updated monthly and available for commercial use under the Apache 2.0 license framework.
The data is available as Parquet files hosted on Amazon S3.
Here's how to list the available files:
aws s3 ls s3://fsq-os-places-us-east-1/release/dt=2024-11-19/places/parquet/
I got back places-00000.snappy.parquet
through places-00024.snappy.parquet
, each file around 455MB for a total of 10.6GB of data.
I ran duckdb
and then used DuckDB's ability to remotely query Parquet on S3 to explore the data a bit more without downloading it to my laptop first:
select count(*) from 's3://fsq-os-places-us-east-1/release/dt=2024-11-19/places/parquet/places-00000.snappy.parquet';
This got back 4,180,424 - that number is similar for each file, suggesting around 104,000,000 records total.
Update: DuckDB can use wildcards in S3 paths (thanks, Paul) so this query provides an exact count:
select count(*) from 's3://fsq-os-places-us-east-1/release/dt=2024-11-19/places/parquet/places-*.snappy.parquet';
That returned 104,511,073 - and Activity Monitor on my Mac confirmed that DuckDB only needed to fetch 1.2MB of data to answer that query.
I ran this query to retrieve 1,000 places from that first file as newline-delimited JSON:
copy (
select * from 's3://fsq-os-places-us-east-1/release/dt=2024-11-19/places/parquet/places-00000.snappy.parquet'
limit 1000
) to '/tmp/places.json';
Here's that places.json file, and here it is imported into Datasette Lite.
Finally, I got ChatGPT Code Interpreter to convert that file to GeoJSON and pasted the result into this Gist, giving me a map of those thousand places (because Gists automatically render GeoJSON):
QuickTime video script to capture frames and bounding boxes. An update to an older TIL. I'm working on the write-up for my DjangoCon US talk on plugins and I found myself wanting to capture individual frames from the video in two formats: a full frame capture, and another that captured just the portion of the screen shared from my laptop.
I have a script for the former, so I got Claude to update my script to add support for one or more --box
options, like this:
capture-bbox.sh ../output.mp4 --box '31,17,100,87' --box '0,0,50,50'
Open output.mp4
in QuickTime Player, run that script and then every time you hit a key in the terminal app it will capture three JPEGs from the current position in QuickTime Player - one for the whole screen and one each for the specified bounding box regions.
Those bounding box regions are percentages of the width and height of the image. I also got Claude to build me this interactive tool on top of cropperjs to help figure out those boxes:
Qwen2.5-Coder-32B is an LLM that can code well that runs on my Mac
There’s a whole lot of buzz around the new Qwen2.5-Coder Series of open source (Apache 2.0 licensed) LLM releases from Alibaba’s Qwen research team. On first impression it looks like the buzz is well deserved.
[... 697 words]MDN Browser Support Timelines. I complained on Hacker News today that I wished the MDN browser compatibility ables - like this one for the Web Locks API - included an indication as to when each browser was released rather than just the browser numbers.
It turns out they do! If you click on each browser version in turn you can see an expanded area showing the browser release date:
There's even an inline help tip telling you about the feature, which I've been studiously ignoring for years.
I want to see all the information at once without having to click through each browser. I had a poke around in the Firefox network tab and found https://bcd.developer.mozilla.org/bcd/api/v0/current/api.Lock.json - a JSON document containing browser support details (with release dates) for that API... and it was served using access-control-allow-origin: *
which means I can hit it from my own little client-side applications.
I decided to build something with an autocomplete drop-down interface for selecting the API. That meant I'd need a list of all of the available APIs, and I used GitHub code search to find that in the mdn/browser-compat-data repository, in the api/
directory.
I needed the list of files in that directory for my autocomplete. Since there are just over 1,000 of those the regular GitHub contents API won't return them all, so I switched to the tree API instead.
Here's the finished tool - source code here:
95% of the code was written by LLMs, but I did a whole lot of assembly and iterating to get it to the finished state. Three of the transcripts for that:
- Web Locks API Browser Support Timeline in which I paste in the original API JSON and ask it to come up with a timeline visualization for it.
- Enhancing API Feature Display with URL Hash where I dumped in a more complex JSON example to get it to show multiple APIs on the same page, and also had it add
#fragment
bookmarking to the tool - Fetch GitHub API Data Hierarchy where I got it to write me an async JavaScript function for fetching a directory listing from that tree API.
Generating documentation from tests using files-to-prompt and LLM. I was experimenting with the wasmtime-py Python library today (for executing WebAssembly programs from inside CPython) and I found the existing API docs didn't quite show me what I wanted to know.
The project has a comprehensive test suite so I tried seeing if I could generate documentation using that:
cd /tmp
git clone https://github.com/bytecodealliance/wasmtime-py
files-to-prompt -e py wasmtime-py/tests -c | \
llm -m claude-3.5-sonnet -s \
'write detailed usage documentation including realistic examples'
More notes in my TIL. You can see the full Claude transcript here - I think this worked really well!
California Clock Change. The clocks go back in California tonight and I finally built my dream application for helping me remember if I get an hour extra of sleep or not, using a Claude Artifact. Here's the transcript.
This is one of my favorite examples yet of the kind of tiny low stakes utilities I'm building with Claude Artifacts because the friction involved in churning out a working application has dropped almost to zero.
(I added another feature: it now includes a note of what time my Dog thinks it is if the clocks have recently changed.)
Claude Token Counter. Anthropic released a token counting API for Claude a few days ago.
I built this tool for running prompts, images and PDFs against that API to count the tokens in them.
The API is free (albeit rate limited), but you'll still need to provide your own API key in order to use it.
Here's the source code. I built this using two sessions with Claude - one to build the initial tool and a second to add PDF and image support. That second one is a bit of a mess - it turns out if you drop an HTML file onto a Claude conversation it converts it to Markdown for you, but I wanted it to modify the original HTML source.
The API endpoint also allows you to specify a model, but as far as I can tell from running some experiments the token count was the same for Haiku, Opus and Sonnet 3.5.
Cerebras Coder (via) Val Town founder Steve Krouse has been building demos on top of the Cerebras API that runs Llama3.1-70b at 2,000 tokens/second.
Having a capable LLM with that kind of performance turns out to be really interesting. Cerebras Coder is a demo that implements Claude Artifact-style on-demand JavaScript apps, and having it run at that speed means changes you request are visible within less than a second:
Steve's implementation (created with the help of Townie, the Val Town code assistant) demonstrates the simplest possible version of an iframe sandbox:
<iframe
srcDoc={code}
sandbox="allow-scripts allow-modals allow-forms allow-popups allow-same-origin allow-top-navigation allow-downloads allow-presentation allow-pointer-lock"
/>
Where code
is populated by a setCode(...)
call inside a React component.
The most interesting applications of LLMs continue to be where they operate in a tight loop with a human - this can make those review loops potentially much faster and more productive.
Bringing developer choice to Copilot with Anthropic’s Claude 3.5 Sonnet, Google’s Gemini 1.5 Pro, and OpenAI’s o1-preview. The big announcement from GitHub Universe: Copilot is growing support for alternative models.
GitHub Copilot predated the release of ChatGPT by more than year, and was the first widely used LLM-powered tool. This announcement includes a brief history lesson:
The first public version of Copilot was launched using Codex, an early version of OpenAI GPT-3, specifically fine-tuned for coding tasks. Copilot Chat was launched in 2023 with GPT-3.5 and later GPT-4. Since then, we have updated the base model versions multiple times, using a range from GPT 3.5-turbo to GPT 4o and 4o-mini models for different latency and quality requirements.
It's increasingly clear that any strategy that ties you to models from exclusively one provider is short-sighted. The best available model for a task can change every few months, and for something like AI code assistance model quality matters a lot. Getting stuck with a model that's no longer best in class could be a serious competitive disadvantage.
The other big announcement from the keynote was GitHub Spark, described like this:
Sparks are fully functional micro apps that can integrate AI features and external data sources without requiring any management of cloud resources.
I got to play with this at the event. It's effectively a cross between Claude Artifacts and GitHub Gists, with some very neat UI details. The features that really differentiate it from Artifacts is that Spark apps gain access to a server-side key/value store which they can use to persist JSON - and they can also access an API against which they can execute their own prompts.
The prompt integration is particularly neat because prompts used by the Spark apps are extracted into a separate UI so users can view and modify them without having to dig into the (editable) React JavaScript code.
Prompt GPT-4o audio. A week and a half ago I built a tool for experimenting with OpenAI's new audio input. I just put together the other side of that, for experimenting with audio output.
Once you've provided an API key (which is saved in localStorage) you can use this to prompt the gpt-4o-audio-preview
model with a system and regular prompt and select a voice for the response.
I built it with assistance from Claude: initial app, adding system prompt support.
You can preview and download the resulting wav
file, and you can also copy out the raw JSON. If you save that in a Gist you can then feed its Gist ID to https://tools.simonwillison.net/gpt-4o-audio-player?gist=GIST_ID_HERE
(Claude transcript) to play it back again.
You can try using that to listen to my French accented pelican description.
There's something really interesting to me here about this form of application which exists entirely as HTML and JavaScript that uses CORS to talk to various APIs. GitHub's Gist API is accessible via CORS too, so it wouldn't take much more work to add a "save" button which writes out a new Gist after prompting for a personal access token. I prototyped that a bit here.
Run a prompt to generate and execute jq programs using llm-jq
llm-jq is a brand new plugin for LLM which lets you pipe JSON directly into the llm jq
command along with a human-language description of how you’d like to manipulate that JSON and have a jq program generated and executed for you on the fly.
Mastodon discussion about sandboxing SVG data. I asked this on Mastodon and got some really useful replies:
How hard is it to process untrusted SVG data to strip out any potentially harmful tags or attributes (like stuff that might execute JavaScript)?
The winner for me turned out to be the humble <img src="">
tag. SVG images that are rendered in an image have all dynamic functionality - including embedded JavaScript - disabled by default, and that's something that's directly included in the spec:
2.2.6. Secure static mode
This processing mode is intended for circumstances where an SVG document is to be used as a non-animated image that is not allowed to resolve external references, and which is not intended to be used as an interactive document. This mode might be used where image support has traditionally been limited to non-animated raster images (such as JPEG and PNG.)
[...]
'image' references
An SVG embedded within an 'image' element must be processed in secure animated mode if the embedding document supports declarative animation, or in secure static mode otherwise.
The same processing modes are expected to be used for other cases where SVG is used in place of a raster image, such as an HTML 'img' element or in any CSS property that takes an
data type. This is consistent with HTML's requirement that image sources must reference "a non-interactive, optionally animated, image resource that is neither paged nor scripted" [HTML]
This also works for SVG data that's presented in a <img src="data:image/svg+xml;base64,...
attribute. I had Claude help spin me up this interactive demo:
Build me an artifact - just HTML, no JavaScript - which demonstrates embedding some SVG files using img src= base64 URIs
I want three SVGs - one of the sun, one of a pelican and one that includes some tricky javascript things which I hope the img src= tag will ignore
If you right click and "open in a new tab" on the JavaScript-embedding SVG that script will execute, showing an alert. You can click the image to see another alert showing location.href
and document.cookie
which should confirm that the base64 image is not treated as having the same origin as the page itself.
Notes on the new Claude analysis JavaScript code execution tool
Anthropic released a new feature for their Claude.ai consumer-facing chat bot interface today which they’re calling “the analysis tool”.
[... 918 words]Running prompts against images and PDFs with Google Gemini.
New TIL. I've been experimenting with the Google Gemini APIs for running prompts against images and PDFs (in preparation for finally adding multi-modal support to LLM) - here are my notes on how to send images or PDF files to their API using curl
and the base64 -i
macOS command.
I figured out the curl
incantation first and then got Claude to build me a Bash script that I can execute like this:
prompt-gemini 'extract text' example-handwriting.jpg
Playing with this is really fun. The Gemini models charge less than 1/10th of a cent per image, so it's really inexpensive to try them out.
I've often been building single-use apps with Claude Artifacts when I'm helping my children learn. For example here's one on visualizing fractions. [...] What's more surprising is that it is far easier to create an app on-demand than searching for an app in the app store that will do what I'm looking for. Searching for kids' learning apps is typically a nails-on-chalkboard painful experience because 95% of them are addictive garbage. And even if I find something usable, it can't match the fact that I can tell Claude what I want.
Everything I built with Claude Artifacts this week
I’m a huge fan of Claude’s Artifacts feature, which lets you prompt Claude to create an interactive Single Page App (using HTML, CSS and JavaScript) and then view the result directly in the Claude interface, iterating on it further with the bot and then, if you like, copying out the resulting code.
[... 2,273 words]Dashboard: Tools. I used Django SQL Dashboard to spin up a dashboard that shows all of the URLs to my tools.simonwillison.net site that I've shared on my blog so far. It uses this (Claude assisted) regular expression in a PostgreSQL SQL query:
select distinct on (tool_url)
unnest(regexp_matches(
body,
'(https://tools\.simonwillison\.net/[^<"\s)]+)',
'g'
)) as tool_url,
'https://simonwillison.net/' || left(type, 1) || '/' || id as blog_url,
title,
date(created) as created
from content
I've been really enjoying having a static hosting platform (it's GitHub Pages serving my simonw/tools repo) that I can use to quickly deploy little HTML+JavaScript interactive tools and demos.
The 3 AI Use Cases: Gods, Interns, and Cogs. Drew Breunig introduces an interesting new framework for categorizing use cases of modern AI:
- Gods refers to the autonomous, human replacement applications - I see that as AGI stuff that's still effectively science fiction.
- Interns are supervised copilots. This is how I get most of the value out of LLMs at the moment, delegating tasks to them that I can then review, such as AI-assisted programming.
- Cogs are the smaller, more reliable components that you can build pipelines and automations on top of without needing to review everything they do - think Whisper for transcriptions or maybe some limited LLM subtasks such as structured data extraction.
Drew also considers Toys as a subcategory of Interns: things like image generators, “defined by their usage by non-experts. Toys have a high tolerance for errors because they’re not being relied on for much beyond entertainment.”
You can use text-wrap: balance; on icons. Neat CSS experiment from Terence Eden: the new text-wrap: balance CSS property is intended to help make text like headlines display without ugly wrapped single orphan words, but Terence points out it can be used for icons too:
This inspired me to investigate if the same technique could work for text based navigation elements. I used Claude to build this interactive prototype of a navigation bar that uses text-wrap: balance
against a list of display: inline
menu list items. It seems to work well!
My first attempt used display: inline-block
which worked in Safari but failed in Firefox.
Notable limitation from that MDN article:
Because counting characters and balancing them across multiple lines is computationally expensive, this value is only supported for blocks of text spanning a limited number of lines (six or less for Chromium and ten or less for Firefox)
So it's fine for these navigation concepts but isn't something you can use for body text.
Experimenting with audio input and output for the OpenAI Chat Completion API
OpenAI promised this at DevDay a few weeks ago and now it’s here: their Chat Completion API can now accept audio as input and return it as output. OpenAI still recommend their WebSocket-based Realtime API for audio tasks, but the Chat Completion API is a whole lot easier to write code against.
[... 1,522 words]Video scraping: extracting JSON data from a 35 second screen capture for less than 1/10th of a cent
The other day I found myself needing to add up some numeric values that were scattered across twelve different emails.
[... 1,220 words]My Jina Reader tool. I wanted to feed the Cloudflare Durable Objects SQLite documentation into Claude, but I was on my iPhone so copying and pasting was inconvenient. Jina offer a Reader API which can turn any URL into LLM-friendly Markdown and it turns out it supports CORS, so I got Claude to build me this tool (second iteration, third iteration, final source code).
Paste in a URL to get the Jina Markdown version, along with an all important "Copy to clipboard" button.
An LLM TDD loop (via) Super neat demo by David Winterbottom, who wrapped my LLM and files-to-prompt tools in a short Bash script that can be fed a file full of Python unit tests and an empty implementation file and will then iterate on that file in a loop until the tests pass.
lm.rs: run inference on Language Models locally on the CPU with Rust (via) Impressive new LLM inference implementation in Rust by Samuel Vitorino. I tried it just now on an M2 Mac with 64GB of RAM and got very snappy performance for this Q8 Llama 3.2 1B, with Activity Monitor reporting 980% CPU usage over 13 threads.
Here's how I compiled the library and ran the model:
cd /tmp
git clone https://github.com/samuel-vitorino/lm.rs
cd lm.rs
RUSTFLAGS="-C target-cpu=native" cargo build --release --bin chat
curl -LO 'https://huggingface.co/samuel-vitorino/Llama-3.2-1B-Instruct-Q8_0-LMRS/resolve/main/tokenizer.bin?download=true'
curl -LO 'https://huggingface.co/samuel-vitorino/Llama-3.2-1B-Instruct-Q8_0-LMRS/resolve/main/llama3.2-1b-it-q80.lmrs?download=true'
./target/release/chat --model llama3.2-1b-it-q80.lmrs --show-metrics
That --show-metrics
option added this at the end of a response:
Speed: 26.41 tok/s
It looks like the performance is helped by two key dependencies: wide, which provides data types optimized for SIMD operations and rayon for running parallel iterators across multiple cores (used for matrix multiplication).
(I used LLM and files-to-prompt
to help figure this out.)
SVG to JPG/PNG. The latest in my ongoing series of interactive HTML and JavaScript tools written almost entirely by LLMs. This one lets you paste in (or open-from-file, or drag-onto-page) some SVG and then use that to render a JPEG or PNG image of your desired width.
I built this using Claude 3.5 Sonnet, initially as an Artifact and later in a code editor since some of the features (loading an example image and downloading the result) cannot run in the sandboxed iframe Artifact environment.
Here's the full transcript of the Claude conversation I used to build the tool, plus a few commits I later made by hand to further customize it.
The code itself is mostly quite simple. The most interesting part is how it renders the SVG to an image, which (simplified) looks like this:
// First extract the viewbox to get width/height
const svgElement = new DOMParser().parseFromString(
svgInput, 'image/svg+xml'
).documentElement;
let viewBox = svgElement.getAttribute('viewBox');
[, , width, height] = viewBox.split(' ').map(Number);
// Figure out the width/height of the output image
const newWidth = parseInt(widthInput.value) || 800;
const aspectRatio = width / height;
const newHeight = Math.round(newWidth / aspectRatio);
// Create off-screen canvas
const canvas = document.createElement('canvas');
canvas.width = newWidth;
canvas.height = newHeight;
// Draw SVG on canvas
const svgBlob = new Blob([svgInput], {type: 'image/svg+xml;charset=utf-8'});
const svgUrl = URL.createObjectURL(svgBlob);
const img = new Image();
const ctx = canvas.getContext('2d');
img.onload = function() {
ctx.drawImage(img, 0, 0, newWidth, newHeight);
URL.revokeObjectURL(svgUrl);
// Convert that to a JPEG
const imageDataUrl = canvas.toDataURL("image/jpeg");
const convertedImg = document.createElement('img');
convertedImg.src = imageDataUrl;
imageContainer.appendChild(convertedImg);
};
img.src = svgUrl;
Here's the MDN explanation of that revokeObjectURL() method, which I hadn't seen before.
Call this method when you've finished using an object URL to let the browser know not to keep the reference to the file any longer.