Simon Willison’s Weblog

Subscribe
Atom feed for ai-assisted-programming

71 items tagged “ai-assisted-programming”

Using AI tools such as Large Language Models to help write code.

2024

Notes on the new Claude analysis JavaScript code execution tool

Visit Notes on the new Claude analysis JavaScript code execution tool

Anthropic released a new feature for their Claude.ai consumer-facing chat bot interface today which they’re calling “the analysis tool”.

[... 842 words]

Running prompts against images and PDFs with Google Gemini. New TIL. I've been experimenting with the Google Gemini APIs for running prompts against images and PDFs (in preparation for finally adding multi-modal support to LLM) - here are my notes on how to send images or PDF files to their API using curl and the base64 -i macOS command.

I figured out the curl incantation first and then got Claude to build me a Bash script that I can execute like this:

prompt-gemini 'extract text' example-handwriting.jpg

Animated terminal demo. At the top of the screen is a example-handwriting.jpg with some rough handwriting. I run this command in a terminal: 
prompt-gemini 'extract text' example-handwriting.jpg It returns JSON showing 270 tokens used by gemini-1.5-flash-8b. Then I run the command again with -r on the end and it returns the text from the image: Example handwriting Let's try this out

Playing with this is really fun. The Gemini models charge less than 1/10th of a cent per image, so it's really inexpensive to try them out.

# 23rd October 2024, 6:25 pm / vision-llms, gemini, llm, bash, ai, llms, ai-assisted-programming, google, generative-ai, ocr, projects

I've often been building single-use apps with Claude Artifacts when I'm helping my children learn. For example here's one on visualizing fractions. [...] What's more surprising is that it is far easier to create an app on-demand than searching for an app in the app store that will do what I'm looking for. Searching for kids' learning apps is typically a nails-on-chalkboard painful experience because 95% of them are addictive garbage. And even if I find something usable, it can't match the fact that I can tell Claude what I want.

Arvind Narayanan

# 21st October 2024, 4:12 pm / anthropic, claude, education, ai, llms, claude-artifacts, ai-assisted-programming, generative-ai, arvind-narayana

Everything I built with Claude Artifacts this week

Visit Everything I built with Claude Artifacts this week

I’m a huge fan of Claude’s Artifacts feature, which lets you prompt Claude to create an interactive Single Page App (using HTML, CSS and JavaScript) and then view the result directly in the Claude interface, iterating on it further with the bot and then, if you like, copying out the resulting code.

[... 2,273 words]

Dashboard: Tools. I used Django SQL Dashboard to spin up a dashboard that shows all of the URLs to my tools.simonwillison.net site that I've shared on my blog so far. It uses this (Claude assisted) regular expression in a PostgreSQL SQL query:

select distinct on (tool_url)
    unnest(regexp_matches(
        body,
        '(https://tools\.simonwillison\.net/[^<"\s)]+)',
        'g'
    )) as tool_url,
    'https://simonwillison.net/' || left(type, 1) || '/' || id as blog_url,
    title,
    date(created) as created
from content

I've been really enjoying having a static hosting platform (it's GitHub Pages serving my simonw/tools repo) that I can use to quickly deploy little HTML+JavaScript interactive tools and demos.

# 21st October 2024, 3:33 am / django-sql-dashboard, ai-assisted-programming, tools, projects, postgresql, sql, javascript

The 3 AI Use Cases: Gods, Interns, and Cogs. Drew Breunig introduces an interesting new framework for categorizing use cases of modern AI:

  • Gods refers to the autonomous, human replacement applications - I see that as AGI stuff that's still effectively science fiction.
  • Interns are supervised copilots. This is how I get most of the value out of LLMs at the moment, delegating tasks to them that I can then review, such as AI-assisted programming.
  • Cogs are the smaller, more reliable components that you can build pipelines and automations on top of without needing to review everything they do - think Whisper for transcriptions or maybe some limited LLM subtasks such as structured data extraction.

Drew also considers Toys as a subcategory of Interns: things like image generators, “defined by their usage by non-experts. Toys have a high tolerance for errors because they’re not being relied on for much beyond entertainment.”

# 20th October 2024, 10:12 pm / drew-breunig, ai-assisted-programming, generative-ai, ai, llms

You can use text-wrap: balance; on icons. Neat CSS experiment from Terence Eden: the new text-wrap: balance CSS property is intended to help make text like headlines display without ugly wrapped single orphan words, but Terence points out it can be used for icons too:

A row of icons, without text-wrap balances just one is wrapped on the second line. With the propert they are split into two lines with equal numbers of icons.

This inspired me to investigate if the same technique could work for text based navigation elements. I used Claude to build this interactive prototype of a navigation bar that uses text-wrap: balance against a list of display: inline menu list items. It seems to work well!

Animated demo. A navigation menu with 13 items - things like Home and About and Services and a products. These are wrapped on four lines with 4, 4, 4 and then 1 item. Selecting the enable text-wrap: balances checkbox changes that to 3, 4, 3, 3 - a slider also allows the number of visible items to be changed to see the effect that has

My first attempt used display: inline-block which worked in Safari but failed in Firefox.

Notable limitation from that MDN article:

Because counting characters and balancing them across multiple lines is computationally expensive, this value is only supported for blocks of text spanning a limited number of lines (six or less for Chromium and ten or less for Firefox)

So it's fine for these navigation concepts but isn't something you can use for body text.

# 20th October 2024, 1:23 pm / css, claude-artifacts, anthropic, ai-assisted-programming, claude, terence-eden

Experimenting with audio input and output for the OpenAI Chat Completion API

Visit Experimenting with audio input and output for the OpenAI Chat Completion API

OpenAI promised this at DevDay a few weeks ago and now it’s here: their Chat Completion API can now accept audio as input and return it as output. OpenAI still recommend their WebSocket-based Realtime API for audio tasks, but the Chat Completion API is a whole lot easier to write code against.

[... 1,502 words]

Video scraping: extracting JSON data from a 35 second screen capture for less than 1/10th of a cent

Visit Video scraping: extracting JSON data from a 35 second screen capture for less than 1/10th of a cent

The other day I found myself needing to add up some numeric values that were scattered across twelve different emails.

[... 1,220 words]

My Jina Reader tool. I wanted to feed the Cloudflare Durable Objects SQLite documentation into Claude, but I was on my iPhone so copying and pasting was inconvenient. Jina offer a Reader API which can turn any URL into LLM-friendly Markdown and it turns out it supports CORS, so I got Claude to build me this tool (second iteration, third iteration, final source code).

Paste in a URL to get the Jina Markdown version, along with an all important "Copy to clipboard" button.

# 14th October 2024, 4:47 pm / projects, markdown, ai-assisted-programming, jina, claude-3-5-sonnet, claude, generative-ai, ai, llms

An LLM TDD loop (via) Super neat demo by David Winterbottom, who wrapped my LLM and files-to-prompt tools in a short Bash script that can be fed a file full of Python unit tests and an empty implementation file and will then iterate on that file in a loop until the tests pass.

# 13th October 2024, 7:37 pm / llm, ai-assisted-programming, python, generative-ai, pytest, ai, llms

lm.rs: run inference on Language Models locally on the CPU with Rust (via) Impressive new LLM inference implementation in Rust by Samuel Vitorino. I tried it just now on an M2 Mac with 64GB of RAM and got very snappy performance for this Q8 Llama 3.2 1B, with Activity Monitor reporting 980% CPU usage over 13 threads.

Here's how I compiled the library and ran the model:

cd /tmp
git clone https://github.com/samuel-vitorino/lm.rs
cd lm.rs
RUSTFLAGS="-C target-cpu=native" cargo build --release --bin chat
curl -LO 'https://huggingface.co/samuel-vitorino/Llama-3.2-1B-Instruct-Q8_0-LMRS/resolve/main/tokenizer.bin?download=true'
curl -LO 'https://huggingface.co/samuel-vitorino/Llama-3.2-1B-Instruct-Q8_0-LMRS/resolve/main/llama3.2-1b-it-q80.lmrs?download=true'
./target/release/chat --model llama3.2-1b-it-q80.lmrs --show-metrics

That --show-metrics option added this at the end of a response:

Speed: 26.41 tok/s

It looks like the performance is helped by two key dependencies: wide, which provides data types optimized for SIMD operations and rayon for running parallel iterators across multiple cores (used for matrix multiplication).

(I used LLM and files-to-prompt to help figure this out.)

# 11th October 2024, 7:33 pm / llm, rust, ai-assisted-programming, generative-ai, ai, llms

SVG to JPG/PNG. The latest in my ongoing series of interactive HTML and JavaScript tools written almost entirely by LLMs. This one lets you paste in (or open-from-file, or drag-onto-page) some SVG and then use that to render a JPEG or PNG image of your desired width.

Screenshot of the SVG to JPEG/PNG tool. It starts with a Browse... option for selecting a file, next to a Load example image link, above a textarea full of SVG code. Then a radio box to select between JPEG and PNG, plus a background color color picker widget next to a checkbox labelled transparent. Then Output width, a number field set to 300. Then a convert SVG button. Below is the classic SVG tiger image, with a Download image link that says 47.38BK. Under that is a Base 64 image tag header with a copy image tag button and some visible HTML for a data:image/jpeg image element.

I built this using Claude 3.5 Sonnet, initially as an Artifact and later in a code editor since some of the features (loading an example image and downloading the result) cannot run in the sandboxed iframe Artifact environment.

Here's the full transcript of the Claude conversation I used to build the tool, plus a few commits I later made by hand to further customize it.

The code itself is mostly quite simple. The most interesting part is how it renders the SVG to an image, which (simplified) looks like this:

// First extract the viewbox to get width/height
const svgElement = new DOMParser().parseFromString(
    svgInput, 'image/svg+xml'
).documentElement;
let viewBox = svgElement.getAttribute('viewBox');
[, , width, height] = viewBox.split(' ').map(Number);

// Figure out the width/height of the output image
const newWidth = parseInt(widthInput.value) || 800;
const aspectRatio = width / height;
const newHeight = Math.round(newWidth / aspectRatio);

// Create off-screen canvas
const canvas = document.createElement('canvas');
canvas.width = newWidth;
canvas.height = newHeight;

// Draw SVG on canvas
const svgBlob = new Blob([svgInput], {type: 'image/svg+xml;charset=utf-8'});
const svgUrl = URL.createObjectURL(svgBlob);
const img = new Image();
const ctx = canvas.getContext('2d');
img.onload = function() {
    ctx.drawImage(img, 0, 0, newWidth, newHeight);
    URL.revokeObjectURL(svgUrl);
    // Convert that to a JPEG
    const imageDataUrl = canvas.toDataURL("image/jpeg");
    const convertedImg = document.createElement('img');
    convertedImg.src = imageDataUrl;
    imageContainer.appendChild(convertedImg);
};
img.src = svgUrl;

Here's the MDN explanation of that revokeObjectURL() method, which I hadn't seen before.

Call this method when you've finished using an object URL to let the browser know not to keep the reference to the file any longer.

# 6th October 2024, 7:57 pm / claude-3-5-sonnet, images, claude, ai, llms, svg, claude-artifacts, javascript, ai-assisted-programming, generative-ai

Building an automatically updating live blog in Django. Here's an extended write-up of how I implemented the live blog feature I used for my coverage of OpenAI DevDay yesterday. I built the first version using Claude while waiting for the keynote to start, then upgraded it during the lunch break with the help of GPT-4o to add sort options and incremental fetching of new updates.

# 2nd October 2024, 3:42 pm / claude, ai, django, llms, javascript, ai-assisted-programming, generative-ai, chatgpt

In the future, we won't need programmers; just people who can describe to a computer precisely what they want it to do.

Jason Gorman

# 29th September 2024, 8:21 pm / ai-assisted-programming, llms, ai, generative-ai

OpenFreeMap (via) New free map tile hosting service from Zsolt Ero:

OpenFreeMap lets you display custom maps on your website and apps for free. […] Using our public instance is completely free: there are no limits on the number of map views or requests. There’s no registration, no user database, no API keys, and no cookies. We aim to cover the running costs of our public instance through donations.

The site serves static vector tiles that work with MapLibre GL. It deliberately doesn’t offer any other services such as search or routing.

From the project README looks like it’s hosted on two Hetzner machines. I don’t think the public server is behind a CDN.

Part of the trick to serving the tiles efficiently is the way it takes advantage of Btrfs:

Production-quality hosting of 300 million tiny files is hard. The average file size is just 450 byte. Dozens of tile servers have been written to tackle this problem, but they all have their limitations.

The original idea of this project is to avoid using tile servers altogether. Instead, the tiles are directly served from Btrfs partition images + hard links using an optimised nginx config.

The self-hosting guide describes the scripts that are provided for downloading their pre-built tiles (needing a fresh Ubuntu server with 300GB of SSD and 4GB of RAM) or building the tiles yourself using Planetiler (needs 500GB of disk and 64GB of RAM).

Getting started is delightfully straightforward:

const map = new maplibregl.Map({
  style: 'https://tiles.openfreemap.org/styles/liberty',
  center: [13.388, 52.517],
  zoom: 9.5,
  container: 'map',
})

I got Claude to help build this demo showing a thousand random markers dotted around San Francisco. The 3D tiles even include building shapes!

Map of San Francisco in 3D with building shapes and small blue random markers dotted around.

Zsolt built OpenFreeMap based on his experience running MapHub over the last 9 years. Here’s a 2018 interview about that project.

It’s pretty incredible that the OpenStreetMap and open geospatial stack has evolved to the point now where it’s economically feasible for an individual to offer a service like this. I hope this turns out to be sustainable. Hetzner charge just €1 per TB for bandwidth (S3 can cost $90/TB) which should help a lot.

# 28th September 2024, 9:41 pm / maps, ai-assisted-programming, geospatial, openstreetmap

Solving a bug with o1-preview, files-to-prompt and LLM. I added a new feature to DJP this morning: you can now have plugins specify their middleware in terms of how it should be positioned relative to other middleware - inserted directly before or directly after django.middleware.common.CommonMiddleware for example.

At one point I got stuck with a weird test failure, and after ten minutes of head scratching I decided to pipe the entire thing into OpenAI's o1-preview to see if it could spot the problem. I used files-to-prompt to gather the code and LLM to run the prompt:

files-to-prompt **/*.py -c | llm -m o1-preview "
The middleware test is failing showing all of these - why is MiddlewareAfter repeated so many times?

['MiddlewareAfter', 'Middleware3', 'MiddlewareAfter', 'Middleware5', 'MiddlewareAfter', 'Middleware3', 'MiddlewareAfter', 'Middleware2', 'MiddlewareAfter', 'Middleware3', 'MiddlewareAfter', 'Middleware5', 'MiddlewareAfter', 'Middleware3', 'MiddlewareAfter', 'Middleware4', 'MiddlewareAfter', 'Middleware3', 'MiddlewareAfter', 'Middleware5', 'MiddlewareAfter', 'Middleware3', 'MiddlewareAfter', 'Middleware2', 'MiddlewareAfter', 'Middleware3', 'MiddlewareAfter', 'Middleware5', 'MiddlewareAfter', 'Middleware3', 'MiddlewareAfter', 'Middleware', 'MiddlewareBefore']"

The model whirled away for a few seconds and spat out an explanation of the problem - one of my middleware classes was accidentally calling self.get_response(request) in two different places.

I did enjoy how o1 attempted to reference the relevant Django documentation and then half-repeated, half-hallucinated a quote from it:

Reference: From the Django documentation on writing middleware: Each middleware component is responsible for doing some specific function. They accept the request, do something, and pass the request to the next middleware component (if needed). They can also modify the response before sending it back to the client.

This took 2,538 input tokens and 4,354 output tokens - by my calculations at $15/million input and $60/million output that prompt cost just under 30 cents.

# 25th September 2024, 6:41 pm / o1, llm, djp, openai, ai, llms, ai-assisted-programming, generative-ai

The Pragmatic Engineer Podcast: AI tools for software engineers, but without the hype – with Simon Willison. Gergely Orosz has a brand new podcast, and I was the guest for the first episode. We covered a bunch of ground, but my favorite topic was an exploration of the (very legitimate) reasons that many engineers are resistant to taking advantage of AI-assisted programming tools.

# 25th September 2024, 5:58 pm / podcasts, ai-assisted-programming, generative-ai, gergely-orosz, ai, llms

XKCD 1425 (Tasks) turns ten years old today (via) One of the all-time great XKCDs. It's amazing that "check whether the photo is of a bird" has gone from PhD-level to trivially easy to solve (with a vision LLM, or CLIP, or ResNet+ImageNet among others).

XKCD comic. Cueball: When a user takes a photo, the app should check whether they're in a national park... Ponytail: Sure, easy GIS lookup gimme a few hours. Cueball: ...and check whether the photo is of a bird. Ponytail: I'll need a research team and five years. Caption: In CS, it can be hard to explain the difference between the easy and the virtually impossible.

The key idea still very much stands though. Understanding the difference between easy and hard challenges in software development continues to require an enormous depth of experience.

I'd argue that LLMs have made this even worse.

Understanding what kind of tasks LLMs can and cannot reliably solve remains incredibly difficult and unintuitive. They're computer systems that are terrible at maths and that can't reliably lookup facts!

On top of that, the rise of AI-assisted programming tools means more people than ever are beginning to create their own custom software.

These brand new AI-assisted proto-programmers are having a crash course in this easy-v.s.-hard problem.

I saw someone recently complaining that they couldn't build a Claude Artifact that could analyze images, even though they knew Claude itself could do that. Understanding why that's not possible involves understanding how the CSP headers that are used to serve Artifacts prevent the generated code from making its own API calls out to an LLM!

# 24th September 2024, 3:08 pm / clip, ai-assisted-programming, xkcd, ai, llms

Jiter (via) One of the challenges in dealing with LLM streaming APIs is the need to parse partial JSON - until the stream has ended you won't have a complete valid JSON object, but you may want to display components of that JSON as they become available.

I've solved this previously using the ijson streaming JSON library, see my previous TIL.

Today I found out about Jiter, a new option from the team behind Pydantic. It's written in Rust and extracted from pydantic-core, so the Python wrapper for it can be installed using:

pip install jiter

You can feed it an incomplete JSON bytes object and use partial_mode="on" to parse the valid subset:

import jiter
partial_json = b'{"name": "John", "age": 30, "city": "New Yor'
jiter.from_json(partial_json, partial_mode="on")
# {'name': 'John', 'age': 30}

Or use partial_mode="trailing-strings" to include incomplete string fields too:

jiter.from_json(partial_json, partial_mode="trailing-strings")
# {'name': 'John', 'age': 30, 'city': 'New Yor'}

The current README was a little thin, so I submiitted a PR with some extra examples. I got some help from files-to-prompt and Claude 3.5 Sonnet):

cd crates/jiter-python/ && files-to-prompt -c README.md tests | llm -m claude-3.5-sonnet --system 'write a new README with comprehensive documentation'

# 22nd September 2024, 8:03 pm / json, python, ai-assisted-programming

Whether you think coding with AI works today or not doesn’t really matter.

But if you think functional AI helping to code will make humans dumber or isn’t real programming just consider that’s been the argument against every generation of programming tools going back to Fortran.

Steven Sinofsky

# 21st September 2024, 2:44 pm / ai-assisted-programming, ai

Markdown and Math Live Renderer. Another of my tiny Claude-assisted JavaScript tools. This one lets you enter Markdown with embedded mathematical expressions (like $ax^2 + bx + c = 0$) and live renders those on the page, with an HTML version using MathML that you can export through copy and paste.

Screenshot of the tool in action - Markdown plus math at the top is rendered underneath.

Here's the Claude transcript. I started by asking:

Are there any client side JavaScript markdown libraries that can also handle inline math and render it?

Claude gave me several options including the combination of Marked and KaTeX, so I followed up by asking:

Build an artifact that demonstrates Marked plus KaTeX - it should include a text area I can enter markdown in (repopulated with a good example) and live update the rendered version below. No react.

Which gave me this artifact, instantly demonstrating that what I wanted to do was possible.

I iterated on it a tiny bit to get to the final version, mainly to add that HTML export and a Copy button. The final source code is here.

# 21st September 2024, 4:56 am / claude-3-5-sonnet, anthropic, claude, markdown, mathml, ai, llms, ai-assisted-programming, tools, generative-ai, claude-artifacts

YouTube Thumbnail Viewer. I wanted to find the best quality thumbnail image for a YouTube video, so I could use it as a social media card. I know from past experience that GPT-4 has memorized the various URL patterns for img.youtube.com, so I asked it to guess the URL for my specific video.

This piqued my interest as to what the other patterns were, so I got it to spit those out too. Then, to save myself from needing to look those up again in the future, I asked it to build me a little HTML and JavaScript tool for turning a YouTube video URL into a set of visible thumbnails.

I iterated on the code a bit more after pasting it into Claude and ended up with this, now hosted in my tools collection.

# 20th September 2024, 4:45 am / youtube, ai-assisted-programming, generative-ai, ai, llms, tools

Notes on using LLMs for code

Visit Notes on using LLMs for code

I was recently the guest on TWIML—the This Week in Machine Learning & AI podcast. Our episode is titled Supercharging Developer Productivity with ChatGPT and Claude with Simon Willison, and the focus of the conversation was the ways in which I use LLM tools in my day-to-day work as a software developer and product engineer.

[... 861 words]

Speed matters (via) Jamie Brandon in 2021, talking about the importance of optimizing for the speed at which you can work as a developer:

Being 10x faster also changes the kinds of projects that are worth doing.

Last year I spent something like 100 hours writing a text editor. […] If I was 10x slower it would have been 20-50 weeks. Suddenly that doesn't seem like such a good deal any more - what a waste of a year!

It’s not just about speed of writing code:

When I think about speed I think about the whole process - researching, planning, designing, arguing, coding, testing, debugging, documenting etc.

Often when I try to convince someone to get faster at one of those steps, they'll argue that the others are more important so it's not worthwhile trying to be faster. Eg choosing the right idea is more important than coding the wrong idea really quickly.

But that's totally conditional on the speed of everything else! If you could code 10x as fast then you could try out 10 different ideas in the time it would previously have taken to try out 1 idea. Or you could just try out 1 idea, but have 90% of your previous coding time available as extra idea time.

Jamie’s model here helps explain the effect I described in AI-enhanced development makes me more ambitious with my projects. Prompting an LLM to write portions of my code for me gives me that 5-10x boost in the time I spend typing code into a computer, which has a big effect on my ambitions despite being only about 10% of the activities I perform relevant to building software.

I also increasingly lean on LLMs as assistants in the research phase - exploring library options, building experimental prototypes - and for activities like writing tests and even a little bit of documentation.

# 15th September 2024, 8:58 am / ai-assisted-programming, llms, ai, generative-ai

json-flatten, now with format documentation. json-flatten is a fun little Python library I put together a few years ago for converting JSON data into a flat key-value format, suitable for inclusion in an HTML form or query string. It lets you take a structure like this one:

{"foo": {"bar": [1, True, None]}

And convert it into key-value pairs like this:

foo.bar.[0]$int=1
foo.bar.[1]$bool=True
foo.bar.[2]$none=None

The flatten(dictionary) function function converts to that format, and unflatten(dictionary) converts back again.

I was considering the library for a project today and realized that the 0.3 README was a little thin - it showed how to use the library but didn't provide full details of the format it used.

On a hunch, I decided to see if files-to-prompt plus LLM plus Claude 3.5 Sonnet could write that documentation for me. I ran this command:

files-to-prompt *.py | llm -m claude-3.5-sonnet --system 'write detailed documentation in markdown describing the format used to represent JSON and nested JSON as key/value pairs, include a table as well'

That *.py picked up both json_flatten.py and test_json_flatten.py - I figured the test file had enough examples in that it should act as a good source of information for the documentation.

This worked really well! You can see the first draft it produced here.

It included before and after examples in the documentation. I didn't fully trust these to be accurate, so I gave it this follow-up prompt:

llm -c "Rewrite that document to use the Python cog library to generate the examples"

I'm a big fan of Cog for maintaining examples in READMEs that are generated by code. Cog has been around for a couple of decades now so it was a safe bet that Claude would know about it.

This almost worked - it produced valid Cog syntax like the following:

[[[cog
example = {
"fruits": ["apple", "banana", "cherry"]
}

cog.out("```json\n")
cog.out(str(example))
cog.out("\n```\n")
cog.out("Flattened:\n```\n")
for key, value in flatten(example).items():
    cog.out(f"{key}: {value}\n")
cog.out("```\n")
]]]
[[[end]]]

But that wasn't entirely right, because it forgot to include the Markdown comments that would hide the Cog syntax, which should have looked like this:

<!-- [[[cog -->
...
<!-- ]]] -->
...
<!-- [[[end]]] -->

I could have prompted it to correct itself, but at this point I decided to take over and edit the rest of the documentation by hand.

The end result was documentation that I'm really happy with, and that I probably wouldn't have bothered to write if Claude hadn't got me started.

# 7th September 2024, 5:43 am / claude-3-5-sonnet, llm, anthropic, claude, ai, llms, ai-assisted-programming, generative-ai, projects, json

I think that AI has killed, or is about to kill, pretty much every single modifier we want to put in front of the word “developer.”

“.NET developer”? Meaningless. Copilot, Cursor, etc can get anyone conversant enough with .NET to be productive in an afternoon … as long as you’ve done enough other programming that you know what to prompt.

Forrest Brazeal

# 31st August 2024, 12:52 pm / ai-assisted-programming, llms, ai, generative-ai, forrest-brazeal

OpenAI: Improve file search result relevance with chunk ranking (via) I've mostly been ignoring OpenAI's Assistants API. It provides an alternative to their standard messages API where you construct "assistants", chatbots with optional access to additional tools and that store full conversation threads on the server so you don't need to pass the previous conversation with every call to their API.

I'm pretty comfortable with their existing API and I found the assistants API to be quite a bit more complicated. So far the only thing I've used it for is a script to scrape OpenAI Code Interpreter to keep track of updates to their enviroment's Python packages.

Code Interpreter aside, the other interesting assistants feature is File Search. You can upload files in a wide variety of formats and OpenAI will chunk them, store the chunks in a vector store and make them available to help answer questions posed to your assistant - it's their version of hosted RAG.

Prior to today OpenAI had kept the details of how this worked undocumented. I found this infuriating, because when I'm building a RAG system the details of how files are chunked and scored for relevance is the whole game - without understanding that I can't make effective decisions about what kind of documents to use and how to build on top of the tool.

This has finally changed! You can now run a "step" (a round of conversation in the chat) and then retrieve details of exactly which chunks of the file were used in the response and how they were scored using the following incantation:

run_step = client.beta.threads.runs.steps.retrieve(
    thread_id="thread_abc123",
    run_id="run_abc123",
    step_id="step_abc123",
    include=[
        "step_details.tool_calls[*].file_search.results[*].content"
    ]
)

(See what I mean about the API being a little obtuse?)

I tried this out today and the results were very promising. Here's a chat transcript with an assistant I created against an old PDF copy of the Datasette documentation - I used the above new API to dump out the full list of snippets used to answer the question "tell me about ways to use spatialite".

It pulled in a lot of content! 57,017 characters by my count, spread across 20 search results (customizable), for a total of 15,021 tokens as measured by ttok. At current GPT-4o-mini prices that would cost 0.225 cents (less than a quarter of a cent), but with regular GPT-4o it would cost 7.5 cents.

OpenAI provide up to 1GB of vector storage for free, then charge $0.10/GB/day for vector storage beyond that. My 173 page PDF seems to have taken up 728KB after being chunked and stored, so that GB should stretch a pretty long way.

Confession: I couldn't be bothered to work through the OpenAI code examples myself, so I hit Ctrl+A on that web page and copied the whole lot into Claude 3.5 Sonnet, then prompted it:

Based on this documentation, write me a Python CLI app (using the Click CLi library) with the following features:

openai-file-chat add-files name-of-vector-store *.pdf *.txt

This creates a new vector store called name-of-vector-store and adds all the files passed to the command to that store.

openai-file-chat name-of-vector-store1 name-of-vector-store2 ...

This starts an interactive chat with the user, where any time they hit enter the question is answered by a chat assistant using the specified vector stores.

We iterated on this a few times to build me a one-off CLI app for trying out the new features. It's got a few bugs that I haven't fixed yet, but it was a very productive way of prototyping against the new API.

# 30th August 2024, 4:03 am / embeddings, vector-search, generative-ai, openai, ai, rag, llms, claude-3-5-sonnet, ai-assisted-programming

How Anthropic built Artifacts. Gergely Orosz interviews five members of Anthropic about how they built Artifacts on top of Claude with a small team in just three months.

The initial prototype used Streamlit, and the biggest challenge was building a robust sandbox to run the LLM-generated code in:

We use iFrame sandboxes with full-site process isolation. This approach has gotten robust over the years. This protects users' main Claude.ai browsing session from malicious artifacts. We also use strict Content Security Policies (CSPs) to enforce limited and controlled network access.

Artifacts were launched in general availability yesterday - previously you had to turn them on as a preview feature. Alex Albert has a 14 minute demo video up on Twitter showing the different forms of content they can create, including interactive HTML apps, Markdown, HTML, SVG, Mermaid diagrams and React Components.

# 28th August 2024, 11:28 pm / claude-artifacts, anthropic, claude, gergely-orosz, ai, llms, alex-albert, sandboxing, iframes, security, ai-assisted-programming

System prompt for val.town/townie (via) Val Town (previously) provides hosting and a web-based coding environment for Vals - snippets of JavaScript/TypeScript that can run server-side as scripts, on a schedule or hosting a web service.

Townie is Val's new AI bot, providing a conversational chat interface for creating fullstack web apps (with blob or SQLite persistence) as Vals.

In the most recent release of Townie Val added the ability to inspect and edit its system prompt!

I've archived a copy in this Gist, as a snapshot of how Townie works today. It's surprisingly short, relying heavily on the model's existing knowledge of Deno and TypeScript.

I enjoyed the use of "tastefully" in this bit:

Tastefully add a view source link back to the user's val if there's a natural spot for it and it fits in the context of what they're building. You can generate the val source url via import.meta.url.replace("esm.town", "val.town").

The prompt includes a few code samples, like this one demonstrating how to use Val's SQLite package:

import { sqlite } from "https://esm.town/v/stevekrouse/sqlite";
let KEY = new URL(import.meta.url).pathname.split("/").at(-1);
(await sqlite.execute(`select * from ${KEY}_users where id = ?`, [1])).rows[0].id

It also reveals the existence of Val's very own delightfully simple image generation endpoint Val, currently powered by Stable Diffusion XL Lightning on fal.ai.

If you want an AI generated image, use https://maxm-imggenurl.web.val.run/the-description-of-your-image to dynamically generate one.

Here's a fun colorful raccoon with a wildly inappropriate hat.

Val are also running their own gpt-4o-mini proxy, free to users of their platform:

import { OpenAI } from "https://esm.town/v/std/openai";
const openai = new OpenAI();
const completion = await openai.chat.completions.create({
  messages: [
    { role: "user", content: "Say hello in a creative way" },
  ],
  model: "gpt-4o-mini",
  max_tokens: 30,
});

Val developer JP Posma wrote a lot more about Townie in How we built Townie – an app that generates fullstack apps, describing their prototyping process and revealing that the current model it's using is Claude 3.5 Sonnet.

Their current system prompt was refined over many different versions - initially they were including 50 example Vals at quite a high token cost, but they were able to reduce that down to the linked system prompt which includes condensed documentation and just one templated example.

# 28th August 2024, 3:33 am / claude-3-5-sonnet, deno, sqlite, anthropic, claude, typescript, ai, llms, prompt-engineering, ai-assisted-programming, javascript, generative-ai, val-town