Simon Willison’s Weblog

Recent

May 20, 2025

We did the math on AI’s energy footprint. Here’s the story you haven’t heard. James O'Donnell and Casey Crownhart try to pull together a detailed account of AI energy usage for MIT Technology Review.

They quickly run into the same roadblock faced by everyone else who's tried to investigate this: the AI companies themselves remain infuriatingly opaque about their energy usage, making it impossible to produce credible, definitive numbers on any of this.

Something I find frustrating about conversations about AI energy usage is the way anything that could remotely be categorized as "AI" (a vague term at the best of times) inevitably gets bundled together. Here's a good example from early in this piece:

In 2017, AI began to change everything. Data centers started getting built with energy-intensive hardware designed for AI, which led them to double their electricity consumption by 2023.

ChatGPT kicked off the generative AI boom in November 2022, so that six year period mostly represents growth in data centers in the pre-generative AI era.

Thanks to the lack of transparency on energy usage by the popular closed models - OpenAI, Anthropic and Gemini all refused to share useful numbers with the reporters - they turned to the Llama models to get estimates of energy usage instead. They estimated prompts like this:

  • Llama 3.1 8B - 114 joules per response - run a microwave for one-tenth of a second.
  • Llama 3.1 405B - 6,706 joules per response - run the microwave for eight seconds.
  • A 1024 x 1024 pixels image with Stable Diffusion 3 Medium - 2,282 joules per image, which I'd estimate at about two and a half seconds of microwave time.
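
Those microwave comparisons roughly check out if you assume a typical ~800 watt microwave (the wattage is my assumption - the article doesn't state one):

MICROWAVE_WATTS = 800  # assumed typical microwave power draw

for label, joules in [
    ("Llama 3.1 8B", 114),
    ("Llama 3.1 405B", 6_706),
    ("Stable Diffusion 3 Medium image", 2_282),
]:
    # a watt is one joule per second, so seconds = joules / watts
    print(f"{label}: {joules} J is about {joules / MICROWAVE_WATTS:.1f} seconds of microwave time")

That prints roughly 0.1, 8.4 and 2.9 seconds - the same ballpark as the comparisons above.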

Video models use a lot more energy. Experiments with CogVideoX (presumably this one) used "700 times the energy required to generate a high-quality image" for a 5 second video.

AI companies have defended these numbers saying that generative video has a smaller footprint than the film shoots and travel that go into typical video production. That claim is hard to test and doesn’t account for the surge in video generation that might follow if AI videos become cheap to produce.

I share their skepticism here. I don't think comparing a 5 second AI-generated video to a full film production is a credible comparison.

This piece generally reinforced my mental model that the cost of (most) individual prompts by individuals is fractionally small, but that the overall costs still add up to something substantial.

The lack of detailed information around this stuff is so disappointing - especially from companies like Google who have aggressive sustainability targets.

# 10:34 pm / ai-energy-usage, llms, ai, generative-ai, ai-ethics

Gemini 2.5: Our most intelligent models are getting even better. A bunch of new Gemini 2.5 announcements at Google I/O today.

2.5 Flash and 2.5 Pro are both getting audio output (previously previewed in Gemini 2.0) and 2.5 Pro is getting an enhanced reasoning mode called "Deep Think" - not yet available via the API.

Available today is the latest Gemini 2.5 Flash model, gemini-2.5-flash-preview-05-20. I added support for it in llm-gemini 0.20 (and, if you're using the LLM tool-use alpha, llm-gemini 0.20a2).

I tried it out on my personal benchmark, as seen in the Google I/O keynote!

llm -m gemini-2.5-flash-preview-05-20 'Generate an SVG of a pelican riding a bicycle'

Here's what I got from the default model, with its thinking mode enabled:

The bicycle has spokes that look like a spider web. The pelican is goofy but recognizable.

Full transcript. 11 input tokens, 2,619 output tokens, 10,391 thinking tokens = 4.5537 cents.

I ran the same thing again with -o thinking_budget 0 to turn off thinking mode entirely, and got this:

The bicycle has too many bits of frame in the wrong direction. The pelican is yellow and weirdly shaped.

Full transcript. 11 input, 1,243 output = 0.0747 cents.

The non-thinking model is priced differently - still $0.15/million for input but $0.60/million for output as opposed to $3.50/million for thinking+output. The pelican it drew was 61x cheaper!
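
That 61x figure follows directly from the token counts and prices quoted above - here's a quick sanity check:

def cents(input_tokens, output_tokens, output_price_per_m, input_price_per_m=0.15):
    # prices are in dollars per million tokens; return the cost in cents
    dollars = (input_tokens * input_price_per_m + output_tokens * output_price_per_m) / 1_000_000
    return dollars * 100

with_thinking = cents(11, 2_619 + 10_391, 3.50)  # about 4.5537 cents
without_thinking = cents(11, 1_243, 0.60)        # about 0.0747 cents
print(round(with_thinking / without_thinking))   # 61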

Finally, inspired by the keynote I ran this follow-up prompt to animate the more expensive pelican:

llm --cid 01jvqjqz9aha979yemcp7a4885 'Now animate it'

This one is pretty great!

The wheels and pedals are rotating and the pelican is bobbing up and down. This would be a fantastic animated pelican if the pelican didn't kind of suck!

# 8:34 pm / llm-release, gemini, llm, google, generative-ai, pelican-riding-a-bicycle, ai, llm-reasoning, llm-pricing, google-io

Tucked into today's Google I/O keynote, a blink-and-you'll-miss-it moment:

The pelican in the keynote was created by Alexander Chen. Here's the code they wrote with the help of Gemini, which uses p5.js to power the animation.

# 7:24 pm / pelican-riding-a-bicycle, google-io, google

cityofaustin/atd-data-tech issues. I stumbled across this today while looking for interesting frequently updated data sources from local governments. It turns out the City of Austin's Transportation Data & Technology Services department runs everything out of a public GitHub issues instance, which currently has 20,225 closed and 2,002 open issues. They also publish an exported copy of the issues data through the data.austintexas.gov open data portal.
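
Since the repo is public you can sample the data directly from the standard GitHub API - here's a quick sketch (unauthenticated requests are rate-limited to 60/hour):

import requests

response = requests.get(
    "https://api.github.com/repos/cityofaustin/atd-data-tech/issues",
    params={"state": "all", "per_page": 5},
)
for issue in response.json():
    print(issue["number"], issue["state"], issue["title"])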

# 6:18 pm / open-data, github, github-issues

After months of coding with LLMs, I’m going back to using my brain. Interesting vibe coding retrospective from Alberto Fortin. Alberto is an experienced software developer and decided to use Claude and Cursor to rewrite an existing system using Go and ClickHouse - two new-to-him technologies.

One morning, I decide to actually inspect closely what’s all this code that Cursor has been writing. It’s not like I was blindly prompting without looking at the end result, but I was optimizing for speed and I hadn’t actually sat down just to review the code. I was just building building building.

So I do a “coding review” session. And the horror ensues.

Two service files, in the same directory, with similar names, clearly doing a very similar thing. But the method names are different. The props are not consistent. One is called "WebAPIprovider", the other one "webApi". They represent the same exact parameter. The same method is redeclared multiple times across different files. The same config file is being called in different ways and retrieved with different methods.

No consistency, no overarching plan. It’s like I'd asked 10 junior-mid developers to work on this codebase, with no Git access, locking them in a room without seeing what the other 9 were doing.

Alberto reset to a less vibe-heavy approach and is finding it to be a much more productive way of working:

I’m defaulting to pen and paper, I’m defaulting to coding the first draft of that function on my own. [...] But I’m not asking it to write new things from scratch, to come up with ideas or to write a whole new plan. I’m writing the plan. I’m the senior dev. The LLM is the assistant.

# 3:43 pm / vibe-coding, ai-assisted-programming, generative-ai, ai, llms

May 19, 2025

Jules. It seems like everyone is rolling out AI coding assistants that attach to your GitHub account and submit PRs for you right now. We had OpenAI Codex last week; today Microsoft announced GitHub Copilot coding agent (confusingly not the same thing as Copilot Workspace), and I found out just now that Google's Jules, announced in December, is now in a beta preview.

I'm flying home from PyCon but I managed to try out Jules from my phone. I took this GitHub issue thread, converted it to copy-pasteable Markdown with this tool and pasted it into Jules, with no further instructions.

Here's the resulting PR created from its branch. I haven't fully reviewed it yet and the tests aren't passing, so it's hard to evaluate from my phone how well it did. In a cursory first glance it looks like it's covered most of the requirements from the issue thread.

My habit of creating long issue threads where I talk to myself about the features I'm planning is proving to be a good fit for outsourcing implementation work to this new generation of coding assistants.

# 9:40 pm / gemini, ai-assisted-programming, google, llms, ai, generative-ai, github, github-issues

May 18, 2025

llm-pdf-to-images. Inspired by my previous llm-video-frames plugin, I thought it would be neat to have a plugin for LLM that can take a PDF and turn that into an image-per-page so you can feed PDFs into models that support image inputs but don't yet support PDFs.

This should now do exactly that:

llm install llm-pdf-to-images
llm -f pdf-to-images:path/to/document.pdf 'Summarize this document'

Under the hood it's using the PyMuPDF library. The key code to convert a PDF into images looks like this:

import fitz
doc = fitz.open("input.pdf")
for page in doc:
    pix = page.get_pixmap(matrix=fitz.Matrix(300/72, 300/72))
    jpeg_bytes = pix.tobytes(output="jpg", jpg_quality=30)
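
Expanded into a standalone script that writes one JPEG per page - the file naming here is my own choice, not necessarily what the plugin does internally:

import fitz  # PyMuPDF

doc = fitz.open("input.pdf")
for i, page in enumerate(doc):
    # Render each page at 300 DPI (PyMuPDF's base resolution is 72 DPI)
    pix = page.get_pixmap(matrix=fitz.Matrix(300 / 72, 300 / 72))
    # Encode as a heavily compressed JPEG to keep attachment sizes down
    jpeg_bytes = pix.tobytes(output="jpg", jpg_quality=30)
    with open(f"page-{i + 1}.jpg", "wb") as f:
        f.write(jpeg_bytes)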

Once I'd figured out that code I got o4-mini to write most of the rest of the plugin, using llm-fragments-github to load in the example code from the video plugin:

llm -f github:simonw/llm-video-frames '
import fitz
doc = fitz.open("input.pdf")
for page in doc:
    pix = page.get_pixmap(matrix=fitz.Matrix(300/72, 300/72))
    jpeg_bytes = pix.tobytes(output="jpg", jpg_quality=30)
' -s 'output llm_pdf_to_images.py which adds a pdf-to-images: 
 fragment loader that converts a PDF to frames using fitz like in the example' \
-m o4-mini

Here's the transcript - more details in this issue.

I had some weird results testing this with GPT-4.1 mini. I created a test PDF with two pages - one white, one black - and ran a test prompt like this:

llm -f 'pdf-to-images:blank-pages.pdf' \
  'describe these images'

The first image features a stylized red maple leaf with triangular facets, giving it a geometric appearance. The maple leaf is a well-known symbol associated with Canada.

The second image is a simple black silhouette of a cat sitting and facing to the left. The cat's tail curls around its body. The design is minimalistic and iconic.

I got even wilder hallucinations for other prompts, like "summarize this document" or "describe all figures". I have a collection of those in this Gist.

Thankfully this behavior is limited to GPT-4.1 mini. I upgraded to full GPT-4.1 and got much more sensible results:

llm -f 'pdf-to-images:blank-pages.pdf' \
  'describe these images' -m gpt-4.1

Certainly! Here are the descriptions of the two images you provided:

  1. First image: This image is completely white. It appears blank, with no discernible objects, text, or features.

  2. Second image: This image is entirely black. Like the first, it is blank and contains no visible objects, text, or distinct elements.

If you have questions or need a specific kind of analysis or modification, please let me know!

# 8:48 pm / llm, plugins, ai, llms, ai-assisted-programming, pdf, generative-ai, projects, hallucinations

qwen2.5vl in Ollama. Ollama announced a complete overhaul of their vision support the other day. Here's the first new model they've shipped since then - a packaged version of Qwen 2.5 VL which was first released on January 26th 2025. Here are my notes from that release.

I upgraded Ollama (it auto-updates so I just had to restart it from the tray icon) and ran this:

ollama pull qwen2.5vl

This downloaded a 6GB model file. I tried it out against my photo of Cleo rolling on the beach:

llm -a https://static.simonwillison.net/static/2025/cleo-sand.jpg \
  'describe this image' -m qwen2.5vl

And got a pretty good result:

The image shows a dog lying on its back on a sandy beach. The dog appears to be a medium to large breed with a dark coat, possibly black or dark brown. It is wearing a red collar or harness around its chest. The dog's legs are spread out, and its belly is exposed, suggesting it might be rolling around or playing in the sand. The sand is light-colored and appears to be dry, with some small footprints and marks visible around the dog. The lighting in the image suggests it is taken during the daytime, with the sun casting a shadow of the dog to the left side of the image. The overall scene gives a relaxed and playful impression, typical of a dog enjoying time outdoors on a beach.

Qwen 2.5 VL has a strong reputation for OCR, so I tried it on my poster:

llm -a https://static.simonwillison.net/static/2025/poster.jpg \
  'convert to markdown' -m qwen2.5vl

The result that came back:

It looks like the image you provided is a jumbled and distorted text, making it difficult to interpret. If you have a specific question or need help with a particular topic, please feel free to ask, and I'll do my best to assist you!

I'm not sure what went wrong here. My best guess is that the maximum resolution the model can handle is too small to make out the text, or maybe Ollama resized the image to the point of illegibility before handing it to the model?

Update: I think this may be a bug relating to URL handling in LLM/llm-ollama. I tried downloading the file first:

wget https://static.simonwillison.net/static/2025/poster.jpg
llm -m qwen2.5vl 'convert to markdown' -a poster.jpg

This time it did a lot better. The results weren't perfect though - it ended up stuck in a loop outputting the same code example dozens of times.

I tried with a different prompt - "extract text" - and it got confused by the three column layout, misread Datasette as "Datasetette" and missed some of the text. Here's that result.

These experiments used qwen2.5vl:7b (6GB) - I expect the results would be better with the larger qwen2.5vl:32b (21GB) and qwen2.5vl:72b (71GB) models.

Fred Jonsson reported a better result using the MLX model via LM Studio (~9GB model running in 8bit - I think that's mlx-community/Qwen2.5-VL-7B-Instruct-8bit). His full output is here - looks almost exactly right to me.

# 12:31 pm / vision-llms, llm, ollama, generative-ai, ai, qwen, llms, mlx, ocr

2025 Python Packaging Ecosystem Survey. If you make use of Python packaging tools (pip, Anaconda, uv, dozens of others) and have opinions please spend a few minutes with this year's packaging survey. This one was "Co-authored by 30+ of your favorite Python Ecosystem projects, organizations and companies."

# 11:50 am / surveys, packaging, pip, python, psf

Speaking of the effects of technology on individuals and society as a whole, Marshall McLuhan wrote that every augmentation is also an amputation. [...] Today, quite suddenly, billions of people have access to AI systems that provide augmentations, and inflict amputations, far more substantial than anything McLuhan could have imagined. This is the main thing I worry about currently as far as AI is concerned. I follow conversations among professional educators who all report the same phenomenon, which is that their students use ChatGPT for everything, and in consequence learn nothing. We may end up with at least one generation of people who are like the Eloi in H.G. Wells’s The Time Machine, in that they are mental weaklings utterly dependent on technologies that they don’t understand and that they could never rebuild from scratch were they to break down.

Neal Stephenson, Remarks on AI from NZ

# 9:09 am / ai-ethics, neal-stephenson, chatgpt, education, ai, llms

May 17, 2025

In addition to my workshop the other day I'm also participating in the poster session at PyCon US this year.

This means that tomorrow (Sunday 18th May) I'll be hanging out next to my poster from 10am to 1pm in Hall A talking to people about my various projects.

I'll confess: I didn't pay close enough attention to the poster information, so when I first put my poster up it looked a little small:

My Datasette poster on a huge black poster board. It looks a bit lonely in the middle surrounded by empty space.

... so I headed to the nearest CVS and printed out some photos to better represent my interests and personality. I'm going for a "teenage bedroom" aesthetic here, and I'm very happy with the result:

My Datasette poster is now surrounded by nearly 100 photos - mostly of pelicans, SVGs of pelicans and niche museums I've been to.

Here's the poster in the middle (also available as a PDF). It has columns for Datasette, sqlite-utils and LLM.

Datasette: An ecosystem of tools for finding stories in data. Three projects:

Datasette is a tool for exploring and publishing data. It helps data journalists (and everyone else) take data of any shape, analyze and explore it, and publish it as an interactive website and accompanying API. There's a screenshot of the table interface against a legislators table. Datasette has over 180 plugins adding features for visualizing, editing and transforming data: datasette-cluster-map, datasette-graphql, datasette-publish-cloudrun, datasette-comments, datasette-query-assistant, datasette-extract. datasette.io

sqlite-utils is a Python library and CLI tool for manipulating SQLite databases. It aims to make the gap from “I have data” to “that data is in SQLite” as small as possible. There's a code example showing inserting three chickens into a database and configuring full-text search, and in the terminal: sqlite-utils transform places.db roadside_attractions --rename pk id --default name Untitled --drop address. sqlite-utils.datasette.io

LLM is a Python library and CLI tool for interacting with Large Language Models. It provides a plugin-based abstraction over hundreds of different models, both local and hosted, and logs every interaction with them to SQLite. LLMs are proficient at SQL and extremely good at extracting structured data from unstructured text, images and documents. LLM’s asyncio Python library powers several Datasette plugins, including datasette-query-assistant, datasette-enrichments and datasette-extract. llm.datasette.io

If you're at PyCon I'd love to talk to you about things I'm working on!

Update: Thanks to everyone who came along. Here's a 6MB photo of the poster setup. The museums were all from my www.niche-museums.com site and the pelicans riding a bicycle SVGs came from my pelican-riding-a-bicycle tag.

# 8:34 pm / pycon, llm, datasette, sqlite-utils, pelican-riding-a-bicycle, museums

django-simple-deploy. Eric Matthes presented a lightning talk about this project at PyCon US this morning. "Django has a deploy command now". You can run it like this:

pip install django-simple-deploy[fly_io]
# Add django_simple_deploy to INSTALLED_APPS.
python manage.py deploy --automate-all
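
The INSTALLED_APPS step from that middle comment is a one-line settings change:

# settings.py
INSTALLED_APPS = [
    # ... your existing apps ...
    "django_simple_deploy",
]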

It's plugin-based (inspired by Datasette!) and the project has stable plugins for three hosting platforms: dsd-flyio, dsd-heroku and dsd-platformsh.

Currently in development: dsd-vps - a plugin that should work with any VPS provider, using Paramiko to connect to a newly created instance and run all of the commands needed to start serving a Django application.

# 12:49 pm / fly, heroku, datasette, plugins, django, paramiko, python

May 16, 2025

OpenAI Codex. Announced today, here's the documentation for OpenAI's "cloud-based software engineering agent". It's not yet available for us $20/month Plus customers ("coming soon") but if you're a $200/month Pro user you can try it out now.

At a high level, you specify a prompt, and the agent goes to work in its own environment. After about 8–10 minutes, the agent gives you back a diff.

You can execute prompts in either ask mode or code mode. When you select ask, Codex clones a read-only version of your repo, booting faster and giving you follow-up tasks. Code mode, however, creates a full-fledged environment that the agent can run and test against.

This 4 minute demo video is a useful overview. One note that caught my eye is that the setup phase for an environment can pull from the internet (to install necessary dependencies) but the agent loop itself still runs in a network disconnected sandbox.

It sounds similar to GitHub's own Copilot Workspace project, which can compose PRs against your code based on a prompt. The big difference is that Codex incorporates a full Code Interpreter style environment, allowing it to build and run the code it's creating and execute tests in a loop.

Copilot Workspace has a level of integration with Codespaces but still requires manual intervention to help exercise the code.

Also similar to Copilot Workspace is a confusing name. OpenAI now have four products called Codex:

  • OpenAI Codex, announced today.
  • Codex CLI, a completely different coding assistant tool they released a few weeks ago that is the same kind of shape as Claude Code. This one owns the openai/codex namespace on GitHub.
  • codex-mini, a brand new model released today that is used by their Codex product. It's a fine-tuned o4-mini variant. I released llm-openai-plugin 0.4 adding support for that model.
  • OpenAI Codex (2021) - Internet Archive link, OpenAI's first specialist coding model from the GPT-3 era. This was used by the original GitHub Copilot and is still the current topic of Wikipedia's OpenAI Codex page.

My favorite thing about this most recent Codex product is that OpenAI shared the full Dockerfile for the environment that the system uses to run code - in openai/codex-universal on GitHub because openai/codex was taken already.

This is extremely useful documentation for figuring out how to use this thing - I'm glad they're making this as transparent as possible.

And to be fair, if you ignore its previous history, Codex is a good name for this product. I'm just glad they didn't call it Ada.

# 7:12 pm / ai-assisted-programming, generative-ai, ai-agents, openai, ai, github, llms, llm-release, llm

Today I learned - from a very short "we're sponsoring Python" sponsor blurb by Meta during the opening PyCon US welcome talks - that Python is now "the most-used language at Meta" - if you consider all of the different functional areas spread across the company.

They also have "over 3,000 Python developers working in the language every day".

Conference presentation at PyCon US 2025 showing speaker on stage in blue shirt with large screens displaying his image and slide text: "have over 3,000 Python developers working in the language every day, which is -- I mean, there's probably more people here. Looking at you all. They're in different functional areas spread across the country. But if you look at folks making changes, Python is the most-used language at Meta. Our motivation to continue investing in Python is to support development at scale. We look forward to building solutions"

The live captions for the event are once again provided by the excellent White Coat Captioning - real human beings! This got a cheer when it was pointed out by the conference chair a few moments earlier.

# 1:58 pm / pycon, python, meta

soon we have another low-key research preview to share with you all

we will name it better than chatgpt this time in case it takes off

Sam Altman

# 1:46 am / openai, chatgpt, sam-altman

May 15, 2025

Annotated Presentation Creator. I've released a new version of my tool for creating annotated presentations. I use this to turn slides from my talks into posts like this one - here are a bunch more examples.

I wrote the first version in August 2023 making extensive use of ChatGPT and GPT-4. That older version can still be seen here.

This new edition is a design refresh using Claude 3.7 Sonnet (thinking). I ran this command:

llm \
  -f https://til.simonwillison.net/tools/annotated-presentations \
  -s 'Improve this tool by making it respnonsive for mobile, improving the styling' \
  -m claude-3.7-sonnet -o thinking 1

That uses -f to fetch the original HTML (which has embedded CSS and JavaScript in a single page, convenient for working with LLMs) as a prompt fragment, then applies the system prompt instructions "Improve this tool by making it respnonsive for mobile, improving the styling" (typo included).

Here's the full transcript (generated using llm logs -cue) and a diff illustrating the changes. Total cost 10.7781 cents.

There was one visual glitch: the slides were distorted like this:

The slide is distorted by being too high for its width

I decided to try o4-mini to see if it could spot the problem (after fixing this LLM bug):

llm o4-mini \
  -a bug.png \
  -f https://tools.simonwillison.net/annotated-presentations \
  -s 'Suggest a minimal fix for this distorted image'

It suggested adding align-items: flex-start; to my .bundle class (it quoted the @media (min-width: 768px) bit but the solution was to add it to .bundle at the top level), which fixed the bug.

Screenshot of an "Annotated Presentation Creator" web application. The interface shows: "Annotated Presentation Creator" header, "Create beautiful annotated slides for your presentations. See How I make annotated presentations for instructions." Below is an upload area with buttons "Choose Images", "Load Images", "Restore 64 saved items", and "OCR Missing Alt Text". The main area displays a presentation slide with "Building software on top of Large Language Models" by "Simon Willison - PyCon US 2025" dated "15th May 2025", alongside an alt text input field and annotation section containing "The full handout for the workshop parts of this talk can be found at building-with-llms-pycon-2025.readthedocs.io."

# 2:41 pm / claude, openai, annotated-talks, ai, llms, vibe-coding, tools, generative-ai, css, ai-assisted-programming

By popular request, GPT-4.1 will be available directly in ChatGPT starting today.

GPT-4.1 is a specialized model that excels at coding tasks & instruction following. Because it’s faster, it’s a great alternative to OpenAI o3 & o4-mini for everyday coding needs.

OpenAI on Twitter

# 12:30 pm / generative-ai, openai, chatgpt, ai, llms

Building software on top of Large Language Models

I presented a three hour workshop at PyCon US yesterday titled Building software on top of Large Language Models. The goal of the workshop was to give participants everything they needed to get started writing code that makes use of LLMs.

[... 3,728 words]

May 14, 2025

I designed Dropbox's storage system and modeled its durability. Durability numbers (11 9's etc) are meaningless because competent providers don't lose data because of disk failures, they lose data because of bugs and operator error. [...]

The best thing you can do for your own durability is to choose a competent provider and then ensure you don't accidentally delete or corrupt your own data on it:

  1. Ideally never mutate an object in S3, add a new version instead.
  2. Never live-delete any data. Mark it for deletion and then use a lifecycle policy to clean it up after a week.

This way you have time to react to a bug in your own stack.

James Cowling
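
Point 2 maps neatly onto an S3 lifecycle rule. Here's a minimal boto3 sketch, assuming "mark it for deletion" means tagging the object (the bucket name and tag are placeholders of mine):

import boto3

s3 = boto3.client("s3")
s3.put_bucket_lifecycle_configuration(
    Bucket="my-bucket",
    LifecycleConfiguration={
        "Rules": [
            {
                "ID": "expire-marked-for-deletion",
                # Only objects tagged as marked-for-deletion are affected
                "Filter": {"Tag": {"Key": "marked-for-deletion", "Value": "true"}},
                "Status": "Enabled",
                # S3 removes them a week later, leaving time to catch bugs first
                "Expiration": {"Days": 7},
            }
        ]
    },
)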

# 3:49 am / s3, ops, software-architecture

LLM 0.26a0 adds support for tools! It's only an alpha so I'm not going to promote this extensively yet, but my LLM project just grew a feature I've been working towards for nearly two years now: tool support!

I'm presenting a workshop about Building software on top of Large Language Models at PyCon US tomorrow and this was the one feature I really needed to pull everything else together.

Tools can be used from the command-line like this (inspired by sqlite-utils --functions):

llm --functions '
def multiply(x: int, y: int) -> int:
    """Multiply two numbers."""
    return x * y
' 'what is 34234 * 213345' -m o4-mini

You can add --tools-debug (shortcut: --td) to have it show exactly what tools are being executed and what came back. More documentation here.

It's also available in the Python library:

import llm

def multiply(x: int, y: int) -> int:
    """Multiply two numbers."""
    return x * y

model = llm.get_model("gpt-4.1-mini")
response = model.chain(
    "What is 34234 * 213345?",
    tools=[multiply]
)
print(response.text())

There's also a new plugin hook so plugins can register tools that can then be referenced by name using llm --tool name_of_tool "prompt".
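
If I'm reading the alpha correctly, registering a tool from a plugin looks something like this - the exact hook shape may still change before a stable release:

import llm

@llm.hookimpl
def register_tools(register):
    def multiply(x: int, y: int) -> int:
        """Multiply two numbers."""
        return x * y

    register(multiply)

Once that plugin is installed, llm --tool multiply 'what is 34234 * 213345' -m o4-mini should be able to call it by name.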

There's still a bunch I want to do before including this in a stable release, most notably adding support for Python asyncio. It's a pretty exciting start though!

llm-anthropic 0.16a0 and llm-gemini 0.20a0 add tool support for Anthropic and Gemini models, depending on the new LLM alpha.

Update: Here's the section about tools from my PyCon workshop.

# 2 am / llm, generative-ai, projects, llm-tool-use, ai, llms, openai, gemini, anthropic

May 13, 2025

Building, launching, and scaling ChatGPT Images (via) Gergely Orosz landed a fantastic deep dive interview with OpenAI's Sulman Choudhry (head of engineering, ChatGPT) and Srinivas Narayanan (VP of engineering, OpenAI) to talk about the launch back in March of ChatGPT images - their new image generation mode built on top of multi-modal GPT-4o.

The feature kept on having new viral spikes, including one that added one million new users in a single hour. They signed up 100 million new users in the first week after the feature's launch.

When this vertical growth spike started, most of our engineering teams didn't believe it. They assumed there must be something wrong with the metrics.

Under the hood the infrastructure is mostly Python and FastAPI! I hope they're sponsoring those projects (and Starlette, which is used by FastAPI under the hood.)

They're also using some C, and Temporal as a workflow engine. They addressed the early scaling challenge by adding an asynchronous queue to defer the load for their free users (resulting in longer generation times) at peak demand.
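
The deferral pattern itself is simple to picture: free-tier jobs land on a queue that's drained at a capped rate, so peak demand becomes longer waits rather than overload. Here's a toy illustration of that general idea - nothing to do with OpenAI's actual code, and generate_image is a hypothetical stand-in passed in by the caller:

import asyncio
from typing import Awaitable, Callable

GenerateFn = Callable[[dict], Awaitable[None]]  # hypothetical image generation call

async def free_tier_worker(queue: asyncio.Queue, generate_image: GenerateFn, max_per_second: float):
    while True:
        job = await queue.get()
        await generate_image(job)
        queue.task_done()
        await asyncio.sleep(1 / max_per_second)  # cap throughput for this tier

async def handle_request(job: dict, free_queue: asyncio.Queue, generate_image: GenerateFn, is_free_user: bool):
    if is_free_user:
        await free_queue.put(job)   # deferred: longer generation times at peak
    else:
        await generate_image(job)   # paid traffic served directly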

There are plenty more details tucked away behind the paywall, including an exclusive I've not been able to find anywhere else: OpenAI's core engineering principles.

  • Ship relentlessly - move quickly and continuously improve, without waiting for perfect conditions
  • Own the outcome - take full responsibility for products, end-to-end
  • Follow through - finish what is started and ensure the work lands fully

I tried getting o4-mini-high to track down a copy of those principles online and was delighted to see it either leak or hallucinate the URL to OpenAI's internal engineering handbook!

Thought for 1m 48s: Here are the two places where OpenAI's “Ship relentlessly — Own the outcome — Follow through” checklist shows up as its core engineering principles: 1. OpenAI Careers => Operating Principles. On the Careers page under "Operating Principles," OpenAI codifies how every team—and especially engineering—works together. While the page itself doesn't list the bullets verbatim, it's the umbrella under which these three behavioural norms sit (Ship relentlessly, Own the outcome, Follow through). 2. OpenAI Internal Engineering Handbook. These three tenets are drawn straight from OpenAI's internal Engineering Handbook (employee-only), at: https://handbook.openai.com/engineering/principles (This handbook page isn't publicly crawlable, but it's the definitive source for how OpenAI's engineers actually organize their day-to-day work.)

Gergely has a whole series of posts like this called Real World Engineering Challenges, including another one on ChatGPT a year ago.

# 11:52 pm / chatgpt, generative-ai, gergely-orosz, openai, scaling, ai, llms, python

Atlassian: “We’re Not Going to Charge Most Customers Extra for AI Anymore”. The Beginning of the End of the AI Upsell? (via) Jason Lemkin highlighting a potential new trend in the pricing of AI-enhanced SaaS:

Can SaaS and B2B vendors really charge even more for AI … when it’s become core? And we’re already paying $15-$200 a month for a seat? [...]

You can try to charge more, but if the competition isn’t — you’re going to likely lose. And if it’s core to the product itself … can you really charge more ultimately? Probably … not.

It's impressive how quickly LLM-powered features are going from being part of the top tier premium plans to almost an expected part of most per-seat software.

# 3:52 pm / startups, generative-ai, saas, ai, llms, atlassian

Vision Language Models (Better, Faster, Stronger) (via) Extremely useful review of the last year in vision and multi-modal LLMs.

So much has happened! I'm particularly excited about the range of small open weight vision models that are now available. Models like gemma3-4b-it and Qwen2.5-VL-3B-Instruct produce very impressive results and run happily on mid-range consumer hardware.

# 3:25 pm / vision-llms, hugging-face, generative-ai, ai, local-llms, llms

I did find one area where LLMs absolutely excel, and I’d never want to be without them:

AIs can find your syntax error 100x faster than you can.

They’ve been a useful tool in multiple areas, to my surprise. But this is the one space where they’ve been an honestly huge help: I know I’ve made a mistake somewhere and I just can’t track it down. I can spend ten minutes staring at my files and pulling my hair out, or get an answer back in thirty seconds.

There are whole categories of coding problems that look like this, and LLMs are damn good at nearly all of them. [...]

Luke Kanies, AI Is Like a Crappy Consultant

# 1:13 pm / ai-assisted-programming, llms, ai, generative-ai

May 12, 2025

Contributions must not include content generated by large language models or other probabilistic tools, including but not limited to Copilot or ChatGPT. This policy covers code, documentation, pull requests, issues, comments, and any other contributions to the Servo project. [...]

Our rationale is as follows:

Maintainer burden: Reviewers depend on contributors to write and test their code before submitting it. We have found that these tools make it easy to generate large amounts of plausible-looking code that the contributor does not understand, is often untested, and does not function properly. This is a drain on the (already limited) time and energy of our reviewers.

Correctness and security: Even when code generated by AI tools does seem to function, there is no guarantee that it is correct, and no indication of what security implications it may have. A web browser engine is built to run in hostile execution environments, so all code must take into account potential security issues. Contributors play a large role in considering these issues when creating contributions, something that we cannot trust an AI tool to do.

Copyright issues: [...] Ethical issues: [...] These are harms that we do not want to perpetuate, even if only indirectly.

Contributing to Servo, section on AI contributions

# 10:14 pm / ai-ethics, browsers, servo, ai-assisted-programming, generative-ai, ai, llms

It's interesting how much my perception of o3 as being the latest, best model released by OpenAI is tarnished by the co-release of o4-mini. I'm also still not entirely sure how to compare o3 to o1-pro, especially given o1-pro is 15x more expensive via the OpenAI API.

# 6:38 pm / o1, llm-reasoning, generative-ai, openai, o3, ai, llms

May 11, 2025

Cursor: Security (via) Cursor's security documentation page includes a surprising amount of detail about how the Cursor text editor's backend systems work.

I've recently learned that checking an organization's list of documented subprocessors is a great way to get a feel for how everything works under the hood - it's a loose "view source" for their infrastructure! That was how I confirmed that Anthropic's search features used Brave search back in March.

Cursor's list includes AWS, Azure and GCP (AWS for primary infrastructure, Azure and GCP for "some secondary infrastructure"). They host their own custom models on Fireworks and make API calls out to OpenAI, Anthropic, Gemini and xAI depending on user preferences. They're using turbopuffer as a hosted vector store.

The most interesting section is about codebase indexing:

Cursor allows you to semantically index your codebase, which allows it to answer questions with the context of all of your code as well as write better code by referencing existing implementations. […]

At our server, we chunk and embed the files, and store the embeddings in Turbopuffer. To allow filtering vector search results by file path, we store with every vector an obfuscated relative file path, as well as the line range the chunk corresponds to. We also store the embedding in a cache in AWS, indexed by the hash of the chunk, to ensure that indexing the same codebase a second time is much faster (which is particularly useful for teams).

At inference time, we compute an embedding, let Turbopuffer do the nearest neighbor search, send back the obfuscated file path and line range to the client, and read those file chunks on the client locally. We then send those chunks back up to the server to answer the user’s question.

When operating in privacy mode - which they say is enabled by 50% of their users - they are careful not to store any raw code on their servers for longer than the duration of a single request. This is why they store the embeddings and obfuscated file paths but not the code itself.
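
To make that flow concrete, here's the rough shape of the data model I picture from their description - my own guesswork, with the embedding, search, file-reading and model calls left as hypothetical callables:

from dataclasses import dataclass
from typing import Callable

@dataclass
class IndexedChunk:
    embedding: list[float]   # stored in Turbopuffer
    obfuscated_path: str     # relative file path, obfuscated before upload
    start_line: int          # line range the chunk corresponds to
    end_line: int
    chunk_hash: str          # keys the AWS embedding cache so re-indexing stays fast

def answer(
    question: str,
    embed: Callable[[str], list[float]],                        # server-side embedding
    search: Callable[[list[float], int], list[IndexedChunk]],   # nearest neighbors in the vector store
    read_local_chunk: Callable[[str, int, int], str],           # runs on the client, against local files
    ask_model: Callable[[str, list[str]], str],                 # the final LLM call
) -> str:
    hits = search(embed(question), 10)
    # Obfuscated paths and line ranges come back to the client, which reads the
    # raw code locally and sends just those chunks up alongside the question.
    chunks = [read_local_chunk(h.obfuscated_path, h.start_line, h.end_line) for h in hits]
    return ask_model(question, chunks)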

Reading this made me instantly think of the paper Text Embeddings Reveal (Almost) As Much As Text about how vector embeddings can be reversed. The security documentation touches on that in the notes:

Embedding reversal: academic work has shown that reversing embeddings is possible in some cases. Current attacks rely on having access to the model and embedding short strings into big vectors, which makes us believe that the attack would be somewhat difficult to do here. That said, it is definitely possible for an adversary who breaks into our vector database to learn things about the indexed codebases.

# 7:15 pm / ai-assisted-programming, security, generative-ai, ai, embeddings, llms

Achievement unlocked: tap danced in the local community college dance recital.

# 4:17 am / dance

May 10, 2025

Poker Face season two just started on Peacock (the US streaming service). It's my favorite thing on TV right now. I've started threads on MetaFilter FanFare for episodes one, two and three.

# 10:34 pm / tv, metafilter

Trying out llama.cpp’s new vision support

This llama.cpp server vision support via libmtmd pull request—via Hacker News—was merged earlier today. The PR finally adds full support for vision models to the excellent llama.cpp project. It’s documented on this page, with the more detailed technical notes covered here. Here are my notes on getting it working on a Mac.

[... 1,693 words]
