Example dashboard

Various statistics from my blog.

Owned by simonw, visibility: Public

Entries

3167

SQL query
select 'Entries' as label, count(*) as big_number from blog_entry

Blogmarks

7816

SQL query
select 'Blogmarks' as label, count(*) as big_number from blog_blogmark

Quotations

1150

SQL query
select 'Quotations' as label, count(*) as big_number from blog_quotation

Chart of number of entries per month over time

SQL query
select '<h2>Chart of number of entries per month over time</h2>' as html
SQL query
select to_char(date_trunc('month', created), 'YYYY-MM') as bar_label,
count(*) as bar_quantity from blog_entry group by bar_label order by count(*) desc

Ten most recent blogmarks (of 7816 total)

SQL query
select '## Ten most recent blogmarks (of ' || count(*) || ' total)' as markdown from blog_blogmark
SQL query
select link_title, link_url, commentary, created from blog_blogmark order by created desc limit 10

10 rows

link_title link_url commentary created
MCP Run Python
https://github.com/pydantic/pydantic-ai/tree/main/mcp-run-python

Pydantic AI's MCP server for running LLM-generated Python code in a sandbox. They ended up using a trick I explored [two years ago](https://til.simonwillison.net/deno/pyodide-sandbox): using a [Deno](https://deno.com/) process to run [Pyodide](https://pyodide.org/) in a WebAssembly sandbox.

Here's a bit of a wild trick: since Deno loads code on-demand from [JSR](https://jsr.io/), and [uv run](https://docs.astral.sh/uv/guides/scripts/) can install Python dependencies on demand via the `--with` option... here's a one-liner you can paste into a macOS shell (provided you have Deno and `uv` installed already) which will run the example from [their README](https://github.com/pydantic/pydantic-ai/blob/v0.1.2/mcp-run-python/README.md) - calculating the number of days between two dates in the most complex way imaginable:

```
ANTHROPIC_API_KEY="sk-ant-..." \
uv run --with pydantic-ai python -c '
import asyncio
from pydantic_ai import Agent
from pydantic_ai.mcp import MCPServerStdio

server = MCPServerStdio(
    "deno",
    args=[
        "run",
        "-N",
        "-R=node_modules",
        "-W=node_modules",
        "--node-modules-dir=auto",
        "jsr:@pydantic/mcp-run-python",
        "stdio",
    ],
)
agent = Agent("claude-3-5-haiku-latest", mcp_servers=[server])

async def main():
    async with agent.run_mcp_servers():
        result = await agent.run("How many days between 2000-01-01 and 2025-03-18?")
        print(result.output)

asyncio.run(main())'
```

I ran that just now and got:

> The number of days between January 1st, 2000 and March 18th, 2025 is 9,208 days.

I thoroughly enjoy how tools like `uv` and Deno enable throwing together shell one-liner demos like this one.

Here's [an extended version](https://gist.github.com/simonw/54fc42ef9a7fb8f777162bbbfbba4f23) of this example which adds pretty-printed logging of the messages exchanged with the LLM to illustrate exactly what happened.

The most important piece is this tool call, where Claude 3.5 Haiku asks for Python code to be executed by the MCP server:

```
ToolCallPart(
    tool_name='run_python_code',
    args={
        'python_code': (
            'from datetime import date\n'
            '\n'
            'date1 = date(2000, 1, 1)\n'
            'date2 = date(2025, 3, 18)\n'
            '\n'
            'days_between = (date2 - date1).days\n'
            'print(f"Number of days between {date1} and {date2}: {days_between}")'
        ),
    },
    tool_call_id='toolu_01TXXnQ5mC4ry42DrM1jPaza',
    part_kind='tool-call',
)
```

I also managed to run it against [Mistral Small 3.1](https://ollama.com/library/mistral-small3.1) (15GB) running locally using [Ollama](https://ollama.com/) (I had to add "Use your python tool" to the prompt to get it to work):

```
ollama pull mistral-small3.1:24b

uv run --with devtools --with pydantic-ai python -c '
import asyncio
from devtools import pprint
from pydantic_ai import Agent, capture_run_messages
from pydantic_ai.models.openai import OpenAIModel
from pydantic_ai.providers.openai import OpenAIProvider
from pydantic_ai.mcp import MCPServerStdio

server = MCPServerStdio(
    "deno",
    args=[
        "run",
        "-N",
        "-R=node_modules",
        "-W=node_modules",
        "--node-modules-dir=auto",
        "jsr:@pydantic/mcp-run-python",
        "stdio",
    ],
)

agent = Agent(
    OpenAIModel(
        model_name="mistral-small3.1:latest",
        provider=OpenAIProvider(base_url="http://localhost:11434/v1"),
    ),
    mcp_servers=[server],
)

async def main():
    with capture_run_messages() as messages:
        async with agent.run_mcp_servers():
            result = await agent.run("How many days between 2000-01-01 and 2025-03-18? Use your python tool.")
    pprint(messages)
    print(result.output)

asyncio.run(main())'
```

Here's [the full output](https://gist.github.com/simonw/e444a81440bda2f37b0fef205780074a) including the debug logs.

2025-04-18 04:51:20+00:00
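For a quick sanity check of the 9,208 day figure quoted in the entry above, the calculation inside that tool call can also be run directly as a standalone script:

```python
# Standalone version of the calculation the tool call above asked the
# MCP server to run - verifies the 9,208 day answer quoted in the entry.
from datetime import date

date1 = date(2000, 1, 1)
date2 = date(2025, 3, 18)

days_between = (date2 - date1).days
print(f"Number of days between {date1} and {date2}: {days_between}")  # 9208
```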
Start building with Gemini 2.5 Flash
https://developers.googleblog.com/en/start-building-with-gemini-25-flash/

Google Gemini's latest model is Gemini 2.5 Flash, available in (paid) preview as `gemini-2.5-flash-preview-04-17`.

> Building upon the popular foundation of 2.0 Flash, this new version delivers a major upgrade in reasoning capabilities, while still prioritizing speed and cost. Gemini 2.5 Flash is our first fully hybrid reasoning model, giving developers the ability to turn thinking on or off. The model also allows developers to set thinking budgets to find the right tradeoff between quality, cost, and latency.

Gemini AI Studio product lead Logan Kilpatrick [says](https://twitter.com/OfficialLoganK/status/1912966500794654855):

> This is an early version of 2.5 Flash, but it already shows huge gains over 2.0 Flash.
>
> You can fully turn off thinking if needed and use this model as a drop in replacement for 2.0 Flash.

I added support for the new model in [llm-gemini 0.18](https://github.com/simonw/llm-gemini/releases/tag/0.18). Here's how to try it out:

    llm install -U llm-gemini
    llm -m gemini-2.5-flash-preview-04-17 'Generate an SVG of a pelican riding a bicycle'

Here's that first pelican, using the default setting where Gemini Flash 2.5 makes its own decision in terms of how much "thinking" effort to apply:

![Described below](https://static.simonwillison.net/static/2025/gemini-2.5-flash-default.jpg)

Here's [the transcript](https://gist.github.com/simonw/afce6639ed10c712a0778fc779efd756). This one used 11 input tokens and 4266 output tokens, of which 2702 were "thinking" tokens.

I asked the model to "`describe`" that image and it could tell it was meant to be a pelican:

> A simple illustration on a white background shows a stylized pelican riding a bicycle. The pelican is predominantly grey with a black eye and a prominent pink beak pouch. It is positioned on a black line-drawn bicycle with two wheels, a frame, handlebars, and pedals.

The way the model is priced is a little complicated. If you have thinking enabled, you get charged $0.15/million tokens for input and $3.50/million for output. With thinking disabled those output tokens drop to $0.60/million. I've added these [to my pricing calculator](https://tools.simonwillison.net/llm-prices).

For comparison, Gemini 2.0 Flash is $0.10/million input and $0.40/million for output. So my first prompt - 11 input and 4266 output (with thinking enabled) - cost 1.4933 cents.

Let's try 2.5 Flash again with thinking disabled:

    llm -m gemini-2.5-flash-preview-04-17 'Generate an SVG of a pelican riding a bicycle' -o thinking_budget 0

![Described below, again](https://static.simonwillison.net/static/2025/gemini-2.5-flash-thinking-0.jpg)

11 input, 1705 output. That's 0.1025 cents. [Transcript here](https://gist.github.com/simonw/182679e918ab5263f98f6a65691874d1) - it still shows 25 thinking tokens even though I set the thinking budget to 0 - Logan [confirms](https://twitter.com/OfficialLoganK/status/1912986097765789782) that this will still be billed at the lower rate:

> In some rare cases, the model still thinks a little even with thinking budget = 0, we are hoping to fix this before we make this model stable and you won't be billed for thinking. The thinking budget = 0 is what triggers the billing switch.

Here's Gemini 2.5 Flash's self-description of that image:

> A minimalist illustration shows a bright yellow bird riding a bicycle. The bird has a simple round body, small wings, a black eye, and an open orange beak. It sits atop a simple black bicycle frame with two large circular black wheels. The bicycle also has black handlebars and black and yellow pedals. The scene is set against a solid light blue background with a thick green stripe along the bottom, suggesting grass or ground.

And finally, let's ramp the thinking budget up to the maximum:

    llm -m gemini-2.5-flash-preview-04-17 'Generate an SVG of a pelican riding a bicycle' -o thinking_budget 24576

![Described below](https://static.simonwillison.net/static/2025/gemini-2.5-flash-thinking-max.jpg)

I think it over-thought this one. [Transcript](https://gist.github.com/simonw/6a83bd7ad114ee23c460f5246b854247) - 5174 output tokens, of which 3023 were thinking. A hefty 1.8111 cents!

> A simple, cartoon-style drawing shows a bird-like figure riding a bicycle. The figure has a round gray head with a black eye and a large, flat orange beak with a yellow stripe on top. Its body is represented by a curved light gray shape extending from the head to a smaller gray shape representing the torso or rear. It has simple orange stick legs with round feet or connections at the pedals. The figure is bent forward over the handlebars in a cycling position. The bicycle is drawn with thick black outlines and has two large wheels, a frame, and pedals connected to the orange legs. The background is plain white, with a dark gray line at the bottom representing the ground.

One thing I really appreciate about Gemini 2.5 Flash's approach to SVGs is that it shows very good taste in CSS, comments and general SVG class structure. Here's a truncated extract - I run a lot of these SVG tests against different models and this one has a coding style that I particularly enjoy. (Gemini 2.5 Pro [does this too](https://gist.github.com/simonw/c34f7f0c94afcbeab77e170511f6f51f))

```svg
<svg width="800" height="500" viewBox="0 0 800 500" xmlns="http://www.w3.org/2000/svg">
  <style>
    .bike-frame { fill: none; stroke: #333; stroke-width: 8; stroke-linecap: round; stroke-linejoin: round; }
    .wheel-rim { fill: none; stroke: #333; stroke-width: 8; }
    .wheel-hub { fill: #333; }
    /* ... */
    .pelican-body { fill: #d3d3d3; stroke: black; stroke-width: 3; }
    .pelican-head { fill: #d3d3d3; stroke: black; stroke-width: 3; }
    /* ... */
  </style>
  <!-- Ground Line -->
  <line x1="0" y1="480" x2="800" y2="480" stroke="#555" stroke-width="5"/>
  <!-- Bicycle -->
  <g id="bicycle">
    <!-- Wheels -->
    <circle class="wheel-rim" cx="250" cy="400" r="70"/>
    <circle class="wheel-hub" cx="250" cy="400" r="10"/>
    <circle class="wheel-rim" cx="550" cy="400" r="70"/>
    <circle class="wheel-hub" cx="550" cy="400" r="10"/>
    <!-- ... -->
  </g>
  <!-- Pelican -->
  <g id="pelican">
    <!-- Body -->
    <path class="pelican-body" d="M 440 330 C 480 280 520 280 500 350 C 480 380 420 380 440 330 Z"/>
    <!-- Neck -->
    <path class="pelican-neck" d="M 460 320 Q 380 200 300 270"/>
    <!-- Head -->
    <circle class="pelican-head" cx="300" cy="270" r="35"/>
    <!-- ... -->
```

The [LM Arena leaderboard](https://lmarena.ai/?leaderboard) now has Gemini 2.5 Flash [in joint second place](https://twitter.com/lmarena_ai/status/1912955625224773911), just behind Gemini 2.5 Pro and tied with ChatGPT-4o-latest, Grok-3 and GPT-4.5 Preview.

![Screenshot of a table showing AI model rankings with columns Rank* (UB), Rank (StyleCtrl), Model, Arena Score, 95% CI, Votes, Organization, and License. The rows show data for: Gemini-2.5-Pro-Exp-03-25 ranked 1/1 with score 1439, CI +7/-5, 9013 Votes, Organization Google, License Proprietary. ChatGPT-4o-latest (2025-03-26) ranked 2/2 with score 1407, CI +6/-6, 8261 Votes, Organization OpenAI, License Proprietary. Grok-3-Preview-02-24 ranked 2/4 with score 1402, CI +5/-3, 14849 Votes, Organization xAI, License Proprietary. GPT-4.5-Preview ranked 2/2 with score 1398, CI +5/-6, 14520 Votes, Organization OpenAI, License Proprietary. Gemini-2.5-Flash-Preview-04-17 ranked 2/4 with score 1392, CI +10/-13, 3325 Votes, Organization Google, License Proprietary](https://static.simonwillison.net/static/2025/gemini-2.5-flash-leaderboard.jpg)

2025-04-17 20:56:16+00:00
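To make the cost arithmetic in the entry above concrete, here is a minimal sketch that reproduces the quoted per-prompt costs from the listed prices; the constant names and helper function are just for illustration, not part of any published API:

```python
# Gemini 2.5 Flash preview prices quoted above, in USD per million tokens.
INPUT_PRICE = 0.15
OUTPUT_PRICE_THINKING = 3.50      # output tokens with thinking enabled
OUTPUT_PRICE_NO_THINKING = 0.60   # output tokens with thinking disabled

def cost_in_cents(input_tokens, output_tokens, output_price):
    dollars = (input_tokens * INPUT_PRICE + output_tokens * output_price) / 1_000_000
    return dollars * 100

print(cost_in_cents(11, 4266, OUTPUT_PRICE_THINKING))     # ~1.4933 (default thinking)
print(cost_in_cents(11, 1705, OUTPUT_PRICE_NO_THINKING))  # ~0.1025 (thinking_budget 0)
print(cost_in_cents(11, 5174, OUTPUT_PRICE_THINKING))     # ~1.8111 (thinking_budget 24576)
```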
Introducing OpenAI o3 and o4-mini
https://openai.com/index/introducing-o3-and-o4-mini/

OpenAI are *really* emphasizing tool use with these:

> For the first time, our reasoning models can agentically use and combine every tool within ChatGPT—this includes searching the web, analyzing uploaded files and other data with Python, reasoning deeply about visual inputs, and even generating images. Critically, these models are trained to reason about when and how to use tools to produce detailed and thoughtful answers in the right output formats, typically in under a minute, to solve more complex problems.

I released [llm-openai-plugin 0.3](https://github.com/simonw/llm-openai-plugin/releases/tag/0.3) adding support for the two new models:

    llm install -U llm-openai-plugin
    llm -m openai/o3 "say hi in five languages"
    llm -m openai/o4-mini "say hi in five languages"

Here are [the pelicans riding bicycles](https://simonwillison.net/tags/pelican-riding-a-bicycle/) (prompt: `Generate an SVG of a pelican riding a bicycle`).

**o3**:

![Described by o3: The illustration shows a playful, stylized bicycle whose frame is drawn to resemble a duck. • The duck’s rounded body forms the bicycle’s seat area, and a small wing is sketched on its side. • Its long neck stretches forward to become the top tube, ending in a simple head with a black eye and an orange, open beak that points ahead like handlebars. • Two large black‑outlined wheels with thin, evenly spaced spokes complete the bike, while thin blue lines depict the rest of the frame, pedals, and chain. • A dashed grey ground line runs beneath the wheels, giving the impression the duck‑bike is rolling along.](https://static.simonwillison.net/static/2025/o3-pelican.jpg)

**o4-mini**:

![Described by o4-mini: The image is a very simple, minimalistic line drawing of a bird riding a bicycle, rendered against a plain white background. Key elements: • Two large, identical circles for the bike’s wheels, outlined in thick black. • A horizontal bar links the rear wheel to the bird’s body; a slanted bar links the front wheel to the bird. • The front wheel has a single straight spoke ending in an arrowhead, suggesting the forward direction of travel. • The bird itself is drawn with basic shapes: – A small circle for the head, with a single dot for the eye and a short yellow triangle for the beak. – An oval for the body. – Thin lines for the neck, legs, and the bike’s pedals and handlebars. • The bird appears perched on the saddle, its legs extending down to the pedals, and its tiny wings resting near the handlebars.](https://static.simonwillison.net/static/2025/o4-mini-pelican.jpg)

Here are the full OpenAI model listings: [o3](https://platform.openai.com/docs/models/o3) is $10/million input and $40/million for output, with a 75% discount on cached input tokens, 200,000 token context window, 100,000 max output tokens and a May 31st 2024 training cut-off (same as the GPT-4.1 models). It's a bit cheaper than o1 ($15/$60) and a _lot_ cheaper than o1-pro ($150/$600).

[o4-mini](https://platform.openai.com/docs/models/o4-mini) is priced the same as o3-mini: $1.10/million for input and $4.40/million for output, also with a 75% input caching discount. The size limits and training cut-off are the same as o3.

You can compare these prices with other models using the table on my [updated LLM pricing calculator](https://tools.simonwillison.net/llm-prices).

A new capability released today is that the OpenAI API can now optionally return reasoning summary text. I've been exploring that [in this issue](https://github.com/simonw/llm-openai-plugin/issues/16). I believe you have to verify your organization (which may involve a photo ID) in order to use this option - once you have access the easiest way to see the new tokens is using `curl` like this:

```
curl https://api.openai.com/v1/responses \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer $(llm keys get openai)" \
  -d '{
    "model": "o3",
    "input": "why is the sky blue?",
    "reasoning": {"summary": "auto"},
    "stream": true
  }'
```

This produces a stream of events that includes this new event type:

    event: response.reasoning_summary_text.delta
    data: {"type": "response.reasoning_summary_text.delta","item_id": "rs_68004320496081918e1e75ddb550d56e0e9a94ce520f0206","output_index": 0,"summary_index": 0,"delta": "**Expl"}

Omit the `"stream": true` and the response is easier to read and contains this:

```json
{
  "output": [
    {
      "id": "rs_68004edd2150819183789a867a9de671069bc0c439268c95",
      "type": "reasoning",
      "summary": [
        {
          "type": "summary_text",
          "text": "**Explaining the blue sky**\n\nThe user asks a classic question about why the sky is blue. I'll talk about Rayleigh scattering, where shorter wavelengths of light scatter more than longer ones. This explains how we see blue light spread across the sky! I wonder if the user wants a more scientific or simpler everyday explanation. I'll aim for a straightforward response while keeping it engaging and informative. So, let's break it down!"
        }
      ]
    },
    {
      "id": "msg_68004edf9f5c819188a71a2c40fb9265069bc0c439268c95",
      "type": "message",
      "status": "completed",
      "content": [
        {
          "type": "output_text",
          "annotations": [],
          "text": "The short answer ..."
        }
      ]
    }
  ]
}
```

2025-04-16 17:46:35+00:00
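A rough Python equivalent of that curl example, for anyone who prefers it: this is a sketch that assumes the `requests` package and an `OPENAI_API_KEY` environment variable rather than `llm keys get openai`, and omits streaming so the full JSON shown above comes back in one response:

```python
# Non-streaming sketch of the Responses API call from the curl example above.
import os
import requests

response = requests.post(
    "https://api.openai.com/v1/responses",
    headers={
        "Content-Type": "application/json",
        "Authorization": f"Bearer {os.environ['OPENAI_API_KEY']}",
    },
    json={
        "model": "o3",
        "input": "why is the sky blue?",
        "reasoning": {"summary": "auto"},
    },
)
data = response.json()

# Walk the output array shown above: reasoning summaries first, then the message text.
for item in data["output"]:
    if item["type"] == "reasoning":
        for summary in item.get("summary", []):
            print("[reasoning]", summary["text"])
    elif item["type"] == "message":
        for part in item["content"]:
            if part["type"] == "output_text":
                print(part["text"])
```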
openai/codex
https://github.com/openai/codex

Just released by OpenAI, a "lightweight coding agent that runs in your terminal". Looks like their version of [Claude Code](https://docs.anthropic.com/en/docs/agents-and-tools/claude-code/overview), though unlike Claude Code Codex is released under an open source (Apache 2) license.

Here's [the main prompt](https://github.com/openai/codex/blob/9b733fc48fb81b3f3460c1fdda111ba9b861f81f/codex-cli/src/utils/agent/agent-loop.ts#L1001-L1046) that runs in a loop, which starts like this:

> `You are operating as and within the Codex CLI, a terminal-based agentic coding assistant built by OpenAI. It wraps OpenAI models to enable natural language interaction with a local codebase. You are expected to be precise, safe, and helpful.`
>
> `You can:`
> `- Receive user prompts, project context, and files.`
> `- Stream responses and emit function calls (e.g., shell commands, code edits).`
> `- Apply patches, run commands, and manage user approvals based on policy.`
> `- Work inside a sandboxed, git-backed workspace with rollback support.`
> `- Log telemetry so sessions can be replayed or inspected later.`
> `- More details on your functionality are available at codex --help`
>
> `The Codex CLI is open-sourced. Don't confuse yourself with the old Codex language model built by OpenAI many moons ago (this is understandably top of mind for you!). Within this context, Codex refers to the open-source agentic coding interface. [...]`

I like that the prompt describes OpenAI's previous Codex language model as being from "many moons ago". Prompt engineering is so weird.

Since the prompt says that it works "inside a sandboxed, git-backed workspace" I went looking for the sandbox. On macOS [it uses](https://github.com/openai/codex/blob/9b733fc48fb81b3f3460c1fdda111ba9b861f81f/codex-cli/src/utils/agent/sandbox/macos-seatbelt.ts) the little-known `sandbox-exec` process, part of the OS but grossly under-documented. The best information I've found about it is [this article from 2020](https://www.karltarvas.com/macos-app-sandboxing-via-sandbox-exec.html), which notes that `man sandbox-exec` lists it as deprecated. I didn't spot evidence in the Codex code of sandboxes for other platforms.

2025-04-16 17:25:39+00:00
SQLite File Format Viewer
https://sqlite-internal.pages.dev/

Neat browser-based visual interface for exploring the structure of a SQLite database file, built by Visal In using React and a custom parser [implemented in TypeScript](https://github.com/invisal/sqlite-internal/tree/main/src/parser).

2025-04-14 14:35:07+00:00
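For a sense of what a tool like this has to parse, here is a small illustrative script (standard library only, not taken from the project) that reads a few fields from the fixed 100-byte header at the start of any SQLite file:

```python
# Inspect the fixed 100-byte header of a SQLite database file.
# Field offsets follow the documented format: https://www.sqlite.org/fileformat.html
import struct
import sys

with open(sys.argv[1], "rb") as f:
    header = f.read(100)

magic = header[:16]                                 # b"SQLite format 3\x00"
page_size = struct.unpack(">H", header[16:18])[0]   # big-endian; the value 1 means 65536
page_count = struct.unpack(">I", header[28:32])[0]  # database size in pages

print("magic:", magic)
print("page size:", 65536 if page_size == 1 else page_size)
print("page count:", page_count)
```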
Using LLMs as the first line of support in Open Source
https://noumenal.es/posts/llms-for-user-support/WZb/

From reading the title I was nervous that this might involve automating the initial response to a user support query in an issue tracker with an LLM, but Carlton Gibson has better taste than that.

> The **open contribution** model engendered by GitHub — where anonymous (to the project) users can create issues, and comments, which are almost always extractive support requests — results in an effective **denial-of-service attack against maintainers**. [...]
>
> For anonymous users, who really just want help almost all the time, the pattern I’m settling on is to facilitate them getting their answer from their LLM of choice. [...] we can generate a file that we offer users to download, then we tell the user to pass this to (say) Claude with a simple prompt for their question.

This resonates with the concept proposed by [llms.txt](https://llmstxt.org/) - making LLM-friendly context files available for different projects.

My [simonw/docs-for-llms](https://github.com/simonw/docs-for-llms) contains my own early experiment with this: I'm running a build script to create LLM-friendly concatenated documentation for several of my projects, and my [llm-docs](https://github.com/simonw/llm-docs) plugin ([described here](https://simonwillison.net/2025/Apr/7/long-context-llm/#asking-questions-of-llm-s-documentation)) can then be used to ask questions of that documentation.

It's possible to pre-populate the Claude UI with a prompt by linking to `https://claude.ai/new?q={PLACE_HOLDER}`, but it looks like there's quite a short length limit on how much text can be passed that way. It would be neat if you could pass a URL to a larger document instead.

ChatGPT also supports `https://chatgpt.com/?q=your-prompt-here` (again with a short length limit) and directly executes the prompt rather than waiting for you to edit it first(!)

2025-04-14 04:54:35+00:00
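As a small illustration of those prompt-prefill URLs from the entry above, and subject to the same short length limits it mentions, the links can be generated with nothing more than URL encoding; the example prompt here is made up:

```python
# Build "open this prompt in Claude / ChatGPT" links by URL-encoding a prompt.
from urllib.parse import quote

prompt = "Here is the documentation file I downloaded. How do I enable full-text search?"

print("https://claude.ai/new?q=" + quote(prompt))
print("https://chatgpt.com/?q=" + quote(prompt))   # note: ChatGPT executes this immediately
```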
Stevens: a hackable AI assistant using a single SQLite table and a handful of cron jobs
https://www.geoffreylitt.com/2025/04/12/how-i-made-a-useful-ai-assistant-with-one-sqlite-table-and-a-handful-of-cron-jobs

Geoffrey Litt reports on Stevens, a shared digital assistant he put together for his family using SQLite and scheduled tasks running on Val Town.

The design is refreshingly simple considering how much it can do. Everything works around a single `memories` table. A memory has text, tags, creation metadata and an optional `date` for things like calendar entries and weather reports.

Everything else is handled by scheduled jobs to populate weather information and events from Google Calendar, a Telegram integration offering a chat UI, and a neat system where USPS postal email delivery notifications are run through Val's own email handling mechanism to trigger a Claude prompt that adds those as memories too.

Here's [the full code on Val Town](https://www.val.town/x/geoffreylitt/stevensDemo), including [the daily briefing prompt](https://www.val.town/x/geoffreylitt/stevensDemo/code/dailyBriefing/sendDailyBrief.ts) that incorporates most of the personality of the bot.

2025-04-13 20:58:09+00:00
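As a purely hypothetical sketch of what a single-table design like the one described above could look like in SQLite (the column names here are guesses based on that description, not Geoffrey's actual schema):

```python
# Hypothetical "memories" table: text, tags, creation metadata and an optional date.
import sqlite3

conn = sqlite3.connect("stevens.db")
conn.execute(
    """
    create table if not exists memories (
        id integer primary key,
        text text not null,
        tags text,                                   -- e.g. comma-separated labels
        created_at text default (datetime('now')),   -- creation metadata
        created_by text,                             -- which cron job or integration added it
        date text                                    -- optional date: calendar entries, weather etc.
    )
    """
)
conn.execute(
    "insert into memories (text, tags, date) values (?, ?, ?)",
    ("Rain forecast for tomorrow morning", "weather", "2025-04-14"),
)
conn.commit()
```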
llm-fragments-rust
https://github.com/huitseeker/llm-fragments-rust

Inspired by Filippo Valsorda's [llm-fragments-go](https://simonwillison.net/2025/Apr/10/llm-fragments-go/), Francois Garillot created `llm-fragments-rust`, an [LLM fragments](https://simonwillison.net/2025/Apr/7/long-context-llm/) plugin that lets you pull documentation for any Rust crate directly into a prompt to LLM.

I really like this example, which uses two fragments to load documentation for two crates at once:

> `llm -f rust:rand@0.8.5 -f rust:tokio "How do I generate random numbers asynchronously?"`

The [code](https://github.com/huitseeker/llm-fragments-rust/blob/main/llm_fragments_rust.py) uses some neat tricks: it creates a new Rust project in a temporary directory (similar to how `llm-fragments-go` works), adds the crates and uses `cargo doc --no-deps --document-private-items` to generate documentation. Then it runs `cargo tree --edges features` to add dependency information, and `cargo metadata --format-version=1` to include additional metadata about the crate.

2025-04-11 17:36:27+00:00
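A simplified sketch of the approach described in the entry above, not the plugin's actual code: create a throwaway Cargo project, add the crate, then run the cargo commands mentioned and collect their output (the real plugin also extracts the generated documentation itself):

```python
# Rough outline of the llm-fragments-rust approach described above (illustrative only).
import subprocess
import tempfile
from pathlib import Path

def rust_crate_context(crate, version=None):
    spec = f"{crate}@{version}" if version else crate
    with tempfile.TemporaryDirectory() as tmp:
        project = Path(tmp) / "fragment"
        subprocess.run(["cargo", "new", "--lib", str(project)], check=True)
        subprocess.run(["cargo", "add", spec], cwd=project, check=True)
        # Generate HTML documentation under target/doc (the plugin reads from it).
        subprocess.run(
            ["cargo", "doc", "--no-deps", "--document-private-items"],
            cwd=project, check=True,
        )
        tree = subprocess.run(
            ["cargo", "tree", "--edges", "features"],
            cwd=project, capture_output=True, text=True, check=True,
        ).stdout
        metadata = subprocess.run(
            ["cargo", "metadata", "--format-version=1"],
            cwd=project, capture_output=True, text=True, check=True,
        ).stdout
        return tree + "\n" + metadata

# Example: print(rust_crate_context("rand", "0.8.5"))
```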
Default styles for h1 elements are changing
https://developer.mozilla.org/en-US/blog/h1-element-styles/

Wow, this is a rare occurrence! Firefox are rolling out a change to the default user-agent stylesheet for nested `<h1>` elements, currently ramping from 5% to 50% of users and with full roll-out planned for Firefox 140 in June 2025. Chrome is showing deprecation warnings and Safari is expected to follow suit in the future.

What's changing? The default sizes of `<h1>` elements that are nested inside `<article>`, `<aside>`, `<nav>` and `<section>`. These are the default styles being removed:

```css
/* where x is :is(article, aside, nav, section) */
x h1 { margin-block: 0.83em; font-size: 1.50em; }
x x h1 { margin-block: 1.00em; font-size: 1.17em; }
x x x h1 { margin-block: 1.33em; font-size: 1.00em; }
x x x x h1 { margin-block: 1.67em; font-size: 0.83em; }
x x x x x h1 { margin-block: 2.33em; font-size: 0.67em; }
```

The short version is that, many years ago, the HTML spec introduced the idea that an `<h1>` within a nested section should have the same meaning (and hence visual styling) as an `<h2>`. This never really took off, wasn't reflected by the accessibility tree, and was removed from the HTML spec in 2022. The browsers are now trying to clean up the legacy default styles.

This advice from that post sounds sensible to me:

> - **Do not** rely on default browser styles for conveying a heading hierarchy. Explicitly define your document hierarchy using `<h2>` for second-level headings, `<h3>` for third-level, etc.
> - Always define your own `font-size` and `margin` for `<h1>` elements.

2025-04-11 03:54:43+00:00
LLM pricing calculator (updated)
https://tools.simonwillison.net/llm-prices

I [updated](https://github.com/simonw/tools/commit/07c91d8b4fd31f4d460e6b0cd9c225af7e211112) my LLM pricing calculator this morning ([Claude transcript](https://claude.ai/share/5e0eebde-6204-4496-aa1a-fcc519df44b2)) to show the prices of various hosted models in a sorted table, defaulting to lowest price first.

![A pricing table titled "Model prices (per million tokens)" showing AI model costs. Lists models including Amazon Nova Micro ($0.035 input/$0.14 output), Gemini 1.5 Flash-8B ≤128k ($0.04/$0.15), Ministral 3B 24.10 ($0.04/$0.04), Amazon Nova Lite ($0.06/$0.24), Gemini 2.0 Flash Lite ($0.075/$0.30), Gemini 1.5 Flash ≤128k ($0.075/$0.30), Gemini 1.5 Flash-8B >128k ($0.075/$0.30), Gemini 2.0 Flash ($0.10/$0.40), Mistral Small 3.1 ($0.10/$0.30), Ministral 8B 24.10 ($0.10/$0.10), Gemini 1.5 Flash >128k ($0.15/$0.60), and GPT-4o Mini ($0.15/$0.60).](https://static.simonwillison.net/static/2025/llm-prices-sorted.jpg)

[Amazon Nova](https://simonwillison.net/2024/Dec/4/amazon-nova/) and [Google Gemini](https://simonwillison.net/tags/gemini/) continue to dominate the lower end of the table. The most expensive models currently are still OpenAI's o1-Pro ($150/$600) and GPT-4.5 ($75/$150).

2025-04-10 19:56:56+00:00