Example dashboard

Various statistics from my blog.

Owned by simonw, visibility: Public

Entries

3230

SQL query
select 'Entries' as label, count(*) as big_number from blog_entry

Blogmarks

8139

SQL query
select 'Blogmarks' as label, count(*) as big_number from blog_blogmark

Quotations

1270

SQL query
select 'Quotations' as label, count(*) as big_number from blog_quotation
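
The three panels above all follow the same big-number convention: each query returns a label column and a big_number column. As an illustration only (not a widget on this dashboard), the same counts could be pulled with a single query against the same PostgreSQL tables, rendered as an ordinary three-row result rather than separate panels:

-- all three content-type counts in one result set (assumes the same blog_* tables)
select 'Entries' as label, count(*) as big_number from blog_entry
union all
select 'Blogmarks', count(*) from blog_blogmark
union all
select 'Quotations', count(*) from blog_quotation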

Chart of number of entries per month over time

SQL query
select '<h2>Chart of number of entries per month over time</h2>' as html
SQL query
select to_char(date_trunc('month', created), 'YYYY-MM') as bar_label,
count(*) as bar_quantity from blog_entry group by bar_label order by count(*) desc
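
The chart query above orders its bars by count, so the busiest months appear first. For a strictly chronological series the same bar_label / bar_quantity convention works with a different ordering - a sketch assuming the same PostgreSQL blog_entry table, not a query taken from the dashboard:

-- one bar per month, oldest month first
select to_char(date_trunc('month', created), 'YYYY-MM') as bar_label,
       count(*) as bar_quantity
from blog_entry
group by bar_label
order by bar_label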

Ten most recent blogmarks (of 8139 total)

SQL query
select '## Ten most recent blogmarks (of ' || count(*) || ' total)' as markdown from blog_blogmark
SQL query
select link_title, link_url, commentary, created from blog_blogmark order by created desc limit 10

10 rows

link_title link_url commentary created
Marimo is Joining CoreWeave https://marimo.io/blog/joining-coreweave I don't usually cover startup acquisitions here, but this one feels relevant to several of my interests. Marimo ([previously](https://simonwillison.net/tags/marimo/)) provide an open source (Apache 2 licensed) notebook tool for Python, with first-class support for an additional WebAssembly build plus an optional hosted service. It's effectively a reimagining of Jupyter notebooks as a reactive system, where cells automatically update based on changes to other cells - similar to how [Observable](https://observablehq.com/) JavaScript notebooks work. The first public Marimo release was in January 2024 and the tool has "been in development since 2022" ([source](https://news.ycombinator.com/item?id=44304607#44330375)). CoreWeave are a *big* player in the AI data center space. They started out as an Ethereum mining company in 2017, then pivoted to cloud computing infrastructure for AI companies after the 2018 cryptocurrency crash. They IPOd in March 2025 and today they operate more than 30 data centers worldwide and have announced a number of eye-wateringly sized deals with companies such as Cohere and OpenAI. I found [their Wikipedia page](https://en.wikipedia.org/wiki/CoreWeave) very helpful. They've also been on an acquisition spree this year, including: - Weights & Biases [in March 2025](https://www.coreweave.com/blog/coreweave-completes-acquisition-of-weights-biases) (deal closed in May), the AI training observability platform. - OpenPipe [in September 2025](https://www.coreweave.com/news/coreweave-to-acquire-openpipe-leader-in-reinforcement-learning) - a reinforcement learning platform, authors of the [Agent Reinforcement Trainer](https://github.com/OpenPipe/ART) Apache 2 licensed open source RL framework. - Monolith AI [in October 2025](https://investors.coreweave.com/news/news-details/2025/CoreWeave-to-Acquire-Monolith-Expanding-AI-Cloud-Platform-into-Industrial-Innovation/default.aspx), a UK-based AI model SaaS platform focused on AI for engineering and industrial manufacturing. - And now Marimo. Marimo's own announcement emphasizes continued investment in that tool: > Marimo is joining CoreWeave. We’re continuing to build the open-source marimo notebook, while also leveling up molab with serious compute. Our long-term mission remains the same: to build the world’s best open-source programming environment for working with data. > > marimo is, and always will be, free, open-source, and permissively licensed. Given that CoreWeave's buying spree only really started this year, it's impossible to say how well these acquisitions are likely to play out - they haven't yet established a track record. 2025-10-31 13:57:51+00:00
Introducing SWE-1.5: Our Fast Agent Model https://cognition.ai/blog/swe-1-5 Here's the second fast coding model released by a coding agent IDE in the same day - the first was [Composer-1 by Cursor](https://simonwillison.net/2025/Oct/29/cursor-composer/). This time it's Windsurf releasing SWE-1.5: > Today we’re releasing SWE-1.5, the latest in our family of models optimized for software engineering. It is a frontier-size model with hundreds of billions of parameters that achieves near-SOTA coding performance. It also sets a new standard for speed: we partnered with Cerebras to serve it at up to 950 tok/s – 6x faster than Haiku 4.5 and 13x faster than Sonnet 4.5. Like Composer-1 it's only available via their editor, no separate API yet. Also like Composer-1 they don't appear willing to share details of the "leading open-source base model" they based their new model on. I asked it to generate an SVG of a pelican riding a bicycle and got this: ![Bicycle has a red upside down Y shaped frame, pelican is a bit dumpy, it does at least have a long sharp beak.](https://static.simonwillison.net/static/2025/swe-pelican.png) This one felt *really fast*. Partnering with Cerebras for inference is a very smart move. They share a lot of details about their training process in the post: > SWE-1.5 is trained on our state-of-the-art cluster of thousands of GB200 NVL72 chips. We believe SWE-1.5 may be the first public production model trained on the new GB200 generation. [...] > > Our RL rollouts require high-fidelity environments with code execution and even web browsing. To achieve this, we leveraged our VM hypervisor `otterlink` that  allows us to scale **Devin** to tens of thousands of concurrent machines (learn more about [blockdiff](https://cognition.ai/blog/blockdiff#why-incremental-vm-snapshots)). This enabled us to smoothly support very high concurrency and ensure the training environment is aligned with our Devin production environments. That's *another* similarity to Cursor's Composer-1! Cursor talked about how they ran "hundreds of thousands of concurrent sandboxed coding environments in the cloud" in [their description of their RL training](https://cursor.com/blog/composer) as well. This is a notable trend: if you want to build a really great agentic coding tool there's clearly a lot to be said for using reinforcement learning to fine-tune a model against your own custom set of tools using large numbers of sandboxed simulated coding environments as part of that process. **Update**: [I think it's built on GLM](https://x.com/zai_org/status/1984076614951420273). 2025-10-29 23:59:20+00:00
MiniMax M2 & Agent: Ingenious in Simplicity https://www.minimax.io/news/minimax-m2 MiniMax M2 was released on Monday 27th October by MiniMax, a Chinese AI lab founded in December 2021. It's a very promising model. Their self-reported benchmark scores show it as comparable to Claude Sonnet 4, and Artificial Analysis [are ranking it](https://x.com/ArtificialAnlys/status/1982714153375854998) as the best currently available open weight model according to their intelligence score: > MiniMax’s M2 achieves a new all-time-high Intelligence Index score for an open weights model and offers impressive efficiency with only 10B active parameters (200B total). [...] > > The model’s strengths include tool use and instruction following (as shown by Tau2 Bench and IFBench). As such, while M2 likely excels at agentic use cases it may underperform other open weights leaders such as DeepSeek V3.2 and Qwen3 235B at some generalist tasks. This is in line with a number of recent open weights model releases from Chinese AI labs which focus on agentic capabilities, likely pointing to a heavy post-training emphasis on RL. The size is particularly significant: the model weights are 230GB [on Hugging Face](https://huggingface.co/MiniMaxAI/MiniMax-M2), significantly smaller than other high performing open weight models. That's small enough to run on a 256GB Mac Studio, and the MLX community [have that working already](https://huggingface.co/mlx-community/MiniMax-M2-8bit). MiniMax offer their own API, and recommend using their Anthropic-compatible endpoint and the official Anthropic SDKs to access it. MiniMax Head of Engineering Skyler Miao [provided some background on that](https://x.com/SkylerMiao7/status/1982989507252367687): > M2 is a agentic thinking model, it do interleaved thinking like sonnet 4.5, which means every response will contain its thought content. Its very important for M2 to keep the chain of thought. So we must make sure the history thought passed back to the model. Anthropic API support it for sure, as sonnet needs it as well. OpenAI only support it in their new Response API, no support for in ChatCompletion. MiniMax are offering the new model via their API for free until November 7th, after which the cost will be $0.30/million input tokens and $1.20/million output tokens - similar in price to Gemini 2.5 Flash and GPT-5 Mini, see [price comparison here](https://www.llm-prices.com/#it=51&ot=4017&sel=minimax-m2%2Cgpt-5-mini%2Cclaude-3-haiku%2Cgemini-2.5-flash-lite%2Cgemini-2.5-flash) on my [llm-prices.com](https://www.llm-prices.com/) site. I released a new plugin for [LLM](https://llm.datasette.io/) called [llm-minimax](https://github.com/simonw/llm-minimax) providing support for M2 via the MiniMax API: llm install llm-minimax llm keys set minimax # Paste key here llm -m m2 -o max_tokens 10000 "Generate an SVG of a pelican riding a bicycle" Here's [the result](https://gist.github.com/simonw/da79447830dc431c067a93648b338be6): ![Bicycle is good though obscured by the pelican. Pelican has an impressive triple beak and is stretched along the bicycle frame. Not clear if it can pedal or what it is sitting on.](https://static.simonwillison.net/static/2025/m2-pelican.png) 51 input, 4,017 output. At $0.30/m input and $1.20/m output that pelican would cost 0.4836 cents - less than half a cent. This is the first plugin I've written for an Anthropic-API-compatible model. I released [llm-anthropic 0.21](https://github.com/simonw/llm-anthropic/releases/tag/0.21) first, adding the ability to customize the `base_url` parameter when using that model class. This meant the new plugin was less than [30 lines of Python](https://github.com/simonw/llm-minimax/blob/0.1/llm_minimax.py). 2025-10-29 22:49:47+00:00
Composer: Building a fast frontier model with RL https://cursor.com/blog/composer Cursor released [Cursor 2.0 today](https://cursor.com/blog/2-0), with a refreshed UI focused on agentic coding (and running agents in parallel) and a new model that's unique to Cursor called <strong>Composer&nbsp;1</strong>. As far as I can tell there's no way to call the model directly via an API, so I fired up "Ask" mode in Cursor's chat side panel and asked it to "Generate an SVG of a pelican riding a bicycle": ![Screenshot of Cursor 2 - In the chat panel I have asked the question and it spat out a bunch of SVG.](https://static.simonwillison.net/static/2025/cursor-2.jpg) Here's [the result](https://gist.github.com/simonw/e5c9176f153ca718370055ecd256fe70): ![The bicycle is levitating against a blue sky. The pelican looks a little bit more like a baby chicken but does at least have a long beak.](https://static.simonwillison.net/static/2025/cursor-1-pelican.png) The notable thing about Composer-1 is that it is designed to be *fast*. The pelican certainly came back quickly, and in their announcement they describe it as being "4x faster than similarly intelligent models". It's interesting to see Cursor investing resources in training their own code-specific model - similar to [GPT-5-Codex](https://openai.com/index/introducing-upgrades-to-codex/) or [Qwen3-Coder](https://github.com/QwenLM/Qwen3-Coder). From their post: > Composer is a mixture-of-experts (MoE) language model supporting long-context generation and understanding. It is specialized for software engineering through reinforcement learning (RL) in a diverse range of development environments. [...] > > Efficient training of large MoE models requires significant investment into building infrastructure and systems research. We built custom training infrastructure leveraging PyTorch and Ray to power asynchronous reinforcement learning at scale. We natively train our models at low precision by combining our [MXFP8 MoE kernels](https://cursor.com/blog/kernels) with expert parallelism and hybrid sharded data parallelism, allowing us to scale training to thousands of NVIDIA GPUs with minimal communication cost. [...] > > During RL, we want our model to be able to call any tool in the Cursor Agent harness. These tools allow editing code, using semantic search, grepping strings, and running terminal commands. At our scale, teaching the model to effectively call these tools requires running hundreds of thousands of concurrent sandboxed coding environments in the cloud. One detail that's notably absent from their description: did they train the model from scratch, or did they start with an existing open-weights model such as something from Qwen or GLM? Cursor researcher Sasha Rush has been answering questions [on Hacker News](https://news.ycombinator.com/item?id=45748725), but has so far been evasive in answering questions about the base model. When directly asked "is Composer a fine tune of an existing open source base model?" they replied: > Our primary focus is on RL post-training. We think that is the best way to get the model to be a strong interactive agent. Sasha [did confirm](https://news.ycombinator.com/item?id=45748725#45750784) that rumors of an earlier Cursor preview model, Cheetah, being based on a model by xAI's Grok were "Straight up untrue." 2025-10-29 20:45:53+00:00
The PSF has withdrawn a $1.5 million proposal to US government grant program https://pyfound.blogspot.com/2025/10/NSF-funding-statement.html The Python Software Foundation was recently "recommended for funding" (NSF terminology) for a $1.5m grant from the US government National Science Foundation to help improve the security of the Python software ecosystem, after a grant application process led by Seth Larson and Loren Crary. The PSF's annual budget is less than $6m so this is a meaningful amount of money for the organization! We were forced to withdraw our application and turn down the funding, thanks to new language that was added to the agreement requiring us to affirm that we "do not, and will not during the term of this financial assistance award, operate any programs that advance or promote DEI, or discriminatory equity ideology in violation of Federal anti-discrimination laws." Our legal advisors confirmed that this would not just apply to security work covered by the grant - this would apply to all of the PSF's activities. This was not an option for us. Here's the [mission](https://www.python.org/psf/mission/) of the PSF: > The mission of the Python Software Foundation is to promote, protect, and advance the Python programming language, and to support and facilitate the growth of a diverse and international community of Python programmers. If we accepted and spent the money despite this term, there was a very real risk that the money could be clawed back later. That represents an existential risk for the foundation since we would have already spent the money! I was one of the board members who voted to reject this funding - a unanimous but tough decision. I’m proud to serve on a board that can make difficult decisions like this. If you'd like to sponsor the PSF you can find out more [on our site](https://www.python.org/sponsors/application/). I'd love to see a few more of the large AI labs show up [on our top-tier visionary sponsors list](https://www.python.org/psf/sponsors/). 2025-10-27 20:32:07+00:00
GenAI Image Editing Showdown https://genai-showdown.specr.net/image-editing Useful collection of examples by Shaun Pedicini who tested Seedream 4, Gemini 2.5 Flash, Qwen-Image-Edit, FLUX.1 Kontext [dev], FLUX.1 Kontext [max], OmniGen2, and OpenAI gpt-image-1 across 12 image editing prompts. The tasks are very neatly selected, for example: > `Remove all the brown pieces of candy from the glass bowl` Qwen-Image-Edit (a model that [can be self-hosted](https://simonwillison.net/2025/Aug/19/qwen-image-edit/)) was the only one to successfully manage that! This kind of collection is really useful for building up an intuition as to how well image editing models work, and which ones are worth trying for which categories of task. Shaun has [a similar page for text-to-image models](https://genai-showdown.specr.net/) which are not fed an initial image to modify, with further challenging prompts like: > `Two Prussian soldiers wearing spiked pith helmets are facing each other and playing a game of ring toss by attempting to toss metal rings over the spike on the other soldier's helmet.` 2025-10-26 23:59:25+00:00
Sora might have a 'pervert' problem on its hands https://www.businessinsider.com/sora-video-openai-fetish-content-my-face-problem-2025-10 Katie Notopoulos turned on the Sora 2 option where anyone can make a video featuring her cameo, and then: > I found a stranger had made a video where I appeared pregnant. A quick look at the user's profile, and I saw that this person's entire Sora profile was made up of this genre — video after video of women with big, pregnant bellies. I recognized immediately what this was: fetish content. This feels like an intractable problem to me: given the enormous array of fetishes it's hard to imagine a classifier that could protect people from having their likeness used in this way. Best to be aware of this risk before turning on any settings that allow strangers to reuse your image... and that's only an option for tools that implement a robust opt-in mechanism like Sora does. 2025-10-26 17:03:55+00:00
Visual Features Across Modalities: SVG and ASCII Art Reveal Cross-Modal Understanding https://transformer-circuits.pub/2025/october-update/index.html#svg-cross-modal New model interpretability research from Anthropic, this time focused on SVG and ASCII art generation. > We found that the same feature that activates over the eyes in an ASCII face also activates for eyes across diverse text-based modalities, including SVG code and prose in various languages. This is not limited to eyes – we found a number of cross-modal features that recognize specific concepts: from small components like mouths and ears within ASCII or SVG faces, to full visual depictions like dogs and cats. [...] > > These features depend on the surrounding context within the visual depiction. For instance, an SVG circle element activates “eye” features only when positioned within a larger structure that activates “face” features. And really, I can't *not* link to this one given the bonus they tagged on at the end! > As a bonus, we also inspected features for an SVG of a pelican riding a bicycle, [first popularized by Simon Willison](https://github.com/simonw/pelican-bicycle) as a way to test a model's artistic capabilities. We find features representing concepts including "bike", "wheels", "feet", "tail", "eyes", and "mouth" activating over the corresponding parts of the SVG code. > > ![Diagram showing a pelican riding a bicycle illustration alongside its SVG source code. The left side displays two versions: a completed color illustration at top with a white pelican with yellow beak on a red bicycle with blue wheels (labeled "Bike" and "Wheels"), and a line drawing sketch below with labels "Fur/Wool", "Eyes", "Mouth", "Tail", and "Bird". The right side shows the corresponding SVG XML code with viewBox, rect, ellipse, circle, and path elements defining the illustration's geometry and styling.](https://static.simonwillison.net/static/2025/anthropic-pelican-bicycle.jpg) Now that they can identify model features associated with visual concepts in SVG images, can they use those for steering? It turns out they can! Starting with a smiley SVG (provided as XML with no indication as to what it was drawing) and then applying a negative score to the "smile" feature produced a frown instead, and worked against ASCII art as well. They could also boost features like unicorn, cat, owl, or lion and get new SVG smileys clearly attempting to depict those creatures. > ![Diagram showing a yellow smiley face in the center with bidirectional arrows connecting to six different circular faces arranged around it, with text above asking "What can this face be steered into?" The surrounding faces are labeled clockwise from top left: "Unicorn" (pink circle with yellow triangle horn and diamond earrings), "Cat" (gray circle with triangular ears and small nose), "Wrinkles" (beige circle with eyelashes and wrinkle lines), "Owl" (brown circle with large round eyes and small beak), "Lion" (orange circle with yellow inner face), and "Eye" (white circle with large black pupil and highlight)](https://static.simonwillison.net/static/2025/anthropic-faces.jpg) I'd love to see how this behaves if you jack up the feature for the [Golden Gate Bridge](https://simonwillison.net/2024/May/24/golden-gate-claude/). 2025-10-25 03:08:31+00:00
claude_code_docs_map.md https://docs.claude.com/en/docs/claude-code/claude_code_docs_map.md Something I'm enjoying about Claude Code is that any time you ask it questions about *itself* it runs tool calls like these: ![I'll check the Claude Code documentation about bash hooks to see if there's something about the configuration that might explain why it didn't trigger. Fetch(https://docs.claude.com/en/docs/claude-code/claude_code_docs_map.md) ⎿  Received 25.9KB (200 OK) Fetch(https://docs.claude.com/en/docs/claude-code/hooks-guide.md) ⎿  Received 9.4KB (200 OK) Fetch(https://docs.claude.com/en/docs/claude-code/hooks) ⎿  Received 2.2MB (200 OK) Ah, I see the issue! The bashHook in your settings.json is checking the $PROMPT variable, but according to the documentation, bash hooks should: 1. Use PreToolUse hooks (not a simple bash script) 2. Parse JSON input from stdin 3. Access the command via tool_input.command in the JSON ](https://static.simonwillison.net/static/2025/claude-code-self-documentation.jpg) In this case I'd asked it about its "hooks" feature. The [claude_code_docs_map.md](https://docs.claude.com/en/docs/claude-code/claude_code_docs_map.md) file is a neat Markdown index of all of their other documentation - the same pattern advocated by [llms.txt](https://llmstxt.org/). Claude Code can then fetch further documentation to help it answer your question. I intercepted the current Claude Code system prompt [using this trick](https://simonwillison.net/2025/Jun/2/claude-trace/) and sure enough it included a note about this URL: > `When the user directly asks about Claude Code (eg. "can Claude Code do...", "does Claude Code have..."), or asks in second person (eg. "are you able...", "can you do..."), or asks how to use a specific Claude Code feature (eg. implement a hook, or write a slash command), use the WebFetch tool to gather information to answer the question from Claude Code docs. The list of available docs is available at https://docs.claude.com/en/docs/claude-code/claude_code_docs_map.md.` I wish other LLM products - including both ChatGPT and Claude.ai themselves - would implement a similar pattern. It's infuriating how bad LLM tools are at answering questions about themselves, though unsurprising given that their model's training data pre-dates the latest version of those tools. 2025-10-24 23:01:42+00:00
OpenAI no longer has to preserve all of its ChatGPT data, with some exceptions https://www.engadget.com/ai/openai-no-longer-has-to-preserve-all-of-its-chatgpt-data-with-some-exceptions-192422093.html This is a relief: > Federal judge Ona T. Wang filed a new order on October 9 that frees OpenAI of an obligation to "preserve and segregate all output log data that would otherwise be deleted on a going forward basis." I wrote about this [in June](https://simonwillison.net/2025/Jun/5/openai-court-order/). OpenAI were compelled by a court order to preserve *all* output, even from private chats, in case it became relevant to the ongoing New York Times lawsuit. Here are those "some exceptions": > The judge in the case said that any chat logs already saved under the previous order would still be accessible and that OpenAI is required to hold on to any data related to ChatGPT accounts that have been flagged by the NYT. 2025-10-23 05:19:32+00:00
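
The widget above pairs a markdown heading query with a detail query. The same pattern accepts any filter; as a sketch (assuming the same PostgreSQL blog_blogmark table, and not a widget on the actual dashboard), the detail query could select a recent window instead of a fixed limit:

-- blogmarks from the last 30 days, newest first
select link_title, link_url, commentary, created
from blog_blogmark
where created >= now() - interval '30 days'
order by created desc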