Simon Willison’s Weblog

Quotations

I found the problem and it's really bad. Looking at your log, here's the catastrophic command that was run:

rm -rf tests/ patches/ plan/ ~/

See that ~/ at the end? That's your entire home directory. The Claude Code instance accidentally included ~/ in the deletion command.

Claude, after Claude Code deleted most of a user's Mac

# 9th December 2025, 12:04 pm / ai-ethics, coding-agents, claude, claude-code, generative-ai, ai, llms
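
The root-cause pattern is guardable: never let a recursive delete touch anything outside the project root. A minimal sketch in Python (the helper and its rules are my assumptions, not how Claude Code actually works):

    import os
    import shutil

    def safe_delete(path: str, project_root: str) -> None:
        """Remove a directory tree, but only if it lives inside project_root."""
        resolved = os.path.realpath(os.path.expanduser(path))
        root = os.path.realpath(project_root)
        # commonpath() equals root only when resolved sits at or below it, so
        # "~/", "/", and "../../etc" are all rejected before anything is removed.
        if resolved == root or os.path.commonpath([resolved, root]) != root:
            raise ValueError(f"refusing to delete {resolved!r}: outside {root!r}")
        shutil.rmtree(resolved)

With this guard, safe_delete("~/", "/Users/me/project") raises instead of wiping the home directory.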

Now I want to talk about how they're selling AI. The growth narrative of AI is that AI will disrupt labor markets. I use "disrupt" here in its most disreputable, tech bro sense.

The promise of AI – the promise AI companies make to investors – is that there will be AIs that can do your job, and when your boss fires you and replaces you with AI, he will keep half of your salary for himself, and give the other half to the AI company.

That's it.

That's the $13T growth story that Morgan Stanley is telling. It's why big investors and institutionals are giving AI companies hundreds of billions of dollars. And because they are piling in, normies are also getting sucked in, risking their retirement savings and their family's financial security.

Cory Doctorow, The Reverse Centaur’s Guide to Criticizing AI

# 7th December 2025, 9:28 pm / cory-doctorow, ai-ethics, ai

What to try first?

Run Claude Code in a repo (whether you know it well or not) and ask a question about how something works. You'll see how it looks through the files to find the answer.

The next thing to try is a code change where you know exactly what you want but it's tedious to type. Describe it in detail and let Claude figure it out. If there is similar code that it should follow, tell it so. From there, you can build intuition about more complex changes that it might be good at. [...]

As conversation length grows, each message gets more expensive while Claude gets dumber. That's a bad trade! [...] Run /reset (or just quit and restart) to start over from scratch. Tell Claude to summarize the conversation so far to give you something to paste into the next chat if you want to save some of the context.

David Crespo, Oxide's internal tips on LLM use

# 7th December 2025, 8:33 pm / coding-agents, ai-assisted-programming, oxide, claude-code, generative-ai, llms
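
The "bad trade" is simple arithmetic: every request resends the full history, so cumulative input tokens grow roughly quadratically with turn count. A toy Python sketch (the per-turn token count and price are made-up placeholders):

    TOKENS_PER_TURN = 2_000   # assumed average tokens added per exchange
    PRICE_PER_MTOK = 3.00     # assumed dollars per million input tokens

    total_input = 0
    for turn in range(1, 51):
        context = turn * TOKENS_PER_TURN       # full history resent this turn
        total_input += context
        if turn % 10 == 0:
            cost = total_input * PRICE_PER_MTOK / 1_000_000
            print(f"turn {turn}: context={context:,} cumulative=${cost:.2f}")

Resetting (or summarizing into a fresh chat) restarts that curve from near zero.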

If you work slowly, you will be more likely to stick with your slightly obsolete work. You know that professor who spent seven years preparing lecture notes twenty years ago? He is not going to throw them away and start again, as that would be a new seven-year project. So he will keep teaching using aging lecture notes until he retires and someone finally updates the course.

Daniel Lemire, Why speed matters

# 6th December 2025, 2:40 pm / productivity

Since the beginning of the project in 2023 and the private beta days of Ghostty, I've repeatedly expressed my intention that Ghostty legally become a non-profit. [...]

I want to squelch any possible concerns about a "rug pull". A non-profit structure provides enforceable assurances: the mission cannot be quietly changed, funds cannot be diverted to private benefit, and the project cannot be sold off or repurposed for commercial gain. The structure legally binds Ghostty to the public-benefit purpose it was created to serve. [...]

I believe infrastructure of this kind should be stewarded by a mission-driven, non-commercial entity that prioritizes public benefit over private profit. That structure increases trust, encourages adoption, and creates the conditions for Ghostty to grow into a widely used and impactful piece of open-source infrastructure.

Mitchell Hashimoto, Ghostty is now Non-Profit

# 3rd December 2025, 7:18 pm / open-source, mitchell-hashimoto

More than half of the teens surveyed believe journalists regularly engage in unethical behaviors like making up details or quotes in stories, paying sources, taking visual images out of context or doing favors for advertisers. Less than a third believe reporters correct their errors, confirm facts before reporting them, gather information from multiple sources or cover stories in the public interest — practices ingrained in the DNA of reputable journalists.

David Bauder, AP News, A lost generation of news consumers? Survey shows how teenagers dislike the news media

# 1st December 2025, 5:22 pm / journalism

I am increasingly worried about AI in the video game space in general. [...] I'm not sure that the CEOs and the people making the decisions at these sorts of companies understand the difference between actual content and slop. [...]

It's exactly the same cryolab, it's exactly the same robot factory place on all of these different planets. It's like there's so much to explore and nothing to find. [...]

And what was in this contraband chest was a bunch of harvested organs. And I'm like, oh, wow. If this was an actual game that people cared about the making of, this would be something interesting - an interesting bit of environmental storytelling. [...] But it's not, because it's just cold, heartless, procedurally generated slop. [...]

Like, the point of having a giant open world to explore isn't the size of the world or the amount of stuff in it. It's that all of that stuff, however much there is, was made by someone for a reason.

Felix Nolan, TikTok about AI and procedural generation in video games

# 30th November 2025, 10:48 pm / ai-ethics, slop, game-design, tiktok, generative-ai, ai

The most annoying problem is that the [GitHub] frontend barely works without JavaScript, so we cannot open issues, pull requests, source code or CI logs in Dillo itself, despite them being mostly plain HTML, which I don't think is acceptable. In the past, it used to gracefully degrade without enforcing JavaScript, but now it doesn't.

Rodrigo Arias Mallo, Migrating Dillo from GitHub

# 30th November 2025, 2:32 pm / browsers, progressive-enhancement, github

Large language models (LLMs) can be useful tools, but they are not good at creating entirely new Wikipedia articles. Large language models should not be used to generate new Wikipedia articles from scratch.

Wikipedia content guideline, promoted to a guideline on 24th November 2025

# 29th November 2025, 10:55 am / ai-ethics, slop, generative-ai, wikipedia, ai, llms

To evaluate the model’s capability in processing long-context inputs, we construct a video “Needle-in-a-Haystack” evaluation on Qwen3-VL-235B-A22B-Instruct. In this task, a semantically salient “needle” frame—containing critical visual evidence—is inserted at varying temporal positions within a long video. The model is then tasked with accurately locating the target frame from the long video and answering the corresponding question. [...]

As shown in Figure 3, the model achieves a perfect 100% accuracy on videos up to 30 minutes in duration—corresponding to a context length of 256K tokens. Remarkably, even when extrapolating to sequences of up to 1M tokens (approximately 2 hours of video) via YaRN-based positional extension, the model retains a high accuracy of 99.5%.

Qwen3-VL Technical Report, 5.12.3: Needle-in-a-Haystack

# 27th November 2025, 5:01 pm / vision-llms, evals, generative-ai, ai-in-china, ai, qwen, llms
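
A sketch of the evaluation loop the report describes, in Python (the frame-list representation and the model.answer method are illustrative assumptions; the actual harness is not public):

    import random

    def build_haystack(background_frames, needle_frame, position):
        """Insert the evidence-bearing needle frame at a chosen temporal position."""
        frames = list(background_frames)
        frames.insert(position, needle_frame)
        return frames

    def needle_accuracy(model, background_frames, needle_frame, question, answer, trials=20):
        """Fraction of trials where the model locates the needle and answers correctly."""
        correct = 0
        for _ in range(trials):
            position = random.randrange(len(background_frames) + 1)
            video = build_haystack(background_frames, needle_frame, position)
            correct += answer in model.answer(video, question)  # hypothetical model API
        return correct / trials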

If the person is unnecessarily rude, mean, or insulting to Claude, Claude doesn't need to apologize and can insist on kindness and dignity from the person it’s talking with. Even if someone is frustrated or unhappy, Claude is deserving of respectful engagement.

Claude Opus 4.5 system prompt, also added to the Sonnet 4.5 and Haiku 4.5 prompts on November 19th 2025

# 24th November 2025, 11:58 pm / system-prompts, anthropic, claude, generative-ai, ai, llms, ai-personality

Previously, when malware developers wanted to go and monetize their exploits, they would do exactly one thing: encrypt every file on a person's computer and request a ransom to decrypt the files. In the future I think this will change.

LLMs allow attackers to instead process every file on the victim's computer, and tailor a blackmail letter specifically towards that person. One person may be cheating on their spouse. Another may have lied on their resume. A third may have cheated on an exam at school. It is unlikely that any one person has done any of these specific things, but it is very likely that there exists something that is blackmailable for every person. Malware + LLMs, given access to a person's computer, can find that and monetize it.

Nicholas Carlini, Are large language models worth it? Misuse: malware at scale

# 20th November 2025, 1:01 am / ai-ethics, generative-ai, nicholas-carlini, ai, llms

Cloudflare's network began experiencing significant failures to deliver core network traffic [...] triggered by a change to one of our database systems' permissions which caused the database to output multiple entries into a “feature file” used by our Bot Management system. That feature file, in turn, doubled in size. The larger-than-expected feature file was then propagated to all the machines that make up our network. [...] The software had a limit on the size of the feature file that was below its doubled size. That caused the software to fail. [...]

This resulted in the following panic which in turn resulted in a 5xx error:

thread 'fl2_worker_thread' panicked: called `Result::unwrap()` on an `Err` value

Matthew Prince, Cloudflare outage on November 18, 2025, see also this comment

# 19th November 2025, 8:02 am / scaling, postmortem, cloudflare, rust
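
The pattern generalizes: a hard size limit enforced with a fatal assertion turns an oversized but otherwise parseable config file into a global crash. A Python sketch of the defensive alternative, falling back to the last known-good file rather than panicking (the limit and names are my assumptions, not Cloudflare's code):

    MAX_FEATURES = 200  # assumed hard limit baked into the consuming software

    def load_feature_file(lines, last_good):
        """Parse a propagated feature file, degrading gracefully if oversized."""
        features = [line.strip() for line in lines if line.strip()]
        if len(features) > MAX_FEATURES:
            # The outage path: the real code did the equivalent of .unwrap()
            # here and panicked. Serving the last good config instead keeps
            # traffic flowing while operators investigate.
            return last_good
        return features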

Three years ago, we were impressed that a machine could write a poem about otters. Less than 1,000 days later, I am debating statistical methodology with an agent that built its own research environment. The era of the chatbot is turning into the era of the digital coworker. To be very clear, Gemini 3 isn’t perfect, and it still needs a manager who can guide and check it. But it suggests that “human in the loop” is evolving from “human who fixes AI mistakes” to “human who directs AI work.” And that may be the biggest change since the release of ChatGPT.

Ethan Mollick, Three Years from GPT-3 to Gemini 3

# 18th November 2025, 7:24 pm / gemini, ethan-mollick, generative-ai, chatgpt, ai, llms, ai-agents

With AI now, we are able to write new programs that we could never hope to write by hand before. We do it by specifying objectives (e.g. classification accuracy, reward functions), and we search the program space via gradient descent to find neural networks that work well against that objective.

This is my Software 2.0 blog post from a while ago. In this new programming paradigm then, the new most predictive feature to look at is verifiability. If a task/job is verifiable, then it is optimizable directly or via reinforcement learning, and a neural net can be trained to work extremely well. It's about the extent to which an AI can "practice" something.

The environment has to be resettable (you can start a new attempt), efficient (a lot of attempts can be made), and rewardable (there is some automated process to reward any specific attempt that was made).

Andrej Karpathy

# 16th November 2025, 6:29 pm / andrej-karpathy, generative-ai, ai-agents, ai, llms
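
Karpathy's three properties map directly onto the familiar reset/step environment interface from reinforcement learning. A minimal sketch (a toy task of my own invention, not his code):

    class VerifiableEnv:
        """Resettable, efficient, and rewardable, per the three properties above."""

        def reset(self):
            # Resettable: start a completely fresh attempt.
            self.guesses = 0
            self.target = 7  # toy task: guess a hidden digit
            return "guess a number between 0 and 9"

        def step(self, action: int):
            # Efficient: each attempt is one cheap call, so many can be made.
            # Rewardable: the reward is computed automatically, no human grading.
            self.guesses += 1
            reward = 1.0 if action == self.target else 0.0
            done = reward == 1.0 or self.guesses >= 10
            return reward, done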

On Monday, this Court entered an order requiring OpenAI to hand over to the New York Times and its co-plaintiffs 20 million ChatGPT user conversations [...]

OpenAI is unaware of any court ordering wholesale production of personal information at this scale. This sets a dangerous precedent: it suggests that anyone who files a lawsuit against an AI company can demand production of tens of millions of conversations without first narrowing for relevance. This is not how discovery works in other cases: courts do not allow plaintiffs suing Google to dig through the private emails of tens of millions of Gmail users irrespective of their relevance. And it is not how discovery should work for generative AI tools either.

Nov 12th letter from OpenAI to Judge Ona T. Wang, re: OpenAI, Inc., Copyright Infringement Litigation

# 13th November 2025, 4:34 pm / openai, privacy, ai, llms, chatgpt, ai-ethics, generative-ai, law, new-york-times

The fact that MCP is a different surface from your normal API allows you to ship MUCH faster to MCP. This has been unlocked by inference at runtime.

Normal APIs are promises to developers, because developers commit code that relies on those APIs, and then walk away. If you break the API, you break the promise, and you break that code. This means a developer gets woken up at 2am to fix the code.

But MCP servers are called by LLMs which dynamically read the spec every time, which allows us to constantly change the MCP server. It doesn't matter! We haven't made any promises. The LLM can figure it out afresh every time.

Steve Krouse

# 12th November 2025, 5:21 pm / model-context-protocol, generative-ai, steve-krouse, apis, ai, llms
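
The contrast is easiest to see in code. A Python sketch (the session and client objects are hypothetical stand-ins, not a real SDK):

    # Traditional API client: the contract is frozen into code at write time.
    # Rename /v1/users on the server and this breaks at 2am.
    def list_users(http):
        return http.get("/v1/users").json()

    # MCP-style flow: the tool list is re-read every session, so the server
    # can rename, add, or remove tools without breaking anyone.
    def run_task(mcp_session, llm, task):
        tools = mcp_session.list_tools()     # fetch the current spec afresh
        plan = llm.choose_tool(task, tools)  # the model adapts to what exists today
        return mcp_session.call_tool(plan.name, plan.arguments)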

Netflix asks partners to consider the following guiding principles before leveraging GenAI in any creative workflow: 

  1. The outputs do not replicate or substantially recreate identifiable characteristics of unowned or copyrighted material, or infringe any copyright-protected works.
  2. The generative tools used do not store, reuse, or train on production data inputs or outputs.
  3. Where possible, generative tools are used in an enterprise-secured environment to safeguard inputs.
  4. Generated material is temporary and not part of the final deliverables.
  5. GenAI is not used to replace or generate new talent performances or union-covered work without consent.

[...] If you answer "no" or "unsure" to any of these principles, escalate to your Netflix contact for more guidance before proceeding, as written approval may be required.

Netflix, Using Generative AI in Content Production

# 10th November 2025, 10:08 pm / ai-ethics, netflix, ai, generative-ai

The big advantage of MCP over OpenAPI is that it is very clear about auth. [...]

Maybe an agent could read the docs and write code to auth. But we don't actually want that, because it implies the agent gets access to the API token! We want the agent's harness to handle that and never reveal the key to the agent. [...]

OAuth has always assumed that the client knows what API it's talking to, and so the client's developer can register the client with that API in advance to get a client_id/client_secret pair. Agents, though, don't know what MCPs they'll talk to in advance.

So MCP requires OAuth dynamic client registration (RFC 7591), which practically nobody actually implemented prior to MCP. DCR might as well have been introduced by MCP, and may actually be the most important unlock in the whole spec.

Kenton Varda

# 8th November 2025, 10:04 pm / kenton-varda, security, oauth, model-context-protocol, generative-ai, ai, llms
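
Dynamic client registration boils down to one POST against the server's registration endpoint, which mints credentials on the spot. A Python sketch of the RFC 7591 shape (the endpoint URL and metadata values are illustrative):

    import json
    from urllib.request import Request, urlopen

    # RFC 7591: the client registers itself at runtime instead of a developer
    # pre-registering it with the API's owner.
    registration = {
        "client_name": "example-agent",
        "redirect_uris": ["http://localhost:8976/callback"],
        "grant_types": ["authorization_code"],
        "token_endpoint_auth_method": "none",
    }
    request = Request(
        "https://mcp.example.com/oauth/register",  # illustrative endpoint
        data=json.dumps(registration).encode(),
        headers={"Content-Type": "application/json"},
    )
    with urlopen(request) as response:
        credentials = json.load(response)  # includes the newly minted client_id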

I have AiDHD

It has never been easier to build an MVP and in turn, it has never been harder to keep focus. When new features always feel like they're just a prompt away, feature creep feels like a never-ending battle. Being disciplined is more important than ever.

AI still doesn't change one very important thing: you still need to make something people want. I think that getting users (even free ones) will become significantly harder as the bar for users' time will only get higher as their options increase.

Being quicker to get to the point of failure is actually incredibly valuable. Even just over a year ago, many of these projects would have taken months to build.

Josh Cohenzadeh, AiDHD

# 7th November 2025, 4:38 pm / vibe-coding, ai-assisted-programming, generative-ai, ai, llms

My trepidation extends to complex literature searches. I use LLMs as secondary librarians when I’m doing research. They reliably find primary sources (articles, papers, etc.) that I miss in my initial searches.

But these searches are dangerous. I distrust LLM librarians. There is so much data in the world: you can (in good faith!) find evidence to support almost any position or conclusion. ChatGPT is not a human, and, unlike teachers & librarians & scholars, ChatGPT does not have a consistent, legible worldview. In my experience, it readily agrees with any premise you hand it — and brings citations. It may have read every article that can be read, but it has no real opinion — so it is not a credible expert.

Ben Stolovitz, How I use AI

# 7th November 2025, 12:15 am / ai-assisted-search, llms, ai, generative-ai

At the start of the year, most people loosely following AI probably knew of 0 [Chinese] AI labs. Now, and towards wrapping up 2025, I’d say all of DeepSeek, Qwen, and Kimi are becoming household names. They all have seasons of their best releases and different strengths. The important thing is this’ll be a growing list. A growing share of cutting edge mindshare is shifting to China. I expect some of the likes of Z.ai, Meituan, or Ant Ling to potentially join this list next year. For some of these labs releasing top tier benchmark models, they literally started their foundation model effort after DeepSeek. It took many Chinese companies only 6 months to catch up to the open frontier in the ballpark of performance; now the question is if they can offer something in a niche of the frontier that has real demand for users.

Nathan Lambert, 5 Thoughts on Kimi K2 Thinking

# 6th November 2025, 9:44 pm / llms, ai, ai-in-china, generative-ai, nathan-lambert, kimi, moonshot

I'm worried that they put co-pilot in Excel because Excel is the beast that drives our entire economy and do you know who has tamed that beast?

Brenda.

Who is Brenda?

She is a mid-level employee in every finance department, in every business across this stupid nation and the Excel goddess herself descended from the heavens, kissed Brenda on her forehead and the sweat from Brenda's brow is what allows us to do capitalism. [...]

She's gonna birth that formula for a financial report and then she's gonna send that financial report to a higher up and he's gonna need to make a change to the report and normally he would have sent it back to Brenda but he's like oh I have AI and AI is probably like smarter than Brenda and then the AI is gonna fuck it up real bad and he won't be able to recognize it because he doesn't understand Excel because AI hallucinates.

You know who's not hallucinating?

Brenda.

Ada James, @belligerentbarbies on TikTok

# 5th November 2025, 3:50 am / generative-ai, ai, excel, hallucinations, llms, tiktok, ai-ethics

Every time an engineer evaluates a language that isn’t “theirs,” their brain is literally working against them. They’re not just analyzing technical trade-offs, they’re contemplating a version of themselves that doesn’t exist yet, that feels threatening to the version that does. The Python developer reads case studies about Go’s performance and their amygdala quietly marks each one as a threat to be neutralized. The Rust advocate looks at identical problems and their Default Mode Network constructs narratives about why “only” Rust can solve them.

We’re not lying. We genuinely believe our reasoning is sound. That’s what makes identity based thinking so expensive, and so invisible.

Steve Francia, Why Engineers Can't Be Rational About Programming Languages

# 4th November 2025, 2:54 am / technical-debt, psychology, programming-languages

Dear PEP 810 authors. The Steering Council is happy to unanimously accept "PEP 810, Explicit lazy imports". Congratulations! We appreciate the way you were able to build on and improve the previously discussed (and rejected) attempt at lazy imports as proposed in PEP 690.

Barry Warsaw, on behalf of the Python Steering Council

# 3rd November 2025, 9:27 pm / python, barry-warsaw
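
The accepted syntax is a soft keyword that defers the module load until first use. A sketch based on the proposal (semantics per PEP 810; this requires a Python version that implements it):

    lazy import json  # name bound now, module actually imported on first use

    def handle(request_body):
        # json is imported here, the first time the name is touched, so
        # programs that never reach this line never pay the import cost.
        return json.loads(request_body)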

Interleaved thinking is essential for LLM agents: it means alternating between explicit reasoning and tool use, while carrying that reasoning forward between steps. This process significantly enhances planning, self‑correction, and reliability in long workflows. [...]

From community feedback, we've often observed failures to preserve prior-round thinking state across multi-turn interactions with M2. The root cause is that the widely-used OpenAI Chat Completion API does not support passing reasoning content back in subsequent requests. Although the Anthropic API natively supports this capability, the community has provided less support for models beyond Claude, and many applications still omit passing back the previous turns' thinking in their Anthropic API implementations. This situation has resulted in poor support for Interleaved Thinking for new models. To fully unlock M2's capabilities, preserving the reasoning process across multi-turn interactions is essential.

MiniMax, Interleaved Thinking Unlocks Reliable MiniMax-M2 Agentic Capability

# 3rd November 2025, 5:24 pm / generative-ai, ai-agents, llm-reasoning, definitions, ai, ai-in-china, llms, minimax
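
Concretely, "preserving the reasoning process" means sending prior thinking blocks back unmodified in the next request. A sketch of an Anthropic-style messages array (block shapes follow that API's documented format; the content is invented):

    # Thinking blocks from earlier assistant turns are returned verbatim so the
    # model keeps its chain of reasoning across tool-use steps.
    messages = [
        {"role": "user", "content": "Find the failing test and fix it."},
        {"role": "assistant", "content": [
            {"type": "thinking", "thinking": "Plan: run tests, read the log...",
             "signature": "..."},  # opaque signature returned by the API
            {"type": "tool_use", "id": "call_1", "name": "run_tests", "input": {}},
        ]},
        {"role": "user", "content": [
            {"type": "tool_result", "tool_use_id": "call_1", "content": "2 failures"},
        ]},
        # The next request includes everything above, thinking included; clients
        # that strip those blocks hit exactly the failure MiniMax describes.
    ]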

I plan to introduce hard Rust dependencies and Rust code into APT, no earlier than May 2026. This extends at first to the Rust compiler and standard library, and the Sequoia ecosystem.

In particular, our code to parse .deb, .ar, .tar, and the HTTP signature verification code would strongly benefit from memory safe languages and a stronger approach to unit testing.

If you maintain a port without a working Rust toolchain, please ensure it has one within the next 6 months, or sunset the port.

Julian Andres Klode, debian-devel mailing list

# 1st November 2025, 5:34 pm / open-source, debian, rust, linux

To really understand a concept, you have to "invent" it yourself in some capacity. Understanding doesn't come from passive content consumption. It is always self-built. It is an active, high-agency, self-directed process of creating and debugging your own mental models.

François Chollet

# 30th October 2025, 2:37 am / francois-chollet, teaching

Claude doesn't make me much faster on the work that I am an expert on. Maybe 15-20% depending on the day.

It's the work that I don't know how to do and would have to research. Or the grunge work I don't even want to do. On this it is hard to even put a number on. Many of the projects I do with Claude day to day I just wouldn't have done at all pre-Claude.

Infinity% improvement in productivity on those.

Aaron Boodman

# 28th October 2025, 2:08 am / ai-assisted-programming, claude, generative-ai, ai, llms, aaron-boodman, productivity

If you have an AGENTS.md file, you can source it in your CLAUDE.md using @AGENTS.md to maintain a single source of truth.

Claude Docs, with the official answer to standardizing on AGENTS.md

# 25th October 2025, 4:57 am / coding-agents, anthropic, claude, claude-code, generative-ai, ai, llms
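
So a CLAUDE.md that defers entirely to the shared file is a single line:

    @AGENTS.md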