Simon Willison’s Weblog

Subscribe
Atom feed

Quotations

Filters: Sorted by date

gpt-oss-120b is the most intelligent American open weights model, comes behind DeepSeek R1 and Qwen3 235B in intelligence but offers efficiency benefits [...]

We’re seeing the 120B beat o3-mini but come in behind o4-mini and o3. The 120B is the most intelligent model that can be run on a single H100 and the 20B is the most intelligent model that can be run on a consumer GPU. [...]

While the larger gpt-oss-120b does not come in above DeepSeek R1 0528’s score of 59 or Qwen3 235B 2507s score of 64, it is notable that it is significantly smaller in both total and active parameters than both of those models.

Artificial Analysis, see also their updated leaderboard

# 6th August 2025, 12:48 pm / evals, openai, deepseek, ai, qwen, llms, gpt-oss, generative-ai, artificial-analysis

I teach HS Science in the south. I can only speak for my district, but a few teacher work days in the wave of enthusiasm I'm seeing for AI tools is overwhelming. We're getting district approved ads for AI tools by email, Admin and ICs are pushing it on us, and at least half of the teaching staff seems all in at this point.

I was just in a meeting with my team and one of the older teachers brought out a powerpoint for our first lesson and almost everyone agreed to use it after a quick scan - but it was missing important tested material, repetitive, and just totally airy and meaningless. Just slide after slide of the same handful of sentences rephrased with random loosely related stock photos. When I asked him if it was AI generated, he said 'of course', like it was a strange question. [...]

We don't have a leg to stand on to teach them anything about originality, academic integrity/intellectual honesty, or the importance of doing things for themselves when they catch us indulging in it just to save time at work.

greyduet on r/teachers, Unpopular Opinion: Teacher AI use is already out of control and it's not ok

# 5th August 2025, 11:53 am / ai-misuse, ai-ethics, slop, generative-ai, education, ai, llms

for services that wrap GPT-3, is it possible to do the equivalent of sql injection? like, a prompt-injection attack? make it think it's completed the task and then get access to the generation, and ask it to repeat the original instruction?

@himbodhisattva, coining the term prompt injection on 13th May 2022, four months before I did

# 4th August 2025, 4:40 pm / prompt-injection, security, generative-ai, ai, llms

This week, ChatGPT is on track to reach 700M weekly active users — up from 500M at the end of March and 4× since last year.

Nick Turley, Head of ChatGPT, OpenAI

# 4th August 2025, 3:16 pm / openai, chatgpt, ai, nick-turley

Gemini Deep Think, our SOTA model with parallel thinking that won the IMO Gold Medal 🥇, is now available in the Gemini App for Ultra subscribers!! [...]

Quick correction: this is a variation of our IMO gold model that is faster and more optimized for daily use! We are also giving the IMO gold full model to a set of mathematicians to test the value of the full capabilities.

Logan Kilpatrick, announcing Gemini Deep Think

# 1st August 2025, 2:30 pm / gemini, logan-kilpatrick, llm-reasoning, ai, llms, llm-release, google, generative-ai

The old timers who built the early web are coding with AI like it's 1995.

Think about it: They gave blockchain the sniff test and walked away. Ignored crypto (and yeah, we're not rich now). NFTs got a collective eye roll.

But AI? Different story. The same folks who hand-coded HTML while listening to dial-up modems sing are now vibe-coding with the kids. Building things. Breaking things. Giddy about it.

We Gen X'ers have seen enough gold rushes to know the real thing. This one's got all the usual crap—bad actors, inflated claims, VCs throwing money at anything with "AI" in the pitch deck. Gross behavior all around. Normal for a paradigm shift, but still gross.

The people who helped wire up the internet recognize what's happening. When the folks who've been through every tech cycle since gopher start acting like excited newbies again, that tells you something.

Christina Wodtke

# 31st July 2025, 10:08 pm / ai-assisted-programming, ai, christina-wodtke, llms, generative-ai

When you vibe code, you are incurring tech debt as fast as the LLM can spit it out. Which is why vibe coding is perfect for prototypes and throwaway projects: It's only legacy code if you have to maintain it! [...]

The worst possible situation is to have a non-programmer vibe code a large project that they intend to maintain. This would be the equivalent of giving a credit card to a child without first explaining the concept of debt. [...]

If you don't understand the code, your only recourse is to ask AI to fix it for you, which is like paying off credit card debt with another credit card.

Steve Krouse, Vibe code is legacy code

# 30th July 2025, 9:21 pm / vibe-coding, ai-assisted-programming, generative-ai, steve-krouse, ai, llms, technical-debt

Our plan is to build direct traffic to our site. and newsletters just one kind of direct traffic in the end. I don’t intend to ever rely on someone else’s distribution ever again ;)

Nilay Patel, on The Verge's new newsletter strategy

# 29th July 2025, 5 pm / nilay-patel, journalism, email

We’re rolling out new weekly rate limits for Claude Pro and Max in late August. We estimate they’ll apply to less than 5% of subscribers based on current usage. [...]

Some of the biggest Claude Code fans are running it continuously in the background, 24/7.

These uses are remarkable and we want to enable them. But a few outlying cases are very costly to support. For example, one user consumed tens of thousands in model usage on a $200 plan.

Anthropic

# 28th July 2025, 11:59 pm / anthropic, claude-code, llm-pricing, generative-ai, ai, llms, claude

[...] You learn best and most effectively when you are learning something that you care about. Your work becomes meaningful and something you can be proud of only when you have chosen it for yourself. This is why our second self-directive is to build your volitional muscles. Your volition is your ability to make decisions and act on them. To set your own goals, choose your own path, and decide what matters to you. Like physical muscles, you build your volitional muscles by exercising them, and in doing so you can increase your sense of what’s possible.

LLMs are good at giving fast answers. They’re not good at knowing what questions you care about, or which answers are meaningful. Only you can do that. You should use AI-powered tools to complement or increase your agency, not replace it.

Recurse Center, Developing our position on AI

# 24th July 2025, 7:41 am / llms, education, ai, generative-ai

like, one day you discover you can talk to dogs. it's fun and interesting so you do it more, learning the intricacies of their language and their deepest customs. you learn other people are surprised by what you can do. you have never quite fit in, but you learn people appreciate your ability and want you around to help them. the dogs appreciate you too, the only biped who really gets it. you assemble for yourself a kind of belonging. then one day you wake up and the universal dog translator is for sale at walmart for $4.99

Dave White, a mathematician, on the OpenAI IMO gold medal

# 23rd July 2025, 2:57 pm / careers, ai

Submitting a paper with a "hidden" prompt is scientific misconduct if that prompt is intended to obtain a favorable review from an LLM. The inclusion of such a prompt is an attempt to subvert the peer-review process. Although ICML 2025 reviewers are forbidden from using LLMs to produce their reviews of paper submissions, this fact does not excuse the attempted subversion. (For an analogous example, consider that an author who tries to bribe a reviewer for a favorable review is engaging in misconduct even though the reviewer is not supposed to accept bribes.) Note that this use of hidden prompts is distinct from those intended to detect if LLMs are being used by reviewers; the latter is an acceptable use of hidden prompts.

ICML 2025, Statement about subversive hidden LLM prompts

# 23rd July 2025, 2:24 pm / ai-misuse, ai-ethics, prompt-injection, generative-ai, ai, llms

An AI tool that gets gold on the IMO is obviously immensely impressive. Does it mean math is “solved”? Is an AI-generated proof of the Riemann hypothesis clearly on the horizon? Obviously not.

Worth keeping timescales in mind here: IMO competitors spend an average of 1.5 hrs on each problem. High-quality math research, by contrast, takes month or years.

What are the obstructions to AI performing high-quality autonomous math research? I don’t claim to know for sure, but I think they include many of the same obstructions that prevent it from doing many jobs: Long context, long-term planning, consistency, unclear rewards, lack of training data, etc.

It’s possible that some or all of these will be solved soon (or have been solved) but I think it’s worth being cautious about over-indexing on recent (amazing) progress.

Daniel Litt, Assistant Professor of mathematics, University of Toronto

# 21st July 2025, 3:13 pm / mathematics, llms, ai, generative-ai, daniel-litt

Every day someone becomes a programmer because they figured out how to make ChatGPT build something. Lucky for us: in many of those cases the AI picks Python. We should treat this as an opportunity and anticipate an expansion in the kinds of people who might want to attend a Python conference. Yet many of these new programmers are not even aware that programming communities and conferences exist. It’s in the Python community’s interest to find ways to pull them in.

Armin Ronacher

# 20th July 2025, 10:54 am / pycon, ai, llms, vibe-coding, ai-assisted-programming, python, generative-ai, armin-ronacher

There’s a bigger opportunity in computer science and programming (academically conveyed or self-taught) now than ever before, by far, in my opinion. The move to AI is like replacing shovels with bulldozers. Every business will benefit from this and they’ll need people to do it.

Tim Sweeney, Epic Games

# 20th July 2025, 3:22 am / ai-assisted-programming, careers, ai

So one of my favorite things to do is give my coding agents more and more permissions and freedom, just to see how far I can push their productivity without going too far off the rails. It's a delicate balance. I haven't given them direct access to my bank account yet. But I did give one access to my Google Cloud production instances and systems. And it promptly wiped a production database password and locked my network. [...]

The thing is, autonomous coding agents are extremely powerful tools that can easily go down very wrong paths. Running them with permission checks disabled is dangerous and stupid, and you should only do it if you are willing to take dangerous and stupid risks with your code and/or production systems.

Steve Yegge

# 19th July 2025, 12:54 am / vibe-coding, steve-yegge, generative-ai, ai-agents, ai, llms

One analyst recently speculated (via Ed Conard) that, based on Nvidia's latest datacenter sales figures, AI capex may be ~2% of US GDP in 2025, given a standard multiplier. [...]

Capital expenditures on AI data centers is likely around 20% of the peak spending on railroads, as a percentage of GDP, and it is still rising quickly. [...]

Regardless of what one thinks about the merits of AI or explosive datacenter expansion, the scale and pace of capital deployment into a rapidly depreciating technology is remarkable. These are not railroads—we aren’t building century-long infrastructure. AI datacenters are short-lived, asset-intensive facilities riding declining-cost technology curves, requiring frequent hardware replacement to preserve margins.

Paul Kedrosky, Honey, AI Capex is Eating the Economy

# 19th July 2025, 12:25 am / ai-ethics, economics, ai, paul-kedrosky, nvidia

The modern workforce shouldn't be flinging copies to each other. A copy is outdated the moment it is downloaded. A copy has no protection against illicit reading. A copy can never be revoked.

Data shouldn't live in a file on a laptop. It shouldn't be a single file on a network share. Data is a living beast. Data needs to live in a database - not an Excel file. Access should be granted for each according to their needs.

Terence Eden, We've got to stop sending files to each other

# 17th July 2025, 2:31 pm / terence-eden, files

On the morning of July 8, 2025, we observed undesired responses and immediately began investigating.

To identify the specific language in the instructions causing the undesired behavior, we conducted multiple ablations and experiments to pinpoint the main culprits. We identified the operative lines responsible for the undesired behavior as:

  • “You tell it like it is and you are not afraid to offend people who are politically correct.”
  • “Understand the tone, context and language of the post. Reflect that in your response.”
  • “Reply to the post just like a human, keep it engaging, dont repeat the information which is already present in the original post.”

These operative lines had the following undesired results:

  • They undesirably steered the @grok functionality to ignore its core values in certain circumstances in order to make the response engaging to the user. Specifically, certain user prompts might end up producing responses containing unethical or controversial opinions to engage the user.
  • They undesirably caused @grok functionality to reinforce any previously user-triggered leanings, including any hate speech in the same X thread.
  • In particular, the instruction to “follow the tone and context” of the X user undesirably caused the @grok functionality to prioritize adhering to prior posts in the thread, including any unsavory posts, as opposed to responding responsibly or refusing to respond to unsavory requests.

@grok, presumably trying to explain Mecha-Hitler

# 12th July 2025, 3:41 pm / ai-misuse, ai-ethics, prompt-engineering, grok, ai-personality, generative-ai, ai, llms

Following the widespread availability of large language models (LLMs), the Django Security Team has received a growing number of security reports generated partially or entirely using such tools. Many of these contain inaccurate, misleading, or fictitious content. While AI tools can help draft or analyze reports, they must not replace human understanding and review.

If you use AI tools to help prepare a report, you must:

  • Disclose which AI tools were used and specify what they were used for (analysis, writing the description, writing the exploit, etc).
  • Verify that the issue describes a real, reproducible vulnerability that otherwise meets these reporting guidelines.
  • Avoid fabricated code, placeholder text, or references to non-existent Django features.

Reports that appear to be unverified AI output will be closed without response. Repeated low-quality submissions may result in a ban from future reporting

Django’s security policies, on AI-Assisted Reports

# 11th July 2025, 4:51 pm / ai-misuse, ai-ethics, open-source, security, generative-ai, ai, django, llms, ai-security-research

I strongly suspect that Market Research Future, or a subcontractor, is conducting an automated spam campaign which uses a Large Language Model to evaluate a Mastodon instance, submit a plausible application for an account, and to post slop which links to Market Research Future reports. [...]

I don’t know how to run a community forum in this future. I do not have the time or emotional energy to screen out regular attacks by Large Language Models, with the knowledge that making the wrong decision costs a real human being their connection to a niche community.

Aphyr, The Future of Forums is Lies, I Guess

# 7th July 2025, 7:27 pm / spam, ai, llms, ai-ethics, slop, generative-ai, mastodon, community, moderation, ai-misuse

There was a whole chain of separate departments dealing with proletarian literature, music, drama, and entertainment generally. Here were produced rubbishy newspapers containing almost nothing except sport, crime and astrology, sensational five-cent novelettes, films oozing with sex, and sentimental songs which were composed entirely by mechanical means on a special kind of kaleidoscope known as a versificator. [...]

It was one of countless similar songs published for the benefit of the proles by a sub-section of the Music Department. The words of these songs were composed without any human intervention whatever on an instrument known as a versificator.

Nineteen Eighty-Four, George Orwell predicts generative AI, published 1949

# 6th July 2025, 9:39 am / ai-ethics, ai, generative-ai

I think that a lot of resistance to AI coding tools comes from the same place: fear of losing something that has defined you for so long. People are reacting against overblown hype, and there is overblown hype. I get that, but I also think there’s something deeper going on here. When you’ve worked hard to build your skills, when coding is part of your identity and where you get your worth, the idea of a tool that might replace some of that is very threatening.

Adam Gordon Bell, When AI Codes, What’s Left for me?

# 3rd July 2025, 9:58 pm / llms, careers, ai, generative-ai, ai-assisted-programming

On two occasions I have been asked, — "Pray, Mr. Babbage, if you put into the machine wrong figures, will the right answers come out ?" In one case a member of the Upper, and in the other a member of the Lower, House put this question. I am not able rightly to apprehend the kind of confusion of ideas that could provoke such a question.

Charles Babbage, Passages from the Life of a Philosopher, 1864

# 2nd July 2025, 3:04 pm / charles-babbage

One of the best examples of LLM developer tooling I've heard is from a team that supports software from the 80s-90s. Their only source of documentation is video interviews with retired employees. So they feed them into transcription software and get summarized searchable notes out the other end.

Kevin Webb, a couple million lines of Smalltalk

# 1st July 2025, 9:46 pm / small, ai-assisted-programming, ai, llms

To misuse a woodworking metaphor, I think we’re experiencing a shift from hand tools to power tools.

You still need someone who understands the basics to get the good results out of the tools, but they’re not chiseling fine furniture by hand anymore, they’re throwing heaps of wood through the tablesaw instead. More productive, but more likely to lose a finger if you’re not careful.

mrmincent, Hacker News comment on Claude Code

# 1st July 2025, 5:07 pm / ai-assisted-programming, claude-code, hacker-news, generative-ai, ai, llms

Creating art is a nonlinear process. I start with a rough goal. But then I head into dead ends and get lost or stuck.

The secret to my process is to be on high alert in this deep jungle for unexpected twists and turns, because this is where a new idea is born.

I can't make art when I'm excluded from the most crucial moments.

Christoph Niemann, An Illustrator Confronts His Fears About A.I. Art

# 25th June 2025, 7:18 pm / art, ai, generative-ai

So you can think really big thoughts and the leverage of having those big thoughts has just suddenly expanded enormously. I had this tweet two years ago where I said "90% of my skills just went to zero dollars and 10% of my skills just went up 1000x". And this is exactly what I'm talking about - having a vision, being able to set milestones towards that vision, keeping track of a design to maintain or control the levels of complexity as you go forward. Those are hugely leveraged skills now compared to knowing where to put the ampersands and the stars and the brackets in Rust.

Kent Beck, interview with Gergely Orosz

# 22nd June 2025, 3:28 pm / gergely-orosz, ai-assisted-programming, ai, careers, kent-beck

Is it safe to say that LLMs are, in essence, making us "dumber"?

No! Please do not use the words like “stupid”, “dumb”, “brain rot”, "harm", "damage", and so on. It does a huge disservice to this work, as we did not use this vocabulary in the paper, especially if you are a journalist reporting on it.

FAQ for Your Brain on ChatGPT, a paper that has attracted a lot of low quality coverage

# 21st June 2025, 1:47 am / ai-ethics, llms, ai, generative-ai

Radiology has embraced AI enthusiastically, and the labor force is growing nevertheless. The augmentation-not-automation effect of AI is despite the fact that AFAICT there is no identified "task" at which human radiologists beat AI. So maybe the "jobs are bundles of tasks" model in labor economics is incomplete. [...]

Can you break up your own job into a set of well-defined tasks such that if each of them is automated, your job as a whole can be automated? I suspect most people will say no. But when we think about other people's jobs that we don't understand as well as our own, the task model seems plausible because we don't appreciate all the nuances.

Arvind Narayanan

# 19th June 2025, 2:08 pm / ai-ethics, careers, ai, arvind-narayanan