Entries

Filters: Sorted by date

3,263 results, page 2 / 109

What happens if AI labs train for pelicans riding bicycles?

Almost every time I share a new example of an SVG of a pelican riding a bicycle a variant of this question pops up: how do you know the labs aren’t training for your benchmark?

[... 325 words]

4:03 pm / 13th November 2025 / ai, generative-ai, llms, pelican-riding-a-bicycle

Reverse engineering Codex CLI to get GPT-5-Codex-Mini to draw me a pelican

OpenAI partially released a new model yesterday called GPT-5-Codex-Mini, which they describe as "a more compact and cost-efficient version of GPT-5-Codex". It’s currently only available via their Codex CLI tool and VS Code extension, with proper API access "coming soon". I decided to use Codex to reverse engineer the Codex CLI tool and give me the ability to prompt the new model directly.

[... 1,774 words]

3:31 am / 9th November 2025 / ai, rust, openai, generative-ai, llms, ai-assisted-programming, pelican-riding-a-bicycle, llm-release, vibe-coding, coding-agents, gpt-5, codex-cli, gpt-codex

Video + notes on upgrading a Datasette plugin for the latest 1.0 alpha, with help from uv and OpenAI Codex CLI

I’m upgrading various plugins for compatibility with the new Datasette 1.0a20 alpha release and I decided to record a video of the process. This post accompanies that video with detailed additional notes.

[... 1,094 words]

6:26 pm / 6th November 2025 / plugins, python, youtube, ai, datasette, generative-ai, llms, ai-assisted-programming, uv, coding-agents, codex-cli

Code research projects with async coding agents like Claude Code and Codex

I’ve been experimenting with a pattern for LLM usage recently that’s working out really well: asynchronous code research tasks. Pick a research question, spin up an asynchronous coding agent and let it go and run some experiments and report back when it’s done.

[... 2,017 words]

3:53 pm / 6th November 2025 / ai, webassembly, generative-ai, llms, ai-assisted-programming, slop, ai-agents, coding-agents, claude-code, jules, codex-cli

A new SQL-powered permissions system in Datasette 1.0a20

Datasette 1.0a20 is out with the biggest breaking API change on the road to 1.0, improving how Datasette’s permissions system works by migrating permission logic to SQL running in SQLite. This release involved 163 commits, with 10,660 additions and 1,825 deletions, most of which were written with the help of Claude Code.

[... 2,750 words]

9:34 pm / 4th November 2025 / plugins, projects, python, sql, sqlite, datasette, annotated-release-notes, uv, coding-agents, claude-code, codex-cli

New prompt injection papers: Agents Rule of Two and The Attacker Moves Second

Two interesting new papers regarding LLM security and prompt injection came to my attention this weekend.

[... 1,433 words]

11:09 pm / 2nd November 2025 / definitions, security, openai, prompt-injection, anthropic, nicholas-carlini, paper-review, lethal-trifecta

Hacking the WiFi-enabled color screen GitHub Universe conference badge

I’m at GitHub Universe this week (thanks to a free ticket from Microsoft). Yesterday I picked up my conference badge... which incorporates a ~~full Raspberry Pi~~ Raspberry Pi Pico microcontroller with a battery, color screen, WiFi and bluetooth.

[... 1,307 words]

5:17 pm / 28th October 2025 / github, hardware-hacking, microsoft, ai, generative-ai, raspberry-pi, llms, claude-code, disclosures

Video: Building a tool to copy-paste share terminal sessions using Claude Code for web

This afternoon I was manually converting a terminal session into a shared HTML file for the umpteenth time when I decided to reduce the friction by building a custom tool for it—and on the spur of the moment I fired up Descript to record the process. The result is this new 11 minute YouTube video showing my workflow for vibe-coding simple tools from start to finish.

[... 1,338 words]

4:14 am / 23rd October 2025 / github, tools, youtube, ai, cloudflare, generative-ai, llms, ai-assisted-programming, anthropic, claude, vibe-coding, coding-agents, claude-code, async-coding-agents

Dane Stuckey (OpenAI CISO) on prompt injection risks for ChatGPT Atlas

My biggest complaint about the launch of the ChatGPT Atlas browser the other day was the lack of details on how OpenAI are addressing prompt injection attacks. The launch post mostly punted that question to the System Card for their “ChatGPT agent” browser automation feature from July. Since this was my single biggest question about Atlas I was disappointed not to see it addressed more directly.

[... 1,199 words]

8:43 pm / 22nd October 2025 / security, ai, openai, prompt-injection, generative-ai, llms, ai-agents, browser-agents

Living dangerously with Claude

I gave a talk last night at Claude Code Anonymous in San Francisco, the unofficial meetup for coding agent enthusiasts. I decided to talk about a dichotomy I’ve been struggling with recently. On the one hand I’m getting enormous value from running coding agents with as few restrictions as possible. On the other hand I’m deeply concerned by the risks that accompany that freedom.

[... 2,208 words]

12:20 pm / 22nd October 2025 / sandboxing, security, ai, webassembly, prompt-injection, generative-ai, llms, anthropic, claude, annotated-talks, ai-agents, coding-agents, claude-code, lethal-trifecta, async-coding-agents

Claude Code for web—a new asynchronous coding agent from Anthropic

Anthropic launched Claude Code for web this morning. It’s an asynchronous coding agent—their answer to OpenAI’s Codex Cloud and Google’s Jules, and has a very similar shape. I had preview access over the weekend and I’ve already seen some very promising results from it.

[... 1,434 words]

7:43 pm / 20th October 2025 / armin-ronacher, jinja, sandboxing, security, ai, prompt-injection, generative-ai, llms, anthropic, claude, coding-agents, claude-code, lethal-trifecta, async-coding-agents, disclosures

Getting DeepSeek-OCR working on an NVIDIA Spark via brute force using Claude Code

DeepSeek released a new model yesterday: DeepSeek-OCR, a 6.6GB model fine-tuned specifically for OCR. They released it as model weights that run using PyTorch and CUDA. I got it running on the NVIDIA Spark by having Claude Code effectively brute force the challenge of getting it working on that particular hardware.

[... 1,971 words]

5:21 pm / 20th October 2025 / ocr, python, ai, docker, pytorch, generative-ai, llms, ai-assisted-programming, anthropic, claude, nvidia, vs-code, vision-llms, deepseek, llm-release, coding-agents, claude-code, ai-in-china, nvidia-spark

Claude Skills are awesome, maybe a bigger deal than MCP

Anthropic this morning introduced Claude Skills, a new pattern for making new abilities available to their models:

[... 1,864 words]

9:25 pm / 16th October 2025 / definitions, ai, prompt-engineering, generative-ai, llms, anthropic, claude, code-interpreter, ai-agents, coding-agents, claude-code, skills

NVIDIA DGX Spark: great hardware, early days for the ecosystem

NVIDIA sent me a preview unit of their new DGX Spark desktop “AI supercomputer”. I’ve never had hardware to review before! You can consider this my first ever sponsored post if you like, but they did not pay me any cash and aside from an embargo date they did not request (nor would I grant) any editorial input into what I write about the device.

[... 1,846 words]

11:36 pm / 14th October 2025 / hardware, ai, docker, tailscale, generative-ai, local-llms, llms, nvidia, ollama, llama-cpp, coding-agents, claude-code, lm-studio, disclosures, nvidia-spark

Claude can write complete Datasette plugins now

This isn’t necessarily surprising, but it’s worth noting anyway. Claude Sonnet 4.5 is capable of building a full Datasette plugin now.

[... 1,296 words]

11:43 pm / 8th October 2025 / plugins, projects, python, ai, datasette, generative-ai, llms, ai-assisted-programming, anthropic, claude, uv, coding-agents, claude-code

Vibe engineering

I feel like vibe coding is pretty well established now as covering the fast, loose and irresponsible way of building software with AI—entirely prompt-driven, and with no attention paid to how the code actually works. This leaves us with a terminology gap: what should we call the other end of the spectrum, where seasoned professionals accelerate their work with LLMs while staying proudly and confidently accountable for the software they produce?

[... 1,313 words]

2:32 pm / 7th October 2025 / code-review, definitions, software-engineering, ai, generative-ai, llms, ai-assisted-programming, vibe-coding, coding-agents, parallel-agents

OpenAI DevDay 2025 live blog

I’m at OpenAI DevDay in Fort Mason, San Francisco today. As I did last year, I’m going to be live blogging the announcements from the keynote. Unlike last year, this year there’s a livestream.

[... 57 words]

5:03 pm / 6th October 2025 / ai, openai, generative-ai, llms, disclosures, live-blog

Embracing the parallel coding agent lifestyle

For a while now I’ve been hearing from engineers who run multiple coding agents at once—firing up several Claude Code or Codex CLI instances at the same time, sometimes in the same repo, sometimes against multiple checkouts or git worktrees.

[... 1,275 words]

12:06 pm / 5th October 2025 / ai, generative-ai, llms, ai-assisted-programming, ai-agents, coding-agents, claude-code, async-coding-agents, jules, codex-cli, parallel-agents, jesse-vincent, peter-steinberger

Designing agentic loops

Coding agents like Anthropic’s Claude Code and OpenAI’s Codex CLI represent a genuine step change in how useful LLMs can be for producing working code. These agents can now directly exercise the code they are writing, correct errors, dig through existing implementation details, and even run experiments to find effective code solutions to problems.

[... 1,667 words]

3:20 pm / 30th September 2025 / definitions, ai, generative-ai, llms, ai-assisted-programming, ai-agents, coding-agents, async-coding-agents

Claude Sonnet 4.5 is probably the “best coding model in the world” (at least for now)

Anthropic released Claude Sonnet 4.5 today, with a very bold set of claims:

[... 1,205 words]

6:11 pm / 29th September 2025 / ai, generative-ai, llms, llm, anthropic, claude, code-interpreter, llm-tool-use, llm-pricing, pelican-riding-a-bicycle, llm-reasoning, llm-release

I think “agent” may finally have a widely enough agreed upon definition to be useful jargon now

I’ve noticed something interesting over the past few weeks: I’ve started using the term “agent” in conversations where I don’t feel the need to then define it, roll my eyes or wrap it in scare quotes.

[... 1,199 words]

7:12 pm / 18th September 2025 / definitions, ai, generative-ai, llms, ai-agents, agent-definitions

My review of Claude’s new Code Interpreter, released under a very confusing name

Today on the Anthropic blog: Claude can now create and edit files:

[... 2,771 words]

6:11 pm / 9th September 2025 / nodejs, python, visualization, ai, openai, prompt-injection, generative-ai, chatgpt, llms, ai-assisted-programming, anthropic, claude, code-interpreter, llm-tool-use, vibe-coding

Recreating the Apollo AI adoption rate chart with GPT-5, Python and Pyodide

Apollo Global Management’s “Chief Economist” Dr. Torsten Sløk released this interesting chart which appears to show a slowdown in AI adoption rates among large (>250 employees) companies:

[... 2,673 words]

6:47 am / 9th September 2025 / census, data-journalism, javascript, python, tools, visualization, ai, pyodide, openai, generative-ai, chatgpt, llms, ai-assisted-programming, code-interpreter, llm-reasoning, vibe-coding, ai-assisted-search, gpt-5

GPT-5 Thinking in ChatGPT (aka Research Goblin) is shockingly good at search

“Don’t use chatbots as search engines” was great advice for several years... until it wasn’t.

[... 2,679 words]

7:31 pm / 6th September 2025 / bing, definitions, search, ai, openai, generative-ai, llms, llm-tool-use, llm-reasoning, deep-research, ai-assisted-search, gpt-5

V&A East Storehouse and Operation Mincemeat in London

We were back in London for a few days and yesterday had a day of culture.

[... 481 words]

6:51 pm / 27th August 2025 / london, museums, theatre

The Summer of Johann: prompt injections as far as the eye can see

Independent AI researcher Johann Rehberger (previously) has had an absurdly busy August. Under the heading The Month of AI Bugs he has been publishing one report per day across an array of different tools, all of which are vulnerable to various classic prompt injection problems. This is a fantastic and horrifying demonstration of how widespread and dangerous these vulnerabilities still are, almost three years after we first started talking about them.

[... 1,425 words]

10:44 pm / 15th August 2025 / security, ai, prompt-injection, generative-ai, llms, exfiltration-attacks, johann-rehberger, coding-agents, lethal-trifecta, async-coding-agents

Open weight LLMs exhibit inconsistent performance across providers

Artificial Analysis published a new benchmark the other day, this time focusing on how an individual model—OpenAI’s gpt-oss-120b—performs across different hosted providers.

[... 847 words]

4:29 pm / 15th August 2025 / ai, openai, generative-ai, local-llms, llms, evals, gpt-oss, artificial-analysis

LLM 0.27, the annotated release notes: GPT-5 and improved tool calling

I shipped LLM 0.27 today (followed by a 0.27.1 with minor bug fixes), adding support for the new GPT-5 family of models from OpenAI plus a flurry of improvements to the tool calling features introduced in LLM 0.26. Here are the annotated release notes.

[... 1,174 words]

11:57 pm / 11th August 2025 / projects, python, ai, datasette, annotated-release-notes, generative-ai, llms, llm, llm-tool-use, gpt-5

Qwen3-4B-Thinking: “This is art—pelicans don’t ride bikes!”

I’ve fallen a few days behind keeping up with Qwen. They released two new 4B models last week: Qwen3-4B-Instruct-2507 and its thinking equivalent Qwen3-4B-Thinking-2507.

[... 991 words]

11:59 pm / 10th August 2025 / ai, generative-ai, local-llms, llms, qwen, pelican-riding-a-bicycle, llm-release, lm-studio, ai-in-china

My Lethal Trifecta talk at the Bay Area AI Security Meetup

I gave a talk on Wednesday at the Bay Area AI Security Meetup about prompt injection, the lethal trifecta and the challenges of securing systems that use MCP. It wasn’t recorded but I’ve created an annotated presentation with my slides and detailed notes on everything I talked about.

[... 2,843 words]

4:30 am / 9th August 2025 / security, my-talks, ai, prompt-injection, generative-ai, llms, annotated-talks, exfiltration-attacks, model-context-protocol, lethal-trifecta

Simon Willison’s Weblog