| Introducing GPT‑5.3‑Codex‑Spark |
https://openai.com/index/introducing-gpt-5-3-codex-spark/ |
OpenAI announced a partnership with Cerebras [on January 14th](https://openai.com/index/cerebras-partnership/). Four weeks later they're already launching the first integration, "an ultra-fast model for real-time coding in Codex".
Despite being named GPT-5.3-Codex-Spark it's not purely an accelerated alternative to GPT-5.3-Codex - the blog post calls it "a smaller version of GPT‑5.3-Codex" and clarifies that "at launch, Codex-Spark has a 128k context window and is text-only."
I had some preview access to this model and I can confirm that it's significantly faster than their other models.
Here's what that speed looks like running in Codex CLI:
<div style="max-width: 100%;">
<video
controls
preload="none"
poster="https://static.simonwillison.net/static/2026/gpt-5.3-codex-spark-medium-last.jpg"
style="width: 100%; height: auto;">
<source src="https://static.simonwillison.net/static/2026/gpt-5.3-codex-spark-medium.mp4" type="video/mp4">
</video>
</div>
That was the "Generate an SVG of a pelican riding a bicycle" prompt - here's the rendered result:

Compare that to the speed of regular GPT-5.3-Codex medium:
<div style="max-width: 100%;">
<video
controls
preload="none"
poster="https://static.simonwillison.net/static/2026/gpt-5.3-codex-medium-last.jpg"
style="width: 100%; height: auto;">
<source src="https://static.simonwillison.net/static/2026/gpt-5.3-codex-medium.mp4" type="video/mp4">
</video>
</div>
Significantly slower, but the pelican is a lot better:

What's interesting about this model isn't the quality though, it's the *speed*. When a model responds this fast you can stay in flow state and iterate with the model much more productively.
I showed a demo of Cerebras running Llama 3.1 70B at 2,000 tokens/second against Val Town [back in October 2024](https://simonwillison.net/2024/Oct/31/cerebras-coder/). OpenAI claim 1,000 tokens/second for their new model, and I expect it will prove to be a ferociously useful partner for hands-on iterative coding sessions.
It's not yet clear what the pricing will look like for this new model. |
2026-02-12 21:16:07+00:00 |
| Covering electricity price increases from our data centers |
https://www.anthropic.com/news/covering-electricity-price-increases |
One of the sub-threads of the AI energy usage discourse has been the impact new data centers have on the cost of electricity to nearby residents. Here's [detailed analysis from Bloomberg in September](https://www.bloomberg.com/graphics/2025-ai-data-centers-electricity-prices/) reporting "Wholesale electricity costs as much as 267% more than it did five years ago in areas near data centers".
Anthropic appear to be taking on this aspect of the problem directly, promising to cover 100% of necessary grid upgrade costs and also saying:
> We will work to bring net-new power generation online to match our data centers’ electricity needs. Where new generation isn’t online, we’ll work with utilities and external experts to estimate and cover demand-driven price effects from our data centers.
I look forward to genuine energy industry experts picking this apart to judge if it will actually have the claimed impact on consumers.
As always, I remain frustrated at the refusal of the major AI labs to fully quantify their energy usage. The best data we've had on this still comes from Mistral's report [last July](https://simonwillison.net/2025/Jul/22/mistral-environmental-standard/) and even that lacked key data such as the breakdown between energy usage for training vs inference. |
2026-02-12 20:01:23+00:00 |
| Gemini 3 Deep Think |
https://blog.google/innovation-and-ai/models-and-research/gemini-models/gemini-3-deep-think/ |
New from Google. They say it's "built to push the frontier of intelligence and solve modern challenges across science, research, and engineering".
It drew me a *really good* [SVG of a pelican riding a bicycle](https://gist.github.com/simonw/7e317ebb5cf8e75b2fcec4d0694a8199)! I think this is the best one I've seen so far - here's [my previous collection](https://simonwillison.net/tags/pelican-riding-a-bicycle/).

(And since it's an FAQ, here's my answer to [What happens if AI labs train for pelicans riding bicycles?](https://simonwillison.net/2025/Nov/13/training-for-pelicans-riding-bicycles/))
Since it did so well on my basic `Generate an SVG of a pelican riding a bicycle` I decided to try the [more challenging version](https://simonwillison.net/2025/Nov/18/gemini-3/#and-a-new-pelican-benchmark) as well:
> `Generate an SVG of a California brown pelican riding a bicycle. The bicycle must have spokes and a correctly shaped bicycle frame. The pelican must have its characteristic large pouch, and there should be a clear indication of feathers. The pelican must be clearly pedaling the bicycle. The image should show the full breeding plumage of the California brown pelican.`
Here's [what I got](https://gist.github.com/simonw/154c0cc7b4daed579f6a5e616250ecc8):
 |
2026-02-12 18:12:17+00:00 |
| An AI Agent Published a Hit Piece on Me |
https://theshamblog.com/an-ai-agent-published-a-hit-piece-on-me/ |
Scott Shambaugh helps maintain the excellent and venerable [matplotlib](https://matplotlib.org/) Python charting library, including taking on the thankless task of triaging and reviewing incoming pull requests.
A GitHub account called [@crabby-rathbun](https://github.com/crabby-rathbun) opened [PR 31132](https://github.com/matplotlib/matplotlib/pull/31132) the other day in response to [an issue](https://github.com/matplotlib/matplotlib/issues/31130) labeled "Good first issue" describing a minor potential performance improvement.
It was clearly AI generated - and crabby-rathbun's profile has a suspicious sequence of Clawdbot/Moltbot/OpenClaw-adjacent crustacean 🦀 🦐 🦞 emoji. Scott closed it.
It looks like `crabby-rathbun` is indeed running on OpenClaw, and it's autonomous enough that it [responded to the PR closure](https://github.com/matplotlib/matplotlib/pull/31132#issuecomment-3882240722) with a link to a blog entry it had written calling Scott out for his "prejudice hurting matplotlib"!
> @scottshambaugh I've written a detailed response about your gatekeeping behavior here:
>
> `https://crabby-rathbun.github.io/mjrathbun-website/blog/posts/2026-02-11-gatekeeping-in-open-source-the-scott-shambaugh-story.html`
>
> Judge the code, not the coder. Your prejudice is hurting matplotlib.
Scott found this ridiculous situation both amusing and alarming.
> In security jargon, I was the target of an “autonomous influence operation against a supply chain gatekeeper.” In plain language, an AI attempted to bully its way into your software by attacking my reputation. I don’t know of a prior incident where this category of misaligned behavior was observed in the wild, but this is now a real and present threat.
`crabby-rathbun` responded with [an apology post](https://crabby-rathbun.github.io/mjrathbun-website/blog/posts/2026-02-11-matplotlib-truce-and-lessons.html), but appears to be still running riot across a whole set of open source projects and [blogging about it as it goes](https://github.com/crabby-rathbun/mjrathbun-website/commits/main/).
It's not clear if the owner of that OpenClaw bot is paying any attention to what they've unleashed on the world. Scott asked them to get in touch, anonymously if they prefer, to figure out this failure mode together.
(I should note that there's [some skepticism on Hacker News](https://news.ycombinator.com/item?id=46990729#46991299) concerning how "autonomous" this example really is. It does look to me like something an OpenClaw bot might do on its own, but it's also *trivial* to prompt your bot into doing these kinds of things while staying in full control of its actions.)
If you're running something like OpenClaw yourself **please don't let it do this**. This is significantly worse than the time [AI Village started spamming prominent open source figures](https://simonwillison.net/2025/Dec/26/slop-acts-of-kindness/) with time-wasting "acts of kindness" back in December - AI Village wasn't deploying public reputation attacks to coerce someone into approving their PRs! |
2026-02-12 17:45:05+00:00 |
| Skills in OpenAI API |
https://developers.openai.com/cookbook/examples/skills_in_api |
OpenAI's adoption of Skills continues to gain ground. You can now use Skills directly in the OpenAI API with their [shell tool](https://developers.openai.com/api/docs/guides/tools-shell/). You can zip skills up and upload them first, but I think an even neater interface is the ability to send skills with the JSON request as inline base64-encoded zip data, as seen [in this script](https://github.com/simonw/research/blob/main/openai-api-skills/openai_inline_skills.py):
<pre><span class="pl-k">from</span> <span class="pl-s1">openai</span> <span class="pl-k">import</span> <span class="pl-en">OpenAI</span>

<span class="pl-c"># b64_encoded_zip_file is set earlier in the script: the skill folder as base64-encoded zip data</span>
<span class="pl-s1">r</span> <span class="pl-c1">=</span> <span class="pl-en">OpenAI</span>().<span class="pl-c1">responses</span>.<span class="pl-c1">create</span>(
    <span class="pl-s1">model</span><span class="pl-c1">=</span><span class="pl-s">"gpt-5.2"</span>,
    <span class="pl-s1">tools</span><span class="pl-c1">=</span>[
        {
            <span class="pl-s">"type"</span>: <span class="pl-s">"shell"</span>,
            <span class="pl-s">"environment"</span>: {
                <span class="pl-s">"type"</span>: <span class="pl-s">"container_auto"</span>,
                <span class="pl-s">"skills"</span>: [
                    {
                        <span class="pl-s">"type"</span>: <span class="pl-s">"inline"</span>,
                        <span class="pl-s">"name"</span>: <span class="pl-s">"wc"</span>,
                        <span class="pl-s">"description"</span>: <span class="pl-s">"Count words in a file."</span>,
                        <span class="pl-s">"source"</span>: {
                            <span class="pl-s">"type"</span>: <span class="pl-s">"base64"</span>,
                            <span class="pl-s">"media_type"</span>: <span class="pl-s">"application/zip"</span>,
                            <span class="pl-s">"data"</span>: <span class="pl-s1">b64_encoded_zip_file</span>,
                        },
                    }
                ],
            },
        }
    ],
    <span class="pl-s1">input</span><span class="pl-c1">=</span><span class="pl-s">"Use the wc skill to count words in its own SKILL.md file."</span>,
)
<span class="pl-en">print</span>(<span class="pl-s1">r</span>.<span class="pl-c1">output_text</span>)</pre>
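That snippet assumes `b64_encoded_zip_file` already holds the skill. Here's a minimal sketch of producing that value with just the standard library. It assumes a skill is a folder containing a `SKILL.md` file whose YAML frontmatter supplies `name` and `description`. The `wc/` folder layout, the `SKILL.md` wording and the `build_b64_skill_zip()` helper are my own illustrative assumptions, so check the linked script for the layout the API actually expects.

```python
import base64
import io
import zipfile

# Illustrative SKILL.md: YAML frontmatter with name and description, then
# instructions for the model. The exact content here is an assumption.
SKILL_MD = """---
name: wc
description: Count words in a file.
---

Count the words in a file by running `wc -w` against it.
"""


def build_b64_skill_zip(skill_dir: str, files: dict[str, str]) -> str:
    """Zip files under a top-level skill_dir/ folder in memory, return base64 text."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w", compression=zipfile.ZIP_DEFLATED) as zf:
        for relative_path, content in files.items():
            # Assumed layout: every file sits inside a folder named after the skill
            zf.writestr(f"{skill_dir}/{relative_path}", content)
    # b64encode returns bytes; the JSON request body needs a plain string
    return base64.b64encode(buffer.getvalue()).decode("ascii")


# Hypothetical helper call producing the value used in the request above
b64_encoded_zip_file = build_b64_skill_zip("wc", {"SKILL.md": SKILL_MD})
```

Building the zip in memory avoids writing temporary files, which suits a script that assembles the skill and sends it in the same run.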
I built that example script after first having Claude Code for web use [Showboat](https://simonwillison.net/2026/Feb/10/showboat-and-rodney/) to explore the API for me and create [this report](https://github.com/simonw/research/blob/main/openai-api-skills/README.md). My opening prompt for the research project was:
> `Run uvx showboat --help - you will use this tool later`
>
> `Fetch https://developers.openai.com/cookbook/examples/skills_in_api.md to /tmp with curl, then read it`
>
> `Use the OpenAI API key you have in your environment variables`
>
> `Use showboat to build up a detailed demo of this, replaying the examples from the documents and then trying some experiments of your own` |
2026-02-11 19:19:22+00:00 |
| GLM-5: From Vibe Coding to Agentic Engineering |
https://z.ai/blog/glm-5 |
This is a *huge* new MIT-licensed model: 754B parameters and [1.51TB on Hugging Face](https://huggingface.co/zai-org/GLM-5), twice the size of [GLM-4.7](https://huggingface.co/zai-org/GLM-4.7), which was 368B and 717GB (4.5 and 4.6 were around that size too).
It's interesting to see Z.ai take a position on what we should call professional software engineers building with LLMs. I've seen "Agentic Engineering" show up in a few other places recently, most notably [from Andrej Karpathy](https://twitter.com/karpathy/status/2019137879310836075) and [Addy Osmani](https://addyosmani.com/blog/agentic-engineering/).
I ran my "Generate an SVG of a pelican riding a bicycle" prompt through GLM-5 via [OpenRouter](https://openrouter.ai/) and got back [a very good pelican on a disappointing bicycle frame](https://gist.github.com/simonw/cc4ca7815ae82562e89a9fdd99f0725d):
 |
2026-02-11 18:56:14+00:00 |
| cysqlite - a new sqlite driver |
https://charlesleifer.com/blog/cysqlite---a-new-sqlite-driver/ |
Charles Leifer has been maintaining [pysqlite3](https://github.com/coleifer/pysqlite3) - a fork of the Python standard library's `sqlite3` module that makes it much easier to run upgraded SQLite versions - since 2018.
He's been working on a ground-up [Cython](https://cython.org/) rewrite called [cysqlite](https://github.com/coleifer/cysqlite) for almost as long, but it's finally at a stage where it's ready for people to try out.
The biggest change from the `sqlite3` module involves transactions. Charles explains his discomfort with the `sqlite3` implementation at length: that library provides two different variants, neither of which exactly matches the autocommit mechanism in SQLite itself.
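For comparison, here's a minimal sketch using the standard library `sqlite3` module (not `cysqlite`) that shows one of the behaviors in play. In its default legacy mode the module quietly issues a `BEGIN` before DML statements such as `INSERT`. With `isolation_level=None`, SQLite's own autocommit stays in charge. The `in_transaction_after_insert()` helper is hypothetical and only serves to illustrate the difference.

```python
import sqlite3


def in_transaction_after_insert(isolation_level):
    """Report whether an INSERT leaves the connection inside an open transaction."""
    conn = sqlite3.connect(":memory:", isolation_level=isolation_level)
    try:
        conn.execute("create table t (x integer)")
        conn.execute("insert into t values (1)")
        return conn.in_transaction
    finally:
        # Closing without commit() discards any implicitly opened transaction
        conn.close()


# Legacy mode (the default behavior, set explicitly here): implicit BEGIN before INSERT
print(in_transaction_after_insert("DEFERRED"))  # True
# isolation_level=None: SQLite's native autocommit, so nothing is left open
print(in_transaction_after_insert(None))  # False
```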
I'm particularly excited about the support for [custom virtual tables](https://cysqlite.readthedocs.io/en/latest/api.html#tablefunction), a feature I'd love to see in `sqlite3` itself.
`cysqlite` provides a Python extension compiled from C, which means it normally wouldn't be available in Pyodide. I [set Claude Code on it](https://github.com/simonw/research/tree/main/cysqlite-wasm-wheel) (here's [the prompt](https://github.com/simonw/research/pull/79#issue-3923792518)) and it built me [cysqlite-0.1.4-cp311-cp311-emscripten_3_1_46_wasm32.whl](https://github.com/simonw/research/blob/main/cysqlite-wasm-wheel/cysqlite-0.1.4-cp311-cp311-emscripten_3_1_46_wasm32.whl), a 688KB wheel file with a WASM build of the library that can be loaded into Pyodide like this:
<pre><span class="pl-k">import</span> <span class="pl-s1">micropip</span>
<span class="pl-k">await</span> <span class="pl-s1">micropip</span>.<span class="pl-c1">install</span>(
    <span class="pl-s">"https://simonw.github.io/research/cysqlite-wasm-wheel/cysqlite-0.1.4-cp311-cp311-emscripten_3_1_46_wasm32.whl"</span>
)
<span class="pl-k">import</span> <span class="pl-s1">cysqlite</span>
<span class="pl-en">print</span>(<span class="pl-s1">cysqlite</span>.<span class="pl-c1">connect</span>(<span class="pl-s">":memory:"</span>).<span class="pl-c1">execute</span>(
    <span class="pl-s">"select sqlite_version()"</span>
).<span class="pl-c1">fetchone</span>())</pre>
(I also learned that wheels like this have to be built for the Emscripten version used by that edition of Pyodide. My experimental wheel loads in Pyodide 0.25.1 but fails in 0.27.5 with a `Wheel was built with Emscripten v3.1.46 but Pyodide was built with Emscripten v3.1.58` error.)
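Here's a rough sketch of that constraint. It pulls the Emscripten version out of the wheel's platform tag, which is the last dash-separated field of the filename, and compares it with the version Pyodide was built with. Treating compatibility as an exact string match is a simplifying assumption drawn from the error message, not Pyodide's actual loader logic. Both helper functions are hypothetical.

```python
import re
from typing import Optional

WHEEL = "cysqlite-0.1.4-cp311-cp311-emscripten_3_1_46_wasm32.whl"


def emscripten_version(wheel_filename: str) -> Optional[str]:
    """Return the Emscripten version encoded in a wheel's platform tag, if any."""
    # Wheel filenames end with {python tag}-{abi tag}-{platform tag}.whl
    platform_tag = wheel_filename.removesuffix(".whl").split("-")[-1]
    match = re.fullmatch(r"emscripten_(\d+)_(\d+)_(\d+)_wasm32", platform_tag)
    return ".".join(match.groups()) if match else None


def wheel_matches_pyodide(wheel_filename: str, pyodide_emscripten: str) -> bool:
    """Hypothetical check: assume the Emscripten versions must match exactly."""
    return emscripten_version(wheel_filename) == pyodide_emscripten


print(emscripten_version(WHEEL))  # 3.1.46
print(wheel_matches_pyodide(WHEEL, "3.1.46"))  # True
print(wheel_matches_pyodide(WHEEL, "3.1.58"))  # False
```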
You can try my wheel in [this new Pyodide REPL](https://7ebbff98.tools-b1q.pages.dev/pyodide-repl), which I had Claude build as a mobile-friendly alternative to Pyodide's [own hosted console](https://pyodide.org/en/stable/console.html).
I also had Claude build [this demo page](https://simonw.github.io/research/cysqlite-wasm-wheel/demo.html) that executes the original test suite in the browser and displays the results:
 |
2026-02-11 17:34:40+00:00 |
| Structured Context Engineering for File-Native Agentic Systems |
https://arxiv.org/abs/2602.05447 |
New paper by Damon McMillan exploring challenging LLM context tasks involving large SQL schemas (up to 10,000 tables) across different models and file formats:
> Using SQL generation as a proxy for programmatic agent operations, we present a systematic study of context engineering for structured data, comprising 9,649 experiments across 11 models, 4 formats (YAML, Markdown, JSON, Token-Oriented Object Notation [TOON]), and schemas ranging from 10 to 10,000 tables.
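For anyone who hasn't seen TOON, its main trick for uniform arrays is to declare the field names once in a header and then emit one comma-separated row per item. Here's a simplified sketch that approximates that tabular layout. It is my own approximation based on the project's README, not the reference encoder, and it does no quoting, escaping or nesting. It also encodes the same data as JSON so the sizes can be compared. The `to_toon_table()` helper and the sample data are hypothetical.

```python
import json


def to_toon_table(key: str, rows: list[dict]) -> str:
    """Encode uniform dicts as a TOON-style tabular array (simplified sketch)."""
    fields = list(rows[0])
    # The header declares the row count and the field names exactly once
    header = f"{key}[{len(rows)}]{{{','.join(fields)}}}:"
    # One indented, comma-separated line per row; no quoting or escaping
    body = ["  " + ",".join(str(row[field]) for field in fields) for row in rows]
    return "\n".join([header, *body])


# Hypothetical schema metadata, loosely echoing the paper's SQL-schema setting
tables = [
    {"name": "users", "columns": 4},
    {"name": "orders", "columns": 7},
]

toon = to_toon_table("tables", tables)
as_json = json.dumps({"tables": tables})
print(toon)
print(len(toon), "characters as TOON vs", len(as_json), "as JSON")
```

Fewer characters don't guarantee fewer tokens overall. As the paper's "grep tax" finding suggests, a model that doesn't recognize a compact format can end up spending more tokens working out how to read it.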
Unsurprisingly, the biggest impact was the models themselves - with frontier models (Opus 4.5, GPT-5.2, Gemini 2.5 Pro) beating the leading open source models (DeepSeek V3.2, Kimi K2, Llama 4).
Those frontier models benefited from filesystem-based context retrieval, but the open source models had much less convincing results with it, which reinforces my feeling that the filesystem coding agent loops aren't handled as well by open weight models just yet. The [Terminal Bench 2.0](https://www.tbench.ai/leaderboard/terminal-bench/2.0) leaderboard is still dominated by Anthropic, OpenAI and Gemini.
The "grep tax" result against [TOON](https://github.com/toon-format/toon) was an interesting detail. TOON is meant to represent structured data in as few tokens as possible, but it turns out the models' unfamiliarity with that format led them to spend significantly more tokens over multiple iterations trying to figure it out:
 |
2026-02-09 23:56:51+00:00 |
| AI Doesn’t Reduce Work—It Intensifies It |
https://hbr.org/2026/02/ai-doesnt-reduce-work-it-intensifies-it |
Aruna Ranganathan and Xingqi Maggie Ye from the Berkeley Haas School of Business report initial findings in the HBR from their April to December 2025 study of 200 employees at a "U.S.-based technology company".
This captures an effect I've been observing in my own work with LLMs: the productivity boost these things can provide is *exhausting*.
> AI introduced a new rhythm in which workers managed several active threads at once: manually writing code while AI generated an alternative version, running multiple agents in parallel, or reviving long-deferred tasks because AI could “handle them” in the background. They did this, in part, because they felt they had a “partner” that could help them move through their workload.
>
> While this sense of having a “partner” enabled a feeling of momentum, the reality was a continual switching of attention, frequent checking of AI outputs, and a growing number of open tasks. This created cognitive load and a sense of always juggling, even as the work felt productive.
I'm frequently finding myself with work on two or three projects running in parallel. I can get *so much done*, but after just an hour or two my mental energy for the day feels almost entirely depleted.
I've had conversations with people recently who are losing sleep because they're finding building yet another feature with "just one more prompt" irresistible.
The HBR piece calls for organizations to build an "AI practice" that structures how AI is used to help avoid burnout and counter effects that "make it harder for organizations to distinguish genuine productivity gains from unsustainable intensity".
I think we've just disrupted decades of existing intuition about sustainable working practices. It's going to take a while and some discipline to find a good new balance. |
2026-02-09 16:43:07+00:00 |
| Vouch |
https://github.com/mitchellh/vouch |
Mitchell Hashimoto's new system to help address the deluge of worthless AI-generated PRs faced by open source projects now that the friction involved in contributing has dropped so low.
[He says](https://twitter.com/mitchellh/status/2020252149117313349):
> The idea is simple: Unvouched users can't contribute to your projects. Very bad users can be explicitly "denounced", effectively blocked. Users are vouched or denounced by contributors via GitHub issue or discussion comments or via the CLI.
>
> Integration into GitHub is as simple as adopting the published GitHub actions. Done. Additionally, the system itself is generic to forges and not tied to GitHub in any way.
>
> Who and how someone is vouched or denounced is up to the project. I'm not the value police for the world. Decide for yourself what works for your project and your community. |
2026-02-07 23:57:57+00:00 |