Archive for January 2026

January 2026

87 posts: 9 entries, 32 links, 14 quotes, 4 notes, 28 beats

Jan. 23, 2026

Wilson Lin on FastRender: a browser built by thousands of parallel agents

Last week Cursor published Scaling long-running autonomous coding, an article describing their research efforts into coordinating large numbers of autonomous coding agents. One of the projects mentioned in the article was FastRender, a web browser they built from scratch using their agent swarms. I wanted to learn more so I asked Wilson Lin, the engineer behind FastRender, if we could record a conversation about the project. That 47 minute video is now available on YouTube. I’ve included some of the highlights below.

[... 2,243 words]

9:26 pm / browsers, youtube, ai, rust, generative-ai, llms, ai-assisted-programming, coding-agents, cursor, parallel-agents, browser-challenge

Jan. 24, 2026

If you tell a friend they can now instantly create any app, they’ll probably say “Cool! Now I need to think of an idea.” Then they will forget about it, and never build a thing. The problem is not that your friend is horribly uncreative. It’s that most people’s problems are not software-shaped, and most won’t notice even when they are. [...]

Programmers are trained to see everything as a software-shaped problem: if you do a task three times, you should probably automate it with a script. Rename every IMG_*.jpg file from the last week to hawaii2025_*.jpg, they tell their terminal, while the rest of us painfully click and copy-paste. We are blind to the solutions we were never taught to see, asking for faster horses and never dreaming of cars.

— Jasmine Sun

# 9:34 pm / vibe-coding, coding-agents, claude-code, generative-ai, ai, llms

Don’t “Trust the Process” (via) Jenny Wen, Design Lead at Anthropic (and previously Director of Design at Figma) gave a provocative keynote at Hatch Conference in Berlin last September.

Don't "Trust the process" slide, speaker shown on the left

Jenny argues that the Design Process - user research leading to personas leading to user journeys leading to wireframes... all before anything gets built - may be outdated for today's world.

Hypothesis: In a world where anyone can make anything — what matters is your ability to choose and curate what you make.

In place of the Process, designers should lean into prototypes. AI makes these much more accessible and less time-consuming than they used to be.

Watching this talk made me think about how AI-assisted programming significantly reduces the cost of building the wrong thing. Previously if the design wasn't right you could waste months of development time building in the wrong direction, which was a very expensive mistake. If a wrong direction wastes just a few days instead we can take more risks and be much more proactive in exploring the problem space.

I've always been a compulsive prototyper though, so this is very much playing into my own existing biases!

# 11:31 pm / design, prototyping, ai, generative-ai, llms, ai-assisted-programming, vibe-coding

Jan. 25, 2026

Kākāpō Cam: Rakiura live stream (via) Critical update for this year's Kākāpō breeding season: the New Zealand Department of Conservation have a livestream running of Rakiura's nest!

You’re looking at the underground nest of 23-year-old Rakiura. She has chosen this same site to nest for all seven breeding seasons since 2008, a large cavity under a rātā tree. Because she returns to the site so reliably, we’ve been able to make modifications over the years to keep it safe and dry, including adding a well-placed hatch for monitoring eggs and chicks.

Rakiura is a legendary Kākāpō:

Rakiura hatched on 19 February 2002 on Whenua Hou/Codfish Island. She is the offspring of Flossie and Bill. Her name comes from the te reo Māori name for Stewart Island, the place where most of the founding kākāpō population originated.

Rakiura has nine living descendants, three females and six males, across six breeding seasons. In 2008 came Tōitiiti, in 2009 Tamahou and Te Atapō, in 2011 Tia and Tūtoko, in 2014 Taeatanga and Te Awa, in 2019 Mati-mā and Tautahi. She also has many grandchicks.

She laid her first egg of the season at 4:30pm NZ time on 22nd January. The livestream went live shortly afterwards, once she committed to this nest.

The stream is on YouTube. I used Claude Code to write a livestream-gif.py script and used that to capture this sped-up video of the last few hours of footage, within which you can catch a glimpse of the egg!

# 4:53 am / youtube, kakapo, conservation, claude-code

Release claude-code-transcripts 0.6 — Tools for publishing transcripts for Claude Code sessions

25th Jan 2026, 5:49 am

Tool Directory Explorer Demo — Explore the contents of a local directory in your browser with an interactive file tree viewer and search functionality. This tool demonstrates the capabilities of the `webkitdirectory` input attribute to access folder structures and analyze file distribution. Once a directory is selected, view detailed statistics about file counts, total size, and file type breakdown, with options to filter by name and file extension.

25th Jan 2026, 6:39 pm

the browser is the sandbox. Paul Kinlan is a web platform developer advocate at Google and recently turned his attention to coding agents. He quickly identified the importance of a robust sandbox for agents to operate in and put together these detailed notes on how the web browser can help:

This got me thinking about the browser. Over the last 30 years, we have built a sandbox specifically designed to run incredibly hostile, untrusted code from anywhere on the web, the instant a user taps a URL. [...]

Could you build something like Cowork in the browser? Maybe. To find out, I built a demo called Co-do that tests this hypothesis. In this post I want to discuss the research I've done to see how far we can get, and determine if the browser's ability to run untrusted code is useful (and good enough) for enabling software to do more for us directly on our computer.

Paul then describes how the three key aspects of a sandbox - filesystem, network access and safe code execution - can be handled by browser technologies: the File System Access API (still Chrome-only as far as I can tell), CSP headers with <iframe sandbox> and WebAssembly in Web Workers.

Co-do is a very interesting demo that illustrates all of these ideas in a single application:

You select a folder full of files and configure an LLM provider and set an API key, Co-do then uses CSP-approved API calls to interact with that provider and provides a chat interface with tools for interacting with those files. It does indeed feel similar to Claude Cowork but without running a multi-GB local container to provide the sandbox.

My biggest complaint about <iframe sandbox> remains how thinly documented it is, especially across different browsers. Paul's post has all sorts of useful details on that which I've not encountered elsewhere, including a complex double-iframe technique to help apply network rules to the inner of the two frames.

Thanks to this post I also learned about the <input type="file" webkitdirectory> tag which turns out to work on Firefox, Safari and Chrome and allows a browser read-only access to a full directory of files at once. I had Claude knock up a webkitdirectory demo to try it out and I'll certainly be using it for projects in the future.

# 11:51 pm / browsers, javascript, sandboxing, ai, generative-ai, llms, ai-agents, coding-agents, claude-code

Jan. 26, 2026

Release datasette-visible-internal-db 0.1.1a2 — Datasette plugin for making the internal database visible for debugging

26th Jan 2026, 3:39 am · datasette

ChatGPT Containers can now run bash, pip/npm install packages, and download files

One of my favourite features of ChatGPT is its ability to write and execute code in a container. This feature launched as ChatGPT Code Interpreter nearly three years ago, was half-heartedly rebranded to “Advanced Data Analysis” at some point and is generally really difficult to find detailed documentation about. Case in point: it appears to have had a massive upgrade at some point in the past few months, and I can’t find documentation about the new capabilities anywhere!

[... 3,019 words]

7:19 pm / pypi, sandboxing, npm, ai, openai, generative-ai, chatgpt, llms, ai-assisted-programming, code-interpreter

Tool Minesweeper — Play a classic Minesweeper game directly in your browser with support for three difficulty levels: Easy, Medium, and Hard. Use left-click to reveal cells and right-click to place flags, or toggle between reveal and flag modes on mobile with long-press support. The game features a timer to track your speed, mine counter to monitor remaining flags, and immediate visual feedback when you win or hit a mine.

26th Jan 2026, 9:46 pm

Someone asked on Hacker News if I had any tips for getting coding agents to write decent quality tests. Here's what I said:

I work in Python which helps a lot because there are a TON of good examples of pytest tests floating around in the training data, including things like usage of fixture libraries for mocking external HTTP APIs and snapshot testing and other neat patterns.

Or I can say "use pytest-httpx to mock the endpoints" and Claude knows what I mean.

Keeping an eye on the tests is important. The most common anti-pattern I see is large amounts of duplicated test setup code - which isn't a huge deal, I'm much more more tolerant of duplicated logic in tests than I am in implementation, but it's still worth pushing back on.

"Refactor those tests to use pytest.mark.parametrize" and "extract the common setup into a pytest fixture" work really well there.

Generally though the best way to get good tests out of a coding agent is to make sure it's working in a project with an existing test suite that uses good patterns. Coding agents pick the existing patterns up without needing any extra prompting at all.

I find that once a project has clean basic tests the new tests added by the agents tend to match them in quality. It's similar to how working on large projects with a team of other developers work - keeping the code clean means when people look for examples of how to write a test they'll be pointed in the right direction.

One last tip I use a lot is this:

Clone datasette/datasette-enrichments
from GitHub to /tmp and imitate the
testing patterns it uses

I do this all the time with different existing projects I've written - the quickest way to show an agent how you like something to be done is to have it look at an example.

# 11:55 pm / testing, coding-agents, python, generative-ai, ai, llms, hacker-news, pytest

Jan. 27, 2026

Kimi K2.5: Visual Agentic Intelligence (via) Kimi K2 landed in July as a 1 trillion parameter open weight LLM. It was joined by Kimi K2 Thinking in November which added reasoning capabilities. Now they've made it multi-modal: the K2 models were text-only, but the new 2.5 can handle image inputs as well:

Kimi K2.5 builds on Kimi K2 with continued pretraining over approximately 15T mixed visual and text tokens. Built as a native multimodal model, K2.5 delivers state-of-the-art coding and vision capabilities and a self-directed agent swarm paradigm.

The "self-directed agent swarm paradigm" claim there means improved long-sequence tool calling and training on how to break down tasks for multiple agents to work on at once:

For complex tasks, Kimi K2.5 can self-direct an agent swarm with up to 100 sub-agents, executing parallel workflows across up to 1,500 tool calls. Compared with a single-agent setup, this reduces execution time by up to 4.5x. The agent swarm is automatically created and orchestrated by Kimi K2.5 without any predefined subagents or workflow.

I used the OpenRouter Chat UI to have it "Generate an SVG of a pelican riding a bicycle", and it did quite well:

Cartoon illustration of a white pelican with a large orange beak and yellow throat pouch riding a green bicycle with yellow feet on the pedals, set against a light blue sky with soft bokeh circles and a green grassy hill. The bicycle frame is a little questionable. The pelican is quite good. The feet do not quite align with the pedals, which are floating clear of the frame.

As a more interesting test, I decided to exercise the claims around multi-agent planning with this prompt:

I want to build a Datasette plugin that offers a UI to upload files to an S3 bucket and stores information about them in a SQLite table. Break this down into ten tasks suitable for execution by parallel coding agents.

Here's the full response. It produced ten realistic tasks and reasoned through the dependencies between them. For comparison here's the same prompt against Claude Opus 4.5 and against GPT-5.2 Thinking.

The Hugging Face repository is 595GB. The model uses Kimi's janky "modified MIT" license, which adds the following clause:

Our only modification part is that, if the Software (or any derivative works thereof) is used for any of your commercial products or services that have more than 100 million monthly active users, or more than 20 million US dollars (or equivalent in other currencies) in monthly revenue, you shall prominently display "Kimi K2.5" on the user interface of such product or service.

Given the model's size, I expect one way to run it locally would be with MLX and a pair of $10,000 512GB RAM M3 Ultra Mac Studios. That setup has been demonstrated to work with previous trillion parameter K2 models.

# 3:07 pm / ai, llms, hugging-face, vision-llms, llm-tool-use, ai-agents, pelican-riding-a-bicycle, llm-release, ai-in-china, moonshot, parallel-agents, kimi, janky-licenses

One Human + One Agent = One Browser From Scratch (via) embedding-shapes was so infuriated by the hype around Cursor's FastRender browser project - thousands of parallel agents producing ~1.6 million lines of Rust - that they were inspired to take a go at building a web browser using coding agents themselves.

The result is one-agent-one-browser and it's really impressive. Over three days they drove a single Codex CLI agent to build 20,000 lines of Rust that successfully renders HTML+CSS with no Rust crate dependencies at all - though it does (reasonably) use Windows, macOS and Linux system frameworks for image and text rendering.

I installed the 1MB macOS binary release and ran it against my blog:

chmod 755 ~/Downloads/one-agent-one-browser-macOS-ARM64 
~/Downloads/one-agent-one-browser-macOS-ARM64 https://simonwillison.net/

Here's the result:

My blog rendered in a window. Everything is in the right place, the CSS gradients look good, the feed subscribe SVG icon is rendered correctly but there's a missing PNG image.

It even rendered my SVG feed subscription icon! A PNG image is missing from the page, which looks like an intermittent bug (there's code to render PNGs).

The code is pretty readable too - here's the flexbox implementation.

I had thought that "build a web browser" was the ideal prompt to really stretch the capabilities of coding agents - and that it would take sophisticated multi-agent harnesses (as seen in the Cursor project) and millions of lines of code to achieve.

Turns out one agent driven by a talented engineer, three days and 20,000 lines of Rust is enough to get a very solid basic renderer working!

I'm going to upgrade my prediction for 2029: I think we're going to get a production-grade web browser built by a small team using AI assistance by then.

# 4:58 pm / browsers, predictions, ai, rust, generative-ai, llms, ai-assisted-programming, coding-agents, codex-cli, browser-challenge

Jan. 28, 2026

The Five Levels: from Spicy Autocomplete to the Dark Factory. Dan Shapiro proposes a five level model of AI-assisted programming, inspired by the five (or rather six, it's zero-indexed) levels of driving automation.

Spicy autocomplete, aka original GitHub Copilot or copying and pasting snippets from ChatGPT.
The coding intern, writing unimportant snippets and boilerplate with full human review.
The junior developer, pair programming with the model but still reviewing every line.
The developer. Most code is generated by AI, and you take on the role of full-time code reviewer.
The engineering team. You're more of an engineering manager or product/program/project manager. You collaborate on specs and plans, the agents do the work.
The dark software factory, like a factory run by robots where the lights are out because robots don't need to see.

Dan says about that last category:

At level 5, it's not really a car any more. You're not really running anybody else's software any more. And your software process isn't really a software process any more. It's a black box that turns specs into software.

Why Dark? Maybe you've heard of the Fanuc Dark Factory, the robot factory staffed by robots. It's dark, because it's a place where humans are neither needed nor welcome.

I know a handful of people who are doing this. They're small teams, less than five people. And what they're doing is nearly unbelievable -- and it will likely be our future.

I've talked to one team that's doing the pattern hinted at here. It was fascinating. The key characteristics:

Nobody reviews AI-produced code, ever. They don't even look at it.
The goal of the system is to prove that the system works. A huge amount of the coding agent work goes into testing and tooling and simulating related systems and running demos.
The role of the humans is to design that system - to find new patterns that can help the agents work more effectively and demonstrate that the software they are building is robust and effective.

It was a tiny team and they stuff they had built in just a few months looked very convincing to me. Some of them had 20+ years of experience as software developers working on systems with high reliability requirements, so they were not approaching this from a naive perspective.

I'm hoping they come out of stealth soon because I can't really share more details than this.

Update 7th February 2026: The demo was by StrongDM's AI team, and they've now gone public with details of how they work.

# 9:44 pm / ai, generative-ai, llms, ai-assisted-programming, coding-agents

Adding dynamic features to an aggressively cached website

My blog uses aggressive caching: it sits behind Cloudflare with a 15 minute cache header, which guarantees it can survive even the largest traffic spike to any given page. I’ve recently added a couple of dynamic features that work in spite of that full-page caching. Here’s how those work.

[... 1,145 words]

10:10 pm / caching, django, javascript, localstorage, ai, cloudflare, generative-ai, llms, ai-assisted-programming

Jan. 29, 2026

Release datasette 1.0a24 — An open source multi-tool for exploring and publishing data

29th Jan 2026, 5:04 pm · datasette

Datasette 1.0a24. New Datasette alpha this morning. Key new features:

Datasette's Request object can now handle multipart/form-data file uploads via the new await request.form(files=True) method. I plan to use this for a datasette-files plugin to support attaching files to rows of data.
The recommended development environment for hacking on Datasette itself now uses uv. Crucially, you can clone Datasette and run uv run pytest to run the tests without needing to manually create a virtual environment or install dependencies first, thanks to the dev dependency group pattern.
A new ?_extra=render_cell parameter for both table and row JSON pages to return the results of executing the render_cell() plugin hook. This should unlock new JavaScript UI features in the future.

More details in the release notes. I also invested a bunch of work in eliminating flaky tests that were intermittently failing in CI - I think those are all handled now.

# 5:21 pm / projects, python, datasette, annotated-release-notes, uv

Jan. 30, 2026

We gotta talk about AI as a programming tool for the arts. Chris Ashworth is the creator and CEO of QLab, a macOS software package for “cue-based, multimedia playback” which is designed to automate lighting and audio for live theater productions.

I recently started following him on TikTok where he posts about his business and theater automation in general - Chris founded the Voxel theater in Baltimore which QLab use as a combined performance venue, teaching hub and research lab (here's a profile of the theater), and the resulting videos offer a fascinating glimpse into a world I know virtually nothing about.

This latest TikTok describes his Claude Opus moment, after he used Claude Code to build a custom lighting design application for a very niche project and put together a useful application in just a few days that he would never have been able to spare the time for otherwise.

Chris works full time in the arts and comes at generative AI from a position of rational distrust. It's interesting to see him working through that tension to acknowledge that there are valuable applications here to build tools for the community he serves.

I have been at least gently skeptical about all this stuff for the last two years. Every time I checked in on it, I thought it was garbage, wasn't interested in it, wasn't useful. [...] But as a programmer, if you hear something like, this is changing programming, it's important to go check it out once in a while. So I went and checked it out a few weeks ago. And it's different. It's astonishing. [...]

One thing I learned in this exercise is that it can't make you a fundamentally better programmer than you already are. It can take a person who is a bad programmer and make them faster at making bad programs. And I think it can take a person who is a good programmer and, from what I've tested so far, make them faster at making good programs. [...] You see programmers out there saying, "I'm shipping code I haven't looked at and don't understand." I'm terrified by that. I think that's awful. But if you're capable of understanding the code that it's writing, and directing, designing, editing, deleting, being quality control on it, it's kind of astonishing. [...]

The positive thing I see here, and I think is worth coming to terms with, is this is an application that I would never have had time to write as a professional programmer. Because the audience is three people. [...] There's no way it was worth it to me to spend my energy of 20 years designing and implementing software for artists to build an app for three people that is this level of polish. And it took me a few days. [...]

I know there are a lot of people who really hate this technology, and in some ways I'm among them. But I think we've got to come to terms with this is a career-changing moment. And I really hate that I'm saying that because I didn't believe it for the last two years. [...] It's like having a room full of power tools. I wouldn't want to send an untrained person into a room full of power tools because they might chop off their fingers. But if someone who knows how to use tools has the option to have both hand tools and a power saw and a power drill and a lathe, there's a lot of work they can do with those tools at a lot faster speed.

# 3:51 am / theatre, ai, generative-ai, llms, ai-assisted-programming, tiktok, ai-ethics, coding-agents, claude-code

Release sqlite-ast-conformance 0.1 — A language independent conformance suite for implementations of a SQLite SELECT query parser

30th Jan 2026, 4:45 am

Release tiny-haversine 0.1.0 — A Python package with a C module for haversine() - as a WASM experiment

30th Jan 2026, 5:33 am

Release tiny-haversine 0.1.1 — A Python package with a C module for haversine() - as a WASM experiment

30th Jan 2026, 5:43 am

Release sqlite-ast 0.1a0

I wanted a Python library that could parse SQLite SELECT statements, so I vibe coded this one up based on a specification I reverse-engineered from SQLite's own parser behavior.

There's an interactive playground here for trying it out in the browser (via Pyodide).

30th Jan 2026, 6:12 am · sqlite, vibe-coding, python

Tool SQLite AST — Parse SQLite SELECT queries into abstract syntax trees and view the results in JSON and Python representations. This tool uses the sqlite-ast library running in your browser via Pyodide to analyze SQL syntax and display both the dictionary-based parse tree and a rich pretty-printed representation. It supports complex queries including compound selects, common table expressions, and window functions, and can display partial parse results when encountering syntax errors.

30th Jan 2026, 6:26 am

Moltbook is the most interesting place on the internet right now

The hottest project in AI right now is Clawdbot, renamed to Moltbot, renamed to OpenClaw. It’s an open source implementation of the digital personal assistant pattern, built by Peter Steinberger to integrate with the messaging system of your choice. It’s two months old, has over 114,000 stars on GitHub and is seeing incredible adoption, especially given the friction involved in setting it up.

[... 1,307 words]

4:43 pm / ai, tailscale, prompt-injection, generative-ai, llms, claude, ai-agents, ai-ethics, lethal-trifecta, skills, peter-steinberger, openclaw

Getting agents using Beads requires much less prompting, because Beads now has 4 months of “Desire Paths” design, which I’ve talked about before. Beads has evolved a very complex command-line interface, with 100+ subcommands, each with many sub-subcommands, aliases, alternate syntaxes, and other affordances.

The complicated Beads CLI isn’t for humans; it’s for agents. What I did was make their hallucinations real, over and over, by implementing whatever I saw the agents trying to do with Beads, until nearly every guess by an agent is now correct.

— Steve Yegge, Software Survival 3.0

# 10:31 pm / steve-yegge, coding-agents, generative-ai, ai-agents, ai, llms, hallucinations

Jan. 31, 2026

Singing the gospel of collective efficacy. Lovely piece from Matt Webb about how you can "just do things" to help make your community better for everyone:

Similarly we all love when the swifts visit (beautiful birds), so somebody started a group to get swift nest boxes made and installed collectively, then applied for subsidy funding, then got everyone to chip in such that people who couldn’t afford it could have their boxes paid for, and now suddenly we’re all writing to MPs and following the legislation to include swift nesting sites in new build houses. Etc.

It’s called collective efficacy, the belief that you can make a difference by acting together.

My current favorite "you can just do things" is a bit of a stretch, but apparently you can just build a successful software company for 20 years and then use the proceeds to start a theater in Baltimore (for "research") and give the space away to artists for free.

# 1:22 am / matt-webb, theatre

Originally in 2019, GPT-2 was trained by OpenAI on 32 TPU v3 chips for 168 hours (7 days), with $8/hour/TPUv3 back then, for a total cost of approx. $43K. It achieves 0.256525 CORE score, which is an ensemble metric introduced in the DCLM paper over 22 evaluations like ARC/MMLU/etc.

As of the last few improvements merged into nanochat (many of them originating in modded-nanogpt repo), I can now reach a higher CORE score in 3.04 hours (~$73) on a single 8XH100 node. This is a 600X cost reduction over 7 years, i.e. the cost to train GPT-2 is falling approximately 2.5X every year.

— Andrej Karpathy

# 9:44 pm / andrej-karpathy, gpt-2, generative-ai, ai, llms, openai

«« first « previous page 3 / 3