Example dashboard

Various statistics from my blog.

Owned by simonw, visibility: Public

Entries

3272

SQL query
select 'Entries' as label, count(*) as big_number from blog_entry

Blogmarks

8285

SQL query
select 'Blogmarks' as label, count(*) as big_number from blog_blogmark

Quotations

1338

SQL query
select 'Quotations' as label, count(*) as big_number from blog_quotation

Chart of number of entries per month over time

SQL query
select '<h2>Chart of number of entries per month over time</h2>' as html
SQL query
select to_char(date_trunc('month', created), 'YYYY-MM') as bar_label,
count(*) as bar_quantity from blog_entry group by bar_label order by count(*) desc
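The `bar_label`/`bar_quantity` column aliases are what the dashboard renders as a bar chart. A minimal sketch of running the same query directly against a PostgreSQL copy of the blog database (the connection string is hypothetical, and this orders chronologically rather than by count):

```python
import psycopg2

conn = psycopg2.connect("dbname=blog")  # hypothetical connection string
with conn.cursor() as cur:
    cur.execute("""
        select to_char(date_trunc('month', created), 'YYYY-MM') as bar_label,
               count(*) as bar_quantity
        from blog_entry
        group by bar_label
        order by bar_label
    """)
    for bar_label, bar_quantity in cur.fetchall():
        # crude text rendering of the same chart
        print(f"{bar_label} {'#' * (bar_quantity // 2)} ({bar_quantity})")
```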

Ten most recent blogmarks (of 8285 total)

SQL query
select '## Ten most recent blogmarks (of ' || count(*) || ' total)' as markdown from blog_blogmark
SQL query
select link_title, link_url, commentary, created from blog_blogmark order by created desc limit 10

10 rows

Columns: link_title, link_url, commentary, created
link_title: Gemini 3.1 Pro
link_url: https://blog.google/innovation-and-ai/models-and-research/gemini-models/gemini-3-1-pro/

The first in the Gemini 3.1 series, priced the same as Gemini 3 Pro ($2/million input, $12/million output under 200,000 tokens, $4/$18 for 200,000 to 1,000,000 tokens). That's less than half the price of Claude Opus 4.6, with very similar benchmark scores to that model.

They boast about its improved SVG animation performance compared to Gemini 3 Pro in the announcement! I tried "Generate an SVG of a pelican riding a bicycle" [in Google AI Studio](https://aistudio.google.com/app/prompts?state=%7B%22ids%22:%5B%221ugF9fBfLGxnNoe8_rLlluzo9NSPJDWuF%22%5D,%22action%22:%22open%22,%22userId%22:%22106366615678321494423%22,%22resourceKeys%22:%7B%7D%7D&usp=sharing) and it thought for 323.9 seconds ([thinking trace here](https://gist.github.com/simonw/03a755865021739a3659943a22c125ba#thinking-trace)) before producing this one:

![Whimsical flat-style illustration of a pelican wearing a blue and white baseball cap, riding a red bicycle with yellow-rimmed wheels along a road. The pelican has a large orange bill and a green scarf. A small fish peeks out of a brown basket on the handlebars. The background features a light blue sky with a yellow sun, white clouds, and green hills.](https://static.simonwillison.net/static/2026/gemini-3.1-pro-pelican.png)

It's good to see the legs clearly depicted on both sides of the frame (should [satisfy Elon](https://twitter.com/elonmusk/status/2023833496804839808)), the fish in the basket is a nice touch and I appreciated this comment in [the SVG code](https://gist.github.com/simonw/03a755865021739a3659943a22c125ba#response):

```
<!-- Black Flight Feathers on Wing Tip -->
<path d="M 420 175 C 440 182, 460 187, 470 190 C 450 210, 430 208, 410 198 Z" fill="#374151" />
```

I've [added](https://github.com/simonw/llm-gemini/issues/121) the two new model IDs `gemini-3.1-pro-preview` and `gemini-3.1-pro-preview-customtools` to my [llm-gemini plugin](https://github.com/simonw/llm-gemini) for [LLM](https://llm.datasette.io/). That "custom tools" one is [described here](https://ai.google.dev/gemini-api/docs/models/gemini-3.1-pro-preview#gemini-31-pro-preview-customtools) - apparently it may provide better tool performance than the default model in some situations.

The model appears to be *incredibly* slow right now - it took 104s to respond to a simple "hi", and a few of my other tests hit "Error: This model is currently experiencing high demand. Spikes in demand are usually temporary. Please try again later." or "Error: Deadline expired before operation could complete" errors. I'm assuming that's just teething problems on launch day.

It sounds like last week's [Deep Think release](https://simonwillison.net/2026/Feb/12/gemini-3-deep-think/) was our first exposure to the 3.1 family:

> Last week, we released a major update to Gemini 3 Deep Think to solve modern challenges across science, research and engineering. Today, we’re releasing the upgraded core intelligence that makes those breakthroughs possible: Gemini 3.1 Pro.

**Update**: In [What happens if AI labs train for pelicans riding bicycles?](https://simonwillison.net/2025/nov/13/training-for-pelicans-riding-bicycles/) last November I said:

> If a model finally comes out that produces an excellent SVG of a pelican riding a bicycle you can bet I’m going to test it on all manner of creatures riding all sorts of transportation devices.

Google's Gemini Lead Jeff Dean [tweeted this video](https://x.com/JeffDean/status/2024525132266688757) featuring an animated pelican riding a bicycle, plus a frog on a penny-farthing, a giraffe driving a tiny car, an ostrich on roller skates, a turtle kickflipping a skateboard and a dachshund driving a stretch limousine.

<video style="margin-bottom: 1em; max-width: 100%" poster="https://static.simonwillison.net/static/2026/gemini-animated-pelicans.jpg" muted controls preload="none">
<source src="https://static.simonwillison.net/static/2026/gemini-animated-pelicans.mp4" type="video/mp4">
</video>

I've been saying for a while that I wish AI labs would highlight things that their new models can do that their older models could not, so top marks to the Gemini team for this video.

**Update 2**: I used `llm-gemini` to run my [more detailed pelican prompt](https://simonwillison.net/2025/Nov/18/gemini-3/#and-a-new-pelican-benchmark), with [this result](https://gist.github.com/simonw/a3bdd4ec9476ba9e9ba7aa61b46d8296):

![Flat-style illustration of a brown pelican riding a teal bicycle with dark blue-rimmed wheels against a plain white background. Unlike the previous image's white cartoon pelican, this pelican has realistic brown plumage with detailed feather patterns, a dark maroon head, yellow eye, and a large pink-tinged pouch bill. The bicycle is a simpler design without a basket, and the scene lacks the colorful background elements like the sun, clouds, road, hills, cap, and scarf from the first illustration, giving it a more minimalist feel.](https://static.simonwillison.net/static/2026/gemini-3.1-pro-pelican-2.png)

From the SVG comments:

```
<!-- Pouch Gradient (Breeding Plumage: Red to Olive/Green) -->
...
<!-- Neck Gradient (Breeding Plumage: Chestnut Nape, White/Yellow Front) -->
```

created: 2026-02-19 17:58:37+00:00

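A minimal sketch of trying the new model from LLM's Python API, assuming `llm` and `llm-gemini` are installed and a Gemini API key is configured (the model ID is the one added to the plugin above):

```python
import llm

# gemini-3.1-pro-preview is registered by the llm-gemini plugin
model = llm.get_model("gemini-3.1-pro-preview")
response = model.prompt("Generate an SVG of a pelican riding a bicycle")
print(response.text())
```
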
link_title: SWE-bench February 2026 leaderboard update
link_url: https://www.swebench.com/

SWE-bench is one of the benchmarks that the labs love to list in their model releases. The official leaderboard is infrequently updated, but they just did a full run of it against the current generation of models - notable because it's always good to see benchmark results like this that *weren't* self-reported by the labs.

The fresh results are for their "Bash Only" benchmark, which runs their [mini-swe-bench](https://github.com/SWE-agent/mini-swe-agent) agent (~9,000 lines of Python, [here are the prompts](https://github.com/SWE-agent/mini-swe-agent/blob/v2.2.1/src/minisweagent/config/benchmarks/swebench.yaml) they use) against the [SWE-bench](https://huggingface.co/datasets/princeton-nlp/SWE-bench) dataset of coding problems - 2,294 real-world examples pulled from 12 open source repos: [django/django](https://github.com/django/django) (850), [sympy/sympy](https://github.com/sympy/sympy) (386), [scikit-learn/scikit-learn](https://github.com/scikit-learn/scikit-learn) (229), [sphinx-doc/sphinx](https://github.com/sphinx-doc/sphinx) (187), [matplotlib/matplotlib](https://github.com/matplotlib/matplotlib) (184), [pytest-dev/pytest](https://github.com/pytest-dev/pytest) (119), [pydata/xarray](https://github.com/pydata/xarray) (110), [astropy/astropy](https://github.com/astropy/astropy) (95), [pylint-dev/pylint](https://github.com/pylint-dev/pylint) (57), [psf/requests](https://github.com/psf/requests) (44), [mwaskom/seaborn](https://github.com/mwaskom/seaborn) (22), [pallets/flask](https://github.com/pallets/flask) (11).

**Correction**: *The Bash Only benchmark runs against SWE-bench Verified, not the original SWE-bench. Verified is a manually curated subset of 500 samples [described here](https://openai.com/index/introducing-swe-bench-verified/), funded by OpenAI. Here's [SWE-bench Verified](https://huggingface.co/datasets/princeton-nlp/SWE-bench_Verified) on Hugging Face - since it's just 2.1MB of Parquet it's easy to browse [using Datasette Lite](https://lite.datasette.io/?parquet=https%3A%2F%2Fhuggingface.co%2Fdatasets%2Fprinceton-nlp%2FSWE-bench_Verified%2Fresolve%2Fmain%2Fdata%2Ftest-00000-of-00001.parquet#/data/test-00000-of-00001?_facet=repo), which cuts those numbers down to django/django (231), sympy/sympy (75), sphinx-doc/sphinx (44), matplotlib/matplotlib (34), scikit-learn/scikit-learn (32), astropy/astropy (22), pydata/xarray (22), pytest-dev/pytest (19), pylint-dev/pylint (10), psf/requests (8), mwaskom/seaborn (2), pallets/flask (1).*

Here's how the top ten models performed:

![Bar chart showing "% Resolved" by "Model". Bars in descending order: Claude 4.5 Opus (high reasoning) 76.8%, Gemini 3 Flash (high reasoning) 75.8%, MiniMax M2.5 (high reasoning) 75.8%, Claude Opus 4.6 75.6%, GLM-5 (high reasoning) 72.8%, GPT-5.2 (high reasoning) 72.8%, Claude 4.5 Sonnet (high reasoning) 72.8%, Kimi K2.5 (high reasoning) 71.4%, DeepSeek V3.2 (high reasoning) 70.8%, Claude 4.5 Haiku (high reasoning) 70.0%, and a partially visible final bar at 66.6%.](https://static.simonwillison.net/static/2026/swbench-feb-2026.jpg)

It's interesting to see Claude Opus 4.5 beat Opus 4.6, though only by about a percentage point. 4.5 Opus is top, then Gemini 3 Flash, then MiniMax M2.5 - a 229B model released [last week](https://www.minimax.io/news/minimax-m25) by Chinese lab MiniMax. GLM-5, Kimi K2.5 and DeepSeek V3.2 are three more Chinese models that make the top ten as well.

OpenAI's GPT-5.2 is their highest performing model at position 6, but it's worth noting that their best coding model, GPT-5.3-Codex, is not represented - maybe because it's not yet available in the OpenAI API.

This benchmark uses the same system prompt for every model, which is important for a fair comparison but does mean that the quality of the different harnesses or optimized prompts is not being measured here.

The chart above is a screenshot from the SWE-bench website, but their charts don't normally display the percentage values on the bars. I successfully used Claude for Chrome to add these - [transcript here](https://claude.ai/share/81a0c519-c727-4caa-b0d4-0d866375d0da). My prompt sequence included:

> Use claude in chrome to open https://www.swebench.com/
>
> Click on "Compare results" and then select "Select top 10"
>
> See those bar charts? I want them to display the percentage on each bar so I can take a better screenshot, modify the page like that

I'm impressed at how well this worked - Claude injected custom JavaScript into the page to draw additional labels on top of the existing chart.

![Screenshot of a Claude AI conversation showing browser automation. A thinking step reads "Pivoted strategy to avoid recursion issues with chart labeling >" followed by the message "Good, the chart is back. Now let me carefully add the labels using an inline plugin on the chart instance to avoid the recursion issue." A collapsed "Browser_evaluate" section shows a browser_evaluate tool call with JavaScript code using Chart.js canvas context to draw percentage labels on bars: meta.data.forEach((bar, index) => { const value = dataset.data[index]; if (value !== undefined && value !== null) { ctx.save(); ctx.textAlign = 'center'; ctx.textBaseline = 'bottom'; ctx.fillStyle = '#333'; ctx.font = 'bold 12px sans-serif'; ctx.fillText(value.toFixed(1) + '%', bar.x, bar.y - 5); A pending step reads "Let me take a screenshot to see if it worked." followed by a completed "Done" step, and the message "Let me take a screenshot to check the result."](https://static.simonwillison.net/static/2026/claude-chrome-draw-on-chart.jpg)

**Update**: If you look at the transcript, Claude claims to have switched to Playwright, which is confusing because I didn't think I had that configured.

created: 2026-02-19 04:48:47+00:00

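The SWE-bench Verified Parquet file can also be explored locally rather than via Datasette Lite - a minimal sketch, assuming pandas with pyarrow (and fsspec for the HTTP read) is installed; the URL is the same one Datasette Lite loads above:

```python
import pandas as pd

url = (
    "https://huggingface.co/datasets/princeton-nlp/SWE-bench_Verified/"
    "resolve/main/data/test-00000-of-00001.parquet"
)
df = pd.read_parquet(url)         # 500 rows, ~2.1MB download
print(df["repo"].value_counts())  # per-repo counts: django/django (231) first
```
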
link_title: LadybirdBrowser/ladybird: Abandon Swift adoption
link_url: https://github.com/LadybirdBrowser/ladybird/commit/e87f889e31afbb5fa32c910603c7f5e781c97afd

Back [in August 2024](https://simonwillison.net/2024/Aug/11/ladybird-set-to-adopt-swift/) the Ladybird browser project announced an intention to adopt Swift as their memory-safe language of choice. As of [this commit](https://github.com/LadybirdBrowser/ladybird/commit/e87f889e31afbb5fa32c910603c7f5e781c97afd) it looks like they've changed their mind:

> **Everywhere: Abandon Swift adoption**
>
> After making no progress on this for a very long time, let's acknowledge it's not going anywhere and remove it from the codebase.

created: 2026-02-19 01:25:33+00:00

link_title: The A.I. Disruption We’ve Been Waiting for Has Arrived
link_url: https://www.nytimes.com/2026/02/18/opinion/ai-software.html?unlocked_article_code=1.NFA.UkLv.r-XczfzYRdXJ&smid=url-share

New opinion piece from Paul Ford in the New York Times. Unsurprisingly for a piece by Paul it's packed with quoteworthy snippets, but a few stood out for me in particular.

Paul describes the [November moment](https://simonwillison.net/2026/Jan/4/inflection/) that so many other programmers have observed, and highlights Claude Code's ability to revive old side projects:

> [Claude Code] was always a helpful coding assistant, but in November it suddenly got much better, and ever since I’ve been knocking off side projects that had sat in folders for a decade or longer. It’s fun to see old ideas come to life, so I keep a steady flow. Maybe it adds up to a half-hour a day of my time, and an hour of Claude’s.
>
> November was, for me and many others in tech, a great surprise. Before, A.I. coding tools were often useful, but halting and clumsy. Now, the bot can run for a full hour and make whole, designed websites and apps that may be flawed, but credible. I spent an entire session of therapy talking about it.

And as the former CEO of a respected consultancy firm (Postlight) he's well positioned to evaluate the potential impact:

> When you watch a large language model slice through some horrible, expensive problem — like migrating data from an old platform to a modern one — you feel the earth shifting. I was the chief executive of a software services firm, which made me a professional software cost estimator. When I rebooted my messy personal website a few weeks ago, I realized: I would have paid $25,000 for someone else to do this. When a friend asked me to convert a large, thorny data set, I downloaded it, cleaned it up and made it pretty and easy to explore. In the past I would have charged $350,000.
>
> That last price is full 2021 retail — it implies a product manager, a designer, two engineers (one senior) and four to six months of design, coding and testing. Plus maintenance. Bespoke software is joltingly expensive. Today, though, when the stars align and my prompts work out, I can do hundreds of thousands of dollars worth of work for fun (fun for me) over weekends and evenings, for the price of the Claude $200-a-month plan.

He also neatly captures the inherent community tension involved in exploring this technology:

> All of the people I love hate this stuff, and all the people I hate love it. And yet, likely because of the same personality flaws that drew me to technology in the first place, I am annoyingly excited.

created: 2026-02-18 17:07:31+00:00

link_title: Introducing Claude Sonnet 4.6
link_url: https://www.anthropic.com/news/claude-sonnet-4-6

Sonnet 4.6 is out today, and Anthropic claim it offers similar performance to [November's Opus 4.5](https://simonwillison.net/2025/Nov/24/claude-opus/) while maintaining the Sonnet pricing of $3/million input and $15/million output tokens (the Opus models are $5/$25). Here's [the system card PDF](https://www-cdn.anthropic.com/78073f739564e986ff3e28522761a7a0b4484f84.pdf).

Sonnet 4.6 has a "reliable knowledge cutoff" of August 2025, compared to Opus 4.6's May 2025 and Haiku 4.5's February 2025. Both Opus and Sonnet default to 200,000 max input tokens but can stretch to 1 million in beta and at a higher cost.

I just released [llm-anthropic 0.24](https://github.com/simonw/llm-anthropic/releases/tag/0.24) with support for both Sonnet 4.6 and Opus 4.6. Claude Code [did most of the work](https://github.com/simonw/llm-anthropic/pull/65) - the new models had a fiddly amount of extra detail around adaptive thinking and no longer supporting prefixes, as described [in Anthropic's migration guide](https://platform.claude.com/docs/en/about-claude/models/migration-guide).

Here's [what I got](https://gist.github.com/simonw/b185576a95e9321b441f0a4dfc0e297c) from:

```
uvx --with llm-anthropic llm 'Generate an SVG of a pelican riding a bicycle' -m claude-sonnet-4.6
```

![The pelican has a jaunty top hat with a red band. There is a string between the upper and lower beaks for some reason. The bicycle frame is warped in the wrong way.](https://static.simonwillison.net/static/2026/pelican-sonnet-4.6.png)

The SVG comments include:

```
<!-- Hat (fun accessory) -->
```

I tried a second time and also got a top hat. Sonnet 4.6 apparently loves top hats!

For comparison, here's the pelican Opus 4.5 drew me [in November](https://simonwillison.net/2025/Nov/24/claude-opus/):

![The pelican is cute and looks pretty good. The bicycle is not great - the frame is wrong and the pelican is facing backwards when the handlebars appear to be forwards. There is also something that looks a bit like an egg on the handlebars.](https://static.simonwillison.net/static/2025/claude-opus-4.5-pelican.jpg)

And here's Anthropic's current best pelican, drawn by Opus 4.6 [on February 5th](https://simonwillison.net/2026/Feb/5/two-new-models/):

![Slightly wonky bicycle frame but an excellent pelican, very clear beak and pouch, nice feathers.](https://static.simonwillison.net/static/2026/opus-4.6-pelican.png)

Opus 4.6 produces the best pelican beak/pouch. I do think the top hat from Sonnet 4.6 is a nice touch though.

created: 2026-02-17 23:58:58+00:00

link_title: Rodney v0.4.0
link_url: https://github.com/simonw/rodney/releases/tag/v0.4.0

My [Rodney](https://github.com/simonw/rodney) CLI tool for browser automation attracted quite the flurry of PRs since I announced it [last week](https://simonwillison.net/2026/Feb/10/showboat-and-rodney/#rodney-cli-browser-automation-designed-to-work-with-showboat). Here are the release notes for the just-released v0.4.0:

> - Errors now use exit code 2, which means exit code 1 is just for check failures. [#15](https://github.com/simonw/rodney/pull/15)
> - New `rodney assert` command for running JavaScript tests, exit code 1 if they fail. [#19](https://github.com/simonw/rodney/issues/19)
> - New directory-scoped sessions with `--local`/`--global` flags. [#14](https://github.com/simonw/rodney/pull/14)
> - New `reload --hard` and `clear-cache` commands. [#17](https://github.com/simonw/rodney/pull/17)
> - New `rodney start --show` option to make the browser window visible. Thanks, [Antonio Cuni](https://github.com/antocuni). [#13](https://github.com/simonw/rodney/pull/13)
> - New `rodney connect PORT` command to debug an already-running Chrome instance. Thanks, [Peter Fraenkel](https://github.com/pnf). [#12](https://github.com/simonw/rodney/pull/12)
> - New `RODNEY_HOME` environment variable to support custom state directories. Thanks, [Senko Rašić](https://github.com/senko). [#11](https://github.com/simonw/rodney/pull/11)
> - New `--insecure` flag to ignore certificate errors. Thanks, [Jakub Zgoliński](https://github.com/zgolus). [#10](https://github.com/simonw/rodney/pull/10)
> - Windows support: avoid `Setsid` on Windows via build-tag helpers. Thanks, [adm1neca](https://github.com/adm1neca). [#18](https://github.com/simonw/rodney/pull/18)
> - Tests now run on `windows-latest` and `macos-latest` in addition to Linux.

I've been using [Showboat](https://github.com/simonw/showboat) to create demos of new features - here those are for [rodney assert](https://github.com/simonw/rodney/tree/v0.4.0/notes/assert-command-demo), [rodney reload --hard](https://github.com/simonw/rodney/tree/v0.4.0/notes/clear-cache-demo), [rodney exit codes](https://github.com/simonw/rodney/tree/v0.4.0/notes/error-codes-demo), and [rodney start --local](https://github.com/simonw/rodney/tree/v0.4.0/notes/local-sessions-demo).

The `rodney assert` command is pretty neat: you can now use Rodney to test a web app through multiple steps in a shell script that looks something like this (adapted from [the README](https://github.com/simonw/rodney/blob/v0.4.0/README.md#combining-checks-in-a-shell-script)):

```bash
#!/bin/bash
set -euo pipefail

FAIL=0
check() {
  if ! "$@"; then
    echo "FAIL: $*"
    FAIL=1
  fi
}

rodney start
rodney open "https://example.com"
rodney waitstable

# Assert elements exist
check rodney exists "h1"

# Assert key elements are visible
check rodney visible "h1"
check rodney visible "#main-content"

# Assert JS expressions
check rodney assert 'document.title' 'Example Domain'
check rodney assert 'document.querySelectorAll("p").length' '2'

# Assert accessibility requirements
check rodney ax-find --role navigation

rodney stop

if [ "$FAIL" -ne 0 ]; then
  echo "Some checks failed"
  exit 1
fi
echo "All checks passed"
```

created: 2026-02-17 23:02:33+00:00

link_title: First kākāpō chick in four years hatches on Valentine's Day
link_url: https://www.doc.govt.nz/news/media-releases/2026-media-releases/first-kakapo-chick-in-four-years-hatches-on-valentines-day/

First chick of [the 2026 breeding season](https://simonwillison.net/2026/Jan/8/llm-predictions-for-2026/#1-year-k-k-p-parrots-will-have-an-outstanding-breeding-season)!

> Kākāpō Yasmine hatched an egg fostered from kākāpō Tīwhiri on Valentine's Day, bringing the total number of kākāpō to 237 – though it won’t be officially added to the population until it fledges.

Here's why the egg was fostered:

> "Kākāpō mums typically have the best outcomes when raising a maximum of two chicks. Biological mum Tīwhiri has four fertile eggs this season already, while Yasmine, an experienced foster mum, had no fertile eggs."

And an [update from conservation biologist Andrew Digby](https://bsky.app/profile/digs.bsky.social/post/3mf25glzt2c2b) - a second chick hatched this morning!

> The second #kakapo chick of the #kakapo2026 breeding season hatched this morning: Hine Taumai-A1-2026 on Ako's nest on Te Kākahu. We transferred the egg from Anchor two nights ago. This is Ako's first-ever chick, which is just a few hours old in this video.

That post [has a video](https://bsky.app/profile/digs.bsky.social/post/3mf25glzt2c2b) of mother and chick.

![A beautiful charismatic green kākāpō feeding a little grey chick](https://static.simonwillison.net/static/2026/kakapo-plus-chick.jpg)

created: 2026-02-17 14:09:43+00:00

link_title: Qwen3.5: Towards Native Multimodal Agents
link_url: https://qwen.ai/blog?id=qwen3.5

Alibaba's Qwen just released the first two models in the Qwen 3.5 series - one open weights, one proprietary. Both are multi-modal for vision input.

The open weights one is a Mixture of Experts model called Qwen3.5-397B-A17B. Interesting to see Qwen call out serving efficiency as a benefit of that architecture:

> Built on an innovative hybrid architecture that fuses linear attention (via Gated Delta Networks) with a sparse mixture-of-experts, the model attains remarkable inference efficiency: although it comprises 397 billion total parameters, just 17 billion are activated per forward pass, optimizing both speed and cost without sacrificing capability.

It's [807GB on Hugging Face](https://huggingface.co/Qwen/Qwen3.5-397B-A17B), and Unsloth have a [collection of smaller GGUFs](https://huggingface.co/unsloth/Qwen3.5-397B-A17B-GGUF) ranging in size from 94.2GB (1-bit) to 462GB (Q8_K_XL).

I got this [pelican](https://simonwillison.net/tags/pelican-riding-a-bicycle/) from the [OpenRouter hosted model](https://openrouter.ai/qwen/qwen3.5-397b-a17b) ([transcript](https://gist.github.com/simonw/625546cf6b371f9c0040e64492943b82)):

![Pelican is quite good although the neck lacks an outline for some reason. Bicycle is very basic with an incomplete frame](https://static.simonwillison.net/static/2026/qwen3.5-397b-a17b.png)

The proprietary hosted model is called Qwen3.5 Plus 2026-02-15, and is a little confusing. Qwen researcher [Junyang Lin says](https://twitter.com/JustinLin610/status/2023340126479569140):

> Qwen3-Plus is a hosted API version of 397B. As the model natively supports 256K tokens, Qwen3.5-Plus supports 1M token context length. Additionally it supports search and code interpreter, which you can use on Qwen Chat with Auto mode.

Here's [its pelican](https://gist.github.com/simonw/9507dd47483f78dc1195117735273e20), which is similar in quality to the one from the open weights model:

![Similar quality pelican. The bicycle is taller and has a better frame shape. They are visually quite similar.](https://static.simonwillison.net/static/2026/qwen3.5-plus-02-15.png)

created: 2026-02-17 04:30:57+00:00

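A minimal sketch of reproducing the OpenRouter run via OpenRouter's OpenAI-compatible endpoint, assuming the `openai` Python package is installed and an `OPENROUTER_API_KEY` environment variable is set (the model slug matches the OpenRouter page linked above):

```python
import os
from openai import OpenAI

# OpenRouter exposes an OpenAI-compatible API at this base URL
client = OpenAI(
    base_url="https://openrouter.ai/api/v1",
    api_key=os.environ["OPENROUTER_API_KEY"],
)
response = client.chat.completions.create(
    model="qwen/qwen3.5-397b-a17b",
    messages=[
        {"role": "user", "content": "Generate an SVG of a pelican riding a bicycle"}
    ],
)
print(response.choices[0].message.content)
```
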
link_title: The AI Vampire
link_url: https://steve-yegge.medium.com/the-ai-vampire-eda6e4f07163

Steve Yegge's take on agent fatigue, and its relationship to burnout.

> Let's pretend you're the only person at your company using AI.
>
> In Scenario A, you decide you're going to impress your employer, and work for 8 hours a day at 10x productivity. You knock it out of the park and make everyone else look terrible by comparison.
>
> In that scenario, your employer captures 100% of the value from *you* adopting AI. You get nothing, or at any rate, it ain't gonna be 9x your salary. And everyone hates you now.
>
> And you're *exhausted.* You're tired, Boss. You got nothing for it.
>
> Congrats, you were just drained by a company. I've been drained to the point of burnout several times in my career, even at Google once or twice. But now with AI, it's oh, so much easier.

Steve reports needing more sleep due to the cognitive burden involved in agentic engineering, and notes that four hours of agent work a day is a more realistic pace:

> I’ve argued that AI has turned us all into Jeff Bezos, by automating the easy work, and leaving us with all the difficult decisions, summaries, and problem-solving. I find that I am only really comfortable working at that pace for short bursts of a few hours once or occasionally twice a day, even with lots of practice.

created: 2026-02-15 23:59:36+00:00

link_title: Gwtar: a static efficient single-file HTML format
link_url: https://gwern.net/gwtar

Fascinating new project from Gwern Branwen and Said Achmiz that targets the challenge of combining large numbers of assets into a single archived HTML file without that file being inconvenient to view in a browser.

The key trick it uses is to fire [window.stop()](https://developer.mozilla.org/en-US/docs/Web/API/Window/stop) early in the page to prevent the browser from downloading the whole thing, then following that call with inline uncompressed tar content. It can then make HTTP range requests to fetch content from that tar data on-demand when it is needed by the page.

The JavaScript that has already loaded rewrites asset URLs to point to `https://localhost/` purely so that they will fail to load. Then it uses a [PerformanceObserver](https://developer.mozilla.org/en-US/docs/Web/API/PerformanceObserver) to catch those attempted loads:

```javascript
let perfObserver = new PerformanceObserver((entryList, observer) => {
    resourceURLStringsHandler(entryList.getEntries().map(entry => entry.name));
});
perfObserver.observe({ entryTypes: [ "resource" ] });
```

That `resourceURLStringsHandler` callback finds the resource if it is already loaded, or fetches it with an HTTP range request otherwise, and then inserts the resource in the right place using a `blob:` URL.

Here's what the `window.stop()` portion of the document looks like if you view the source:

![Screenshot of a macOS terminal window titled "gw — more big.html — 123×46" showing the source code of a gwtar (self-extracting HTML archive) file. The visible code includes JavaScript with `requestIdleCallback(getMainPageHTML);`, a ` noscript ` block with warnings: a "js-disabled-warning" stating "This HTML page requires JavaScript to be enabled to render, as it is a self-extracting gwtar HTML file," a description of gwtar as "a portable self-contained standalone HTML file which is designed to nevertheless support efficient lazy loading of all assets such as large media files," with a link to https://gwern.net/gwtar, a "local-file-warning" with a shell command `perl -ne'print $_ if $x; $x=1 if /<!-- GWTAR END/' &lt; foo.gwtar.html | tar --extract`, and a "server-fail-warning" about misconfigured servers. Below the HTML closing tags and `<!-- GWTAR END` comment is binary tar archive data with the filename `2010-02-brianmoriarty-thesecretofpsalm46.html`, showing null-padded tar header fields including `ustar^@00root` and octal size/permission values. At the bottom, a SingleFile metadata comment shows `url: https://web.archive.org/web/20230512001411/http://ludix.com/moriarty/psalm46.html` and `saved date: Sat Jan 17 2026 19:26:49 GMT-0800 (Pacific Standard Time)`.](https://static.simonwillison.net/static/2026/gwtar.jpg)

Amusingly for an archive format, it doesn't actually work if you open the file directly on your own computer. Here's what you see if you try to do that:

> You are seeing this message, instead of the page you should be seeing, because `gwtar` files **cannot be opened locally** (due to web browser security restrictions).
>
> To open this page on your computer, use the following shell command:
>
> `perl -ne'print $_ if $x; $x=1 if /<!-- GWTAR END/' < foo.gwtar.html | tar --extract`
>
> Then open the file `foo.html` in any web browser.

created: 2026-02-15 18:26:08+00:00

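The lazy-loading mechanism boils down to HTTP range requests against the tar data at the tail of the archive. A minimal sketch of that idea in Python (not part of gwtar itself; the URL and byte offsets are hypothetical placeholders):

```python
import urllib.request

# Ask the server for just the bytes covering one member of the tar data;
# a server that supports ranges answers 206 Partial Content.
req = urllib.request.Request(
    "https://example.com/foo.gwtar.html",        # hypothetical archive URL
    headers={"Range": "bytes=1048576-1050623"},  # hypothetical member offsets
)
with urllib.request.urlopen(req) as response:
    assert response.status == 206  # partial content, not the whole file
    member_bytes = response.read()
```
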