Simon Willison’s Weblog

Thursday, 22nd January 2026

Most people's mental model of Claude Code is that "it's just a TUI" but it should really be closer to "a small game engine".

For each frame our pipeline constructs a scene graph with React then

-> layouts elements
-> rasterizes them to a 2d screen
-> diffs that against the previous screen
-> finally uses the diff to generate ANSI sequences to draw

We have a ~16ms frame budget so we have roughly ~5ms to go from the React scene graph to ANSI written.

Thariq Shihipar
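
The diff-then-emit step described above can be sketched in a few lines. This is my own minimal illustration, not Claude Code's actual implementation: compare the newly rasterized screen against the previous frame and emit ANSI escape sequences only for the rows that changed, so unchanged rows cost nothing. (The ~16ms budget corresponds to rendering at 60 frames per second.)

```python
def diff_to_ansi(prev: list[str], curr: list[str]) -> str:
    """Return ANSI output that repaints only the rows that differ.

    A toy sketch: real terminal renderers diff at the cell level and
    track colors/attributes, but the shape of the idea is the same.
    """
    out = []
    for row, line in enumerate(curr):
        if row >= len(prev) or prev[row] != line:
            # CSI row;1H moves the cursor (rows are 1-indexed);
            # CSI 2K erases the line before rewriting it.
            out.append(f"\x1b[{row + 1};1H\x1b[2K{line}")
    return "".join(out)

prev = ["hello", "world"]
curr = ["hello", "there"]
print(repr(diff_to_ansi(prev, curr)))  # only row 2 is repainted
```

Writing the diff rather than the whole frame is what makes the per-frame budget feasible: the output to the terminal scales with how much changed, not with screen size.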

# 3:34 pm / react, claude-code

Qwen3-TTS Family is Now Open Sourced: Voice Design, Clone, and Generation (via) I haven't been paying much attention to the state-of-the-art in speech generation models other than noting that they've got really good, so I can't speak for how notable this new release from Qwen is.

From the accompanying paper:

In this report, we present the Qwen3-TTS series, a family of advanced multilingual, controllable, robust, and streaming text-to-speech models. Qwen3-TTS supports state-of-the-art 3-second voice cloning and description-based control, allowing both the creation of entirely novel voices and fine-grained manipulation over the output speech. Trained on over 5 million hours of speech data spanning 10 languages, Qwen3-TTS adopts a dual-track LM architecture for real-time synthesis [...]. Extensive experiments indicate state-of-the-art performance across diverse objective and subjective benchmarks (e.g., TTS multilingual test set, InstructTTSEval, and our long speech test set). To facilitate community research and development, we release both tokenizers and models under the Apache 2.0 license.

To give an idea of size, Qwen/Qwen3-TTS-12Hz-1.7B-Base is 4.54GB on Hugging Face and Qwen/Qwen3-TTS-12Hz-0.6B-Base is 2.52GB.

The Hugging Face demo lets you try out the 0.6B and 1.7B models for free in your browser, including voice cloning:

Screenshot of a Qwen3-TTS voice cloning web interface with three tabs at top: "Voice Design", "Voice Clone (Base)" (selected), and "TTS (CustomVoice)". The page is titled "Clone Voice from Reference Audio" and has two main sections. Left section: "Reference Audio (Upload a voice sample clone)" showing an audio waveform player at 0:00/0:34 with playback controls, upload and microphone icons, followed by "Reference Text (Transcript of the reference audio)" containing three paragraphs: "Simon Willison is the creator of Datasette, an open source tool for exploring and publishing data. He currently works full-time building open source tools for data journalism, built around Datasette and SQLite. Prior to becoming an independent open source developer, Simon was an engineering director at Eventbrite. Simon joined Eventbrite through their acquisition of Lanyrd, a Y Combinator funded company he co-founded in 2010. He is a co-creator of the Django Web Framework, and has been blogging about web development and programming since 2002 at simonwillison.net". Right section: "Target Text (Text to synthesize with cloned voice)" containing text about Qwen3-TTS speech generation capabilities, with "Language" dropdown set to "Auto" and "Model Size" dropdown set to "1.7B", and a purple "Clone & Generate" button at bottom.

I tried this out by recording myself reading my about page and then having Qwen3-TTS generate audio of me reading the Qwen3-TTS announcement post. Here's the result:

It's important that everyone understands that voice cloning is now something that's available to anyone with a GPU and a few GBs of VRAM... or in this case a web browser that can access Hugging Face.

# 5:42 pm / text-to-speech, ai, generative-ai, hugging-face, qwen, ai-in-china
