Simon Willison’s Weblog


Wednesday, 4th February 2026

Distributing Go binaries like sqlite-scanner through PyPI using go-to-wheel


I’ve been exploring Go for building small, fast and self-contained binary applications recently. I’m enjoying how there’s generally one obvious way to do things and the resulting code is boring and readable—and something that LLMs are very competent at writing. The one catch is distribution, but it turns out publishing Go binaries to PyPI means any Go binary can be just a uvx package-name call away.

[... 1,312 words]

Voxtral transcribes at the speed of sound

Mistral just released Voxtral Transcribe 2 - a family of two new models, one open weights, for transcribing audio to text. This is the latest in their Whisper-like model family, and a sequel to the original Voxtral which they released in July 2025.

Voxtral Realtime - official name Voxtral-Mini-4B-Realtime-2602 - is the open weights (Apache-2.0) model, available as an 8.87GB download from Hugging Face.

You can try it out in this live demo - don't be put off by the "No microphone found" message: clicking "Record" should have your browser request permission and then start the demo working. I was very impressed by the demo - I talked quickly and used jargon like Django and WebAssembly and it correctly transcribed my speech within moments of me uttering each sound.

The closed weight model is called voxtral-mini-latest and can be accessed via the Mistral API, using calls that look something like this:

curl -X POST "https://api.mistral.ai/v1/audio/transcriptions" \
  -H "Authorization: Bearer $MISTRAL_API_KEY" \
  -F model="voxtral-mini-latest" \
  -F file=@"Pelican talk at the library.m4a" \
  -F diarize=true \
  -F context_bias="Datasette" \
  -F timestamp_granularities="segment"
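
Here's a rough Python sketch of the same request, in case curl isn't convenient. The endpoint, model name and form fields are copied from the curl call above; the function names are my own, the third-party `requests` library is an assumption, and I'm assuming the response body is JSON, so treat this as a starting point rather than the official client.

```python
import os

# Endpoint taken from the curl example above.
API_URL = "https://api.mistral.ai/v1/audio/transcriptions"


def build_form_fields(context_bias="Datasette"):
    """Return the non-file form fields used in the curl example.

    curl's -F sends every value as a string, so diarize is "true" here.
    """
    return {
        "model": "voxtral-mini-latest",
        "diarize": "true",
        "context_bias": context_bias,
        "timestamp_granularities": "segment",
    }


def transcribe(path, api_key=None):
    """Upload an audio file as multipart form data (hypothetical helper).

    Assumes the API returns a JSON body; that shape is not confirmed here.
    """
    import requests  # third-party, imported lazily: pip install requests

    api_key = api_key or os.environ["MISTRAL_API_KEY"]
    with open(path, "rb") as audio:
        response = requests.post(
            API_URL,
            headers={"Authorization": f"Bearer {api_key}"},
            data=build_form_fields(),
            files={"file": audio},
            timeout=600,
        )
    response.raise_for_status()
    return response.json()
```

Calling `transcribe("Pelican talk at the library.m4a")` with `MISTRAL_API_KEY` set should send the same fields as the curl version.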

It's priced at $0.003/minute, which is $0.18/hour.
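
That arithmetic is easy to check for any audio length. This is a small sketch using the $0.003/minute figure; it assumes billing is pro-rata by the second, which I haven't confirmed, and the function name is my own.

```python
from decimal import Decimal

# Price quoted above, in US dollars per minute of audio.
PRICE_PER_MINUTE = Decimal("0.003")


def transcription_cost(seconds):
    """Estimated cost in dollars for the given audio duration.

    Assumes pro-rata pricing with no per-minute rounding (an assumption).
    """
    minutes = Decimal(str(seconds)) / Decimal(60)
    return PRICE_PER_MINUTE * minutes


# One hour of audio at the quoted rate.
print(transcription_cost(3600))
```

Decimal avoids the floating point noise you'd otherwise get from multiplying 0.003 by 60.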

The Mistral API console now has a speech-to-text playground for exercising the new model and it is excellent. You can upload an audio file and promptly get a diarized transcript in a pleasant interface, with options to download the result in text, SRT or JSON format.

Screenshot of a speech-to-text transcription interface for a file named "Pelican talk at the library.m4a". The toolbar shows "Speech to text" with Code, Transcribe, and Download buttons. The transcript shows timestamped segments from 5:53 to 6:53 with a speaker icon, reading: "5:53 – 6:01 So pelicans love to, they're very good at getting the most they can out of the topography when they're flying. 6:01 – 6:06 And our winds come in from the northwest and they hit those bluffs and they're deflected up. 6:07 – 6:18 And they will sit right, they'll fly north into a wind like five feet off those bluffs, but just five or ten feet off the surface because the winds dissipate. 6:19 – 6:22 And they will surf that bluff all the way north. 6:23 – 6:30 So you'll see a wind from the north at 15 miles an hour, and the pelicans are flying north into that wind and not flapping their wings. 6:31 – 6:33 And it's one of the coolest things. 6:33 – 6:35 You can only find it on San Francisco Coast. 6:36 – 6:39 Where right where the bluffs are steep. 6:41 – 6:43 Pacifica, you can find them there. 6:43 – 6:51 They like their, what we call pier bums, which are typically pelicans that have, are in some sort of trouble. 6:51 – 6:53 They're unable to catch food." The segment at 6:41–6:43 is highlighted in yellow. An audio waveform is shown at the bottom with a playhead near 6:40. Stats in the lower right show 53.90s, 7946.00s, and #45833.

# 10:42 pm / ai, generative-ai, llms, hugging-face, mistral, speech-to-text
