Simon Willison’s Weblog

Subscribe

Inspired by this conversation on Hacker News I decided to upgrade MacWhisper to try out NVIDIA Parakeet and the new Automatic Speaker Recognition feature.

It appears to work really well! Here's the result against this 39.7MB m4a file from my Gemini 3 Pro write-up this morning:

A screenshot of the MacWhisper transcription application interface displaying a file named "HMB_compressed." The center panel shows a transcript of a City Council meeting. Speaker 2 begins, "Thank you, Mr. Mayor, uh City Council... Victor Hernandez, Spanish interpreter," followed by Spanish instructions: "Buenas noches, les queremos dejar saber a todos ustedes que pueden acceder lo que es el canal de Zoom..." Speaker 1 responds, "Thank you. Appreciate that. Can we please have a roll call?" Speaker 3 then calls out "Councilmember Johnson?" and "Councilmember Nagengast?" to which Speaker 1 answers, "Here." The interface includes metadata on the right indicating the model "Parakeet v3" and a total word count of 26,109.

You can export the transcript with both timestamps and speaker names using the Share -> Segments > .json menu item:

A close-up of the MacWhisper interface showing the export dropdown menu with "Segments" selected. A secondary menu lists various file formats including .txt, .csv, and .pdf, with a red arrow pointing specifically to the ".json" option, set against the background of the meeting transcript.

Here's the resulting JSON.

Monthly briefing

Sponsor me for $10/month and get a curated email digest of the month's most important LLM developments.

Pay me to send you less!

Sponsor & subscribe