Simon Willison’s Weblog


11 items tagged “whisper”


The Zen of Python, Unix, and LLMs. Here’s the YouTube recording of my 1.5 hour conversation with Hugo Bowne-Anderson yesterday.

I fed a Whisper transcript to Google Gemini Pro 1.5 and asked it for the themes from our conversation, and it said we talked about “Python’s success and versatility, the rise and potential of LLMs, data sharing and ethics in the age of LLMs, Unix philosophy and its influence on software development and the future of programming and human-computer interaction”.

# 29th February 2024, 9:04 pm / python, speaking, ai, whisper, llms, gemini

GPUs on Fly.io are available to everyone! We’ve been experimenting with GPUs on Fly for a few months for Datasette Cloud. They’re well documented and quite easy to use: any example Python code you find that uses NVIDIA CUDA stuff generally Just Works. Most interestingly of all, Fly GPUs can scale to zero, so while they cost $2.50/hr for an A100 40G (VRAM) and $3.50/hr for an A100 80G, you can configure them to stop running when the machine runs out of things to do.
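To make the scale-to-zero point concrete, here is a rough Python sketch comparing an always-on GPU with one that only runs while it has work. It uses the hourly rates quoted above; the `monthly_cost` helper, the 30-day month, and the assumption that a stopped machine costs nothing (storage and other charges are ignored) are all illustrative, not Fly's actual billing rules.

```python
# Hourly rates quoted in the post (USD).
A100_40G_PER_HOUR = 2.50
A100_80G_PER_HOUR = 3.50


def monthly_cost(busy_hours_per_day: float, rate_per_hour: float, days: int = 30) -> float:
    """Estimate GPU spend, assuming billing is linear in running hours
    and a stopped machine costs nothing (an assumption for this sketch)."""
    return round(busy_hours_per_day * days * rate_per_hour, 2)


# Always on versus scaled to zero except for two busy hours a day.
always_on = monthly_cost(24, A100_40G_PER_HOUR)
scaled = monthly_cost(2, A100_40G_PER_HOUR)
print(f"Always on: ${always_on:.2f}, scale to zero: ${scaled:.2f}")
```

Under these assumptions the gap is the whole argument for scale to zero: the machine is only paid for during the hours it is actually transcribing or running a model.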

We’ve successfully used them to run Whisper and to experiment with running various Llama 2 LLMs as well.

To look forward to: “We are working on getting some lower-cost A10 GPUs in the next few weeks”.

# 14th February 2024, 4:28 am / ai, datasette-cloud, fly, generative-ai, whisper, llms, nvidia

Talking about Open Source LLMs on Oxide and Friends

I recorded an episode of the Oxide and Friends podcast on Monday, talking with Bryan Cantrill and Adam Leventhal about Open Source LLMs.

[... 1,995 words]


LLaMA voice chat, with Whisper and Siri TTS. llama.cpp author Georgi Gerganov has stitched together the LLaMA language model, the Whisper voice to text model (with his whisper.cpp library) and the macOS “say” command to create an entirely offline AI agent that he can talk to with his voice and that can speak replies straight back to him.

# 27th March 2023, 9:06 pm / macosx, text-to-speech, ai, generative-ai, whisper, llama, homebrew-llms, llms

Large language models are having their Stable Diffusion moment

The open release of the Stable Diffusion image generation model back in August 2022 was a key moment. I wrote at the time about how Stable Diffusion is a really big deal.

[... 1,810 words]

Weeknotes: NICAR, and an appearance on KQED Forum

I spent most of this week at NICAR 2023, the data journalism conference hosted this year in Nashville, Tennessee.

[... 1,941 words]

OpenAI: Introducing ChatGPT and Whisper APIs. The ChatGPT API is a new model called “gpt-3.5-turbo” and is priced at 1/10th of the price of text-davinci-003, previously the most powerful GPT-3 model. Whisper (speech to text transcription) is now available via an API as well, priced at 36 cents per hour of audio.
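As a quick worked example of that Whisper price, the sketch below turns 36 cents per hour of audio into a cost estimate for a recording of a given length. The `transcription_cost` helper is hypothetical, and it assumes billing is strictly proportional to duration; how the API actually rounds durations is not covered here.

```python
# Price quoted in the post: 36 cents per hour of audio (USD).
WHISPER_PRICE_PER_HOUR = 0.36


def transcription_cost(duration_seconds: float) -> float:
    """Estimate the Whisper API cost for one recording, assuming
    the charge scales linearly with audio length (an assumption)."""
    hours = duration_seconds / 3600
    return round(hours * WHISPER_PRICE_PER_HOUR, 2)


# A 90 minute recording, for example a long podcast episode.
print(f"${transcription_cost(90 * 60):.2f}")
```

At this rate, transcription cost is small enough that it rarely decides whether to use the hosted API or run the model yourself.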

# 1st March 2023, 7:36 pm / ai, gpt3, openai, generative-ai, chatgpt, whisper, llms

OpenAI’s Whisper is another case study in Colonisation (via) Really interesting perspective on Whisper from the Papa Reo project—a group working to nurture and proliferate the Māori language. “The main questions we ask when we see papers like FLEURS and Whisper are: where did they get their indigenous data from, who gave them access to it, and who gave them the right to create a derived work from that data and then open source the derivation?”

# 8th February 2023, 5:22 pm / openai, generative-ai, whisper


Speech-to-text with Whisper: How I Use It & Why. Sumana Harihareswara’s in-depth review of Whisper, the shockingly effective open source speech-to-text transcription model released by OpenAI a few months ago. Includes an extremely thoughtful section considering the ethics of using this model, with some of the most insightful short-form writing I’ve seen on AI model ethics generally.

# 22nd December 2022, 9:49 pm / ethics, ai, openai, whisper

talk.wasm (via) “Talk with an Artificial Intelligence in your browser”. Absolutely stunning demo which loads the Whisper speech recognition model (75MB) and a GPT-2 model (240MB) and executes them both in your browser via WebAssembly, then uses the Web Speech API to talk back to you. The result is a full speak-with-an-AI interface running entirely client-side. GPT-2 sadly mostly generates gibberish but the fact that this works at all is pretty astonishing.

# 7th December 2022, 10:52 pm / ai, webassembly, gpt3, openai, generative-ai, whisper

A tool to run caption extraction against online videos using Whisper and GitHub Issues/Actions

I released a new project this weekend, built during the Bellingcat Hackathon (I came second!). It’s called Action Transcription and it’s a tool for capturing captions and transcripts from online videos.

[... 1,362 words]