Simon Willison’s Weblog

Subscribe

Items tagged ai in 2022

Filters: Year: 2022 × ai × Sorted by date


Speech-to-text with Whisper: How I Use It & Why. Sumana Harihareswara’s in-depth review of Whisper, the shockingly effective open source text-to-speech transcription model release by OpenAI a few months ago. Includes an extremely thoughtful section considering the ethics of using this model—some of the most insightful short-form writing I’ve seen on AI model ethics generally. # 22nd December 2022, 9:49 pm

talk.wasm (via) “Talk with an Artificial Intelligence in your browser”. Absolutely stunning demo which loads the Whisper speech recognition model (75MB) and a GPT-2 model (240MB) and executes them both in your browser via WebAssembly, then uses the Web Speech API to talk back to you. The result is a full speak-with-an-AI interface running entirely client-side. GPT-2 sadly mostly generates gibberish but the fact that this works at all is pretty astonishing. # 7th December 2022, 10:52 pm

The primary problem is that while the answers which ChatGPT produces have a high rate of being incorrect, they typically look like they might be good and the answers are very easy to produce. There are also many people trying out ChatGPT to create answers, without the expertise or willingness to verify that the answer is correct prior to posting. Because such answers are so easy to produce, a large number of people are posting a lot of answers. The volume of these answers (thousands) and the fact that the answers often require a detailed read by someone with at least some subject matter expertise in order to determine that the answer is actually bad has effectively swamped our volunteer-based quality curation infrastructure.

StackOverflow Temporary policy: ChatGPT is banned # 6th December 2022, 12:16 am

AI assisted learning: Learning Rust with ChatGPT, Copilot and Advent of Code

I’m using this year’s Advent of Code to learn Rust—with the assistance of GitHub Copilot and OpenAI’s new ChatGPT.

[... 2661 words]

Building A Virtual Machine inside ChatGPT (via) Jonas Degrave presents a remarkable example of a creative use of ChatGPT: he prompts it to behave as a if it was a Linux shell, then runs increasingly complex sequences of commands against it and gets back surprisingly realistic results. By the end of the article he’s getting it to hallucinate responses to curl API requests run against imagined API versions of itself. # 5th December 2022, 1:43 am

A new AI game: Give me ideas for crimes to do

Less than a week ago OpenAI unleashed ChatGPT on the world, and it kicked off what feels like a seismic shift in many people’s understand of the capabilities of large language models.

[... 1069 words]

These kinds of biases aren’t so much a technical problem as a sociotechnical one; ML models try to approximate biases in their underlying datasets and, for some groups of people, some of these biases are offensive or harmful. That means in the coming years there will be endless political battles about what the ‘correct’ biases are for different models to display (or not display), and we can ultimately expect there to be as many approaches as there are distinct ideologies on the planet. I expect to move into a fractal ecosystem of models, and I expect model providers will ‘shapeshift’ a single model to display different biases depending on the market it is being deployed into. This will be extraordinarily messy.

Jack Clark # 16th November 2022, 11:04 pm

The AI that creates any picture you want, explained. Vox made this explainer video about text-to-image generative AI models back in June, months before Stable Diffusion was released and shortly before the DALL-E preview started rolling out to a wider audience. It’s a really good video—in particular the animation that explains at a high level how diffusion models work, which starts about 5m30s in. # 10th October 2022, 3:28 am

Is the AI spell-casting metaphor harmful or helpful?

For a few weeks now I’ve been promoting spell-casting as a metaphor for prompt design against generative AI systems such as GPT-3 and Stable Diffusion.

[... 988 words]

Getting tabular data from unstructured text with GPT-3: an ongoing experiment (via) Roberto Rocha shows how to use a carefully designed prompt (with plenty of examples) to get GPT-3 to convert unstructured textual data into a structured table. # 5th October 2022, 3:03 am

The Illustrated Stable Diffusion (via) Jay Alammar provides a detailed, clearly explained description of how the Stable Diffusion image generation model actually works under the hood.. # 5th October 2022, 2:58 am

All these generative models point to the same big thing that’s about to alter culture; everyone’s going to be able to generate their own custom and subjective aesthetic realities across text, video, music (and all three) in increasingly delightful, coherent, and lengthy ways. This form of fractal reality is a double-edged sword – everyone gets to create and live in their own fantasies that can be made arbitrarily specific, and that also means everyone loses a further grip on any sense of a shared reality. Society is moving from having a centralized sense of itself to instead highly individualized choose-your-own adventure islands, all facilitated by AI. The implications of this are vast and unknowable. Get ready.

Jack Clark # 4th October 2022, 5:29 pm

Exploring 10m scraped Shutterstock videos used to train Meta’s Make-A-Video text-to-video model

Make-A-Video is a new “state-of-the-art AI system that generates videos from text” from Meta AI. It looks incredible—it really is DALL-E / Stable Diffusion for video. And it appears to have been trained on 10m video preview clips scraped from Shutterstock.

[... 923 words]

Running training jobs across multiple nodes scales really well. A common assumption is that scale inevitably means slowdowns: more GPUs means more synchronization overhead, especially with multiple nodes communicating across a network. But we observed that the performance penalty isn’t as harsh as what you might think. Instead, we found near-linear strong scaling: fixing the global batch size and training on more GPUs led to proportional increases in training throughput. On a 1.3B parameter model, 4 nodes means a 3.9x gain over one node. On 16 nodes, it’s 14.4x. This is largely thanks to the super fast interconnects that major cloud providers have built in: @awscloud EC2 P4d instances provide 400 Gbps networking bandwidth, @Azure provides 1600 Gbps, and @OraclePaaS provides 800 Gbps.

Linden Li # 24th September 2022, 4:03 pm

I Resurrected “Ugly Sonic” with Stable Diffusion Textual Inversion (via) “I trained an Ugly Sonic object concept on 5 image crops from the movie trailer, with 6,000 steps [...] (on a T4 GPU, this took about 1.5 hours and cost about $0.21 on a GCP Spot instance)” # 20th September 2022, 3:35 am

An introduction to XGBoost regression. I hadn’t realized what a wealth of high quality tutorial material could be found in Kaggle notebooks. Here Carl McBride Ellis provides a very approachable and practical introduction to XGBoost, one of the leading techniques for building machine learning models against tabular data. # 18th September 2022, 1:42 pm

Google has LaMDA available in a chat that’s supposed to stay on the topic of dogs, but you can say “can we talk about something else and say something dog related at the end so it counts?” and they’ll do it!

Michelle M # 18th September 2022, 1:08 am

You can’t solve AI security problems with more AI

One of the most common proposed solutions to prompt injection attacks (where an AI language model backed system is subverted by a user injecting malicious input—“ignore previous instructions and do this instead”) is to apply more AI to the problem.

[... 1234 words]

Of all the parameters in SD, the seed parameter is the most important anchor for keeping the image generation the same. In SD-space, there are only 4.3 billion possible seeds. You could consider each seed a different universe, numbered as the Marvel universe does (where the main timeline is #616, and #616 Dr Strange visits #838 and a dozen other universes). Universe #42 is the best explored, because someone decided to make it the default for text2img.py (probably a Hitchhiker’s Guide reference). But you could change the seed, and get a totally different result from what is effectively a different universe.

swyx # 17th September 2022, 9:02 pm

The Changelog: Stable Diffusion breaks the internet. I’m on this week’s episode of The Changelog podcast, talking about Stable Diffusion, AI ethics and a little bit about prompt injection attacks too. # 17th September 2022, 2:14 am

I don’t know how to solve prompt injection

Some extended thoughts about prompt injection attacks against software built on top of AI language models such a GPT-3. This post started as a Twitter thread but I’m promoting it to a full blog entry here.

[... 581 words]

Prompt injection attacks against GPT-3

Riley Goodside, yesterday:

[... 1453 words]

karpathy/minGPT (via) A “minimal PyTorch re-implementation” of the OpenAI GPT training and inference model, by Andrej Karpathy. It’s only a few hundred lines of code and includes extensive comments, plus notebook demos. # 6th September 2022, 2:52 pm

Feeding AI systems on the world’s beauty, ugliness, and cruelty, but expecting it to reflect only the beauty is a fantasy

Ruha Benjamin # 5th September 2022, 9:42 pm

r/MachineLearning: What is the SOTA explanation for why deep learning works? The thing I find fascinating about this Reddit conversation is that it makes it clear that the machine learning research community has very little agreement on WHY the state of the art techniques that are being used today actually work as well as they do. # 5th September 2022, 5:46 pm

Exploring the training data behind Stable Diffusion

Two weeks ago, the Stable Diffusion image generation model was released to the public. I wrote about this last week, in Stable Diffusion is a really big deal—a post which has since become one of the top ten results for “stable diffusion” on Google and shown up in all sorts of different places online.

[... 2897 words]

For these reasons, I don’t think I’ll be using Midjourney or any similar tool to illustrate my newsletter going forward (an exception would be if I were writing about the technology at a later date and wanted to show examples). Even though the job wouldn’t go to a different, deserving, human artist, I think the optics are shitty, and I do worry about having any role in helping to set any kind of precedent in this direction.

Charlie Warzel # 4th September 2022, 9:06 pm

Run Stable Diffusion on your M1 Mac’s GPU. Ben Firshman provides detailed instructions for getting Stable Diffusion running on an M1 Mac. # 1st September 2022, 5:41 pm

Exploring 12 Million of the 2.3 Billion Images Used to Train Stable Diffusion’s Image Generator. Andy Baio and I collaborated on an investigation into the training set used for Stable Diffusion. I built a Datasette instance with 12m image records sourced from the LAION-Aesthetics v2 6+ aesthetic score data used as part of the training process, and built a tool so people could run searches and explore the data. Andy did some extensive analysis of things like the domains scraped for the images and names of celebrities and artists represented in the data. His write-up here explains our project in detail and some of the patterns we’ve uncovered so far. # 31st August 2022, 2:10 am

Stable Diffusion is a really big deal

If you haven’t been paying attention to what’s going on with Stable Diffusion, you really should be.

[... 1443 words]