Simon Willison’s Weblog

16 items tagged “ai”


First impressions of DALL-E, generating images from text

I made it off the DALL-E waiting list a few days ago and I’ve been having an enormous amount of fun experimenting with it. Here are some notes on what I’ve learned so far (and a bunch of example images too).

[... 2101 words]

How Imagen Actually Works. Imagen is Google’s new text-to-image model, similar to (but possibly even more effective than) DALL-E. This article is the clearest explanation I’ve seen of how Imagen works: it uses Google’s existing T5 text encoder to convert the input sentence into an encoding that captures the semantic meaning of the sentence (including things like items being described as being on top of other items), then uses a trained diffusion model to generate a 64x64 image. That image is passed through two super-res models to increase the resolution to the final 1024x1024 output. # 23rd June 2022, 6:05 pm
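To make the three stages described above concrete, here is a toy sketch of the data flow only. Every function is a hypothetical stand-in: `encode_text` is not T5, `base_diffusion` is not a diffusion model, and `super_resolve` is plain nearest-neighbour upscaling. The 256x256 intermediate size is an assumption (two equal 4x steps from 64x64 to 1024x1024), not something stated in the entry above.

```python
import random


def encode_text(prompt):
    """Hypothetical stand-in for the T5 text encoder: one number per word."""
    return [float(len(token)) for token in prompt.split()]


def base_diffusion(encoding, size=64, seed=0):
    """Hypothetical stand-in for the diffusion model: a size x size grid of values."""
    rng = random.Random(seed)
    bias = sum(encoding) / max(len(encoding), 1)
    return [[rng.random() + bias for _ in range(size)] for _ in range(size)]


def super_resolve(image, factor=4):
    """Hypothetical stand-in for a super-res model: nearest-neighbour upscaling."""
    out = []
    for row in image:
        # Repeat each pixel horizontally, then repeat the widened row vertically.
        wide = [pixel for pixel in row for _ in range(factor)]
        out.extend(list(wide) for _ in range(factor))
    return out


def generate(prompt):
    """Chain the stages: text encoding, 64x64 base image, two upscaling passes."""
    encoding = encode_text(prompt)
    image = base_diffusion(encoding)  # 64x64
    image = super_resolve(image)      # 256x256 (assumed intermediate size)
    image = super_resolve(image)      # 1024x1024
    return image
```

Calling `generate("a corgi on top of a skateboard")` returns a 1024x1024 grid of numbers. The point is the shape of the pipeline, not the quality of the pixels.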

How to play with the GPT-3 language model

I ran a Twitter poll the other day asking if people had tried GPT-3 and why or why not. The winning option, by quite a long way, was “No, I don’t know how to”. So here’s how to try it out, for free, without needing to write any code.

[... 804 words]

A Datasette tutorial written by GPT-3

I’ve been playing around with OpenAI’s GPT-3 language model playground for a few months now. It’s a fascinating piece of software. You can sign up here—apparently there’s no longer a waiting list.

[... 1244 words]


The art of asking nicely (via) CLIP+VQGAN is a GAN that generates images based on some text input. You can run it on Google Colab notebooks, and there are instructions linked at the bottom of this post. Janelle Shane of AI Weirdness explores tricks for getting the best results out of it for “a herd of sheep grazing on a lush green hillside”: various modifiers like “amazing awesome and epic” produce better images, but the one with the biggest impact, quite upsettingly, is “ultra high definition free desktop wallpaper”. # 2nd July 2021, 3:02 pm

DALL·E: Creating Images from Text (via) “DALL·E is a 12-billion parameter version of GPT-3 trained to generate images from text descriptions, using a dataset of text–image pairs.” The examples in this paper are astonishing: “an illustration of a baby daikon radish in a tutu walking a dog” generates exactly that. # 5th January 2021, 8:31 pm


How GPT3 Works—Visualizations and Animations. Nice essay full of custom animations illustrating how GPT-3 actually works. # 30th July 2020, 12:58 am

Tempering Expectations for GPT-3 and OpenAI’s API. Insightful commentary on GPT-3 (which is producing some ridiculously cool demos at the moment thanks to the invite-only OpenAI API) from Max Woolf. # 18th July 2020, 7:29 pm

When data is messy. I love this story: a neural network trained on images was asked what the most significant pixels in pictures of tench (a kind of fish) were: it returned pictures of fingers on a green background, because most of the tench photos it had seen were fisherfolk showing off their catch. # 7th July 2020, 7:03 pm


I have sometimes wondered how I would fare with a problem where the solution really isn’t in sight. I decided that I should give it a try before I get too old. I’m going to work on artificial general intelligence (AGI). I think it is possible, enormously valuable, and that I have a non-negligible chance of making a difference there, so by a Pascal’s Mugging sort of logic, I should be working on it.

John Carmack # 14th November 2019, 1:18 am


Without deep understanding of the basic tools needed to build and train new algorithms, he says, researchers creating AIs resort to hearsay, like medieval alchemists. “People gravitate around cargo-cult practices,” relying on “folklore and magic spells,” adds François Chollet, a computer scientist at Google in Mountain View, California.

Matthew Hutson # 26th December 2018, 4:53 pm

Relational databases are a commodity now, but they power a much larger fraction of the world’s economy than AI ever will. And no company has a “relational database strategy”.

Erik Bernhardsson # 8th October 2018, 12:20 pm

Text to Image (via) Ridiculously entertaining demo by Cris Valenzuela that feeds any text you type to a neural network that then attempts to generate an image for your text. # 18th August 2018, 5:33 pm

Half of the time when companies say they need “AI” what they really need is a SELECT clause with GROUP BY.

Mat Velloso # 1st June 2018, 2:35 pm
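For anyone who hasn’t written one, this is roughly what “a SELECT clause with GROUP BY” looks like in practice. It is a minimal sketch using Python’s built-in `sqlite3` module; the `sales` table, its columns and the `sales_by_region` helper are all made up for illustration.

```python
import sqlite3


def sales_by_region(rows):
    """Count sales and total their amounts per region using GROUP BY."""
    conn = sqlite3.connect(":memory:")  # throwaway in-memory database
    conn.execute("CREATE TABLE sales (region TEXT, amount REAL)")
    conn.executemany("INSERT INTO sales VALUES (?, ?)", rows)
    result = conn.execute(
        "SELECT region, COUNT(*), SUM(amount) "
        "FROM sales GROUP BY region ORDER BY region"
    ).fetchall()
    conn.close()
    return result
```

Passing in a list of `(region, amount)` tuples returns one `(region, count, total)` row per region, which is often all the “insight” a report needs.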

The synthetic voice of synthetic intelligence should sound synthetic. Successful spoofing of any kind destroys trust. When trust is gone, what remains becomes vicious fast.

Stewart Brand # 10th May 2018, 4:56 am


Originally, however, speech recognition was going to lead to artificial intelligence. Computing pioneer Alan Turing suggested in 1950 that we “provide the machine with the best sense organs that money can buy, and then teach it to understand and speak English.” Over half a century later, artificial intelligence has become prerequisite to understanding speech. We have neither the chicken nor the egg.

Robert Fortner # 4th May 2010, 12:35 pm