Simon Willison’s Weblog


538 items tagged “ai”

2022

karpathy/minGPT (via) A “minimal PyTorch re-implementation” of the OpenAI GPT training and inference model, by Andrej Karpathy. It’s only a few hundred lines of code and includes extensive comments, plus notebook demos. # 6th September 2022, 2:52 pm

Feeding AI systems on the world’s beauty, ugliness, and cruelty, but expecting it to reflect only the beauty is a fantasy

Ruha Benjamin # 5th September 2022, 9:42 pm

r/MachineLearning: What is the SOTA explanation for why deep learning works? The thing I find fascinating about this Reddit conversation is that it makes it clear that the machine learning research community has very little agreement on WHY the state of the art techniques that are being used today actually work as well as they do. # 5th September 2022, 5:46 pm

Exploring the training data behind Stable Diffusion

Two weeks ago, the Stable Diffusion image generation model was released to the public. I wrote about this last week, in Stable Diffusion is a really big deal—a post which has since become one of the top ten results for “stable diffusion” on Google and shown up in all sorts of different places online.

[... 2897 words]

For these reasons, I don’t think I’ll be using Midjourney or any similar tool to illustrate my newsletter going forward (an exception would be if I were writing about the technology at a later date and wanted to show examples). Even though the job wouldn’t go to a different, deserving, human artist, I think the optics are shitty, and I do worry about having any role in helping to set any kind of precedent in this direction.

Charlie Warzel # 4th September 2022, 9:06 pm

Run Stable Diffusion on your M1 Mac’s GPU. Ben Firshman provides detailed instructions for getting Stable Diffusion running on an M1 Mac. # 1st September 2022, 5:41 pm

Exploring 12 Million of the 2.3 Billion Images Used to Train Stable Diffusion’s Image Generator. Andy Baio and I collaborated on an investigation into the training set used for Stable Diffusion. I built a Datasette instance with 12m image records sourced from the LAION-Aesthetics v2 6+ aesthetic score data used as part of the training process, and built a tool so people could run searches and explore the data. Andy did some extensive analysis of things like the domains scraped for the images and names of celebrities and artists represented in the data. His write-up here explains our project in detail and some of the patterns we’ve uncovered so far. # 31st August 2022, 2:10 am

Stable Diffusion is a really big deal

If you haven’t been paying attention to what’s going on with Stable Diffusion, you really should be.

[... 1443 words]

To make the analogy explicit, in Software 1.0, human-engineered source code (e.g. some .cpp files) is compiled into a binary that does useful work. In Software 2.0 most often the source code comprises 1) the dataset that defines the desirable behavior and 2) the neural net architecture that gives the rough skeleton of the code, but with many details (the weights) to be filled in. The process of training the neural network compiles the dataset into the binary — the final neural network. In most practical applications today, the neural net architectures and the training systems are increasingly standardized into a commodity, so most of the active “software development” takes the form of curating, growing, massaging and cleaning labeled datasets.

Andrej Karpathy # 24th August 2022, 9:28 pm
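Here is a toy illustration of that analogy. It is not taken from Karpathy's essay. The hypothetical dataset below is the "source code", and it specifies y = 2x + 1. A one-neuron linear model `w * x + b` is the architecture. Plain gradient descent on mean squared error is the "compiler" that fills in the weights. The data, learning rate and epoch count are all illustrative assumptions.

```python
# Toy "Software 2.0" compile step: the dataset specifies the behaviour,
# the architecture (w * x + b) is the skeleton, and training fills in the weights.
# Data and hyperparameters are illustrative assumptions.

def train(dataset, epochs=2000, lr=0.05):
    """Fit y = w * x + b to (x, y) pairs with full-batch gradient descent on MSE."""
    w, b = 0.0, 0.0  # the "details to be filled in"
    n = len(dataset)
    for _ in range(epochs):
        # Gradients of mean squared error with respect to w and b
        grad_w = sum(2 * (w * x + b - y) * x for x, y in dataset) / n
        grad_b = sum(2 * (w * x + b - y) for x, y in dataset) / n
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b


# The "source code": examples of the desired behaviour (here, y = 2x + 1).
dataset = [(x, 2 * x + 1) for x in [0.0, 0.5, 1.0, 1.5, 2.0]]

# The "binary": learned weights that reproduce that behaviour.
w, b = train(dataset)
print(f"w={w:.3f}, b={b:.3f}")
```

Nothing in `train()` knows about the rule y = 2x + 1. Change the dataset and the same code "compiles" to different weights.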

The DALL·E 2 Prompt Book (via) This is effectively DALL-E: The Missing Manual: an 81-page PDF book that goes into exhaustive detail about how to get the most out of DALL-E through creative prompt design. # 14th July 2022, 11:26 pm

Using GPT-3 to explain how code works

One of my favourite uses for the GPT-3 AI language model is generating explanations of how code works. It’s shockingly effective at this: its training set clearly includes a vast amount of source code.

[... 1983 words]

First impressions of DALL-E, generating images from text

I made it off the DALL-E waiting list a few days ago and I’ve been having an enormous amount of fun experimenting with it. Here are some notes on what I’ve learned so far (and a bunch of example images too).

[... 2102 words]

How Imagen Actually Works. Imagen is Google’s new text-to-image model, similar to (but possibly even more effective than) DALL-E. This article is the clearest explanation I’ve seen of how Imagen works: it uses Google’s existing T5 text encoder to convert the input sentence into an encoding that captures the semantic meaning of the sentence (including things like items being described as being on top of other items), then uses a trained diffusion model to generate a 64x64 image. That image is passed through two super-res models to increase the resolution to the final 1024x1024 output. # 23rd June 2022, 6:05 pm
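The following shape-only sketch makes that cascade concrete. Every function in it is a hypothetical stand-in, not Google's code. It assumes each super-resolution stage upsamples 4x, taking 64x64 to 256x256 and then to 1024x1024, which is how the Imagen paper describes the two stages.

```python
from dataclasses import dataclass


@dataclass
class Image:
    """Shape-only stand-in for an image tensor, recording what it was conditioned on."""
    width: int
    height: int
    conditioned_on: str


def encode_text(prompt: str) -> str:
    # Hypothetical stand-in for the frozen T5 text encoder; the real encoder
    # returns a sequence of embedding vectors, not a string.
    return f"t5({prompt!r})"


def base_diffusion(text_embedding: str) -> Image:
    # Hypothetical stand-in for the text-conditioned diffusion model
    # that generates the initial 64x64 image.
    return Image(width=64, height=64, conditioned_on=text_embedding)


def super_resolution(image: Image, factor: int = 4) -> Image:
    # Hypothetical stand-in for one super-resolution model (assumed 4x upsampling).
    return Image(image.width * factor, image.height * factor, image.conditioned_on)


def generate(prompt: str) -> Image:
    embedding = encode_text(prompt)    # text -> semantic encoding
    image = base_diffusion(embedding)  # encoding -> 64x64 image
    for _ in range(2):                 # two super-res stages: 64 -> 256 -> 1024
        image = super_resolution(image)
    return image


image = generate("a corgi sitting on top of a red cube")
print(image.width, image.height)
```

Splitting the work this way means the text-to-image step itself only has to operate at 64x64. The later stages are purely about adding resolution.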

How to use the GPT-3 language model

I ran a Twitter poll the other day asking if people had tried GPT-3 and why or why not. The winning option, by quite a long way, was “No, I don’t know how to”. So here’s how to try it out, for free, without needing to write any code.

[... 838 words]

A Datasette tutorial written by GPT-3

I’ve been playing around with OpenAI’s GPT-3 language model playground for a few months now. It’s a fascinating piece of software. You can sign up here—apparently there’s no longer a waiting list.

[... 1244 words]

2021

The art of asking nicely (via) CLIP+VQGAN is a GAN that generates images based on some text input; you can run it on Google Colab notebooks, and there are instructions linked at the bottom of this post. Janelle Shane of AI Weirdness explores tricks for getting the best results out of it for “a herd of sheep grazing on a lush green hillside”: various modifiers like “amazing awesome and epic” produce better images, but the one with the biggest impact, quite upsettingly, is “ultra high definition free desktop wallpaper”. # 2nd July 2021, 3:02 pm

DALL·E: Creating Images from Text (via) “DALL·E is a 12-billion parameter version of GPT-3 trained to generate images from text descriptions, using a dataset of text–image pairs.” The examples in this paper are astonishing: “an illustration of a baby daikon radish in a tutu walking a dog” generates exactly that. # 5th January 2021, 8:31 pm

2020

How GPT3 Works—Visualizations and Animations. Nice essay full of custom animations illustrating how GPT-3 actually works. # 30th July 2020, 12:58 am

Tempering Expectations for GPT-3 and OpenAI’s API. Insightful commentary on GPT-3 (which is producing some ridiculously cool demos at the moment thanks to the invite-only OpenAI API) from Max Woolf. # 18th July 2020, 7:29 pm

When data is messy. I love this story: a neural network trained on images was asked what the most significant pixels in pictures of tench (a kind of fish) were: it returned pictures of fingers on a green background, because most of the tench photos it had seen were fisherfolk showing off their catch. # 7th July 2020, 7:03 pm

2019

I have sometimes wondered how I would fare with a problem where the solution really isn’t in sight. I decided that I should give it a try before I get too old.

I’m going to work on artificial general intelligence (AGI).

I think it is possible, enormously valuable, and that I have a non-negligible chance of making a difference there, so by a Pascal’s Mugging sort of logic, I should be working on it.

John Carmack # 14th November 2019, 1:18 am

2018

Without deep understanding of the basic tools needed to build and train new algorithms, he says, researchers creating AIs resort to hearsay, like medieval alchemists. “People gravitate around cargo-cult practices,” relying on “folklore and magic spells,” adds François Chollet, a computer scientist at Google in Mountain View, California.

Matthew Hutson # 26th December 2018, 4:53 pm

Relational databases are a commodity now, but they power a much larger fraction of the world’s economy than AI ever will. And no company has a “relational database strategy”.

Erik Bernhardsson # 8th October 2018, 12:20 pm

Text to Image (via) Ridiculously entertaining demo by Cris Valenzuela that feeds any text you type to a neural network that then attempts to generate an image for your text. # 18th August 2018, 5:33 pm

Half of the time when companies say they need “AI” what they really need is a SELECT clause with GROUP BY.

Mat Velloso # 1st June 2018, 2:35 pm

The synthetic voice of synthetic intelligence should sound synthetic. Successful spoofing of any kind destroys trust. When trust is gone, what remains becomes vicious fast.

Stewart Brand # 10th May 2018, 4:56 am

Text Embedding Models Contain Bias. Here’s Why That Matters (via) Excellent discussion from the Google AI team of the enormous challenge of building machine learning models without accidentally encoding harmful bias in a way that cannot be easily detected. # 17th April 2018, 8:54 pm

2010

Originally, however, speech recognition was going to lead to artificial intelligence. Computing pioneer Alan Turing suggested in 1950 that we “provide the machine with the best sense organs that money can buy, and then teach it to understand and speak English.” Over half a century later, artificial intelligence has become prerequisite to understanding speech. We have neither the chicken nor the egg.

Robert Fortner # 4th May 2010, 12:35 pm