Simon Willison’s Weblog

50 items tagged “machinelearning”


Is the AI spell-casting metaphor harmful or helpful?

For a few weeks now I’ve been promoting spell-casting as a metaphor for prompt design against generative AI systems such as GPT-3 and Stable Diffusion.

[... 988 words]

konstantint/SKompiler (via) A tool for compiling trained SKLearn models into other representations —including SQL queries and Excel formulas. I’ve been pondering the most light-weight way to package a simple machine learning model as part of a larger application without needing to bundle heavy dependencies, this set of techniques looks ideal! # 2nd October 2022, 11:56 pm

Exploring 10m scraped Shutterstock videos used to train Meta’s Make-A-Video text-to-video model

Make-A-Video is a new “state-of-the-art AI system that generates videos from text” from Meta AI. It looks incredible—it really is DALL-E / Stable Diffusion for video. And it appears to have been trained on 10m video preview clips scraped from Shutterstock.

[... 923 words]

Running training jobs across multiple nodes scales really well. A common assumption is that scale inevitably means slowdowns: more GPUs means more synchronization overhead, especially with multiple nodes communicating across a network. But we observed that the performance penalty isn’t as harsh as what you might think. Instead, we found near-linear strong scaling: fixing the global batch size and training on more GPUs led to proportional increases in training throughput. On a 1.3B parameter model, 4 nodes means a 3.9x gain over one node. On 16 nodes, it’s 14.4x. This is largely thanks to the super fast interconnects that major cloud providers have built in: @awscloud EC2 P4d instances provide 400 Gbps networking bandwidth, @Azure provides 1600 Gbps, and @OraclePaaS provides 800 Gbps.

Linden Li # 24th September 2022, 4:03 pm

I Resurrected “Ugly Sonic” with Stable Diffusion Textual Inversion (via) “I trained an Ugly Sonic object concept on 5 image crops from the movie trailer, with 6,000 steps [...] (on a T4 GPU, this took about 1.5 hours and cost about $0.21 on a GCP Spot instance)” # 20th September 2022, 3:35 am

An introduction to XGBoost regression. I hadn’t realized what a wealth of high quality tutorial material could be found in Kaggle notebooks. Here Carl McBride Ellis provides a very approachable and practical introduction to XGBoost, one of the leading techniques for building machine learning models against tabular data. # 18th September 2022, 1:42 pm

In a previous iteration of the machine learning paradigm, researchers were obsessed with cleaning their datasets and ensuring that every data point seen by their models is pristine, gold-standard, and does not disturb the fragile learning process of billions of parameters finding their home in model space. Many began to realize that data scale trumps most other priorities in the deep learning world; utilizing general methods that allow models to scale in tandem with the complexity of the data is a superior approach. Now, in the era of LLMs, researchers tend to dump whole mountains of barely filtered, mostly unedited scrapes of the internet into the eager maw of a hungry model.

roon # 12th September 2022, 4:57 pm

karpathy/minGPT (via) A “minimal PyTorch re-implementation” of the OpenAI GPT training and inference model, by Andrej Karpathy. It’s only a few hundred lines of code and includes extensive comments, plus notebook demos. # 6th September 2022, 2:52 pm

r/MachineLearning: What is the SOTA explanation for why deep learning works? The thing I find fascinating about this Reddit conversation is that it makes it clear that the machine learning research community has very little agreement on WHY the state of the art techniques that are being used today actually work as well as they do. # 5th September 2022, 5:46 pm

Run Stable Diffusion on your M1 Mac’s GPU. Ben Firshman provides detailed instructions for getting Stable Diffusion running on an M1 Mac. # 1st September 2022, 5:41 pm

Exploring 12 Million of the 2.3 Billion Images Used to Train Stable Diffusion’s Image Generator. Andy Baio and I collaborated on an investigation into the training set used for Stable Diffusion. I built a Datasette instance with 12m image records sourced from the LAION-Aesthetics v2 6+ aesthetic score data used as part of the training process, and built a tool so people could run searches and explore the data. Andy did some extensive analysis of things like the domains scraped for the images and names of celebrities and artists represented in the data. His write-up here explains our project in detail and some of the patterns we’ve uncovered so far. # 31st August 2022, 2:10 am

Stable Diffusion is a really big deal

If you haven’t been paying attention to what’s going on with Stable Diffusion, you really should be.

[... 1443 words]

To make the analogy explicit, in Software 1.0, human-engineered source code (e.g. some .cpp files) is compiled into a binary that does useful work. In Software 2.0 most often the source code comprises 1) the dataset that defines the desirable behavior and 2) the neural net architecture that gives the rough skeleton of the code, but with many details (the weights) to be filled in. The process of training the neural network compiles the dataset into the binary — the final neural network. In most practical applications today, the neural net architectures and the training systems are increasingly standardized into a commodity, so most of the active “software development” takes the form of curating, growing, massaging and cleaning labeled datasets.

Andrej Karpathy # 24th August 2022, 9:28 pm

Stable Diffusion Public Release (via) New AI just dropped. Stable Diffusion is similar to DALL-E, but completely open source and with a CC0 license applied to everything it generates. I have a Twitter thread (the via) link of comparisons I’ve made between its output and my previous DALL-E experiments. The announcement buries the lede somewhat: to try it out, visit—which you can use for free at the moment, but it’s unclear to me how billing is supposed to work. # 22nd August 2022, 7:12 pm

storysniffer (via) Ben Welsh built a small Python library that guesses if a URL points to an article on a news website, or if it’s more likely to be a category page or /about page or similar. I really like this as an example of what you can do with a tiny machine learning model: the model is bundled as a ~3MB pickle file as part of the package, and the repository includes the Jupyter notebook that was used to train it. # 1st August 2022, 11:40 pm

First impressions of DALL-E, generating images from text

I made it off the DALL-E waiting list a few days ago and I’ve been having an enormous amount of fun experimenting with it. Here are some notes on what I’ve learned so far (and a bunch of example images too).

[... 2102 words]

How Imagen Actually Works. Imagen is Google’s new text-to-image model, similar to (but possibly even more effective than) DALL-E. This article is the clearest explanation I’ve seen of how Imagen works: it uses Google’s existing T5 text encoder to convert the input sentence into an encoding that captures the semantic meaning of the sentence (including things like items being described as being on top of other items), then uses a trained diffusion model to generate a 64x64 image. That image is passed through two super-res models to increase the resolution to the final 1024x1024 output. # 23rd June 2022, 6:05 pm

How to use the GPT-3 language model

I ran a Twitter poll the other day asking if people had tried GPT-3 and why or why not. The winning option, by quite a long way, was “No, I don’t know how to”. So here’s how to try it out, for free, without needing to write any code.

[... 838 words]


The art of asking nicely (via) CLIP+VQGAN Is a GAN that generates images based on some text input—you can run it on Google Collab notebooks, there are instructions linked at the bottom of this post. Janelle Shane of AI Weirdness explores tricks for getting the best results out of it for “a herd of sheep grazing on a lush green hillside”—various modifiers like “amazing awesome and epic” produce better images, but the one with the biggest impact, quite upsettingly, is “ultra high definition free desktop wallpaper”. # 2nd July 2021, 3:02 pm

DALL·E: Creating Images from Text (via) “DALL·E is a 12-billion parameter version of GPT-3 trained to generate images from text descriptions, using a dataset of text–image pairs.”. The examples in this paper are astonishing—“an illustration of a baby daikon radish in a tutu walking a dog” generates exactly that. # 5th January 2021, 8:31 pm


How GPT3 Works—Visualizations and Animations. Nice essay full of custom animations illustrating how GPT-3 actually works. # 30th July 2020, 12:58 am

When I was curating my generated tweets, I estimated 30-40% of the tweets were usable comedically, a massive improvement over the 5-10% usability from my GPT-2 tweet generation. However, a 30-40% success rate implies a 60-70% failure rate, which is patently unsuitable for a production application.

Max Woolf # 18th July 2020, 7:33 pm

Tempering Expectations for GPT-3 and OpenAI’s API. Insightful commentary on GPT-3 (which is producing some ridiculously cool demos at the moment thanks to the invite-only OpenAI API) from Max Woolf. # 18th July 2020, 7:29 pm

When data is messy. I love this story: a neural network trained on images was asked what the most significant pixels in pictures of tench (a kind of fish) were: it returned pictures of fingers on a green background, because most of the tench photos it had seen were fisherfolk showing off their catch. # 7th July 2020, 7:03 pm

Data Science is a lot like Harry Potter, except there’s no magic, it’s just math, and instead of a sorting hat you just sort the data with a Python script.

GPT-3, shepherded by Max Woolf # 29th June 2020, 4:45 am

If you have to choose between engineering and ML, choose engineering. It’s easier for great engineers to pick up ML knowledge, but it’s a lot harder for ML experts to become great engineers.

Chip Huyen # 24th June 2020, 5:24 am

Deep learning isn’t hard anymore. This article does a great job of explaining how transfer learning is unlocking a new wave of innovation around deep learning. Previously if you wanted to train a model you needed vast amounts if data and thousands of dollars of compute time. Thanks to transfer learning you can now take an existing model (such as GPT2) and train something useful on top of it that’s specific to a new domain in just minutes it hours, with only a few hundred or a few thousand new labeled samples. # 7th February 2020, 8:47 am

A visual introduction to machine learning. Beautiful interactive essay explaining how a decision tree machine learning module is constructed, and using that to illustrate the concept of overfitting. This is one of the best explanations of machine learning fundamentals I’ve seen anywhere. # 10th January 2020, 5:12 am


Machine Learning on Mobile and at the Edge: 2019 industry year-in-review (via) This is a fantastic detailed overview of advances made in the field of machine learning on the edge (primarily on mobile devices) over 2019. I’m really excited about this trend: I love the improved privacy implications of running models on my phone without uploading data to a server, and it’s great to see techniques like Federated Learning (from Google Labs) which enable devices to privately train models in a distributed way without having to upload their training data. # 30th December 2019, 10:17 pm

We don’t like limits on discrimination and lending, so we’re gonna use machine learning, which is a form of money laundering for bias, a way to blame mathematical algorithms for desires to simply avoid rules that everybody else has to play by in this industry.

Maciej Ceglowski # 8th May 2019, 11:11 pm