<?xml version="1.0" encoding="utf-8"?>
<feed xml:lang="en-us" xmlns="http://www.w3.org/2005/Atom"><title>Simon Willison's Weblog: machine-learning</title><link href="http://simonwillison.net/" rel="alternate"/><link href="http://simonwillison.net/tags/machine-learning.atom" rel="self"/><id>http://simonwillison.net/</id><updated>2025-06-18T14:11:33+00:00</updated><author><name>Simon Willison</name></author><entry><title>I counted all of the yurts in Mongolia using machine learning</title><link href="https://simonwillison.net/2025/Jun/18/yurts-in-mongolia/#atom-tag" rel="alternate"/><published>2025-06-18T14:11:33+00:00</published><updated>2025-06-18T14:11:33+00:00</updated><id>https://simonwillison.net/2025/Jun/18/yurts-in-mongolia/#atom-tag</id><summary type="html">
    
&lt;p&gt;&lt;strong&gt;&lt;a href="https://monroeclinton.com/counting-all-yurts-in-mongolia/"&gt;I counted all of the yurts in Mongolia using machine learning&lt;/a&gt;&lt;/strong&gt;&lt;/p&gt;
Fascinating, detailed account by Monroe Clinton of a geospatial machine learning project. Monroe wanted to count visible yurts in Mongolia using Google Maps satellite view. The resulting project incorporates &lt;a href="https://github.com/mapbox/mercantile"&gt;mercantile&lt;/a&gt; for tile calculations, &lt;a href="https://labelstud.io/"&gt;Label Studio&lt;/a&gt; to help label the first 10,000 examples, a model trained on top of &lt;a href="https://github.com/ultralytics/ultralytics"&gt;YOLO11&lt;/a&gt; and a bunch of clever custom Python code to co-ordinate a brute force search across 120 CPU workers running the model.

    &lt;p&gt;&lt;small&gt;Via &lt;a href="https://news.ycombinator.com/item?id=44307629"&gt;Hacker News&lt;/a&gt;&lt;/small&gt;&lt;/p&gt;


    &lt;p&gt;Tags: &lt;a href="https://simonwillison.net/tags/geospatial"&gt;geospatial&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/machine-learning"&gt;machine-learning&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/python"&gt;python&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/ai"&gt;ai&lt;/a&gt;&lt;/p&gt;



</summary><category term="geospatial"/><category term="machine-learning"/><category term="python"/><category term="ai"/></entry><entry><title>Career Update: Google DeepMind -&gt; Anthropic</title><link href="https://simonwillison.net/2025/Mar/5/google-deepmind-anthropic/#atom-tag" rel="alternate"/><published>2025-03-05T22:24:02+00:00</published><updated>2025-03-05T22:24:02+00:00</updated><id>https://simonwillison.net/2025/Mar/5/google-deepmind-anthropic/#atom-tag</id><summary type="html">
    
&lt;p&gt;&lt;strong&gt;&lt;a href="https://nicholas.carlini.com/writing/2025/career-update.html"&gt;Career Update: Google DeepMind -&amp;gt; Anthropic&lt;/a&gt;&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;Nicholas Carlini (&lt;a href="https://simonwillison.net/tags/nicholas-carlini/"&gt;previously&lt;/a&gt;) on joining Anthropic, driven partly by his frustration at friction he encountered publishing his research at Google DeepMind after their merger with Google Brain. His area of expertise is adversarial machine learning.&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;The recent advances in machine learning and language modeling are going to be transformative &lt;span style="font-size: 0.75em; line-height: 0; position: relative; vertical-align: baseline; top: -0.5em;"&gt;[&lt;a href="https://nicholas.carlini.com/writing/2025/career-update.html#footnote4"&gt;d&lt;/a&gt;]&lt;/span&gt; But in order to realize this potential future in a way that doesn't put everyone's safety and security at risk, we're going to need to make a &lt;em&gt;lot&lt;/em&gt; of progress---and soon. We need to make so much progress that no one organization will be able to figure everything out by themselves; we need to work together, we need to talk about what we're doing, and we need to start doing this now.&lt;/p&gt;
&lt;/blockquote&gt;


    &lt;p&gt;Tags: &lt;a href="https://simonwillison.net/tags/google"&gt;google&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/machine-learning"&gt;machine-learning&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/ai"&gt;ai&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/generative-ai"&gt;generative-ai&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/llms"&gt;llms&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/anthropic"&gt;anthropic&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/nicholas-carlini"&gt;nicholas-carlini&lt;/a&gt;&lt;/p&gt;



</summary><category term="google"/><category term="machine-learning"/><category term="ai"/><category term="generative-ai"/><category term="llms"/><category term="anthropic"/><category term="nicholas-carlini"/></entry><entry><title>Quoting Greg Brockman</title><link href="https://simonwillison.net/2025/Jan/16/greg-brockman/#atom-tag" rel="alternate"/><published>2025-01-16T22:38:57+00:00</published><updated>2025-01-16T22:38:57+00:00</updated><id>https://simonwillison.net/2025/Jan/16/greg-brockman/#atom-tag</id><summary type="html">
    &lt;blockquote cite="https://twitter.com/gdb/status/1622683988736479232"&gt;&lt;p&gt;Manual inspection of data has probably the highest value-to-prestige ratio of any activity in machine learning.&lt;/p&gt;&lt;/blockquote&gt;
&lt;p class="cite"&gt;&amp;mdash; &lt;a href="https://twitter.com/gdb/status/1622683988736479232"&gt;Greg Brockman&lt;/a&gt;, OpenAI, Feb 2023&lt;/p&gt;

    &lt;p&gt;Tags: &lt;a href="https://simonwillison.net/tags/machine-learning"&gt;machine-learning&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/openai"&gt;openai&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/ai"&gt;ai&lt;/a&gt;&lt;/p&gt;



</summary><category term="machine-learning"/><category term="openai"/><category term="ai"/></entry><entry><title>Bridging Language Gaps in Multilingual Embeddings via Contrastive Learning</title><link href="https://simonwillison.net/2024/Oct/10/bridging-language-gaps-in-multilingual-embeddings-via-contrastiv/#atom-tag" rel="alternate"/><published>2024-10-10T16:00:35+00:00</published><updated>2024-10-10T16:00:35+00:00</updated><id>https://simonwillison.net/2024/Oct/10/bridging-language-gaps-in-multilingual-embeddings-via-contrastiv/#atom-tag</id><summary type="html">
    
&lt;p&gt;&lt;strong&gt;&lt;a href="https://jina.ai/news/bridging-language-gaps-in-multilingual-embeddings-via-contrastive-learning/"&gt;Bridging Language Gaps in Multilingual Embeddings via Contrastive Learning&lt;/a&gt;&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;Most text embedding models suffer from a "language gap", where phrases in different languages with the same semantic meaning end up with embedding vectors that aren't clustered together.&lt;/p&gt;
&lt;p&gt;Jina claim their new &lt;a href="https://jina.ai/news/jina-embeddings-v3-a-frontier-multilingual-embedding-model"&gt;jina-embeddings-v3&lt;/a&gt; (CC BY-NC 4.0, which means you need to license it for commercial use if you're not using &lt;a href="https://jina.ai/embeddings/"&gt;their API&lt;/a&gt;) is much better on this front, thanks to a training technique called "contrastive learning".&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;There are 30 languages represented in our contrastive learning dataset, but 97% of pairs and triplets are in just one language, with only 3% involving cross-language pairs or triplets. But this 3% is enough to produce a dramatic result: Embeddings show very little language clustering and semantically similar texts produce close embeddings regardless of their language&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;&lt;img alt="Scatter plot diagram, titled Desired Outcome: Clustering by Meaning. My dog is blue and Mein Hund ist blau are located near to each other, and so are Meine Katze ist rot and My cat is red" src="https://static.simonwillison.net/static/2024/jina-multi-language.png" /&gt;&lt;/p&gt;

    &lt;p&gt;&lt;small&gt;Via &lt;a href="https://twitter.com/JinaAI_/status/1844401388878762209"&gt;@JinaAI_&lt;/a&gt;&lt;/small&gt;&lt;/p&gt;


    &lt;p&gt;Tags: &lt;a href="https://simonwillison.net/tags/machine-learning"&gt;machine-learning&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/ai"&gt;ai&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/embeddings"&gt;embeddings&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/jina"&gt;jina&lt;/a&gt;&lt;/p&gt;



</summary><category term="machine-learning"/><category term="ai"/><category term="embeddings"/><category term="jina"/></entry><entry><title>Quoting Nicholas Carlini</title><link href="https://simonwillison.net/2024/Sep/18/nicholas-carlini/#atom-tag" rel="alternate"/><published>2024-09-18T18:52:56+00:00</published><updated>2024-09-18T18:52:56+00:00</updated><id>https://simonwillison.net/2024/Sep/18/nicholas-carlini/#atom-tag</id><summary type="html">
    &lt;blockquote cite="https://www.youtube.com/watch?v=umfeF0Dx-r4"&gt;&lt;p&gt;The problem that you face is that it's relatively easy to take a model and make it look like it's aligned. You ask GPT-4, “how do I end all of humans?” And the model says, “I can't possibly help you with that”. But there are a million and one ways to take the exact same question - pick your favorite - and you can make the model still answer the question even though initially it would have refused. And the question this reminds me a lot of coming from adversarial machine learning. We have a very simple objective: Classify the image correctly according to the original label. And yet, despite the fact that it was essentially trivial to find all of the bugs in principle, the community had a very hard time coming up with actually effective defenses. We wrote like over 9,000 papers in ten years, and have made very very very limited progress on this one small problem. You all have a harder problem and maybe less time.&lt;/p&gt;&lt;/blockquote&gt;
&lt;p class="cite"&gt;&amp;mdash; &lt;a href="https://www.youtube.com/watch?v=umfeF0Dx-r4"&gt;Nicholas Carlini&lt;/a&gt;&lt;/p&gt;

    &lt;p&gt;Tags: &lt;a href="https://simonwillison.net/tags/machine-learning"&gt;machine-learning&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/ai"&gt;ai&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/jailbreaking"&gt;jailbreaking&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/security"&gt;security&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/nicholas-carlini"&gt;nicholas-carlini&lt;/a&gt;&lt;/p&gt;



</summary><category term="machine-learning"/><category term="ai"/><category term="jailbreaking"/><category term="security"/><category term="nicholas-carlini"/></entry><entry><title>State-of-the-art music scanning by Soundslice</title><link href="https://simonwillison.net/2024/Jun/20/music-scanning-by-soundslice/#atom-tag" rel="alternate"/><published>2024-06-20T04:37:28+00:00</published><updated>2024-06-20T04:37:28+00:00</updated><id>https://simonwillison.net/2024/Jun/20/music-scanning-by-soundslice/#atom-tag</id><summary type="html">
    
&lt;p&gt;&lt;strong&gt;&lt;a href="https://www.soundslice.com/sheet-music-scanner/"&gt;State-of-the-art music scanning by Soundslice&lt;/a&gt;&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;It's been a while since I checked in on &lt;a href="https://www.soundslice.com/"&gt;Soundslice&lt;/a&gt;, Adrian Holovaty's beautiful web application focused on music education.&lt;/p&gt;
&lt;p&gt;The latest feature is spectacular. The Soundslice music editor - already one of the most impressive web applications I've ever experienced - can now import notation directly from scans or photos of sheet music.&lt;/p&gt;
&lt;p&gt;The attention to detail is immaculate. The custom machine learning model can handle a wide variety of notation details, and the system asks the user to verify or correct details that it couldn't perfectly determine using a neatly designed flow.&lt;/p&gt;
&lt;p&gt;Free accounts can scan two single page documents a month, and paid plans get a much higher allowance. I tried it out just now on a low resolution image I found on Wikipedia and it did a fantastic job, even allowing me to listen to a simulated piano rendition of the music once it had finished processing.&lt;/p&gt;
&lt;p&gt;It's worth spending some time with the &lt;a href="https://www.soundslice.com/blog/music-scanning/"&gt;release notes&lt;/a&gt; for the feature to appreciate how much work they've put into improving it since the initial release.&lt;/p&gt;
&lt;p&gt;If you're new to Soundslice, here's &lt;a href="https://www.soundslice.com/slices/RXTDc/course-preview-5904/"&gt;an example&lt;/a&gt; of their core player interface which syncs the display of music notation to an accompanying video.&lt;/p&gt;
&lt;p&gt;Adrian wrote up some &lt;a href="https://www.holovaty.com/writing/machine-learning-thoughts/"&gt;detailed notes&lt;/a&gt; on the machine learning behind the feature when they first launched it in beta back in November 2022.&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;OMR [Optical Music Recognition] is an inherently hard problem, significantly more difficult than text OCR. For one, music symbols have complex spatial relationships, and mistakes have a tendency to cascade. A single misdetected key signature might result in &lt;em&gt;multiple&lt;/em&gt; incorrect note pitches. And there’s a wide diversity of symbols, each with its own behavior and semantics — meaning the problems and subproblems aren’t just hard, there are &lt;em&gt;many&lt;/em&gt; of them.&lt;/p&gt;
&lt;/blockquote&gt;


    &lt;p&gt;Tags: &lt;a href="https://simonwillison.net/tags/adrian-holovaty"&gt;adrian-holovaty&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/machine-learning"&gt;machine-learning&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/music"&gt;music&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/ocr"&gt;ocr&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/ai"&gt;ai&lt;/a&gt;&lt;/p&gt;



</summary><category term="adrian-holovaty"/><category term="machine-learning"/><category term="music"/><category term="ocr"/><category term="ai"/></entry><entry><title>Quoting Eric Lehman</title><link href="https://simonwillison.net/2024/Feb/11/eric-lehman/#atom-tag" rel="alternate"/><published>2024-02-11T22:59:38+00:00</published><updated>2024-02-11T22:59:38+00:00</updated><id>https://simonwillison.net/2024/Feb/11/eric-lehman/#atom-tag</id><summary type="html">
    &lt;blockquote cite="https://www.techemails.com/i/141315424/google-engineer-ai-is-a-serious-risk-to-our-business"&gt;&lt;p&gt;One consideration is that such a deep ML system could well be developed outside of Google-- at Microsoft, Baidu, Yandex, Amazon, Apple, or even a startup. My impression is that the Translate team experienced this. Deep ML reset the translation game; past advantages were sort of wiped out. Fortunately, Google's huge investment in deep ML largely paid off, and we excelled in this new game. Nevertheless, our new ML-based translator was still beaten on benchmarks by a small startup. The risk that Google could similarly be beaten in relevance by another company is highlighted by a startling conclusion from BERT: huge amounts of user feedback can be largely replaced by unsupervised learning from raw text. That could have heavy implications for Google.&lt;/p&gt;&lt;/blockquote&gt;
&lt;p class="cite"&gt;&amp;mdash; &lt;a href="https://www.techemails.com/i/141315424/google-engineer-ai-is-a-serious-risk-to-our-business"&gt;Eric Lehman&lt;/a&gt;, internal Google email in 2018&lt;/p&gt;

    &lt;p&gt;Tags: &lt;a href="https://simonwillison.net/tags/machine-learning"&gt;machine-learning&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/translation"&gt;translation&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/google"&gt;google&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/generative-ai"&gt;generative-ai&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/ai"&gt;ai&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/llms"&gt;llms&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/bert"&gt;bert&lt;/a&gt;&lt;/p&gt;



</summary><category term="machine-learning"/><category term="translation"/><category term="google"/><category term="generative-ai"/><category term="ai"/><category term="llms"/><category term="bert"/></entry><entry><title>Quoting Daniel Situnayake</title><link href="https://simonwillison.net/2024/Jan/16/daniel-situnayake/#atom-tag" rel="alternate"/><published>2024-01-16T18:49:03+00:00</published><updated>2024-01-16T18:49:03+00:00</updated><id>https://simonwillison.net/2024/Jan/16/daniel-situnayake/#atom-tag</id><summary type="html">
    &lt;blockquote cite="https://news.ycombinator.com/item?id=39016433"&gt;&lt;p&gt;You likely have a TinyML system in your pocket right now: every cellphone has a low power DSP chip running a deep learning model for keyword spotting, so you can say "Hey Google" or "Hey Siri" and have it wake up on-demand without draining your battery. It’s an increasingly pervasive technology. [...]&lt;/p&gt;
&lt;p&gt;It’s astonishing what is possible today: real time computer vision on microcontrollers, on-device speech transcription, denoising and upscaling of digital signals. Generative AI is happening, too, assuming you can find a way to squeeze your models down to size. We are an unsexy field compared to our hype-fueled neighbors, but the entire world is already filling up with this stuff and it’s only the very beginning. Edge AI is being rapidly deployed in a ton of fields: medical sensing, wearables, manufacturing, supply chain, health and safety, wildlife conservation, sports, energy, built environment—we see new applications every day.&lt;/p&gt;&lt;/blockquote&gt;
&lt;p class="cite"&gt;&amp;mdash; &lt;a href="https://news.ycombinator.com/item?id=39016433"&gt;Daniel Situnayake&lt;/a&gt;&lt;/p&gt;

    &lt;p&gt;Tags: &lt;a href="https://simonwillison.net/tags/machine-learning"&gt;machine-learning&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/ai"&gt;ai&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/tinyml"&gt;tinyml&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/ai-energy-usage"&gt;ai-energy-usage&lt;/a&gt;&lt;/p&gt;



</summary><category term="machine-learning"/><category term="ai"/><category term="tinyml"/><category term="ai-energy-usage"/></entry><entry><title>Daniel Situnayake explains TinyML in a Hacker News comment</title><link href="https://simonwillison.net/2024/Jan/16/daniel-situnayake-explains-tinyml-in-a-hacker-news-comment/#atom-tag" rel="alternate"/><published>2024-01-16T18:46:02+00:00</published><updated>2024-01-16T18:46:02+00:00</updated><id>https://simonwillison.net/2024/Jan/16/daniel-situnayake-explains-tinyml-in-a-hacker-news-comment/#atom-tag</id><summary type="html">
    
&lt;p&gt;&lt;strong&gt;&lt;a href="https://news.ycombinator.com/item?id=39016433"&gt;Daniel Situnayake explains TinyML in a Hacker News comment&lt;/a&gt;&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;Daniel worked on TensorFlow Lite at Google and co-wrote the TinyML O’Reilly book. He just posted a multi-paragraph comment on Hacker News explaining the term and describing some of the recent innovations in that space.&lt;/p&gt;

&lt;p&gt;“TinyML means running machine learning on low power embedded devices, like microcontrollers, with constrained compute and memory.”&lt;/p&gt;


    &lt;p&gt;Tags: &lt;a href="https://simonwillison.net/tags/machine-learning"&gt;machine-learning&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/ai"&gt;ai&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/tinyml"&gt;tinyml&lt;/a&gt;&lt;/p&gt;



</summary><category term="machine-learning"/><category term="ai"/><category term="tinyml"/></entry><entry><title>Observable notebook: Detect objects in images</title><link href="https://simonwillison.net/2023/Oct/1/detect-objects-in-images/#atom-tag" rel="alternate"/><published>2023-10-01T15:46:14+00:00</published><updated>2023-10-01T15:46:14+00:00</updated><id>https://simonwillison.net/2023/Oct/1/detect-objects-in-images/#atom-tag</id><summary type="html">
    
&lt;p&gt;&lt;strong&gt;&lt;a href="https://observablehq.com/@simonw/detect-objects-in-images"&gt;Observable notebook: Detect objects in images&lt;/a&gt;&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;I built an Observable notebook that uses &lt;a href="https://github.com/huggingface/transformers.js"&gt;Transformers.js&lt;/a&gt; and the &lt;a href="https://huggingface.co/Xenova/detr-resnet-50"&gt;Xenova/detr-resnet-50&lt;/a&gt; model to detect objects in images, entirely running within your browser. You can select an image using a file picker and it will show you that image with bounding boxes and labels drawn around items within it. I have a demo image showing some pelicans flying ahead, but it works with any image you give it - all without uploading that image to a server.&lt;/p&gt;
&lt;p&gt;&lt;img alt="A blue image with four birds flying, each is enclosed in a bounding box labelled bird with a 99.something% accuracy label." src="https://static.simonwillison.net/static/2023/bird-detection.jpg" /&gt;&lt;/p&gt;

    &lt;p&gt;&lt;small&gt;Via &lt;a href="https://twitter.com/simonw/status/1708508183189410288"&gt;@simonw&lt;/a&gt;&lt;/small&gt;&lt;/p&gt;


    &lt;p&gt;Tags: &lt;a href="https://simonwillison.net/tags/javascript"&gt;javascript&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/machine-learning"&gt;machine-learning&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/transformers"&gt;transformers&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/ai"&gt;ai&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/observable"&gt;observable&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/transformers-js"&gt;transformers-js&lt;/a&gt;&lt;/p&gt;



</summary><category term="javascript"/><category term="machine-learning"/><category term="transformers"/><category term="ai"/><category term="observable"/><category term="transformers-js"/></entry><entry><title>All models on Hugging Face, sorted by downloads</title><link href="https://simonwillison.net/2023/Sep/10/all-models-on-hugging-face/#atom-tag" rel="alternate"/><published>2023-09-10T17:24:42+00:00</published><updated>2023-09-10T17:24:42+00:00</updated><id>https://simonwillison.net/2023/Sep/10/all-models-on-hugging-face/#atom-tag</id><summary type="html">
    
&lt;p&gt;&lt;strong&gt;&lt;a href="https://huggingface.co/models?sort=downloads"&gt;All models on Hugging Face, sorted by downloads&lt;/a&gt;&lt;/strong&gt;&lt;/p&gt;
I realized this morning that “sort by downloads” against the list of all of the models on Hugging Face can work as a reasonably good proxy for “which of these models are easiest to get running on your own computer”.

    &lt;p&gt;&lt;small&gt;Via &lt;a href="https://fedi.simonwillison.net/@simon/111041868759156076"&gt;@simon&lt;/a&gt;&lt;/small&gt;&lt;/p&gt;


    &lt;p&gt;Tags: &lt;a href="https://simonwillison.net/tags/machine-learning"&gt;machine-learning&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/ai"&gt;ai&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/hugging-face"&gt;hugging-face&lt;/a&gt;&lt;/p&gt;



</summary><category term="machine-learning"/><category term="ai"/><category term="hugging-face"/></entry><entry><title>AI photo sorter</title><link href="https://simonwillison.net/2023/Apr/2/ai-photo-sorter/#atom-tag" rel="alternate"/><published>2023-04-02T04:27:22+00:00</published><updated>2023-04-02T04:27:22+00:00</updated><id>https://simonwillison.net/2023/Apr/2/ai-photo-sorter/#atom-tag</id><summary type="html">
    
&lt;p&gt;&lt;strong&gt;&lt;a href="https://ai-photo-sorter.vercel.app/"&gt;AI photo sorter&lt;/a&gt;&lt;/strong&gt;&lt;/p&gt;
Really interesting implementation of machine learning photo classification by Alexander Visheratin. This tool lets you select as many photos as you like from your own machine, then provides a web interface for classifying them into labels that you provide. It loads a 102MB quantized CLIP model and executes it in the browser using WebAssembly. Once classified, a “Generate script” button produces a copyable list of shell commands for moving your images into corresponding folders on your own machine. Your photos never get uploaded to a server—everything happens directly in your browser.

    &lt;p&gt;&lt;small&gt;Via &lt;a href="https://twitter.com/visheratin/status/1642294595663847424"&gt;@visheratin&lt;/a&gt;&lt;/small&gt;&lt;/p&gt;


    &lt;p&gt;Tags: &lt;a href="https://simonwillison.net/tags/machine-learning"&gt;machine-learning&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/webassembly"&gt;webassembly&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/openai"&gt;openai&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/clip"&gt;clip&lt;/a&gt;&lt;/p&gt;



</summary><category term="machine-learning"/><category term="webassembly"/><category term="openai"/><category term="clip"/></entry><entry><title>Transformers.js</title><link href="https://simonwillison.net/2023/Mar/16/transformers-js/#atom-tag" rel="alternate"/><published>2023-03-16T23:41:55+00:00</published><updated>2023-03-16T23:41:55+00:00</updated><id>https://simonwillison.net/2023/Mar/16/transformers-js/#atom-tag</id><summary type="html">
    
&lt;p&gt;&lt;strong&gt;&lt;a href="https://xenova.github.io/transformers.js/"&gt;Transformers.js&lt;/a&gt;&lt;/strong&gt;&lt;/p&gt;
Hugging Face Transformers is a library of Transformer machine learning models plus a Python package for loading and running them. Transformers.js provides a JavaScript alternative interface which runs in your browser, thanks to a set of precompiled WebAssembly binaries for a selection of models. This interactive demo is incredible: in particular, try running the Image classification with google/vit-base-patch16-224 (91MB) model against any photo to get back labels representing that photo. Dropping one of these models onto a page is as easy as linking to a hosted CDN script and running a few lines of JavaScript.


    &lt;p&gt;Tags: &lt;a href="https://simonwillison.net/tags/javascript"&gt;javascript&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/machine-learning"&gt;machine-learning&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/transformers"&gt;transformers&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/ai"&gt;ai&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/generative-ai"&gt;generative-ai&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/llms"&gt;llms&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/hugging-face"&gt;hugging-face&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/transformers-js"&gt;transformers-js&lt;/a&gt;&lt;/p&gt;



</summary><category term="javascript"/><category term="machine-learning"/><category term="transformers"/><category term="ai"/><category term="generative-ai"/><category term="llms"/><category term="hugging-face"/><category term="transformers-js"/></entry><entry><title>Quoting Jeonghwan Kim</title><link href="https://simonwillison.net/2023/Mar/16/jeonghwan-kim/#atom-tag" rel="alternate"/><published>2023-03-16T05:39:58+00:00</published><updated>2023-03-16T05:39:58+00:00</updated><id>https://simonwillison.net/2023/Mar/16/jeonghwan-kim/#atom-tag</id><summary type="html">
    &lt;blockquote cite="https://twitter.com/masterjeongk/status/1635967360866877442"&gt;&lt;p&gt;As an NLP researcher I'm kind of worried about this field after 10-20 years. Feels like these oversized LLMs are going to eat up this field and I'm sitting in my chair thinking, "What's the point of my research when GPT-4 can do it better?"&lt;/p&gt;&lt;/blockquote&gt;
&lt;p class="cite"&gt;&amp;mdash; &lt;a href="https://twitter.com/masterjeongk/status/1635967360866877442"&gt;Jeonghwan Kim&lt;/a&gt;&lt;/p&gt;

    &lt;p&gt;Tags: &lt;a href="https://simonwillison.net/tags/machine-learning"&gt;machine-learning&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/generative-ai"&gt;generative-ai&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/nlp"&gt;nlp&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/gpt-4"&gt;gpt-4&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/ai"&gt;ai&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/llms"&gt;llms&lt;/a&gt;&lt;/p&gt;



</summary><category term="machine-learning"/><category term="generative-ai"/><category term="nlp"/><category term="gpt-4"/><category term="ai"/><category term="llms"/></entry><entry><title>Online gradient descent written in SQL</title><link href="https://simonwillison.net/2023/Mar/7/online-gradient-descent-written-in-sql/#atom-tag" rel="alternate"/><published>2023-03-07T18:56:21+00:00</published><updated>2023-03-07T18:56:21+00:00</updated><id>https://simonwillison.net/2023/Mar/7/online-gradient-descent-written-in-sql/#atom-tag</id><summary type="html">
    
&lt;p&gt;&lt;strong&gt;&lt;a href="https://maxhalford.github.io/blog/ogd-in-sql/"&gt;Online gradient descent written in SQL&lt;/a&gt;&lt;/strong&gt;&lt;/p&gt;
Max Halford trains an online gradient descent model against two years of AAPL stock data using just a single advanced SQL query. He built this against DuckDB—I tried to replicate his query in SQLite and it almost worked, but it gave me a “recursive reference in a subquery” error that I was unable to resolve.

    &lt;p&gt;&lt;small&gt;Via &lt;a href="https://news.ycombinator.com/item?id=35054786"&gt;Hacker News&lt;/a&gt;&lt;/small&gt;&lt;/p&gt;


    &lt;p&gt;Tags: &lt;a href="https://simonwillison.net/tags/machine-learning"&gt;machine-learning&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/sql"&gt;sql&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/sqlite"&gt;sqlite&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/ai"&gt;ai&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/duckdb"&gt;duckdb&lt;/a&gt;&lt;/p&gt;



</summary><category term="machine-learning"/><category term="sql"/><category term="sqlite"/><category term="ai"/><category term="duckdb"/></entry><entry><title>Quoting The GLM-130B License</title><link href="https://simonwillison.net/2023/Jan/10/the-glm-130b-license/#atom-tag" rel="alternate"/><published>2023-01-10T22:45:21+00:00</published><updated>2023-01-10T22:45:21+00:00</updated><id>https://simonwillison.net/2023/Jan/10/the-glm-130b-license/#atom-tag</id><summary type="html">
    &lt;blockquote cite="https://github.com/THUDM/GLM-130B/blob/main/MODEL_LICENSE"&gt;&lt;p&gt;You will not use the Software for any act that may undermine China's national security and national unity, harm the public interest of society, or infringe upon the rights and interests of human beings.&lt;/p&gt;&lt;/blockquote&gt;
&lt;p class="cite"&gt;&amp;mdash; &lt;a href="https://github.com/THUDM/GLM-130B/blob/main/MODEL_LICENSE"&gt;The GLM-130B License&lt;/a&gt;&lt;/p&gt;

    &lt;p&gt;Tags: &lt;a href="https://simonwillison.net/tags/machine-learning"&gt;machine-learning&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/ai"&gt;ai&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/generative-ai"&gt;generative-ai&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/llms"&gt;llms&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/ai-in-china"&gt;ai-in-china&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/glm"&gt;glm&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/licensing"&gt;licensing&lt;/a&gt;&lt;/p&gt;



</summary><category term="machine-learning"/><category term="ai"/><category term="generative-ai"/><category term="llms"/><category term="ai-in-china"/><category term="glm"/><category term="licensing"/></entry><entry><title>Quoting Jack Clark</title><link href="https://simonwillison.net/2022/Nov/16/jack-clark/#atom-tag" rel="alternate"/><published>2022-11-16T23:04:50+00:00</published><updated>2022-11-16T23:04:50+00:00</updated><id>https://simonwillison.net/2022/Nov/16/jack-clark/#atom-tag</id><summary type="html">
    &lt;blockquote cite="https://jack-clark.net/2022/11/14/import-ai-309-generative-bias-bloom-isnt-great-how-china-and-russia-use-ai/"&gt;&lt;p&gt;These kinds of biases aren’t so much a technical problem as a sociotechnical one; ML models try to approximate biases in their underlying datasets and, for some groups of people, some of these biases are offensive or harmful. That means in the coming years there will be endless political battles about what the ‘correct’ biases are for different models to display (or not display), and we can ultimately expect there to be as many approaches as there are distinct ideologies on the planet. I expect to move into a fractal ecosystem of models, and I expect model providers will ‘shapeshift’ a single model to display different biases depending on the market it is being deployed into. This will be extraordinarily messy.&lt;/p&gt;&lt;/blockquote&gt;
&lt;p class="cite"&gt;&amp;mdash; &lt;a href="https://jack-clark.net/2022/11/14/import-ai-309-generative-bias-bloom-isnt-great-how-china-and-russia-use-ai/"&gt;Jack Clark&lt;/a&gt;&lt;/p&gt;

    &lt;p&gt;Tags: &lt;a href="https://simonwillison.net/tags/machine-learning"&gt;machine-learning&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/ai"&gt;ai&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/jack-clark"&gt;jack-clark&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/generative-ai"&gt;generative-ai&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/llms"&gt;llms&lt;/a&gt;&lt;/p&gt;



</summary><category term="machine-learning"/><category term="ai"/><category term="jack-clark"/><category term="generative-ai"/><category term="llms"/></entry><entry><title>Semantic text search using embeddings</title><link href="https://simonwillison.net/2022/Nov/9/semantic-text-search-using-embeddings/#atom-tag" rel="alternate"/><published>2022-11-09T19:57:42+00:00</published><updated>2022-11-09T19:57:42+00:00</updated><id>https://simonwillison.net/2022/Nov/9/semantic-text-search-using-embeddings/#atom-tag</id><summary type="html">
    
&lt;p&gt;&lt;strong&gt;&lt;a href="https://github.com/openai/openai-cookbook/blob/main/examples/Semantic_text_search_using_embeddings.ipynb"&gt;Semantic text search using embeddings&lt;/a&gt;&lt;/strong&gt;&lt;/p&gt;
Example Python notebook from OpenAI demonstrating how to build a search engine using embeddings rather than straight up token matching. This is a fascinating way of implementing search, providing results that match the intent of the search (“delicious beans” for example) even if none of the keywords are actually present in the text.
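&lt;p&gt;The ranking step behind this kind of search can be sketched in a few lines. This is a minimal illustration rather than the notebook's own code: the three-dimensional vectors are made-up stand-ins for real embeddings (which would come from an embedding model and have far more dimensions), and &lt;code&gt;cosine_similarity&lt;/code&gt; and &lt;code&gt;search&lt;/code&gt; are hypothetical helper names.&lt;/p&gt;

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


# Made-up toy "embeddings"; a real system would get these from a model.
DOCUMENTS = {
    "These baked beans are tasty": [0.9, 0.1, 0.0],
    "The cat food arrived damaged": [0.1, 0.9, 0.2],
    "Great coffee, fast shipping": [0.3, 0.2, 0.9],
}


def search(query_vector, documents, n=2):
    """Return the n document texts most similar to query_vector."""
    ranked = sorted(
        documents,
        key=lambda text: cosine_similarity(query_vector, documents[text]),
        reverse=True,
    )
    return ranked[:n]


# Pretend this vector is the embedding of the query "delicious beans".
query = [0.8, 0.2, 0.1]
print(search(query, DOCUMENTS, n=1))
```

&lt;p&gt;The query shares no keywords with the top result; it ranks first only because its vector points in a similar direction.&lt;/p&gt;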


    &lt;p&gt;Tags: &lt;a href="https://simonwillison.net/tags/machine-learning"&gt;machine-learning&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/search"&gt;search&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/openai"&gt;openai&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/embeddings"&gt;embeddings&lt;/a&gt;&lt;/p&gt;



</summary><category term="machine-learning"/><category term="search"/><category term="openai"/><category term="embeddings"/></entry><entry><title>Is the AI spell-casting metaphor harmful or helpful?</title><link href="https://simonwillison.net/2022/Oct/5/spell-casting/#atom-tag" rel="alternate"/><published>2022-10-05T20:40:16+00:00</published><updated>2022-10-05T20:40:16+00:00</updated><id>https://simonwillison.net/2022/Oct/5/spell-casting/#atom-tag</id><summary type="html">
    &lt;p&gt;For a few weeks now I've been promoting spell-casting as a metaphor for prompt design against generative AI systems such as GPT-3 and Stable Diffusion.&lt;/p&gt;
&lt;p&gt;Here's an example, in &lt;a href="https://twitter.com/changelog/status/1575875939212238848"&gt;this snippet&lt;/a&gt; from my &lt;a href="https://changelog.com/podcast/506"&gt;recent Changelog podcast episode&lt;/a&gt;.&lt;/p&gt;

&lt;video 
poster="https://static.simonwillison.net/static/2022/changelog-wizards-poster.jpg"
controls="controls" style="max-width: 100%"
  width="640" height="360" preload="none"
&gt;
  &lt;source
    src="https://static.simonwillison.net/static/2022/changelog-spells.mp4"
    type="video/mp4"
   /&gt;
&lt;/video&gt;

&lt;p&gt;Relevant section towards the end (transcription assisted &lt;a href="https://replicate.com/p/wugv4xxsxfes7mlwr5c5nzt52y"&gt;by Whisper&lt;/a&gt;):&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;When you're working with these, you're not a programmer anymore. You're a wizard, right? I always wanted to be a wizard. We get to be wizards now. And we're learning these spells. We don't know why they work. Why does Neuromancer work? Who knows? Nobody knows. But you add it to your spell book and then you combine it with other spells. And if you're unlucky and combine them in the wrong way, you might get demons coming out at you.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;I had an interesting &lt;a href="https://twitter.com/simonw/status/1577647202930868224"&gt;debate on Twitter&lt;/a&gt; this morning about whether or not this metaphor is harmful or helpful. There are some very interesting points to discuss!&lt;/p&gt;
&lt;p&gt;The short version: I'm now convinced that the value of this metaphor changes based on the audience.&lt;/p&gt;
&lt;p&gt;The key challenge here is to avoid implying that these systems are "magical" in that they are incomprehensible and mysterious. As such, I believe the metaphor is only appropriate when you're talking to people who are working with these systems from a firm technical perspective.&lt;/p&gt;
&lt;h4 id="spell-casting-metaphor"&gt;Expanding the spell-casting metaphor&lt;/h4&gt;
&lt;p&gt;When I compare prompts to spells and I'm talking to another software engineer, here's the message I am trying to convey:&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;Writing prompts is not like writing regular code. There is no API reference or programming language specification that will let you predict exactly what will happen.&lt;/p&gt;
&lt;p&gt;Instead, you have to experiment: try different fragments of prompts and see what works. As you get a feel for these fragments you can then start exploring what happens when you combine them together.&lt;/p&gt;
&lt;p&gt;Over time you will start to develop an intuition for what works. You'll build your own collection of fragments and patterns, and exchange those with other people.&lt;/p&gt;
&lt;p&gt;The weird thing about this process is that no-one can truly understand exactly how each fragment works - not even the creators of the models. We've learned that "Trending on artstation" produces better images with Stable Diffusion - but we can only ever develop a vague intuition for why.&lt;/p&gt;
&lt;p&gt;It honestly feels more like fictional spell-casting than programming. Each fragment is a new spell that you have learned and can add to your spell book.&lt;/p&gt;
&lt;p&gt;It's confusing, and surprising, and a great deal of fun.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;For me, this captures my experience working with prompts pretty accurately. My hope is that this is a useful way to tempt other programmers into exploring this fascinating new area.&lt;/p&gt;
&lt;p&gt;The other thing I like about this metaphor is that, to my mind, it touches on some of the risks of generative AI as well.&lt;/p&gt;
&lt;p&gt;Fiction is full of tales of magic gone wrong: of wizards who lost control of forces that they did not fully understand.&lt;/p&gt;
&lt;p&gt;When I think about &lt;a href="https://simonwillison.net/2022/Sep/12/prompt-injection/"&gt;prompt injection attacks&lt;/a&gt; I imagine good wizards and evil wizards casting spells and counter-spells at each other! Software vulnerabilities in plain English totally fit my mental model of casting spells.&lt;/p&gt;
&lt;p&gt;But in debating this on Twitter I realized that whether this metaphor makes sense to you relies pretty heavily on which specific magic system comes to mind for you.&lt;/p&gt;
&lt;p&gt;I was raised on Terry Pratchett's Discworld, which has a fantastically rich and deeply satirical magic system. Incorrect incantations frequently produce demons! Discworld wizards are mostly academics who spend more time thinking about lunch than practicing magic. The most interesting practitioners are &lt;a href="https://en.wikipedia.org/wiki/Witches_(Discworld)"&gt;the witches&lt;/a&gt;, for whom the most useful magic is more like applied psychology ("headology" in the books).&lt;/p&gt;
&lt;p&gt;If your mental model of "magic" is unexplained supernatural phenomena and fairies granting wishes then my analogy doesn't really fit.&lt;/p&gt;
&lt;h4 id="harmful-metaphor"&gt;Magic as a harmful metaphor for AI&lt;/h4&gt;
&lt;p&gt;The argument for this metaphor causing harm is tied to the larger challenge of helping members of the public understand what is happening in this field.&lt;/p&gt;
&lt;p&gt;&lt;a href="https://www.seattletimes.com/opinion/look-behind-the-curtain-dont-be-dazzled-by-claims-of-artificial-intelligence/"&gt;Look behind the curtain: Don’t be dazzled by claims of ‘artificial intelligence’&lt;/a&gt; by Emily M. Bender is a useful summary of some of these challenges.&lt;/p&gt;
&lt;p&gt;In &lt;a href="https://techcrunch.com/2015/07/25/technology-is-magic-just-ask-the-washington-post/"&gt;Technology Is Magic, Just Ask The Washington Post&lt;/a&gt; from 2015 Jon Evans makes the case that treating technology as "magic" runs a risk of people demanding solutions to societal problems that cannot be delivered.&lt;/p&gt;
&lt;p&gt;Understanding exactly what these systems are capable of and how they work is hard enough for people with twenty years of software engineering experience, let alone everyone else.&lt;/p&gt;
&lt;p&gt;The last thing people need is to be told that these systems are "magic" - something that is permanently beyond their understanding and control.&lt;/p&gt;
&lt;p&gt;These systems are not magic. They're mathematics. It turns out that if you throw enough matrix multiplication and example data (literally terabytes of it) at a problem, you can get a system that can appear to do impossible things.&lt;/p&gt;
&lt;p&gt;But implying that they are magic - or even that they are "intelligent" - does not give people a useful mental model. GPT-3 is not a wizard, and it's not intelligent: it's a &lt;a href="https://dl.acm.org/doi/10.1145/3442188.3445922"&gt;stochastic parrot&lt;/a&gt;, capable of nothing more than predicting which word should come next to form a sentence that best matches the corpus it has been trained on.&lt;/p&gt;
&lt;p&gt;This matters to me a great deal. In conversations I have had around AI ethics the only universal answer I've found is that it is ethical to help people understand what these systems can do and how they work.&lt;/p&gt;
&lt;p&gt;So I plan to be more intentional with my metaphors. I'll continue to enthuse about spell-casting with fellow nerds who aren't at risk of assuming these systems are incomprehensible magic, but I'll keep searching for better ways to help explain these systems to everyone else.&lt;/p&gt;
    
        &lt;p&gt;Tags: &lt;a href="https://simonwillison.net/tags/ethics"&gt;ethics&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/machine-learning"&gt;machine-learning&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/ai"&gt;ai&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/gpt-3"&gt;gpt-3&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/openai"&gt;openai&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/prompt-engineering"&gt;prompt-engineering&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/prompt-injection"&gt;prompt-injection&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/generative-ai"&gt;generative-ai&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/llms"&gt;llms&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/terry-pratchett"&gt;terry-pratchett&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/ai-ethics"&gt;ai-ethics&lt;/a&gt;&lt;/p&gt;
    

</summary><category term="ethics"/><category term="machine-learning"/><category term="ai"/><category term="gpt-3"/><category term="openai"/><category term="prompt-engineering"/><category term="prompt-injection"/><category term="generative-ai"/><category term="llms"/><category term="terry-pratchett"/><category term="ai-ethics"/></entry><entry><title>konstantint/SKompiler</title><link href="https://simonwillison.net/2022/Oct/2/skompiler/#atom-tag" rel="alternate"/><published>2022-10-02T23:56:54+00:00</published><updated>2022-10-02T23:56:54+00:00</updated><id>https://simonwillison.net/2022/Oct/2/skompiler/#atom-tag</id><summary type="html">
    
&lt;p&gt;&lt;strong&gt;&lt;a href="https://github.com/konstantint/SKompiler"&gt;konstantint/SKompiler&lt;/a&gt;&lt;/strong&gt;&lt;/p&gt;
A tool for compiling trained SKLearn models into other representations, including SQL queries and Excel formulas. I’ve been pondering the most lightweight way to package a simple machine learning model as part of a larger application without needing to bundle heavy dependencies, and this set of techniques looks ideal!
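&lt;p&gt;To illustrate the general idea (this is not SKompiler's actual API or output), here is a hand-rolled sketch that turns the coefficients of an already-fitted linear model into a SQL expression and evaluates it with Python's built-in &lt;code&gt;sqlite3&lt;/code&gt; module. The weights, the &lt;code&gt;houses&lt;/code&gt; table and the &lt;code&gt;linear_model_to_sql&lt;/code&gt; helper are all invented for the example.&lt;/p&gt;

```python
import sqlite3


def linear_model_to_sql(coefficients, intercept):
    """Build a SQL expression computing intercept + sum(weight * column).

    coefficients maps column names to weights. Column names are
    interpolated directly, so they must come from trusted code.
    """
    terms = [f"({weight!r} * {column})" for column, weight in coefficients.items()]
    return " + ".join([repr(intercept)] + terms)


# Hypothetical weights, as if copied from a fitted regression model.
expression = linear_model_to_sql({"sqft": 150.0, "bedrooms": 10000.0}, 5000.0)

conn = sqlite3.connect(":memory:")
conn.execute("create table houses (sqft real, bedrooms real)")
conn.execute("insert into houses values (1000, 2)")
# The prediction is computed entirely inside the database.
prediction = conn.execute(f"select {expression} from houses").fetchone()[0]
print(expression)
print(prediction)
```

&lt;p&gt;Once the model is just an expression, the application needs nothing beyond the database itself to score new rows.&lt;/p&gt;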

    &lt;p&gt;&lt;small&gt;Via &lt;a href="https://twitter.com/tsuname/status/1576721065622917121"&gt;@tsuname&lt;/a&gt;&lt;/small&gt;&lt;/p&gt;


    &lt;p&gt;Tags: &lt;a href="https://simonwillison.net/tags/machine-learning"&gt;machine-learning&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/sql"&gt;sql&lt;/a&gt;&lt;/p&gt;



</summary><category term="machine-learning"/><category term="sql"/></entry><entry><title>Exploring 10m scraped Shutterstock videos used to train Meta's Make-A-Video text-to-video model</title><link href="https://simonwillison.net/2022/Sep/29/webvid/#atom-tag" rel="alternate"/><published>2022-09-29T19:31:24+00:00</published><updated>2022-09-29T19:31:24+00:00</updated><id>https://simonwillison.net/2022/Sep/29/webvid/#atom-tag</id><summary type="html">
    &lt;p&gt;&lt;a href="https://makeavideo.studio/"&gt;Make-A-Video&lt;/a&gt; is a new "state-of-the-art AI system that generates videos from text" from Meta AI. It looks incredible - it really is DALL-E / Stable Diffusion for video. And it appears to have been trained on 10m video preview clips scraped from Shutterstock.&lt;/p&gt;
&lt;p&gt;I built a new search engine to explore those ten million clips:&lt;/p&gt;
&lt;p&gt;&lt;a href="https://webvid.datasette.io/webvid/videos"&gt;https://webvid.datasette.io/webvid/videos&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;&lt;img src="https://static.simonwillison.net/static/2022/webvid-datasette.jpg" alt="A search for mars rocks returns 33 videos, each shown with a video player" style="max-width: 100%;" /&gt;&lt;/p&gt;
&lt;p&gt;This is similar to &lt;a href="https://simonwillison.net/2022/Sep/5/laion-aesthetics-weeknotes/"&gt;the system I built with Andy Baio&lt;/a&gt; a few weeks ago to explore the LAION data used to train Stable Diffusion.&lt;/p&gt;
&lt;h4&gt;Make-A-Video training data&lt;/h4&gt;
&lt;p&gt;Meta AI's &lt;a href="https://makeavideo.studio/Make-A-Video.pdf"&gt;paper describing the model&lt;/a&gt; includes this section about the training data:&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;Datasets.&lt;/strong&gt; To train the image models, we use a 2.3B subset of the dataset from (Schuhmann et al.) where the text is English. We filter out sample pairs with NSFW images 2, toxic words in the text, or images with a watermark probability larger than 0.5.&lt;/p&gt;
&lt;p&gt;We use WebVid-10M (Bain et al., 2021) and a 10M subset from HD-VILA-100M (Xue et al., 2022) 3 to train our video generation models. Note that only the videos (no aligned text) are used.&lt;/p&gt;
&lt;p&gt;The decoder Dt and the interpolation model is trained on WebVid-10M. SRt l is trained on both WebVid-10M and HD-VILA-10M. While prior work (Hong et al., 2022; Ho et al., 2022) have collected private text-video pairs for T2V generation, we use only public datasets (and no paired text for videos). We conduct automatic evaluation on UCF-101 (Soomro et al., 2012) and MSR-VTT (Xu et al., 2016) in a zero-shot setting.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;That 2.3B subset of images is the same LAION data &lt;a href="https://simonwillison.net/2022/Sep/5/laion-aesthetics-weeknotes/"&gt;I explored previously&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;&lt;a href="https://github.com/microsoft/XPretrain/tree/main/hd-vila-100m"&gt;HD-VILA-100M&lt;/a&gt; was collected by Microsoft Research Asia - &lt;a href="https://twitter.com/waxpancake/status/1575564723546103808"&gt;Andy Baio notes&lt;/a&gt; that these were scraped from YouTube.&lt;/p&gt;
&lt;p&gt;I decided to take a look at the &lt;a href="https://m-bain.github.io/webvid-dataset/"&gt;WebVid-10M&lt;/a&gt; data.&lt;/p&gt;
&lt;h4&gt;WebVid-10M&lt;/h4&gt;
&lt;p&gt;The &lt;a href="https://m-bain.github.io/webvid-dataset/"&gt;WebVid-10M site&lt;/a&gt; describes the data like this:&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;WebVid-10M is a large-scale dataset of short videos with textual descriptions sourced from the web. The videos are diverse and rich in their content.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;The &lt;a href="https://www.arxiv-vanity.com/papers/2104.00650/"&gt;accompanying paper&lt;/a&gt; provides a little bit more detail:&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;We scrape the web for a new dataset of videos with textual description annotations, called WebVid-2M. Our dataset consists of 2.5M video-text pairs, which is an order of magnitude larger than existing video captioning datasets (see Table 1).&lt;/p&gt;
&lt;p&gt;The data was scraped from the web following a similar procedure to Google Conceptual Captions [55] (CC3M). We note that more than 10% of CC3M images are in fact thumbnails from videos, which motivates us to use such video sources to scrape a total of 2.5M text-video pairs. The use of data collected for this study is authorised via the &lt;a href="https://www.gov.uk/guidance/exceptions-to-copyright/"&gt;Intellectual Property Office’s Exceptions to Copyright for Non-Commercial Research and Private Study&lt;/a&gt;.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;I'm presuming that WebVid-10M is a larger version of the WebVid-2M dataset described in the paper.&lt;/p&gt;
&lt;p&gt;Most importantly though, the website includes a link to a 2.7GB CSV file - &lt;code&gt;results_10M_train.csv&lt;/code&gt; - containing the full WebVid-10M dataset. The CSV file looks like this:&lt;/p&gt;
&lt;pre lang="csv"&gt;&lt;code&gt;videoid,contentUrl,duration,page_dir,name
21179416,https://ak.picdn.net/shutterstock/videos/21179416/preview/stock-footage-aerial-shot-winter-forest.mp4,PT00H00M11S,006001_006050,Aerial shot winter forest
5629184,https://ak.picdn.net/shutterstock/videos/5629184/preview/stock-footage-senior-couple-looking-through-binoculars-on-sailboat-together-shot-on-red-epic-for-high-quality-k.mp4,PT00H00M29S,071501_071550,"Senior couple looking through binoculars on sailboat together. shot on red epic for high quality 4k, uhd, ultra hd resolution."
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;I &lt;a href="https://til.simonwillison.net/sqlite/import-csv"&gt;loaded it into SQLite&lt;/a&gt; and started digging around.&lt;/p&gt;
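&lt;p&gt;For readers who want to try the same step from Python, here is a minimal sketch using only the standard library. It is not necessarily how the import linked above works: the two sample rows are inlined so the example runs without the 2.7GB download, and the &lt;code&gt;videos&lt;/code&gt; table name and the &lt;code&gt;load_csv&lt;/code&gt; helper are assumptions.&lt;/p&gt;

```python
import csv
import io
import sqlite3

# Two rows copied from the sample above; the real file has about 10.7m rows.
SAMPLE = '''videoid,contentUrl,duration,page_dir,name
21179416,https://ak.picdn.net/shutterstock/videos/21179416/preview/stock-footage-aerial-shot-winter-forest.mp4,PT00H00M11S,006001_006050,Aerial shot winter forest
5629184,https://ak.picdn.net/shutterstock/videos/5629184/preview/stock-footage-senior-couple-looking-through-binoculars-on-sailboat-together-shot-on-red-epic-for-high-quality-k.mp4,PT00H00M29S,071501_071550,"Senior couple looking through binoculars on sailboat together. shot on red epic for high quality 4k, uhd, ultra hd resolution."
'''


def load_csv(conn, fileobj):
    """Create a videos table, insert every CSV row, and return the row count."""
    conn.execute(
        "create table videos (videoid integer, contentUrl text, "
        "duration text, page_dir text, name text)"
    )
    # DictReader yields one dict per row, keyed by the CSV header names,
    # which line up with the named placeholders below.
    conn.executemany(
        "insert into videos values (:videoid, :contentUrl, :duration, :page_dir, :name)",
        csv.DictReader(fileobj),
    )
    conn.commit()
    return conn.execute("select count(*) from videos").fetchone()[0]


conn = sqlite3.connect(":memory:")
count = load_csv(conn, io.StringIO(SAMPLE))
print(count)
print(conn.execute("select videoid, duration from videos order by videoid").fetchall())
```

&lt;p&gt;For the real file you would pass an open file handle (opened with &lt;code&gt;newline=""&lt;/code&gt;, as the &lt;code&gt;csv&lt;/code&gt; module recommends) and connect to a database on disk instead of &lt;code&gt;:memory:&lt;/code&gt;.&lt;/p&gt;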
&lt;h4&gt;It's all from Shutterstock!&lt;/h4&gt;
&lt;p&gt;The big surprise for me when I started exploring the data was this: every single one of the 10,727,582 videos linked in the Datasette started with the same URL prefix:&lt;/p&gt;
&lt;p&gt;&lt;code&gt;https://ak.picdn.net/shutterstock/videos/&lt;/code&gt;&lt;/p&gt;
&lt;p&gt;They're all from Shutterstock. The paper talks about "scraping the web", but it turns out there was only one scraped website involved.&lt;/p&gt;
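&lt;p&gt;A check along these lines is enough to confirm that kind of finding. This is a sketch under assumptions: the &lt;code&gt;videos&lt;/code&gt; table and &lt;code&gt;contentUrl&lt;/code&gt; column follow the CSV layout shown earlier, the helper name is hypothetical, and the second URL in the stand-in table is invented purely to demonstrate a non-match.&lt;/p&gt;

```python
import sqlite3

PREFIX = "https://ak.picdn.net/shutterstock/videos/"


def count_urls_outside_prefix(conn, prefix=PREFIX):
    """Count rows whose contentUrl does not start with prefix."""
    sql = "select count(*) from videos where substr(contentUrl, 1, ?) != ?"
    return conn.execute(sql, (len(prefix), prefix)).fetchone()[0]


# Tiny stand-in table; the second URL is invented to show a non-match.
conn = sqlite3.connect(":memory:")
conn.execute("create table videos (videoid integer, contentUrl text)")
conn.executemany(
    "insert into videos values (?, ?)",
    [
        (21179416, PREFIX + "21179416/preview/stock-footage-aerial-shot-winter-forest.mp4"),
        (1, "https://example.com/other.mp4"),
    ],
)
print(count_urls_outside_prefix(conn))
```

&lt;p&gt;Run against the full table, the observation above corresponds to this count coming back as zero.&lt;/p&gt;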
&lt;p&gt;Here's that first row from the CSV file on Shutterstock itself:&lt;/p&gt;
&lt;p&gt;&lt;a href="https://www.shutterstock.com/video/clip-21179416-aerial-shot-winter-forest"&gt;https://www.shutterstock.com/video/clip-21179416-aerial-shot-winter-forest&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;As far as I can tell, the training set used here isn't even full Shutterstock videos: it's the free, watermarked preview clips that Shutterstock makes available.&lt;/p&gt;
&lt;p&gt;I guess Shutterstock have really high quality captions for their videos, perfect for training a model on.&lt;/p&gt;
&lt;h4&gt;Implementation notes&lt;/h4&gt;
&lt;p&gt;My &lt;a href="https://github.com/simonw/webvid-datasette"&gt;simonw/webvid-datasette&lt;/a&gt; repository contains the code I used to build the Datasette instance.&lt;/p&gt;
&lt;p&gt;I built a SQLite database with full-text search enabled using &lt;a href="https://sqlite-utils.datasette.io/"&gt;sqlite-utils&lt;/a&gt;. I deployed it directly to Fly by building a Docker image that bundled the 2.5GB SQLite database, taking advantage of the &lt;a href="https://simonwillison.net/2021/Jul/28/baked-data/"&gt;Baked Data architectural pattern&lt;/a&gt;.&lt;/p&gt;
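&lt;p&gt;To show roughly what full-text search over the captions involves, here is a sketch that uses SQLite's FTS5 extension directly through Python's &lt;code&gt;sqlite3&lt;/code&gt; module rather than through sqlite-utils. It assumes the underlying SQLite build includes FTS5 (most do), and the &lt;code&gt;videos_fts&lt;/code&gt; table and &lt;code&gt;search&lt;/code&gt; helper are illustrative names, not taken from the repository.&lt;/p&gt;

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# FTS5 virtual table; requires an SQLite build with the FTS5 extension.
conn.execute("create virtual table videos_fts using fts5(name)")
conn.executemany(
    "insert into videos_fts (rowid, name) values (?, ?)",
    [
        (21179416, "Aerial shot winter forest"),
        (5629184, "Senior couple looking through binoculars on sailboat together"),
    ],
)


def search(conn, query):
    """Return (rowid, name) pairs matching an FTS5 query, best match first."""
    sql = "select rowid, name from videos_fts where videos_fts match ? order by rank"
    return conn.execute(sql, (query,)).fetchall()


print(search(conn, "winter forest"))
```

&lt;p&gt;Multiple bare terms in an FTS5 query are combined with an implicit AND, so only captions containing both words match.&lt;/p&gt;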
&lt;p&gt;The most interesting custom piece of implementation is the plugin I wrote to add a video player to each result. Here's the &lt;a href="https://github.com/simonw/webvid-datasette/blob/8b8b56b9a8fa4913520fe11c11f45920f6737a86/plugins/render_cell.py"&gt;implementation of that plugin&lt;/a&gt;:&lt;/p&gt;
&lt;pre&gt;&lt;span class="pl-k"&gt;from&lt;/span&gt; &lt;span class="pl-s1"&gt;datasette&lt;/span&gt; &lt;span class="pl-k"&gt;import&lt;/span&gt; &lt;span class="pl-s1"&gt;hookimpl&lt;/span&gt;
&lt;span class="pl-k"&gt;from&lt;/span&gt; &lt;span class="pl-s1"&gt;markupsafe&lt;/span&gt; &lt;span class="pl-k"&gt;import&lt;/span&gt; &lt;span class="pl-v"&gt;Markup&lt;/span&gt;

&lt;span class="pl-v"&gt;TEMPLATE&lt;/span&gt; &lt;span class="pl-c1"&gt;=&lt;/span&gt; &lt;span class="pl-s"&gt;"""&lt;/span&gt;
&lt;span class="pl-s"&gt;&amp;lt;video controls width="400" preload="none" poster="{poster}"&amp;gt;&lt;/span&gt;
&lt;span class="pl-s"&gt;  &amp;lt;source src="{url}" type="video/mp4"&amp;gt;&lt;/span&gt;
&lt;span class="pl-s"&gt;&amp;lt;/video&amp;gt;&lt;/span&gt;
&lt;span class="pl-s"&gt;&amp;lt;p&amp;gt;{filename}&amp;lt;br&amp;gt;On &amp;lt;a href="https://www.shutterstock.com/video/clip-{id}"&amp;gt;Shutterstock&amp;lt;/a&amp;gt;&amp;lt;/p&amp;gt;&lt;/span&gt;
&lt;span class="pl-s"&gt;"""&lt;/span&gt;.&lt;span class="pl-en"&gt;strip&lt;/span&gt;()
&lt;span class="pl-v"&gt;VIDEO_URL&lt;/span&gt; &lt;span class="pl-c1"&gt;=&lt;/span&gt; &lt;span class="pl-s"&gt;"https://ak.picdn.net/shutterstock/videos/{id}/preview/{filename}"&lt;/span&gt;
&lt;span class="pl-v"&gt;POSTER_URL&lt;/span&gt; &lt;span class="pl-c1"&gt;=&lt;/span&gt; &lt;span class="pl-s"&gt;"https://ak.picdn.net/shutterstock/videos/{id}/thumb/1.jpg?ip=x480"&lt;/span&gt;


&lt;span class="pl-en"&gt;@&lt;span class="pl-s1"&gt;hookimpl&lt;/span&gt;&lt;/span&gt;
&lt;span class="pl-k"&gt;def&lt;/span&gt; &lt;span class="pl-en"&gt;render_cell&lt;/span&gt;(&lt;span class="pl-s1"&gt;row&lt;/span&gt;, &lt;span class="pl-s1"&gt;column&lt;/span&gt;, &lt;span class="pl-s1"&gt;value&lt;/span&gt;):
    &lt;span class="pl-k"&gt;if&lt;/span&gt; &lt;span class="pl-s1"&gt;column&lt;/span&gt; &lt;span class="pl-c1"&gt;!=&lt;/span&gt; &lt;span class="pl-s"&gt;"filename"&lt;/span&gt;:
        &lt;span class="pl-k"&gt;return&lt;/span&gt;
    &lt;span class="pl-s1"&gt;id&lt;/span&gt; &lt;span class="pl-c1"&gt;=&lt;/span&gt; &lt;span class="pl-s1"&gt;row&lt;/span&gt;[&lt;span class="pl-s"&gt;"id"&lt;/span&gt;]
    &lt;span class="pl-s1"&gt;url&lt;/span&gt; &lt;span class="pl-c1"&gt;=&lt;/span&gt; &lt;span class="pl-v"&gt;VIDEO_URL&lt;/span&gt;.&lt;span class="pl-en"&gt;format&lt;/span&gt;(&lt;span class="pl-s1"&gt;id&lt;/span&gt;&lt;span class="pl-c1"&gt;=&lt;/span&gt;&lt;span class="pl-s1"&gt;id&lt;/span&gt;, &lt;span class="pl-s1"&gt;filename&lt;/span&gt;&lt;span class="pl-c1"&gt;=&lt;/span&gt;&lt;span class="pl-s1"&gt;value&lt;/span&gt;)
    &lt;span class="pl-s1"&gt;poster&lt;/span&gt; &lt;span class="pl-c1"&gt;=&lt;/span&gt; &lt;span class="pl-v"&gt;POSTER_URL&lt;/span&gt;.&lt;span class="pl-en"&gt;format&lt;/span&gt;(&lt;span class="pl-s1"&gt;id&lt;/span&gt;&lt;span class="pl-c1"&gt;=&lt;/span&gt;&lt;span class="pl-s1"&gt;id&lt;/span&gt;)
    &lt;span class="pl-k"&gt;return&lt;/span&gt; &lt;span class="pl-v"&gt;Markup&lt;/span&gt;(&lt;span class="pl-v"&gt;TEMPLATE&lt;/span&gt;.&lt;span class="pl-en"&gt;format&lt;/span&gt;(&lt;span class="pl-s1"&gt;url&lt;/span&gt;&lt;span class="pl-c1"&gt;=&lt;/span&gt;&lt;span class="pl-s1"&gt;url&lt;/span&gt;, &lt;span class="pl-s1"&gt;poster&lt;/span&gt;&lt;span class="pl-c1"&gt;=&lt;/span&gt;&lt;span class="pl-s1"&gt;poster&lt;/span&gt;, &lt;span class="pl-s1"&gt;filename&lt;/span&gt;&lt;span class="pl-c1"&gt;=&lt;/span&gt;&lt;span class="pl-s1"&gt;value&lt;/span&gt;, &lt;span class="pl-s1"&gt;id&lt;/span&gt;&lt;span class="pl-c1"&gt;=&lt;/span&gt;&lt;span class="pl-s1"&gt;id&lt;/span&gt;))&lt;/pre&gt;
&lt;p&gt;I'm using the new &lt;code&gt;render_cell(row)&lt;/code&gt; argument added &lt;a href="https://docs.datasette.io/en/stable/changelog.html#v0-62"&gt;in Datasette 0.62&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;The plugin outputs a &lt;code&gt;&amp;lt;video&amp;gt;&lt;/code&gt; element with &lt;code&gt;preload="none"&lt;/code&gt; to avoid the browser downloading the video until the user clicks play (see &lt;a href="https://til.simonwillison.net/html/video-preload-none"&gt;this TIL&lt;/a&gt;). I set the &lt;code&gt;poster&lt;/code&gt; attribute to a thumbnail image from Shutterstock.&lt;/p&gt;
    
        &lt;p&gt;Tags: &lt;a href="https://simonwillison.net/tags/ethics"&gt;ethics&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/facebook"&gt;facebook&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/machine-learning"&gt;machine-learning&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/projects"&gt;projects&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/ai"&gt;ai&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/datasette"&gt;datasette&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/generative-ai"&gt;generative-ai&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/training-data"&gt;training-data&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/text-to-video"&gt;text-to-video&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/ai-ethics"&gt;ai-ethics&lt;/a&gt;&lt;/p&gt;
    

</summary><category term="ethics"/><category term="facebook"/><category term="machine-learning"/><category term="projects"/><category term="ai"/><category term="datasette"/><category term="generative-ai"/><category term="training-data"/><category term="text-to-video"/><category term="ai-ethics"/></entry><entry><title>Quoting Linden Li</title><link href="https://simonwillison.net/2022/Sep/24/linden-li/#atom-tag" rel="alternate"/><published>2022-09-24T16:03:07+00:00</published><updated>2022-09-24T16:03:07+00:00</updated><id>https://simonwillison.net/2022/Sep/24/linden-li/#atom-tag</id><summary type="html">
    &lt;blockquote cite="https://twitter.com/lindensli/status/1558182270162255873"&gt;&lt;p&gt;Running training jobs across multiple nodes scales really well. A common assumption is that scale inevitably means slowdowns: more GPUs means more synchronization overhead, especially with multiple nodes communicating across a network. But we observed that the performance penalty isn’t as harsh as what you might think. Instead, we found near-linear strong scaling: fixing the global batch size and training on more GPUs led to proportional increases in training throughput. On a 1.3B parameter model, 4 nodes means a 3.9x gain over one node. On 16 nodes, it’s 14.4x. This is largely thanks to the super fast interconnects that major cloud providers have built in: @awscloud EC2 P4d instances provide 400 Gbps networking bandwidth, @Azure provides 1600 Gbps, and @OraclePaaS provides 800 Gbps.&lt;/p&gt;&lt;/blockquote&gt;
&lt;p class="cite"&gt;&amp;mdash; &lt;a href="https://twitter.com/lindensli/status/1558182270162255873"&gt;Linden Li&lt;/a&gt;&lt;/p&gt;
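&lt;p&gt;As a quick check of what "near-linear" means for the figures quoted above, this small sketch divides each reported speedup by its node count. The &lt;code&gt;scaling_efficiency&lt;/code&gt; helper is a hypothetical name; the numbers are the ones from the quote.&lt;/p&gt;

```python
def scaling_efficiency(speedup, nodes):
    """Fraction of ideal linear speedup achieved on the given node count."""
    return speedup / nodes


# (nodes, reported speedup over a single node) from the quote above.
for nodes, speedup in [(4, 3.9), (16, 14.4)]:
    print(f"{nodes} nodes: {scaling_efficiency(speedup, nodes):.1%} of linear")
```

&lt;p&gt;That works out to roughly 97.5% of ideal at 4 nodes and 90% at 16 nodes.&lt;/p&gt;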

    &lt;p&gt;Tags: &lt;a href="https://simonwillison.net/tags/machine-learning"&gt;machine-learning&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/ai"&gt;ai&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/gpus"&gt;gpus&lt;/a&gt;&lt;/p&gt;



</summary><category term="machine-learning"/><category term="ai"/><category term="gpus"/></entry><entry><title>I Resurrected "Ugly Sonic" with Stable Diffusion Textual Inversion</title><link href="https://simonwillison.net/2022/Sep/20/ugly-sonic/#atom-tag" rel="alternate"/><published>2022-09-20T03:35:28+00:00</published><updated>2022-09-20T03:35:28+00:00</updated><id>https://simonwillison.net/2022/Sep/20/ugly-sonic/#atom-tag</id><summary type="html">
    
&lt;p&gt;&lt;strong&gt;&lt;a href="https://minimaxir.com/2022/09/stable-diffusion-ugly-sonic/"&gt;I Resurrected &amp;quot;Ugly Sonic&amp;quot; with Stable Diffusion Textual Inversion&lt;/a&gt;&lt;/strong&gt;&lt;/p&gt;
“I trained an Ugly Sonic object concept on 5 image crops from the movie trailer, with 6,000 steps [...] (on a T4 GPU, this took about 1.5 hours and cost about $0.21 on a GCP Spot instance)”

    &lt;p&gt;&lt;small&gt;Via &lt;a href="https://twitter.com/minimaxir/status/1571891826457710593"&gt;@minimaxir&lt;/a&gt;&lt;/small&gt;&lt;/p&gt;


    &lt;p&gt;Tags: &lt;a href="https://simonwillison.net/tags/machine-learning"&gt;machine-learning&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/ai"&gt;ai&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/max-woolf"&gt;max-woolf&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/stable-diffusion"&gt;stable-diffusion&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/generative-ai"&gt;generative-ai&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/text-to-image"&gt;text-to-image&lt;/a&gt;&lt;/p&gt;



</summary><category term="machine-learning"/><category term="ai"/><category term="max-woolf"/><category term="stable-diffusion"/><category term="generative-ai"/><category term="text-to-image"/></entry><entry><title>An introduction to XGBoost regression</title><link href="https://simonwillison.net/2022/Sep/18/an-introduction-to-xgboost-regression/#atom-tag" rel="alternate"/><published>2022-09-18T13:42:24+00:00</published><updated>2022-09-18T13:42:24+00:00</updated><id>https://simonwillison.net/2022/Sep/18/an-introduction-to-xgboost-regression/#atom-tag</id><summary type="html">
    
&lt;p&gt;&lt;strong&gt;&lt;a href="https://www.kaggle.com/code/carlmcbrideellis/an-introduction-to-xgboost-regression"&gt;An introduction to XGBoost regression&lt;/a&gt;&lt;/strong&gt;&lt;/p&gt;
I hadn’t realized what a wealth of high quality tutorial material could be found in Kaggle notebooks. Here Carl McBride Ellis provides a very approachable and practical introduction to XGBoost, one of the leading techniques for building machine learning models against tabular data.


    &lt;p&gt;Tags: &lt;a href="https://simonwillison.net/tags/machine-learning"&gt;machine-learning&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/ai"&gt;ai&lt;/a&gt;&lt;/p&gt;



</summary><category term="machine-learning"/><category term="ai"/></entry><entry><title>Quoting roon</title><link href="https://simonwillison.net/2022/Sep/12/roon/#atom-tag" rel="alternate"/><published>2022-09-12T16:57:14+00:00</published><updated>2022-09-12T16:57:14+00:00</updated><id>https://simonwillison.net/2022/Sep/12/roon/#atom-tag</id><summary type="html">
    &lt;blockquote cite="https://scale.com/blog/text-universal-interface"&gt;&lt;p&gt;In a previous iteration of the machine learning paradigm, researchers were obsessed with cleaning their datasets and ensuring that every data point seen by their models is pristine, gold-standard, and does not disturb the fragile learning process of billions of parameters finding their home in model space. Many began to realize that data scale trumps most other priorities in the deep learning world; utilizing general methods that allow models to scale in tandem with the complexity of the data is a superior approach. Now, in the era of LLMs, researchers tend to dump whole mountains of barely filtered, mostly unedited scrapes of the internet into the eager maw of a hungry model.&lt;/p&gt;&lt;/blockquote&gt;
&lt;p class="cite"&gt;&amp;mdash; &lt;a href="https://scale.com/blog/text-universal-interface"&gt;roon&lt;/a&gt;&lt;/p&gt;

    &lt;p&gt;Tags: &lt;a href="https://simonwillison.net/tags/machine-learning"&gt;machine-learning&lt;/a&gt;&lt;/p&gt;



</summary><category term="machine-learning"/></entry><entry><title>karpathy/minGPT</title><link href="https://simonwillison.net/2022/Sep/6/karpathymingpt/#atom-tag" rel="alternate"/><published>2022-09-06T14:52:32+00:00</published><updated>2022-09-06T14:52:32+00:00</updated><id>https://simonwillison.net/2022/Sep/6/karpathymingpt/#atom-tag</id><summary type="html">
    
&lt;p&gt;&lt;strong&gt;&lt;a href="https://github.com/karpathy/minGPT"&gt;karpathy/minGPT&lt;/a&gt;&lt;/strong&gt;&lt;/p&gt;
A “minimal PyTorch re-implementation” of the OpenAI GPT training and inference model, by Andrej Karpathy. It’s only a few hundred lines of code and includes extensive comments, plus notebook demos.

    &lt;p&gt;&lt;small&gt;Via &lt;a href="https://news.ycombinator.com/item?id=32736044"&gt;Hacker News&lt;/a&gt;&lt;/small&gt;&lt;/p&gt;


    &lt;p&gt;Tags: &lt;a href="https://simonwillison.net/tags/machine-learning"&gt;machine-learning&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/ai"&gt;ai&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/gpt-3"&gt;gpt-3&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/andrej-karpathy"&gt;andrej-karpathy&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/generative-ai"&gt;generative-ai&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/llms"&gt;llms&lt;/a&gt;&lt;/p&gt;



</summary><category term="machine-learning"/><category term="ai"/><category term="gpt-3"/><category term="andrej-karpathy"/><category term="generative-ai"/><category term="llms"/></entry><entry><title>r/MachineLearning: What is the SOTA explanation for why deep learning works?</title><link href="https://simonwillison.net/2022/Sep/5/deep-learning/#atom-tag" rel="alternate"/><published>2022-09-05T17:46:21+00:00</published><updated>2022-09-05T17:46:21+00:00</updated><id>https://simonwillison.net/2022/Sep/5/deep-learning/#atom-tag</id><summary type="html">
    
&lt;p&gt;&lt;strong&gt;&lt;a href="https://www.reddit.com/r/MachineLearning/comments/x5gnyw/d_what_is_the_sota_explanation_for_why_deep/"&gt;r/MachineLearning: What is the SOTA explanation for why deep learning works?&lt;/a&gt;&lt;/strong&gt;&lt;/p&gt;
The thing I find fascinating about this Reddit conversation is that it makes it clear that the machine learning research community has very little agreement on WHY the state of the art techniques that are being used today actually work as well as they do.


    &lt;p&gt;Tags: &lt;a href="https://simonwillison.net/tags/machine-learning"&gt;machine-learning&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/reddit"&gt;reddit&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/ai"&gt;ai&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/generative-ai"&gt;generative-ai&lt;/a&gt;&lt;/p&gt;



</summary><category term="machine-learning"/><category term="reddit"/><category term="ai"/><category term="generative-ai"/></entry><entry><title>Run Stable Diffusion on your M1 Mac’s GPU</title><link href="https://simonwillison.net/2022/Sep/1/run-stable-diffusion-on-your-m1-macs-gpu/#atom-tag" rel="alternate"/><published>2022-09-01T17:41:35+00:00</published><updated>2022-09-01T17:41:35+00:00</updated><id>https://simonwillison.net/2022/Sep/1/run-stable-diffusion-on-your-m1-macs-gpu/#atom-tag</id><summary type="html">
    
&lt;p&gt;&lt;strong&gt;&lt;a href="https://replicate.com/blog/run-stable-diffusion-on-m1-mac"&gt;Run Stable Diffusion on your M1 Mac’s GPU&lt;/a&gt;&lt;/strong&gt;&lt;/p&gt;
Ben Firshman provides detailed instructions for getting Stable Diffusion running on an M1 Mac.


    &lt;p&gt;Tags: &lt;a href="https://simonwillison.net/tags/ben-firshman"&gt;ben-firshman&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/machine-learning"&gt;machine-learning&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/macos"&gt;macos&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/ai"&gt;ai&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/stable-diffusion"&gt;stable-diffusion&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/generative-ai"&gt;generative-ai&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/text-to-image"&gt;text-to-image&lt;/a&gt;&lt;/p&gt;



</summary><category term="ben-firshman"/><category term="machine-learning"/><category term="macos"/><category term="ai"/><category term="stable-diffusion"/><category term="generative-ai"/><category term="text-to-image"/></entry><entry><title>Exploring 12 Million of the 2.3 Billion Images Used to Train Stable Diffusion’s Image Generator</title><link href="https://simonwillison.net/2022/Aug/31/laion-aesthetics/#atom-tag" rel="alternate"/><published>2022-08-31T02:10:26+00:00</published><updated>2022-08-31T02:10:26+00:00</updated><id>https://simonwillison.net/2022/Aug/31/laion-aesthetics/#atom-tag</id><summary type="html">
    
&lt;p&gt;&lt;strong&gt;&lt;a href="https://waxy.org/2022/08/exploring-12-million-of-the-images-used-to-train-stable-diffusions-image-generator/"&gt;Exploring 12 Million of the 2.3 Billion Images Used to Train Stable Diffusion’s Image Generator&lt;/a&gt;&lt;/strong&gt;&lt;/p&gt;
Andy Baio and I collaborated on an investigation into the training set used for Stable Diffusion. I built a Datasette instance with 12m image records sourced from the LAION-Aesthetics v2 6+ aesthetic score data used as part of the training process, and built a tool so people could run searches and explore the data. Andy did some extensive analysis of things like the domains scraped for the images and names of celebrities and artists represented in the data. His write-up here explains our project in detail and some of the patterns we’ve uncovered so far.


    &lt;p&gt;Tags: &lt;a href="https://simonwillison.net/tags/machine-learning"&gt;machine-learning&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/ai"&gt;ai&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/stable-diffusion"&gt;stable-diffusion&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/generative-ai"&gt;generative-ai&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/laion"&gt;laion&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/training-data"&gt;training-data&lt;/a&gt;&lt;/p&gt;



</summary><category term="machine-learning"/><category term="ai"/><category term="stable-diffusion"/><category term="generative-ai"/><category term="laion"/><category term="training-data"/></entry><entry><title>Stable Diffusion is a really big deal</title><link href="https://simonwillison.net/2022/Aug/29/stable-diffusion/#atom-tag" rel="alternate"/><published>2022-08-29T01:09:04+00:00</published><updated>2022-08-29T01:09:04+00:00</updated><id>https://simonwillison.net/2022/Aug/29/stable-diffusion/#atom-tag</id><summary type="html">
    &lt;p&gt;If you haven't been paying attention to what's going on with Stable Diffusion, you really should be.&lt;/p&gt;
&lt;p&gt;Stable Diffusion is a new "text-to-image diffusion model" that was &lt;a href="https://stability.ai/blog/stable-diffusion-public-release"&gt;released to the public&lt;/a&gt; by &lt;a href="https://stability.ai/"&gt;Stability.ai&lt;/a&gt; six days ago, on August 22nd.&lt;/p&gt;
&lt;p&gt;It's similar to models like OpenAI's DALL-E, but with one crucial difference: they released the whole thing.&lt;/p&gt;
&lt;p&gt;You can try it out online at &lt;a href="https://beta.dreamstudio.ai/"&gt;beta.dreamstudio.ai&lt;/a&gt; (currently for free). Type in a text prompt and the model will generate an image.&lt;/p&gt;
&lt;p&gt;You can download and run the model on your own computer (if you have a powerful enough graphics card). Here's &lt;a href="https://www.reddit.com/r/StableDiffusion/comments/wuyu2u/how_do_i_run_stable_diffusion_and_sharing_faqs/"&gt;an FAQ&lt;/a&gt; on how to do that.&lt;/p&gt;
&lt;p&gt;You can use it for commercial and non-commercial purposes, under the terms of the &lt;a href="https://huggingface.co/spaces/CompVis/stable-diffusion-license"&gt;Creative ML OpenRAIL-M license&lt;/a&gt; - which lists some usage restrictions that include avoiding using it to break applicable laws, generate false information, discriminate against individuals or provide medical advice.&lt;/p&gt;
&lt;p&gt;In just a few days, there has been an &lt;strong&gt;explosion&lt;/strong&gt; of innovation around it. The things people are building are absolutely astonishing.&lt;/p&gt;
&lt;p&gt;I've been tracking the &lt;a href="https://www.reddit.com/r/StableDiffusion/"&gt;r/StableDiffusion&lt;/a&gt; subreddit and following Stability.ai founder &lt;a href="https://twitter.com/EMostaque"&gt;Emad Mostaque&lt;/a&gt; on Twitter.&lt;/p&gt;
&lt;h4&gt;img2img&lt;/h4&gt;
&lt;p&gt;Generating images from text is one thing, but generating images from other images is a whole new ballgame.&lt;/p&gt;
&lt;p&gt;My favourite example so far comes &lt;a href="https://www.reddit.com/r/StableDiffusion/comments/wzlmty/its_some_kind_of_black_magic_i_swear/"&gt;from Reddit user argaman123&lt;/a&gt;. They created this image:&lt;/p&gt;
&lt;p&gt;&lt;img src="https://static.simonwillison.net/static/2022/stable-diffusion-img2img.png" alt="A simple looking Microsoft Paint style image made of flat colours: a sky blue background, a rough yellow desert in the foreground, a semi-circle black line representing a half dome over five shapes in two shades of grey representing buildings inside the dome. A yellow circle represents the sun in the top right of the image, above the dome." style="max-width: 100%;" /&gt;&lt;/p&gt;
&lt;p&gt;And added this prompt (or "&lt;a href="https://www.reddit.com/r/StableDiffusion/comments/wzlmty/comment/im35vwf/?utm_source=reddit&amp;amp;utm_medium=web2x&amp;amp;context=3"&gt;something along those lines&lt;/a&gt;"):&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;A distant futuristic city full of tall buildings inside a huge transparent glass dome, In the middle of a barren desert full of large dunes, Sun rays, Artstation, Dark sky full of stars with a shiny sun, Massive scale, Fog, Highly detailed, Cinematic, Colorful&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;The model produced the following two images:&lt;/p&gt;
&lt;p&gt;&lt;img src="https://static.simonwillison.net/static/2022/stable-diffusion-1.jpg" alt="A gorgeous image of a futuristic city under a glass dome, in front of a wind-swept desert. The composition matches the Microsoft Paint input, but everything is rendered in great detail." style="max-width: 100%;" /&gt;&lt;/p&gt;
&lt;p&gt;&lt;img src="https://static.simonwillison.net/static/2022/stable-diffusion-2.jpg" alt="A second image, similar to the first but this time the dome is composed of triangular sheets of glass. The composition remains the same." style="max-width: 100%;" /&gt;&lt;/p&gt;
&lt;p&gt;These are amazing. In my &lt;a href="https://simonwillison.net/2022/Jun/23/dall-e/"&gt;previous experiments with DALL-E&lt;/a&gt; I've tried to recreate photographs I have taken, but getting the exact composition I wanted has always proved impossible using just text. With this new capability I feel like I could get the AI to do pretty much exactly what I have in my mind.&lt;/p&gt;
&lt;p&gt;Imagine having an on-demand concept artist that can generate anything you can imagine, and can iterate with you towards your ideal result. For free (or at least very cheaply).&lt;/p&gt;
&lt;p&gt;You can run this today on your own computer, if you can figure out how to set it up. You can &lt;a href="https://replicate.com/stability-ai/stable-diffusion"&gt;try it in your browser&lt;/a&gt; using Replicate, &lt;a href="https://huggingface.co/spaces/huggingface/diffuse-the-rest"&gt;or Hugging Face&lt;/a&gt;. This capability is apparently coming to the DreamStudio interface &lt;a href="https://twitter.com/EMostaque/status/1563632874091421697"&gt;next week&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;There's so much more going on.&lt;/p&gt;
&lt;p&gt;&lt;a href="https://github.com/hlky/stable-diffusion-webui"&gt;stable-diffusion-webui&lt;/a&gt; is an open source UI you can run on your own machine providing a powerful interface to the model. &lt;a href="https://twitter.com/altryne/status/1563452692399214594"&gt;Here's a Twitter thread&lt;/a&gt; showing what it can do.&lt;/p&gt;
&lt;p&gt;Reddit user alpacaAI shared &lt;a href="https://old.reddit.com/r/StableDiffusion/comments/wyduk1/show_rstablediffusion_integrating_sd_in_photoshop/"&gt;a video demo&lt;/a&gt; of a Photoshop plugin they are developing which has to be seen to be believed. They have a registration form up on &lt;a href="https://www.getalpaca.io/"&gt;getalpaca.io&lt;/a&gt; for people who want to try it out once it's ready.&lt;/p&gt;
&lt;p&gt;&lt;img src="https://static.simonwillison.net/static/2022/stable-diffusion-alpaca.jpg" alt="A screenshot of Photoshop - a complex image on multiple layers is shown in the background. The user has opened a dialog where they have entered the prompt &amp;quot;a dog seating on a path going up in a hill&amp;quot; - with modifiers of &amp;quot;studio ghibli::3&amp;quot;, &amp;quot;highly detailed::1&amp;quot;, &amp;quot;mang anime::1&amp;quot;, &amp;quot;cel-shading::1&amp;quot; and &amp;quot;game characters::1&amp;quot;." style="max-width: 100%;" /&gt;&lt;/p&gt;
&lt;p&gt;Reddit user Hoppss ran a 2D animated clip from Disney's Aladdin through &lt;code&gt;img2img&lt;/code&gt; frame-by-frame, using the following parameters:&lt;/p&gt;
&lt;p&gt;&lt;code&gt;--prompt "3D render" --strength 0.15 --seed 82345912 --n_samples 1 --ddim_steps 100 --n_iter 1 --scale 30.0 --skip_grid&lt;/code&gt;&lt;/p&gt;
&lt;p&gt;The result was &lt;a href="https://www.reddit.com/r/StableDiffusion/comments/wys3w5/applying_img2img_to_video_3d_aladdin/"&gt;a 3D animated video&lt;/a&gt;. Not a great quality one, but pretty stunning for a shell script and a two-word prompt!&lt;/p&gt;
&lt;p&gt;The best description I've seen so far of an iterative process to build up an image using Stable Diffusion comes from Andy Salerno: &lt;a href="https://andys.page/posts/how-to-draw/"&gt;4.2 Gigabytes, or: How to Draw Anything&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;Ben Firshman has published detailed instructions on how to &lt;a href="https://replicate.com/blog/run-stable-diffusion-on-m1-mac"&gt;Run Stable Diffusion on your M1 Mac’s GPU&lt;/a&gt;.&lt;/p&gt;
&lt;h4&gt;And there's so much more to come&lt;/h4&gt;
&lt;p&gt;All of this happened in just six days since the model release. Emad Mostaque &lt;a href="https://twitter.com/EMostaque/status/1564011883572633600"&gt;on Twitter&lt;/a&gt;:&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;We use as much compute as stable diffusion used every 36 hours for our upcoming open source models&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;This made me think of &lt;a href="https://parti.research.google/"&gt;Google's Parti paper&lt;/a&gt;, which included a demonstration that showed that once the model was scaled up to 20bn parameters it could generate images with correctly spelled text!&lt;/p&gt;
&lt;p&gt;&lt;img src="https://static.simonwillison.net/static/2022/stable-diffusion-parti.jpg" alt="Four images of a kangaroo holding a sign generated by Parti. In the 350M and 750M parameter images the text on the sign is garbage symbols. At 3B parameters it does at least look like words, but is still not correct. At 20B parameters the sign reads &amp;quot;Welcome friends&amp;quot;." style="max-width: 100%;" /&gt;&lt;/p&gt;
&lt;h4 id="ai-vegan"&gt;Ethics: will you be an AI vegan?&lt;/h4&gt;
&lt;p&gt;I'm finding the ethics of all of this extremely difficult.&lt;/p&gt;
&lt;p&gt;Stable Diffusion has been trained on millions of copyrighted images scraped from the web.&lt;/p&gt;
&lt;p&gt;The &lt;a href="https://github.com/CompVis/stable-diffusion/blob/69ae4b35e0a0f6ee1af8bb9a5d0016ccb27e36dc/Stable_Diffusion_v1_Model_Card.md"&gt;Stable Diffusion v1 Model Card&lt;/a&gt; has the full details, but the short version is that it uses &lt;a href="https://laion.ai/blog/laion-5b/"&gt;LAION-5B&lt;/a&gt; (5.85 billion image-text pairs) and its &lt;a href="https://laion.ai/blog/laion-aesthetics/"&gt;laion-aesthetics v2 5+&lt;/a&gt; subset (which I think is ~600M pairs filtered for aesthetics). These images were scraped from the web.&lt;/p&gt;
&lt;p&gt;I'm not qualified to speak to the legality of this. I'm personally more concerned with the morality.&lt;/p&gt;
&lt;p&gt;The final model is, I believe, around 4.2GB of data - a binary blob of floating point numbers. The fact that it can compress such an enormous quantity of visual information into such a small space is itself a fascinating detail.&lt;/p&gt;
&lt;p&gt;As such, each image in the training set contributes only a tiny amount of information - a few tweaks to some numeric weights spread across the entire network.&lt;/p&gt;
&lt;p&gt;But... the people who created these images did not give their consent. And the model can be seen as a direct threat to their livelihoods. No-one expected creative AIs to come for the artist jobs first, but here we are!&lt;/p&gt;
&lt;p&gt;I'm still thinking through this, and I'm eager to consume more commentary about it. But my current mental model is to think about this in terms of veganism, as an analogy for people making their own personal ethical decisions.&lt;/p&gt;
&lt;p&gt;I know many vegans. They have access to the same information as I do about the treatment of animals, and they have made informed decisions about their lifestyle, which I fully respect.&lt;/p&gt;
&lt;p&gt;I myself remain a meat-eater.&lt;/p&gt;
&lt;p&gt;There will be many people who will decide that AI models trained on copyrighted images are incompatible with their values. I understand and respect that decision.&lt;/p&gt;
&lt;p&gt;But when I look at that img2img example of the futuristic city in the dome, I can't resist imagining what I could do with that capability.&lt;/p&gt;
&lt;p&gt;If someone were to create &lt;a href="https://twitter.com/simonw/status/1563201333821288452"&gt;a vegan model&lt;/a&gt;, trained entirely on out-of-copyright images, I would be delighted to promote it and try it out. If its results were good enough, I might even switch to it entirely.&lt;/p&gt;
&lt;h4&gt;Understanding the training data&lt;/h4&gt;
&lt;p&gt;&lt;strong&gt;Update&lt;/strong&gt;: 30th August 2022. Andy Baio and I worked together on a deep dive into the training data behind Stable Diffusion. Andy wrote up some of our findings in &lt;a href="https://waxy.org/2022/08/exploring-12-million-of-the-images-used-to-train-stable-diffusions-image-generator/"&gt;Exploring 12 Million of the 2.3 Billion Images Used to Train Stable Diffusion’s Image Generator&lt;/a&gt;.&lt;/p&gt;
&lt;h4&gt;Indistinguishable from magic&lt;/h4&gt;
&lt;p&gt;Just a few months ago, if I'd seen someone on a fictional TV show using an interface like that Photoshop plugin I'd have grumbled about how that was a step too far even by the standards of American network TV dramas.&lt;/p&gt;
&lt;p&gt;Science fiction is real now. Machine learning generative models are here, and the rate at which they are improving is unreal. It's worth paying real attention to what they can do and how they are developing.&lt;/p&gt;
&lt;p&gt;I'm tweeting about this stuff a lot these days. &lt;a href="https://twitter.com/simonw"&gt;Follow @simonw on Twitter&lt;/a&gt; for more.&lt;/p&gt;
    
        &lt;p&gt;Tags: &lt;a href="https://simonwillison.net/tags/ethics"&gt;ethics&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/machine-learning"&gt;machine-learning&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/ai"&gt;ai&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/dalle"&gt;dalle&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/stable-diffusion"&gt;stable-diffusion&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/prompt-engineering"&gt;prompt-engineering&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/generative-ai"&gt;generative-ai&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/laion"&gt;laion&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/text-to-image"&gt;text-to-image&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/ai-ethics"&gt;ai-ethics&lt;/a&gt;&lt;/p&gt;
    

</summary><category term="ethics"/><category term="machine-learning"/><category term="ai"/><category term="dalle"/><category term="stable-diffusion"/><category term="prompt-engineering"/><category term="generative-ai"/><category term="laion"/><category term="text-to-image"/><category term="ai-ethics"/></entry></feed>