Simon Willison’s Weblog

675 items tagged “generative-ai”

2023

Introducing MPT-7B: A New Standard for Open-Source, Commercially Usable LLMs (via) There’s a lot to absorb about this one. Mosaic trained this model from scratch on 1 trillion tokens, at a cost of $200,000 taking 9.5 days. It’s Apache-2.0 licensed and the model weights are available today.

They’re accompanying the base model with an instruction-tuned model called MPT-7B-Instruct (licensed for commercial use) and a non-commercially licensed MPT-7B-Chat trained using OpenAI data. They also announced MPT-7B-StoryWriter-65k+—“a model designed to read and write stories with super long context lengths”—with a previously unheard of 65,000 token context length.

They’re releasing these models mainly to demonstrate how inexpensive and powerful their custom model training service is. It’s a very convincing demo!

# 5th May 2023, 7:05 pm / open-source, generative-ai, ai, homebrew-llms, llms

No Moat: Closed AI gets its Open Source wakeup call — ft. Simon Willison (via) I joined the Latent Space podcast yesterday (on short notice, so I was out and about on my phone) to talk about the leaked Google memo about open source LLMs. This was a Twitter Space, but swyx did an excellent job of cleaning up the audio and turning it into a podcast.

# 5th May 2023, 6:17 pm / homebrew-llms, generative-ai, ai, speaking, llms, podcasts

Leaked Google document: “We Have No Moat, And Neither Does OpenAI”

SemiAnalysis published something of a bombshell leaked document this morning: Google “We Have No Moat, And Neither Does OpenAI”.

[... 1,073 words]

Midjourney 5.1

Midjourney released version 5.1 of their image generation model on Tuesday. Here’s their announcement on Twitter—if you have a Discord account there’s a more detailed Discord announcement here.

[... 396 words]

At this point the lawsuits seem a bit far-fetched: “You should have warned us months ago that artificial intelligence would hurt your business” is unfair given how quickly ChatGPT has exploded from nowhere to become a cultural and business phenomenon. But now everyone is on notice! If you are not warning your shareholders now about how AI could hurt your business, and then it does hurt your business, you’re gonna get sued.

Matt Levine

# 3rd May 2023, 9:04 pm / chatgpt, ai, generative-ai, matt-levine

OpenLLaMA. The first openly licensed model I’ve seen trained on the RedPajama dataset. This initial release is a 7B model trained on 200 billion tokens, but the team behind it are promising a full 1 trillion token model in the near future. I haven’t found a live demo of this one running anywhere yet.

# 3rd May 2023, 8:58 pm / generative-ai, llama, ai, homebrew-llms, llms, redpajama

replit-code-v1-3b (via) As promised last week, Replit have released their 2.7b “Causal Language Model”, a foundation model trained from scratch in partnership with MosaicML with a focus on code completion. It’s licensed CC BY-SA-4.0 and is available for commercial use. Their repo includes a live demo, and initial experiments with it look good: you could absolutely run a local GitHub Copilot style editor on top of this model.

# 3rd May 2023, 8:09 pm / llms, ai, homebrew-llms, generative-ai

We show for the first time that large-scale generative pretrained transformer (GPT) family models can be pruned to at least 50% sparsity in one-shot, without any retraining, at minimal loss of accuracy. [...] We can execute SparseGPT on the largest available open-source models, OPT-175B and BLOOM-176B, in under 4.5 hours, and can reach 60% unstructured sparsity with negligible increase in perplexity: remarkably, more than 100 billion weights from these models can be ignored at inference time.

SparseGPT, by Elias Frantar and Dan Alistarh

# 3rd May 2023, 7:48 pm / llms, ai, generative-ai, bloom, homebrew-llms

Prompt injection explained, with video, slides, and a transcript

I participated in a webinar this morning about prompt injection, organized by LangChain and hosted by Harrison Chase, with Willem Pienaar, Kojin Oshiba (Robust Intelligence), and Jonathan Cohen and Christopher Parisien (Nvidia Research).

[... 3,120 words]

Amnesty Uses Warped, AI-Generated Images to Portray Police Brutality in Colombia. I saw massive backlash against Amnesty Norway for this on Twitter, where people argued that using AI-generated images to portray human rights violations like this undermines Amnesty’s credibility. I agree: I think this is a very risky move. An Amnesty spokesperson told VICE Motherboard that they did this to provide coverage “without endangering anyone who was present”, since many protestors who participated in the national strike covered their faces to avoid being identified.

# 1st May 2023, 9:32 pm / ai, ethics, generative-ai

Let’s be bear or bunny

The Machine Learning Compilation group (MLC) are my favourite team of AI researchers at the moment.

[... 599 words]

Enriching data with GPT3.5 and SQLite SQL functions

I shipped openai-to-sqlite 0.3 yesterday with a fun new feature: you can now use the command-line tool to enrich data in a SQLite database by running values through an OpenAI model and saving the results, all in a single SQL query.

[... 1,219 words]
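
The general mechanism that makes this kind of feature possible is that SQLite lets the host program register custom SQL functions. Here is a minimal sketch using `create_function` from Python's standard `sqlite3` module. The `fake_summarize` function is a hypothetical stand-in for a model call, and the `notes` table is invented for illustration. This is not the openai-to-sqlite implementation, and the tool's real function names may differ.

```python
import sqlite3


def fake_summarize(text):
    """Hypothetical stand-in for a call to a language model."""
    if text is None:
        return None
    # Pretend the "summary" is just the first sentence.
    return text.split(".")[0].strip() + "."


conn = sqlite3.connect(":memory:")
# Register the Python function so SQL queries can call it by name.
conn.create_function("fake_summarize", 1, fake_summarize)

conn.execute("CREATE TABLE notes (id INTEGER PRIMARY KEY, body TEXT, summary TEXT)")
conn.executemany(
    "INSERT INTO notes (body) VALUES (?)",
    [
        ("SQLite is embedded. It needs no server.",),
        ("Datasette publishes data. It is open source.",),
    ],
)

# A single SQL statement runs every value through the function and saves the result.
conn.execute("UPDATE notes SET summary = fake_summarize(body)")
summaries = [row[0] for row in conn.execute("SELECT summary FROM notes ORDER BY id")]
```

Swapping the stand-in for a real API call would keep the same shape: the SQL stays a single `UPDATE`, and the per-row work happens inside the registered function.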

MLC LLM (via) From MLC, the team that gave us Web LLM and Web Stable Diffusion. “MLC LLM is a universal solution that allows any language model to be deployed natively on a diverse set of hardware backends and native applications”. I installed their iPhone demo from TestFlight this morning and it does indeed provide an offline LLM that runs on my phone. It’s reasonably capable—the underlying model for the app is vicuna-v1-7b, a LLaMA derivative.

# 29th April 2023, 5:43 pm / iphone, generative-ai, llama, ai, homebrew-llms, llms, mlc

IF by DeepFloyd Lab (via) New image generation AI model, financially backed by StabilityAI but based on the Google Imagen paper. Claims to be much better at following complex prompts, including being able to generate text! I tried the Colab notebook with “a photograph of raccoon in the woods holding a sign that says ’I will eat your trash’” and it didn’t quite get the text right, see via link for the result.

# 28th April 2023, 7:34 pm / stable-diffusion, ai, generative-ai

How prompt injection attacks hijack today’s top-end AI – and it’s really tough to fix. Thomas Claburn interviewed me about prompt injection for the Register. Lots of direct quotes from our phone call in here—we went pretty deep into why it’s such a difficult problem to address.

# 26th April 2023, 6:04 pm / interviews, prompt-engineering, prompt-injection, security, llms, ai, generative-ai

The Dual LLM pattern for building AI assistants that can resist prompt injection

I really want an AI assistant: a Large Language Model powered chatbot that can answer questions and perform actions for me based on access to my private data and tools.

[... 2,547 words]

A lot of people who claim to be doing prompt engineering today are actually just blind prompting. "Blind Prompting" is a term I am using to describe the method of creating prompts with a crude trial-and-error approach paired with minimal or no testing and a very surface level knowledge of prompting. Blind prompting is not prompt engineering. [...] In this blog post, I will make the argument that prompt engineering is a real skill that can be developed based on real experimental methodologies.

Mitchell Hashimoto

# 23rd April 2023, 4:08 am / prompt-engineering, llms, ai, generative-ai

Other tech-friendly journalists I know have been going through something similar: Suddenly, we’ve got something like a jetpack to strap to our work. Sure, the jetpack is kinda buggy. Yes, sometimes it crashes and burns. And the rules for its use aren’t clear, so you’ve got to be super careful with it. But sometimes it soars, shrinking tasks that would have taken hours down to mere minutes, sometimes minutes to seconds.

Farhad Manjoo

# 21st April 2023, 8:41 pm / chatgpt, journalism, ai, generative-ai

The AI Writing thing is just pivot to video all over again, a bunch of dead-eyed corporate types willing to listen to any snake oil salesman who offers them higher potential profits. It'll crash in a year but scuttle hundreds of livelihoods before it does.

Dan Sheehan

# 21st April 2023, 4:38 pm / ai, ethics, generative-ai

Bard now helps you code (via) Google have enabled Bard’s code generation abilities—these were previously only available through jailbreaking. It’s pretty good—I got it to write me code to download a CSV file and insert it into a SQLite database—though when I challenged it to protect against SQL injection it hallucinated a non-existent “cursor.prepare()” method. Generated code can be exported to a Colab notebook with a click.
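
For reference, Python's standard `sqlite3` module has no `cursor.prepare()` method. Protection against SQL injection comes from passing values as query parameters instead of splicing them into the SQL string. Here is a minimal sketch, assuming an inline CSV string in place of a downloaded file and a hypothetical `people` table. It is not the code Bard produced.

```python
import csv
import io
import sqlite3

# Hypothetical CSV content standing in for a downloaded file.
# The second row contains a classic injection attempt.
CSV_TEXT = "name,city\nAlice,Paris\nRobert'); DROP TABLE people;--,London\n"


def load_csv(conn, csv_text):
    """Insert CSV rows using ? placeholders so values are never spliced into SQL."""
    conn.execute("CREATE TABLE IF NOT EXISTS people (name TEXT, city TEXT)")
    rows = [(r["name"], r["city"]) for r in csv.DictReader(io.StringIO(csv_text))]
    conn.executemany("INSERT INTO people (name, city) VALUES (?, ?)", rows)
    conn.commit()
    return len(rows)


conn = sqlite3.connect(":memory:")
inserted = load_csv(conn, CSV_TEXT)
names = [row[0] for row in conn.execute("SELECT name FROM people ORDER BY rowid")]
```

Because the hostile value travels as a bound parameter, it is stored as an ordinary string and the table survives.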

# 21st April 2023, 3:32 pm / google, generative-ai, bard, ai, llms

Stability AI Launches the First of its StableLM Suite of Language Models (via) 3B and 7B base models, with 15B and 30B on the way. CC BY-SA-4.0. “StableLM is trained on a new experimental dataset built on The Pile, but three times larger with 1.5 trillion tokens of content. We will release details on the dataset in due course.”

# 19th April 2023, 3:47 pm / stable-diffusion, generative-ai, ai, homebrew-llms, llms

Inside the secret list of websites that make AI chatbots sound smart. Washington Post story digging into the C4 dataset—Colossal Clean Crawled Corpus, a filtered version of Common Crawl that’s often used for training large language models. They include a neat interactive tool for searching a domain to see if it’s included—TIL that simonwillison.net is the 106,649th ranked site in C4 by number of tokens, 189,767 total—0.0001% of the total token volume in C4.

# 19th April 2023, 1:35 pm / washington-post, llms, ai, generative-ai, training-data

LLaVA: Large Language and Vision Assistant (via) Yet another multi-modal model combining a vision model (pre-trained CLIP ViT-L/14) and a LLaMA derivative model (Vicuna). The results I get from their demo are even more impressive than MiniGPT-4. Also includes a new training dataset, LLaVA-Instruct-150K, derived from GPT-4 and subject to the same warnings about the OpenAI terms of service.

# 19th April 2023, 1:14 am / generative-ai, llama, computer-vision, ai, llms, vicuna

What’s in the RedPajama-Data-1T LLM training set

RedPajama is “a project to create leading open-source models, starts by reproducing LLaMA training dataset of over 1.2 trillion tokens”. It’s a collaboration between Together, Ontocord.ai, ETH DS3Lab, Stanford CRFM, Hazy Research, and MILA Québec AI Institute.

[... 1,077 words]

RedPajama, a project to create leading open-source models, starts by reproducing LLaMA training dataset of over 1.2 trillion tokens. Given the number of projects that have used LLaMA as a foundation model since its release two months ago (despite its non-commercial license), it’s clear that there is a strong desire for a fully openly licensed alternative.

RedPajama is a collaboration between Together, Ontocord.ai, ETH DS3Lab, Stanford CRFM, Hazy Research, and MILA Québec AI Institute aiming to build exactly that.

Step one is gathering the training data: the LLaMA paper described a 1.2 trillion token training set gathered from sources that included Wikipedia, Common Crawl, GitHub, arXiv, Stack Exchange and more.

RedPajama-Data-1T is an attempt at recreating that training set. It’s now available to download, as 2,084 separate multi-GB jsonl files—2.67TB total.
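
Files of this size are usually processed as a stream, one JSON object per line, and not loaded whole into memory. Here is a minimal sketch of that approach. It assumes each record has a `text` field, which is an assumption and not taken from the dataset documentation, and it uses a tiny temporary sample file so it runs without downloading anything.

```python
import json
import os
import tempfile


def iter_jsonl(path):
    """Yield one parsed record per non-empty line without loading the whole file."""
    with open(path, "r", encoding="utf-8") as f:
        for line in f:
            line = line.strip()
            if line:
                yield json.loads(line)


# Write a tiny sample file; the "text" field name is a labeled assumption.
sample = [{"text": "hello world"}, {"text": "open training data"}]
with tempfile.NamedTemporaryFile(
    "w", suffix=".jsonl", delete=False, encoding="utf-8"
) as tmp:
    for record in sample:
        tmp.write(json.dumps(record) + "\n")
    sample_path = tmp.name

# Accumulate simple statistics while streaming through the file.
record_count = 0
char_count = 0
for record in iter_jsonl(sample_path):
    record_count += 1
    char_count += len(record.get("text", ""))

os.remove(sample_path)
```

The same generator works unchanged on a multi-GB file, since only one line is held in memory at a time.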

Even without a trained model, this is a hugely influential contribution to the world of open source LLMs. Any team looking to build their own LLaMA from scratch can now jump straight to the next stage, training the model.

# 17th April 2023, 5:13 pm / open-source, generative-ai, llama, ai, homebrew-llms, llms, redpajama, training-data

Latest Twitter search results for “as an AI language model” (via) Searching for “as an AI language model” on Twitter reveals hundreds of bot accounts which are clearly being driven by GPT models and have been asked to generate content which occasionally trips the ethical guidelines trained into the OpenAI models.

If Twitter still had an affordable search API someone could do some incredible disinformation research on top of this, looking at which accounts are implicated, what kinds of things they are tweeting about, who they follow and retweet and so on.

# 17th April 2023, 2:28 pm / twitter, ethics, generative-ai, openai, disinformation, ai

MiniGPT-4 (via) An incredible project with a poorly chosen name. A team from King Abdullah University of Science and Technology in Saudi Arabia combined Vicuna-13B (a model fine-tuned on top of Facebook’s LLaMA) with the BLIP-2 vision-language model to create a model that can conduct ChatGPT-style conversations around an uploaded image. The demo is very impressive, and the weights are available to download—45MB for MiniGPT-4, but you’ll need the much larger Vicuna and LLaMA weights as well.

# 17th April 2023, 2:21 pm / llms, ai, generative-ai, homebrew-llms, computer-vision, vicuna

How I Used Stable Diffusion and Dreambooth to Create A Painted Portrait of My Dog (via) I like posts like this that go into detail in terms of how much work it takes to deliberately get the kind of result you really want using generative AI tools. Jake Dahn trained a Dreambooth model from 40 photos of Queso—his photogenic Golden Retriever—using Replicate, then gathered the prompts from ten images he liked on Lexica and generated over 1,000 different candidate images, picked his favourite, used Draw Things img2img resizing to expand the image beyond the initial crop, then Automatic1111 inpainting to tweak the ears, then Real-ESRGAN 4x+ to upscale for the final print.

# 16th April 2023, 7:57 pm / stable-diffusion, ai, generative-ai, replicate

Web LLM runs the vicuna-7b Large Language Model entirely in your browser, and it’s very impressive

A month ago I asked “Could you train a ChatGPT-beating model for $85,000 and run it in a browser?” $85,000 was a hypothetical training cost for LLaMA 7B plus Stanford Alpaca. “Run it in a browser” was based on the fact that Web Stable Diffusion runs a 1.9GB Stable Diffusion model in a browser, so maybe it’s not such a big leap to run a small Large Language Model there as well.

[... 2,276 words]

Although fine-tuning can feel like the more natural option—training on data is how GPT learned all of its other knowledge, after all—we generally do not recommend it as a way to teach the model knowledge. Fine-tuning is better suited to teaching specialized tasks or styles, and is less reliable for factual recall. [...] In contrast, message inputs are like short-term memory. When you insert knowledge into a message, it's like taking an exam with open notes. With notes in hand, the model is more likely to arrive at correct answers.

Ted Sanders, OpenAI

# 15th April 2023, 1:44 pm / prompt-engineering, gpt-3, generative-ai, openai, gpt-4, ai, llms, fine-tuning