Simon Willison’s Weblog


363 items tagged “llms”


Prompt Engineering. Extremely detailed introduction to the field of prompt engineering by Lilian Weng, who leads applied research at OpenAI. # 21st March 2023, 5:12 pm

OpenAI to discontinue support for the Codex API (via) OpenAI shutting off access to their Codex model—a GPT3 variant fine-tuned for code related tasks, but that was being used for all sorts of other purposes—partly because it had been in a beta phase for over a year where OpenAI didn’t charge anything for it. This feels to me like a major strategic misstep for OpenAI: they’re only giving three days notice, which is shaking people’s confidence in them as a stable platform for building on at the very moment when competition from other vendors (and open source alternatives) is heating up. # 21st March 2023, 5:04 pm

Was on a plane yesterday, studying some physics; got confused about something and I was able to solve my problem by just asking alpaca-13B—running locally on my machine—for an explanation. Felt straight-up spooky.

Andy Matuschak # 21st March 2023, 2:45 pm

A conversation about prompt engineering with CBC Day 6

I’m on Canadian radio this morning! I was interviewed by Peter Armstrong for CBC Day 6 about the developing field of prompt engineering.

[... 1742 words]

Fine-tune LLaMA to speak like Homer Simpson. Replicate spent 90 minutes fine-tuning LLaMA on 60,000 lines of dialog from the first 12 seasons of the Simpsons, and now it can do a good job of producing invented dialog from any of the characters from the series. This is a really interesting result: I’ve been skeptical about how much value can be had from fine-tuning large models on just a tiny amount of new data, assuming that the new data would be statistically irrelevant compared to the existing model. Clearly my mental model around this was incorrect. # 17th March 2023, 11:08 pm

The Unpredictable Abilities Emerging From Large AI Models (via) Nice write-up of the most interesting aspect of large language models: the fact that they gain emergent abilities at certain “breakthrough” size points, and no-one is entirely sure they understand why. # 17th March 2023, 10:54 pm

Could you train a ChatGPT-beating model for $85,000 and run it in a browser?

I think it’s now possible to train a large language model with similar functionality to GPT-3 for $85,000. And I think we might soon be able to run the resulting model entirely in the browser, and give it capabilities that leapfrog it ahead of ChatGPT.

[... 1751 words]

The surprising ease and effectiveness of AI in a loop (via) Matt Webb on the langchain Python library and the ReAct design pattern, where you plug additional tools into a language model by teaching it to work in a “Thought... Act... Observation” loop where the Act specifies an action it wishes to take (like searching Wikipedia) and an extra layer of software than carries out that action and feeds back the result as the Observation. Matt points out that the ChatGPT 1/10th price drop makes this kind of model usage enormously more cost effective than it was before. # 17th March 2023, 12:04 am

Transformers.js. Hugging Face Transformers is a library of Transformer machine learning models plus a Python package for loading and running them. Transformers.js provides a JavaScript alternative interface which runs in your browser, thanks to a set of precompiled WebAssembly binaries for a selection of models. This interactive demo is incredible: in particular, try running the Image classification with google/vit-base-patch16-224 (91MB) model against any photo to get back labels representing that photo. Dropping one of these models onto a page is as easy as linking to a hosted CDN script and running a few lines of JavaScript. # 16th March 2023, 11:41 pm

Train and run Stanford Alpaca on your own machine. The team at Replicate managed to train their own copy of Stanford’s Alpaca—a fine-tuned version of LLaMA that can follow instructions like ChatGPT. Here they provide step-by-step instructions for recreating Alpaca yourself—running the training needs one or more A100s for a few hours, which you can rent through various cloud providers. # 16th March 2023, 4:10 pm

As an NLP researcher I’m kind of worried about this field after 10-20 years. Feels like these oversized LLMs are going to eat up this field and I’m sitting in my chair thinking, “What’s the point of my research when GPT-4 can do it better?”

Jeonghwan Kim # 16th March 2023, 5:39 am

I expect GPT-4 will have a LOT of applications in web scraping

The increased 32,000 token limit will be large enough to send it the full DOM of most pages, serialized to HTML—then ask questions to extract data

Or... take a screenshot and use the GPT4 image input mode to ask questions about the visually rendered page instead!

Might need to dust off all of those old semantic web dreams, because the world’s information is rapidly becoming fully machine readable

Me # 16th March 2023, 1:09 am

bloomz.cpp (via) Nouamane Tazi Adapted the llama.cpp project to run against the BLOOM family of language models, which were released in July 2022 and trained in France on 45 natural languages and 12 programming languages using the Jean Zay Public Supercomputer, provided by the French government and powered using mostly nuclear energy.

It’s under the RAIL license which allows (limited) commercial use, unlike LLaMA.

Nouamane reports getting 16 tokens/second from BLOOMZ-7B1 running on an M1 Pro laptop. # 16th March 2023, 12:24 am

“AI” has for recent memory been a marketing term anyway. Deep learning and variations have had a good run at being what people mean when they refer to AI, probably overweighting towards big convolution based computer vision models.

Now, “AI” in people’s minds means generative models.

That’s it, it doesn’t mean generative models are replacing CNNs, just like CNNs don’t replace SVMs or regression or whatever. It’s just that pop culture has fallen in love with something else.

version_five # 15th March 2023, 9:05 pm

We call on the field to recognize that applications that aim to believably mimic humans bring risk of extreme harms. Work on synthetic human behavior is a bright line in ethical Al development, where downstream effects need to be understood and modeled in order to block foreseeable harm to society and different social groups.

Emily M. Bender, Timnit Gebru, Angelina McMillan-Major, Shmargaret Shmitchell # 15th March 2023, 3:30 pm

GPT-4 Developer Livestream. 25 minutes of live demos from OpenAI co-founder Greg Brockman at the GPT-4 launch. These demos are all fascinating, including code writing and multimodal vision inputs. The one that really struck me is when Greg pasted in a copy of the tax code and asked GPT-4 to answer some sophisticated tax questions, involving step-by-step calculations that cited parts of the tax code it was working with. # 15th March 2023, 12:20 am

GPT-4 Technical Report (PDF). 98 pages of much more detailed information about GPT-4. The appendices are particularly interesting, including examples of advanced prompt engineering as well as examples of harmful outputs before and after tuning attempts to try and suppress them. # 14th March 2023, 9:39 pm

We’ve created GPT-4, the latest milestone in OpenAI’s effort in scaling up deep learning. GPT-4 is a large multimodal model (accepting image and text inputs, emitting text outputs) that, while less capable than humans in many real-world scenarios, exhibits human-level performance on various professional and academic benchmarks. [...] We’ve spent 6 months iteratively aligning GPT-4 using lessons from our adversarial testing program as well as ChatGPT, resulting in our best-ever results (though far from perfect) on factuality, steerability, and refusing to go outside of guardrails.

OpenAI # 14th March 2023, 5:02 pm

Int-4 LLaMa is not enough—Int-3 and beyond (via) The Nolano team are experimenting with reducing the size of the LLaMA models even further than the 4bit quantization popularized by llama.cpp. # 13th March 2023, 11:55 pm

Stanford Alpaca, and the acceleration of on-device large language model development

On Saturday 11th March I wrote about how Large language models are having their Stable Diffusion moment. Today is Monday. Let’s look at what’s happened in the past three days.

[... 2055 words]

We introduce Alpaca 7B, a model fine-tuned from the LLaMA 7B model on 52K instruction-following demonstrations. Alpaca behaves similarly to OpenAI’s text-davinci-003, while being surprisingly small and easy/cheap to reproduce (<600$).

Alpaca: A Strong Open-Source Instruction-Following Model # 13th March 2023, 6:18 pm

I’ve successfully run LLaMA 7B model on my 4GB RAM Raspberry Pi 4. It’s super slow about 10sec/token. But it looks we can run powerful cognitive pipelines on a cheap hardware.

Artem Andreenko # 12th March 2023, 6:22 pm

ChatGPT’s API is So Good and Cheap, It Makes Most Text Generating AI Obsolete (via) Max Woolf on the quite frankly weird economics of the ChatGPT API: it’s 1/10th the price of GPT-3 Da Vinci and appears to be equivalent (if not more) capable. “But it is very hard to economically justify not using ChatGPT as a starting point for a business need and migrating to a more bespoke infrastructure later as needed, and that’s what OpenAI is counting on. [...] I don’t envy startups whose primary business is text generation right now.” # 11th March 2023, 11:05 pm

Large language models are having their Stable Diffusion moment

The open release of the Stable Diffusion image generation model back in August 2022 was a key moment. I wrote how Stable Diffusion is a really big deal at the time.

[... 1810 words]

Running LLaMA 7B on a 64GB M2 MacBook Pro with llama.cpp. I got Facebook’s LLaMA 7B to run on my MacBook Pro using llama.cpp (a “port of Facebook’s LLaMA model in C/C++”) by Georgi Gerganov. It works! I’ve been hoping to run a GPT-3 class language model on my own hardware for ages, and now it’s possible to do exactly that. The model itself ends up being just 4GB after applying Georgi’s script to “quantize the model to 4-bits”. # 11th March 2023, 4:19 am

What could I do with a universal function — a tool for turning just about any X into just about any Y with plain language instructions?

Robin Sloan # 10th March 2023, 6:02 pm

ChatGPT can’t access the internet, even though it really looks like it can

A really common misconception about ChatGPT is that it can access URLs. I’ve seen many different examples of people pasting in a URL and asking for a summary, or asking it to make use of the content on that page in some way.

[... 1678 words]

Weeknotes: NICAR, and an appearance on KQED Forum

I spent most of this week at NICAR 2023, the data journalism conference hosted this year in Nashville, Tennessee.

[... 1941 words]

Since November, OpenAI has already updated ChatGPT several times. The researchers are using a technique called adversarial training to stop ChatGPT from letting users trick it into behaving badly (known as jailbreaking). This work pits multiple chatbots against each other: one chatbot plays the adversary and attacks another chatbot by generating text to force it to buck its usual constraints and produce unwanted responses. Successful attacks are added to ChatGPT’s training data in the hope that it learns to ignore them.

The inside story of how ChatGPT was built # 5th March 2023, 10:04 pm

How to Wrap Our Heads Around These New Shockingly Fluent Chatbots. I was a guest on KQED Forum this morning, a live radio documentary and call-in show hosted by Alexis Madrigal. Ted Chiang and Claire Leibowicz were the other guests: we talked about ChatGPT and and the new generation of AI-powered tools. # 3rd March 2023, 4:59 am