Simon Willison’s Weblog

Subscribe

Items tagged llms in Jan, 2024

Filters: Year: 2024 × Month: Jan × llms × Sorted by date


Simon Willison interview: AI software still needs the human touch. Thomas Claburn interviewed me for The Resister. We talked about AI training copyright, applications of AI for programming, AI security and a whole bunch of other topics. # 27th January 2024, 10:08 pm

Danielle Del, a spokeswoman for Sasso, said Dudesy is not actually an A.I.

“It’s a fictional podcast character created by two human beings, Will Sasso and Chad Kultgen,” Del wrote in an email. “The YouTube video ‘I’m Glad I’m Dead’ was completely written by Chad Kultgen.”

George Carlin’s Estate Sues Podcasters Over A.I. Episode # 27th January 2024, 5:52 pm

The Articulation Barrier: Prompt-Driven AI UX Hurts Usability. Jakob Nielsen: “Generative AI systems like ChatGPT use prose prompts for intent-based outcomes, requiring users to be articulate in writing prose, which is a challenge for half of the population in rich countries.” # 27th January 2024, 3:49 pm

If you have had any prior experience with personal computers, what you might expect to see is some sort of opaque code, called a “prompt,” consisting of phosphorescent green or white letters on a murky background. What you see with Macintosh is the Finder. On a pleasant, light background (you can later change the background to any of a number of patterns, if you like), little pictures called “icons” appear, representing choices available to you.

Steven Levy (in 1984) # 27th January 2024, 1:33 am

LLM 0.13: The annotated release notes

I just released LLM 0.13, the latest version of my LLM command-line tool for working with Large Language Models—both via APIs and running models locally using plugins.

[... 1278 words]

Did an AI write that hour-long “George Carlin” special? I’m not convinced. Two weeks ago “Dudesy”, a comedy podcast which claims to be controlled and written by an AI, released an extremely poor taste hour long YouTube video called “George Carlin: I’m Glad I’m Dead”. They used voice cloning to produce a stand-up comedy set featuring the late George Carlin, claiming to also use AI to write all of the content after training it on everything in the Carlin back catalog.

Unsurprisingly this has resulted in a massive amount of angry coverage, including from Carlin’s own daughter (the Carlin estate have filed a lawsuit). Resurrecting people without their permission is clearly abhorrent.

But... did AI even write this? The author of this piece, Kyle Orland, started digging in.

It turns out the Dudesy podcast has been running with this premise since it launched in early 2022—long before any LLM was capable of producing a well-crafted joke. The structure of the Carlin set goes way beyond anything I’ve seen from even GPT-4. And in a follow-up podcast episode, Dudesy co-star Chad Kultgen gave an O. J. Simpson-style “if I did it” semi-confession that described a much more likely authorship process.

I think this is a case of a human-pretending-to-be-an-AI—an interesting twist, given that the story started out being about an-AI-imitating-a-human.

I consulted with Kyle on this piece, and got a couple of neat quotes in there:

“Either they have genuinely trained a custom model that can generate jokes better than any model produced by any other AI researcher in the world... or they’re still doing the same bit they started back in 2022”

“The real story here is… everyone is ready to believe that AI can do things, even if it can’t. In this case, it’s pretty clear what’s going on if you look at the wider context of the show in question. But anyone without that context, [a viewer] is much more likely to believe that the whole thing was AI-generated… thanks to the massive ramp up in the quality of AI output we have seen in the past 12 months.”

Update 27th January 2024: The NY Times confirmed via a spokesperson for the podcast that the entire special had been written by Chad Kultgen, not by an AI. # 26th January 2024, 4:52 am

Django Chat: Datasette, LLMs, and Django. I’m the guest on the latest episode of the Django Chat podcast. We talked about Datasette, LLMs, the New York Times OpenAI lawsuit, the Python Software Foundation and all sorts of other topics. # 24th January 2024, 8:41 pm

Prompt Lookup Decoding (via) Really neat LLM optimization trick by Apoorv Saxena, who observed that it’s common for sequences of tokens in LLM input to be reflected by the output—snippets included in a summarization, for example.

Apoorv’s code performs a simple search for such prefixes and uses them to populate a set of suggested candidate IDs during LLM token generation.

The result appears to provide around a 2.4x speed-up in generating outputs! # 23rd January 2024, 2:14 am

AWS Fixes Data Exfiltration Attack Angle in Amazon Q for Business. An indirect prompt injection (where the AWS Q bot consumes malicious instructions) could result in Q outputting a markdown link to a malicious site that exfiltrated the previous chat history in a query string.

Amazon fixed it by preventing links from being output at all—apparently Microsoft 365 Chat uses the same mitigation. # 19th January 2024, 12:02 pm

Talking about Open Source LLMs on Oxide and Friends

I recorded an episode of the Oxide and Friends podcast on Monday, talking with Bryan Cantrill and Adam Leventhal about Open Source LLMs.

[... 1995 words]

Open Source LLMs with Simon Willison. I was invited to the Oxide and Friends weekly audio show (previously on Twitter Spaces, now using broadcast using Discord) to talk about open source LLMs, and to respond to a very poorly considered op-ed calling for them to be regulated as “uniquely dangerous”. It was a really fun conversation, now available to listen to as a podcast or YouTube audio-only video. # 17th January 2024, 8:53 pm

More than an OpenAI Wrapper: Perplexity Pivots to Open Source. I’m increasingly impressed with Perplexity.ai—I’m using it on a daily basis now. It’s by far the best implementation I’ve seen of LLM-assisted search—beating Microsoft Bing and Google Bard at their own game.

A year ago it was implemented as a GPT 3.5 powered wrapper around Microsoft Bing. To my surprise they’ve now evolved way beyond that: Perplexity has their own search index now and is running their own crawlers, and they’re using variants of Mistral 7B and Llama 70B as their models rather than continuing to depend on OpenAI. # 13th January 2024, 6:12 am

Budgeting with ChatGPT (via) Jon Callahan describes an ingenious system he set up to categorize his credit card transactions using GPT 3.5. He has his bank email him details of any transaction over $0, then has an email filter to forward those to Postmark, which sends them via a JSON webhook to a custom Deno Deploy app which cleans the transaction up with a GPT 3.5 prompt (including guessing the merchant) and submits the results to a base in Airtable. # 11th January 2024, 4:40 am

AI versus old-school creativity: a 50-student, semester-long showdown (via) An interesting study in which 50 university students “wrote, coded, designed, modeled, and recorded creations with and without AI, then judged the results”.

This study seems to explore the approach of incremental prompting to produce an AI-driven final results. I use GPT-4 on a daily basis but my usage patterns are quite different: I very rarely let it actually write anything for me, instead using it as brainstorming partner, or to provide feedback, or as API reference or a thesaurus. # 10th January 2024, 11:49 pm

You Can Build an App in 60 Minutes with ChatGPT, with Geoffrey Litt (via) YouTube interview between Dan Shipper and Geoffrey Litt. They talk about how ChatGPT can build working React applications and how this means you can build extremely niche applications that you woudn’t have considered working on before—then to demonstrate that idea, they collaborate to build a note-taking app to be used just during that specific episode recording, pasting React code from ChatGPT into Replit.

Geoffrey: “I started wondering what if we had a world where everybody could craft software tools that match the workflows they want to have, unique to themselves and not just using these pre-made tools. That’s what malleable software means to me.” # 10th January 2024, 11:41 pm

The Random Transformer (via) “Understand how transformers work by demystifying all the math behind them”—Omar Sanseviero from Hugging Face meticulously implements the transformer architecture behind LLMs from scratch using Python and numpy. There’s a lot to take in here but it’s all very clearly explained. # 10th January 2024, 5:09 am

WikiChat: Stopping the Hallucination of Large Language Model Chatbots by Few-Shot Grounding on Wikipedia. This paper describes a really interesting LLM system that runs Retrieval Augmented Generation against Wikipedia to help answer questions, but includes a second step where facts in the answer are fact-checked against Wikipedia again before returning an answer to the user. They claim “97.3% factual accuracy of its claims in simulated conversation” on a GPT-4 backed version, and also see good results when backed by LLaMA 7B.

The implementation is mainly through prompt engineering, and detailed examples of the prompts they used are included at the end of the paper. # 9th January 2024, 9:30 pm

What I should have said about the term Artificial Intelligence

With the benefit of hindsight, I did a bad job with my post, It’s OK to call it Artificial Intelligence a few days ago.

[... 376 words]

Mixtral of Experts. The Mixtral paper is out, exactly a month after the release of the Mixtral 8x7B model itself. Thanks to the paper I now have a reasonable understanding of how a mixture of experts model works: each layer has 8 available blocks, but a router model selects two out of those eight for each token passing through that layer and combines their output. “As a result, each token has access to 47B parameters, but only uses 13B active parameters during inference.”

The Mixtral token context size is an impressive 32k, and it compares extremely well against the much larger Llama 70B across a whole array of benchmarks.

Unsurprising but disappointing: there’s nothing in the paper at all about what it was trained on. # 9th January 2024, 4:03 am

OpenAI and journalism. Bit of a misleading title here: this is OpenAI’s first public response to the lawsuit filed by the New York Times concerning their use of unlicensed NYT content to train their models. # 8th January 2024, 6:33 pm

We believe that AI tools are at their best when they incorporate and represent the full diversity and breadth of human intelligence and experience. [...] Because copyright today covers virtually every sort of human expression– including blog posts, photographs, forum posts, scraps of software code, and government documents–it would be impossible to train today’s leading AI models without using copyrighted materials. Limiting training data to public domain books and drawings created more than a century ago might yield an interesting experiment, but would not provide AI systems that meet the needs of today’s citizens.

OpenAI to the Lords Select Committee on LLMs # 8th January 2024, 5:33 pm

Does GPT-2 Know Your Phone Number? (via) This report from Berkeley Artificial Intelligence Research in December 2020 showed GPT-3 outputting a full page of chapter 3 of Harry Potter and the Philosopher’s Stone—similar to how the recent suit from the New York Times against OpenAI and Microsoft demonstrates memorized news articles from that publication as outputs from GPT-4. # 8th January 2024, 5:26 am

It’s OK to call it Artificial Intelligence

Update 9th January 2024: This post was clumsily written and failed to make the point I wanted it to make. I’ve published a follow-up, What I should have said about the term Artificial Intelligence which you should read instead.

[... 1818 words]

GPT in 500 lines of SQL (via) Utterly brilliant piece of PostgreSQL hackery by Alex Bolenok, who implements a full GPT-2 style language model in SQL on top of pg_vector. The final inference query is 498 lines long! # 6th January 2024, 10:55 pm

Microsoft Research relicense Phi-2 as MIT (via) Phi-2 was already an interesting model—really strong results for its size—made available under a non-commercial research license. It just got significantly more interesting: Microsoft relicensed it as MIT open source. # 6th January 2024, 6:06 am

Adversarial Machine Learning: A Taxonomy and Terminology of Attacks and Mitigations (via) NIST—the National Institute of Standards and Technology, a US government agency, released a 106 page report on attacks against modern machine learning models, mostly covering LLMs.

Prompt injection gets two whole sections, one on direct prompt injection (which incorporates jailbreaking as well, which they misclassify as a subset of prompt injection) and one on indirect prompt injection.

They talk a little bit about mitigations, but for both classes of attack conclude: “Unfortunately, there is no comprehensive or foolproof solution for protecting models against adversarial prompting, and future work will need to be dedicated to investigating suggested defenses for their efficacy.” # 6th January 2024, 4:08 am

My blog’s year archive pages now have tag clouds (via) Inspired by the tag cloud I used in my recent 2023 AI roundup post, I decided to add a tag cloud to the top of every one of my archive-by-year pages showing what topics I had spent the most time with that year.

I already had old code for this, so I pasted it into GPT-4 along with an example of the output of my JSON endpoint from Django SQL Dashboard and had it do most of the work for me. # 4th January 2024, 9:02 pm

Since the advent of ChatGPT, and later by using LLMs that operate locally, I have made extensive use of this new technology. The goal is to accelerate my ability to write code, but that’s not the only purpose. There’s also the intent to not waste mental energy on aspects of programming that are not worth the effort.

[...] Current LLMs will not take us beyond the paths of knowledge, but if we want to tackle a topic we do not know well, they can often lift us from our absolute ignorance to the point where we know enough to move forward on our own.

Salvatore Sanfilippo # 2nd January 2024, 2:50 pm