Simon Willison’s Weblog

Subscribe
Atom feed for generative-ai

1,109 posts tagged “generative-ai”

Machine learning systems that can generate new content: text, images, audio, video and more.

2024

GPUs on Fly.io are available to everyone! We’ve been experimenting with GPUs on Fly for a few months for Datasette Cloud. They’re well documented and quite easy to use—any example Python code you find that uses NVIDIA CUDA stuff generally Just Works. Most interestingly of all, Fly GPUs can scale to zero—so while they cost $2.50/hr for a A100 40G (VRAM) and $3.50/hr for a A100 80G you can configure them to stop running when the machine runs out of things to do.

We’ve successfully used them to run Whisper and to experiment with running various Llama 2 LLMs as well.

To look forward to: “We are working on getting some lower-cost A10 GPUs in the next few weeks”.

# 14th February 2024, 4:28 am / fly, datasette-cloud, generative-ai, whisper, ai, llms, nvidia, gpus

Aya (via) “A global initiative led by Cohere For AI involving over 3,000 independent researchers across 119 countries. Aya is a state-of-art model and dataset, pushing the boundaries of multilingual AI for 101 languages through open science.”

Both the model and the training data are released under Apache 2. The training data looks particularly interesting: “513 million instances through templating and translating existing datasets across 114 languages”—suggesting the data is mostly automatically generated.

# 13th February 2024, 5:14 pm / open-source, llms, ai, generative-ai, cohere, training-data, llm-release

The unsettling scourge of obituary spam (via) Well this is particularly grim. Apparently “obituary aggregator” sites have been an SEO trick for at least 15 years, and now they’re using generative AI to turn around junk rewritten (and frequently inaccurate) obituaries even faster.

# 13th February 2024, 12:36 am / llms, ai, ethics, generative-ai, ai-ethics

One consideration is that such a deep ML system could well be developed outside of Google-- at Microsoft, Baidu, Yandex, Amazon, Apple, or even a startup. My impression is that the Translate team experienced this. Deep ML reset the translation game; past advantages were sort of wiped out. Fortunately, Google's huge investment in deep ML largely paid off, and we excelled in this new game. Nevertheless, our new ML-based translator was still beaten on benchmarks by a small startup. The risk that Google could similarly be beaten in relevance by another company is highlighted by a startling conclusion from BERT: huge amounts of user feedback can be largely replaced by unsupervised learning from raw text. That could have heavy implications for Google.

Eric Lehman, internal Google email in 2018

# 11th February 2024, 10:59 pm / machine-learning, translation, google, generative-ai, ai, llms, bert

Reality is that LLMs are not AGI -- they're a big curve fit to a very large dataset. They work via memorization and interpolation. But that interpolative curve can be tremendously useful, if you want to automate a known task that's a match for its training data distribution.

Memorization works, as long as you don't need to adapt to novelty. You don't need intelligence to achieve usefulness across a set of known, fixed scenarios.

François Chollet

# 10th February 2024, 6:39 am / llms, ai, generative-ai, francois-chollet

Google’s Gemini Advanced: Tasting Notes and Implications. Ethan Mollick reviews the new Google Gemini Advanced—a rebranded Bard, released today, that runs on the GPT-4 competitive Gemini Ultra model.

“GPT-4 [...] has been the dominant AI for well over a year, and no other model has come particularly close. Prior to Gemini, we only had one advanced AI model to look at, and it is hard drawing conclusions with a dataset of one. Now there are two, and we can learn a few things.”

I like Ethan’s use of the term “tasting notes” here. Reminds me of how Matt Webb talks about being a language model sommelier.

# 8th February 2024, 3:10 pm / ethan-mollick, google, generative-ai, gpt-4, bard, ai, llms, gemini

If your only way of making a painting is to actually dab paint laboriously onto a canvas, then the result might be bad or good, but at least it’s the result of a whole lot of micro-decisions you made as an artist. You were exercising editorial judgment with every paint stroke. That is absent in the output of these programs.

Neal Stephenson

# 7th February 2024, 5:04 pm / neal-stephenson, generative-ai

Open Language Models (OLMos) and the LLM landscape (via) OLMo is a newly released LLM from the Allen Institute for AI (AI2) currently available in 7b and 1b parameters (OLMo-65b is on the way) and trained on a fully openly published dataset called Dolma.

The model and code are Apache 2, while the data is under the “AI2 ImpACT license”.

From the benchmark scores shared here by Nathan Lambert it looks like this may be the highest performing model currently available that was built using a fully documented training set.

What’s in Dolma? It’s mainly Common Crawl, Wikipedia, Project Gutenberg and the Stack.

# 2nd February 2024, 4:11 am / llms, ai, generative-ai, training-data, ai2, llm-release

LLMs may offer immense value to society. But that does not warrant the violation of copyright law or its underpinning principles. We do not believe it is fair for tech firms to use rightsholder data for commercial purposes without permission or compensation, and to gain vast financial rewards in the process. There is compelling evidence that the UK benefits economically, politically and societally from upholding a globally respected copyright regime.

UK House of Lords report on Generative AI

# 2nd February 2024, 3:54 am / politics, ethics, generative-ai, ai, llms, ai-ethics

For many people in many organizations, their measurable output is words - words in emails, in reports, in presentations. We use words as proxy for many things: the number of words is an indicator of effort, the quality of the words is an indicator of intelligence, the degree to which the words are error-free is an indicator of care.

[...] But now every employee with Copilot can produce work that checks all the boxes of a formal report without necessarily representing underlying effort.

Ethan Mollick

# 2nd February 2024, 3:34 am / ethan-mollick, ethics, generative-ai, ai, llms, ai-ethics

teknium/OpenHermes-2.5 (via) The Nous-Hermes and Open Hermes series of LLMs, fine-tuned on top of base models like Llama 2 and Mistral, have an excellent reputation and frequently rank highly on various leaderboards.

The developer behind them, Teknium, just released the full set of fine-tuning data that they curated to build these models. It’s a 2GB JSON file with over a million examples of high quality prompts, responses and some multi-prompt conversations, gathered from a number of different sources and described in the data card.

# 1st February 2024, 4:18 am / llms, ai, fine-tuning, generative-ai, nous-research, llm-release

Simon Willison interview: AI software still needs the human touch. Thomas Claburn interviewed me for The Register. We talked about AI training copyright, applications of AI for programming, AI security and a whole bunch of other topics.

# 27th January 2024, 10:08 pm / generative-ai, interviews, the-register, ai, llms

Danielle Del, a spokeswoman for Sasso, said Dudesy is not actually an A.I.

“It’s a fictional podcast character created by two human beings, Will Sasso and Chad Kultgen,” Del wrote in an email. “The YouTube video ‘I’m Glad I’m Dead’ was completely written by Chad Kultgen.”

George Carlin’s Estate Sues Podcasters Over A.I. Episode

# 27th January 2024, 5:52 pm / llms, ai, ethics, generative-ai, ai-ethics

The Articulation Barrier: Prompt-Driven AI UX Hurts Usability. Jakob Nielsen: “Generative AI systems like ChatGPT use prose prompts for intent-based outcomes, requiring users to be articulate in writing prose, which is a challenge for half of the population in rich countries.”

# 27th January 2024, 3:49 pm / jakob-nielsen, generative-ai, ai, usability, llms

LLM 0.13: The annotated release notes

I just released LLM 0.13, the latest version of my LLM command-line tool for working with Large Language Models—both via APIs and running models locally using plugins.

[... 1,278 words]

Did an AI write that hour-long “George Carlin” special? I’m not convinced. Two weeks ago "Dudesy", a comedy podcast which claims to be controlled and written by an AI, released an extremely poor taste hour long YouTube video called "George Carlin: I’m Glad I’m Dead". They used voice cloning to produce a stand-up comedy set featuring the late George Carlin, claiming to also use AI to write all of the content after training it on everything in the Carlin back catalog.

Unsurprisingly this has resulted in a massive amount of angry coverage, including from Carlin's own daughter (the Carlin estate have filed a lawsuit). Resurrecting people without their permission is clearly abhorrent.

But... did AI even write this? The author of this Ars Technica piece, Kyle Orland, started digging in.

It turns out the Dudesy podcast has been running with this premise since it launched in early 2022 - long before any LLM was capable of producing a well-crafted joke. The structure of the Carlin set goes way beyond anything I've seen from even GPT-4. And in a follow-up podcast episode, Dudesy co-star Chad Kultgen gave an O. J. Simpson-style "if I did it" semi-confession that described a much more likely authorship process.

I think this is a case of a human-pretending-to-be-an-AI - an interesting twist, given that the story started out being about an-AI-imitating-a-human.

I consulted with Kyle on this piece, and got a couple of neat quotes in there:

Either they have genuinely trained a custom model that can generate jokes better than any model produced by any other AI researcher in the world... or they're still doing the same bit they started back in 2022 [...]

The real story here is… everyone is ready to believe that AI can do things, even if it can't. In this case, it's pretty clear what's going on if you look at the wider context of the show in question. But anyone without that context, [a viewer] is much more likely to believe that the whole thing was AI-generated… thanks to the massive ramp up in the quality of AI output we have seen in the past 12 months.

Update 27th January 2024: The NY Times confirmed via a spokesperson for the podcast that the entire special had been written by Chad Kultgen, not by an AI.

# 26th January 2024, 4:52 am / ai, ethics, llms, comedy, arstechnica, generative-ai, ai-ethics

Fairly Trained launches certification for generative AI models that respect creators’ rights. I've been using the term "vegan models" for a while to describe machine learning models that have been trained in a way that avoids using unlicensed, copyrighted data. Fairly Trained is a new non-profit initiative that aims to encourage such models through a "certification" stamp of approval.

The team is lead by Ed Newton-Rex, who was previously VP of Audio at Stability AI before leaving over ethical concerns with the way models were being trained.

# 25th January 2024, 4:29 am / ai, ethics, generative-ai, ai-ethics

Django Chat: Datasette, LLMs, and Django. I’m the guest on the latest episode of the Django Chat podcast. We talked about Datasette, LLMs, the New York Times OpenAI lawsuit, the Python Software Foundation and all sorts of other topics.

# 24th January 2024, 8:41 pm / llms, generative-ai, ai, psf, django, podcasts, podcast-appearances

Google Research: Lumiere. The latest in text-to-video from Google Research, described as “a text-to-video diffusion model designed for synthesizing videos that portray realistic, diverse and coherent motion”.

Most existing text-to-video models generate keyframes and then use other models to fill in the gaps, which frequently leads to a lack of coherency. Lumiere “generates the full temporal duration of the video at once”, which avoids this problem.

Disappointingly but unsurprisingly the paper doesn’t go into much detail on the training data, beyond stating “We train our T2V model on a dataset containing 30M videos along with their text caption. The videos are 80 frames long at 16 fps (5 seconds)”.

The examples of “stylized generation” which combine a text prompt with a single reference image for style are particularly impressive.

# 24th January 2024, 7:58 pm / generative-ai, ai, google, video, text-to-video

Prompt Lookup Decoding (via) Really neat LLM optimization trick by Apoorv Saxena, who observed that it’s common for sequences of tokens in LLM input to be reflected by the output—snippets included in a summarization, for example.

Apoorv’s code performs a simple search for such prefixes and uses them to populate a set of suggested candidate IDs during LLM token generation.

The result appears to provide around a 2.4x speed-up in generating outputs!

# 23rd January 2024, 2:14 am / llms, ai, generative-ai

AWS Fixes Data Exfiltration Attack Angle in Amazon Q for Business. An indirect prompt injection (where the AWS Q bot consumes malicious instructions) could result in Q outputting a markdown link to a malicious site that exfiltrated the previous chat history in a query string.

Amazon fixed it by preventing links from being output at all—apparently Microsoft 365 Chat uses the same mitigation.

# 19th January 2024, 12:02 pm / prompt-injection, security, generative-ai, aws, ai, llms, exfiltration-attacks

Talking about Open Source LLMs on Oxide and Friends

Visit Talking about Open Source LLMs on Oxide and Friends

I recorded an episode of the Oxide and Friends podcast on Monday, talking with Bryan Cantrill and Adam Leventhal about Open Source LLMs.

[... 1,995 words]

Open Source LLMs with Simon Willison. I was invited to the Oxide and Friends weekly audio show (previously on Twitter Spaces, now using broadcast using Discord) to talk about open source LLMs, and to respond to a very poorly considered op-ed calling for them to be regulated as “uniquely dangerous”. It was a really fun conversation, now available to listen to as a podcast or YouTube audio-only video.

# 17th January 2024, 8:53 pm / podcasts, open-source, generative-ai, ai, llms, oxide, podcast-appearances

On being listed in the court document as one of the artists whose work was used to train Midjourney, alongside 4,000 of my closest friends (via) Poignant webcomic from Cat and Girl.

I want to make my little thing and put it out in the world and hope that sometimes it means something to somebody else.

Without exploiting anyone.

And without being exploited.

# 16th January 2024, 7:02 pm / midjourney, ai, ethics, generative-ai, text-to-image, ai-ethics

More than an OpenAI Wrapper: Perplexity Pivots to Open Source. I’m increasingly impressed with Perplexity.ai—I’m using it on a daily basis now. It’s by far the best implementation I’ve seen of LLM-assisted search—beating Microsoft Bing and Google Bard at their own game.

A year ago it was implemented as a GPT 3.5 powered wrapper around Microsoft Bing. To my surprise they’ve now evolved way beyond that: Perplexity has their own search index now and is running their own crawlers, and they’re using variants of Mistral 7B and Llama 70B as their models rather than continuing to depend on OpenAI.

# 13th January 2024, 6:12 am / perplexity, generative-ai, search, ai, llms, crawling, ai-assisted-search

AI versus old-school creativity: a 50-student, semester-long showdown (via) An interesting study in which 50 university students “wrote, coded, designed, modeled, and recorded creations with and without AI, then judged the results”.

This study seems to explore the approach of incremental prompting to produce an AI-driven final results. I use GPT-4 on a daily basis but my usage patterns are quite different: I very rarely let it actually write anything for me, instead using it as brainstorming partner, or to provide feedback, or as API reference or a thesaurus.

# 10th January 2024, 11:49 pm / generative-ai, chatgpt, education, ai, llms

You Can Build an App in 60 Minutes with ChatGPT, with Geoffrey Litt (via) YouTube interview between Dan Shipper and Geoffrey Litt. They talk about how ChatGPT can build working React applications and how this means you can build extremely niche applications that you woudn’t have considered working on before—then to demonstrate that idea, they collaborate to build a note-taking app to be used just during that specific episode recording, pasting React code from ChatGPT into Replit.

Geoffrey: “I started wondering what if we had a world where everybody could craft software tools that match the workflows they want to have, unique to themselves and not just using these pre-made tools. That’s what malleable software means to me.”

# 10th January 2024, 11:41 pm / react, generative-ai, chatgpt, ai, llms, geoffrey-litt

The Random Transformer (via) “Understand how transformers work by demystifying all the math behind them”—Omar Sanseviero from Hugging Face meticulously implements the transformer architecture behind LLMs from scratch using Python and numpy. There’s a lot to take in here but it’s all very clearly explained.

# 10th January 2024, 5:09 am / generative-ai, numpy, python, transformers, ai, llms

WikiChat: Stopping the Hallucination of Large Language Model Chatbots by Few-Shot Grounding on Wikipedia. This paper describes a really interesting LLM system that runs Retrieval Augmented Generation against Wikipedia to help answer questions, but includes a second step where facts in the answer are fact-checked against Wikipedia again before returning an answer to the user. They claim “97.3% factual accuracy of its claims in simulated conversation” on a GPT-4 backed version, and also see good results when backed by LLaMA 7B.

The implementation is mainly through prompt engineering, and detailed examples of the prompts they used are included at the end of the paper.

# 9th January 2024, 9:30 pm / prompt-engineering, generative-ai, wikipedia, ai, llms, rag

Mixtral of Experts. The Mixtral paper is out, exactly a month after the release of the Mixtral 8x7B model itself. Thanks to the paper I now have a reasonable understanding of how a mixture of experts model works: each layer has 8 available blocks, but a router model selects two out of those eight for each token passing through that layer and combines their output. “As a result, each token has access to 47B parameters, but only uses 13B active parameters during inference.”

The Mixtral token context size is an impressive 32k, and it compares extremely well against the much larger Llama 70B across a whole array of benchmarks.

Unsurprising but disappointing: there’s nothing in the paper at all about what it was trained on.

# 9th January 2024, 4:03 am / mistral, llms, ai, generative-ai, local-llms