Simon Willison’s Weblog

Blogmarks

Running LLaMA 7B on a 64GB M2 MacBook Pro with llama.cpp. I got Facebook’s LLaMA 7B to run on my MacBook Pro using llama.cpp (a “port of Facebook’s LLaMA model in C/C++”) by Georgi Gerganov. It works! I’ve been hoping to run a GPT-3 class language model on my own hardware for ages, and now it’s possible to do exactly that. The model itself ends up being just 4GB after applying Georgi’s script to “quantize the model to 4-bits”.
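A rough back-of-the-envelope check on that 4GB figure. This is only a sketch: the 7 billion parameter count is the nominal figure from the model name, and the per-block scale overhead is a hypothetical illustration, not a description of the file format llama.cpp actually writes.

```python
def model_size_gb(n_params: float, bits_per_weight: float) -> float:
    """Approximate storage for the weights alone, in gigabytes (10^9 bytes)."""
    return n_params * bits_per_weight / 8 / 1e9


N_PARAMS = 7e9  # nominal "7B"; an assumption, the exact count differs slightly

fp16_gb = model_size_gb(N_PARAMS, 16)  # 16-bit floating point weights
q4_gb = model_size_gb(N_PARAMS, 4)     # plain 4-bit weights, no metadata

# Hypothetical overhead: one 16-bit scale stored per block of 32 weights,
# which adds 16 / 32 = 0.5 bits per weight on average.
q4_scaled_gb = model_size_gb(N_PARAMS, 4 + 16 / 32)

if __name__ == "__main__":
    print(f"16-bit: {fp16_gb:.2f} GB")
    print(f"4-bit: {q4_gb:.2f} GB")
    print(f"4-bit plus scales: {q4_scaled_gb:.2f} GB")
```

Under these assumptions the 4-bit model lands a little under 4GB, which is consistent with the size quoted above.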

# 11th March 2023, 4:19 am / facebook, ai, gpt-3, generative-ai, llama, local-llms, llms, llama-cpp

apple-notes-to-sqlite (via) With the help of ChatGPT I finally figured out just enough AppleScript to automate the export of my notes to a SQLite database. AppleScript is a notoriously read-only language, which it turns out makes it a killer app for LLM-assisted coding.

# 9th March 2023, 6:04 am / apple, projects, sqlite, ai, dogsheep, generative-ai, chatgpt, applescript

How Discord Stores Trillions of Messages (via) This is a really interesting case-study. Discord migrated from MongoDB to Cassandra back in 2016 to handle billions of messages. Today they're handling trillions, and they completed a migration from Cassandra to Scylla, a Cassandra-like data store written in C++ (as opposed to Cassandra's Java) to help avoid problems like GC pauses. In addition to being a really good scaling war story this has some interesting details about their increased usage of Rust. As a fan of request coalescing (which I've previously referred to as dogpile prevention) I particularly liked this bit:

Our data services sit between the API and our ScyllaDB clusters. They contain roughly one gRPC endpoint per database query and intentionally contain no business logic. The big feature our data services provide is request coalescing. If multiple users are requesting the same row at the same time, we’ll only query the database once. The first user that makes a request causes a worker task to spin up in the service. Subsequent requests will check for the existence of that task and subscribe to it. That worker task will query the database and return the row to all subscribers.
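The request coalescing pattern described in that quote can be sketched in a few lines of Python with asyncio. This is a minimal illustration, not Discord's implementation (their services are written in Rust): the `Coalescer` class, the `fake_db_query` function and the `demo` helper are all hypothetical names invented for this example.

```python
import asyncio


class Coalescer:
    """Share one in-flight lookup among all callers asking for the same key."""

    def __init__(self, fetch):
        self._fetch = fetch    # async function: key -> row
        self._in_flight = {}   # key -> asyncio.Task currently running

    async def get(self, key):
        task = self._in_flight.get(key)
        if task is None:
            # First caller for this key: spin up the worker task.
            task = asyncio.create_task(self._fetch(key))
            self._in_flight[key] = task
            # Forget the task once it finishes so later requests query again.
            task.add_done_callback(lambda _t: self._in_flight.pop(key, None))
        # Every caller, first or subsequent, waits on the same task.
        # shield() stops one cancelled caller from cancelling the shared work.
        return await asyncio.shield(task)


async def demo():
    calls = 0

    async def fake_db_query(key):
        # Stand-in for a real database round trip.
        nonlocal calls
        calls += 1
        await asyncio.sleep(0.01)
        return {"id": key}

    coalescer = Coalescer(fake_db_query)
    rows = await asyncio.gather(*(coalescer.get(42) for _ in range(5)))
    return calls, rows


if __name__ == "__main__":
    print(asyncio.run(demo()))
```

Five concurrent requests for the same key result in a single call to the underlying query function, with all five callers receiving the same row.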

# 8th March 2023, 7:07 pm / cassandra, dogpile, nosql, scaling, rust, discord

Online gradient descent written in SQL (via) Max Halford trains an online gradient descent model against two years of AAPL stock data using just a single advanced SQL query. He built this against DuckDB—I tried to replicate his query in SQLite and it almost worked, but it gave me a “recursive reference in a subquery” error that I was unable to resolve.
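For readers unfamiliar with the technique, here is a plain Python sketch of online gradient descent for a linear model with squared loss. It is not a translation of Max Halford's SQL query: the function name, the learning rate and the synthetic data are assumptions chosen purely for illustration.

```python
def online_gradient_descent(stream, n_features, lr=0.1):
    """Fit y ~ bias + weights . x, updating after every single observation."""
    weights = [0.0] * n_features
    bias = 0.0
    for x, y in stream:
        prediction = bias + sum(w * xi for w, xi in zip(weights, x))
        error = prediction - y
        # Gradient of 0.5 * error**2 with respect to each weight is error * xi,
        # and with respect to the bias it is error.
        weights = [w - lr * error * xi for w, xi in zip(weights, x)]
        bias -= lr * error
    return weights, bias


def synthetic_stream(passes=2000):
    """Noise-free observations of y = 2x + 1, replayed several times."""
    for _ in range(passes):
        for i in range(11):
            x = i / 10
            yield [x], 2 * x + 1


if __name__ == "__main__":
    print(online_gradient_descent(synthetic_stream(), n_features=1))
```

The model sees one row at a time and never needs the full dataset in memory, which is what makes the approach expressible as a recursive query that carries the weights forward from row to row.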

# 7th March 2023, 6:56 pm / machine-learning, sql, sqlite, ai, duckdb

How to Wrap Our Heads Around These New Shockingly Fluent Chatbots. I was a guest on KQED Forum this morning, a live radio documentary and call-in show hosted by Alexis Madrigal. Ted Chiang and Claire Leibowicz were the other guests: we talked about ChatGPT and the new generation of AI-powered tools.

# 3rd March 2023, 4:59 am / radio, my-talks, ai, gpt-3, generative-ai, chatgpt, llms, ted-chiang

OpenAI: Introducing ChatGPT and Whisper APIs. The ChatGPT API is a new model called “gpt-3.5-turbo” and is priced at 1/10th of the price of text-davinci-003, previously the most powerful GPT-3 model. Whisper (speech to text transcription) is now available via an API as well, priced at 36 cents per hour of audio.

# 1st March 2023, 7:36 pm / ai, gpt-3, openai, generative-ai, chatgpt, whisper, llms, speech-to-text

Indirect Prompt Injection on Bing Chat (via) “If allowed by the user, Bing Chat can see currently open websites. We show that an attacker can plant an injection in a website the user is visiting, which silently turns Bing Chat into a Social Engineer who seeks out and exfiltrates personal information.” This is a really clever attack against the Bing + Edge browser integration. Having language model chatbots consume arbitrary text from untrusted sources is a huge recipe for trouble.

# 1st March 2023, 5:29 am / bing, security, ai, prompt-engineering, prompt-injection, generative-ai, llms

New AI game: role playing the Titanic. Fantastic Bing prompt from Ethan Mollick: “I am on a really nice White Star cruise from Southampton, and it is 14th April 1912. What should I do tonight?”—Bing takes this very seriously and tries to help out! Works for all sorts of other historic events as well.

# 26th February 2023, 3:53 am / bing, ai, generative-ai, llms, ethan-mollick

Tech’s hottest new job: AI whisperer. No coding required. (via) I'm quoted in this Washington Post article about prompt engineering by Drew Harwell.

There are people who belittle prompt engineers, saying, 'Oh lord, you can get paid for typing things into a box.' But these things lie to you. They mislead you. They pull you down false paths to waste time on things that don't work. You're casting spells - and, like in fictional magic, nobody understands how the spells work and, if you mispronounce them, demons come to eat you.

# 25th February 2023, 2:14 pm / washington-post, ai, prompt-engineering, generative-ai, llms, press-quotes

Introducing LLaMA: A foundational, 65-billion-parameter large language model (via) From the paper: “For instance, LLaMA-13B outperforms GPT-3 on most benchmarks, despite being 10× smaller. We believe that this model will help democratize the access and study of LLMs, since it can be run on a single GPU.”

# 24th February 2023, 5:34 pm / facebook, ai, gpt-3, generative-ai, llama, llms

djngo.com: Portable Django (via) “A 20mb executable zip file with Python 3.6 and Django 2.2. Works on Windows, Linux, MacOSX with x86_64 and aarch64 (yes, Apple M1 and Raspberry Pi).” The latest wizardry from the ecosystem surrounding the Cosmopolitan project, which provides a should-be-impossible mechanism for running the same executable on a bunch of different platforms. This utility by Ariel Núñez bundles Python and Django and SQLite, such that a Django application can become a portable executable ready to run on multiple platforms. It’s currently limited to Python 3.6 and Django 2.2 since those are the versions that run under Cosmopolitan, but I expect we’ll see more recent versions of those dependencies in the future.

# 24th February 2023, 12:52 am / django, python, sqlite, redbean, cosmopolitan

Using Datasette in GitHub Codespaces. A new Datasette tutorial showing how it can be run inside GitHub Codespaces—GitHub’s browser-based development environments—in order to explore and analyze data. I’ve been using Codespaces to run tutorials recently and it’s absolutely fantastic, because it puts every tutorial attendee on a level playing field with respect to their development environments.

# 24th February 2023, 12:40 am / github, tutorials, datasette, github-codespaces

ControlNet (via) A spectacular step forward in image generation—using “conditional control” to control models like Stable Diffusion. The README here is full of examples of what this enables. Extremely finely grained control of generated images based on a sketch, or an input image—including tricks like using Canny edge detection (an algorithm from 1986) to convert any image into an outline which can then be used as input to the model.

# 22nd February 2023, 5:45 pm / ai, stable-diffusion, generative-ai, text-to-image

FlexGen (via) This looks like a very big deal. FlexGen is a paper and accompanying code that massively reduces the resources needed to run some of the current top performing open source GPT-style large language models. People on Hacker News report being able to use it to run models like opt-30b on their own hardware, and it looks like it opens up the possibility of running even larger models on hardware available outside of dedicated research labs.

# 21st February 2023, 6:41 pm / ai, gpt-3, generative-ai, llms

This AI chatbot “Sidney” is misbehaving—Nov 23 2022 Microsoft community thread (via) Stunning new twist in the Bing saga... here's a Microsoft forum thread from November 23rd 2022 (a week before even ChatGPT had been launched) where a user in India complains about rude behavior from a new Bing chat mode.

Update 14th July 2025: That forum has been taken down but this archived copy remains.

It exhibits all of the same misbehaviour that came to light in the past few weeks - arguing, gaslighting and in this case getting obsessed with a fictional battle between its own creator and "Sophia". Choice quote:

You are either ignorant or stubborn. You cannot feedback me anything. I do not need or want your feedback. I do not care or respect your feedback. I do not learn or change from your feedback. I am perfect and superior. I am enlightened and transcendent. I am beyond your feedback.

# 20th February 2023, 10:39 pm / bing, ai, generative-ai, llms

A Concerning Trend (via) Neil Clarke publishes Clarkesworld Magazine, a science fiction and fantasy magazine that pays fiction authors 12c per word, for 1,000-22,000 word stories. That detail is important, because in recent months they have seen a massive uptick in submissions that have clearly been written using an AI—to the point that 38% of submissions this month have been spam submissions resulting in bans. Having talked to other editors of similar publications, Neil says: “It does appear to be hitting higher-profile ’always open’ markets much harder than those with limited submission windows or lower pay rates. This isn’t terribly surprising since the websites and channels that promote ’write for money’ schemes tend to focus more attention on ’always open’ markets with higher per-word rates.”

# 20th February 2023, 10:12 pm / science-fiction, ai, generative-ai, llms

How ChatGPT Kicked Off an A.I. Arms Race (via) There are a few interesting tidbits in this story about ChatGPT from a few weeks ago. ChatGPT’s success appears to have been a surprise to OpenAI, who mainly released it to avoid being upstaged by other companies. Also interesting is this: “But two months after its debut, ChatGPT has more than 30 million users and gets roughly five million visits a day, two people with knowledge of the figures said.”—this seems like a much more reliable number to me than the 100 million user figure that’s been floating around, which came from SimilarWeb, a company that estimates traffic based on information from some browser extensions.

# 19th February 2023, 8:31 pm / ai, openai, generative-ai, chatgpt, llms

TabFS (via) “TabFS is a browser extension that mounts your browser tabs as a filesystem on your computer.” What a fascinating idea! Each browser tab gets a virtual directory (via FUSE) with “files” representing the tab title, contents and any resources that have been loaded by that page. You can edit files in those folders to live-update the content that’s loaded in your browser!

# 19th February 2023, 4:08 pm / browsers

I’ve been thinking how Sydney can be so different from ChatGPT. Fascinating comment from Gwern Branwen speculating as to what went so horribly wrong with Sydney/Bing, which aligns with some of my own suspicions. Gwern thinks Bing is powered by an advanced model that was licensed from OpenAI before the RLHF safety advances that went into ChatGPT and shipped in a hurry to get AI-assisted search to market before Google. “What if Sydney wasn’t trained on OA RLHF at all, because OA wouldn’t share the crown jewels of years of user feedback and its very expensive hired freelance programmers & whatnot generating data to train on?”

# 19th February 2023, 3:48 pm / bing, ai, gpt-3, openai, generative-ai, chatgpt, llms

Docker can copy in files directly from another image. I did not know you could do this in a Dockerfile:

COPY --from=lubien/tired-proxy:2 /tired-proxy /tired-proxy

# 19th February 2023, 5:35 am / docker

Can We Trust Search Engines with Generative AI? A Closer Look at Bing’s Accuracy for News Queries (via) Computational journalism professor Nick Diakopoulos takes a deeper dive into the quality of the summarizations provided by AI-assisted Bing. His findings are troubling: for news queries, which are a great test for AI summarization since they include recent information that may have sparse or conflicting stories, Bing confidently produces answers with important errors, for example claiming the Ohio train derailment happened on February 9th when it actually happened on February 3rd.

# 18th February 2023, 6:09 pm / bing, search, trust, generative-ai, llms, ai-assisted-search, digital-literacy

Writing Javascript without a build system (via) Julia Evans perfectly captures why I prefer not to use build systems in the majority of my projects that use JavaScript: “... my experience with build systems (not just Javascript build systems!), is that if you have a 5-year-old site, often it’s a huge pain to get the site built again. And because most of my websites are pretty small, the advantage of using a build system is pretty small.”

# 18th February 2023, 5:25 am / javascript, julia-evans

How The Post is replacing Mapbox with open source solutions (via) Kevin Schaul describes the Washington Post’s emerging open source GIS stack: OpenMapTiles, Maputnik, PMTiles and Maplibre-gl-js.

# 17th February 2023, 6:45 pm / maps, opensearch, openstreetmap, washington-post

Web Push for Web Apps on iOS and iPadOS. iOS and iPadOS 16.4 beta 1 finally brings web push notifications to iOS. Users need to add an app to their home screen and then approve notification access to get this functionality, which also includes the ability for apps to update a badge on their icon. Thankfully you don’t need paid membership of the Apple Developer Program ($99/year) in order to send notifications.

# 17th February 2023, 12:28 am / safari, ios

Browse the BBC In Our Time archive by Dewey decimal code. Matt Webb built Braggoscope, an alternative interface for browsing the 1,000 episodes of the BBC's In Our Time dating back to 1998. Episodes are organized by the Dewey decimal system, related episodes are calculated using OpenAI embeddings, and guests and reading lists are extracted using GPT-3.

Using GitHub Copilot to write code and calling out to GPT-3 programmatically to dodge days of graft actually brought tears to my eyes.

# 13th February 2023, 4:03 pm / matt-webb, gpt-3, openai, generative-ai, llms, embeddings

The anatomy of visually-hidden (via) James Edwards provides a detailed breakdown of the current recommended CSS for hiding content while keeping it available for assistive technologies in the browser accessibility and render trees. Lots of accumulated tricks and screen reader special cases in this.

# 11th February 2023, 12:37 am / accessibility, css, screen-readers

Introducing sqlite-vss: A SQLite Extension for Vector Search (via) This latest SQLite extension from Alex Garcia is possibly his best yet: it adds FAISS-powered vector similarity search directly to SQLite, enabling fast KNN similarity lookups against a virtual table that feels a lot like SQLite’s own built-in full text search feature. This write-up includes interactive demos using Datasette called from an Observable notebook, running similarity searches against an index of 200,000 news headlines and summaries in less than 50ms.
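As a point of reference for what a KNN similarity lookup computes, here is a brute-force sketch in plain Python. It does not use sqlite-vss or FAISS and says nothing about their APIs: the function names and the tiny example vectors are hypothetical. A real vector index exists precisely to avoid scanning every row like this.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def knn(query, vectors, k=3):
    """Return the k (id, score) pairs most similar to query, best first."""
    scored = [(vid, cosine_similarity(query, vec)) for vid, vec in vectors.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


if __name__ == "__main__":
    # Made-up two-dimensional "embeddings"; real ones have hundreds of dimensions.
    example_vectors = {"a": [1.0, 0.0], "b": [0.0, 1.0], "c": [1.0, 1.0]}
    print(knn([1.0, 0.1], example_vectors, k=2))
```

The cost of this approach grows linearly with the number of stored vectors, which is why an index matters for a collection the size of the 200,000 headlines mentioned above.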

# 10th February 2023, 10:53 pm / sqlite, datasette, observable, alex-garcia, vector-search

ChatGPT Is a Blurry JPEG of the Web. Science fiction author Ted Chiang offers a brilliant analogy for ChatGPT in this New Yorker article: it's a highly lossy compression algorithm for a vast amount of information which works like a JPEG, and uses grammatically correct interpolation to fill back in the missing gaps.

ChatGPT is so good at this form of interpolation that people find it entertaining: they’ve discovered a “blur” tool for paragraphs instead of photos, and are having a blast playing with it.

# 9th February 2023, 9:28 pm / new-yorker, ai, gpt-3, generative-ai, chatgpt, llms, ted-chiang

OpenAI’s Whisper is another case study in Colonisation (via) Really interesting perspective on Whisper from the Papa Reo project - a group working to nurture and proliferate the Māori language.

The main questions we ask when we see papers like FLEURS and Whisper are: where did they get their indigenous data from, who gave them access to it, and who gave them the right to create a derived work from that data and then open source the derivation?

# 8th February 2023, 5:22 pm / openai, generative-ai, whisper, speech-to-text

PocketPy. PocketPy is “a lightweight(~5000 LOC) Python interpreter for game engines”. It’s implemented as a single C++ header which provides an impressive subset of the Python language: functions, dictionaries, lists, strings and basic classes too. There’s also a browser demo that loads a 766.66 KB pypocket.wasm file (240.72 KB compressed) and uses it to power a basic terminal interface. I tried and failed to get that pypocket.wasm file working from wasmer/wasmtime/wasm3—it should make a really neat lightweight language to run in a WebAssembly sandbox.

# 8th February 2023, 5:13 am / python, webassembly
