Simon Willison’s Weblog

Subscribe
Atom feed

Blogmarks

Filters: Sorted by date

AWS Fixes Data Exfiltration Attack Angle in Amazon Q for Business. An indirect prompt injection (where the AWS Q bot consumes malicious instructions) could result in Q outputting a markdown link to a malicious site that exfiltrated the previous chat history in a query string.

Amazon fixed it by preventing links from being output at all—apparently Microsoft 365 Chat uses the same mitigation.

# 19th January 2024, 12:02 pm / aws, security, ai, prompt-injection, generative-ai, llms, exfiltration-attacks

Open Source LLMs with Simon Willison. I was invited to the Oxide and Friends weekly audio show (previously on Twitter Spaces, now using broadcast using Discord) to talk about open source LLMs, and to respond to a very poorly considered op-ed calling for them to be regulated as “uniquely dangerous”. It was a really fun conversation, now available to listen to as a podcast or YouTube audio-only video.

# 17th January 2024, 8:53 pm / open-source, podcasts, ai, generative-ai, llms, oxide, podcast-appearances

On being listed in the court document as one of the artists whose work was used to train Midjourney, alongside 4,000 of my closest friends (via) Poignant webcomic from Cat and Girl.

I want to make my little thing and put it out in the world and hope that sometimes it means something to somebody else.

Without exploiting anyone.

And without being exploited.

# 16th January 2024, 7:02 pm / ethics, ai, generative-ai, midjourney, text-to-image, ai-ethics

Daniel Situnayake explains TinyML in a Hacker News comment. Daniel worked on TensorFlow Lite at Google and co-wrote the TinyML O’Reilly book. He just posted a multi-paragraph comment on Hacker News explaining the term and describing some of the recent innovations in that space.

“TinyML means running machine learning on low power embedded devices, like microcontrollers, with constrained compute and memory.”

# 16th January 2024, 6:46 pm / machine-learning, ai, tinyml

Slashing Data Transfer Costs in AWS by 99% (via) Brilliant trick by Daniel Kleinstein. If you have data in two availability zones in the same AWS region, transferring a TB will cost you $10 in ingress and $10 in egress at the inter-zone rates charged by AWS.

But... transferring data to an S3 bucket in that same region is free (aside from S3 storage costs). And buckets are available with free transfer to all availability zones in their region, which means that TB of data can be transferred between availability zones for mere cents of S3 storage costs provided you delete the data as soon as it’s transferred.

# 15th January 2024, 10:22 pm / aws, s3

SQLite 3.45. Released today. The big new feature is JSONB support, a new, specific-to-SQLite binary internal representation of JSON which can provide up to a 3x performance improvement for JSON-heavy operations, plus a 5-10% saving it terms of bytes stored on disk.

# 15th January 2024, 8:15 pm / databases, json, sqlite

Making a Discord bot with PHP (via) Building bots for Discord used to require a long-running process that stayed connected, but a more recent change introduced slash commands via webhooks, making it much easier to write a bot that is backed by a simple request/response HTTP endpoint. Stuart Langridge explores how to build these in PHP here, but the same pattern in Python should be quite straight-forward.

# 14th January 2024, 10:56 pm / php, stuart-langridge, webhooks, discord

How We Executed a Critical Supply Chain Attack on PyTorch (via) Report on a now handled supply chain attack reported against PyTorch which took advantage of GitHub Actions, stealing credentials from some self-hosted task runners.

The researchers first submitted a typo fix to the PyTorch repo, which gave them status as a “contributor” to that repo and meant that their future pull requests would have workflows executed without needing manual approval.

Their mitigation suggestion is to switch the option from ’Require approval for first-time contributors’ to ‘Require approval for all outside collaborators’.

I think GitHub could help protect against this kind of attack by making it more obvious when you approve a PR to run workflows in a way that grants that contributor future access rights. I’d like a “approve this time only” button separate from “approve this run and allow future runs from user X”.

# 14th January 2024, 7:38 pm / github, security, pytorch, supply-chain

More than an OpenAI Wrapper: Perplexity Pivots to Open Source. I’m increasingly impressed with Perplexity.ai—I’m using it on a daily basis now. It’s by far the best implementation I’ve seen of LLM-assisted search—beating Microsoft Bing and Google Bard at their own game.

A year ago it was implemented as a GPT 3.5 powered wrapper around Microsoft Bing. To my surprise they’ve now evolved way beyond that: Perplexity has their own search index now and is running their own crawlers, and they’re using variants of Mistral 7B and Llama 70B as their models rather than continuing to depend on OpenAI.

# 13th January 2024, 6:12 am / crawling, search, ai, generative-ai, llms, perplexity, ai-assisted-search

Marimo (via) This is a really interesting new twist on Python notebooks.

The most powerful feature is that these notebooks are reactive: if you change the value or code in a cell (or change the value in an input widget) every other cell that depends on that value will update automatically. It’s the same pattern implemented by Observable JavaScript notebooks, but now it works for Python.

There are a bunch of other nice touches too. The notebook file format is a regular Python file, and those files can be run as “applications” in addition to being edited in the notebook interface. The interface is very nicely built, especially for such a young project—they even have GitHub Copilot integration for their CodeMirror cell editors.

# 12th January 2024, 9:17 pm / open-source, python, jupyter, observable, github-copilot, marimo

Where is all of the fediverse? (via) Neat piece of independent research by Ben Cox, who used the /api/v1/instance/peers Mastodon API endpoint to get a list of “peers” (instances his instance knows about), then used their DNS records to figure out which hosting provider they were running on.

Next Ben combined that with active users from the /nodeinfo/2.0 API on each instance to figure out the number of users on each of those major hosting providers.

Cloudflare and Fastly were heavily represented, but it turns out you can unveil the underlying IP for most instances by triggering an HTTP Signature exchange with them and logging the result.

Ben’s conclusion: Hertzner and OVH are responsible for hosting a sizable portion of the fediverse as it exists today.

# 12th January 2024, 6:54 pm / dns, hosting, mastodon, fediverse

Budgeting with ChatGPT (via) Jon Callahan describes an ingenious system he set up to categorize his credit card transactions using GPT 3.5. He has his bank email him details of any transaction over $0, then has an email filter to forward those to Postmark, which sends them via a JSON webhook to a custom Deno Deploy app which cleans the transaction up with a GPT 3.5 prompt (including guessing the merchant) and submits the results to a base in Airtable.

# 11th January 2024, 4:40 am / email, ai, airtable, openai, llms

AI versus old-school creativity: a 50-student, semester-long showdown (via) An interesting study in which 50 university students “wrote, coded, designed, modeled, and recorded creations with and without AI, then judged the results”.

This study seems to explore the approach of incremental prompting to produce an AI-driven final results. I use GPT-4 on a daily basis but my usage patterns are quite different: I very rarely let it actually write anything for me, instead using it as brainstorming partner, or to provide feedback, or as API reference or a thesaurus.

# 10th January 2024, 11:49 pm / education, ai, generative-ai, chatgpt, llms

You Can Build an App in 60 Minutes with ChatGPT, with Geoffrey Litt (via) YouTube interview between Dan Shipper and Geoffrey Litt. They talk about how ChatGPT can build working React applications and how this means you can build extremely niche applications that you woudn’t have considered working on before—then to demonstrate that idea, they collaborate to build a note-taking app to be used just during that specific episode recording, pasting React code from ChatGPT into Replit.

Geoffrey: “I started wondering what if we had a world where everybody could craft software tools that match the workflows they want to have, unique to themselves and not just using these pre-made tools. That’s what malleable software means to me.”

# 10th January 2024, 11:41 pm / ai, react, generative-ai, chatgpt, llms, geoffrey-litt

The Random Transformer (via) “Understand how transformers work by demystifying all the math behind them”—Omar Sanseviero from Hugging Face meticulously implements the transformer architecture behind LLMs from scratch using Python and numpy. There’s a lot to take in here but it’s all very clearly explained.

# 10th January 2024, 5:09 am / python, transformers, ai, numpy, generative-ai, llms

ooh.directory: A page for every blog. I hadn’t checked in on Phil Gyford’s ooh.directory blog directory since it first launched in November 2022. I’m delighted to see that it’s thriving—2,117 blogs have now been carefully curated, and the latest feature is a page for each blog showing its categories, description, an activity graph and the most recent posts syndicated via RSS/Atom.

# 9th January 2024, 10:15 pm / atom, blogs, phil-gyford, rss, syndication

The Eight Golden Rules of Interface Design (via) By HCI researcher Ben Shneiderman. I particularly like number 4, “Design dialogs to yield closure”, which encourages feedback at the completion of a group of actions that “gives users the satisfaction of accomplishment, a sense of relief.”

# 9th January 2024, 9:37 pm / usability, ux

WikiChat: Stopping the Hallucination of Large Language Model Chatbots by Few-Shot Grounding on Wikipedia. This paper describes a really interesting LLM system that runs Retrieval Augmented Generation against Wikipedia to help answer questions, but includes a second step where facts in the answer are fact-checked against Wikipedia again before returning an answer to the user. They claim “97.3% factual accuracy of its claims in simulated conversation” on a GPT-4 backed version, and also see good results when backed by LLaMA 7B.

The implementation is mainly through prompt engineering, and detailed examples of the prompts they used are included at the end of the paper.

# 9th January 2024, 9:30 pm / wikipedia, ai, prompt-engineering, generative-ai, llms, rag, hallucinations

Python 3.13 gets a JIT. “In late December 2023 (Christmas Day to be precise), CPython core developer Brandt Bucher submitted a little pull-request to the Python 3.13 branch adding a JIT compiler.”

Anthony Shaw does a deep dive into this new experimental JIT, explaining how it differs from other JITs. It’s an implementation of a copy-and-patch JIT, an idea that only emerged in 2021. This makes it architecturally much simpler than a traditional JIT, allowing it to compile faster and take advantage of existing LLVM tools on different architectures.

So far it’s providing a 2-9% performance improvement, but the real impact will be from the many future optimizations it enables.

# 9th January 2024, 9:25 pm / jit, llvm, python, anthony-shaw

Mixtral of Experts. The Mixtral paper is out, exactly a month after the release of the Mixtral 8x7B model itself. Thanks to the paper I now have a reasonable understanding of how a mixture of experts model works: each layer has 8 available blocks, but a router model selects two out of those eight for each token passing through that layer and combines their output. “As a result, each token has access to 47B parameters, but only uses 13B active parameters during inference.”

The Mixtral token context size is an impressive 32k, and it compares extremely well against the much larger Llama 70B across a whole array of benchmarks.

Unsurprising but disappointing: there’s nothing in the paper at all about what it was trained on.

# 9th January 2024, 4:03 am / ai, generative-ai, local-llms, llms, mistral

OpenAI and journalism. Bit of a misleading title here: this is OpenAI’s first public response to the lawsuit filed by the New York Times concerning their use of unlicensed NYT content to train their models.

# 8th January 2024, 6:33 pm / copyright, new-york-times, ai, openai, generative-ai, llms

Does GPT-2 Know Your Phone Number? (via) This report from Berkeley Artificial Intelligence Research in December 2020 showed GPT-3 outputting a full page of chapter 3 of Harry Potter and the Philosopher’s Stone—similar to how the recent suit from the New York Times against OpenAI and Microsoft demonstrates memorized news articles from that publication as outputs from GPT-4.

# 8th January 2024, 5:26 am / microsoft, new-york-times, ai, gpt-3, openai, generative-ai, llms, gpt-2

Text Embeddings Reveal (Almost) As Much As Text. Embeddings of text—where a text string is converted into a fixed-number length array of floating point numbers—are demonstrably reversible: “a multi-step method that iteratively corrects and re-embeds text is able to recover 92% of 32-token text inputs exactly”.

This means that if you’re using a vector database for embeddings of private data you need to treat those embedding vectors with the same level of protection as the original text.

# 8th January 2024, 5:22 am / privacy, security, ai, embeddings

GPT in 500 lines of SQL (via) Utterly brilliant piece of PostgreSQL hackery by Alex Bolenok, who implements a full GPT-2 style language model in SQL on top of pg_vector. The final inference query is 498 lines long!

# 6th January 2024, 10:55 pm / postgresql, sql, ai, generative-ai, llms, gpt-2

Microsoft Research relicense Phi-2 as MIT (via) Phi-2 was already an interesting model—really strong results for its size—made available under a non-commercial research license. It just got significantly more interesting: Microsoft relicensed it as MIT open source.

# 6th January 2024, 6:06 am / microsoft, open-source, mitlicense, ai, generative-ai, llms, phi

Adversarial Machine Learning: A Taxonomy and Terminology of Attacks and Mitigations (via) NIST—the National Institute of Standards and Technology, a US government agency, released a 106 page report on attacks against modern machine learning models, mostly covering LLMs.

Prompt injection gets two whole sections, one on direct prompt injection (which incorporates jailbreaking as well, which they misclassify as a subset of prompt injection) and one on indirect prompt injection.

They talk a little bit about mitigations, but for both classes of attack conclude: “Unfortunately, there is no comprehensive or foolproof solution for protecting models against adversarial prompting, and future work will need to be dedicated to investigating suggested defenses for their efficacy.”

# 6th January 2024, 4:08 am / jailbreaking, ai, prompt-injection, generative-ai, llms

My blog’s year archive pages now have tag clouds (via) Inspired by the tag cloud I used in my recent 2023 AI roundup post, I decided to add a tag cloud to the top of every one of my archive-by-year pages showing what topics I had spent the most time with that year.

I already had old code for this, so I pasted it into GPT-4 along with an example of the output of my JSON endpoint from Django SQL Dashboard and had it do most of the work for me.

# 4th January 2024, 9:02 pm / projects, ai, django-sql-dashboard, chatgpt, llms

container2wasm (via) “Converts a container to WASM with emulation by Bochs (for x86_64 containers) and TinyEMU (for riscv64 containers)”—effectively letting you take a Docker container and turn it into a WebAssembly blob that can then run in any WebAssembly host environment, including the browser.

Run “c2w ubuntu:22.04 out.wasm” to output a WASM binary for the Ubuntu 22:04 container from Docker Hub, then “wasmtime out.wasm uname -a” to run a command.

Even better, check out the live browser demos linked fro the README, which let you do things like run a Python interpreter in a Docker container directly in your browser.

# 3rd January 2024, 11:21 pm / docker, webassembly

Fastest Way to Read Excel in Python (via) Haki Benita produced a meticulously researched and written exploration of the options for reading a large Excel spreadsheet into Python. He explored Pandas, Tablib, Openpyxl, shelling out to LibreOffice, DuckDB and python-calamine (a Python wrapper of a Rust library). Calamine was the winner, taking 3.58s to read 500,00 rows—compared to Pandas in last place at 32.98s.

# 3rd January 2024, 8:04 pm / excel, pandas, python, rust, duckdb, haki-benita

NPM: modele-social (via) This is a fascinating open source package: it’s an NPM module containing an implementation of the rules for calculating social security contributions in France, maintained by a team at Urssaf, the not-quite-government organization in France that manages the collection of social security contributions there.

The rules themselves can be found in the associated GitHub repository, encoded in a YAML-like declarative language called Publicodes that was developed by the French government for this and similar purposes.

# 2nd January 2024, 5:55 pm / government, open-source, npm

Years

Tags