Simon Willison’s Weblog

Subscribe

Blogmarks

Filters: Sorted by date

mistralai/mistral-common. New from Mistral: mistral-common, an open source Python library providing "a set of tools to help you work with Mistral models".

So far that means a tokenizer! This is similar to OpenAI's tiktoken library in that it lets you run tokenization in your own code, which crucially means you can count the number of tokens that you are about to use - useful for cost estimates but also for cramming the maximum allowed tokens in the context window for things like RAG.

Mistral's library is better than tiktoken though, in that it also includes logic for correctly calculating the tokens needed for conversation construction and tool definition. With OpenAI's APIs you're currently left guessing how many tokens are taken up by these advanced features.

Anthropic haven't published any form of tokenizer at all - it's the feature I'd most like to see from them next.

Here's how to explore the vocabulary of the tokenizer:

MistralTokenizer.from_model(
    "open-mixtral-8x22b"
).instruct_tokenizer.tokenizer.vocab()[:12]

['<unk>', '<s>', '</s>', '[INST]', '[/INST]', '[TOOL_CALLS]', '[AVAILABLE_TOOLS]', '[/AVAILABLE_TOOLS]', '[TOOL_RESULTS]', '[/TOOL_RESULTS]']

# 18th April 2024, 12:39 am / python, ai, openai, prompt-engineering, generative-ai, llms, anthropic, mistral, rag, tokenization

Scammers are targeting teenage boys on social media—and driving some to suicide. (via) Horrifying in depth report describing sextortion scams: a scammer tricks a teenage boy into sending them reciprocal nude photos, then instantly starts blackmailing them by threatening to forward those photos to their friends and family members. Most online scams take weeks or even months to play out—these scams can turn to blackmail within minutes.

# 17th April 2024, 3:10 am / scams

Google NotebookLM Data Exfiltration (via) NotebookLM is a Google Labs product that lets you store information as sources (mainly text files in PDF) and then ask questions against those sources—effectively an interface for building your own custom RAG (Retrieval Augmented Generation) chatbots.

Unsurprisingly for anything that allows LLMs to interact with untrusted documents, it’s susceptible to prompt injection.

Johann Rehberger found some classic prompt injection exfiltration attacks: you can create source documents with instructions that cause the chatbot to load a Markdown image that leaks other private data to an external domain as data passed in the query string.

Johann reported this privately in the December but the problem has not yet been addressed. UPDATE: The NotebookLM team deployed a fix for this on 18th April.

A good rule of thumb is that any time you let LLMs see untrusted tokens there is a risk of an attack like this, so you should be very careful to avoid exfiltration vectors like Markdown images or even outbound links.

# 16th April 2024, 9:28 pm / google, security, ai, prompt-injection, generative-ai, llms, rag, exfiltration-attacks, johann-rehberger, notebooklm

inline-snapshot. I'm a big fan of snapshot testing, where expected values are captured the first time a test suite runs and then asserted against in future runs. It's a very productive way to build a robust test suite.

inline-snapshot by Frank Hoffmann is a particularly neat implementation of the pattern. It defines a snapshot() function which you can use in your tests:

assert 1548 * 18489 == snapshot()

When you run that test using pytest --inline-snapshot=create the snapshot() function will be replaced in your code (using AST manipulation) with itself wrapping the repr() of the expected result:

assert 1548 * 18489 == snapshot(28620972)

If you modify the code and need to update the tests you can run pytest --inline-snapshot=fix to regenerate the recorded snapshot values.

# 16th April 2024, 4:04 pm / python, testing, pytest

OpenAI Batch API (via) OpenAI are now offering a 50% discount on batch chat completion API calls if you submit them in bulk and allow for up to 24 hours for them to be run.

Requests are sent as a newline-delimited JSON file, with each line looking something like this:

{"custom_id": "request-1", "method": "POST", "url": "/v1/chat/completions", "body": {"model": "gpt-3.5-turbo", "messages": [{"role": "system", "content": "You are a helpful assistant."}, {"role": "user", "content": "What is 2+2?"}]}}

You upload a file for the batch, kick off a batch request and then poll for completion.

This makes GPT-3.5 Turbo cheaper than Claude 3 Haiku - provided you're willing to wait a few hours for your responses.

# 15th April 2024, 5:58 pm / ai, openai, generative-ai, llms

redka (via) Anton Zhiyanov’s new project to build a subset of Redis (including protocol support) using Go and SQLite. Also works as a Go library.

The guts of the SQL implementation are in the internal/sqlx folder.

# 14th April 2024, 3:21 pm / go, redis, sqlite, anton-zhiyanov

Lessons after a half-billion GPT tokens (via) Ken Kantzer presents some hard-won experience from shipping real features on top of OpenAI’s models.

They ended up settling on a very basic abstraction over the chat API—mainly to handle automatic retries on a 500 error. No complex wrappers, not even JSON mode or function calling or system prompts.

Rather than counting tokens they estimate tokens as 3 times the length in characters, which works well enough.

One challenge they highlight for structured data extraction (one of my favourite use-cases for LLMs): “GPT really cannot give back more than 10 items. Trying to have it give you back 15 items? Maybe it does it 15% of the time.”

(Several commenters on Hacker News report success in getting more items back by using numbered keys or sequence IDs in the returned JSON to help the model keep count.)

# 13th April 2024, 8:54 pm / ai, openai, prompt-engineering, generative-ai, llms

How we built JSR (via) Really interesting deep dive by Luca Casonato into the engineering behind the new JSR alternative JavaScript package registry launched recently by Deno.

The backend uses PostgreSQL and a Rust API server hosted on Google Cloud Run.

The frontend uses Fresh, Deno’s own server-side JavaScript framework which leans heavily in the concept of “islands”—a progressive enhancement technique where pages are rendered on the server and small islands of interactivity are added once the page has loaded.

# 12th April 2024, 3:49 pm / javascript, postgresql, progressive-enhancement, rust, deno

3Blue1Brown: Attention in transformers, visually explained. Grant Sanderson publishes animated explainers of mathematical topics on YouTube, to over 6 million subscribers. His latest shows how the attention mechanism in transformers (the algorithm behind most LLMs) works and is by far the clearest explanation I’ve seen of the topic anywhere.

I was intrigued to find out what tool he used to produce the visualizations. It turns out Grant built his own open source Python animation library, manim, to enable his YouTube work.

# 11th April 2024, 4:12 pm / ai, generative-ai, llms

Use an llm to automagically generate meaningful git commit messages. Neat, thoroughly documented recipe by Harper Reed using my LLM CLI tool as part of a scheme for if you’re feeling too lazy to write a commit message—it uses a prepare-commit-msg Git hook which runs any time you commit without a message and pipes your changes to a model along with a custom system prompt.

# 11th April 2024, 4:06 am / cli, git, ai, generative-ai, llm

Shell History Is Your Best Productivity Tool (via) Martin Heinz drops a wealth of knowledge about ways to configure zsh (the default shell on macOS these days) to get better utility from your shell history.

# 10th April 2024, 11:17 pm / productivity, shell, zsh

Notes on how to use LLMs in your product. A whole bunch of useful observations from Will Larson here. I love his focus on the key characteristic of LLMs that “you cannot know whether a given response is accurate”, nor can you calculate a dependable confidence score for a response—and as a result you need to either “accept potential inaccuracies (which makes sense in many cases, humans are wrong sometimes too) or keep a Human-in-the-Loop (HITL) to validate the response.”

# 10th April 2024, 11:14 pm / will-larson, ai, generative-ai, llms

Gemini 1.5 Pro public preview (via) Huge release from Google: Gemini 1.5 Pro—the GPT-4 competitive model with the incredible 1 million token context length—is now available without a waitlist in 180+ countries (including the USA but not Europe or the UK as far as I can tell)... and the API is free for 50 requests/day (rate limited to 2/minute).

Beyond that you’ll need to pay—$7/million input tokens and $21/million output tokens, which is slightly less than GPT-4 Turbo and a little more than Claude 3 Sonnet.

They also announced audio input (up to 9.5 hours in a single prompt), system instruction support and a new JSON mode.

# 10th April 2024, 2:38 am / google, ai, generative-ai, llms, gemini, vision-llms, llm-pricing, llm-release

Mistral tweet a magnet link for mixtral-8x22b. Another open model release from Mistral using their now standard operating procedure of tweeting out a raw torrent link.

This one is an 8x22B Mixture of Experts model. Their previous most powerful openly licensed release was Mixtral 8x7B, so this one is a whole lot bigger (a 281GB download)—and apparently has a 65,536 context length, at least according to initial rumors on Twitter.

# 10th April 2024, 2:31 am / ai, generative-ai, local-llms, llms, mistral, llm-release

Extracting data from unstructured text and images with Datasette and GPT-4 Turbo. Datasette Extract is a new Datasette plugin that uses GPT-4 Turbo (released to general availability today) and GPT-4 Vision to extract structured data from unstructured text and images.

I put together a video demo of the plugin in action today, and posted it to the Datasette Cloud blog along with screenshots and a tutorial describing how to use it.

# 9th April 2024, 11:03 pm / projects, ai, datasette, datasette-cloud, openai, generative-ai, gpt-4, llms, vision-llms, structured-extraction

A solid pattern to build LLM Applications (feat. Claude) (via) Hrishi Olickel is one of my favourite prompt whisperers. In this YouTube video he walks through his process for building quick interactive applications with the assistance of Claude 3, spinning up an app that analyzes his meeting transcripts to extract participants and mentioned organisations, then presents a UI for exploring the results built with Next.js and shadcn/ui.

An interesting tip I got from this: use the weakest, not the strongest models to iterate on your prompts. If you figure out patterns that work well with Claude 3 Haiku they will have a significantly lower error rate with Sonnet or Opus. The speed of the weaker models also means you can iterate much faster, and worry less about the cost of your experiments.

# 9th April 2024, 6:39 pm / ai, generative-ai, llms, ai-assisted-programming, claude

Command R+ now ranked 6th on the LMSYS Chatbot Arena. The LMSYS Chatbot Arena Leaderboard is one of the most interesting approaches to evaluating LLMs because it captures their ever-elusive “vibes”—it works by users voting on the best responses to prompts from two initially hidden models

Big news today is that Command R+—the brand new open weights model (Creative Commons non-commercial) by Cohere—is now the highest ranked non-proprietary model, in at position six and beating one of the GPT-4s.

(Linking to my screenshot on Mastodon.)

# 9th April 2024, 4:19 pm / ai, generative-ai, llms, cohere, command-r, chatbot-arena

llm.c (via) Andrej Karpathy implements LLM training—initially for GPT-2, other architectures to follow—in just over 1,000 lines of C on top of CUDA. Includes a tutorial about implementing LayerNorm by porting an implementation from Python.

# 9th April 2024, 3:24 pm / c, ai, andrej-karpathy, generative-ai, llms, gpt-2

Hello World (via) Lennon McLean dives deep down the rabbit hole of what happens when you execute the binary compiled from “Hello world” in C on a Linux system, digging into the details of ELF executables, objdump disassembly, the C standard library, stack frames, null-terminated strings and taking a detour through musl because it’s easier to read than Glibc.

# 9th April 2024, 1:06 am / c, linux

Introducing Enhance WASM (via) “Backend agnostic server-side rendering (SSR) for Web Components”—fascinating new project from Brian LeRoux and Begin.

The idea here is to provide server-side rendering of Web Components using WebAssembly that can run on any platform that is supported within the Extism WASM ecosystem.

The key is the enhance-ssr.wasm bundle, a 4.1MB WebAssembly version of the enhance-ssr JavaScript library, compiled using the Extism JavaScript PDK (Plugin Development Kit) which itself bundles a WebAssembly version of QuickJS.

# 8th April 2024, 7:44 pm / javascript, web-components, webassembly, quickjs

The lifecycle of a code AI completion (via) Philipp Spiess provides a deep dive into how Sourcegraph's Cody code completion assistant works. Lots of fascinating details in here:

"One interesting learning was that if a user is willing to wait longer for a multi-line request, it usually is worth it to increase latency slightly in favor of quality. For our production setup this means we use a more complex language model for multi-line completions than we do for single-line completions."

This article is from October 2023 and talks about Claude Instant. The code for Cody is open source so I checked to see if they have switched to Haiku yet and found a commit from March 25th that adds Haiku as an A/B test.

# 7th April 2024, 7:37 pm / ai, generative-ai, llms, ai-assisted-programming, anthropic, claude

datasette-import. A new plugin for importing data into Datasette. This is a replacement for datasette-paste, duplicating and extending its functionality. datasette-paste had grown beyond just dealing with pasted CSV/TSV/JSON data—it handles file uploads as well now—which inspired the new name.

# 6th April 2024, 10:40 pm / plugins, projects, datasette

Everything I Know About the XZ Backdoor (via) Evan Boehs provides the most detailed timeline I’ve seen of the recent xz story, where a backdoor was inserted into the xz compression library in an attempt to compromise OpenSSH.

# 5th April 2024, 10:58 pm / security

s3-credentials 0.16. I spent entirely too long this evening trying to figure out why files in my new supposedly public S3 bucket were unavailable to view. It turns out these days you need to set a PublicAccessBlockConfiguration of {"BlockPublicAcls": false, "IgnorePublicAcls": false, "BlockPublicPolicy": false, "RestrictPublicBuckets": false}.

The s3-credentials --create-bucket --public option now does that for you. I also added a s3-credentials debug-bucket name-of-bucket command to help figure out why a bucket isn't working as expected.

# 5th April 2024, 5:35 am / aws, projects, s3, s3-credentials

llm-command-r. Cohere released Command R Plus today—an open weights (non commercial/research only) 104 billion parameter LLM, a big step up from their previous 35 billion Command R model.

Both models are fine-tuned for both tool use and RAG. The commercial API has features to expose this functionality, including a web-search connector which lets the model run web searches as part of answering the prompt and return documents and citations as part of the JSON response.

I released a new plugin for my LLM command line tool this morning adding support for the Command R models.

In addition to the two models it also adds a custom command for running prompts with web search enabled and listing the referenced documents.

# 4th April 2024, 5:38 pm / plugins, projects, ai, generative-ai, llms, llm, cohere, command-r, rag, llm-tool-use, llm-release

The cost of AI reasoning over time (via) Karina Nguyen from Anthropic provides a fascinating visualization illustrating the cost of different levels of LLM over the past few years, plotting their cost-per-token against their scores on the MMLU benchmark.

Claude 3 Haiku currently occupies the lowest cost to score ratio, over on the lower right hand side of the chart.

# 4th April 2024, 12:51 pm / ai, generative-ai, llms, anthropic, claude

Kobold letters (via) Konstantin Weddige explains a sophisticated HTML email phishing vector he calls Kobold emails.

When you forward a message, most HTML email clients will indent the forward by nesting it inside another element.

This means CSS rules within the email can be used to cause an element that was invisible in the original email to become visible when it is forwarded—allowing tricks like a forwarded innocuous email from your boss adding instructions for wiring money from the company bank account.

Gmail strips style blocks before forwarding—which it turns out isn’t protection against this, because you can put a style block in the original email to hide the attack text which will then be stripped for you when the email is forwarded.

# 4th April 2024, 12:43 pm / css, email, security

Enforcing conventions in Django projects with introspection (via) Luke Plant shows how to use the Django system checks framework to introspect models on startup and warn if a DateTime or Date model field has been added that doesn’t conform to a specific naming convention.

Luke also proposes “*_at” as a convention for DateTimes, contrasting with “*_on” or “*_date” (I prefer the latter) for Dates.

# 3rd April 2024, 2:58 pm / django, luke-plant

Cally: Accessibility statement (via) Cally is a neat new open source date (and date range) picker Web Component by Nick Williams.

It’s framework agnostic and weighs less than 9KB grilled, but the best feature is this detailed page of documentation covering its accessibility story, including how it was tested—in JAWS, NVDA and VoiceOver.

I’d love to see other open source JavaScript libraries follow this example.

# 2nd April 2024, 7:38 pm / accessibility, javascript, open-source, web-components

Bringing Python to Workers using Pyodide and WebAssembly (via) Cloudflare Workers is Cloudflare’s serverless hosting tool for deploying server-side functions to edge locations in their CDN.

They just released Python support, accompanied by an extremely thorough technical explanation of how they got that to work. The details are fascinating.

Workers runs on V8 isolates, and the new Python support was implemented using Pyodide (CPython compiled to WebAssembly) running inside V8.

Getting this to work performantly and ergonomically took a huge amount of work.

There are too many details in here to effectively summarize, but my favorite detail is this one:

“We scan the Worker’s code for import statements, execute them, and then take a snapshot of the Worker’s WebAssembly linear memory. Effectively, we perform the expensive work of importing packages at deploy time, rather than at runtime.”

# 2nd April 2024, 4:09 pm / python, serverless, cloudflare, webassembly, pyodide

Years

Tags