Simon Willison’s Weblog

On javascript 647 security 457 ethics 103 llms 527 andrejkarpathy 16 ...

 

Recent

June 22, 2024

In our “who validates the validators” user studies, we found that people expected—and also desired—for the LLM to learn from *any* human interaction. That too, “as efficiently as possible” (ie after 1-2 demonstrations, the LLM should “get it”)

Shreya Shankar # 6 pm

Wikipedia Manual of Style: Linking (via) I started a conversation on Mastodon about the grammar of linking: how to decide where in a phrase an inline link should be placed.

Lots of great (and varied) replies there. The most comprehensive style guide I've seen so far is this one from Wikipedia, via Tom Morris. # 2:15 pm

June 21, 2024

Datasette 0.64.8. A very small Datasette release, fixing a minor potential security issue where the name of missing databases or tables was reflected on the 404 page in a way that could allow an attacker to present arbitrary text to a user who followed a link. Not an XSS attack (no code could be executed) but still a potential vector for confusing messages. # 11:48 pm

Building search-based RAG using Claude, Datasette and Val Town

Visit Building search-based RAG using Claude, Datasette and Val Town

Retrieval Augmented Generation (RAG) is a technique for adding extra “knowledge” to systems built on LLMs, allowing them to answer questions against custom information not included in their training data. A common way to implement this is to take a question from a user, translate that into a set of search queries, run those against a search engine and then feed the results back into the LLM to generate an answer.

[... 3362 words]

OpenAI was founded to build artificial general intelligence safely, free of outside commercial pressures. And now every once in a while it shoots out a new AI firm whose mission is to build artificial general intelligence safely, free of the commercial pressures at OpenAI.

Matt Levine # 5:40 am

Val Vibes: Semantic search in Val Town. A neat case-study by JP Posma on how Val Town's developers can use Val Town Vals to build prototypes of new features that later make it into Val Town core.

This one explores building out semantic search against Vals using OpenAI embeddings and the PostgreSQL pgvector extension. # 2:16 am

It is in the public good to have AI produce quality and credible (if ‘hallucinations’ can be overcome) output. It is in the public good that there be the creation of original quality, credible, and artistic content. It is not in the public good if quality, credible content is excluded from AI training and output OR if quality, credible content is not created.

Jeff Jarvis # 2:04 am

June 20, 2024

llm-claude-3 0.4. LLM plugin release adding support for the new Claude 3.5 Sonnet model:

pipx install llm
llm install -U llm-claude-3
llm keys set claude
# paste AP| key here
llm -m claude-3.5-sonnet \
  'a joke about a pelican and a walrus having lunch'
# 11:04 pm

One of the core constitutional principles that guides our AI model development is privacy. We do not train our generative models on user-submitted data unless a user gives us explicit permission to do so. To date we have not used any customer or user-submitted data to train our generative models.

Anthropic # 7:19 pm

Claude 3.5 Sonnet. Anthropic released a new model this morning, and I think it's likely now the single best available LLM. Claude 3 Opus was already mostly on-par with GPT-4o, and the new 3.5 Sonnet scores higher than Opus on almost all of Anthropic's internal evals.

It's also twice the speed and one fifth of the price of Opus (it's the same price as the previous Claude 3 Sonnet). To compare:

  • gpt-4o: $5/million input tokens and $15/million output
  • Claude 3.5 Sonnet: $3/million input, $15/million output
  • Claude 3 Opus: $15/million input, $75/million output

Similar to Claude 3 Haiku then, which both under-cuts and out-performs OpenAI's GPT-3.5 model.

In addition to the new model, Anthropic also added a "artifacts" feature to their Claude web interface. The most exciting part of this is that any of the Claude models can now build and then render web pages and SPAs, directly in the Claude interface.

This means you can prompt them to e.g. "Build me a web app that teaches me about mandelbrot fractals, with interactive widgets" and they'll do exactly that - I tried that prompt on Claude 3.5 Sonnet earlier and the results were spectacular (video demo).

An unsurprising note at the end of the post:

To complete the Claude 3.5 model family, we’ll be releasing Claude 3.5 Haiku and Claude 3.5 Opus later this year.

If the pricing stays consistent with Claude 3, Claude 3.5 Haiku is going to be a very exciting model indeed. # 6:01 pm

[...] And then some absolute son of a bitch created ChatGPT, and now look at us. Look at us, resplendent in our pauper’s robes, stitched from corpulent greed and breathless credulity, spending half of the planet’s engineering efforts to add chatbot support to every application under the sun when half of the industry hasn’t worked out how to test database backups regularly.

Nikhil Suresh # 5:50 am

State-of-the-art music scanning by Soundslice. It's been a while since I checked in on Soundslice, Adrian Holovaty's beautiful web application focused on music education.

The latest feature is spectacular. The Soundslice music editor - already one of the most impressive web applications I've ever experienced - can now import notation directly from scans or photos of sheet music.

The attention to detail is immaculate. The custom machine learning model can handle a wide variety of notation details, and the system asks the user to verify or correct details that it couldn't perfectly determine using a neatly designed flow.

Free accounts can scan two single page documents a month, and paid plans get a much higher allowance. I tried it out just now on a low resolution image I found on Wikipedia and it did a fantastic job, even allowing me to listen to a simulated piano rendition of the music once it had finished processing.

It's worth spending some time with the release notes for the feature to appreciate how much work they've out into improving it since the initial release.

If you're new to Soundslice, here's an example of their core player interface which syncs the display of music notation to an accompanying video.

Adrian wrote up some detailed notes on the machine learning behind the feature when they first launched it in beta back in November 2022.

OMR [Optical Music Recognition] is an inherently hard problem, significantly more difficult than text OCR. For one, music symbols have complex spatial relationships, and mistakes have a tendency to cascade. A single misdetected key signature might result in multiple incorrect note pitches. And there’s a wide diversity of symbols, each with its own behavior and semantics — meaning the problems and subproblems aren’t just hard, there are many of them.

# 4:37 am

June 19, 2024

Civic Band. Exciting new civic tech project from Philip James: 30 (and counting) Datasette instances serving full-text search enabled collections of OCRd meeting minutes for different civic governments. Includes 20,000 pages for Alameda, 17,000 for Pittsburgh, 3,567 for Baltimore and an enormous 117,000 for Maui County.

Philip includes some notes on how they're doing it. They gather PDF minute notes from anywhere that provides API access to them, then run local Tesseract for OCR (the cost of cloud-based OCR proving prohibitive given the volume of data). The collection is then deployed to a single VPS running multiple instances of Datasette via Caddy, one instance for each of the covered regions. # 9:30 pm

Weeknotes: Datasette Studio and a whole lot of blogging

Visit Weeknotes: Datasette Studio and a whole lot of blogging

I’m still spinning back up after my trip back to the UK, so actual time spent building things has been less than I’d like. I presented an hour long workshop on command-line LLM usage, wrote five full blog entries (since my last weeknotes) and I’ve also been leaning more into short-form link blogging—a lot more prominent on this site now since my homepage redesign last week.

[... 735 words]

About the Lawrence Times (via) The town of Lawrence, Kansas is where Django was born. I'm delighted to learn that it has a new independent online news publication as-of March 2021 - the Lawrence Times.

It's always exciting to see local media startups like this one, and they've been publishing for three years now supported by both advertiser revenue and optional paid subscriptions. # 3:53 am

I’ve stopped using box plots. Should you? (via) Nick Desbarats explains box plots (including with this excellent short YouTube video) and then discusses why he thinks "typically less than 20 percent" of participants in his workshops already understand how to read them.

A key problem is that they are unintuitive: a box plot has four sections, two thin lines (the top and bottom whisker segments) and two larger boxes, joined around the median. Each of these elements represents the same number of samples (one quartile each) but the thin lines v.s. thick boxes imply that the whiskers contain less samples than the boxes. # 12:22 am

June 18, 2024

Tags with descriptions. Tiny new feature on my blog: I can now add optional descriptions to my tag pages, for example on datasette and sqliteutils and promptinjection.

I built this feature on a live call this morning as an unplanned demonstration of GitHub's new Copilot Workspace feature, where you can run a prompt against a repository and have it plan, implement and file a pull request implementing a change to the code.

My prompt was:

Add a feature that lets me add a description to my tag pages, stored in the database table for tags and visible on the /tags/x/ page at the top

It wasn't as compelling a demo as I expected: Copilot Workspace currently has to stream an entire copy of each file it modifies, which can take a long time if your codebase includes several large files that need to be changed.

It did create a working implementation on its first try, though I had given it an extra tip not to forget the database migration. I ended up making a bunch of changes myself before I shipped it, listed in the pull request.

I've been using Copilot Workspace quite a bit recently as a code explanation tool - I'll prompt it to e.g. "add architecture documentation to the README" on a random repository not owned by me, then read its initial plan to see what it's figured out without going all the way through to the implementation and PR phases. Example in this tweet where I figured out the rough design of the Jina AI Reader API for this post. # 4:50 pm

Claude: Building evals and test cases. More documentation updates from Anthropic: this section on writing evals for Claude is new today and includes Python code examples for a number of different evaluation techniques.

Included are several examples of the LLM-as-judge pattern, plus an example using cosine similarity and another that uses the new-to-me Rouge Python library that implements the ROUGE metric for evaluating the quality of summarized text. # 4:28 pm

Anthropic release notes (via) Anthropic have started publishing release notes! Currently available for their API and their apps (mobile and web).

What I'd really like to see are release notes for the models themselves, though as far as I can tell there haven't been any updates to those since the Claude 3 models were first released (the Haiku model name in the API is still claude-3-haiku-20240307 and Anthropic say they'll change that identifier after any updates to the model). # 4:25 pm

June 17, 2024

pkgutil.resolve_name(name) (via) Adam Johnson pointed out this utility method, added to the Python standard library in Python 3.9. It lets you provide a string that specifies a Python identifier to import from a module - a pattern frequently used in things like Django's configuration.

Path = pkgutil.resolve_name("pathlib:Path")
# 8:32 pm

How researchers cracked an 11-year-old password to a crypto wallet. If you used the RoboForm password manager to generate a password prior to their 2015 bug fix that password was generated using a pseudo-random number generator based on your device’s current time—which means an attacker may be able to brute-force the password from a shorter list of options if they can derive the rough date when it was created.

(In this case the password cracking was consensual, to recover a lost wallet, but this still serves as a warning to any RoboForm users with passwords from that era.) # 5:04 pm

Language models on the command-line

Visit Language models on the command-line

I gave a talk about accessing Large Language Models from the command-line last week as part of the Mastering LLMs: A Conference For Developers & Data Scientists six week long online conference. The talk focused on my LLM Python command-line utility and ways you can use it (and its plugins) to explore LLMs and use them for useful tasks.

[... 4992 words]

Most people think that we format Go code with gofmt to make code look nicer or to end debates among team members about program layout. But the most important reason for gofmt is that if an algorithm defines how Go source code is formatted, then programs, like goimports or gorename or go fix, can edit the source code more easily, without introducing spurious formatting changes when writing the code back. This helps you maintain code over time.

Russ Cox # 10:46 am

June 16, 2024

We’re adding the human touch, but that often requires a deep, developmental edit on a piece of writing. The grammar and word choice just sound weird. You’re always cutting out flowery words like ’therefore’ and ’nevertheless’ that don’t fit in casual writing. Plus, you have to fact-check the whole thing because AI just makes things up, which takes forever because it’s not just big ideas. AI hallucinates these flippant little things in throwaway lines that you’d never notice. [...]

It’s tedious, horrible work, and they pay you next to nothing for it.

Catrina Cowart # 8:47 pm

Jina AI Reader. Jina AI provide a number of different AI-related platform products, including an excellent family of embedding models, but one of their most instantly useful is Jina Reader, an API for turning any URL into Markdown content suitable for piping into an LLM.

Add r.jina.ai to the front of a URL to get back Markdown of that page, for example https://r.jina.ai/https://simonwillison.net/2024/Jun/16/jina-ai-reader/ - in addition to converting the content to Markdown it also does a decent job of extracting just the content and ignoring the surrounding navigation.

The API is free but rate-limited (presumably by IP) to 20 requests per minute without an API key or 200 request per minute with a free API key, and you can pay to increase your allowance beyond that.

The Apache 2 licensed source code for the hosted service is on GitHub - it's written in TypeScript and uses Puppeteer to run Readabiliy.js and Turndown against the scraped page.

It can also handle PDFs, which have their contents extracted using PDF.js.

There's also a search feature, s.jina.ai/search+term+goes+here, which uses the Brave Search API. # 7:33 pm

GitHub Copilot Chat: From Prompt Injection to Data Exfiltration (via) Yet another example of the same vulnerability we see time and time again.

If you build an LLM-based chat interface that gets exposed to both private and untrusted data (in this case the code in VS Code that Copilot Chat can see) and your chat interface supports Markdown images, you have a data exfiltration prompt injection vulnerability.

The fix, applied by GitHub here, is to disable Markdown image references to untrusted domains. That way an attack can't trick your chatbot into embedding an image that leaks private data in the URL.

Previous examples: ChatGPT itself, Google Bard, Writer.com, Amazon Q, Google NotebookLM. I'm tracking them here using my new markdownexfiltration tag. # 12:35 am

June 15, 2024

Notes on upgrading by blog’s Heroku database plan. Heroku discontinued the “Basic” PostgreSQL plan I’ve been using for my blog, so I just upgraded to the new “essential-0” tier. Here are my notes as a GitHub issue—it was very straightforward, and I’m really only linking to it now to test that writes to the new database work correctly.

I try to create an issue like this any time I do even a minor ops task, mainly so I have somewhere to drop screenshots of any web UI interactions for future reference. # 10:29 pm

I understand people are upset about AI art making it to the final cut, but please try to also google artist names and compare to their portfolio before accusing them of using AI. I’m genuinely pretty upset to be accused of this. It’s no fun to work on your craft for decades and then be told by some ’detection site’ that your work is machine generated and people are spreading this around as a fact.

Johanna Tarkela # 3:28 pm

Using DuckDB for Embeddings and Vector Search (via) Sören Brunk's comprehensive tutorial combining DuckDB 1.0, a subset of German Wikipedia from Hugging Face (loaded using Parquet), the BGE M3 embedding model and DuckDB's new vss extension for implementing an HNSW vector index. # 2:39 pm

June 14, 2024

(Blaming something on “politics” is usually a way of accidentally confessing that you don’t actually understand the constraints someone is operating under, IMO.)

Charity Majors # 8:30 pm

Highlights