Simon Willison’s Weblog

Subscribe
Atom feed

Blogmarks

Filters: Sorted by date

gemma-2-27b-it-llamafile (via) Justine Tunney shipped llamafile packages of Google's new openly licensed (though definitely not open source) Gemma 2 27b model this morning.

I downloaded the gemma-2-27b-it.Q5_1.llamafile version (20.5GB) to my Mac, ran chmod 755 gemma-2-27b-it.Q5_1.llamafile and then ./gemma-2-27b-it.Q5_1.llamafile and now I'm trying it out through the llama.cpp default web UI in my browser. It works great.

It's a very capable model - currently sitting at position 12 on the LMSYS Arena making it the highest ranked open weights model - one position ahead of Llama-3-70b-Instruct and within striking distance of the GPT-4 class models.

# 2nd July 2024, 10:38 pm / google, ai, generative-ai, local-llms, llms, llamafile, justine-tunney, llama-cpp, gemma, chatbot-arena

Compare PDFs. Inspired by this thread on Hacker News about the C++ diff-pdf tool I decided to see what it would take to produce a web-based PDF diff visualization tool using Claude 3.5 Sonnet.

It took two prompts:

Build a tool where I can drag and drop on two PDF files and it uses PDF.js to turn each of their pages into canvas elements and then displays those pages side by side with a third image that highlights any differences between them, if any differences exist

That give me a React app that didn't quite work, so I followed-up with this:

rewrite that code to not use React at all

Which gave me a working tool! You can see the full Claude transcript in this Gist. Here's a screenshot of the tool in action:

Screenshot showing a page from a PDF, then that page with a red squiggle on it, then just the squiggle

Being able to knock out little custom interactive web tools like this in a couple of minutes is so much fun.

# 2nd July 2024, 7:54 pm / pdf, projects, tools, llms, ai-assisted-programming, anthropic, claude, claude-3-5-sonnet

Optimizing Large-Scale OpenStreetMap Data with SQLite (via) JT Archie describes his project to take 9GB of compressed OpenStreetMap protobufs data for the whole of the United States and load it into a queryable SQLite database.

OSM tags are key/value pairs. The trick used here for FTS-accelerated tag queries is really neat: build a SQLite FTS table containing the key/value pairs as space concatenated text, then run queries that look like this:

SELECT
    id
FROM
    entries e
    JOIN search s ON s.rowid = e.id
WHERE
    -- use FTS index to find subset of possible results
    search MATCH 'amenity cafe'
    -- use the subset to find exact matches
    AND tags->>'amenity' = 'cafe';

JT ended up building a custom SQLite Go extension, SQLiteZSTD, to further accelerate things by supporting queries against read-only zstd compresses SQLite files. Apparently zstd has a feature that allows "compressed data to be stored so that subranges of the data can be efficiently decompressed without requiring the entire document to be decompressed", which works well with SQLite's page format.

# 2nd July 2024, 2:33 pm / go, openstreetmap, sqlite, zstd

Russell Keith-Magee: Build a cross-platform app with BeeWare. The session videos from PyCon US 2024 have started showing up on YouTube. So far just for the tutorials, which gave me a chance to catch up on the BeeWare project with this tutorial run by Russell Keith-Magee.

Here are the accompanying slides (PDF), or you can work through the official tutorial in the BeeWare documentation.

The tutorial did a great job of clarifying the difference between Briefcase and Toga, the two key components of the BeeWare ecosystem - each of which can be used independently of the other.

Briefcase solves packaging and installation: it allows a Python project to be packaged as a native application across macOS, Windows, iOS, Android and various flavours of Linux.

Toga is a toolkit for building cross-platform GUI applications in Python. A UI built using Toga will render with native widgets across all of those supported platforms, and experimental new modes also allow Toga apps to run as SPA web applications and as Rich-powered terminal tools (via toga-textual).

Russell is excellent at both designing and presenting tutorial-style workshops, and I made a bunch of mental notes on the structure of this one which I hope to apply to my own in the future.

# 1st July 2024, 10:49 pm / python, russell-keith-magee, beeware

Announcing the Ladybird Browser Initiative (via) Andreas Kling's Ladybird is a really exciting project: a from-scratch implementation of a web browser, initially built as part of the Serenity OS project, which aims to provide a completely independent, open source and fully standards compliant browser.

Last month Andreas forked Ladybird away from Serenity, recognizing that the potential impact of the browser project on its own was greater than as a component of that project. Crucially, Serenity OS avoids any outside code - splitting out Ladybird allows Ladybird to add dependencies like libjpeg and ffmpeg. The Ladybird June update video talks through some of the dependencies they've been able to add since making that decision.

The new Ladybird Browser Initiative puts some financial weight behind the project: it's a US 501(c)(3) non-profit initially funded with $1m from GitHub co-founder Chris Chris Wanstrath. The money is going on engineers: Andreas says:

We are 4 full-time engineers today, and we'll be adding another 3 in the near future

Here's a 2m28s video from Chris introducing the new foundation and talking about why this project is worth supporting.

# 1st July 2024, 4:08 pm / browsers, ffmpeg, open-source, andreas-kling, ladybird

A write-ahead log is not a universal part of durability (via) Phil Eaton uses pseudo code to provide a clear description of how write-ahead logs in transactional database systems work, useful for understanding the tradeoffs they make and the guarantees they can provided.

I particularly liked the pseudo code explanation of group commits, where clients block waiting for their commit to be acknowledged as part of a batch of writes flushed to disk.

# 1st July 2024, 3:05 pm / databases, phil-eaton

The Super Effectiveness of Pokémon Embeddings Using Only Raw JSON and Images. A deep dive into embeddings from Max Woolf, exploring 1,000 different Pokémon (loaded from PokéAPI using this epic GraphQL query) and then embedding the cleaned up JSON data using nomic-embed-text-v1.5 and the official Pokémon image representations using nomic-embed-vision-v1.5.

I hadn't seen nomic-embed-vision-v1.5 before: it brings multimodality to Nomic embeddings and operates in the same embedding space as nomic-embed-text-v1.5 which means you can use it to perform CLIP-style tricks comparing text and images. Here's their announcement from June 5th:

Together, Nomic Embed is the only unified embedding space that outperforms OpenAI CLIP and OpenAI Text Embedding 3 Small on multimodal and text tasks respectively.

Sadly the new vision weights are available under a non-commercial Creative Commons license (unlike the text weights which are Apache 2), so if you want to use the vision weights commercially you'll need to access them via Nomic's paid API.

Nomic do say this though:

As Nomic releases future models, we intend to re-license less recent models in our catalogue under the Apache-2.0 license.

Update 17th January 2025: Nomic Embed Vision 1.5 is now Apache 2.0 licensed.

# 30th June 2024, 9:22 pm / ai, graphql, max-woolf, embeddings, clip

marimo.app. The Marimo reactive notebook (previously) - a Python notebook that's effectively a cross between Jupyter and Observable - now also has a version that runs entirely in your browser using WebAssembly and Pyodide. Here's the documentation.

# 29th June 2024, 11:07 pm / python, jupyter, observable, webassembly, pyodide, marimo

Accidental GPT-4o voice preview (via) Reddit user RozziTheCreator was one of a small group who were accidentally granted access to the new multimodal GPT-4o audio voice feature. They captured this video of it telling them a spooky story, complete with thunder sound effects added to the background and in a very realistic voice that clearly wasn't the one from the 4o demo that sounded similar to Scarlet Johansson.

OpenAI provided a comment for this Tom's Guide story confirming the accidental rollout so I don't think this is a faked video.

# 28th June 2024, 8:53 pm / ai, openai, generative-ai, chatgpt, gpt-4

Serving a billion web requests with boring code (via) Bill Mill provides a deep retrospective from his work helping build a relaunch of the medicare.gov/plan-compare site.

It's a fascinating case study of the choose boring technology mantra put into action. The "boring" choices here were PostgreSQL, Go and React, all three of which are so widely used and understood at this point that you're very unlikely to stumble into surprises with them.

Key goals for the site were accessibility, in terms of users, devices and performance. Despite best efforts:

The result fell prey after a few years to a common failure mode of react apps, and became quite heavy and loaded somewhat slowly.

I've seen this pattern myself many times over, and I'd love to understand why. React itself isn't a particularly large dependency but somehow it always seems to lead to architectural bloat over time. Maybe that's more of an SPA thing than something that's specific to React.

Loads of other interesting details in here. The ETL details - where brand new read-only RDS databases were spun up every morning after a four hour build process - are particularly notable.

# 28th June 2024, 4:22 pm / accessibility, go, postgresql, scaling, react, boring-technology

Django: Test for pending migrations (via) Neat recipe from Adam Johnson for adding an automated test to your Django test suite that runs manage.py makemigrations --check to ensure you don't accidentally land code that deploys with a missing migration and crashes your site. I've made this mistake before myself so I'll be adding this to my projects.

# 28th June 2024, 3:23 pm / django, adam-johnson

Python 3.12 change results in Apple App Store rejection (via) Such a frustrating demonstration of the very worst of Apple's opaque App Store review process. The Python 3.12 standard library urllib package includes the string itms-services, and after much investigation Eric Froemling managed to determine that Apple use a scanner and reject any app that has that string mentioned anywhere within their bundle.

Russell Keith-Magee has a thread on the Python forum discussing solutions. He doesn't think attempts to collaborate with Apple are likely to help:

That definitely sounds appealing as an approach - but in this case, it’s going to be screaming into the void. There’s barely even an appeals process for app rejection on Apple’s App Store. We definitely don’t have any sort of channel to raise a complaint that we could reasonably believe would result in a change of policy.

# 27th June 2024, 11:17 pm / apple, appstore, python, russell-keith-magee

picopilot (via) Kyle Carberry's "GitHub Copilot in 70 lines of JavaScript". The title is a little hyperbolic, but the code itself really does implement an OpenAI powered Visual Studio Code text completion extension in 71 lines of code. This is an excellent example for learning what a minimal VS Code extension looks like.

Here's the system prompt it uses:

You provide code completion results given a prefix and suffix. Respond with a JSON object with the key 'completion' containing a suggestion to place between the prefix and suffix. Follow existing code styles. Listen to comments at the end of the prefix. The language is "{language}".

Then it passes the prefix and suffix as two user messages, and uses the "response_format": {"type": "json_object"} option to enforce JSON output from the GPT-4o API.

The feature this is missing is the thing that makes GitHub Copilot so impressive: Copilot does a whole bunch of clever tricks to find snippets of relevant code from the current and other nearby files and includes them with the prompt, resulting in much higher quality completions.

# 26th June 2024, 12:24 am / ai, prompt-engineering, generative-ai, github-copilot, llms, vs-code

Polyfill supply chain attack hits 100K+ sites (via) Short version: if you are loading assets from the polyfill.io domain you need to remove that right now: the new owners of the domain (as of a few months ago) appear to be using it to serve malicious JavaScript.

polyfill.io was a fascinating service. It was originally developed and supported by the Financial Times, but span off as a separate project several years ago.

The key idea was to serve up a set of JavaScript polyfills - pieces of code that implemented missing web platform features for older browsers - dynamically, based on the incoming user-agent. This required a CDN that varied its output dynamically based on the user-agent, hence the popularity of the single hosted service.

Andrew Betts, the original author of the service, has been warning people to move off it since February 2024:

If your website uses polyfill.io, remove it IMMEDIATELY.

I created the polyfill service project but I have never owned the domain name and I have had no influence over its sale.

He now works for Fastly, which started offering a free polyfill-fastly.io alternative in February. Andrew says you probably don't need that either, given that modern browsers have much better compatibility than when the service was first introduced over a decade ago.

There's some interesting additional context in a now-deleted GitHub issue, preserved here by the Internet Archive.

Usually one answer to protecting against this style of CDN supply chain attack would be to use SRI hashes to ensure only the expected script can be served from the site. That doesn't work here because the whole point of the service is to serve different scripts to different browsers.

# 25th June 2024, 10:17 pm / javascript, security, supply-chain

Claude Projects. New Claude feature, quietly launched this morning for Claude Pro users. Looks like their version of OpenAI's GPTs, designed to take advantage of Claude's 200,000 token context limit:

You can upload relevant documents, text, code, or other files to a project’s knowledge base, which Claude will use to better understand the context and background for your individual chats within that project. Each project includes a 200K context window, the equivalent of a 500-page book, so users can add all of the insights needed to enhance Claude’s effectiveness.

You can also set custom instructions, which presumably get added to the system prompt.

I tried dropping in all of Datasette's existing documentation - 693KB of .rst files (which I had to rename to .rst.txt for it to let me upload them) - and it worked and showed "63% of knowledge size used".

This is a slightly different approach from OpenAI, where the GPT knowledge feature supports attaching up to 20 files each with up to 2 million tokens, which get ingested into a vector database (likely Qdrant) and used for RAG.

It looks like Claude instead handle a smaller amount of extra knowledge but paste the whole thing into the context window, which avoids some of the weirdness around semantic search chunking but greatly limits the size of the data.

My big frustration with the knowledge feature in GPTs remains the lack of documentation on what it's actually doing under the hood. Without that it's difficult to make informed decisions about how to use it - with Claude Projects I can at least develop a robust understanding of what the tool is doing for me and how best to put it to work.

No equivalent (yet) for the GPT actions feature where you can grant GPTs the ability to make API calls out to external systems.

# 25th June 2024, 4:03 pm / ai, openai, prompt-engineering, chatgpt, llms, anthropic, claude, rag

Listen to the AI-generated ripoff songs that got Udio and Suno sued. Jason Koebler reports on the lawsuit filed today by the RIAA against Udio and Suno, the two leading generative music startups.

The lawsuit includes examples of prompts that the record labels used to recreate famous songs that were almost certainly included in the (undisclosed) training data. Jason collected some of these together into a three minute video, and the result in pretty damning. Arguing "fair use" isn't going to be easy here.

# 24th June 2024, 6:33 pm / ethics, ai, generative-ai, jason-koebler, training-data, ai-ethics

New blog feature: Support for markdown in quotations. Another incremental improvement to my blog. I've been collecting quotations here since 2006 - I now render them using Markdown (previously they were just plain text). Here's one example. The full set of 920 (and counting) quotations can be explored using this search filter.

# 24th June 2024, 3:51 pm / blogging, projects, markdown, site-upgrades

Microfeatures I Love in Blogs and Personal Websites (via) This post by Daniel Fedorin (and the accompanying Hacker News thread) is a nice reminder of one of the most fun things about building your own personal website: it gives you a low-risk place to experiment with details like footnotes, tables of contents, linkable headings, code blocks, RSS feeds, link previews and more.

# 24th June 2024, 1:51 pm / blogging

llama.ttf (via) llama.ttf is "a font file which is also a large language model and an inference engine for that model".

You can see it kick into action at 8m28s in this video, where creator Søren Fuglede Jørgensen types "Once upon a time" followed by dozens of exclamation marks, and those exclamation marks then switch out to render a continuation of the story. But... when they paste the code out of the editor again it shows as the original exclamation marks were preserved - the LLM output was presented only in the way they were rendered.

The key trick here is that the font renderer library HarfBuzz (used by Firefox, Chrome, Android, GNOME and more) added a new WebAssembly extension in version 8.0 last year, which is powerful enough to run a full LLM based on the tinyllama-15M model - which fits in a 60MB font file.

(Here's a related demo from Valdemar Erk showing Tetris running in a WASM font, at 22m56s in this video.)

The source code for llama.ttf is available on GitHub.

# 23rd June 2024, 3:39 pm / fonts, ai, webassembly, generative-ai, llms

Wikipedia Manual of Style: Linking (via) I started a conversation on Mastodon about the grammar of linking: how to decide where in a phrase an inline link should be placed.

Lots of great (and varied) replies there. The most comprehensive style guide I've seen so far is this one from Wikipedia, via Tom Morris.

# 22nd June 2024, 2:15 pm / links, wikipedia, writing

Datasette 0.64.8. A very small Datasette release, fixing a minor potential security issue where the name of missing databases or tables was reflected on the 404 page in a way that could allow an attacker to present arbitrary text to a user who followed a link. Not an XSS attack (no code could be executed) but still a potential vector for confusing messages.

# 21st June 2024, 11:48 pm / projects, releases, security, datasette

Val Vibes: Semantic search in Val Town. A neat case-study by JP Posma on how Val Town's developers can use Val Town Vals to build prototypes of new features that later make it into Val Town core.

This one explores building out semantic search against Vals using OpenAI embeddings and the PostgreSQL pgvector extension.

# 21st June 2024, 2:16 am / postgresql, ai, openai, embeddings, val-town, ai-assisted-search

llm-claude-3 0.4. LLM plugin release adding support for the new Claude 3.5 Sonnet model:

pipx install llm
llm install -U llm-claude-3
llm keys set claude
# paste AP| key here
llm -m claude-3.5-sonnet \
  'a joke about a pelican and a walrus having lunch'

# 20th June 2024, 11:04 pm / projects, ai, generative-ai, llms, llm, anthropic, claude, claude-3-5-sonnet

Claude 3.5 Sonnet. Anthropic released a new model this morning, and I think it's likely now the single best available LLM. Claude 3 Opus was already mostly on-par with GPT-4o, and the new 3.5 Sonnet scores higher than Opus on almost all of Anthropic's internal evals.

It's also twice the speed and one fifth of the price of Opus (it's the same price as the previous Claude 3 Sonnet). To compare:

  • gpt-4o: $5/million input tokens and $15/million output
  • Claude 3.5 Sonnet: $3/million input, $15/million output
  • Claude 3 Opus: $15/million input, $75/million output

Similar to Claude 3 Haiku then, which both under-cuts and out-performs OpenAI's GPT-3.5 model.

In addition to the new model, Anthropic also added a "artifacts" feature to their Claude web interface. The most exciting part of this is that any of the Claude models can now build and then render web pages and SPAs, directly in the Claude interface.

This means you can prompt them to e.g. "Build me a web app that teaches me about mandelbrot fractals, with interactive widgets" and they'll do exactly that - I tried that prompt on Claude 3.5 Sonnet earlier and the results were spectacular (video demo).

An unsurprising note at the end of the post:

To complete the Claude 3.5 model family, we’ll be releasing Claude 3.5 Haiku and Claude 3.5 Opus later this year.

If the pricing stays consistent with Claude 3, Claude 3.5 Haiku is going to be a very exciting model indeed.

# 20th June 2024, 6:01 pm / mandelbrot, ai, generative-ai, llms, anthropic, claude, vision-llms, claude-artifacts, claude-3-5-sonnet, llm-release, prompt-to-app

State-of-the-art music scanning by Soundslice. It's been a while since I checked in on Soundslice, Adrian Holovaty's beautiful web application focused on music education.

The latest feature is spectacular. The Soundslice music editor - already one of the most impressive web applications I've ever experienced - can now import notation directly from scans or photos of sheet music.

The attention to detail is immaculate. The custom machine learning model can handle a wide variety of notation details, and the system asks the user to verify or correct details that it couldn't perfectly determine using a neatly designed flow.

Free accounts can scan two single page documents a month, and paid plans get a much higher allowance. I tried it out just now on a low resolution image I found on Wikipedia and it did a fantastic job, even allowing me to listen to a simulated piano rendition of the music once it had finished processing.

It's worth spending some time with the release notes for the feature to appreciate how much work they've out into improving it since the initial release.

If you're new to Soundslice, here's an example of their core player interface which syncs the display of music notation to an accompanying video.

Adrian wrote up some detailed notes on the machine learning behind the feature when they first launched it in beta back in November 2022.

OMR [Optical Music Recognition] is an inherently hard problem, significantly more difficult than text OCR. For one, music symbols have complex spatial relationships, and mistakes have a tendency to cascade. A single misdetected key signature might result in multiple incorrect note pitches. And there’s a wide diversity of symbols, each with its own behavior and semantics — meaning the problems and subproblems aren’t just hard, there are many of them.

# 20th June 2024, 4:37 am / adrian-holovaty, machine-learning, music, ocr, ai

Civic Band. Exciting new civic tech project from Philip James: 30 (and counting) Datasette instances serving full-text search enabled collections of OCRd meeting minutes for different civic governments. Includes 20,000 pages for Alameda, 17,000 for Pittsburgh, 3,567 for Baltimore and an enormous 117,000 for Maui County.

Philip includes some notes on how they're doing it. They gather PDF minute notes from anywhere that provides API access to them, then run local Tesseract for OCR (the cost of cloud-based OCR proving prohibitive given the volume of data). The collection is then deployed to a single VPS running multiple instances of Datasette via Caddy, one instance for each of the covered regions.

# 19th June 2024, 9:30 pm / data-journalism, ocr, tesseract, datasette

About the Lawrence Times (via) The town of Lawrence, Kansas is where Django was born. I'm delighted to learn that it has a new independent online news publication as-of March 2021 - the Lawrence Times.

It's always exciting to see local media startups like this one, and they've been publishing for three years now supported by both advertiser revenue and optional paid subscriptions.

# 19th June 2024, 3:53 am / kansas, news, newspapers

I’ve stopped using box plots. Should you? (via) Nick Desbarats explains box plots (including with this excellent short YouTube video) and then discusses why he thinks "typically less than 20 percent" of participants in his workshops already understand how to read them.

A key problem is that they are unintuitive: a box plot has four sections, two thin lines (the top and bottom whisker segments) and two larger boxes, joined around the median. Each of these elements represents the same number of samples (one quartile each) but the thin lines v.s. thick boxes imply that the whiskers contain less samples than the boxes.

# 19th June 2024, 12:22 am / visualization

Tags with descriptions. Tiny new feature on my blog: I can now add optional descriptions to my tag pages, for example on datasette and sqlite-utils and prompt-injection.

I built this feature on a live call this morning as an unplanned demonstration of GitHub's new Copilot Workspace feature, where you can run a prompt against a repository and have it plan, implement and file a pull request implementing a change to the code.

My prompt was:

Add a feature that lets me add a description to my tag pages, stored in the database table for tags and visible on the /tags/x/ page at the top

It wasn't as compelling a demo as I expected: Copilot Workspace currently has to stream an entire copy of each file it modifies, which can take a long time if your codebase includes several large files that need to be changed.

It did create a working implementation on its first try, though I had given it an extra tip not to forget the database migration. I ended up making a bunch of changes myself before I shipped it, listed in the pull request.

I've been using Copilot Workspace quite a bit recently as a code explanation tool - I'll prompt it to e.g. "add architecture documentation to the README" on a random repository not owned by me, then read its initial plan to see what it's figured out without going all the way through to the implementation and PR phases. Example in this tweet where I figured out the rough design of the Jina AI Reader API for this post.

# 18th June 2024, 4:50 pm / blogging, github, projects, tagging, ai, generative-ai, llms, site-upgrades

Claude: Building evals and test cases. More documentation updates from Anthropic: this section on writing evals for Claude is new today and includes Python code examples for a number of different evaluation techniques.

Included are several examples of the LLM-as-judge pattern, plus an example using cosine similarity and another that uses the new-to-me Rouge Python library that implements the ROUGE metric for evaluating the quality of summarized text.

# 18th June 2024, 4:28 pm / ai, prompt-engineering, generative-ai, llms, anthropic, claude, evals

Years

Tags