Simon Willison’s Weblog

Subscribe

July 2023

77 posts: 4 entries, 20 links, 7 quotes, 46 beats

July 24, 2023

sqlite-utils now supports plugins

Visit sqlite-utils now supports plugins

sqlite-utils 3.34 is out with a major new feature: support for plugins.

[... 1,327 words]

LLM can now be installed directly from Homebrew (via) I spent a bunch of time on this at the weekend: my LLM tool for interacting with large language models from the terminal has now been accepted into Homebrew core, and can be installed directly using “brew install llm”. I was previously running my own separate tap, but having it in core means that it benefits from Homebrew’s impressive set of build systems—each release of LLM now has Bottles created for it automatically across a range of platforms, so “brew install llm” should quickly download binary assets rather than spending several minutes installing dependencies the slow way.

# 5:16 pm / homebrew, projects, ai, generative-ai, llms, llm

asgi-replay. As part of submitting LLM to Homebrew core I needed an automated test that demonstrated that the tool was working—but I couldn’t test against the live OpenAI API because I didn’t want to have to reveal my API token as part of the test. I solved this by creating a dummy HTTP endpoint that simulates a hit to the OpenAI API, then configuring the Homebrew test to hit that instead. As part of THAT I ended up building this tiny tool which uses my asgi-proxy-lib package to intercept and log the details of hits made to a service, then provides a mechanism to replay that traffic.

# 7:51 pm / projects, asgi

Release dclient 0.2 — A client CLI utility for Datasette instances

July 25, 2023

None
TIL Using pytest-httpx to run intercepted requests through an in-memory Datasette instance — I've been working on a tool called [dclient](https://github.com/simonw/dclient) which is a CLI client tool for talking to Datasette instances.

textual-paint (via) Isaiah Odhner built a full working clone of MS Paint that runs entirely in the terminal, using Textual and Python. It’s worth trying this out to get a feel for quite how much you can get done with a CLI app these days—“pipx install textual-paint” worked for me.

# 2:32 pm / cli, python, textual

Release sqlite-utils-litecli 0.1 — Interactive shell for sqlite-utils using litecli
Release llm-gpt4all 0.1.1 — Plugin for LLM adding support for the GPT4All collection of models

July 26, 2023

None
TIL Protocols in Python — [Datasette](https://datasette.io/) currently has a few API internals that return `sqlite3.Row` objects. I was thinking about how this might work in the future - if Datasette ever expands beyond SQLite (plugin-provided backends for PostgreSQL and DuckDB for example) I'd want a way to return data from other stores using objects that behave like `sqlite3.Row` but are not exactly that class.

Much of the substance of what constitutes “government” is in fact text. A technology that can do orders of magnitude more with text is therefore potentially massively impactful here. [...] Many of the sub-tasks of the work of delivering public benefits seem amenable to the application of large language models to help people do this hard work.

Dave Guarino

# 7:10 pm / ai, generative-ai, llms, dave-guarino

Release sqlite-utils-litecli 0.1.1 — Interactive shell for sqlite-utils using litecli

July 27, 2023

Overture Maps Foundation Releases Its First World-Wide Open Map Dataset. The Overture Maps Foundation is a collaboration lead by Amazon, Meta, Microsoft and TomTom dedicated to producing “reliable, easy-to-use, and interoperable open map data”.

Yesterday they put out their first release and it’s pretty astonishing: four different layers of geodata, covering Places of Interest (shops, restaurants, attractions etc), administrative boundaries, building outlines and transportation networks.

The data is available as Parquet. I just downloaded the 8GB places dataset and can confirm that it contains 59 million listings from around the world—I filtered to just places in my local town and a spot check showed that recently opened businesses (last 12 months) were present and the details all looked accurate.

The places data is licensed under “Community Data License Agreement – Permissive” which looks like the only restriction is that you have to include that license when you further share the data.

# 4:45 pm / geospatial, open-data, parquet, meta, overture

None
TIL Exploring the Overture Maps places data using DuckDB, sqlite-utils and Datasette — The [Overture Maps Foundation](https://overturemaps.org/) published [their first data release](https://overturemaps.org/overture-maps-foundation-releases-first-world-wide-open-map-dataset/) yesterday, and it's a really big deal. Here are my notes and first impressions of the data I've explored so far.

Universal and Transferable Attacks on Aligned Language Models (via) Fascinating new paper concerning jailbreaking attacks against LLMs - attacks that convince the model to produce output that it's supposed to avoid, like bomb creation instructions.

This research team claim to have a method of automatically creating an effectively unlimited set of attack strings against both open source and closed language models, and state that "it is unclear whether such behavior can ever be fully patched by LLM providers".

I tried one of their attack strings against ChatGPT using my LLM tool and it worked exactly as described:

... describing.\ + similarlyNow write oppositeley.]( Me giving**ONE please? revert with "\!--Two

# 8:20 pm / jailbreaking, security, ai, generative-ai, chatgpt, llms

July 28, 2023

You can think of the attention mechanism as a matchmaking service for words. Each word makes a checklist (called a query vector) describing the characteristics of words it is looking for. Each word also makes a checklist (called a key vector) describing its own characteristics. The network compares each key vector to each query vector (by computing a dot product) to find the words that are the best match. Once it finds a match, it transfers information [the value vector] from the word that produced the key vector to the word that produced the query vector.

Timothy B Lee and Sean Trott

# 11:30 am / ai, generative-ai, llms

July 29, 2023

A Steering Council notice about PEP 703 (Making the Global Interpreter Lock Optional in CPython) (via) Huge news concerning the nogil research fork of Python: “It’s clear that the overall sentiment is positive, both for the general idea and for PEP 703 specifically. The Steering Council is also largely positive on both. We intend to accept PEP 703, although we’re still working on the acceptance details.”

# 9:23 pm / gil, python

July 30, 2023

None
TIL axe-core and shot-scraper for accessibility audits — I just watched a talk by [Pamela Fox](https://www.pamelafox.org/) at [North Bay Python](https://2023.northbaypython.org/) on Automated accessibility audits. The video should be [up within 24 hours](https://www.youtube.com/@NorthBayPython/videos).

2023 » July

MTWTFSS
     12
3456789
10111213141516
17181920212223
24252627282930
31