Simon Willison’s Weblog


September 2023

101 posts: 5 entries, 35 links, 5 quotes, 56 beats

Sept. 8, 2023

bpy—Blender on PyPI (via) TIL you can “pip install” Blender!

bpy “provides Blender as a Python module”—it’s part of the official Blender project, and ships with binary wheels ranging in size from 168MB to 319MB depending on your platform.

It only supports the version of Python used by the current Blender release though—right now that’s Python 3.10.

# 3:29 pm / pypi, python, blender

TIL Embedding paragraphs from my blog with E5-large-v2 — Xeophon [suggested](https://twitter.com/TheXeophon/status/1700203810545680542) [E5-large-v2](https://huggingface.co/intfloat/e5-large-v2) as an embedding model that was worth a closer look.
Release llm-sentence-transformers 0.1.1 — LLM plugin for embeddings using sentence-transformers
Release datasette-llm-embed 0.1a0 — Datasette plugin adding a llm_embed(model_id, text) SQL function
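
As a rough illustration of what a SQL function shaped like llm_embed(model_id, text) involves, here is a minimal sketch using Python's built-in sqlite3 module. It is not the plugin's code: fake_embed is a hypothetical stand-in that derives numbers from a hash instead of calling a model, and packing the vector as little-endian 32-bit floats is an assumption made for the example.

```python
import hashlib
import sqlite3
import struct


def fake_embed(model_id, text):
    """Hypothetical stand-in for a real embedding model.

    Returns 8 floats packed as little-endian float32 bytes, derived
    deterministically from a SHA-256 hash of the inputs.
    """
    digest = hashlib.sha256(f"{model_id}:{text}".encode("utf-8")).digest()
    # 32 bytes -> 8 unsigned 32-bit integers, scaled into the range 0..1
    values = [n / 0xFFFFFFFF for n in struct.unpack("<8I", digest)]
    return struct.pack("<8f", *values)


conn = sqlite3.connect(":memory:")
# Register a two-argument SQL function under the name llm_embed
conn.create_function("llm_embed", 2, fake_embed)

blob = conn.execute(
    "select llm_embed('demo-model', 'hello world')"
).fetchone()[0]
vector = struct.unpack("<8f", blob)
print(len(blob), "bytes,", len(vector), "dimensions")
```

A real version would swap fake_embed for a call into an embedding model; the SQL side stays the same.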

Dynamic linker tricks: Using LD_PRELOAD to cheat, inject features and investigate programs (via) This tutorial by Rafał Cieślak from 2013 filled in a bunch of gaps in my knowledge about how C works on Linux.

# 10:05 pm / c, linux

Sept. 9, 2023

Release datasette-edit-schema 0.6 — Datasette plugin for modifying table schemas
Release sqlite-utils 3.35.1 — Python CLI utility and library for manipulating SQLite databases
TIL Summarizing Hacker News discussion themes with Claude and LLM — I've been experimenting with the combination of [Claude](https://claude.ai/) and my [LLM CLI tool](https://llm.datasette.io/) to give me quick summaries of long discussions on [Hacker News](https://news.ycombinator.com/).

Matthew Honnibal from spaCy on why LLMs have not solved NLP. A common trope these days is that the entire field of NLP has been effectively solved by Large Language Models. Here’s a lengthy comment from Matthew Honnibal, creator of the highly regarded spaCy Python NLP library, explaining in detail why that argument doesn’t hold up.

# 9:30 pm / nlp, ai, generative-ai, llms

Sept. 10, 2023

Release blip-caption 0.1 — Generate captions for images with Salesforce BLIP

promptfoo: How to benchmark Llama2 Uncensored vs. GPT-3.5 on your own inputs. promptfoo is a CLI and library for “evaluating LLM output quality”. This tutorial in their documentation about using it to compare Llama 2 to gpt-3.5-turbo is a good illustration of how it works: it uses YAML files to configure the prompts, and more YAML to define assertions such as “not-icontains: AI language model”.
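
To make the idea behind an assertion like "not-icontains" concrete, here is a small Python sketch of the check it describes: the output passes only if the given text is absent, ignoring case. This is an illustration of the concept, not promptfoo's implementation, and the function name is made up for the example.

```python
def not_icontains(output: str, needle: str) -> bool:
    """Return True when needle does not appear in output, ignoring case."""
    return needle.lower() not in output.lower()


refusal = "As an AI language model, I cannot help with that."
answer = "Paris is the capital of France."

print(not_icontains(refusal, "AI language model"))
print(not_icontains(answer, "AI language model"))
```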

# 4:19 pm / cli, testing, ai, generative-ai, llms

The AI-assistant wars heat up with Claude Pro, a new ChatGPT Plus rival. I'm quoted in this piece about the new Claude Pro $20/month subscription from Anthropic:

Willison has also run into problems with Claude's morality filter, which has caused him trouble by accident: "I tried to use it against a transcription of a podcast episode, and it processed most of the text before—right in front of my eyes—it deleted everything it had done! I eventually figured out that they had started talking about bomb threats against data centers towards the end of the episode, and Claude effectively got triggered by that and deleted the entire transcript."

# 5:07 pm / arstechnica, ai, generative-ai, llms, anthropic, claude, press-quotes

All models on Hugging Face, sorted by downloads (via) I realized this morning that “sort by downloads” against the list of all of the models on Hugging Face can work as a reasonably good proxy for “which of these models are easiest to get running on your own computer”.

# 5:24 pm / machine-learning, ai, hugging-face

Release datasette-sqlite-trace 0.1 — Datasette plugin that prints all executed SQL to stderr
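
Python's sqlite3 module has a hook that can do something similar outside of Datasette. The sketch below uses Connection.set_trace_callback to print each executed statement to stderr. Whether the plugin is built on this hook is an assumption; the table and the executed list exist only for the demonstration.

```python
import sqlite3
import sys

executed = []  # kept only so the demo can inspect what was traced


def trace(sql):
    """Record each statement SQLite runs and echo it to stderr."""
    executed.append(sql)
    print(sql, file=sys.stderr)


conn = sqlite3.connect(":memory:")
conn.set_trace_callback(trace)

conn.execute("create table demo (id integer primary key, name text)")
conn.execute("insert into demo (name) values (?)", ("example",))
conn.commit()
```

The traced output also includes statements Python issues implicitly, such as the BEGIN that opens a transaction.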

Sept. 11, 2023

Release sqlite-chronicle 0.1 — Use triggers to track when rows in a SQLite table were updated or deleted
Release datasette-cluster-map 0.18a0 — Datasette plugin that shows a map for any data with latitude/longitude columns
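
The trigger technique named in the sqlite-chronicle description can be sketched with plain SQLite. The example below is a simplified illustration, not the schema or triggers sqlite-chronicle creates: the items and items_log tables and the trigger names are invented for the demo.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    create table items (id integer primary key, name text);

    -- Hypothetical log table: one row per tracked change
    create table items_log (
        item_id integer,
        action text,
        logged_at text default (strftime('%Y-%m-%d %H:%M:%f', 'now'))
    );

    create trigger items_after_update after update on items
    begin
        insert into items_log (item_id, action) values (new.id, 'update');
    end;

    create trigger items_after_delete after delete on items
    begin
        insert into items_log (item_id, action) values (old.id, 'delete');
    end;
    """
)

conn.execute("insert into items (id, name) values (1, 'first')")
conn.execute("update items set name = 'renamed' where id = 1")
conn.execute("delete from items where id = 1")

log = conn.execute(
    "select item_id, action from items_log order by rowid"
).fetchall()
print(log)
```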

Sept. 12, 2023

Release llm 0.10a1 — Access large language models from the command-line
Release llm-clip 0.1a0 — Generate embeddings for images and text using CLIP with LLM
Release llm 0.10 — Access large language models from the command-line
Release llm-clip 0.1 — Generate embeddings for images and text using CLIP with LLM

Build an image search engine with llm-clip, chat with models with llm chat


LLM is my combination CLI tool and Python library for working with Large Language Models. I just released LLM 0.10 with two significant new features: embedding support for binary files and the llm chat command.

[... 1,188 words]

Sept. 13, 2023

TIL Trying out cr-sqlite on macOS — [cr-sqlite](https://github.com/vlcn-io/cr-sqlite) is fascinating. It's a loadable SQLite extension by Matt Wonlaw that "allows merging different SQLite databases together that have taken independent writes".
TIL Using llama-cpp-python grammars to generate JSON — [llama.cpp](https://github.com/ggerganov/llama.cpp) recently added the ability to control the output of any model using a grammar.

Simulating History with ChatGPT (via) Absolutely fascinating new entry in the using-ChatGPT-to-teach genre. Benjamin Breen teaches history at UC Santa Cruz, and has been developing a sophisticated approach to using ChatGPT to play out role-playing scenarios involving different periods of history. His students are challenged to participate in them, then pick them apart—fact-checking details from the scenario and building critiques of the perspectives demonstrated by the language model. There are so many quotable snippets in here, I recommend reading the whole thing.

# 3:36 am / education, teaching, ai, generative-ai, chatgpt, llms, benjamin-breen

In the long term, I suspect that LLMs will have a significant positive impact on higher education. Specifically, I believe they will elevate the importance of the humanities. [...] LLMs are deeply, inherently textual. And they are reliant on text in a way that is directly linked to the skills and methods that we emphasize in university humanities classes.

Benjamin Breen

# 3:40 am / education, ai, generative-ai, chatgpt, llms, benjamin-breen

Some notes on Local-First Development (via) Local-First is the name that has been coined by the community of people who are interested in building apps where data is manipulated in a client application first (mobile, desktop or web) and then continually synchronized with a server, rather than the other way round. This is a really useful review by Kyle Mathews of how the space is shaping up so far—lots of interesting threads to follow here.

# 3:48 am / local-first

Introducing datasette-litestream: easy replication for SQLite databases in Datasette. We use Litestream on Datasette Cloud for streaming backups of user data to S3. Alex Garcia extracted our implementation into a standalone Datasette plugin, which bundles the Litestream Go binary (for the relevant platform) in the package you get when you run “datasette install datasette-litestream”—so now Datasette has a very robust answer to questions about SQLite disaster recovery beyond just the Datasette Cloud platform.

# 7:28 pm / plugins, sqlite, datasette, datasette-cloud, litestream, alex-garcia

Release llm-sentence-transformers 0.1.2 — LLM plugin for embeddings using sentence-transformers

Sept. 14, 2023

CAISO Grid Status (via) CAISO is the California Independent System Operator, a non-profit managing 80% of California’s electricity flow. This grid status page shows live data about the state of the grid and it’s fascinating: right now (2pm local time) California is running 71.4% on renewables, having peaked at 80% three hours ago. The current fuel mix is 52% solar, 31% natural gas, 7% each large hydro and nuclear and 2% wind. The charts on this page show how solar turns off overnight and then picks up and peaks during daylight hours.

# 9:08 pm / energy, california

Sept. 15, 2023

Release datasette-export-notebook 1.0.1 — Datasette plugin providing instructions for exporting data to Jupyter or Observable