Simon Willison’s Weblog

TILs

TIL Use DuckDB to convert parquet to JSON and then open it in Datasette Lite — [pickapic.io](https://pickapic.io/) is a new tool funded by [stability.ai](https://stability.ai/) which asks people to generate and then vote on images in order to provide data to be used for fine-tuning an open source image generation model.
TIL A simple Python implementation of the ReAct pattern for LLMs — A popular nightmare scenario for AI is giving it access to tools, so it can make API calls and execute its own code and generally break free of the constraints of its initial environment.
TIL Scraping Reddit and writing data to the Datasette write API — Today I built a system for monitoring Reddit for new posts that link to various domains that I own.
TIL How to read Hacker News threads with most recent comments first — [Hacker News](https://news.ycombinator.com/) displays comments in a tree. This can be frustrating if you want to keep track of a particular conversation, as you constantly have to seek through the tree to find the latest comment.
TIL Copy rich text to the clipboard — I've been experimenting with a tool for generating the content for a weekly Substack newsletter by querying the Datasette API for my blog and assembling HTML for the last week of content.
TIL Running LLaMA 7B and 13B on a 64GB M2 MacBook Pro with llama.cpp — See also: **[Large language models are having their Stable Diffusion moment right now](https://simonwillison.net/2023/Mar/11/llama/)**.
TIL Using SQL with GDAL — Inspired [by Brad Neuberg](https://twitter.com/bradneuberg/status/1633875601789681666) I decided to take a look at the SQL features in the GDAL family of tools.
TIL Using ChatGPT to write AppleScript — I found a killer application for ChatGPT today: writing AppleScript!
TIL Mocking subprocess with pytest-subprocess — For [apple-notes-to-sqlite](https://github.com/dogsheep/apple-notes-to-sqlite) I needed to write some tests that simulated executing the `osascript` command using the Python `subprocess` module.
TIL A simple Python wrapper for the ChatGPT API — OpenAI [released an API for ChatGPT](https://openai.com/blog/introducing-chatgpt-and-whisper-apis) yesterday. It's 1/10th of the price of the `text-davinci-003` model!
TIL sips: Scriptable image processing system — I wanted to convert some `.webp` images to `.png` on my Mac. I asked ChatGPT:
TIL Training nanoGPT entirely on content from my blog — This is a follow-up to [Running nanoGPT on a MacBook M2 to generate terrible Shakespeare](https://til.simonwillison.net/llms/nanogpt-shakespeare-m2).
TIL Subqueries in select expressions in SQLite - also window functions — I figured out a single SQL query for the following today. Given a table of GitHub repositories, for each repository return:
TIL Avoiding "length" errors in Apache Bench with the -l option — I was using the Apache Bench `ab` command to exercise some new code I'm writing in Datasette and I noticed I was getting a lot of errors:
TIL The SQLite now argument is stable within the same query — I stumbled across an interesting little detail of SQLite today, running the following query:
TIL Building Mastodon bots with GitHub Actions and toot — Twitter [announced today](https://twitter.com/TwitterDev/status/1621026986784337922) that they'll be ending free API access for bots.
TIL Run Python code in a WebAssembly sandbox — I've been trying to figure this out for ages. Tim Bart responded to [my call for help on Hacker News](https://news.ycombinator.com/item?id=34598024) with [this extremely useful code example](https://gist.github.com/pims/711549577759ad1341f1a90860f1f3a5) showing how to run Python code in WebAssembly inside Python, using [wasmtime-py](https://github.com/bytecodealliance/wasmtime-py) and the new Python WASM build [released by VMware Wasm Labs](https://wasmlabs.dev/articles/python-wasm32-wasi/).
TIL Running nanoGPT on a MacBook M2 to generate terrible Shakespeare — [nanoGPT](https://github.com/karpathy/nanoGPT) is Andrej Karpathy's "simplest, fastest repository for training/finetuning medium-sized GPTs".
TIL Calculating embeddings with gtr-t5-large in Python — I've long wanted to run some kind of large language model on my own computer. Now that I have an M2 MacBook Pro I'm even more keen to find interesting ways to keep all of those CPU cores busy.
TIL Using recursive CTEs to explore hierarchical Twitter threads — This TIL is adapted from [a Gist](https://gist.github.com/simonw/656a8c6e4688f720773c474080abe1b0) I put together in 2019, before I started tracking TILs here.
TIL Combining CTEs and VALUES in SQLite — Here's how to use SQLite's `VALUES` syntax with a CTE to create a temporary table that you can then perform joins against in a query:
TIL Installing lxml for Python on an M1/M2 Mac — I ran into this error while trying to run `pip install lxml` on an M2 Mac, inside a virtual environment I had initially created using `pipenv shell`:
TIL SQLite pragma_function_list() — The SQLite `pragma_function_list()` table-valued function returns a list of functions that have been registered with SQLite, including functions that were added by extensions.
TIL Rewriting a Git repo to remove secrets from the history — I decided to make a GitHub repository public today that had previously been private. Unfortunately the revision history of that repository included some secret values, one of which I could not figure out a way to revoke.
TIL Upgrading a pipx application to an alpha version — I wanted to upgrade my [git-history](https://datasette.io/tools/git-history) installation to a new alpha version.
TIL Scraping the Sky News Westminster Accounts, a Flourish application — Sky News in partnership with [Tortoise](https://www.tortoisemedia.com/) published a fantastic piece of investigative data reporting: [the Westminster Accounts](https://news.sky.com/story/westminster-accounts-methodology-12764656), a database of money in UK politics that brought together data from three different sources and made it explorable.
TIL Loading SQLite extensions in Python on macOS — I finally found a workaround for this error when attempting to load a SQLite extension in Python on macOS:
TIL Geopoly in SQLite — I noticed this morning that one of my Datasette installations had the [Geopoly](https://www.sqlite.org/geopoly.html) SQLite extension enabled. I don't know how it got there - it has to be compiled specifically - but since it was there I decided to try it out.
TIL Querying the GitHub archive with the ClickHouse playground — Via [this comment](https://news.ycombinator.com/item?id=34197637) on Hacker News I started exploring the [ClickHouse Playground](https://clickhouse.com/docs/en/getting-started/playground/). It's really cool, and among other things it allows CORS-enabled API hits that can query a decade of history from the GitHub events archive in less than a second.
