Simon Willison’s Weblog

August 2022

67 posts: 5 entries, 24 links, 2 quotes, 36 beats

Aug. 14, 2022

Release datasette 0.62 — An open source multi-tool for exploring and publishing data
Release datasette-sentry 0.2a1 — Datasette plugin for configuring Sentry
Release datasette-sentry 0.2 — Datasette plugin for configuring Sentry

Aug. 16, 2022

Efficient Pagination Using Deferred Joins (via) Surprisingly simple trick for speeding up deep OFFSET x LIMIT y pagination queries, which get progressively slower as you paginate deeper into the data. Instead of applying them directly, apply them to a “select id from ...” query to fetch just the IDs, then either use a join or run a separate “select * from table where id in (...)” query to fetch the full records for that page.

# 5:35 pm / performance, sql
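
A minimal sketch of the deferred join idea, using Python's built-in sqlite3 module. The `items` table, its columns and the two helper functions are hypothetical names made up for this illustration; the linked article targets other databases, and the actual speedup depends on the database engine and its indexes.

```python
import sqlite3

# Hypothetical table used only for this illustration
conn = sqlite3.connect(":memory:")
conn.execute("create table items (id integer primary key, title text, body text)")
conn.executemany(
    "insert into items (title, body) values (?, ?)",
    [(f"title {i}", f"body {i}") for i in range(1, 1001)],
)


def page_direct(conn, limit, offset):
    # Plain approach: OFFSET/LIMIT applied to the full rows
    return conn.execute(
        "select * from items order by id limit ? offset ?", (limit, offset)
    ).fetchall()


def page_deferred(conn, limit, offset):
    # Deferred join: paginate over IDs only, then join back for the full rows
    return conn.execute(
        """
        select items.*
        from items
        join (
            select id from items order by id limit ? offset ?
        ) as page on page.id = items.id
        order by items.id
        """,
        (limit, offset),
    ).fetchall()


rows = page_deferred(conn, limit=10, offset=900)
```

Both functions return the same rows; the deferred version only has to walk the narrow ID index to skip past the offset before it touches any full records.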

TIL Analyzing Google Cloud spend with Datasette — Google Cloud provide extremely finely grained billing, but you need to access it through BigQuery which I find quite inconvenient.

Aug. 17, 2022

Plugin support for Datasette Lite

I’ve added a new feature to Datasette Lite, my distribution of Datasette that runs entirely in the browser using Python and SQLite compiled to WebAssembly. You can now install additional Datasette plugins by passing them in the URL.

[... 865 words]

Crunchy Data: Learn Postgres at the Playground (via) Crunchy Data have a new PostgreSQL tutorial series, with a very cool twist: they have a build of PostgreSQL compiled to WebAssembly which runs in the browser, so each tutorial is accompanied by a working psql terminal that lets you try out the tutorial contents interactively. It even has support for PostGIS, though that particular tutorial has to load 80MB of assets in order to get it to work!

# 6:30 pm / postgresql, webassembly

Building games and apps entirely through natural language using OpenAI’s code-davinci model. A deeply sophisticated example of using prompts to generate entire working JavaScript programs and games using the new code-davinci OpenAI model.

# 7:06 pm / game-design, ai, gpt-3, openai, prompt-engineering, generative-ai, llms, ai-assisted-programming

TIL Sort by number of JSON intersections — This [post on Reddit](https://www.reddit.com/r/sqlite/comments/wr0wp0/i_have_a_sqlite_database_of_recipes_i_would_like/) asked how to run a query that takes a list of items (in this case ingredients) as the input and returns all rows with at least one of those items in a JSON list, ordered by the most matches.
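
One way this kind of query could look, sketched with Python's sqlite3 module and SQLite's `json_each()` table-valued function. The `recipes` table, its sample rows and the `rank_recipes()` helper are assumptions invented for this example, not the query from the TIL itself, and it assumes a SQLite build with the JSON functions available.

```python
import json
import sqlite3

# Hypothetical schema: each recipe stores its ingredients as a JSON list
conn = sqlite3.connect(":memory:")
conn.execute("create table recipes (id integer primary key, name text, ingredients text)")
conn.executemany(
    "insert into recipes (name, ingredients) values (?, ?)",
    [
        ("Pancakes", json.dumps(["flour", "egg", "milk"])),
        ("Omelette", json.dumps(["egg", "butter"])),
        ("Salad", json.dumps(["lettuce", "tomato"])),
    ],
)


def rank_recipes(conn, wanted):
    # Expand each recipe's JSON list into rows, keep the rows that appear
    # in the input list, then count the matches per recipe.
    return conn.execute(
        """
        select recipes.name, count(*) as matches
        from recipes, json_each(recipes.ingredients) as ingredient
        where ingredient.value in (select value from json_each(?))
        group by recipes.id
        order by matches desc, recipes.id
        """,
        (json.dumps(wanted),),
    ).fetchall()


print(rank_recipes(conn, ["egg", "milk", "flour"]))
# [('Pancakes', 3), ('Omelette', 1)]
```

Recipes with no matching ingredients drop out of the results, which fits the "at least one of those items" requirement.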

Aug. 18, 2022

TIL Trying out Quarto on macOS — I decided to try out [Quarto](https://quarto.org/), the new notebook/markdown/publishing system built on Pandoc.
Release sqlite-diffable 0.5 — Tools for dumping/loading a SQLite database to diffable directory structure

Aug. 19, 2022

The Datasette Newsletter: Datasette Lite, Datasette Tutorials, Datasette Cloud. It’s been quite a while since I’ve sent one of these out now—hoping to get this on to a more regular schedule.

# 1:20 am / datasette

Aug. 20, 2022

Release datasette-mp3-audio 0.1 — Turn .mp3 URLs into an audio player in the Datasette interface

Shoelace (via) Saw this for the first time today: it’s a relatively new library of framework-agnostic Web Components, built on lit-html and covering a huge array of common functionality: buttons and sliders and dialogs and drawer interfaces and dropdown menus and so on. The design is very clean, the documentation is superb, and it looks like you can cherry-pick just the components you are using for a pretty lean addition to your page weight. So refreshing to see libraries like this that really take advantage of modern web standards.

# 8:57 pm / css, javascript, web-standards, web-components, lit-html

Show HN: A new way to use GPT-3 to generate code (and everything else). Riley Goodside is my favourite Twitter follow for GPT-3 tips. Here he describes a powerful prompt pattern he's designed which lets you generate extremely complex code output by asking GPT-3 to fill in $$areas like this$$ with different patterns, then stitch them together into full HTML or other source code files. It's really clever.

# 9:33 pm / gpt-3, prompt-engineering, generative-ai, riley-goodside, llms

Release datasette-mp3-audio 0.2 — Turn .mp3 URLs into an audio player in the Datasette interface

Aug. 21, 2022

Analyzing ScotRail audio announcements with Datasette—from prototype to production

Scottish train operator ScotRail released a two-hour long MP3 file containing all of the components of its automated station announcements. Messing around with them is proving to be a huge amount of fun.

[... 4,428 words]

Turning SQLite into a distributed database (via) Heyang Zhou introduces mvSQLite, his brand new open source “SQLite-compatible distributed database” built in Rust on top of Apple’s FoundationDB. This is a very promising looking new entry into the distributed/replicated SQLite space: FoundationDB was designed to provide low-level primitives that tools like this could build on top of.

# 5:40 pm / databases, sqlite, rust

Aug. 22, 2022

Release pocket-to-sqlite 0.2.2 — Create a SQLite database containing data from your Pocket account

Digitizing 55,000 pages of civic meetings (via) Philip James has been building public, searchable archives of city council meetings for various cities (Oakland and Alameda so far) using my s3-ocr script to run Textract OCR against the PDFs of the minutes, and deploying them to Fly using Datasette. This is a really cool project, and very much the kind of thing I’ve been hoping to support with the tools I’ve been building.

# 4:26 pm / archiving, ocr, political-hacking, datasette, fly

Stable Diffusion Public Release (via) New AI just dropped. Stable Diffusion is similar to DALL-E, but completely open source and with a CC0 license applied to everything it generates. I have a Twitter thread (the via link) of comparisons I’ve made between its output and my previous DALL-E experiments. The announcement buries the lede somewhat: to try it out, visit beta.dreamstudio.ai, which you can use for free at the moment, though it’s unclear to me how billing is supposed to work.

# 7:12 pm / machine-learning, dalle, stable-diffusion, generative-ai, text-to-image

Release datasette-edit-schema 0.5.1 — Datasette plugin for modifying table schemas

Aug. 23, 2022

TIL Seeing which functions are unique to a specific SQLite / Datasette instance — In reading [Scraping JSON, HTML, and ZIP Files with Pure SQLite](https://observablehq.com/@asg017/scrape-json-html-zip-with-sqlite) by Alex Garcia I got curious to see a full list of functions he had registered in his [sqlite-extension-examples.fly.dev](https://sqlite-extension-examples.fly.dev) Datasette instance that weren't available in a regular Datasette.
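
A rough local sketch of the same idea using Python's sqlite3 module: list the functions each connection knows about via `pragma_function_list` and take the difference. The `reverse_text` function and the `function_names()` helper are hypothetical stand-ins for whatever an extension would register, and this assumes a SQLite version recent enough to expose `pragma_function_list`. It is not the approach from the TIL, which compares two separate Datasette instances.

```python
import sqlite3


def function_names(conn):
    # pragma_function_list returns one row per function signature,
    # so collect the names into a set to drop duplicates
    return {row[0] for row in conn.execute("select name from pragma_function_list()")}


# A plain connection, standing in for a "regular" instance
plain = sqlite3.connect(":memory:")

# A second connection with one extra application-defined function,
# standing in for an instance with extensions loaded
extended = sqlite3.connect(":memory:")
extended.create_function("reverse_text", 1, lambda s: s[::-1])

# Functions available on the extended connection but not the plain one
unique = sorted(function_names(extended) - function_names(plain))
print(unique)
```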

Aug. 24, 2022

How SQLite Scales Read Concurrency (via) Ben Johnson’s series on SQLite internals continues—this time with a detailed explanation of how the SQLite WAL (Write-Ahead Log) is implemented.

# 4:16 pm / databases, sqlite, ben-johnson

Release datasette-publish-vercel 0.14.2 — Datasette plugin for publishing data using Vercel

To make the analogy explicit, in Software 1.0, human-engineered source code (e.g. some .cpp files) is compiled into a binary that does useful work. In Software 2.0 most often the source code comprises 1) the dataset that defines the desirable behavior and 2) the neural net architecture that gives the rough skeleton of the code, but with many details (the weights) to be filled in. The process of training the neural network compiles the dataset into the binary — the final neural network. In most practical applications today, the neural net architectures and the training systems are increasingly standardized into a commodity, so most of the active “software development” takes the form of curating, growing, massaging and cleaning labeled datasets.

Andrej Karpathy

# 9:28 pm / data, machine-learning, ai, andrej-karpathy

Release datasette-multiline-links 0.1 — Make multiple newline separated URLs clickable in Datasette

Aug. 25, 2022

Building a searchable archive for the San Francisco Microscopical Society

The San Francisco Microscopical Society was founded in 1870 by a group of scientists dedicated to advancing the field of microscopy.

[... 1,845 words]

Aug. 28, 2022

Release sqlite-utils 3.29 — Python CLI utility and library for manipulating SQLite databases

Aug. 29, 2022

Stable Diffusion is a really big deal

If you haven’t been paying attention to what’s going on with Stable Diffusion, you really should be.

[... 1,443 words]
