<?xml version="1.0" encoding="utf-8"?>
<feed xml:lang="en-us" xmlns="http://www.w3.org/2005/Atom"><title>Simon Willison's Weblog: litestream</title><link href="http://simonwillison.net/" rel="alternate"/><link href="http://simonwillison.net/tags/litestream.atom" rel="self"/><id>http://simonwillison.net/</id><updated>2026-01-15T16:08:27+00:00</updated><author><name>Simon Willison</name></author><entry><title>The Design &amp; Implementation of Sprites</title><link href="https://simonwillison.net/2026/Jan/15/the-design-implementation-of-sprites/#atom-tag" rel="alternate"/><published>2026-01-15T16:08:27+00:00</published><updated>2026-01-15T16:08:27+00:00</updated><id>https://simonwillison.net/2026/Jan/15/the-design-implementation-of-sprites/#atom-tag</id><summary type="html">
    
&lt;p&gt;&lt;strong&gt;&lt;a href="https://fly.io/blog/design-and-implementation/"&gt;The Design &amp;amp; Implementation of Sprites&lt;/a&gt;&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;I &lt;a href="https://simonwillison.net/2026/Jan/9/sprites-dev/"&gt;wrote about Sprites last week&lt;/a&gt;. Here's Thomas Ptacek from Fly with the insider details on how they work under the hood.&lt;/p&gt;
&lt;p&gt;I like this framing of them as "disposable computers":&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;Sprites are ball-point disposable computers. Whatever mark you mean to make, we’ve rigged it so you’re never more than a second or two away from having a Sprite to do it with.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;I've noticed that new Fly Machines can take a while (up to around a minute) to provision. Sprites solve that by keeping warm pools of unused machines in multiple regions, made possible by the fact that they all run the same container:&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;Now, today, under the hood, Sprites are still Fly Machines. But they all run from a standard container. Every physical worker knows exactly what container the next Sprite is going to start with, so it’s easy for us to keep pools of “empty” Sprites standing by. The result: a Sprite create doesn’t have any heavy lifting to do; it’s basically just doing the stuff we do when we start a Fly Machine.&lt;/p&gt;
&lt;/blockquote&gt;
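&lt;p&gt;A toy model helps show why the identical base container matters. This is my own illustrative sketch, not Fly's implementation: because every machine boots from the same container, a pool of pre-booted machines can be kept warm, so a "create" just claims one and the slow boot work is paid ahead of time:&lt;/p&gt;

```python
import collections
import itertools


class WarmPool:
    """Toy warm pool: pre-boot machines from one standard container so a
    create has no heavy lifting left to do. Illustrative only."""

    def __init__(self, target=4):
        self._ids = itertools.count(1)
        self._pool = collections.deque()
        for _ in range(target):
            self._boot_one()              # slow path, paid ahead of time

    def _boot_one(self):
        # In reality: start a Fly Machine from the standard Sprite container.
        self._pool.append(f"sprite-{next(self._ids)}")

    def create(self):
        machine = self._pool.popleft()    # fast path: claim a warm machine
        self._boot_one()                  # refill the pool (async in reality)
        return machine


pool = WarmPool()
assert pool.create() == "sprite-1"        # served instantly from the pool
assert len(pool._pool) == 4               # pool replenished after the create
```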
&lt;p&gt;The most interesting detail is how the persistence layer works. Sprites only charge you for written data that differs from the base image, and provide checkpoints and restores in around 300ms - it turns out that's powered by a custom filesystem on top of S3-compatible storage, coordinated by Litestream-replicated local SQLite metadata:&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;We still exploit NVMe, but not as the root of storage. Instead, it’s a read-through cache for a blob on object storage. S3-compatible object stores are the most trustworthy storage technology we have. I can feel my blood pressure dropping just typing the words “Sprites are backed by object storage.” [...]&lt;/p&gt;
&lt;p&gt;The Sprite storage stack is organized around the JuiceFS model (in fact, we currently use a very hacked-up JuiceFS, with a rewritten SQLite metadata backend). It works by splitting storage into data (“chunks”) and metadata (a map of where the “chunks” are). Data chunks live on object stores; metadata lives in fast local storage. In our case, that metadata store is &lt;a href="https://litestream.io"&gt;kept durable with Litestream&lt;/a&gt;. Nothing depends on local storage.&lt;/p&gt;
&lt;/blockquote&gt;
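&lt;p&gt;To make that data/metadata split concrete, here's a minimal sketch of the idea in Python - my own illustration, not Fly's or JuiceFS's actual code: chunk locations live in a SQLite table (the part Litestream keeps durable), while the chunk bytes themselves go to object storage (a dict stands in for the bucket):&lt;/p&gt;

```python
import hashlib
import sqlite3

CHUNK_SIZE = 4 * 1024 * 1024  # illustrative chunk size, not JuiceFS's actual value

# Metadata: a map of where the chunks are. In production this would be a
# file-backed database kept durable with Litestream.
meta = sqlite3.connect(":memory:")
meta.execute(
    "CREATE TABLE chunks (path TEXT, idx INTEGER, key TEXT, PRIMARY KEY (path, idx))"
)

object_store = {}  # stand-in for an S3-compatible bucket


def write_file(path: str, data: bytes) -> None:
    """Split data into chunks, upload each, record the mapping in metadata."""
    for offset in range(0, len(data), CHUNK_SIZE):
        chunk = data[offset:offset + CHUNK_SIZE]
        key = hashlib.sha256(chunk).hexdigest()   # content-addressed chunk key
        object_store[key] = chunk                 # "upload" the chunk
        meta.execute(
            "INSERT OR REPLACE INTO chunks VALUES (?, ?, ?)",
            (path, offset // CHUNK_SIZE, key),
        )
    meta.commit()


def read_file(path: str) -> bytes:
    """Look up chunk keys in metadata, fetch the chunks, reassemble the file."""
    rows = meta.execute(
        "SELECT key FROM chunks WHERE path = ? ORDER BY idx", (path,)
    )
    return b"".join(object_store[row[0]] for row in rows)


write_file("/data/app.bin", b"hello sprite" * 1000)
assert read_file("/data/app.bin") == b"hello sprite" * 1000
```

&lt;p&gt;The appeal of the design is that either half can be rebuilt from a trustworthy source: the chunks are already on object storage, and the SQLite metadata can be restored from its Litestream replica.&lt;/p&gt;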

    &lt;p&gt;&lt;small&gt;Via &lt;a href="https://twitter.com/tqbf/status/2011823480673624434"&gt;@tqbf&lt;/a&gt;&lt;/small&gt;&lt;/p&gt;


    &lt;p&gt;Tags: &lt;a href="https://simonwillison.net/tags/architecture"&gt;architecture&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/sandboxing"&gt;sandboxing&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/sqlite"&gt;sqlite&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/thomas-ptacek"&gt;thomas-ptacek&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/fly"&gt;fly&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/litestream"&gt;litestream&lt;/a&gt;&lt;/p&gt;



</summary><category term="architecture"/><category term="sandboxing"/><category term="sqlite"/><category term="thomas-ptacek"/><category term="fly"/><category term="litestream"/></entry><entry><title>Litestream v0.5.0 is Here</title><link href="https://simonwillison.net/2025/Oct/3/litestream/#atom-tag" rel="alternate"/><published>2025-10-03T15:10:21+00:00</published><updated>2025-10-03T15:10:21+00:00</updated><id>https://simonwillison.net/2025/Oct/3/litestream/#atom-tag</id><summary type="html">
    
&lt;p&gt;&lt;strong&gt;&lt;a href="https://fly.io/blog/litestream-v050-is-here/"&gt;Litestream v0.5.0 is Here&lt;/a&gt;&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;I've been running &lt;a href="https://litestream.io"&gt;Litestream&lt;/a&gt; to back up SQLite databases in production for a couple of years now without incident. The new version has been a long time coming - Ben Johnson took &lt;a href="https://simonwillison.net/2022/Sep/21/introducing-litefs/"&gt;a detour&lt;/a&gt; into the FUSE-based &lt;a href="https://github.com/superfly/litefs"&gt;LiteFS&lt;/a&gt; before deciding that the single-binary Litestream approach was more popular - and Litestream 0.5 just landed with this very detailed blog post describing the improved architecture.&lt;/p&gt;
&lt;p&gt;SQLite stores data in pages - blocks of 4,096 bytes by default. Litestream replicates modified pages to a backup location - usually object storage like S3.&lt;/p&gt;
&lt;p&gt;Most SQLite tables have an auto-incrementing primary key, which determines which page a row's data is stored in. This means sequential inserts into a small table land on the same page, which caused previous versions of Litestream to replicate many slightly different copies of that page in succession.&lt;/p&gt;
&lt;p&gt;The new LTX format - borrowed from LiteFS - addresses that by adding compaction, which Ben describes as follows:&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;We can use LTX compaction to compress a bunch of LTX files into a single file with no duplicated pages. And Litestream now uses this capability to create a hierarchy of compactions:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;at Level 1, we compact all the changes in a 30-second time window&lt;/li&gt;
&lt;li&gt;at Level 2, all the Level 1 files in a 5-minute window&lt;/li&gt;
&lt;li&gt;at Level 3, all the Level 2’s over an hour.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Net result: we can restore a SQLite database to any point in time, &lt;em&gt;using only a dozen or so files on average&lt;/em&gt;.&lt;/p&gt;
&lt;/blockquote&gt;
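&lt;p&gt;The compaction idea is simple to sketch - this is my own illustration rather than Litestream's LTX code: merge a sequence of page-change files in time order, so only the newest copy of each page survives:&lt;/p&gt;

```python
def compact(ltx_files):
    """Merge page-change files (oldest first) into one with no duplicate pages.

    Each file is modelled as a dict of {page_number: page_bytes} - a toy
    stand-in for the LTX format, not its real on-disk layout.
    """
    merged = {}
    for changes in ltx_files:   # apply in time order
        merged.update(changes)  # a newer copy of a page replaces the older one
    return merged


level0 = [
    {1: b"v1-page1", 2: b"v1-page2"},
    {1: b"v2-page1"},           # page 1 rewritten by a later transaction
    {3: b"v1-page3"},
]
compacted = compact(level0)
# Only the newest copy of page 1 survives; pages 2 and 3 are untouched.
assert compacted == {1: b"v2-page1", 2: b"v1-page2", 3: b"v1-page3"}
```

&lt;p&gt;Applying the same merge at 30-second, 5-minute and hourly granularities gives the level hierarchy described above: restoring to a point in time only needs the relevant compacted files plus a handful of recent uncompacted ones.&lt;/p&gt;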
&lt;p&gt;I'm most looking forward to trying out the feature that isn't quite landed yet: read-replicas, implemented using a SQLite &lt;a href="https://www.sqlite.org/vfs.html"&gt;VFS extension&lt;/a&gt;:&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;The next major feature we’re building out is a Litestream VFS for read replicas. This will let you instantly spin up a copy of the database and immediately read pages from S3 while the rest of the database is hydrating in the background.&lt;/p&gt;
&lt;/blockquote&gt;

    &lt;p&gt;&lt;small&gt;Via &lt;a href="https://news.ycombinator.com/item?id=45453936"&gt;Hacker News&lt;/a&gt;&lt;/small&gt;&lt;/p&gt;


    &lt;p&gt;Tags: &lt;a href="https://simonwillison.net/tags/sqlite"&gt;sqlite&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/fly"&gt;fly&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/litestream"&gt;litestream&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/ben-johnson"&gt;ben-johnson&lt;/a&gt;&lt;/p&gt;



</summary><category term="sqlite"/><category term="fly"/><category term="litestream"/><category term="ben-johnson"/></entry><entry><title>files-to-prompt 0.5</title><link href="https://simonwillison.net/2025/Feb/14/files-to-prompt/#atom-tag" rel="alternate"/><published>2025-02-14T04:14:21+00:00</published><updated>2025-02-14T04:14:21+00:00</updated><id>https://simonwillison.net/2025/Feb/14/files-to-prompt/#atom-tag</id><summary type="html">
    
&lt;p&gt;&lt;strong&gt;&lt;a href="https://github.com/simonw/files-to-prompt/releases/tag/0.5"&gt;files-to-prompt 0.5&lt;/a&gt;&lt;/strong&gt;&lt;/p&gt;
My &lt;code&gt;files-to-prompt&lt;/code&gt; tool (&lt;a href="https://simonwillison.net/2024/Apr/8/files-to-prompt/"&gt;originally built using Claude 3 Opus back in April&lt;/a&gt;) had been accumulating a bunch of issues and PRs - I finally got around to spending some time with it and pushed a fresh release:&lt;/p&gt;
&lt;blockquote&gt;
&lt;ul&gt;
&lt;li&gt;New &lt;code&gt;-n/--line-numbers&lt;/code&gt; flag for including line numbers in the output. Thanks, &lt;a href="https://github.com/danclaytondev"&gt;Dan Clayton&lt;/a&gt;. &lt;a href="https://github.com/simonw/files-to-prompt/pull/38"&gt;#38&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;Fix for utf-8 handling on Windows. Thanks, &lt;a href="https://github.com/david-jarman"&gt;David Jarman&lt;/a&gt;. &lt;a href="https://github.com/simonw/files-to-prompt/pull/36"&gt;#36&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;code&gt;--ignore&lt;/code&gt; patterns are now matched against directory names as well as file names, unless you pass the new &lt;code&gt;--ignore-files-only&lt;/code&gt; flag. Thanks, &lt;a href="https://github.com/nmpowell"&gt;Nick Powell&lt;/a&gt;. &lt;a href="https://github.com/simonw/files-to-prompt/pull/30"&gt;#30&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;/blockquote&gt;
&lt;p&gt;I use this tool myself on an almost daily basis - it's fantastic for quickly answering questions about code. Recently I've been plugging it into Gemini 2.0 with its 2 million token context length, running recipes like this one:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;git clone https://github.com/bytecodealliance/componentize-py
cd componentize-py
files-to-prompt . -c | llm -m gemini-2.0-pro-exp-02-05 \
  -s 'How does this work? Does it include a python compiler or AST trick of some sort?'
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;I ran that question against the &lt;a href="https://github.com/bytecodealliance/componentize-py"&gt;bytecodealliance/componentize-py&lt;/a&gt; repo - which provides a tool for turning Python code into compiled WASM - and got &lt;a href="https://gist.github.com/simonw/a9d72e7f903417fb49e1d7a531ee8f97"&gt;this really useful answer&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;Here's another example. I decided to have o3-mini review how Datasette handles concurrent SQLite connections from async Python code - so I ran this:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;git clone https://github.com/simonw/datasette
cd datasette/datasette
files-to-prompt database.py utils/__init__.py -c | \
  llm -m o3-mini -o reasoning_effort high \
  -s 'Output in markdown a detailed analysis of how this code handles the challenge of running SQLite queries from a Python asyncio application. Explain how it works in the first section, then explore the pros and cons of this design. In a final section propose alternative mechanisms that might work better.'
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Here's &lt;a href="https://gist.github.com/simonw/76c8c433f4a65cf01a5c9121453683ab"&gt;the result&lt;/a&gt;. It did an extremely good job of explaining how my code works - despite being fed just the Python and none of the other documentation. Then it made some solid recommendations for potential alternatives.&lt;/p&gt;
&lt;p&gt;I added a couple of follow-up questions (using &lt;code&gt;llm -c&lt;/code&gt;) which resulted in &lt;a href="https://gist.github.com/simonw/76c8c433f4a65cf01a5c9121453683ab?permalink_comment_id=5438685#gistcomment-5438685"&gt;a full working prototype&lt;/a&gt; of an alternative threadpool mechanism, plus &lt;a href="https://gist.github.com/simonw/76c8c433f4a65cf01a5c9121453683ab?permalink_comment_id=5438691#gistcomment-5438691"&gt;some benchmarks&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;One final example: I decided to see if there were any undocumented features in &lt;a href="https://litestream.io/"&gt;Litestream&lt;/a&gt;, so I checked out the repo and ran a prompt against just the &lt;code&gt;.go&lt;/code&gt; files in that project:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;git clone https://github.com/benbjohnson/litestream
cd litestream
files-to-prompt . -e go -c | llm -m o3-mini \
  -s 'Write extensive user documentation for this project in markdown'
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Once again, o3-mini provided a &lt;a href="https://gist.github.com/simonw/cbf339032f99fee72af5fd5455bc7235"&gt;really impressively detailed&lt;/a&gt; set of unofficial documentation derived purely from reading the source.&lt;/p&gt;


    &lt;p&gt;Tags: &lt;a href="https://simonwillison.net/tags/async"&gt;async&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/projects"&gt;projects&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/python"&gt;python&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/sqlite"&gt;sqlite&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/ai"&gt;ai&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/datasette"&gt;datasette&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/webassembly"&gt;webassembly&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/litestream"&gt;litestream&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/generative-ai"&gt;generative-ai&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/llms"&gt;llms&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/ai-assisted-programming"&gt;ai-assisted-programming&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/llm"&gt;llm&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/gemini"&gt;gemini&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/llm-reasoning"&gt;llm-reasoning&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/files-to-prompt"&gt;files-to-prompt&lt;/a&gt;&lt;/p&gt;



</summary><category term="async"/><category term="projects"/><category term="python"/><category term="sqlite"/><category term="ai"/><category term="datasette"/><category term="webassembly"/><category term="litestream"/><category term="generative-ai"/><category term="llms"/><category term="ai-assisted-programming"/><category term="llm"/><category term="gemini"/><category term="llm-reasoning"/><category term="files-to-prompt"/></entry><entry><title>Zero-latency SQLite storage in every Durable Object</title><link href="https://simonwillison.net/2024/Oct/13/zero-latency-sqlite-storage-in-every-durable-object/#atom-tag" rel="alternate"/><published>2024-10-13T22:26:49+00:00</published><updated>2024-10-13T22:26:49+00:00</updated><id>https://simonwillison.net/2024/Oct/13/zero-latency-sqlite-storage-in-every-durable-object/#atom-tag</id><summary type="html">
    
&lt;p&gt;&lt;strong&gt;&lt;a href="https://blog.cloudflare.com/sqlite-in-durable-objects/"&gt;Zero-latency SQLite storage in every Durable Object&lt;/a&gt;&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;Kenton Varda introduces the next iteration of Cloudflare's &lt;a href="https://developers.cloudflare.com/durable-objects/"&gt;Durable Object&lt;/a&gt; platform, which recently upgraded from a key/value store to a full relational system based on SQLite.&lt;/p&gt;
&lt;p&gt;For useful background on the first version of Durable Objects take a look at &lt;a href="https://digest.browsertech.com/archive/browsertech-digest-cloudflares-durable/"&gt;Cloudflare's durable multiplayer moat&lt;/a&gt; by Paul Butler, who digs into its popularity for building WebSocket-based realtime collaborative applications.&lt;/p&gt;
&lt;p&gt;The new SQLite-backed Durable Objects system is a fascinating piece of distributed systems design, which advocates for a really interesting way to architect a large scale application.&lt;/p&gt;
&lt;p&gt;The key idea behind Durable Objects is to colocate application logic with the data it operates on. A Durable Object comprises code that executes on the same physical host as the SQLite database that it uses, resulting in blazingly fast read and write performance.&lt;/p&gt;
&lt;p&gt;How could this work at scale?&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;A single object is inherently limited in throughput since it runs on a single thread of a single machine. To handle more traffic, you create more objects. This is easiest when different objects can handle different logical units of state (like different documents, different users, or different "shards" of a database), where each unit of state has low enough traffic to be handled by a single object&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;Kenton presents the example of a flight booking system, where each flight can map to a dedicated Durable Object with its own SQLite database - thousands of fresh databases per airline per day.&lt;/p&gt;
&lt;p&gt;Each DO has a unique name, and Cloudflare's network then handles routing requests to that object wherever it might live on their global network.&lt;/p&gt;
&lt;p&gt;The technical details are fascinating. Inspired by &lt;a href="https://litestream.io/"&gt;Litestream&lt;/a&gt;, each DO constantly streams a sequence of WAL entries to object storage - batched every 16MB or every ten seconds. This also enables point-in-time recovery for up to 30 days through replaying those logged transactions.&lt;/p&gt;
&lt;p&gt;To ensure durability within that ten second window, writes are also forwarded to five replicas in separate nearby data centers as soon as they commit, and the write is only acknowledged once three of them have confirmed it.&lt;/p&gt;
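&lt;p&gt;That quorum acknowledgment pattern can be sketched in a few lines - my illustration of the general idea, not Cloudflare's code:&lt;/p&gt;

```python
from concurrent.futures import ThreadPoolExecutor, as_completed

QUORUM = 3  # acknowledge once this many replicas have confirmed


def replicate_with_quorum(entry, replicas):
    """Send `entry` to every replica concurrently; return True as soon as
    QUORUM of them confirm, without waiting for the stragglers.

    Each replica is modelled as a callable returning True on success.
    """
    confirmed = 0
    with ThreadPoolExecutor(max_workers=len(replicas)) as pool:
        futures = [pool.submit(replica, entry) for replica in replicas]
        for future in as_completed(futures):
            if future.result():
                confirmed += 1
                if confirmed >= QUORUM:
                    return True   # safe to acknowledge the write
    return False                  # fewer than QUORUM replicas confirmed


# Five stand-in replicas, one of which is down - the write still succeeds.
replicas = [lambda e: True] * 4 + [lambda e: False]
assert replicate_with_quorum(b"wal-entry", replicas) is True
```

&lt;p&gt;The trade-off is the classic one: waiting for three of five keeps write latency low even when a couple of replicas are slow, while still tolerating the loss of the primary within the ten-second batching window.&lt;/p&gt;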
&lt;p&gt;The JavaScript API design is interesting too: it's blocking rather than async, because the whole point of the design is to provide fast single threaded persistence operations:&lt;/p&gt;
&lt;div class="highlight highlight-source-js"&gt;&lt;pre&gt;&lt;span class="pl-k"&gt;let&lt;/span&gt; &lt;span class="pl-s1"&gt;docs&lt;/span&gt; &lt;span class="pl-c1"&gt;=&lt;/span&gt; &lt;span class="pl-s1"&gt;sql&lt;/span&gt;&lt;span class="pl-kos"&gt;.&lt;/span&gt;&lt;span class="pl-en"&gt;exec&lt;/span&gt;&lt;span class="pl-kos"&gt;(&lt;/span&gt;&lt;span class="pl-s"&gt;`&lt;/span&gt;
&lt;span class="pl-s"&gt;  SELECT title, authorId FROM documents&lt;/span&gt;
&lt;span class="pl-s"&gt;  ORDER BY lastModified DESC&lt;/span&gt;
&lt;span class="pl-s"&gt;  LIMIT 100&lt;/span&gt;
&lt;span class="pl-s"&gt;`&lt;/span&gt;&lt;span class="pl-kos"&gt;)&lt;/span&gt;&lt;span class="pl-kos"&gt;.&lt;/span&gt;&lt;span class="pl-en"&gt;toArray&lt;/span&gt;&lt;span class="pl-kos"&gt;(&lt;/span&gt;&lt;span class="pl-kos"&gt;)&lt;/span&gt;&lt;span class="pl-kos"&gt;;&lt;/span&gt;

&lt;span class="pl-k"&gt;for&lt;/span&gt; &lt;span class="pl-kos"&gt;(&lt;/span&gt;&lt;span class="pl-k"&gt;let&lt;/span&gt; &lt;span class="pl-s1"&gt;doc&lt;/span&gt; &lt;span class="pl-k"&gt;of&lt;/span&gt; &lt;span class="pl-s1"&gt;docs&lt;/span&gt;&lt;span class="pl-kos"&gt;)&lt;/span&gt; &lt;span class="pl-kos"&gt;{&lt;/span&gt;
  &lt;span class="pl-s1"&gt;doc&lt;/span&gt;&lt;span class="pl-kos"&gt;.&lt;/span&gt;&lt;span class="pl-c1"&gt;authorName&lt;/span&gt; &lt;span class="pl-c1"&gt;=&lt;/span&gt; &lt;span class="pl-s1"&gt;sql&lt;/span&gt;&lt;span class="pl-kos"&gt;.&lt;/span&gt;&lt;span class="pl-en"&gt;exec&lt;/span&gt;&lt;span class="pl-kos"&gt;(&lt;/span&gt;
    &lt;span class="pl-s"&gt;"SELECT name FROM users WHERE id = ?"&lt;/span&gt;&lt;span class="pl-kos"&gt;,&lt;/span&gt;
    &lt;span class="pl-s1"&gt;doc&lt;/span&gt;&lt;span class="pl-kos"&gt;.&lt;/span&gt;&lt;span class="pl-c1"&gt;authorId&lt;/span&gt;&lt;span class="pl-kos"&gt;)&lt;/span&gt;&lt;span class="pl-kos"&gt;.&lt;/span&gt;&lt;span class="pl-en"&gt;one&lt;/span&gt;&lt;span class="pl-kos"&gt;(&lt;/span&gt;&lt;span class="pl-kos"&gt;)&lt;/span&gt;&lt;span class="pl-kos"&gt;.&lt;/span&gt;&lt;span class="pl-c1"&gt;name&lt;/span&gt;&lt;span class="pl-kos"&gt;;&lt;/span&gt;
&lt;span class="pl-kos"&gt;}&lt;/span&gt;&lt;/pre&gt;&lt;/div&gt;

&lt;p&gt;This example of theirs deliberately exhibits the N+1 query pattern, because that's something SQLite is &lt;a href="https://www.sqlite.org/np1queryprob.html"&gt;uniquely well suited to handling&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;The system underlying Durable Objects is called Storage Relay Service, and it's been powering Cloudflare's existing-but-different &lt;a href="https://developers.cloudflare.com/d1/"&gt;D1 SQLite system&lt;/a&gt; for over a year.&lt;/p&gt;
&lt;p&gt;I was curious as to where the objects are created. &lt;a href="https://developers.cloudflare.com/durable-objects/reference/data-location/#provide-a-location-hint"&gt;According to this&lt;/a&gt; (via &lt;a href="https://news.ycombinator.com/item?id=41832547#41832812"&gt;Hacker News&lt;/a&gt;):&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;Durable Objects do not currently change locations after they are created. By default, a Durable Object is instantiated in a data center close to where the initial &lt;code&gt;get()&lt;/code&gt; request is made. [...] To manually create Durable Objects in another location, provide an optional &lt;code&gt;locationHint&lt;/code&gt; parameter to &lt;code&gt;get()&lt;/code&gt;.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;And in a footnote:&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;Dynamic relocation of existing Durable Objects is planned for the future.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;&lt;a href="https://where.durableobjects.live/"&gt;where.durableobjects.live&lt;/a&gt; is a neat site that tracks where in the Cloudflare network DOs are created - I just visited it and it said:&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;This page tracks where new Durable Objects are created; for example, when you loaded this page from &lt;strong&gt;Half Moon Bay&lt;/strong&gt;, a worker in &lt;strong&gt;San Jose, California, United States (SJC)&lt;/strong&gt; created a durable object in &lt;strong&gt;San Jose, California, United States (SJC)&lt;/strong&gt;.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;&lt;img alt="Where Durable Objects Live.    Created by the wonderful Jed Schmidt, and now maintained with ❤️ by Alastair. Source code available on Github.    Cloudflare Durable Objects are a novel approach to stateful compute based on Cloudflare Workers. They aim to locate both compute and state closest to end users.    This page tracks where new Durable Objects are created; for example, when you loaded this page from Half Moon Bay, a worker in San Jose, California, United States (SJC) created a durable object in Los Angeles, California, United States (LAX).    Currently, Durable Objects are available in 11.35% of Cloudflare PoPs.    To keep data fresh, this application is constantly creating/destroying new Durable Objects around the world. In the last hour, 394,046 Durable Objects have been created(and subsequently destroyed), FOR SCIENCE!    And a map of the world showing lots of dots." src="https://static.simonwillison.net/static/2024/where-durable-objects.jpg" /&gt;&lt;/p&gt;

    &lt;p&gt;&lt;small&gt;Via &lt;a href="https://lobste.rs/s/kjx2vk/zero_latency_sqlite_storage_every"&gt;lobste.rs&lt;/a&gt;&lt;/small&gt;&lt;/p&gt;


    &lt;p&gt;Tags: &lt;a href="https://simonwillison.net/tags/scaling"&gt;scaling&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/sqlite"&gt;sqlite&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/websockets"&gt;websockets&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/software-architecture"&gt;software-architecture&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/cloudflare"&gt;cloudflare&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/litestream"&gt;litestream&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/kenton-varda"&gt;kenton-varda&lt;/a&gt;&lt;/p&gt;



</summary><category term="scaling"/><category term="sqlite"/><category term="websockets"/><category term="software-architecture"/><category term="cloudflare"/><category term="litestream"/><category term="kenton-varda"/></entry><entry><title>Introducing datasette-litestream: easy replication for SQLite databases in Datasette</title><link href="https://simonwillison.net/2023/Sep/13/datasette-litestream/#atom-tag" rel="alternate"/><published>2023-09-13T19:28:37+00:00</published><updated>2023-09-13T19:28:37+00:00</updated><id>https://simonwillison.net/2023/Sep/13/datasette-litestream/#atom-tag</id><summary type="html">
    
&lt;p&gt;&lt;strong&gt;&lt;a href="https://www.datasette.cloud/blog/2023/datasette-litestream/"&gt;Introducing datasette-litestream: easy replication for SQLite databases in Datasette&lt;/a&gt;&lt;/strong&gt;&lt;/p&gt;
We use Litestream on Datasette Cloud for streaming backups of user data to S3. Alex Garcia extracted our implementation into a standalone Datasette plugin, which bundles the Litestream Go binary (for the relevant platform) in the package you get when you run “datasette install datasette-litestream”—so now Datasette has a very robust answer to questions about SQLite disaster recovery beyond just the Datasette Cloud platform.


    &lt;p&gt;Tags: &lt;a href="https://simonwillison.net/tags/plugins"&gt;plugins&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/sqlite"&gt;sqlite&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/datasette"&gt;datasette&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/datasette-cloud"&gt;datasette-cloud&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/litestream"&gt;litestream&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/alex-garcia"&gt;alex-garcia&lt;/a&gt;&lt;/p&gt;



</summary><category term="plugins"/><category term="sqlite"/><category term="datasette"/><category term="datasette-cloud"/><category term="litestream"/><category term="alex-garcia"/></entry><entry><title>Stringing together several free tiers to host an application with zero cost using fly.io, Litestream and Cloudflare</title><link href="https://simonwillison.net/2022/Oct/7/fly-cloudflare/#atom-tag" rel="alternate"/><published>2022-10-07T17:47:34+00:00</published><updated>2022-10-07T17:47:34+00:00</updated><id>https://simonwillison.net/2022/Oct/7/fly-cloudflare/#atom-tag</id><summary type="html">
    
&lt;p&gt;&lt;strong&gt;&lt;a href="https://blog.dahl.dev/posts/stringing-together-several-free-tiers-to-host-an-application-with-zero-cost/"&gt;Stringing together several free tiers to host an application with zero cost using fly.io, Litestream and Cloudflare&lt;/a&gt;&lt;/strong&gt;&lt;/p&gt;
Alexander Dahl provides a detailed description (and code) for his current preferred free hosting solution for small sites: SQLite (and a Go application) running on Fly’s free tier, with the database replicated up to Cloudflare’s R2 object storage (again on a free tier) by Litestream.


    &lt;p&gt;Tags: &lt;a href="https://simonwillison.net/tags/hosting"&gt;hosting&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/sqlite"&gt;sqlite&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/cloudflare"&gt;cloudflare&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/fly"&gt;fly&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/litestream"&gt;litestream&lt;/a&gt;&lt;/p&gt;



</summary><category term="hosting"/><category term="sqlite"/><category term="cloudflare"/><category term="fly"/><category term="litestream"/></entry><entry><title>Introducing LiteFS</title><link href="https://simonwillison.net/2022/Sep/21/introducing-litefs/#atom-tag" rel="alternate"/><published>2022-09-21T18:56:42+00:00</published><updated>2022-09-21T18:56:42+00:00</updated><id>https://simonwillison.net/2022/Sep/21/introducing-litefs/#atom-tag</id><summary type="html">
    
&lt;p&gt;&lt;strong&gt;&lt;a href="https://fly.io/blog/introducing-litefs/"&gt;Introducing LiteFS&lt;/a&gt;&lt;/strong&gt;&lt;/p&gt;
LiteFS is the new SQLite replication solution from Fly, now ready for beta testing. It’s from the same author as Litestream but has a very different architecture; LiteFS works by implementing a custom FUSE filesystem which spies on SQLite transactions being written to the journal file and forwards them on to other nodes in the cluster, providing full read-replication. The signature Litestream feature of streaming a backup to S3 should be coming within the next few months.

    &lt;p&gt;&lt;small&gt;Via &lt;a href="https://news.ycombinator.com/item?id=32925734"&gt;Hacker News&lt;/a&gt;&lt;/small&gt;&lt;/p&gt;


    &lt;p&gt;Tags: &lt;a href="https://simonwillison.net/tags/databases"&gt;databases&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/replication"&gt;replication&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/sqlite"&gt;sqlite&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/fly"&gt;fly&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/litestream"&gt;litestream&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/ben-johnson"&gt;ben-johnson&lt;/a&gt;&lt;/p&gt;



</summary><category term="databases"/><category term="replication"/><category term="sqlite"/><category term="fly"/><category term="litestream"/><category term="ben-johnson"/></entry><entry><title>Litestream backups for Datasette Cloud (and weeknotes)</title><link href="https://simonwillison.net/2022/Aug/11/litestream-backups/#atom-tag" rel="alternate"/><published>2022-08-11T17:19:52+00:00</published><updated>2022-08-11T17:19:52+00:00</updated><id>https://simonwillison.net/2022/Aug/11/litestream-backups/#atom-tag</id><summary type="html">
    &lt;p&gt;My main focus this week has been adding robust backups to the forthcoming Datasette Cloud.&lt;/p&gt;
&lt;p&gt;Datasette Cloud is a SaaS service for &lt;a href="https://datasette.io"&gt;Datasette&lt;/a&gt;. It allows people to create a private Datasette instance where they can upload data, visualize and transform it and share it with other members of their team. You can join the waiting list to try it out using &lt;a href="https://docs.google.com/forms/d/e/1FAIpQLSeMRYHBHXlWGDkjCqhAOinDrKEVwqNR5GfPs3iEXn8LzDT2Qg/viewform?embedded=true"&gt;this form&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;I'm building Datasette Cloud on &lt;a href="https://fly.io/"&gt;Fly&lt;/a&gt;, specifically on &lt;a href="https://fly.io/blog/fly-machines/"&gt;Fly Machines&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;Security is a big concern for Datasette Cloud. Teams should only be able to access their own data - bugs where users accidentally (or maliciously) access data for another team should be protected against as much as possible.&lt;/p&gt;
&lt;p&gt;To help guarantee that, I've designed Datasette Cloud so that each team gets their own, dedicated instance, running in a &lt;a href="https://firecracker-microvm.github.io/"&gt;Firecracker VM&lt;/a&gt; managed by Fly. Their data lives in a dedicated volume.&lt;/p&gt;
&lt;p&gt;&lt;a href="https://fly.io/docs/reference/volumes/"&gt;Fly volumes&lt;/a&gt; already implement snapshot backups, but I'm interested in defence in depth. This is where &lt;a href="https://litestream.io/"&gt;Litestream&lt;/a&gt; comes in (coincidentally now &lt;a href="https://fly.io/blog/all-in-on-sqlite-litestream/"&gt;part of Fly&lt;/a&gt;, although it wasn't when I first selected it as my backup strategy).&lt;/p&gt;
&lt;p&gt;I'm using Litestream to constantly backup the data for each Datasette Cloud team to an S3 bucket. In the case of a complete failure of a volume, I can restore data from a backup that should be at most a few seconds out of date. Litestream also gives me point-in-time backups, such that I can recover a previous version of the data within a configurable retention window.&lt;/p&gt;
&lt;h4&gt;Keeping backups isolated&lt;/h4&gt;
&lt;p&gt;Litestream &lt;a href="https://litestream.io/how-it-works/"&gt;works&lt;/a&gt; by writing a constant stream of pages from SQLite's WAL (Write-Ahead Log) up to an S3 bucket. It needs the ability to both read and write from S3.&lt;/p&gt;
&lt;p&gt;This requires making S3 credentials available within the containers that run Datasette and Litestream for each team account.&lt;/p&gt;
&lt;p&gt;Credentials in those containers are not visible to the users of the software, but I still wanted to be confident that if the credentials leaked in some way the isolation between teams would be maintained.&lt;/p&gt;
&lt;p&gt;Initially I thought about having a separate S3 bucket for each team, but it turns out AWS has a default limit of 100 buckets per account, and a hard limit of 1,000. I aspire to have more than 1,000 customers, so this limit makes a bucket-per-team seem like the wrong solution.&lt;/p&gt;
&lt;p&gt;I've learned an absolute ton about S3 and AWS permissions building my &lt;a href="https://github.com/simonw/s3-credentials"&gt;s3-credentials&lt;/a&gt; tool for creating credentials for accessing S3.&lt;/p&gt;
&lt;p&gt;One of the tricks I've learned is that it's possible to create temporary, time-limited credentials that &lt;a href="https://simonwillison.net/2022/Jan/18/weeknotes/"&gt;only work for a prefix&lt;/a&gt; (effectively a folder) within an S3 bucket.&lt;/p&gt;
&lt;p&gt;This means I can run Litestream with credentials that are specific to the team - that can read and write only from the &lt;code&gt;team-ID/&lt;/code&gt; prefix in the S3 bucket I am using to store the backups.&lt;/p&gt;
&lt;h4&gt;Obtaining temporary credentials&lt;/h4&gt;
&lt;p&gt;My &lt;code&gt;s3-credentials&lt;/code&gt; tool can create credentials for a prefix within an S3 bucket like this:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;s3-credentials create my-bucket-for-backups \
  --duration 12h \
  --prefix team-56/
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;This command uses the &lt;a href="https://boto3.amazonaws.com/v1/documentation/api/latest/reference/services/sts.html#STS.Client.assume_role"&gt;sts.assume_role()&lt;/a&gt; AWS method to create credentials that allow access to that bucket, attaching &lt;a href="https://github.com/simonw/s3-credentials/blob/0.12.1/README.md#--prefix-my-prefix"&gt;this generated JSON policy&lt;/a&gt; to it in order to restrict access to the provided prefix.&lt;/p&gt;
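&lt;p&gt;That prefix restriction can be sketched in Python like this. This is a simplified illustration rather than the exact policy &lt;code&gt;s3-credentials&lt;/code&gt; generates - see the linked README for the real document:&lt;/p&gt;

```python
import json


def prefix_policy(bucket, prefix):
    # Simplified sketch of an IAM policy restricting access to one
    # prefix - the real policy generated by s3-credentials has a few
    # more statements (see the README linked above).
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                # List only keys under the prefix
                "Effect": "Allow",
                "Action": ["s3:ListBucket"],
                "Resource": [f"arn:aws:s3:::{bucket}"],
                "Condition": {"StringLike": {"s3:prefix": [f"{prefix}*"]}},
            },
            {
                # Read and write objects under the prefix
                "Effect": "Allow",
                "Action": ["s3:GetObject", "s3:PutObject"],
                "Resource": [f"arn:aws:s3:::{bucket}/{prefix}*"],
            },
        ],
    }


# The temporary credentials then come from STS, roughly like this
# (role_arn is a pre-created role with access to the bucket):
#
#     import boto3
#     credentials = boto3.client("sts").assume_role(
#         RoleArn=role_arn,
#         RoleSessionName="team-56",
#         Policy=json.dumps(prefix_policy("my-bucket-for-backups", "team-56/")),
#         DurationSeconds=12 * 60 * 60,
#     )["Credentials"]
```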
&lt;p&gt;I extracted the relevant Python code from &lt;code&gt;s3-credentials&lt;/code&gt; and used it to create a private API endpoint in my Datasette Cloud management server which could return the temporary credentials needed by the team container.&lt;/p&gt;
&lt;p&gt;With the endpoint in place, my code for launching a team container can do this:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Create the volume and machine for that team (if they do not yet exist)&lt;/li&gt;
&lt;li&gt;Generate a signed secret token that the machine container can exchange for its S3 credentials&lt;/li&gt;
&lt;li&gt;Launch the machine container, passing it the secret token&lt;/li&gt;
&lt;li&gt;On launch, the container runs a script which exchanges that secret token for its 12 hour S3 credentials, using the private API endpoint I created&lt;/li&gt;
&lt;li&gt;Those credentials are used to populate the &lt;code&gt;AWS_ACCESS_KEY_ID&lt;/code&gt;, &lt;code&gt;AWS_SECRET_ACCESS_KEY&lt;/code&gt; and &lt;code&gt;AWS_SESSION_TOKEN&lt;/code&gt; environment variables used by Litestream&lt;/li&gt;
&lt;li&gt;Start Litestream, which then starts Datasette&lt;/li&gt;
&lt;/ul&gt;
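&lt;p&gt;The exchange step can be sketched like this. The endpoint URL and the JSON shape of the response are hypothetical - the real Datasette Cloud management API is private:&lt;/p&gt;

```python
import json
import os
import subprocess
import urllib.request


def fetch_credentials(endpoint, secret_token):
    # Exchange the long-lived signed token for 12 hour S3 credentials
    request = urllib.request.Request(
        endpoint, headers={"Authorization": "Bearer " + secret_token}
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)


def litestream_env(credentials):
    # Map STS-style credentials onto the environment variables that
    # Litestream reads
    return {
        "AWS_ACCESS_KEY_ID": credentials["AccessKeyId"],
        "AWS_SECRET_ACCESS_KEY": credentials["SecretAccessKey"],
        "AWS_SESSION_TOKEN": credentials["SessionToken"],
    }


def main():
    credentials = fetch_credentials(
        "https://management.internal/credentials",  # hypothetical endpoint
        os.environ["SECRET_TOKEN"],
    )
    env = dict(os.environ, **litestream_env(credentials))
    # litestream replicate then starts Datasette via its exec directive
    subprocess.run(["litestream", "replicate"], env=env, check=True)
```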
&lt;h4&gt;Restarting every 12 hours&lt;/h4&gt;
&lt;p&gt;You may be wondering why I bothered with that initial secret token - why not just pass the temporary AWS credentials to the container when I launch it?&lt;/p&gt;
&lt;p&gt;The reason for this is that I need to be able to obtain fresh credentials every 12 hours.&lt;/p&gt;
&lt;p&gt;A really neat feature of Fly Machines is that they support scale-to-zero. You can stop them, and Fly will automatically restart them the next time they receive traffic.&lt;/p&gt;
&lt;p&gt;All you need to do is call &lt;code&gt;sys.exit(0)&lt;/code&gt; in your Python code (or the equivalent in any other language) and Fly will stop your container... and then restart it again with a couple of seconds of cold start time the next time an HTTP request for your container hits the Fly router.&lt;/p&gt;
&lt;p&gt;So far I'm mainly using this to avoid the cost of running containers when they aren't actually in use. But there's a neat benefit when it comes to Litestream too.&lt;/p&gt;
&lt;p&gt;I'm using S3 credentials which expire after 12 hours. This means I need to periodically refresh the credentials and restart Litestream or it will stop being able to write to the S3 bucket.&lt;/p&gt;
&lt;p&gt;After considering a few ways of doing this, I selected the simplest to implement: have Datasette call &lt;code&gt;sys.exit(0)&lt;/code&gt; after ten hours, and let Fly restart the container causing my startup script to fetch freshly generated 12 hour credentials and pass them to Litestream.&lt;/p&gt;
&lt;p&gt;I implemented this by adding it as a new setting to my existing &lt;a href="https://datasette.io/plugins/datasette-scale-to-zero"&gt;datasette-scale-to-zero&lt;/a&gt; plugin. You can now configure that with &lt;code&gt;"max-age": "10h"&lt;/code&gt; and it will shut down Datasette once the server has been running for that long.&lt;/p&gt;
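&lt;p&gt;The mechanics behind that setting can be sketched like this - a simplified illustration of the idea, not the plugin's actual implementation:&lt;/p&gt;

```python
import re
import time


def parse_duration(value):
    # "10h" becomes 36000 seconds; supports s/m/h units
    match = re.fullmatch(r"(\d+)([smh])", value)
    if match is None:
        raise ValueError("Expected a duration like 10h, got: " + value)
    return int(match.group(1)) * {"s": 1, "m": 60, "h": 3600}[match.group(2)]


def should_exit(max_age, started):
    # Checked periodically while the server runs: once it has been up
    # for longer than max-age, call sys.exit(0) - Fly then restarts
    # the container on the next incoming request, and the startup
    # script fetches fresh 12 hour credentials for Litestream.
    return time.monotonic() - started > parse_duration(max_age)
```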
&lt;p&gt;Why does this require my own secret token system? Because when the container is restarted, it needs to make an authenticated call to my endpoint to retrieve those fresh S3 credentials. Fly persists environment variable secrets to the container between restarts, so that secret can be long-lived even though it is only ever exchanged for short-term S3 credentials.&lt;/p&gt;
&lt;p&gt;I only just put the new backup system in place, so I'm exercising it a bit before I open things up to trial users - but so far it's looking like a very robust solution to the problem.&lt;/p&gt;
&lt;h4&gt;s3-ocr improvements&lt;/h4&gt;
&lt;p&gt;I released a &lt;a href="https://datasette.io/tools/s3-ocr"&gt;few new versions&lt;/a&gt; of &lt;a href="https://simonwillison.net/2022/Jun/30/s3-ocr/"&gt;s3-ocr&lt;/a&gt; this week, as part of my ongoing project working with the San Francisco Microscopical Society team to release a searchable version of their scanned document archives.&lt;/p&gt;
&lt;p&gt;The two main improvements are:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;A new &lt;code&gt;--dry-run&lt;/code&gt; option to &lt;code&gt;s3-ocr start&lt;/code&gt; which shows you what the tool will do without making any changes to your S3 bucket, or triggering any OCR jobs. &lt;a href="https://github.com/simonw/s3-ocr/issues/22"&gt;#22&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;s3-ocr start&lt;/code&gt; used to fail with an error if running it would create more than 100 (or 600 depending on your region) concurrent OCR jobs. The tool now knows how to identify that error and pause and retry starting the jobs instead. &lt;a href="https://github.com/simonw/s3-ocr/issues/21"&gt;#21&lt;/a&gt;
&lt;/li&gt;
&lt;/ul&gt;
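&lt;p&gt;The pause-and-retry logic from that second fix can be sketched like this - a simplified illustration using a stand-in exception class rather than the real botocore error:&lt;/p&gt;

```python
import time


class LimitExceededError(Exception):
    # Stand-in for the LimitExceededException that Textract raises
    # when too many concurrent OCR jobs are running
    pass


def start_with_retry(start_job, attempts=20, wait_seconds=10, sleep=time.sleep):
    # Keep retrying while the concurrent-job limit is hit; any other
    # error propagates immediately
    for attempt in range(attempts):
        try:
            return start_job()
        except LimitExceededError:
            if attempt == attempts - 1:
                raise
            sleep(wait_seconds)
```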
&lt;p&gt;The fix that took the most time is this: installations of the tool no longer arbitrarily fail to work depending on the environment you install them into!&lt;/p&gt;
&lt;p&gt;Solving this took me the best part of a day. The short version is this: Click 8.1.0 introduced a new feature that lets you use &lt;code&gt;@cli.command&lt;/code&gt; as a decorator instead of &lt;code&gt;@cli.command()&lt;/code&gt;. This meant that installing &lt;code&gt;s3-ocr&lt;/code&gt; in an environment that already had a previous version of Click would result in silent errors.&lt;/p&gt;
&lt;p&gt;The solution is simple: pin to &lt;code&gt;click&amp;gt;=8.1.0&lt;/code&gt; in the project dependencies if you plan to use this new syntax.&lt;/p&gt;
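&lt;p&gt;For a setuptools-based project that pin looks something like this (an illustrative excerpt, not the actual &lt;code&gt;s3-ocr&lt;/code&gt; packaging):&lt;/p&gt;

```python
# setup.py excerpt - require the Click release that introduced the
# argument-less @cli.command decorator syntax
from setuptools import setup

setup(
    name="s3-ocr",
    install_requires=["click>=8.1.0"],
)
```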
&lt;p&gt;If I'd read the Click &lt;a href="https://click.palletsprojects.com/en/8.1.x/changes/#version-8-1-0"&gt;changelog more closely&lt;/a&gt; I would have saved myself a whole lot of time.&lt;/p&gt;
&lt;p&gt;Issues &lt;a href="https://github.com/simonw/s3-ocr/issues/25"&gt;#25&lt;/a&gt; and &lt;a href="https://github.com/simonw/s3-ocr/issues/26"&gt;#26&lt;/a&gt; detail the many false turns I took trying to figure this out.&lt;/p&gt;
&lt;h4&gt;More fun with GPT-3 and DALL-E&lt;/h4&gt;
&lt;p&gt;This tweet scored over a million impressions on Twitter:&lt;/p&gt;

&lt;blockquote class="twitter-tweet"&gt;&lt;p lang="en" dir="ltr"&gt;New hobby: prototyping video games in 60 seconds using a combination of GPT-3 and DALL-E&lt;br /&gt;&lt;br /&gt;Here&amp;#39;s &amp;quot;Raccoon Heist&amp;quot; &lt;a href="https://t.co/xQ3Vm8p2XW"&gt;pic.twitter.com/xQ3Vm8p2XW&lt;/a&gt;&lt;/p&gt;- Simon Willison (@simonw) &lt;a href="https://twitter.com/simonw/status/1555626060384911360?ref_src=twsrc%5Etfw"&gt;August 5, 2022&lt;/a&gt;&lt;/blockquote&gt;

&lt;p&gt;As this got retweeted outside of my usual circles it started confusing people who thought the "prototype" was a working game, as opposed to a fake screenshot and a paragraph of descriptive text! I wasn't kidding when I said I spent 60 seconds on this.&lt;/p&gt;
&lt;p&gt;I also figured out how to use GPT-3 to write &lt;code&gt;jq&lt;/code&gt; one-liners. I love &lt;code&gt;jq&lt;/code&gt; but I have to look up how to use it every time, so having GPT-3 do the work for me is a pretty neat time saver. More on that in this TIL: &lt;a href="https://til.simonwillison.net/gpt3/jq"&gt;Using GPT-3 to figure out jq recipes&lt;/a&gt;&lt;/p&gt;
&lt;h4&gt;Releases this week&lt;/h4&gt;
&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;&lt;a href="https://github.com/simonw/s3-ocr"&gt;s3-ocr&lt;/a&gt;&lt;/strong&gt;: &lt;a href="https://github.com/simonw/s3-ocr/releases/tag/0.6.3"&gt;0.6.3&lt;/a&gt; - (&lt;a href="https://github.com/simonw/s3-ocr/releases"&gt;9 releases total&lt;/a&gt;) - 2022-08-10
&lt;br /&gt;Tools for running OCR against files stored in S3&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;&lt;a href="https://github.com/simonw/datasette-scale-to-zero"&gt;datasette-scale-to-zero&lt;/a&gt;&lt;/strong&gt;: &lt;a href="https://github.com/simonw/datasette-scale-to-zero/releases/tag/0.2"&gt;0.2&lt;/a&gt; - (&lt;a href="https://github.com/simonw/datasette-scale-to-zero/releases"&gt;4 releases total&lt;/a&gt;) - 2022-08-05
&lt;br /&gt;Quit Datasette if it has not received traffic for a specified time period&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;&lt;a href="https://github.com/simonw/shot-scraper"&gt;shot-scraper&lt;/a&gt;&lt;/strong&gt;: &lt;a href="https://github.com/simonw/shot-scraper/releases/tag/0.14.3"&gt;0.14.3&lt;/a&gt; - (&lt;a href="https://github.com/simonw/shot-scraper/releases"&gt;18 releases total&lt;/a&gt;) - 2022-08-02
&lt;br /&gt;A command-line utility for taking automated screenshots of websites&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;&lt;a href="https://github.com/simonw/s3-credentials"&gt;s3-credentials&lt;/a&gt;&lt;/strong&gt;: &lt;a href="https://github.com/simonw/s3-credentials/releases/tag/0.12.1"&gt;0.12.1&lt;/a&gt; - (&lt;a href="https://github.com/simonw/s3-credentials/releases"&gt;13 releases total&lt;/a&gt;) - 2022-08-01
&lt;br /&gt;A tool for creating credentials for accessing S3 buckets&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;&lt;a href="https://github.com/simonw/datasette-sqlite-fts4"&gt;datasette-sqlite-fts4&lt;/a&gt;&lt;/strong&gt;: &lt;a href="https://github.com/simonw/datasette-sqlite-fts4/releases/tag/0.3.2"&gt;0.3.2&lt;/a&gt; - (&lt;a href="https://github.com/simonw/datasette-sqlite-fts4/releases"&gt;2 releases total&lt;/a&gt;) - 2022-07-31&lt;/li&gt;
&lt;/ul&gt;
&lt;h4&gt;TIL this week&lt;/h4&gt;
&lt;ul&gt;
&lt;li&gt;&lt;a href="https://til.simonwillison.net/sqlite/related-content"&gt;Related content with SQLite FTS and a Datasette template function&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://til.simonwillison.net/aws/boto-command-line"&gt;Using boto3 from the command line&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://til.simonwillison.net/sqlite/trying-macos-extensions"&gt;Trying out SQLite extensions on macOS&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://til.simonwillison.net/pytest/mocking-boto"&gt;Mocking a Textract LimitExceededException with boto&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://til.simonwillison.net/gpt3/jq"&gt;Using GPT-3 to figure out jq recipes&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
    
        &lt;p&gt;Tags: &lt;a href="https://simonwillison.net/tags/ocr"&gt;ocr&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/s3"&gt;s3&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/datasette"&gt;datasette&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/weeknotes"&gt;weeknotes&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/datasette-cloud"&gt;datasette-cloud&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/fly"&gt;fly&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/litestream"&gt;litestream&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/gpt-3"&gt;gpt-3&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/dalle"&gt;dalle&lt;/a&gt;&lt;/p&gt;
    

</summary><category term="ocr"/><category term="s3"/><category term="datasette"/><category term="weeknotes"/><category term="datasette-cloud"/><category term="fly"/><category term="litestream"/><category term="gpt-3"/><category term="dalle"/></entry><entry><title>Litestream: Live Read Replication</title><link href="https://simonwillison.net/2022/Apr/13/litestream-live-read-replication/#atom-tag" rel="alternate"/><published>2022-04-13T02:04:57+00:00</published><updated>2022-04-13T02:04:57+00:00</updated><id>https://simonwillison.net/2022/Apr/13/litestream-live-read-replication/#atom-tag</id><summary type="html">
    
&lt;p&gt;&lt;strong&gt;&lt;a href="https://tip.litestream.io/guides/read-replica/"&gt;Litestream: Live Read Replication&lt;/a&gt;&lt;/strong&gt;&lt;/p&gt;
The documentation for the read replication implemented in the latest Litestream beta (v0.4.0-beta.2). The design is really simple and clever: the primary runs a web server on a port, and replica instances can then be started with a configured URL pointing to the IP and port of the primary. That’s all it takes to have a SQLite database replicated to multiple hosts, each of which can then conduct read queries against their local copies.
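&lt;p&gt;Based on that guide, the two configuration files look something like this. The key names are taken from the beta documentation, so treat this as an illustrative sketch that may change before a stable release:&lt;/p&gt;

```yaml
# Primary's litestream.yml - serve replication to followers on a port
addr: ":9090"
dbs:
  - path: /data/db.sqlite
---
# Replica's litestream.yml (separate host) - pull from the primary
dbs:
  - path: /data/db.sqlite
    upstream:
      url: "http://primary-host:9090"
```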

    &lt;p&gt;&lt;small&gt;Via &lt;a href="https://twitter.com/simonw/status/1513890372920950791"&gt;@simonw&lt;/a&gt;&lt;/small&gt;&lt;/p&gt;


    &lt;p&gt;Tags: &lt;a href="https://simonwillison.net/tags/replication"&gt;replication&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/sqlite"&gt;sqlite&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/litestream"&gt;litestream&lt;/a&gt;&lt;/p&gt;



</summary><category term="replication"/><category term="sqlite"/><category term="litestream"/></entry><entry><title>SQLite Happy Hour - a Twitter Spaces conversation about three interesting projects building on SQLite</title><link href="https://simonwillison.net/2022/Mar/23/sqlite-happy-hour/#atom-tag" rel="alternate"/><published>2022-03-23T22:01:07+00:00</published><updated>2022-03-23T22:01:07+00:00</updated><id>https://simonwillison.net/2022/Mar/23/sqlite-happy-hour/#atom-tag</id><summary type="html">
    &lt;p&gt;Yesterday I hosted SQLite Happy Hour, my first conversation using Twitter Spaces. The idea was to dig into three different projects that were doing interesting things on top of SQLite. I think it worked pretty well, and I'm curious to explore this format more in the future.&lt;/p&gt;
&lt;p&gt;Here's &lt;a href="https://twitter.com/simonw/status/1504604448202518529"&gt;the tweet&lt;/a&gt; that initially promoted the space:&lt;/p&gt;

&lt;blockquote class="twitter-tweet"&gt;&lt;p lang="en" dir="ltr"&gt;Interested in devious tricks to push the envelope of what you can do with SQLite?&lt;br /&gt;&lt;br /&gt;Join myself, &lt;a href="https://twitter.com/benbjohnson?ref_src=twsrc%5Etfw"&gt;@benbjohnson&lt;/a&gt;, &lt;a href="https://twitter.com/geoffreylitt?ref_src=twsrc%5Etfw"&gt;@geoffreylitt&lt;/a&gt; and &lt;a href="https://twitter.com/nschiefer?ref_src=twsrc%5Etfw"&gt;@nschiefer&lt;/a&gt; on Tuesday for a Twitter Spaces conversation about &lt;a href="https://twitter.com/litestream?ref_src=twsrc%5Etfw"&gt;@litestream&lt;/a&gt;, &lt;a href="https://twitter.com/datasetteproj?ref_src=twsrc%5Etfw"&gt;@datasetteproj&lt;/a&gt; and Riffle!&lt;a href="https://t.co/ukRMVgC09u"&gt;https://t.co/ukRMVgC09u&lt;/a&gt;&lt;/p&gt;- Simon Willison (@simonw) &lt;a href="https://twitter.com/simonw/status/1504604448202518529?ref_src=twsrc%5Etfw"&gt;March 17, 2022&lt;/a&gt;&lt;/blockquote&gt;

&lt;p&gt;My co-hosts, representing the three projects, were:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Ben Johnson &lt;a href="https://twitter.com/benbjohnson"&gt;@benbjohnson&lt;/a&gt;, creator of &lt;a href="https://litestream.io/"&gt;Litestream&lt;/a&gt; - a tool that adds replication to SQLite built on top of the WAL mechanism.&lt;/li&gt;
&lt;li&gt;Geoffrey Litt &lt;a href="https://twitter.com/geoffreylitt"&gt;@geoffreylitt&lt;/a&gt; and Nicholas Schiefer &lt;a href="https://twitter.com/nschiefer"&gt;@nschiefer&lt;/a&gt; who are working on Riffle, a project exploring the idea of driving reactive user interfaces using SQL queries - see &lt;a href="https://riffle.systems/essays/prelude/"&gt;Building data-centric apps with a reactive relational database&lt;/a&gt;.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Twitter Spaces recorded audio only lasts for 30 days, so I've exported the audio and shared it &lt;a href="https://soundcloud.com/simon-willison/sqlite-happy-hour-22nd-march-2022"&gt;on SoundCloud&lt;/a&gt; as well.&lt;/p&gt;

&lt;iframe width="100%" height="166" scrolling="no" frameborder="no" allow="autoplay" src="https://w.soundcloud.com/player/?url=https%3A//api.soundcloud.com/tracks/1237563277&amp;amp;color=%23ff5500&amp;amp;auto_play=false&amp;amp;hide_related=false&amp;amp;show_comments=true&amp;amp;show_user=true&amp;amp;show_reposts=false&amp;amp;show_teaser=true"&gt; &lt;/iframe&gt;&lt;div style="font-size: 10px; color: #cccccc;line-break: anywhere;word-break: normal;overflow: hidden;white-space: nowrap;text-overflow: ellipsis; font-family: Interstate,Lucida Grande,Lucida Sans Unicode,Lucida Sans,Garuda,Verdana,Tahoma,sans-serif;font-weight: 100;"&gt;&lt;a href="https://soundcloud.com/simon-willison" title="Simon Willison" target="_blank" style="color: #cccccc; text-decoration: none;"&gt;Simon Willison&lt;/a&gt; · &lt;a href="https://soundcloud.com/simon-willison/sqlite-happy-hour-22nd-march-2022" title="SQLite Happy Hour - 22nd March 2022" target="_blank" style="color: #cccccc; text-decoration: none;"&gt;SQLite Happy Hour - 22nd March 2022&lt;/a&gt;&lt;/div&gt;

&lt;h4&gt;Collaborative notes from the session&lt;/h4&gt;
&lt;p&gt;Something I've observed in Twitter Spaces I've joined in the past is that they can really benefit from a dedicated back-channel, to share links and allow audience participation without people needing to first request to speak.&lt;/p&gt;
&lt;p&gt;A trick I've used with online talks I've given in the past is to start a collaborative Google Doc to collect shared notes and questions. I tried this for the Twitter Space, and it worked really well!&lt;/p&gt;
&lt;p&gt;You can &lt;a href="https://docs.google.com/document/d/1ykZdd-Q_PF21N239T7EolOFSM-dHJUjjM2NVP5K26oQ/edit#"&gt;see that document here&lt;/a&gt;. During the session the document was open for anyone to edit - I've locked it down now that the session has ended.&lt;/p&gt;
&lt;p&gt;I've duplicated the final form of the document at the bottom of this post.&lt;/p&gt;
&lt;p&gt;Something I really like about this format is that it allows for additional material to be posted later. I spent some time adding more detailed answers to the questions about Datasette after the session had ended.&lt;/p&gt;
&lt;h4&gt;Thoughts for if I do this again&lt;/h4&gt;
&lt;p&gt;This was my first time hosting a space, and I learned a lot along the way.&lt;/p&gt;
&lt;p&gt;Firstly, this kind of thing works the best when there is a back and forth between the participants.&lt;/p&gt;
&lt;p&gt;My original idea was to have each project talk for ten minutes, then spend five minutes on discussion between the panel before moving on to the next project - and 15 minutes of open discussion at the end.&lt;/p&gt;
&lt;p&gt;My co-hosts suggested we try to make it more conversational, interrupting each other as we went along. We did that, and it worked much better: these sessions are far more interesting as a conversation than as a monologue.&lt;/p&gt;
&lt;p&gt;I still don't have a great feel for when to interrupt people in an audio-only conversation, since unlike an in-person panel there are no visual cues to go on!&lt;/p&gt;
&lt;p&gt;Technology: it turns out Twitter Spaces has wildly different functionality on web vs. mobile apps. We spent the first five minutes making sure all of our speakers could talk! We really should have done a tech rehearsal first, but I wasn't sure how to do that without accidentally broadcasting it to the world - maybe set up burner Twitter accounts for testing?&lt;/p&gt;
&lt;p&gt;Presenting audio-only is itself a challenge: I'm used to leaning on visual demos when I explain what Datasette is in a talk, and not having those to fall back on was challenging. I had jotted down notes on the main points I wanted to hit which certainly helped, but I think there's a whole new presenting skill here that I need to work harder to develop.&lt;/p&gt;
&lt;p&gt;Exporting the recorded audio from Twitter was frustrating but possible. I wrote some notes on how I did that &lt;a href="https://til.simonwillison.net/twitter/export-edit-twitter-spaces"&gt;in this TIL&lt;/a&gt;.&lt;/p&gt;
&lt;h4&gt;Our collaborative notes in full&lt;/h4&gt;
&lt;p&gt;&lt;strong&gt;SQLite Happy Hour Twitter Space&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;22nd March 2022 - &lt;a href="https://www.worldtimebuddy.com/?qm=1&amp;amp;lid=5128581,5391959,5419384,2643743&amp;amp;h=5128581&amp;amp;date=2022-3-22&amp;amp;sln=15.5-16.5&amp;amp;hf=1"&gt;12:30pm PT / 1:30pm MT / 3:30pm ET&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;Welcome to the SQLite Happy Hour! This hour-long session will feature three projects that are doing interesting things with SQLite. Each project will provide a ten minute overview, followed by five minutes of discussion from the panel. The last 15 minutes of the hour will be an open discussion and general Q&amp;amp;A.&lt;/p&gt;
&lt;p&gt;This document is open for anyone to edit. Please feel free to drop notes and questions in as we go along.&lt;/p&gt;
&lt;p&gt;The recording of the space is available here: &lt;a href="https://twitter.com/i/spaces/1ypKdEXvkMLGW"&gt;https://twitter.com/i/spaces/1ypKdEXvkMLGW&lt;/a&gt;&lt;/p&gt;
&lt;h5&gt;Riffle&lt;/h5&gt;
&lt;p&gt;Geoffrey Litt &lt;a href="https://twitter.com/geoffreylitt"&gt;@geoffreylitt&lt;/a&gt;, Nicholas Schiefer &lt;a href="https://twitter.com/nschiefer"&gt;@nschiefer&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;Riffle asks: what if you wrote your whole UI as a query over a local database? So far, we've built a prototype using SQLite and React. More background in this paper:&lt;/p&gt;
&lt;p&gt;&lt;a href="https://riffle.systems/essays/prelude/"&gt;Building data-centric apps with a reactive relational database&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;Research project goal is to make development simpler, as opposed to the ongoing trend of more complexity.&lt;/p&gt;
&lt;p&gt;Riffle looks at having a database-centric mechanism at the heart of the view. Declarative queries could make apps easier to understand and debug.&lt;/p&gt;
&lt;p&gt;SQLite is the tool used for the prototype.&lt;/p&gt;
&lt;p&gt;Local-first architecture: Ink &amp;amp; Switch have been promoting this. Return to a world where your local client device serves as the source of truth - you can access data offline etc - and when the network is available your data gets synced to the cloud.&lt;/p&gt;
&lt;p&gt;The prototype: a reactive layer that uses SQLite as a state management backend for React, using &lt;a href="https://sql.js.org/#/"&gt;https://sql.js.org/&lt;/a&gt; which compiles SQLite in WASM. Also built prototypes of desktop apps using &lt;a href="https://github.com/tauri-apps/tauri"&gt;https://github.com/tauri-apps/tauri&lt;/a&gt; - like Electron but using the system web browser instead of bundling its own.&lt;/p&gt;
&lt;p&gt;Since they control the writes, they can re-execute every query after any writes happen. SQLite is so fast that this works fine, queries all take under a ms and even with a thousand queries you can still run them all.&lt;/p&gt;
&lt;p&gt;ALL UI state is in the database - there's no local React component state - literally everything is in the database. This means all UI state is persistent by default.&lt;/p&gt;
&lt;p&gt;IndexedDB is used for the in-browser persistence. The Tauri desktop app stores to a file on disk. Maybe SQL.js could do that with the new Chrome filesystem API stuff too?&lt;/p&gt;
&lt;p&gt;Questions about Riffle:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;
&lt;p&gt;Will Riffle target vanilla JS, or Node.js?&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;It's running client-side, so vanilla JS&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;From Stephen: What about browser-native UI state like scroll position, URL path, query string, multiple independent browser tabs, etc?&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Great question. We do some syncing of browser-native state to put it in the DB: eg, to support virtualized list rendering we update scroll state in the DB with an event handler. But there's definitely some browser state that isn't being captured reliably. In the purest world, the pixels on your screen would be produced by a DB query :)&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;From Predrag Gruevski: Would "query the queries" be a viable approach for narrowing the set of queries that need to be re-executed after a given write? Simple example: if table X gets modified, query for all queries that have table X in a FROM clause, then re-execute them.&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;yeah, that's roughly the direction we're headed. It's a little trickier than that if you start having subqueries / materialized view, but good general idea&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;From Longwei Su: Right now, each db update will cause a whole refresh. Is there any plan to refine the binding? So that any db update will only trigger UI component that "subscribe" to this section of the data. Sqlite have trigger, which can have callback on record update. How to construct that  "publisher"-&amp;gt; "subscriber" mapping from sql query?&lt;/p&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Comments for Riffle:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;
&lt;p&gt;From Jesse - &lt;a href="http://web.dev/file-system-access/"&gt;http://web.dev/file-system-access/&lt;/a&gt; isn't a very rich api - I think you could persist to it, but I don't think you can seek/update/.../all the posix stuff sqlite probably needs&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Hasura &lt;a href="https://github.com/hasura/graphql-engine/blob/master/architecture/live-queries.md"&gt;documented&lt;/a&gt; how they do reactive queries with Postgres, might be useful for minimising refetch overhead?&lt;/p&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;h5&gt;Datasette&lt;/h5&gt;
&lt;p&gt;Simon Willison &lt;a href="https://twitter.com/simonw"&gt;@simonw&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;&lt;a href="https://datasette.io/"&gt;Datasette&lt;/a&gt; is an open source multi-tool for exploring and publishing data. It explores SQLite as a read-only mechanism for publishing structured data online in as flexible a manner as possible, and aims to build an ecosystem of plugins that can handle a wide range of exploratory data analysis challenges.&lt;/p&gt;
&lt;p&gt;Video introduction here: &lt;a href="https://simonwillison.net/2021/Feb/7/video/"&gt;https://simonwillison.net/2021/Feb/7/video/&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;Questions about Datasette:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;
&lt;p&gt;How does it compares with &lt;a href="https://github.com/dinedal/textql"&gt;https://github.com/dinedal/textql&lt;/a&gt;, it seems the same but instead of sqlite binaries, just raw csv files which are more ubiquitous, and easier to view and edit with with office software (msf excel, libreoffice calc) ?&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;sqlite-utils memory provides similar functionality: &lt;a href="https://simonwillison.net/2021/Jun/19/sqlite-utils-memory/"&gt;https://simonwillison.net/2021/Jun/19/sqlite-utils-memory/&lt;/a&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Does Datasette need to worry about SQLite's &lt;a href="https://www.sqlite.org/security.html"&gt;Defense Against the Dark Arts&lt;/a&gt; security guidelines?&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Yes, absolutely! I've put a lot of work in there. Most importantly, Datasette enforces a time limit on queries, which cuts them off if they take more than a second.&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;The SQLite3 docs are sometimes light on examples for the tricky stuff (e.g., enabling WAL). What's your best sort of info beyond the official docs?&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;I've been publishing my own notes here: &lt;a href="https://til.simonwillison.net/sqlite"&gt;https://til.simonwillison.net/sqlite&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;The SQLite Forum is amazing - I ask questions on there and often get a reply from the maintainers within a few hours: &lt;a href="https://sqlite.org/forum/forummain"&gt;https://sqlite.org/forum/forummain&lt;/a&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;From Predrag Gruevski: Regarding learning curve, is a GraphQL web IDE (with syntax highlighting / autocomplete etc.) sufficiently user-friendly for folks more comfortable with a spreadsheet than a CLI tool or SQL?&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Probably not! GraphQL requires thinking like a programmer too. I'm interested in helping people who aren't yet ready to learn any kind of programming language&lt;/li&gt;
&lt;li&gt;I have a plugin for Datasette that adds GraphQL with the GraphiQL user interface - demo here: &lt;a href="https://datasette-graphql-demo.datasette.io/graphql/github?query=%7B%0A%20%20issue_comments%20%7B%0A%20%20%20%20totalCount%0A%20%20%20%20pageInfo%20%7B%0A%20%20%20%20%20%20hasNextPage%0A%20%20%20%20%20%20endCursor%0A%20%20%20%20%7D%0A%20%20%20%20nodes%20%7B%0A%20%20%20%20%20%20html_url%0A%20%20%20%20%20%20issue_url%0A%20%20%20%20%20%20id%0A%20%20%20%20%20%20node_id%0A%20%20%20%20%20%20created_at%0A%20%20%20%20%20%20updated_at%0A%20%20%20%20%20%20author_association%0A%20%20%20%20%20%20body%0A%20%20%20%20%20%20reactions%0A%20%20%20%20%20%20performed_via_github_app%0A%20%20%20%20%20%20user%20%7B%0A%20%20%20%20%20%20%20%20id%0A%20%20%20%20%20%20%20%20name%0A%20%20%20%20%20%20%7D%0A%20%20%20%20%20%20issue%20%7B%0A%20%20%20%20%20%20%20%20id%0A%20%20%20%20%20%20%20%20title%0A%20%20%20%20%20%20%7D%0A%20%20%20%20%7D%0A%20%20%7D%0A%7D"&gt;datasette-graphql-demo.datasette.io&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;Thanks! Would love to compare notes on this -- my experience from working with analysts at my employer was that they were able to master GraphiQL very quickly. In a sense, it was more intimidating than actually difficult, so working with them directly to get them over the initial difficulty hump via examples and targeted exercises made a huge positive impact.&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;h5&gt;Litestream&lt;/h5&gt;
&lt;p&gt;Ben Johnson &lt;a href="https://twitter.com/benbjohnson"&gt;@benbjohnson&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;&lt;a href="https://litestream.io/"&gt;Litestream&lt;/a&gt; adds replication to SQLite, allowing databases to be cheaply replicated to storage systems such as S3. Litestream also now implements live read-replication, where many read replicas can be run against a single leader database.&lt;/p&gt;
&lt;p&gt;&lt;a href="https://www.sqlite.org/np1queryprob.html"&gt;https://www.sqlite.org/np1queryprob.html&lt;/a&gt; - Many Small Queries Are Efficient in SQLite&lt;/p&gt;
&lt;p&gt;Questions about Litestream:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;
&lt;p&gt;What does the planned hot standby feature look like, especially regarding durability guarantees during fail-over?&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;BJ: Hot standby is a tough issue to generalize. The database-as-a-service version of Litestream that's coming will handle this, but it's not necessarily planned for Litestream itself&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Will DBaaS be hosted, OSS, or both?&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;It'll be both&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;From Longwei Su: I assume offline updates are committed locally and then synced to the online storage. If an offline commit conflicts with a version that has already been committed online, how is the conflict resolved?&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Not sure if this relates to Litestream, but: how big is sql.js? How much does it cost (in kilobytes) to load SQLite in the browser?&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;BJ: I think sql.js is 1.2MB, so the cost depends on how much your provider charges for bandwidth&lt;/li&gt;
&lt;li&gt;Thanks! Meant "cost" in the sense of bytes transferred over the wire --- this answers it :)&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;h5&gt;GraphQL&lt;/h5&gt;
&lt;ul&gt;
&lt;li&gt;
&lt;a href="https://github.com/simonw/help-scraper"&gt;https://github.com/simonw/help-scraper&lt;/a&gt; is scraping GraphQL schemas&lt;/li&gt;
&lt;/ul&gt;
    
        &lt;p&gt;Tags: &lt;a href="https://simonwillison.net/tags/sqlite"&gt;sqlite&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/my-talks"&gt;my-talks&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/twitter"&gt;twitter&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/datasette"&gt;datasette&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/litestream"&gt;litestream&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/ben-johnson"&gt;ben-johnson&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/geoffrey-litt"&gt;geoffrey-litt&lt;/a&gt;&lt;/p&gt;
    

</summary><category term="sqlite"/><category term="my-talks"/><category term="twitter"/><category term="datasette"/><category term="litestream"/><category term="ben-johnson"/><category term="geoffrey-litt"/></entry><entry><title>logpaste</title><link href="https://simonwillison.net/2021/Mar/17/logpaste/#atom-tag" rel="alternate"/><published>2021-03-17T15:48:45+00:00</published><updated>2021-03-17T15:48:45+00:00</updated><id>https://simonwillison.net/2021/Mar/17/logpaste/#atom-tag</id><summary type="html">
    
&lt;p&gt;&lt;strong&gt;&lt;a href="https://github.com/mtlynch/logpaste/tree/fe53779aef25cdc9dfaabd9404ff801951730ddb"&gt;logpaste&lt;/a&gt;&lt;/strong&gt;&lt;/p&gt;
Useful example of how to use the Litestream SQLite replication tool in a Dockerized application: S3 credentials are passed to the container on startup; the container then attempts to restore the SQLite database from S3 and runs a Litestream process alongside the application to continuously synchronize changes back up to the S3 bucket.
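A sketch of that restore-then-replicate entrypoint pattern; the bucket name, database path, and server binary below are placeholders, not logpaste's actual values:

```shell
#!/bin/sh
# Hypothetical container entrypoint illustrating the pattern above.
set -e

# Restore the most recent snapshot from S3, if one exists.
litestream restore -if-replica-exists -o /data/store.db \
    "s3://example-bucket/store.db"

# Stream subsequent changes back up to the bucket in the background...
litestream replicate /data/store.db "s3://example-bucket/store.db" &

# ...while the application runs in the foreground.
exec /app/server -db /data/store.db
```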

    &lt;p&gt;&lt;small&gt;Via &lt;a href="https://twitter.com/deliberatecoder/status/1371993772549103621"&gt;@deliberatecoder&lt;/a&gt;&lt;/small&gt;&lt;/p&gt;


    &lt;p&gt;Tags: &lt;a href="https://simonwillison.net/tags/replication"&gt;replication&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/s3"&gt;s3&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/sqlite"&gt;sqlite&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/docker"&gt;docker&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/litestream"&gt;litestream&lt;/a&gt;&lt;/p&gt;



</summary><category term="replication"/><category term="s3"/><category term="sqlite"/><category term="docker"/><category term="litestream"/></entry><entry><title>Quoting Ben Johnson</title><link href="https://simonwillison.net/2021/Feb/11/ben-johnson/#atom-tag" rel="alternate"/><published>2021-02-11T20:50:21+00:00</published><updated>2021-02-11T20:50:21+00:00</updated><id>https://simonwillison.net/2021/Feb/11/ben-johnson/#atom-tag</id><summary type="html">
    &lt;blockquote cite="https://news.ycombinator.com/item?id=26106043"&gt;&lt;p&gt;Litestream runs continuously on a test server with generated load and streams backups to S3. It uses physical replication so it'll actually restore the data from S3 periodically and compare the checksum byte-for-byte with the current database.&lt;/p&gt;&lt;/blockquote&gt;
&lt;p class="cite"&gt;&amp;mdash; &lt;a href="https://news.ycombinator.com/item?id=26106043"&gt;Ben Johnson&lt;/a&gt;&lt;/p&gt;

    &lt;p&gt;Tags: &lt;a href="https://simonwillison.net/tags/testing"&gt;testing&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/litestream"&gt;litestream&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/ben-johnson"&gt;ben-johnson&lt;/a&gt;&lt;/p&gt;



</summary><category term="testing"/><category term="litestream"/><category term="ben-johnson"/></entry></feed>