<?xml version="1.0" encoding="utf-8"?>
<feed xml:lang="en-us" xmlns="http://www.w3.org/2005/Atom"><title>Simon Willison's Weblog: litestream</title><link href="http://simonwillison.net/" rel="alternate"/><link href="http://simonwillison.net/tags/litestream.atom" rel="self"/><id>http://simonwillison.net/</id><updated>2026-01-15T16:08:27+00:00</updated><author><name>Simon Willison</name></author><entry><title>The Design &amp; Implementation of Sprites</title><link href="https://simonwillison.net/2026/Jan/15/the-design-implementation-of-sprites/#atom-tag" rel="alternate"/><published>2026-01-15T16:08:27+00:00</published><updated>2026-01-15T16:08:27+00:00</updated><id>https://simonwillison.net/2026/Jan/15/the-design-implementation-of-sprites/#atom-tag</id><summary type="html">
    
&lt;p&gt;&lt;strong&gt;&lt;a href="https://fly.io/blog/design-and-implementation/"&gt;The Design &amp;amp; Implementation of Sprites&lt;/a&gt;&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;I &lt;a href="https://simonwillison.net/2026/Jan/9/sprites-dev/"&gt;wrote about Sprites last week&lt;/a&gt;. Here's Thomas Ptacek from Fly with the insider details on how they work under the hood.&lt;/p&gt;
&lt;p&gt;I like this framing of them as "disposable computers":&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;Sprites are ball-point disposable computers. Whatever mark you mean to make, we’ve rigged it so you’re never more than a second or two away from having a Sprite to do it with.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;I've noticed that new Fly Machines can take a while (up to around a minute) to provision. Sprites solve that by keeping warm pools of unused machines in multiple regions, made possible by the fact that they all run the same container:&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;Now, today, under the hood, Sprites are still Fly Machines. But they all run from a standard container. Every physical worker knows exactly what container the next Sprite is going to start with, so it’s easy for us to keep pools of “empty” Sprites standing by. The result: a Sprite create doesn’t have any heavy lifting to do; it’s basically just doing the stuff we do when we start a Fly Machine.&lt;/p&gt;
&lt;/blockquote&gt;
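&lt;p&gt;A toy model helps show why the identical base container matters. This is my own illustrative sketch, not Fly's implementation: because every machine boots from the same container, a pool of pre-booted machines can be kept warm, so a "create" just claims one and the slow boot work is paid ahead of time:&lt;/p&gt;

```python
import collections
import itertools


class WarmPool:
    """Toy warm pool: pre-boot machines from one standard container so a
    create has no heavy lifting left to do. Illustrative only."""

    def __init__(self, target=4):
        self._ids = itertools.count(1)
        self._pool = collections.deque()
        for _ in range(target):
            self._boot_one()              # slow path, paid ahead of time

    def _boot_one(self):
        # In reality: start a Fly Machine from the standard Sprite container.
        self._pool.append(f"sprite-{next(self._ids)}")

    def create(self):
        machine = self._pool.popleft()    # fast path: claim a warm machine
        self._boot_one()                  # refill the pool (async in reality)
        return machine


pool = WarmPool()
assert pool.create() == "sprite-1"        # served instantly from the pool
assert len(pool._pool) == 4               # pool replenished after the create
```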
&lt;p&gt;The most interesting detail is how the persistence layer works. Sprites only charge you for written data that differs from the base image, and provide checkpoints and restores in around 300ms - it turns out that's powered by a custom filesystem on top of S3-compatible storage, coordinated by Litestream-replicated local SQLite metadata:&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;We still exploit NVMe, but not as the root of storage. Instead, it’s a read-through cache for a blob on object storage. S3-compatible object stores are the most trustworthy storage technology we have. I can feel my blood pressure dropping just typing the words “Sprites are backed by object storage.” [...]&lt;/p&gt;
&lt;p&gt;The Sprite storage stack is organized around the JuiceFS model (in fact, we currently use a very hacked-up JuiceFS, with a rewritten SQLite metadata backend). It works by splitting storage into data (“chunks”) and metadata (a map of where the “chunks” are). Data chunks live on object stores; metadata lives in fast local storage. In our case, that metadata store is &lt;a href="https://litestream.io"&gt;kept durable with Litestream&lt;/a&gt;. Nothing depends on local storage.&lt;/p&gt;
&lt;/blockquote&gt;
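&lt;p&gt;To make that data/metadata split concrete, here's a minimal sketch of the idea in Python - my own illustration, not Fly's or JuiceFS's actual code: chunk locations live in a SQLite table (the part Litestream keeps durable), while the chunk bytes themselves go to object storage (a dict stands in for the bucket):&lt;/p&gt;

```python
import hashlib
import sqlite3

CHUNK_SIZE = 4 * 1024 * 1024  # illustrative chunk size, not JuiceFS's actual value

# Metadata: a map of where the chunks are. In production this would be a
# file-backed database kept durable with Litestream.
meta = sqlite3.connect(":memory:")
meta.execute(
    "CREATE TABLE chunks (path TEXT, idx INTEGER, key TEXT, PRIMARY KEY (path, idx))"
)

object_store = {}  # stand-in for an S3-compatible bucket


def write_file(path: str, data: bytes) -> None:
    """Split data into chunks, upload each, record the mapping in metadata."""
    for offset in range(0, len(data), CHUNK_SIZE):
        chunk = data[offset:offset + CHUNK_SIZE]
        key = hashlib.sha256(chunk).hexdigest()   # content-addressed chunk key
        object_store[key] = chunk                 # "upload" the chunk
        meta.execute(
            "INSERT OR REPLACE INTO chunks VALUES (?, ?, ?)",
            (path, offset // CHUNK_SIZE, key),
        )
    meta.commit()


def read_file(path: str) -> bytes:
    """Look up chunk keys in metadata, fetch the chunks, reassemble the file."""
    rows = meta.execute(
        "SELECT key FROM chunks WHERE path = ? ORDER BY idx", (path,)
    )
    return b"".join(object_store[row[0]] for row in rows)


write_file("/data/app.bin", b"hello sprite" * 1000)
assert read_file("/data/app.bin") == b"hello sprite" * 1000
```

&lt;p&gt;The appeal of the design is that either half can be rebuilt from a trustworthy source: the chunks are already on object storage, and the SQLite metadata can be restored from its Litestream replica.&lt;/p&gt;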

    &lt;p&gt;&lt;small&gt;Via &lt;a href="https://twitter.com/tqbf/status/2011823480673624434"&gt;@tqbf&lt;/a&gt;&lt;/small&gt;&lt;/p&gt;


    &lt;p&gt;Tags: &lt;a href="https://simonwillison.net/tags/architecture"&gt;architecture&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/sandboxing"&gt;sandboxing&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/sqlite"&gt;sqlite&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/thomas-ptacek"&gt;thomas-ptacek&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/fly"&gt;fly&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/litestream"&gt;litestream&lt;/a&gt;&lt;/p&gt;



</summary><category term="architecture"/><category term="sandboxing"/><category term="sqlite"/><category term="thomas-ptacek"/><category term="fly"/><category term="litestream"/></entry><entry><title>Litestream v0.5.0 is Here</title><link href="https://simonwillison.net/2025/Oct/3/litestream/#atom-tag" rel="alternate"/><published>2025-10-03T15:10:21+00:00</published><updated>2025-10-03T15:10:21+00:00</updated><id>https://simonwillison.net/2025/Oct/3/litestream/#atom-tag</id><summary type="html">
    
&lt;p&gt;&lt;strong&gt;&lt;a href="https://fly.io/blog/litestream-v050-is-here/"&gt;Litestream v0.5.0 is Here&lt;/a&gt;&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;I've been running &lt;a href="https://litestream.io"&gt;Litestream&lt;/a&gt; to back up SQLite databases in production for a couple of years now without incident. The new version has been a long time coming - Ben Johnson took &lt;a href="https://simonwillison.net/2022/Sep/21/introducing-litefs/"&gt;a detour&lt;/a&gt; into the FUSE-based &lt;a href="https://github.com/superfly/litefs"&gt;LiteFS&lt;/a&gt; before deciding that the single-binary Litestream approach was more popular - and Litestream 0.5 just landed with this very detailed blog post describing the improved architecture.&lt;/p&gt;
&lt;p&gt;SQLite stores data in pages - blocks of 4,096 bytes by default. Litestream replicates modified pages to a backup location - usually object storage like S3.&lt;/p&gt;
&lt;p&gt;Most SQLite tables have an auto-incrementing primary key, which determines which page a row's data is stored in. This means sequential inserts into a small table land on the same page, which caused previous versions of Litestream to replicate many slightly different copies of that page in succession.&lt;/p&gt;
&lt;p&gt;The new LTX format - borrowed from LiteFS - addresses that by adding compaction, which Ben describes as follows:&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;We can use LTX compaction to compress a bunch of LTX files into a single file with no duplicated pages. And Litestream now uses this capability to create a hierarchy of compactions:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;at Level 1, we compact all the changes in a 30-second time window&lt;/li&gt;
&lt;li&gt;at Level 2, all the Level 1 files in a 5-minute window&lt;/li&gt;
&lt;li&gt;at Level 3, all the Level 2’s over an hour.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Net result: we can restore a SQLite database to any point in time, &lt;em&gt;using only a dozen or so files on average&lt;/em&gt;.&lt;/p&gt;
&lt;/blockquote&gt;
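&lt;p&gt;The compaction idea is simple to sketch - this is my own illustration rather than Litestream's LTX code: merge a sequence of page-change files in time order, so only the newest copy of each page survives:&lt;/p&gt;

```python
def compact(ltx_files):
    """Merge page-change files (oldest first) into one with no duplicate pages.

    Each file is modelled as a dict of {page_number: page_bytes} - a toy
    stand-in for the LTX format, not its real on-disk layout.
    """
    merged = {}
    for changes in ltx_files:   # apply in time order
        merged.update(changes)  # a newer copy of a page replaces the older one
    return merged


level0 = [
    {1: b"v1-page1", 2: b"v1-page2"},
    {1: b"v2-page1"},           # page 1 rewritten by a later transaction
    {3: b"v1-page3"},
]
compacted = compact(level0)
# Only the newest copy of page 1 survives; pages 2 and 3 are untouched.
assert compacted == {1: b"v2-page1", 2: b"v1-page2", 3: b"v1-page3"}
```

&lt;p&gt;Applying the same merge at 30-second, 5-minute and hourly granularities gives the level hierarchy described above: restoring to a point in time only needs the relevant compacted files plus a handful of recent uncompacted ones.&lt;/p&gt;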
&lt;p&gt;I'm most looking forward to trying out the feature that isn't quite landed yet: read-replicas, implemented using a SQLite &lt;a href="https://www.sqlite.org/vfs.html"&gt;VFS extension&lt;/a&gt;:&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;The next major feature we’re building out is a Litestream VFS for read replicas. This will let you instantly spin up a copy of the database and immediately read pages from S3 while the rest of the database is hydrating in the background.&lt;/p&gt;
&lt;/blockquote&gt;

    &lt;p&gt;&lt;small&gt;Via &lt;a href="https://news.ycombinator.com/item?id=45453936"&gt;Hacker News&lt;/a&gt;&lt;/small&gt;&lt;/p&gt;


    &lt;p&gt;Tags: &lt;a href="https://simonwillison.net/tags/sqlite"&gt;sqlite&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/fly"&gt;fly&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/litestream"&gt;litestream&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/ben-johnson"&gt;ben-johnson&lt;/a&gt;&lt;/p&gt;



</summary><category term="sqlite"/><category term="fly"/><category term="litestream"/><category term="ben-johnson"/></entry><entry><title>files-to-prompt 0.5</title><link href="https://simonwillison.net/2025/Feb/14/files-to-prompt/#atom-tag" rel="alternate"/><published>2025-02-14T04:14:21+00:00</published><updated>2025-02-14T04:14:21+00:00</updated><id>https://simonwillison.net/2025/Feb/14/files-to-prompt/#atom-tag</id><summary type="html">
    
&lt;p&gt;&lt;strong&gt;&lt;a href="https://github.com/simonw/files-to-prompt/releases/tag/0.5"&gt;files-to-prompt 0.5&lt;/a&gt;&lt;/strong&gt;&lt;/p&gt;
My &lt;code&gt;files-to-prompt&lt;/code&gt; tool (&lt;a href="https://simonwillison.net/2024/Apr/8/files-to-prompt/"&gt;originally built using Claude 3 Opus back in April&lt;/a&gt;) had been accumulating a bunch of issues and PRs - I finally got around to spending some time with it and pushed a fresh release:&lt;/p&gt;
&lt;blockquote&gt;
&lt;ul&gt;
&lt;li&gt;New &lt;code&gt;-n/--line-numbers&lt;/code&gt; flag for including line numbers in the output. Thanks, &lt;a href="https://github.com/danclaytondev"&gt;Dan Clayton&lt;/a&gt;. &lt;a href="https://github.com/simonw/files-to-prompt/pull/38"&gt;#38&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;Fix for utf-8 handling on Windows. Thanks, &lt;a href="https://github.com/david-jarman"&gt;David Jarman&lt;/a&gt;. &lt;a href="https://github.com/simonw/files-to-prompt/pull/36"&gt;#36&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;code&gt;--ignore&lt;/code&gt; patterns are now matched against directory names as well as file names, unless you pass the new &lt;code&gt;--ignore-files-only&lt;/code&gt; flag. Thanks, &lt;a href="https://github.com/nmpowell"&gt;Nick Powell&lt;/a&gt;. &lt;a href="https://github.com/simonw/files-to-prompt/pull/30"&gt;#30&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;/blockquote&gt;
&lt;p&gt;I use this tool myself on an almost daily basis - it's fantastic for quickly answering questions about code. Recently I've been plugging it into Gemini 2.0 with its 2 million token context length, running recipes like this one:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;git clone https://github.com/bytecodealliance/componentize-py
cd componentize-py
files-to-prompt . -c | llm -m gemini-2.0-pro-exp-02-05 \
  -s 'How does this work? Does it include a python compiler or AST trick of some sort?'
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;I ran that question against the &lt;a href="https://github.com/bytecodealliance/componentize-py"&gt;bytecodealliance/componentize-py&lt;/a&gt; repo - which provides a tool for turning Python code into compiled WASM - and got &lt;a href="https://gist.github.com/simonw/a9d72e7f903417fb49e1d7a531ee8f97"&gt;this really useful answer&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;Here's another example. I decided to have o3-mini review how Datasette handles concurrent SQLite connections from async Python code - so I ran this:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;git clone https://github.com/simonw/datasette
cd datasette/datasette
files-to-prompt database.py utils/__init__.py -c | \
  llm -m o3-mini -o reasoning_effort high \
  -s 'Output in markdown a detailed analysis of how this code handles the challenge of running SQLite queries from a Python asyncio application. Explain how it works in the first section, then explore the pros and cons of this design. In a final section propose alternative mechanisms that might work better.'
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Here's &lt;a href="https://gist.github.com/simonw/76c8c433f4a65cf01a5c9121453683ab"&gt;the result&lt;/a&gt;. It did an extremely good job of explaining how my code works - despite being fed just the Python and none of the other documentation. Then it made some solid recommendations for potential alternatives.&lt;/p&gt;
&lt;p&gt;I added a couple of follow-up questions (using &lt;code&gt;llm -c&lt;/code&gt;) which resulted in &lt;a href="https://gist.github.com/simonw/76c8c433f4a65cf01a5c9121453683ab?permalink_comment_id=5438685#gistcomment-5438685"&gt;a full working prototype&lt;/a&gt; of an alternative threadpool mechanism, plus &lt;a href="https://gist.github.com/simonw/76c8c433f4a65cf01a5c9121453683ab?permalink_comment_id=5438691#gistcomment-5438691"&gt;some benchmarks&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;One final example: I decided to see if there were any undocumented features in &lt;a href="https://litestream.io/"&gt;Litestream&lt;/a&gt;, so I checked out the repo and ran a prompt against just the &lt;code&gt;.go&lt;/code&gt; files in that project:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;git clone https://github.com/benbjohnson/litestream
cd litestream
files-to-prompt . -e go -c | llm -m o3-mini \
  -s 'Write extensive user documentation for this project in markdown'
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Once again, o3-mini provided a &lt;a href="https://gist.github.com/simonw/cbf339032f99fee72af5fd5455bc7235"&gt;really impressively detailed&lt;/a&gt; set of unofficial documentation derived purely from reading the source.&lt;/p&gt;


    &lt;p&gt;Tags: &lt;a href="https://simonwillison.net/tags/async"&gt;async&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/projects"&gt;projects&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/python"&gt;python&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/sqlite"&gt;sqlite&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/ai"&gt;ai&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/datasette"&gt;datasette&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/webassembly"&gt;webassembly&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/litestream"&gt;litestream&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/generative-ai"&gt;generative-ai&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/llms"&gt;llms&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/ai-assisted-programming"&gt;ai-assisted-programming&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/llm"&gt;llm&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/gemini"&gt;gemini&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/llm-reasoning"&gt;llm-reasoning&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/files-to-prompt"&gt;files-to-prompt&lt;/a&gt;&lt;/p&gt;



</summary><category term="async"/><category term="projects"/><category term="python"/><category term="sqlite"/><category term="ai"/><category term="datasette"/><category term="webassembly"/><category term="litestream"/><category term="generative-ai"/><category term="llms"/><category term="ai-assisted-programming"/><category term="llm"/><category term="gemini"/><category term="llm-reasoning"/><category term="files-to-prompt"/></entry><entry><title>Zero-latency SQLite storage in every Durable Object</title><link href="https://simonwillison.net/2024/Oct/13/zero-latency-sqlite-storage-in-every-durable-object/#atom-tag" rel="alternate"/><published>2024-10-13T22:26:49+00:00</published><updated>2024-10-13T22:26:49+00:00</updated><id>https://simonwillison.net/2024/Oct/13/zero-latency-sqlite-storage-in-every-durable-object/#atom-tag</id><summary type="html">
    
&lt;p&gt;&lt;strong&gt;&lt;a href="https://blog.cloudflare.com/sqlite-in-durable-objects/"&gt;Zero-latency SQLite storage in every Durable Object&lt;/a&gt;&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;Kenton Varda introduces the next iteration of Cloudflare's &lt;a href="https://developers.cloudflare.com/durable-objects/"&gt;Durable Object&lt;/a&gt; platform, which recently upgraded from a key/value store to a full relational system based on SQLite.&lt;/p&gt;
&lt;p&gt;For useful background on the first version of Durable Objects take a look at &lt;a href="https://digest.browsertech.com/archive/browsertech-digest-cloudflares-durable/"&gt;Cloudflare's durable multiplayer moat&lt;/a&gt; by Paul Butler, who digs into its popularity for building WebSocket-based realtime collaborative applications.&lt;/p&gt;
&lt;p&gt;The new SQLite-backed Durable Objects system is a fascinating piece of distributed systems design, which advocates for a really interesting way to architect a large scale application.&lt;/p&gt;
&lt;p&gt;The key idea behind Durable Objects is to colocate application logic with the data it operates on. A Durable Object comprises code that executes on the same physical host as the SQLite database that it uses, resulting in blazingly fast read and write performance.&lt;/p&gt;
&lt;p&gt;How could this work at scale?&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;A single object is inherently limited in throughput since it runs on a single thread of a single machine. To handle more traffic, you create more objects. This is easiest when different objects can handle different logical units of state (like different documents, different users, or different "shards" of a database), where each unit of state has low enough traffic to be handled by a single object&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;Kenton presents the example of a flight booking system, where each flight can map to a dedicated Durable Object with its own SQLite database - thousands of fresh databases per airline per day.&lt;/p&gt;
&lt;p&gt;Each DO has a unique name, and Cloudflare's network then handles routing requests to that object wherever it might live on their global network.&lt;/p&gt;
&lt;p&gt;The technical details are fascinating. Inspired by &lt;a href="https://litestream.io/"&gt;Litestream&lt;/a&gt;, each DO constantly streams a sequence of WAL entries to object storage - batched every 16MB or every ten seconds. This also enables point-in-time recovery for up to 30 days through replaying those logged transactions.&lt;/p&gt;
&lt;p&gt;To ensure durability within that ten second window, writes are also forwarded to five replicas in separate nearby data centers as soon as they commit, and the write is only acknowledged once three of them have confirmed it.&lt;/p&gt;
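&lt;p&gt;That quorum acknowledgment pattern can be sketched in a few lines - my illustration of the general idea, not Cloudflare's code:&lt;/p&gt;

```python
from concurrent.futures import ThreadPoolExecutor, as_completed

QUORUM = 3  # acknowledge once this many replicas have confirmed


def replicate_with_quorum(entry, replicas):
    """Send `entry` to every replica concurrently; return True as soon as
    QUORUM of them confirm, without waiting for the stragglers.

    Each replica is modelled as a callable returning True on success.
    """
    confirmed = 0
    with ThreadPoolExecutor(max_workers=len(replicas)) as pool:
        futures = [pool.submit(replica, entry) for replica in replicas]
        for future in as_completed(futures):
            if future.result():
                confirmed += 1
                if confirmed >= QUORUM:
                    return True   # safe to acknowledge the write
    return False                  # fewer than QUORUM replicas confirmed


# Five stand-in replicas, one of which is down - the write still succeeds.
replicas = [lambda e: True] * 4 + [lambda e: False]
assert replicate_with_quorum(b"wal-entry", replicas) is True
```

&lt;p&gt;The trade-off is the classic one: waiting for three of five keeps write latency low even when a couple of replicas are slow, while still tolerating the loss of the primary within the ten-second batching window.&lt;/p&gt;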
&lt;p&gt;The JavaScript API design is interesting too: it's blocking rather than async, because the whole point of the design is to provide fast single threaded persistence operations:&lt;/p&gt;
&lt;div class="highlight highlight-source-js"&gt;&lt;pre&gt;&lt;span class="pl-k"&gt;let&lt;/span&gt; &lt;span class="pl-s1"&gt;docs&lt;/span&gt; &lt;span class="pl-c1"&gt;=&lt;/span&gt; &lt;span class="pl-s1"&gt;sql&lt;/span&gt;&lt;span class="pl-kos"&gt;.&lt;/span&gt;&lt;span class="pl-en"&gt;exec&lt;/span&gt;&lt;span class="pl-kos"&gt;(&lt;/span&gt;&lt;span class="pl-s"&gt;`&lt;/span&gt;
&lt;span class="pl-s"&gt;  SELECT title, authorId FROM documents&lt;/span&gt;
&lt;span class="pl-s"&gt;  ORDER BY lastModified DESC&lt;/span&gt;
&lt;span class="pl-s"&gt;  LIMIT 100&lt;/span&gt;
&lt;span class="pl-s"&gt;`&lt;/span&gt;&lt;span class="pl-kos"&gt;)&lt;/span&gt;&lt;span class="pl-kos"&gt;.&lt;/span&gt;&lt;span class="pl-en"&gt;toArray&lt;/span&gt;&lt;span class="pl-kos"&gt;(&lt;/span&gt;&lt;span class="pl-kos"&gt;)&lt;/span&gt;&lt;span class="pl-kos"&gt;;&lt;/span&gt;

&lt;span class="pl-k"&gt;for&lt;/span&gt; &lt;span class="pl-kos"&gt;(&lt;/span&gt;&lt;span class="pl-k"&gt;let&lt;/span&gt; &lt;span class="pl-s1"&gt;doc&lt;/span&gt; &lt;span class="pl-k"&gt;of&lt;/span&gt; &lt;span class="pl-s1"&gt;docs&lt;/span&gt;&lt;span class="pl-kos"&gt;)&lt;/span&gt; &lt;span class="pl-kos"&gt;{&lt;/span&gt;
  &lt;span class="pl-s1"&gt;doc&lt;/span&gt;&lt;span class="pl-kos"&gt;.&lt;/span&gt;&lt;span class="pl-c1"&gt;authorName&lt;/span&gt; &lt;span class="pl-c1"&gt;=&lt;/span&gt; &lt;span class="pl-s1"&gt;sql&lt;/span&gt;&lt;span class="pl-kos"&gt;.&lt;/span&gt;&lt;span class="pl-en"&gt;exec&lt;/span&gt;&lt;span class="pl-kos"&gt;(&lt;/span&gt;
    &lt;span class="pl-s"&gt;"SELECT name FROM users WHERE id = ?"&lt;/span&gt;&lt;span class="pl-kos"&gt;,&lt;/span&gt;
    &lt;span class="pl-s1"&gt;doc&lt;/span&gt;&lt;span class="pl-kos"&gt;.&lt;/span&gt;&lt;span class="pl-c1"&gt;authorId&lt;/span&gt;&lt;span class="pl-kos"&gt;)&lt;/span&gt;&lt;span class="pl-kos"&gt;.&lt;/span&gt;&lt;span class="pl-en"&gt;one&lt;/span&gt;&lt;span class="pl-kos"&gt;(&lt;/span&gt;&lt;span class="pl-kos"&gt;)&lt;/span&gt;&lt;span class="pl-kos"&gt;.&lt;/span&gt;&lt;span class="pl-c1"&gt;name&lt;/span&gt;&lt;span class="pl-kos"&gt;;&lt;/span&gt;
&lt;span class="pl-kos"&gt;}&lt;/span&gt;&lt;/pre&gt;&lt;/div&gt;

&lt;p&gt;This example of theirs deliberately exhibits the N+1 query pattern, because that's something SQLite is &lt;a href="https://www.sqlite.org/np1queryprob.html"&gt;uniquely well suited to handling&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;The system underlying Durable Objects is called Storage Relay Service, and it's been powering Cloudflare's existing-but-different &lt;a href="https://developers.cloudflare.com/d1/"&gt;D1 SQLite system&lt;/a&gt; for over a year.&lt;/p&gt;
&lt;p&gt;I was curious as to where the objects are created. &lt;a href="https://developers.cloudflare.com/durable-objects/reference/data-location/#provide-a-location-hint"&gt;According to this&lt;/a&gt; (via &lt;a href="https://news.ycombinator.com/item?id=41832547#41832812"&gt;Hacker News&lt;/a&gt;):&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;Durable Objects do not currently change locations after they are created. By default, a Durable Object is instantiated in a data center close to where the initial &lt;code&gt;get()&lt;/code&gt; request is made. [...] To manually create Durable Objects in another location, provide an optional &lt;code&gt;locationHint&lt;/code&gt; parameter to &lt;code&gt;get()&lt;/code&gt;.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;And in a footnote:&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;Dynamic relocation of existing Durable Objects is planned for the future.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;&lt;a href="https://where.durableobjects.live/"&gt;where.durableobjects.live&lt;/a&gt; is a neat site that tracks where in the Cloudflare network DOs are created - I just visited it and it said:&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;This page tracks where new Durable Objects are created; for example, when you loaded this page from &lt;strong&gt;Half Moon Bay&lt;/strong&gt;, a worker in &lt;strong&gt;San Jose, California, United States (SJC)&lt;/strong&gt; created a durable object in &lt;strong&gt;San Jose, California, United States (SJC)&lt;/strong&gt;.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;&lt;img alt="Where Durable Objects Live.    Created by the wonderful Jed Schmidt, and now maintained with ❤️ by Alastair. Source code available on Github.    Cloudflare Durable Objects are a novel approach to stateful compute based on Cloudflare Workers. They aim to locate both compute and state closest to end users.    This page tracks where new Durable Objects are created; for example, when you loaded this page from Half Moon Bay, a worker in San Jose, California, United States (SJC) created a durable object in Los Angeles, California, United States (LAX).    Currently, Durable Objects are available in 11.35% of Cloudflare PoPs.    To keep data fresh, this application is constantly creating/destroying new Durable Objects around the world. In the last hour, 394,046 Durable Objects have been created(and subsequently destroyed), FOR SCIENCE!    And a map of the world showing lots of dots." src="https://static.simonwillison.net/static/2024/where-durable-objects.jpg" /&gt;&lt;/p&gt;

    &lt;p&gt;&lt;small&gt;Via &lt;a href="https://lobste.rs/s/kjx2vk/zero_latency_sqlite_storage_every"&gt;lobste.rs&lt;/a&gt;&lt;/small&gt;&lt;/p&gt;


    &lt;p&gt;Tags: &lt;a href="https://simonwillison.net/tags/scaling"&gt;scaling&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/sqlite"&gt;sqlite&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/websockets"&gt;websockets&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/software-architecture"&gt;software-architecture&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/cloudflare"&gt;cloudflare&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/litestream"&gt;litestream&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/kenton-varda"&gt;kenton-varda&lt;/a&gt;&lt;/p&gt;



</summary><category term="scaling"/><category term="sqlite"/><category term="websockets"/><category term="software-architecture"/><category term="cloudflare"/><category term="litestream"/><category term="kenton-varda"/></entry><entry><title>Introducing datasette-litestream: easy replication for SQLite databases in Datasette</title><link href="https://simonwillison.net/2023/Sep/13/datasette-litestream/#atom-tag" rel="alternate"/><published>2023-09-13T19:28:37+00:00</published><updated>2023-09-13T19:28:37+00:00</updated><id>https://simonwillison.net/2023/Sep/13/datasette-litestream/#atom-tag</id><summary type="html">
    
&lt;p&gt;&lt;strong&gt;&lt;a href="https://www.datasette.cloud/blog/2023/datasette-litestream/"&gt;Introducing datasette-litestream: easy replication for SQLite databases in Datasette&lt;/a&gt;&lt;/strong&gt;&lt;/p&gt;
We use Litestream on Datasette Cloud for streaming backups of user data to S3. Alex Garcia extracted our implementation into a standalone Datasette plugin, which bundles the Litestream Go binary (for the relevant platform) in the package you get when you run “datasette install datasette-litestream”—so now Datasette has a very robust answer to questions about SQLite disaster recovery beyond just the Datasette Cloud platform.


    &lt;p&gt;Tags: &lt;a href="https://simonwillison.net/tags/plugins"&gt;plugins&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/sqlite"&gt;sqlite&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/datasette"&gt;datasette&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/datasette-cloud"&gt;datasette-cloud&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/litestream"&gt;litestream&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/alex-garcia"&gt;alex-garcia&lt;/a&gt;&lt;/p&gt;



</summary><category term="plugins"/><category term="sqlite"/><category term="datasette"/><category term="datasette-cloud"/><category term="litestream"/><category term="alex-garcia"/></entry><entry><title>Stringing together several free tiers to host an application with zero cost using fly.io, Litestream and Cloudflare</title><link href="https://simonwillison.net/2022/Oct/7/fly-cloudflare/#atom-tag" rel="alternate"/><published>2022-10-07T17:47:34+00:00</published><updated>2022-10-07T17:47:34+00:00</updated><id>https://simonwillison.net/2022/Oct/7/fly-cloudflare/#atom-tag</id><summary type="html">
    
&lt;p&gt;&lt;strong&gt;&lt;a href="https://blog.dahl.dev/posts/stringing-together-several-free-tiers-to-host-an-application-with-zero-cost/"&gt;Stringing together several free tiers to host an application with zero cost using fly.io, Litestream and Cloudflare&lt;/a&gt;&lt;/strong&gt;&lt;/p&gt;
Alexander Dahl provides a detailed description (and code) for his current preferred free hosting solution for small sites: SQLite (and a Go application) running on Fly’s free tier, with the database replicated up to Cloudflare’s R2 object storage (again on a free tier) by Litestream.


    &lt;p&gt;Tags: &lt;a href="https://simonwillison.net/tags/hosting"&gt;hosting&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/sqlite"&gt;sqlite&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/cloudflare"&gt;cloudflare&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/fly"&gt;fly&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/litestream"&gt;litestream&lt;/a&gt;&lt;/p&gt;



</summary><category term="hosting"/><category term="sqlite"/><category term="cloudflare"/><category term="fly"/><category term="litestream"/></entry><entry><title>Introducing LiteFS</title><link href="https://simonwillison.net/2022/Sep/21/introducing-litefs/#atom-tag" rel="alternate"/><published>2022-09-21T18:56:42+00:00</published><updated>2022-09-21T18:56:42+00:00</updated><id>https://simonwillison.net/2022/Sep/21/introducing-litefs/#atom-tag</id><summary type="html">
    
&lt;p&gt;&lt;strong&gt;&lt;a href="https://fly.io/blog/introducing-litefs/"&gt;Introducing LiteFS&lt;/a&gt;&lt;/strong&gt;&lt;/p&gt;
LiteFS is the new SQLite replication solution from Fly, now ready for beta testing. It’s from the same author as Litestream but has a very different architecture; LiteFS works by implementing a custom FUSE filesystem which spies on SQLite transactions being written to the journal file and forwards them on to other nodes in the cluster, providing full read-replication. The signature Litestream feature of streaming a backup to S3 should be coming within the next few months.

    &lt;p&gt;&lt;small&gt;Via &lt;a href="https://news.ycombinator.com/item?id=32925734"&gt;Hacker News&lt;/a&gt;&lt;/small&gt;&lt;/p&gt;


    &lt;p&gt;Tags: &lt;a href="https://simonwillison.net/tags/databases"&gt;databases&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/replication"&gt;replication&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/sqlite"&gt;sqlite&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/fly"&gt;fly&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/litestream"&gt;litestream&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/ben-johnson"&gt;ben-johnson&lt;/a&gt;&lt;/p&gt;



</summary><category term="databases"/><category term="replication"/><category term="sqlite"/><category term="fly"/><category term="litestream"/><category term="ben-johnson"/></entry><entry><title>Litestream backups for Datasette Cloud (and weeknotes)</title><link href="https://simonwillison.net/2022/Aug/11/litestream-backups/#atom-tag" rel="alternate"/><published>2022-08-11T17:19:52+00:00</published><updated>2022-08-11T17:19:52+00:00</updated><id>https://simonwillison.net/2022/Aug/11/litestream-backups/#atom-tag</id><summary type="html">
    &lt;p&gt;My main focus this week has been adding robust backups to the forthcoming Datasette Cloud.&lt;/p&gt;
&lt;p&gt;Datasette Cloud is a SaaS service for &lt;a href="https://datasette.io"&gt;Datasette&lt;/a&gt;. It allows people to create a private Datasette instance where they can upload data, visualize and transform it and share it with other members of their team. You can join the waiting list to try it out using &lt;a href="https://docs.google.com/forms/d/e/1FAIpQLSeMRYHBHXlWGDkjCqhAOinDrKEVwqNR5GfPs3iEXn8LzDT2Qg/viewform?embedded=true"&gt;this form&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;I'm building Datasette Cloud on &lt;a href="https://fly.io/"&gt;Fly&lt;/a&gt;, specifically on &lt;a href="https://fly.io/blog/fly-machines/"&gt;Fly Machines&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;Security is a big concern for Datasette Cloud. Teams should only be able to access their own data - bugs where users accidentally (or maliciously) access data for another team should be protected against as much as possible.&lt;/p&gt;
&lt;p&gt;To help guarantee that, I've designed Datasette Cloud so that each team gets their own, dedicated instance, running in a &lt;a href="https://firecracker-microvm.github.io/"&gt;Firecracker VM&lt;/a&gt; managed by Fly. Their data lives in a dedicated volume.&lt;/p&gt;
&lt;p&gt;&lt;a href="https://fly.io/docs/reference/volumes/"&gt;Fly volumes&lt;/a&gt; already implement snapshot backups, but I'm interested in defence in depth. This is where &lt;a href="https://litestream.io/"&gt;Litestream&lt;/a&gt; comes in (coincidentally now &lt;a href="https://fly.io/blog/all-in-on-sqlite-litestream/"&gt;part of Fly&lt;/a&gt;, although it wasn't when I first selected it as my backup strategy).&lt;/p&gt;
&lt;p&gt;I'm using Litestream to constantly backup the data for each Datasette Cloud team to an S3 bucket. In the case of a complete failure of a volume, I can restore data from a backup that should be at most a few seconds out of date. Litestream also gives me point-in-time backups, such that I can recover a previous version of the data within a configurable retention window.&lt;/p&gt;
&lt;h4&gt;Keeping backups isolated&lt;/h4&gt;
&lt;p&gt;Litestream &lt;a href="https://litestream.io/how-it-works/"&gt;works&lt;/a&gt; by writing a constant stream of pages from SQLite's WAL (Write-Ahead Log) up to an S3 bucket. It needs the ability to both read and write from S3.&lt;/p&gt;
&lt;p&gt;This requires making S3 credentials available within the containers that run Datasette and Litestream for each team account.&lt;/p&gt;
&lt;p&gt;Credentials in those containers are not visible to the users of the software, but I still wanted to be confident that if the credentials leaked in some way the isolation between teams would be maintained.&lt;/p&gt;
&lt;p&gt;Initially I thought about having a separate S3 bucket for each team, but it turns out AWS has a default limit of 100 buckets per account, and a hard limit of 1,000. I aspire to have more than 1,000 customers, so this limit makes a bucket-per-team seem like the wrong solution.&lt;/p&gt;
&lt;p&gt;I've learned an absolute ton about S3 and AWS permissions building my &lt;a href="https://github.com/simonw/s3-credentials"&gt;s3-credentials&lt;/a&gt; tool for creating credentials for accessing S3.&lt;/p&gt;
&lt;p&gt;One of the tricks I've learned is that it's possible to create temporary, time-limited credentials that &lt;a href="https://simonwillison.net/2022/Jan/18/weeknotes/"&gt;only work for a prefix&lt;/a&gt; (effectively a folder) within an S3 bucket.&lt;/p&gt;
&lt;p&gt;This means I can run Litestream with credentials that are specific to the team - that can read and write only from the &lt;code&gt;team-ID/&lt;/code&gt; prefix in the S3 bucket I am using to store the backups.&lt;/p&gt;
&lt;h4&gt;Obtaining temporary credentials&lt;/h4&gt;
&lt;p&gt;My &lt;code&gt;s3-credentials&lt;/code&gt; tool can create credentials for a prefix within an S3 bucket like this:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;s3-credentials create my-bucket-for-backups \
  --duration 12h \
  --prefix team-56/
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;This command uses the &lt;a href="https://boto3.amazonaws.com/v1/documentation/api/latest/reference/services/sts.html#STS.Client.assume_role"&gt;sts.assume_role()&lt;/a&gt; AWS method to create credentials that allow access to that bucket, attaching &lt;a href="https://github.com/simonw/s3-credentials/blob/0.12.1/README.md#--prefix-my-prefix"&gt;this generated JSON policy&lt;/a&gt; to it in order to restrict access to the provided prefix.&lt;/p&gt;
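&lt;p&gt;That prefix restriction can be sketched in Python like this. This is a simplified illustration rather than the exact policy &lt;code&gt;s3-credentials&lt;/code&gt; generates - see the linked README for the real document:&lt;/p&gt;

```python
import json


def prefix_policy(bucket, prefix):
    # Simplified sketch of an IAM policy restricting access to one
    # prefix - the real policy generated by s3-credentials has a few
    # more statements (see the README linked above).
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                # List only keys under the prefix
                "Effect": "Allow",
                "Action": ["s3:ListBucket"],
                "Resource": [f"arn:aws:s3:::{bucket}"],
                "Condition": {"StringLike": {"s3:prefix": [f"{prefix}*"]}},
            },
            {
                # Read and write objects under the prefix
                "Effect": "Allow",
                "Action": ["s3:GetObject", "s3:PutObject"],
                "Resource": [f"arn:aws:s3:::{bucket}/{prefix}*"],
            },
        ],
    }


# The temporary credentials then come from STS, roughly like this
# (role_arn is a pre-created role with access to the bucket):
#
#     import boto3
#     credentials = boto3.client("sts").assume_role(
#         RoleArn=role_arn,
#         RoleSessionName="team-56",
#         Policy=json.dumps(prefix_policy("my-bucket-for-backups", "team-56/")),
#         DurationSeconds=12 * 60 * 60,
#     )["Credentials"]
```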
&lt;p&gt;I extracted the relevant Python code from &lt;code&gt;s3-credentials&lt;/code&gt; and used it to create a private API endpoint in my Datasette Cloud management server which could return the temporary credentials needed by the team container.&lt;/p&gt;
&lt;p&gt;With the endpoint in place, my code for launching a team container can do this:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Create the volume and machine for that team (if they do not yet exist)&lt;/li&gt;
&lt;li&gt;Generate a signed secret token that the machine container can exchange for its S3 credentials&lt;/li&gt;
&lt;li&gt;Launch the machine container, passing it the secret token&lt;/li&gt;
&lt;li&gt;On launch, the container runs a script which exchanges that secret token for its 12 hour S3 credentials, using the private API endpoint I created&lt;/li&gt;
&lt;li&gt;Those credentials are used to populate the &lt;code&gt;AWS_ACCESS_KEY_ID&lt;/code&gt;, &lt;code&gt;AWS_SECRET_ACCESS_KEY&lt;/code&gt; and &lt;code&gt;AWS_SESSION_TOKEN&lt;/code&gt; environment variables used by Litestream&lt;/li&gt;
&lt;li&gt;Start Litestream, which then starts Datasette&lt;/li&gt;
&lt;/ul&gt;
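&lt;p&gt;The exchange step can be sketched like this. The endpoint URL and the JSON shape of the response are hypothetical - the real Datasette Cloud management API is private:&lt;/p&gt;

```python
import json
import os
import subprocess
import urllib.request


def fetch_credentials(endpoint, secret_token):
    # Exchange the long-lived signed token for 12 hour S3 credentials
    request = urllib.request.Request(
        endpoint, headers={"Authorization": "Bearer " + secret_token}
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)


def litestream_env(credentials):
    # Map STS-style credentials onto the environment variables that
    # Litestream reads
    return {
        "AWS_ACCESS_KEY_ID": credentials["AccessKeyId"],
        "AWS_SECRET_ACCESS_KEY": credentials["SecretAccessKey"],
        "AWS_SESSION_TOKEN": credentials["SessionToken"],
    }


def main():
    credentials = fetch_credentials(
        "https://management.internal/credentials",  # hypothetical endpoint
        os.environ["SECRET_TOKEN"],
    )
    env = dict(os.environ, **litestream_env(credentials))
    # litestream replicate then starts Datasette via its exec directive
    subprocess.run(["litestream", "replicate"], env=env, check=True)
```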
&lt;h4&gt;Restarting every 12 hours&lt;/h4&gt;
&lt;p&gt;You may be wondering why I bothered with that initial secret token - why not just pass the temporary AWS credentials to the container when I launch it?&lt;/p&gt;
&lt;p&gt;The reason for this is that I need to be able to obtain fresh credentials every 12 hours.&lt;/p&gt;
&lt;p&gt;A really neat feature of Fly Machines is that they support scale-to-zero. You can stop them, and Fly will automatically restart them the next time they receive traffic.&lt;/p&gt;
&lt;p&gt;All you need to do is call &lt;code&gt;sys.exit(0)&lt;/code&gt; in your Python code (or the equivalent in any other language) and Fly will stop your container... and then restart it again with a couple of seconds of cold start time the next time an HTTP request for your container hits the Fly router.&lt;/p&gt;
&lt;p&gt;So far I'm mainly using this to avoid the cost of running containers when they aren't actually in use. But there's a neat benefit when it comes to Litestream too.&lt;/p&gt;
&lt;p&gt;I'm using S3 credentials which expire after 12 hours. This means I need to periodically refresh the credentials and restart Litestream or it will stop being able to write to the S3 bucket.&lt;/p&gt;
&lt;p&gt;After considering a few ways of doing this, I selected the simplest to implement: have Datasette call &lt;code&gt;sys.exit(0)&lt;/code&gt; after ten hours, and let Fly restart the container causing my startup script to fetch freshly generated 12 hour credentials and pass them to Litestream.&lt;/p&gt;
&lt;p&gt;I implemented this by adding it as a new setting to my existing &lt;a href="https://datasette.io/plugins/datasette-scale-to-zero"&gt;datasette-scale-to-zero&lt;/a&gt; plugin. You can now configure that with &lt;code&gt;"max-age": "10h"&lt;/code&gt; and it will shut down Datasette once the server has been running for that long.&lt;/p&gt;
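&lt;p&gt;The mechanics behind that setting can be sketched like this - a simplified illustration of the idea, not the plugin's actual implementation:&lt;/p&gt;

```python
import re
import time


def parse_duration(value):
    # "10h" becomes 36000 seconds; supports s/m/h units
    match = re.fullmatch(r"(\d+)([smh])", value)
    if match is None:
        raise ValueError("Expected a duration like 10h, got: " + value)
    return int(match.group(1)) * {"s": 1, "m": 60, "h": 3600}[match.group(2)]


def should_exit(max_age, started):
    # Checked periodically while the server runs: once it has been up
    # for longer than max-age, call sys.exit(0) - Fly then restarts
    # the container on the next incoming request, and the startup
    # script fetches fresh 12 hour credentials for Litestream.
    return time.monotonic() - started > parse_duration(max_age)
```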
&lt;p&gt;Why does this require my own secret token system? Because when the container is restarted, it needs to make an authenticated call to my endpoint to retrieve those fresh S3 credentials. Fly persists environment variable secrets to the container between restarts, so that secret can be long-lived even though it is only ever exchanged for short-term S3 credentials.&lt;/p&gt;
&lt;p&gt;I only just put the new backup system in place, so I'm exercising it a bit before I open things up to trial users - but so far it's looking like a very robust solution to the problem.&lt;/p&gt;
&lt;h4&gt;s3-ocr improvements&lt;/h4&gt;
&lt;p&gt;I released a &lt;a href="https://datasette.io/tools/s3-ocr"&gt;few new versions&lt;/a&gt; of &lt;a href="https://simonwillison.net/2022/Jun/30/s3-ocr/"&gt;s3-ocr&lt;/a&gt; this week, as part of my ongoing project working with the San Francisco Microscopical Society team to release a searchable version of their scanned document archives.&lt;/p&gt;
&lt;p&gt;The two main improvements are:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;A new &lt;code&gt;--dry-run&lt;/code&gt; option to &lt;code&gt;s3-ocr start&lt;/code&gt; which shows you what the tool will do without making any changes to your S3 bucket, or triggering any OCR jobs. &lt;a href="https://github.com/simonw/s3-ocr/issues/22"&gt;#22&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;s3-ocr start&lt;/code&gt; used to fail with an error if running it would create more than 100 (or 600 depending on your region) concurrent OCR jobs. The tool now knows how to identify that error and pause and retry starting the jobs instead. &lt;a href="https://github.com/simonw/s3-ocr/issues/21"&gt;#21&lt;/a&gt;
&lt;/li&gt;
&lt;/ul&gt;
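&lt;p&gt;The pause-and-retry logic from that second fix can be sketched like this - a simplified illustration using a stand-in exception class rather than the real botocore error:&lt;/p&gt;

```python
import time


class LimitExceededError(Exception):
    # Stand-in for the LimitExceededException that Textract raises
    # when too many concurrent OCR jobs are running
    pass


def start_with_retry(start_job, attempts=20, wait_seconds=10, sleep=time.sleep):
    # Keep retrying while the concurrent-job limit is hit; any other
    # error propagates immediately
    for attempt in range(attempts):
        try:
            return start_job()
        except LimitExceededError:
            if attempt == attempts - 1:
                raise
            sleep(wait_seconds)
```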
&lt;p&gt;The fix that took the most time is this: installations of the tool no longer arbitrarily fail to work depending on the environment you install them into!&lt;/p&gt;
&lt;p&gt;Solving this took me the best part of a day. The short version is this: Click 8.1.0 introduced a new feature that lets you use &lt;code&gt;@cli.command&lt;/code&gt; as a decorator instead of &lt;code&gt;@cli.command()&lt;/code&gt;. This meant that installing &lt;code&gt;s3-ocr&lt;/code&gt; in an environment that already had a previous version of Click would result in silent errors.&lt;/p&gt;
&lt;p&gt;The solution is simple: pin to &lt;code&gt;click&amp;gt;=8.1.0&lt;/code&gt; in the project dependencies if you plan to use this new syntax.&lt;/p&gt;
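&lt;p&gt;For a setuptools-based project that pin looks something like this (an illustrative excerpt, not the actual &lt;code&gt;s3-ocr&lt;/code&gt; packaging):&lt;/p&gt;

```python
# setup.py excerpt - require the Click release that introduced the
# argument-less @cli.command decorator syntax
from setuptools import setup

setup(
    name="s3-ocr",
    install_requires=["click>=8.1.0"],
)
```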
&lt;p&gt;If I'd read the Click &lt;a href="https://click.palletsprojects.com/en/8.1.x/changes/#version-8-1-0"&gt;changelog more closely&lt;/a&gt; I would have saved myself a whole lot of time.&lt;/p&gt;
&lt;p&gt;Issues &lt;a href="https://github.com/simonw/s3-ocr/issues/25"&gt;#25&lt;/a&gt; and &lt;a href="https://github.com/simonw/s3-ocr/issues/26"&gt;#26&lt;/a&gt; detail the many false turns I took trying to figure this out.&lt;/p&gt;
&lt;h4&gt;More fun with GPT-3 and DALL-E&lt;/h4&gt;
&lt;p&gt;This tweet scored over a million impressions on Twitter:&lt;/p&gt;

&lt;blockquote class="twitter-tweet"&gt;&lt;p lang="en" dir="ltr"&gt;New hobby: prototyping video games in 60 seconds using a combination of GPT-3 and DALL-E&lt;br /&gt;&lt;br /&gt;Here&amp;#39;s &amp;quot;Raccoon Heist&amp;quot; &lt;a href="https://t.co/xQ3Vm8p2XW"&gt;pic.twitter.com/xQ3Vm8p2XW&lt;/a&gt;&lt;/p&gt;- Simon Willison (@simonw) &lt;a href="https://twitter.com/simonw/status/1555626060384911360?ref_src=twsrc%5Etfw"&gt;August 5, 2022&lt;/a&gt;&lt;/blockquote&gt;

&lt;p&gt;As this got retweeted outside of my usual circles it started confusing people who thought the "prototype" was a working game, as opposed to a fake screenshot and a paragraph of descriptive text! I wasn't kidding when I said I spent 60 seconds on this.&lt;/p&gt;
&lt;p&gt;I also figured out how to use GPT-3 to write &lt;code&gt;jq&lt;/code&gt; one-liners. I love &lt;code&gt;jq&lt;/code&gt; but I have to look up how to use it every time, so having GPT-3 do the work for me is a pretty neat time saver. More on that in this TIL: &lt;a href="https://til.simonwillison.net/gpt3/jq"&gt;Using GPT-3 to figure out jq recipes&lt;/a&gt;&lt;/p&gt;
&lt;h4&gt;Releases this week&lt;/h4&gt;
&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;&lt;a href="https://github.com/simonw/s3-ocr"&gt;s3-ocr&lt;/a&gt;&lt;/strong&gt;: &lt;a href="https://github.com/simonw/s3-ocr/releases/tag/0.6.3"&gt;0.6.3&lt;/a&gt; - (&lt;a href="https://github.com/simonw/s3-ocr/releases"&gt;9 releases total&lt;/a&gt;) - 2022-08-10
&lt;br /&gt;Tools for running OCR against files stored in S3&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;&lt;a href="https://github.com/simonw/datasette-scale-to-zero"&gt;datasette-scale-to-zero&lt;/a&gt;&lt;/strong&gt;: &lt;a href="https://github.com/simonw/datasette-scale-to-zero/releases/tag/0.2"&gt;0.2&lt;/a&gt; - (&lt;a href="https://github.com/simonw/datasette-scale-to-zero/releases"&gt;4 releases total&lt;/a&gt;) - 2022-08-05
&lt;br /&gt;Quit Datasette if it has not received traffic for a specified time period&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;&lt;a href="https://github.com/simonw/shot-scraper"&gt;shot-scraper&lt;/a&gt;&lt;/strong&gt;: &lt;a href="https://github.com/simonw/shot-scraper/releases/tag/0.14.3"&gt;0.14.3&lt;/a&gt; - (&lt;a href="https://github.com/simonw/shot-scraper/releases"&gt;18 releases total&lt;/a&gt;) - 2022-08-02
&lt;br /&gt;A command-line utility for taking automated screenshots of websites&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;&lt;a href="https://github.com/simonw/s3-credentials"&gt;s3-credentials&lt;/a&gt;&lt;/strong&gt;: &lt;a href="https://github.com/simonw/s3-credentials/releases/tag/0.12.1"&gt;0.12.1&lt;/a&gt; - (&lt;a href="https://github.com/simonw/s3-credentials/releases"&gt;13 releases total&lt;/a&gt;) - 2022-08-01
&lt;br /&gt;A tool for creating credentials for accessing S3 buckets&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;&lt;a href="https://github.com/simonw/datasette-sqlite-fts4"&gt;datasette-sqlite-fts4&lt;/a&gt;&lt;/strong&gt;: &lt;a href="https://github.com/simonw/datasette-sqlite-fts4/releases/tag/0.3.2"&gt;0.3.2&lt;/a&gt; - (&lt;a href="https://github.com/simonw/datasette-sqlite-fts4/releases"&gt;2 releases total&lt;/a&gt;) - 2022-07-31&lt;/li&gt;
&lt;/ul&gt;
&lt;h4&gt;TIL this week&lt;/h4&gt;
&lt;ul&gt;
&lt;li&gt;&lt;a href="https://til.simonwillison.net/sqlite/related-content"&gt;Related content with SQLite FTS and a Datasette template function&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://til.simonwillison.net/aws/boto-command-line"&gt;Using boto3 from the command line&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://til.simonwillison.net/sqlite/trying-macos-extensions"&gt;Trying out SQLite extensions on macOS&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://til.simonwillison.net/pytest/mocking-boto"&gt;Mocking a Textract LimitExceededException with boto&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://til.simonwillison.net/gpt3/jq"&gt;Using GPT-3 to figure out jq recipes&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
    
        &lt;p&gt;Tags: &lt;a href="https://simonwillison.net/tags/ocr"&gt;ocr&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/s3"&gt;s3&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/datasette"&gt;datasette&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/weeknotes"&gt;weeknotes&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/datasette-cloud"&gt;datasette-cloud&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/fly"&gt;fly&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/litestream"&gt;litestream&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/gpt-3"&gt;gpt-3&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/dalle"&gt;dalle&lt;/a&gt;&lt;/p&gt;
    

</summary><category term="ocr"/><category term="s3"/><category term="datasette"/><category term="weeknotes"/><category term="datasette-cloud"/><category term="fly"/><category term="litestream"/><category term="gpt-3"/><category term="dalle"/></entry><entry><title>Litestream: Live Read Replication</title><link href="https://simonwillison.net/2022/Apr/13/litestream-live-read-replication/#atom-tag" rel="alternate"/><published>2022-04-13T02:04:57+00:00</published><updated>2022-04-13T02:04:57+00:00</updated><id>https://simonwillison.net/2022/Apr/13/litestream-live-read-replication/#atom-tag</id><summary type="html">
    
&lt;p&gt;&lt;strong&gt;&lt;a href="https://tip.litestream.io/guides/read-replica/"&gt;Litestream: Live Read Replication&lt;/a&gt;&lt;/strong&gt;&lt;/p&gt;
The documentation for the read replication implemented in the latest Litestream beta (v0.4.0-beta.2). The design is really simple and clever: the primary runs a web server on a port, and replica instances can then be started with a configured URL pointing to the IP and port of the primary. That’s all it takes to have a SQLite database replicated to multiple hosts, each of which can then conduct read queries against their local copies.
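&lt;p&gt;Based on that guide, the two configuration files look something like this. The key names are taken from the beta documentation, so treat this as an illustrative sketch that may change before a stable release:&lt;/p&gt;

```yaml
# Primary's litestream.yml - serve replication to followers on a port
addr: ":9090"
dbs:
  - path: /data/db.sqlite
---
# Replica's litestream.yml (separate host) - pull from the primary
dbs:
  - path: /data/db.sqlite
    upstream:
      url: "http://primary-host:9090"
```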

    &lt;p&gt;&lt;small&gt;Via &lt;a href="https://twitter.com/simonw/status/1513890372920950791"&gt;@simonw&lt;/a&gt;&lt;/small&gt;&lt;/p&gt;


    &lt;p&gt;Tags: &lt;a href="https://simonwillison.net/tags/replication"&gt;replication&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/sqlite"&gt;sqlite&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/litestream"&gt;litestream&lt;/a&gt;&lt;/p&gt;



</summary><category term="replication"/><category term="sqlite"/><category term="litestream"/></entry><entry><title>SQLite Happy Hour - a Twitter Spaces conversation about three interesting projects building on SQLite</title><link href="https://simonwillison.net/2022/Mar/23/sqlite-happy-hour/#atom-tag" rel="alternate"/><published>2022-03-23T22:01:07+00:00</published><updated>2022-03-23T22:01:07+00:00</updated><id>https://simonwillison.net/2022/Mar/23/sqlite-happy-hour/#atom-tag</id><summary type="html">
    &lt;p&gt;Yesterday I hosted SQLite Happy Hour, my first conversation using Twitter Spaces. The idea was to dig into three different projects that were doing interesting things on top of SQLite. I think it worked pretty well, and I'm curious to explore this format more in the future.&lt;/p&gt;
&lt;p&gt;Here's &lt;a href="https://twitter.com/simonw/status/1504604448202518529"&gt;the tweet&lt;/a&gt; that initially promoted the space:&lt;/p&gt;

&lt;blockquote class="twitter-tweet"&gt;&lt;p lang="en" dir="ltr"&gt;Interested in devious tricks to push the envelope of what you can do with SQLite?&lt;br /&gt;&lt;br /&gt;Join myself, &lt;a href="https://twitter.com/benbjohnson?ref_src=twsrc%5Etfw"&gt;@benbjohnson&lt;/a&gt;, &lt;a href="https://twitter.com/geoffreylitt?ref_src=twsrc%5Etfw"&gt;@geoffreylitt&lt;/a&gt; and &lt;a href="https://twitter.com/nschiefer?ref_src=twsrc%5Etfw"&gt;@nschiefer&lt;/a&gt; on Tuesday for a Twitter Spaces conversation about &lt;a href="https://twitter.com/litestream?ref_src=twsrc%5Etfw"&gt;@litestream&lt;/a&gt;, &lt;a href="https://twitter.com/datasetteproj?ref_src=twsrc%5Etfw"&gt;@datasetteproj&lt;/a&gt; and Riffle!&lt;a href="https://t.co/ukRMVgC09u"&gt;https://t.co/ukRMVgC09u&lt;/a&gt;&lt;/p&gt;- Simon Willison (@simonw) &lt;a href="https://twitter.com/simonw/status/1504604448202518529?ref_src=twsrc%5Etfw"&gt;March 17, 2022&lt;/a&gt;&lt;/blockquote&gt;

&lt;p&gt;My co-hosts, representing the three projects, were:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Ben Johnson &lt;a href="https://twitter.com/benbjohnson"&gt;@benbjohnson&lt;/a&gt;, creator of &lt;a href="https://litestream.io/"&gt;Litestream&lt;/a&gt; - a tool that adds replication to SQLite built on top of the WAL mechanism.&lt;/li&gt;
&lt;li&gt;Geoffrey Litt &lt;a href="https://twitter.com/geoffreylitt"&gt;@geoffreylitt&lt;/a&gt; and Nicholas Schiefer &lt;a href="https://twitter.com/nschiefer"&gt;@nschiefer&lt;/a&gt; who are working on Riffle, a project exploring the idea of driving reactive user interfaces using SQL queries - see &lt;a href="https://riffle.systems/essays/prelude/"&gt;Building data-centric apps with a reactive relational database&lt;/a&gt;.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Twitter Spaces recorded audio only lasts for 30 days, so I've exported the audio and shared it &lt;a href="https://soundcloud.com/simon-willison/sqlite-happy-hour-22nd-march-2022"&gt;on SoundCloud&lt;/a&gt; as well.&lt;/p&gt;

&lt;iframe width="100%" height="166" scrolling="no" frameborder="no" allow="autoplay" src="https://w.soundcloud.com/player/?url=https%3A//api.soundcloud.com/tracks/1237563277&amp;amp;color=%23ff5500&amp;amp;auto_play=false&amp;amp;hide_related=false&amp;amp;show_comments=true&amp;amp;show_user=true&amp;amp;show_reposts=false&amp;amp;show_teaser=true"&gt; &lt;/iframe&gt;&lt;div style="font-size: 10px; color: #cccccc;line-break: anywhere;word-break: normal;overflow: hidden;white-space: nowrap;text-overflow: ellipsis; font-family: Interstate,Lucida Grande,Lucida Sans Unicode,Lucida Sans,Garuda,Verdana,Tahoma,sans-serif;font-weight: 100;"&gt;&lt;a href="https://soundcloud.com/simon-willison" title="Simon Willison" target="_blank" style="color: #cccccc; text-decoration: none;"&gt;Simon Willison&lt;/a&gt; · &lt;a href="https://soundcloud.com/simon-willison/sqlite-happy-hour-22nd-march-2022" title="SQLite Happy Hour - 22nd March 2022" target="_blank" style="color: #cccccc; text-decoration: none;"&gt;SQLite Happy Hour - 22nd March 2022&lt;/a&gt;&lt;/div&gt;

&lt;h4&gt;Collaborative notes from the session&lt;/h4&gt;
&lt;p&gt;Something I've observed in Twitter Spaces I've joined in the past is that they can really benefit from a dedicated back-channel, to share links and allow audience participation without people needing to first request to speak.&lt;/p&gt;
&lt;p&gt;A trick I've used with online talks I've given in the past is to start a collaborative Google Doc to collect shared notes and questions. I tried this for the Twitter Space, and it worked really well!&lt;/p&gt;
&lt;p&gt;You can &lt;a href="https://docs.google.com/document/d/1ykZdd-Q_PF21N239T7EolOFSM-dHJUjjM2NVP5K26oQ/edit#"&gt;see that document here&lt;/a&gt;. During the session the document was open for anyone to edit - I've locked it down now that the session has ended.&lt;/p&gt;
&lt;p&gt;I've duplicated the final form of the document at the bottom of this post.&lt;/p&gt;
&lt;p&gt;Something I really like about this format is that it allows for additional material to be posted later. I spent some time adding more detailed answers to the questions about Datasette after the session had ended.&lt;/p&gt;
&lt;h4&gt;Thoughts for if I do this again&lt;/h4&gt;
&lt;p&gt;This was my first time hosting a space, and I learned a lot along the way.&lt;/p&gt;
&lt;p&gt;Firstly, this kind of thing works the best when there is a back and forth between the participants.&lt;/p&gt;
&lt;p&gt;My original idea was to have each project talk for ten minutes, then spend five minutes on discussion between the panel before moving on to the next project - and 15 minutes of open discussion at the end.&lt;/p&gt;
&lt;p&gt;My co-hosts suggested we try to make it more conversational, interrupting each other as we went along. We did that, and it worked much better: these sessions are far more interesting as a conversation than as a monologue.&lt;/p&gt;
&lt;p&gt;I still don't have a great feel for when to interrupt people in an audio-only conversation, since unlike an in-person panel there are no visual cues to go on!&lt;/p&gt;
&lt;p&gt;Technology: it turns out Twitter Spaces has wildly different functionality on web vs. mobile apps. We spent the first five minutes making sure all of our speakers could talk! We really should have done a tech rehearsal first, but I wasn't sure how to do that without accidentally broadcasting it to the world - maybe set up burner Twitter accounts for testing?&lt;/p&gt;
&lt;p&gt;Presenting audio-only is itself a challenge: I'm used to leaning on visual demos when I explain what Datasette is in a talk, and not having those to fall back on was challenging. I had jotted down notes on the main points I wanted to hit which certainly helped, but I think there's a whole new presenting skill here that I need to work harder to develop.&lt;/p&gt;
&lt;p&gt;Exporting the recorded audio from Twitter was frustrating but possible. I wrote some notes on how I did that &lt;a href="https://til.simonwillison.net/twitter/export-edit-twitter-spaces"&gt;in this TIL&lt;/a&gt;.&lt;/p&gt;
&lt;h4&gt;Our collaborative notes in full&lt;/h4&gt;
&lt;p&gt;&lt;strong&gt;SQLite Happy Hour Twitter Space&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;22nd March 2022 - &lt;a href="https://www.worldtimebuddy.com/?qm=1&amp;amp;lid=5128581,5391959,5419384,2643743&amp;amp;h=5128581&amp;amp;date=2022-3-22&amp;amp;sln=15.5-16.5&amp;amp;hf=1"&gt;12:30pm PT / 1:30pm MT / 3:30pm ET&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;Welcome to the SQLite Happy Hour! This hour-long session will feature three projects that are doing interesting things with SQLite. Each project will provide a ten minute overview, followed by five minutes of discussion from the panel. The last 15 minutes of the hour will be an open discussion and general Q&amp;amp;A.&lt;/p&gt;
&lt;p&gt;This document is open for anyone to edit. Please feel free to drop notes and questions in as we go along.&lt;/p&gt;
&lt;p&gt;The recording of the space is available here: &lt;a href="https://twitter.com/i/spaces/1ypKdEXvkMLGW"&gt;https://twitter.com/i/spaces/1ypKdEXvkMLGW&lt;/a&gt;&lt;/p&gt;
&lt;h5&gt;Riffle&lt;/h5&gt;
&lt;p&gt;Geoffrey Litt &lt;a href="https://twitter.com/geoffreylitt"&gt;@geoffreylitt&lt;/a&gt;, Nicholas Schiefer &lt;a href="https://twitter.com/nschiefer"&gt;@nschiefer&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;Riffle asks: what if you wrote your whole UI as a query over a local database? So far, we've built a prototype using SQLite and React. More background in this paper:&lt;/p&gt;
&lt;p&gt;&lt;a href="https://riffle.systems/essays/prelude/"&gt;Building data-centric apps with a reactive relational database&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;Research project goal is to make development simpler, as opposed to the ongoing trend of more complexity.&lt;/p&gt;
&lt;p&gt;Riffle looks at having a database-centric mechanism at the heart of the view. Declarative queries could make apps easier to understand and debug.&lt;/p&gt;
&lt;p&gt;SQLite is the tool used for the prototype.&lt;/p&gt;
&lt;p&gt;Local-first architecture: Ink &amp;amp; Switch have been promoting this. Return to a world where your local client device serves as the source of truth - you can access data offline etc - and when the network is available your data gets synced to the cloud.&lt;/p&gt;
&lt;p&gt;The prototype: a reactive layer that uses SQLite as a state management backend for React, using &lt;a href="https://sql.js.org/#/"&gt;https://sql.js.org/&lt;/a&gt; which compiles SQLite in WASM. Also built prototypes of desktop apps using &lt;a href="https://github.com/tauri-apps/tauri"&gt;https://github.com/tauri-apps/tauri&lt;/a&gt; - like Electron but using the system web browser instead of bundling its own.&lt;/p&gt;
&lt;p&gt;Since they control the writes, they can re-execute every query after any writes happen. SQLite is so fast that this works fine, queries all take under a ms and even with a thousand queries you can still run them all.&lt;/p&gt;
&lt;p&gt;ALL UI state is in the database - there's no local React component state - literally everything is in the database. This means all UI state is persistent by default.&lt;/p&gt;
&lt;p&gt;IndexedDB is used for the in-browser persistence. The Tauri desktop app stores to a file on disk. Maybe SQL.js could do that with the new Chrome filesystem API stuff too?&lt;/p&gt;
&lt;p&gt;Questions about Riffle:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;
&lt;p&gt;Will Riffle target vanilla JS, or Node.js?&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;It's running client-side, so vanilla JS&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;From Stephen: What about browser-native UI state like scroll position, URL path, query string, multiple independent browser tabs, etc?&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Great question. We do some syncing of browser-native state to put it in the DB: eg, to support virtualized list rendering we update scroll state in the DB with an event handler. But there's definitely some browser state that isn't being captured reliably. In the purest world, the pixels on your screen would be produced by a DB query :)&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;From Predrag Gruevski: Would "query the queries" be a viable approach for narrowing the set of queries that need to be re-executed after a given write? Simple example: if table X gets modified, query for all queries that have table X in a FROM clause, then re-execute them.&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;yeah, that's roughly the direction we're headed. It's a little trickier than that if you start having subqueries / materialized view, but good general idea&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;From Longwei Su: Right now, each db update will cause a whole refresh. Is there any plan to refine the binding? So that any db update will only trigger UI component that "subscribe" to this section of the data. Sqlite have trigger, which can have callback on record update. How to construct that  "publisher"-&amp;gt; "subscriber" mapping from sql query?&lt;/p&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Comments for Riffle:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;
&lt;p&gt;From Jesse - &lt;a href="http://web.dev/file-system-access/"&gt;http://web.dev/file-system-access/&lt;/a&gt; isn't a very rich api - I think you could persist to it, but I don't think you can seek/update/.../all the posix stuff sqlite probably needs&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Hasura &lt;a href="https://github.com/hasura/graphql-engine/blob/master/architecture/live-queries.md"&gt;documented&lt;/a&gt; how they do reactive queries with Postgres, might be useful for minimising refetch overhead?&lt;/p&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;h5&gt;Datasette&lt;/h5&gt;
&lt;p&gt;Simon Willison &lt;a href="https://twitter.com/simonw"&gt;@simonw&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;&lt;a href="https://datasette.io/"&gt;Datasette&lt;/a&gt; is an open source multi-tool for exploring and publishing data. It explores SQLite as a read-only mechanism for publishing structured data online in as flexible a manner as possible, and aims to build an ecosystem of plugins that can handle a wide range of exploratory data analysis challenges.&lt;/p&gt;
&lt;p&gt;Video introduction here: &lt;a href="https://simonwillison.net/2021/Feb/7/video/"&gt;https://simonwillison.net/2021/Feb/7/video/&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;Questions about Datasette:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;
&lt;p&gt;How does it compares with &lt;a href="https://github.com/dinedal/textql"&gt;https://github.com/dinedal/textql&lt;/a&gt;, it seems the same but instead of sqlite binaries, just raw csv files which are more ubiquitous, and easier to view and edit with with office software (msf excel, libreoffice calc) ?&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;sqlite-utils memory provides similar functionality: &lt;a href="https://simonwillison.net/2021/Jun/19/sqlite-utils-memory/"&gt;https://simonwillison.net/2021/Jun/19/sqlite-utils-memory/&lt;/a&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Does Datasette need to worry about SQLite's &lt;a href="https://www.sqlite.org/security.html"&gt;Defense Against the Dark Arts&lt;/a&gt; security guidelines?&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Yes, absolutely! I've put a lot of work in there. Most importantly, Datasette enforces a time limit on queries, which cuts them off if they take more than a second.&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;The SQLite3 docs are sometimes light on examples for the tricky stuff (e.g., enabling WAL). What's your best sort of info beyond the official docs?&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;I've been publishing my own notes here: &lt;a href="https://til.simonwillison.net/sqlite"&gt;https://til.simonwillison.net/sqlite&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;The SQLite Forum is amazing - I ask questions on there and often get a reply from the maintainers within a few hours: &lt;a href="https://sqlite.org/forum/forummain"&gt;https://sqlite.org/forum/forummain&lt;/a&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;From Predrag Gruevski: Regarding learning curve, is a GraphQL web IDE (with syntax highlighting / autocomplete etc.) sufficiently user-friendly for folks more comfortable with a spreadsheet than a CLI tool or SQL?&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Probably not! GraphQL requires thinking like a programmer too. I'm interested in helping people who aren't yet ready to learn any kind of programming language&lt;/li&gt;
&lt;li&gt;I have a plugin for Datasette that adds GraphQL with the GraphiQL user interface - demo here: &lt;a href="https://datasette-graphql-demo.datasette.io/graphql/github?query=%7B%0A%20%20issue_comments%20%7B%0A%20%20%20%20totalCount%0A%20%20%20%20pageInfo%20%7B%0A%20%20%20%20%20%20hasNextPage%0A%20%20%20%20%20%20endCursor%0A%20%20%20%20%7D%0A%20%20%20%20nodes%20%7B%0A%20%20%20%20%20%20html_url%0A%20%20%20%20%20%20issue_url%0A%20%20%20%20%20%20id%0A%20%20%20%20%20%20node_id%0A%20%20%20%20%20%20created_at%0A%20%20%20%20%20%20updated_at%0A%20%20%20%20%20%20author_association%0A%20%20%20%20%20%20body%0A%20%20%20%20%20%20reactions%0A%20%20%20%20%20%20performed_via_github_app%0A%20%20%20%20%20%20user%20%7B%0A%20%20%20%20%20%20%20%20id%0A%20%20%20%20%20%20%20%20name%0A%20%20%20%20%20%20%7D%0A%20%20%20%20%20%20issue%20%7B%0A%20%20%20%20%20%20%20%20id%0A%20%20%20%20%20%20%20%20title%0A%20%20%20%20%20%20%7D%0A%20%20%20%20%7D%0A%20%20%7D%0A%7D"&gt;datasette-graphql-demo.datasette.io&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;Thanks! Would love to compare notes on this -- my experience from working with analysts at my employer was that they were able to master GraphiQL very quickly. In a sense, it was more intimidating than actually difficult, so working with them directly to get them over the initial difficulty hump via examples and targeted exercises made a huge positive impact.&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;h5&gt;Litestream&lt;/h5&gt;
&lt;p&gt;Ben Johnson &lt;a href="https://twitter.com/benbjohnson"&gt;@benbjohnson&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;&lt;a href="https://litestream.io/"&gt;Litestream&lt;/a&gt; adds replication to SQLite, allowing databases to be cheaply replicated to storage systems such as S3. Litestream also now implements live read-replication, where many read replicas can be run against a single leader database.&lt;/p&gt;
&lt;p&gt;&lt;a href="https://www.sqlite.org/np1queryprob.html"&gt;https://www.sqlite.org/np1queryprob.html&lt;/a&gt; - Many Small Queries Are Efficient in SQLite&lt;/p&gt;
&lt;p&gt;Questions about Litestream:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;
&lt;p&gt;What does the planned hot standby feature look like, especially regarding durability guarantees during fail-over?&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;BJ: Hot standby is a tough issue to generalize. The database-as-a-service version of Litestream that's coming will handle this, but it's not necessarily planned for Litestream itself&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Will DBaaS be hosted, OSS, or both?&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;It'll be both&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;From Longwei Su: I assume offline updates are committed locally and then synced to the online storage. If an offline commit conflicts with a version that has already been committed online, how is the conflict resolved?&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Not sure if this relates to Litestream, but: how big is sql.js? How much does it cost (in kilobytes) to load SQLite in the browser?&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;BJ: I think sql.js is 1.2MB, so the cost depends on how much your provider charges for bandwidth&lt;/li&gt;
&lt;li&gt;Thanks! Meant "cost" in the sense of bytes transferred over the wire --- this answers it :)&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;h5&gt;GraphQL&lt;/h5&gt;
&lt;ul&gt;
&lt;li&gt;
&lt;a href="https://github.com/simonw/help-scraper"&gt;https://github.com/simonw/help-scraper&lt;/a&gt; is scraping GraphQL schemas&lt;/li&gt;
&lt;/ul&gt;
    
        &lt;p&gt;Tags: &lt;a href="https://simonwillison.net/tags/sqlite"&gt;sqlite&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/my-talks"&gt;my-talks&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/twitter"&gt;twitter&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/datasette"&gt;datasette&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/litestream"&gt;litestream&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/ben-johnson"&gt;ben-johnson&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/geoffrey-litt"&gt;geoffrey-litt&lt;/a&gt;&lt;/p&gt;
    

</summary><category term="sqlite"/><category term="my-talks"/><category term="twitter"/><category term="datasette"/><category term="litestream"/><category term="ben-johnson"/><category term="geoffrey-litt"/></entry><entry><title>logpaste</title><link href="https://simonwillison.net/2021/Mar/17/logpaste/#atom-tag" rel="alternate"/><published>2021-03-17T15:48:45+00:00</published><updated>2021-03-17T15:48:45+00:00</updated><id>https://simonwillison.net/2021/Mar/17/logpaste/#atom-tag</id><summary type="html">
    
&lt;p&gt;&lt;strong&gt;&lt;a href="https://github.com/mtlynch/logpaste/tree/fe53779aef25cdc9dfaabd9404ff801951730ddb"&gt;logpaste&lt;/a&gt;&lt;/strong&gt;&lt;/p&gt;
Useful example of how to use the Litestream SQLite replication tool in a Dockerized application: S3 credentials are passed to the container on startup; the container then attempts to restore the SQLite database from S3 and runs a Litestream process alongside the application to continuously synchronize changes back up to the S3 bucket.
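A sketch of that restore-then-replicate entrypoint pattern; the bucket name, database path, and server binary below are placeholders, not logpaste's actual values:

```shell
#!/bin/sh
# Hypothetical container entrypoint illustrating the pattern above.
set -e

# Restore the most recent snapshot from S3, if one exists.
litestream restore -if-replica-exists -o /data/store.db \
    "s3://example-bucket/store.db"

# Stream subsequent changes back up to the bucket in the background...
litestream replicate /data/store.db "s3://example-bucket/store.db" &

# ...while the application runs in the foreground.
exec /app/server -db /data/store.db
```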

    &lt;p&gt;&lt;small&gt;Via &lt;a href="https://twitter.com/deliberatecoder/status/1371993772549103621"&gt;@deliberatecoder&lt;/a&gt;&lt;/small&gt;&lt;/p&gt;


    &lt;p&gt;Tags: &lt;a href="https://simonwillison.net/tags/replication"&gt;replication&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/s3"&gt;s3&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/sqlite"&gt;sqlite&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/docker"&gt;docker&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/litestream"&gt;litestream&lt;/a&gt;&lt;/p&gt;



</summary><category term="replication"/><category term="s3"/><category term="sqlite"/><category term="docker"/><category term="litestream"/></entry><entry><title>Quoting Ben Johnson</title><link href="https://simonwillison.net/2021/Feb/11/ben-johnson/#atom-tag" rel="alternate"/><published>2021-02-11T20:50:21+00:00</published><updated>2021-02-11T20:50:21+00:00</updated><id>https://simonwillison.net/2021/Feb/11/ben-johnson/#atom-tag</id><summary type="html">
    &lt;blockquote cite="https://news.ycombinator.com/item?id=26106043"&gt;&lt;p&gt;Litestream runs continuously on a test server with generated load and streams backups to S3. It uses physical replication so it'll actually restore the data from S3 periodically and compare the checksum byte-for-byte with the current database.&lt;/p&gt;&lt;/blockquote&gt;
&lt;p class="cite"&gt;&amp;mdash; &lt;a href="https://news.ycombinator.com/item?id=26106043"&gt;Ben Johnson&lt;/a&gt;&lt;/p&gt;

    &lt;p&gt;Tags: &lt;a href="https://simonwillison.net/tags/testing"&gt;testing&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/litestream"&gt;litestream&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/ben-johnson"&gt;ben-johnson&lt;/a&gt;&lt;/p&gt;



</summary><category term="testing"/><category term="litestream"/><category term="ben-johnson"/></entry></feed>