<?xml version="1.0" encoding="utf-8"?>
<feed xml:lang="en-us" xmlns="http://www.w3.org/2005/Atom"><title>Simon Willison's Weblog: http-range-requests</title><link href="http://simonwillison.net/" rel="alternate"/><link href="http://simonwillison.net/tags/http-range-requests.atom" rel="self"/><id>http://simonwillison.net/</id><updated>2026-02-27T17:50:54+00:00</updated><author><name>Simon Willison</name></author><entry><title>Unicode Explorer using binary search over fetch() HTTP range requests</title><link href="https://simonwillison.net/2026/Feb/27/unicode-explorer/#atom-tag" rel="alternate"/><published>2026-02-27T17:50:54+00:00</published><updated>2026-02-27T17:50:54+00:00</updated><id>https://simonwillison.net/2026/Feb/27/unicode-explorer/#atom-tag</id><summary type="html">
    
&lt;p&gt;&lt;strong&gt;&lt;a href="https://tools.simonwillison.net/unicode-binary-search"&gt;Unicode Explorer using binary search over fetch() HTTP range requests&lt;/a&gt;&lt;/strong&gt;&lt;/p&gt;
Here's a little prototype I built this morning from my phone as an experiment in HTTP range requests, and a general example of using LLMs to satisfy curiosity.&lt;/p&gt;
&lt;p&gt;I've been collecting &lt;a href="https://simonwillison.net/tags/http-range-requests/"&gt;HTTP range tricks&lt;/a&gt; for a while now, and I decided it would be fun to build something with them myself that used binary search against a large file to do something useful.&lt;/p&gt;
&lt;p&gt;So I &lt;a href="https://claude.ai/share/47860666-cb20-44b5-8cdb-d0ebe363384f"&gt;brainstormed with Claude&lt;/a&gt;. The challenge was coming up with a use case where the data could be naturally sorted in a way that would benefit from binary search.&lt;/p&gt;
&lt;p&gt;One of Claude's suggestions was looking up information about Unicode codepoints, which means searching through many MBs of metadata.&lt;/p&gt;
&lt;p&gt;I had Claude write me a spec to feed to Claude Code - &lt;a href="https://github.com/simonw/research/pull/90#issue-4001466642"&gt;visible here&lt;/a&gt; - then kicked off an &lt;a href="https://simonwillison.net/2025/Nov/6/async-code-research/"&gt;asynchronous research project&lt;/a&gt; with Claude Code for web against my &lt;a href="https://github.com/simonw/research"&gt;simonw/research&lt;/a&gt; repo to turn that into working code.&lt;/p&gt;
&lt;p&gt;Here's the &lt;a href="https://github.com/simonw/research/tree/main/unicode-explorer-binary-search#readme"&gt;resulting report and code&lt;/a&gt;. One interesting thing I learned is that Range request tricks aren't compatible with HTTP compression, because compression throws off the byte offset calculations. I added &lt;code&gt;'Accept-Encoding': 'identity'&lt;/code&gt; to the &lt;code&gt;fetch()&lt;/code&gt; calls, but this isn't actually necessary because Cloudflare and other CDNs automatically skip compression when a &lt;code&gt;content-range&lt;/code&gt; header is present.&lt;/p&gt;
&lt;p&gt;I deployed the result &lt;a href="https://tools.simonwillison.net/unicode-binary-search"&gt;to my tools.simonwillison.net site&lt;/a&gt;, after first tweaking it to query the data via range requests against a CORS-enabled 76.6MB file in an S3 bucket fronted by Cloudflare.&lt;/p&gt;
&lt;p&gt;The demo is fun to play with - type in a single character like &lt;code&gt;ø&lt;/code&gt; or a hexadecimal codepoint indicator like &lt;code&gt;1F99C&lt;/code&gt; and it will binary search its way through the large file and show you the steps it takes along the way:&lt;/p&gt;
&lt;p&gt;&lt;img alt="Animated demo of a web tool called Unicode Explore. I enter the ampersand character and hit Search. A box below shows a sequence of HTTP binary search requests made, finding in 17 steps with 3,864 bytes transferred and telling me that ampersand is U+0026 in Punctuation other, Basic Latin" src="https://static.simonwillison.net/static/2026/unicode-explore.gif" /&gt;


    &lt;p&gt;Tags: &lt;a href="https://simonwillison.net/tags/algorithms"&gt;algorithms&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/http"&gt;http&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/research"&gt;research&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/tools"&gt;tools&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/unicode"&gt;unicode&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/ai"&gt;ai&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/generative-ai"&gt;generative-ai&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/llms"&gt;llms&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/ai-assisted-programming"&gt;ai-assisted-programming&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/vibe-coding"&gt;vibe-coding&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/http-range-requests"&gt;http-range-requests&lt;/a&gt;&lt;/p&gt;



</summary><category term="algorithms"/><category term="http"/><category term="research"/><category term="tools"/><category term="unicode"/><category term="ai"/><category term="generative-ai"/><category term="llms"/><category term="ai-assisted-programming"/><category term="vibe-coding"/><category term="http-range-requests"/></entry><entry><title>Unicode Explorer — Binary Search Over HTTP</title><link href="https://simonwillison.net/2026/Feb/27/unicode-explorer-binary-search/#atom-tag" rel="alternate"/><published>2026-02-27T16:02:00+00:00</published><updated>2026-02-27T16:02:00+00:00</updated><id>https://simonwillison.net/2026/Feb/27/unicode-explorer-binary-search/#atom-tag</id><summary type="html">
    &lt;p&gt;&lt;strong&gt;Research:&lt;/strong&gt; &lt;a href="https://github.com/simonw/research/tree/main/unicode-explorer-binary-search#readme"&gt;Unicode Explorer — Binary Search Over HTTP&lt;/a&gt;&lt;/p&gt;
    
    
        &lt;p&gt;Tags: &lt;a href="https://simonwillison.net/tags/http-range-requests"&gt;http-range-requests&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/unicode"&gt;unicode&lt;/a&gt;&lt;/p&gt;
    

</summary><category term="http-range-requests"/><category term="unicode"/></entry><entry><title>Unicode Explorer — Binary Search Over HTTP</title><link href="https://simonwillison.net/2026/Feb/27/unicode-binary-search/#atom-tag" rel="alternate"/><published>2026-02-27T16:01:33+00:00</published><updated>2026-02-27T16:01:33+00:00</updated><id>https://simonwillison.net/2026/Feb/27/unicode-binary-search/#atom-tag</id><summary type="html">
    &lt;p&gt;&lt;strong&gt;Tool:&lt;/strong&gt; &lt;a href="https://tools.simonwillison.net/unicode-binary-search"&gt;Unicode Explorer — Binary Search Over HTTP&lt;/a&gt;&lt;/p&gt;
    
    
        &lt;p&gt;Tags: &lt;a href="https://simonwillison.net/tags/http-range-requests"&gt;http-range-requests&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/unicode"&gt;unicode&lt;/a&gt;&lt;/p&gt;
    

</summary><category term="http-range-requests"/><category term="unicode"/></entry><entry><title>HTTP Range Requests for Wheel Metadata</title><link href="https://simonwillison.net/2026/Feb/18/http-range-wheel-metadata/#atom-tag" rel="alternate"/><published>2026-02-18T22:51:00+00:00</published><updated>2026-02-18T22:51:00+00:00</updated><id>https://simonwillison.net/2026/Feb/18/http-range-wheel-metadata/#atom-tag</id><summary type="html">
    &lt;p&gt;&lt;strong&gt;Research:&lt;/strong&gt; &lt;a href="https://github.com/simonw/research/tree/main/http-range-wheel-metadata#readme"&gt;HTTP Range Requests for Wheel Metadata&lt;/a&gt;&lt;/p&gt;
    
    
        &lt;p&gt;Tags: &lt;a href="https://simonwillison.net/tags/http-range-requests"&gt;http-range-requests&lt;/a&gt;&lt;/p&gt;
    

</summary><category term="http-range-requests"/></entry><entry><title>Gwtar: a static efficient single-file HTML format</title><link href="https://simonwillison.net/2026/Feb/15/gwtar/#atom-tag" rel="alternate"/><published>2026-02-15T18:26:08+00:00</published><updated>2026-02-15T18:26:08+00:00</updated><id>https://simonwillison.net/2026/Feb/15/gwtar/#atom-tag</id><summary type="html">
    
&lt;p&gt;&lt;strong&gt;&lt;a href="https://gwern.net/gwtar"&gt;Gwtar: a static efficient single-file HTML format&lt;/a&gt;&lt;/strong&gt;&lt;/p&gt;
Fascinating new project from Gwern Branwen and Said Achmiz that targets the challenge of combining large numbers of assets into a single archived HTML file without that file being inconvenient to view in a browser.&lt;/p&gt;
&lt;p&gt;The key trick it uses is to fire &lt;a href="https://developer.mozilla.org/en-US/docs/Web/API/Window/stop"&gt;window.stop()&lt;/a&gt; early in the page to prevent the browser from downloading the whole thing, then following that call with inline uncompressed tar content.&lt;/p&gt;
&lt;p&gt;It can then make HTTP range requests to fetch content from that tar data on-demand when it is needed by the page.&lt;/p&gt;
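&lt;p&gt;In Python terms the on-demand fetch amounts to something like this - a hypothetical sketch, assuming you already know a member's byte offset and size within the archive (gwtar itself does all of this in browser JavaScript):&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;# Hypothetical sketch: fetch just one embedded asset from the archive by
# pulling its byte range, rather than downloading the whole file.
import urllib.request

ARCHIVE_URL = "https://example.com/foo.gwtar.html"  # hypothetical archive URL


def fetch_member(offset, size):
    req = urllib.request.Request(ARCHIVE_URL, headers={
        "Range": f"bytes={offset}-{offset + size - 1}",
    })
    with urllib.request.urlopen(req) as resp:
        return resp.read()  # raw bytes of the embedded asset


# e.g. an image stored 1,048,576 bytes in and 23,000 bytes long (made-up numbers)
image_bytes = fetch_member(1_048_576, 23_000)
&lt;/code&gt;&lt;/pre&gt;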
&lt;p&gt;The JavaScript that has already loaded rewrites asset URLs to point to &lt;code&gt;https://localhost/&lt;/code&gt; purely so that they will fail to load. Then it uses a &lt;a href="https://developer.mozilla.org/en-US/docs/Web/API/PerformanceObserver"&gt;PerformanceObserver&lt;/a&gt; to catch those attempted loads:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;let perfObserver = new PerformanceObserver((entryList, observer) =&amp;gt; {
    resourceURLStringsHandler(entryList.getEntries().map(entry =&amp;gt; entry.name));
});
perfObserver.observe({ entryTypes: [ "resource" ] });
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;That &lt;code&gt;resourceURLStringsHandler&lt;/code&gt; callback finds the resource if it is already loaded or fetches it with an HTTP range request otherwise and then inserts the resource in the right place using a &lt;code&gt;blob:&lt;/code&gt; URL.&lt;/p&gt;
&lt;p&gt;Here's what the &lt;code&gt;window.stop()&lt;/code&gt; portion of the document looks like if you view the source:&lt;/p&gt;
&lt;p&gt;&lt;img alt="Screenshot of a macOS terminal window titled &amp;quot;gw — more big.html — 123×46&amp;quot; showing the source code of a gwtar (self-extracting HTML archive) file. The visible code includes JavaScript with requestIdleCallback(getMainPageHTML);, a noscript block with warnings: a &amp;quot;js-disabled-warning&amp;quot; stating &amp;quot;This HTML page requires JavaScript to be enabled to render, as it is a self-extracting gwtar HTML file,&amp;quot; a description of gwtar as &amp;quot;a portable self-contained standalone HTML file which is designed to nevertheless support efficient lazy loading of all assets such as large media files,&amp;quot; with a link to https://gwern.net/gwtar, a &amp;quot;local-file-warning&amp;quot; with a shell command perl -ne'print $_ if $x; $x=1 if /&amp;lt;!-- GWTAR END/' &amp;amp;lt; foo.gwtar.html | tar --extract, and a &amp;quot;server-fail-warning&amp;quot; about misconfigured servers. Below the HTML closing tags and &amp;lt;!-- GWTAR END comment is binary tar archive data with the filename 2010-02-brianmoriarty-thesecretofpsalm46.html, showing null-padded tar header fields including ustar^@00root and octal size/permission values. At the bottom, a SingleFile metadata comment shows url: https://web.archive.org/web/20230512001411/http://ludix.com/moriarty/psalm46.html and saved date: Sat Jan 17 2026 19:26:49 GMT-0800 (Pacific Standard Time)." src="https://static.simonwillison.net/static/2026/gwtar.jpg" /&gt;&lt;/p&gt;
&lt;p&gt;Amusingly for an archive format, it doesn't actually work if you open the file directly on your own computer. Here's what you see if you try to do that:&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;You are seeing this message, instead of the page you should be seeing, because &lt;code&gt;gwtar&lt;/code&gt; files &lt;strong&gt;cannot be opened locally&lt;/strong&gt; (due to web browser security restrictions).&lt;/p&gt;
&lt;p&gt;To open this page on your computer, use the following shell command:&lt;/p&gt;
&lt;p&gt;&lt;code&gt;perl -ne'print $_ if $x; $x=1 if /&amp;lt;!-- GWTAR END/' &amp;lt; foo.gwtar.html | tar --extract&lt;/code&gt;&lt;/p&gt;
&lt;p&gt;Then open the file &lt;code&gt;foo.html&lt;/code&gt; in any web browser.&lt;/p&gt;
&lt;/blockquote&gt;

    &lt;p&gt;&lt;small&gt;Via &lt;a href="https://news.ycombinator.com/item?id=47024506"&gt;Hacker News&lt;/a&gt;&lt;/small&gt;&lt;/p&gt;


    &lt;p&gt;Tags: &lt;a href="https://simonwillison.net/tags/archiving"&gt;archiving&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/html"&gt;html&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/javascript"&gt;javascript&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/http-range-requests"&gt;http-range-requests&lt;/a&gt;&lt;/p&gt;



</summary><category term="archiving"/><category term="html"/><category term="javascript"/><category term="http-range-requests"/></entry><entry><title>How uv got so fast</title><link href="https://simonwillison.net/2025/Dec/26/how-uv-got-so-fast/#atom-tag" rel="alternate"/><published>2025-12-26T23:43:15+00:00</published><updated>2025-12-26T23:43:15+00:00</updated><id>https://simonwillison.net/2025/Dec/26/how-uv-got-so-fast/#atom-tag</id><summary type="html">
    
&lt;p&gt;&lt;strong&gt;&lt;a href="https://nesbitt.io/2025/12/26/how-uv-got-so-fast.html"&gt;How uv got so fast&lt;/a&gt;&lt;/strong&gt;&lt;/p&gt;
Andrew Nesbitt provides an insightful teardown of why &lt;a href="https://github.com/astral-sh/uv"&gt;uv&lt;/a&gt; is so much faster than &lt;code&gt;pip&lt;/code&gt;. It's not nearly as simple as just "they rewrote it in Rust" - &lt;code&gt;uv&lt;/code&gt; gets to skip a huge amount of Python packaging history (which &lt;code&gt;pip&lt;/code&gt; needs to implement for backwards compatibility) and benefits enormously from work over recent years that makes it possible to resolve dependencies across most packages without having to execute the code in &lt;code&gt;setup.py&lt;/code&gt; using a Python interpreter.&lt;/p&gt;
&lt;p&gt;Two notes that caught my eye that I hadn't understood before:&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;HTTP range requests for metadata.&lt;/strong&gt; &lt;a href="https://packaging.python.org/en/latest/specifications/binary-distribution-format/"&gt;Wheel files&lt;/a&gt; are zip archives, and zip archives put their file listing at the end. uv tries PEP 658 metadata first, falls back to HTTP range requests for the zip central directory, then full wheel download, then building from source. Each step is slower and riskier. The design makes the fast path cover 99% of cases. None of this requires Rust.&lt;/p&gt;
&lt;p&gt;[...]&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Compact version representation&lt;/strong&gt;. uv packs versions into u64 integers where possible, making comparison and hashing fast. Over 90% of versions fit in one u64. This is micro-optimization that compounds across millions of comparisons.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;I wanted to learn more about these tricks, so I fired up &lt;a href="https://simonwillison.net/2025/Nov/6/async-code-research/"&gt;an asynchronous research task&lt;/a&gt; and told it to checkout the &lt;code&gt;astral-sh/uv&lt;/code&gt; repo, find the Rust code for both of those features and try porting it to Python to help me understand how it works.&lt;/p&gt;
&lt;p&gt;Here's &lt;a href="https://github.com/simonw/research/tree/main/http-range-wheel-metadata"&gt;the report that it wrote for me&lt;/a&gt;, the &lt;a href="https://github.com/simonw/research/pull/57"&gt;prompts I used&lt;/a&gt; and the &lt;a href="https://gistpreview.github.io/?0f04e4d1a240bfc3065df5082b629884/index.html"&gt;Claude Code transcript&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;You can try &lt;a href="https://github.com/simonw/research/blob/main/http-range-wheel-metadata/wheel_metadata.py"&gt;the script&lt;/a&gt; it wrote for extracting metadata from a wheel using HTTP range requests like this:&lt;/p&gt;
&lt;p&gt;&lt;code&gt;uv run --with httpx https://raw.githubusercontent.com/simonw/research/refs/heads/main/http-range-wheel-metadata/wheel_metadata.py https://files.pythonhosted.org/packages/8b/04/ef95b67e1ff59c080b2effd1a9a96984d6953f667c91dfe9d77c838fc956/playwright-1.57.0-py3-none-macosx_11_0_arm64.whl -v&lt;/code&gt;&lt;/p&gt;
&lt;p&gt;The Playwright wheel there is ~40MB. Adding &lt;code&gt;-v&lt;/code&gt; at the end causes the script to spit out verbose details of how it fetched the data - &lt;a href="https://gist.github.com/simonw/a5ef83b6e4605d2577febb43fa9ad018"&gt;which looks like this&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;Key extract from that output:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;[1] HEAD request to get file size...
    File size: 40,775,575 bytes
[2] Fetching last 16,384 bytes (EOCD + central directory)...
    Received 16,384 bytes
[3] Parsed EOCD:
    Central directory offset: 40,731,572
    Central directory size: 43,981
    Total entries: 453
[4] Fetching complete central directory...
    ...
[6] Found METADATA: playwright-1.57.0.dist-info/METADATA
    Offset: 40,706,744
    Compressed size: 1,286
    Compression method: 8
[7] Fetching METADATA content (2,376 bytes)...
[8] Decompressed METADATA: 3,453 bytes

Total bytes fetched: 18,760 / 40,775,575 (100.0% savings)
&lt;/code&gt;&lt;/pre&gt;
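&lt;p&gt;Here's a simplified Python sketch of just steps [1] to [3] - HEAD for the size, one range request for the tail, then parsing the zip End of Central Directory record. The linked script goes further, fetching the central directory, finding METADATA and decompressing it:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;# Simplified sketch of steps [1]-[3] above; pass it any wheel URL.
import struct
import urllib.request


def eocd_info(url, tail_size=16_384):
    # [1] HEAD request to learn the total file size
    head = urllib.request.Request(url, method="HEAD")
    with urllib.request.urlopen(head) as resp:
        file_size = int(resp.headers["Content-Length"])

    # [2] Fetch only the last tail_size bytes with a range request
    start = max(0, file_size - tail_size)
    req = urllib.request.Request(url, headers={"Range": f"bytes={start}-"})
    with urllib.request.urlopen(req) as resp:
        tail = resp.read()

    # [3] The EOCD record starts with the signature PK\x05\x06; the entry
    # count, central directory size and offset follow at fixed positions
    pos = tail.rfind(b"PK\x05\x06")
    total_entries, cd_size, cd_offset = struct.unpack(
        "&amp;lt;HII", tail[pos + 10:pos + 20]
    )
    return file_size, total_entries, cd_size, cd_offset
&lt;/code&gt;&lt;/pre&gt;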
&lt;p&gt;The section of the report &lt;a href="https://github.com/simonw/research/tree/main/http-range-wheel-metadata#bonus-compact-version-representation"&gt;on compact version representation&lt;/a&gt; is interesting too. Here's how it illustrates sorting version numbers correctly based on their custom u64 representation:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;Sorted order (by integer comparison of packed u64):
  1.0.0a1 (repr=0x0001000000200001)
  1.0.0b1 (repr=0x0001000000300001)
  1.0.0rc1 (repr=0x0001000000400001)
  1.0.0 (repr=0x0001000000500000)
  1.0.0.post1 (repr=0x0001000000700001)
  1.0.1 (repr=0x0001000100500000)
  2.0.0.dev1 (repr=0x0002000000100001)
  2.0.0 (repr=0x0002000000500000)
&lt;/code&gt;&lt;/pre&gt;
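&lt;p&gt;Those hex representations come straight from the report. As a simplified illustration of the packing idea - this is not uv's actual bit layout - you can get the same sorting behaviour by giving each release segment a fixed number of bits plus a small "phase" field for dev/pre/post releases:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;# Simplified illustration of packing a version into one integer so that plain
# integer comparison sorts versions correctly. Not uv's actual bit layout.
PHASE = {"dev": 1, "a": 2, "b": 3, "rc": 4, None: 5, "post": 7}


def pack(major, minor, patch, phase=None, n=0):
    # 16 bits each for major/minor/patch, 4 bits for the release phase
    # (dev &amp;lt; alpha &amp;lt; beta &amp;lt; rc &amp;lt; final &amp;lt; post), 12 bits for its number
    return (
        (major &amp;lt;&amp;lt; 48) | (minor &amp;lt;&amp;lt; 32) | (patch &amp;lt;&amp;lt; 16)
        | (PHASE[phase] &amp;lt;&amp;lt; 12) | n
    )


versions = [
    (1, 0, 1), (1, 0, 0, "a", 1), (2, 0, 0), (1, 0, 0),
    (1, 0, 0, "post", 1), (2, 0, 0, "dev", 1),
]
for v in sorted(versions, key=lambda v: pack(*v)):
    print(v, hex(pack(*v)))
&lt;/code&gt;&lt;/pre&gt;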


    &lt;p&gt;Tags: &lt;a href="https://simonwillison.net/tags/performance"&gt;performance&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/python"&gt;python&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/sorting"&gt;sorting&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/rust"&gt;rust&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/uv"&gt;uv&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/http-range-requests"&gt;http-range-requests&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/vibe-porting"&gt;vibe-porting&lt;/a&gt;&lt;/p&gt;



</summary><category term="performance"/><category term="python"/><category term="sorting"/><category term="rust"/><category term="uv"/><category term="http-range-requests"/><category term="vibe-porting"/></entry><entry><title>OpenTimes</title><link href="https://simonwillison.net/2025/Mar/17/opentimes/#atom-tag" rel="alternate"/><published>2025-03-17T22:49:59+00:00</published><updated>2025-03-17T22:49:59+00:00</updated><id>https://simonwillison.net/2025/Mar/17/opentimes/#atom-tag</id><summary type="html">
    
&lt;p&gt;&lt;strong&gt;&lt;a href="https://sno.ws/opentimes/"&gt;OpenTimes&lt;/a&gt;&lt;/strong&gt;&lt;/p&gt;
Spectacular new open geospatial project by &lt;a href="https://sno.ws/"&gt;Dan Snow&lt;/a&gt;:&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;OpenTimes is a database of pre-computed, point-to-point travel times between United States Census geographies. It lets you download bulk travel time data for free and with no limits.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;Here's &lt;a href="https://opentimes.org/?id=060816135022&amp;amp;mode=car#9.76/37.5566/-122.3085"&gt;what I get&lt;/a&gt; for travel times by car from El Granada, California:&lt;/p&gt;
&lt;p&gt;&lt;img alt="Isochrone map showing driving times from the El Granada census tract to other places in the San Francisco Bay Area" src="https://static.simonwillison.net/static/2025/opentimes.jpg" /&gt;&lt;/p&gt;
&lt;p&gt;The technical details are &lt;em&gt;fascinating&lt;/em&gt;:&lt;/p&gt;
&lt;blockquote&gt;
&lt;ul&gt;
&lt;li&gt;The entire OpenTimes backend is just static Parquet files on &lt;a href="https://www.cloudflare.com/developer-platform/products/r2/"&gt;Cloudflare's R2&lt;/a&gt;. There's no RDBMS or running service, just files and a CDN. The whole thing costs about $10/month to host and costs nothing to serve. In my opinion, this is a &lt;em&gt;great&lt;/em&gt; way to serve infrequently updated, large public datasets at low cost (as long as you partition the files correctly).&lt;/li&gt;
&lt;/ul&gt;
&lt;/blockquote&gt;
&lt;p&gt;Sure enough, &lt;a href="https://developers.cloudflare.com/r2/pricing/"&gt;R2 pricing&lt;/a&gt; charges "based on the total volume of data stored" - $0.015 / GB-month for standard storage, then $0.36 / million requests for "Class B" operations which include reads. They charge nothing for outbound bandwidth.&lt;/p&gt;
&lt;blockquote&gt;
&lt;ul&gt;
&lt;li&gt;All travel times were calculated by pre-building the inputs (OSM, OSRM networks) and then distributing the compute over &lt;a href="https://github.com/dfsnow/opentimes/actions/workflows/calculate-times.yaml"&gt;hundreds of GitHub Actions jobs&lt;/a&gt;. This worked shockingly well for this specific workload (and was also completely free).&lt;/li&gt;
&lt;/ul&gt;
&lt;/blockquote&gt;
&lt;p&gt;Here's a &lt;a href="https://github.com/dfsnow/opentimes/actions/runs/13094249792"&gt;GitHub Actions run&lt;/a&gt; of the &lt;a href="https://github.com/dfsnow/opentimes/blob/a6a5f7abcdd69559b3e29f360fe0ff0399dbb400/.github/workflows/calculate-times.yaml#L78-L80"&gt;calculate-times.yaml workflow&lt;/a&gt; which uses a matrix to run 255 jobs!&lt;/p&gt;
&lt;p&gt;&lt;img alt="GitHub Actions run: calculate-times.yaml run by workflow_dispatch taking 1h49m to execute 255 jobs with names like run-job (2020-01) " src="https://static.simonwillison.net/static/2025/opentimes-github-actions.jpg" /&gt;&lt;/p&gt;
&lt;p&gt;Relevant YAML:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;  matrix:
    year: ${{ fromJSON(needs.setup-jobs.outputs.years) }}
    state: ${{ fromJSON(needs.setup-jobs.outputs.states) }}
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Where those JSON files were created by the previous step, which reads in the year and state values from &lt;a href="https://github.com/dfsnow/opentimes/blob/a6a5f7abcdd69559b3e29f360fe0ff0399dbb400/data/params.yaml#L72-L132"&gt;this params.yaml file&lt;/a&gt;.&lt;/p&gt;
&lt;blockquote&gt;
&lt;ul&gt;
&lt;li&gt;The query layer uses a single DuckDB database file with &lt;em&gt;views&lt;/em&gt; that point to static Parquet files via HTTP. This lets you query a table with hundreds of billions of records after downloading just the ~5MB pointer file.&lt;/li&gt;
&lt;/ul&gt;
&lt;/blockquote&gt;
&lt;p&gt;This is a really creative use of DuckDB's feature that lets you run queries against large data from a laptop using HTTP range queries to avoid downloading the whole thing.&lt;/p&gt;
&lt;p&gt;The README shows &lt;a href="https://github.com/dfsnow/opentimes/blob/3439fa2c54af227e40997b4a5f55678739e0f6df/README.md#using-duckdb"&gt;how to use that from R and Python&lt;/a&gt; - I got this working in the &lt;code&gt;duckdb&lt;/code&gt; client (&lt;code&gt;brew install duckdb&lt;/code&gt;):&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;INSTALL httpfs;
LOAD httpfs;
ATTACH 'https://data.opentimes.org/databases/0.0.1.duckdb' AS opentimes;

SELECT origin_id, destination_id, duration_sec
  FROM opentimes.public.times
  WHERE version = '0.0.1'
      AND mode = 'car'
      AND year = '2024'
      AND geography = 'tract'
      AND state = '17'
      AND origin_id LIKE '17031%' limit 10;
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;In answer to a question about adding public transit times &lt;a href="https://news.ycombinator.com/item?id=43392521#43393183"&gt;Dan said&lt;/a&gt;:&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;In the next year or so maybe. The biggest obstacles to adding public transit are:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Collecting all the necessary scheduling data (e.g. GTFS feeds) for every transit system in the county. Not insurmountable since there are services that do this currently.&lt;/li&gt;
&lt;li&gt;Finding a routing engine that can compute nation-scale travel time matrices quickly. Currently, the two fastest open-source engines I've tried (OSRM and Valhalla) don't support public transit for matrix calculations and the engines that do support public transit (R5, OpenTripPlanner, etc.) are too slow.&lt;/li&gt;
&lt;/ul&gt;
&lt;/blockquote&gt;
&lt;p&gt;&lt;a href="https://gtfs.org/"&gt;GTFS&lt;/a&gt; is a popular CSV-based format for sharing transit schedules - here's &lt;a href="https://gtfs.org/resources/data/"&gt;an official list&lt;/a&gt; of available feed directories.&lt;/p&gt;
&lt;p&gt;This whole project feels to me like a great example of the &lt;a href="https://simonwillison.net/2021/Jul/28/baked-data/"&gt;baked data&lt;/a&gt; architectural pattern in action.&lt;/p&gt;

    &lt;p&gt;&lt;small&gt;Via &lt;a href="https://news.ycombinator.com/item?id=43392521"&gt;Hacker News&lt;/a&gt;&lt;/small&gt;&lt;/p&gt;


    &lt;p&gt;Tags: &lt;a href="https://simonwillison.net/tags/census"&gt;census&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/geospatial"&gt;geospatial&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/open-data"&gt;open-data&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/openstreetmap"&gt;openstreetmap&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/cloudflare"&gt;cloudflare&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/parquet"&gt;parquet&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/github-actions"&gt;github-actions&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/baked-data"&gt;baked-data&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/duckdb"&gt;duckdb&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/http-range-requests"&gt;http-range-requests&lt;/a&gt;&lt;/p&gt;



</summary><category term="census"/><category term="geospatial"/><category term="open-data"/><category term="openstreetmap"/><category term="cloudflare"/><category term="parquet"/><category term="github-actions"/><category term="baked-data"/><category term="duckdb"/><category term="http-range-requests"/></entry><entry><title>Amazon S3 Express One Zone now supports the ability to append data to an object</title><link href="https://simonwillison.net/2024/Nov/22/amazon-s3-append-data/#atom-tag" rel="alternate"/><published>2024-11-22T04:39:35+00:00</published><updated>2024-11-22T04:39:35+00:00</updated><id>https://simonwillison.net/2024/Nov/22/amazon-s3-append-data/#atom-tag</id><summary type="html">
    
&lt;p&gt;&lt;strong&gt;&lt;a href="https://aws.amazon.com/about-aws/whats-new/2024/11/amazon-s3-express-one-zone-append-data-object/"&gt;Amazon S3 Express One Zone now supports the ability to append data to an object&lt;/a&gt;&lt;/strong&gt;&lt;/p&gt;
This is a first for Amazon S3: it is now possible to append data to an existing object in a bucket, where previously the only supported operation was to atomically replace the object with an updated version.&lt;/p&gt;
&lt;p&gt;This is only available for S3 Express One Zone, a bucket class introduced &lt;a href="https://aws.amazon.com/blogs/aws/new-amazon-s3-express-one-zone-high-performance-storage-class/"&gt;a year ago&lt;/a&gt; which provides storage in just a single availability zone, providing significantly lower latency at the cost of reduced redundancy and a much higher price (16c/GB/month compared to 2.3c for S3 standard tier).&lt;/p&gt;
&lt;p&gt;The fact that appends have never been supported for multi-availability zone S3 provides an interesting clue as to the underlying architecture. Guaranteeing that every copy of an object has received and applied an append is significantly harder than doing a distributed atomic swap to a new version.&lt;/p&gt;
&lt;p&gt;More details from &lt;a href="https://docs.aws.amazon.com/AmazonS3/latest/userguide/directory-buckets-objects-append.html"&gt;the documentation&lt;/a&gt;:&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;There is no minimum size requirement for the data you can append to an object. However, the maximum size of the data that you can append to an object in a single request is 5GB. This is the same limit as the largest request size when uploading data using any Amazon S3 API.&lt;/p&gt;
&lt;p&gt;With each successful append operation, you create a part of the object and each object can have up to 10,000 parts. This means you can append data to an object up to 10,000 times. If an object is created using S3 multipart upload, each uploaded part is counted towards the total maximum of 10,000 parts. For example, you can append up to 9,000 times to an object created by multipart upload comprising of 1,000 parts.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;That 10,000 limit means this won't quite work for constantly appending to a log file in a bucket.&lt;/p&gt;
&lt;p&gt;Presumably it will be possible to "tail" an object that is receiving appended updates using the HTTP Range header.&lt;/p&gt;
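&lt;p&gt;A hypothetical sketch of what that tail could look like - remember how many bytes you've already seen and only ask for what comes after them:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;# Hypothetical sketch of tailing a growing object with range requests.
import time
import urllib.error
import urllib.request

OBJECT_URL = "https://example-bucket.s3.amazonaws.com/app.log"  # hypothetical
seen = 0

while True:
    req = urllib.request.Request(OBJECT_URL, headers={"Range": f"bytes={seen}-"})
    try:
        with urllib.request.urlopen(req) as resp:
            chunk = resp.read()        # 206 Partial Content: just the new bytes
            seen += len(chunk)
            print(chunk.decode("utf-8", "replace"), end="")
    except urllib.error.HTTPError as err:
        if err.code != 416:            # 416 means nothing new past `seen` yet
            raise
    time.sleep(5)                      # poll again for further appends
&lt;/code&gt;&lt;/pre&gt;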


    &lt;p&gt;Tags: &lt;a href="https://simonwillison.net/tags/architecture"&gt;architecture&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/aws"&gt;aws&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/s3"&gt;s3&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/scaling"&gt;scaling&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/http-range-requests"&gt;http-range-requests&lt;/a&gt;&lt;/p&gt;



</summary><category term="architecture"/><category term="aws"/><category term="s3"/><category term="scaling"/><category term="http-range-requests"/></entry><entry><title>DuckDB-Wasm: Efficient Analytical SQL in the Browser</title><link href="https://simonwillison.net/2021/Oct/29/duckdb-wasm-efficient-analytical-sql-in-the-browser/#atom-tag" rel="alternate"/><published>2021-10-29T15:25:43+00:00</published><updated>2021-10-29T15:25:43+00:00</updated><id>https://simonwillison.net/2021/Oct/29/duckdb-wasm-efficient-analytical-sql-in-the-browser/#atom-tag</id><summary type="html">
    
&lt;p&gt;&lt;strong&gt;&lt;a href="https://duckdb.org/2021/10/29/duckdb-wasm.html"&gt;DuckDB-Wasm: Efficient Analytical SQL in the Browser&lt;/a&gt;&lt;/strong&gt;&lt;/p&gt;
First SQLite, now DuckDB: options for running database engines in the browser using WebAssembly keep on growing. DuckDB means browsers now have a fast, intuitive mechanism for querying Parquet files too. This also supports the same HTTP Range header trick as the SQLite demo from a while back, meaning it can query large databases loaded over HTTP without downloading the whole file.

    &lt;p&gt;&lt;small&gt;Via &lt;a href="https://twitter.com/hfmuehleisen/status/1454100577730838529"&gt;@hfmuehleisen&lt;/a&gt;&lt;/small&gt;&lt;/p&gt;


    &lt;p&gt;Tags: &lt;a href="https://simonwillison.net/tags/sqlite"&gt;sqlite&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/parquet"&gt;parquet&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/webassembly"&gt;webassembly&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/duckdb"&gt;duckdb&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/http-range-requests"&gt;http-range-requests&lt;/a&gt;&lt;/p&gt;



</summary><category term="sqlite"/><category term="parquet"/><category term="webassembly"/><category term="duckdb"/><category term="http-range-requests"/></entry><entry><title>Notes on streaming large API responses</title><link href="https://simonwillison.net/2021/Jun/25/streaming-large-api-responses/#atom-tag" rel="alternate"/><published>2021-06-25T16:26:49+00:00</published><updated>2021-06-25T16:26:49+00:00</updated><id>https://simonwillison.net/2021/Jun/25/streaming-large-api-responses/#atom-tag</id><summary type="html">
    &lt;p&gt;I started &lt;a href="https://twitter.com/simonw/status/1405554676993433605"&gt;a Twitter conversation&lt;/a&gt; last week about API endpoints that stream large amounts of data as an alternative to APIs that return 100 results at a time and require clients to paginate through all of the pages in order to retrieve all of the data:&lt;/p&gt;
&lt;blockquote&gt;
&lt;p dir="ltr" lang="en"&gt;Any unexpected downsides to offering streaming HTTP API endpoints that serve up eg 100,000 JSON objects in a go rather than asking users to paginate 100 at a time over 1,000 requests, assuming efficient implementation of that streaming endpoint?&lt;/p&gt;— Simon Willison (@simonw) &lt;a href="https://twitter.com/simonw/status/1405554676993433605?ref_src=twsrc%5Etfw"&gt;June 17, 2021&lt;/a&gt;
&lt;/blockquote&gt;
&lt;p&gt;I got a ton of great replies. I tried to tie them together in a thread attached to the tweet, but I'm also going to synthesize them into some thoughts here.&lt;/p&gt;
&lt;h4&gt;Bulk exporting data&lt;/h4&gt;
&lt;p&gt;The more time I spend with APIs, especially with regard to my &lt;a href="https://datasette.io/"&gt;Datasette&lt;/a&gt; and &lt;a href="https://simonwillison.net/2020/Nov/14/personal-data-warehouses/"&gt;Dogsheep&lt;/a&gt; projects, the more I realize that my favourite APIs are the ones that let you extract &lt;em&gt;all&lt;/em&gt; of your data as quickly and easily as possible.&lt;/p&gt;
&lt;p&gt;There are generally three ways an API might provide this:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Click an "export everything" button, then wait for a while for an email to show up with a link to a downloadable zip file. This isn't really an API, in particular since it's usually hard if not impossible to automate that initial "click", but it's still better than nothing. Google's &lt;a href="https://takeout.google.com/"&gt;Takeout&lt;/a&gt; is one notable implementation of this pattern.&lt;/li&gt;
&lt;li&gt;Provide a JSON API which allows users to paginate through their data. This is a very common pattern, although it can run into difficulties: what happens if new data is added while you are paginating through the original data, for example? Some systems only allow access to the first N pages too, for performance reasons.&lt;/li&gt;
&lt;li&gt;Providing a single HTTP endpoint you can hit that will return ALL of your data - potentially dozens or hundreds of MBs of it - in one go.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;It's that last option that I'm interested in talking about today.&lt;/p&gt;
&lt;h4&gt;Efficiently streaming data&lt;/h4&gt;
&lt;p&gt;It used to be that most web engineers would quickly discount the idea of an API endpoint that streams out an unlimited number of rows. HTTP requests should be served as quickly as possible! Anything more than a couple of seconds spent processing a request is a red flag that something should be reconsidered.&lt;/p&gt;
&lt;p&gt;Almost everything in the web stack is optimized for quickly serving small requests. But over the past decade the tide has turned somewhat: Node.js made async web servers commonplace, WebSockets taught us to handle long-running connections and in the Python world asyncio and &lt;a href="https://asgi.readthedocs.io/"&gt;ASGI&lt;/a&gt; provided a firm foundation for handling long-running requests using smaller amounts of RAM and CPU.&lt;/p&gt;
&lt;p&gt;I've been experimenting in this area for a few years now.&lt;/p&gt;
&lt;p&gt;Datasette has the ability to &lt;a href="https://github.com/simonw/datasette/blob/0.57.1/datasette/views/base.py#L264-L428"&gt;use ASGI trickery&lt;/a&gt; to &lt;a href="https://docs.datasette.io/en/stable/csv_export.html#streaming-all-records"&gt;stream all rows from a table&lt;/a&gt; (or filtered table) as CSV, potentially returning hundreds of MBs of data.&lt;/p&gt;
&lt;p&gt;&lt;a href="https://django-sql-dashboard.datasette.io/"&gt;Django SQL Dashboard&lt;/a&gt; can export the full results of a SQL query as CSV or TSV, this time using Django's &lt;a href="https://docs.djangoproject.com/en/3.2/ref/request-response/#django.http.StreamingHttpResponse"&gt;StreamingHttpResponse&lt;/a&gt; (which does tie up a full worker process, but that's OK if you restrict it to a controlled number of authenticated users).&lt;/p&gt;
&lt;p&gt;&lt;a href="https://simonwillison.net/tags/vaccinateca/"&gt;VIAL&lt;/a&gt; implements streaming responses to offer an &lt;a href="https://github.com/CAVaccineInventory/vial/blob/cdaaab053a9cf1cef40104a2cdf480b7932d58f7/vaccinate/core/admin_actions.py"&gt;"export from the admin" feature&lt;/a&gt;. It also has an API-key-protected search API which can stream out all matching rows &lt;a href="https://github.com/CAVaccineInventory/vial/blob/cdaaab053a9cf1cef40104a2cdf480b7932d58f7/vaccinate/api/serialize.py#L38"&gt;in JSON or GeoJSON&lt;/a&gt;.&lt;/p&gt;
&lt;h4&gt;Implementation notes&lt;/h4&gt;
&lt;p&gt;The key thing to watch out for when implementing this pattern is memory usage: if your server buffers 100MB+ of data any time it needs to serve an export request you're going to run into trouble.&lt;/p&gt;
&lt;p&gt;Some export formats are friendlier for streaming than others. CSV and TSV are pretty easy to stream, as is newline-delimited JSON.&lt;/p&gt;
&lt;p&gt;Regular JSON requires a bit more thought: you can output a &lt;code&gt;[&lt;/code&gt; character, then output each row in a stream with a comma suffix, then skip the comma for the last row and output a &lt;code&gt;]&lt;/code&gt;. Doing that requires peeking ahead (looping two at a time) to verify that you haven't yet reached the end.&lt;/p&gt;
&lt;p&gt;Or... Martin De Wulf &lt;a href="https://twitter.com/madewulf/status/1405559088994467844"&gt;pointed out&lt;/a&gt; that you can output the first row, then output every other row with a preceding comma - which avoids the whole "iterate two at a time" problem entirely.&lt;/p&gt;
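&lt;p&gt;A tiny Python generator illustrates the comma-first version of the trick:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;import json


def stream_json_array(rows):
    # Yield a valid JSON array one piece at a time: the comma goes *before*
    # every row except the first, so no look-ahead is needed.
    yield "["
    first = True
    for row in rows:
        yield ("" if first else ",") + json.dumps(row)
        first = False
    yield "]"


# e.g. wire this up as the iterator behind a streaming HTTP response body
body = "".join(stream_json_array({"id": i} for i in range(3)))
print(body)  # [{"id": 0},{"id": 1},{"id": 2}]
&lt;/code&gt;&lt;/pre&gt;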
&lt;p&gt;The next challenge is efficiently looping through every database result without first pulling them all into memory.&lt;/p&gt;
&lt;p&gt;PostgreSQL (and the &lt;code&gt;psycopg2&lt;/code&gt; Python module) offers &lt;a href="https://www.psycopg.org/docs/usage.html#server-side-cursors"&gt;server-side cursors&lt;/a&gt;, which means you can stream results through your code without loading them all at once. I use these &lt;a href="https://github.com/simonw/django-sql-dashboard/blob/dd1bb18e45b40ce8f3d0553a72b7ec3cdc329e69/django_sql_dashboard/views.py#L397-L399"&gt;in Django SQL Dashboard&lt;/a&gt;.&lt;/p&gt;
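&lt;p&gt;A minimal sketch of that pattern with &lt;code&gt;psycopg2&lt;/code&gt; - the connection string and table are hypothetical - uses a &lt;em&gt;named&lt;/em&gt; cursor, which tells PostgreSQL to keep the result set on the server and hand rows over in batches:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;import psycopg2

conn = psycopg2.connect("dbname=example")  # hypothetical connection string

# A named cursor is a PostgreSQL server-side cursor: rows arrive in batches
# of itersize instead of being loaded into memory all at once.
with conn.cursor(name="export_items") as cur:
    cur.itersize = 2000
    cur.execute("SELECT * FROM items ORDER BY id")
    for row in cur:
        print(row)  # stream the row out here (CSV line, JSON chunk, etc.)
&lt;/code&gt;&lt;/pre&gt;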
&lt;p&gt;Server-side cursors make me nervous though, because they seem like they likely tie up resources in the database itself. So the other technique I would consider here is &lt;a href="https://use-the-index-luke.com/no-offset"&gt;keyset pagination&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;Keyset pagination works against any data that is ordered by a unique column - it works especially well against a primary key (or other indexed column). Each page of data is retrieved using a query something like this:&lt;/p&gt;
&lt;div class="highlight highlight-source-sql"&gt;&lt;pre&gt;&lt;span class="pl-k"&gt;select&lt;/span&gt; &lt;span class="pl-k"&gt;*&lt;/span&gt; &lt;span class="pl-k"&gt;from&lt;/span&gt; items &lt;span class="pl-k"&gt;order by&lt;/span&gt; id &lt;span class="pl-k"&gt;limit&lt;/span&gt; &lt;span class="pl-c1"&gt;21&lt;/span&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;Note the &lt;code&gt;limit 21&lt;/code&gt; - if we are retrieving pages of 20 items we ask for 21, since then we can use the last returned item to tell if there is a next page or not.&lt;/p&gt;
&lt;p&gt;Then for subsequent pages take the 20th &lt;code&gt;id&lt;/code&gt; value and ask for things greater than that:&lt;/p&gt;
&lt;div class="highlight highlight-source-sql"&gt;&lt;pre&gt;&lt;span class="pl-k"&gt;select&lt;/span&gt; &lt;span class="pl-k"&gt;*&lt;/span&gt; &lt;span class="pl-k"&gt;from&lt;/span&gt; items &lt;span class="pl-k"&gt;where&lt;/span&gt; id &lt;span class="pl-k"&gt;&amp;gt;&lt;/span&gt; &lt;span class="pl-c1"&gt;20&lt;/span&gt; &lt;span class="pl-k"&gt;limit&lt;/span&gt; &lt;span class="pl-c1"&gt;21&lt;/span&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;Each of these queries is fast to respond (since it's against an ordered index) and uses a predictable, fixed amount of memory. Using keyset pagination we can loop through an arbitrarily large table of data, streaming each page out one at a time, without exhausting any resources.&lt;/p&gt;
&lt;p&gt;And since each query is small and fast, we don't need to worry about huge queries tying up database resources either.&lt;/p&gt;
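&lt;p&gt;Pulled together, the keyset pagination loop looks something like this sketch, shown here against an in-memory SQLite table so it runs standalone:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;import sqlite3

PAGE = 20


def stream_items(conn):
    # Keyset pagination: each query asks for PAGE + 1 rows ordered by id;
    # the extra row only tells us whether another page exists.
    last_id = None
    while True:
        if last_id is None:
            rows = conn.execute(
                "SELECT * FROM items ORDER BY id LIMIT ?", (PAGE + 1,)
            ).fetchall()
        else:
            rows = conn.execute(
                "SELECT * FROM items WHERE id &amp;gt; ? ORDER BY id LIMIT ?",
                (last_id, PAGE + 1),
            ).fetchall()
        yield from rows[:PAGE]
        if len(rows) &amp;lt;= PAGE:         # fewer than PAGE + 1 rows: we're done
            return
        last_id = rows[PAGE - 1][0]  # id of the 20th row starts the next page


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE items (id INTEGER PRIMARY KEY, name TEXT)")
conn.executemany(
    "INSERT INTO items (name) VALUES (?)", [(f"item {i}",) for i in range(50)]
)
for row in stream_items(conn):
    print(row)
&lt;/code&gt;&lt;/pre&gt;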
&lt;h4&gt;What can go wrong?&lt;/h4&gt;
&lt;p&gt;I really like these patterns. They haven't bitten me yet, though I've not deployed them for anything truly huge scale. So I &lt;a href="https://twitter.com/simonw/status/1405554676993433605"&gt;asked Twitter&lt;/a&gt; what kind of problems I should look for.&lt;/p&gt;
&lt;p&gt;Based on the Twitter conversation, here are some of the challenges that this approach faces.&lt;/p&gt;
&lt;h4&gt;Challenge: restarting servers&lt;/h4&gt;
&lt;blockquote&gt;
&lt;p dir="ltr" lang="en"&gt;If the stream takes a significantly long time to finish then rolling out updates becomes a problem. You don't want to interrupt a download but also don't want to wait forever for it to finish to spin down the server.&lt;/p&gt;— Adam Lowry (@robotadam) &lt;a href="https://twitter.com/robotadam/status/1405556544897384459?ref_src=twsrc%5Etfw"&gt;June 17, 2021&lt;/a&gt;
&lt;/blockquote&gt;
&lt;p&gt;This came up a few times, and is something I hadn't considered. If your deployment process involves restarting your servers (and it's hard to imagine one that doesn't) you need to take long-running connections into account when you do that. If there's a user halfway through a 500MB stream, you can either truncate their connection or wait for them to finish.&lt;/p&gt;
&lt;h4 id="challenge-errors"&gt;Challenge: how to return errors&lt;/h4&gt;
&lt;p&gt;If you're streaming a response, you start with an HTTP 200 code... but then what happens if an error occurs half-way through, potentially while paginating through the database?&lt;/p&gt;
&lt;p&gt;You've already started sending the request, so you can't change the status code to a 500. Instead, you need to write some kind of error to the stream that's being produced.&lt;/p&gt;
&lt;p&gt;If you're serving up a huge JSON document, you can at least make that JSON become invalid, which should indicate to your client that something went wrong.&lt;/p&gt;
&lt;p&gt;Formats like CSV are harder. How do you let your user know that their CSV data is incomplete?&lt;/p&gt;
&lt;p&gt;And what if someone's connection drops - are they definitely going to notice that they are missing something, or will they assume that the truncated file is all of the data?&lt;/p&gt;
&lt;h4&gt;Challenge: resumable downloads&lt;/h4&gt;
&lt;p&gt;If a user is paginating through your API, they get resumability for free: if something goes wrong they can start again at the last page that they fetched.&lt;/p&gt;
&lt;p&gt;Resuming a single stream is a lot harder.&lt;/p&gt;
&lt;p&gt;The &lt;a href="https://developer.mozilla.org/en-US/docs/Web/HTTP/Range_requests"&gt;HTTP range mechanism&lt;/a&gt; can be used to provide resumable downloads against large files, but it only works if you generate the entire file in advance.&lt;/p&gt;
&lt;p&gt;There is a way to design APIs to support this, provided the data in the stream is in a predictable order (which it has to be if you're using keyset pagination, described above).&lt;/p&gt;
&lt;p&gt;Have the endpoint that triggers the download take an optional &lt;code&gt;?since=&lt;/code&gt; parameter, like this:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;GET /stream-everything?since=b24ou34
[
    {"id": "m442ecc", "name": "..."},
    {"id": "c663qo2", "name": "..."},
    {"id": "z434hh3", "name": "..."},
]
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Here the &lt;code&gt;b24ou34&lt;/code&gt; is an identifier - it can be a deliberately opaque token, but it needs to be served up as part of the response.&lt;/p&gt;
&lt;p&gt;If the user is disconnected for any reason, they can start back where they left off by passing in the last ID that they successfully retrieved:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;GET /stream-everything?since=z434hh3
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;This still requires some level of intelligence from the client application, but it's a reasonably simple pattern both to implement on the server and as a client.&lt;/p&gt;
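&lt;p&gt;The client side is a simple retry loop. Here's a hypothetical sketch against that imaginary &lt;code&gt;/stream-everything&lt;/code&gt; endpoint, assuming for simplicity that it serves newline-delimited JSON rather than one big array:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;# Hypothetical client sketch: resume from the last id seen if the connection
# drops. The endpoint and response format are assumptions, not a real API.
import json
import urllib.error
import urllib.request

BASE = "https://api.example.com/stream-everything"  # hypothetical endpoint
last_id = None

while True:
    url = BASE if last_id is None else f"{BASE}?since={last_id}"
    try:
        with urllib.request.urlopen(url) as resp:
            for line in resp:
                record = json.loads(line)
                print(record)              # placeholder for real processing
                last_id = record["id"]     # remember where we got to
        break                              # stream finished cleanly
    except (urllib.error.URLError, ConnectionError):
        continue                           # connection dropped: resume from last_id
&lt;/code&gt;&lt;/pre&gt;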
&lt;h4&gt;Easiest solution: generate and return from cloud storage&lt;/h4&gt;
&lt;p&gt;It seems the most robust way to implement this kind of API is the least technically exciting: spin off a background task that generates the large response and pushes it to cloud storage (S3 or GCS), then redirect the user to a signed URL to download the resulting file.&lt;/p&gt;
&lt;p&gt;This is easy to scale, gives users complete files with content-length headers that they know they can download (and even resume-downloading, since range headers are supported by S3 and GCS). It also avoids any issues with server restarts caused by long connections.&lt;/p&gt;
&lt;p&gt;This is how Mixpanel handle their export feature, and it's &lt;a href="https://seancoates.com/blogs/lambda-payload-size-workaround"&gt;the solution Sean Coates came to&lt;/a&gt; when trying to find a workaround for the AWS Lambda/API Gateway response size limit.&lt;/p&gt;
&lt;p&gt;If your goal is to provide your users a robust, reliable bulk-export mechanism for their data, export to cloud storage is probably the way to go.&lt;/p&gt;
&lt;p&gt;But streaming dynamic responses are a really neat trick, and I plan to keep exploring them!&lt;/p&gt;
    
        &lt;p&gt;Tags: &lt;a href="https://simonwillison.net/tags/apis"&gt;apis&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/scaling"&gt;scaling&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/streaming"&gt;streaming&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/asgi"&gt;asgi&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/http-range-requests"&gt;http-range-requests&lt;/a&gt;&lt;/p&gt;
    

</summary><category term="apis"/><category term="scaling"/><category term="streaming"/><category term="asgi"/><category term="http-range-requests"/></entry><entry><title>Hosting SQLite databases on Github Pages</title><link href="https://simonwillison.net/2021/May/2/hosting-sqlite-databases-on-github-pages/#atom-tag" rel="alternate"/><published>2021-05-02T18:55:36+00:00</published><updated>2021-05-02T18:55:36+00:00</updated><id>https://simonwillison.net/2021/May/2/hosting-sqlite-databases-on-github-pages/#atom-tag</id><summary type="html">
    
&lt;p&gt;&lt;strong&gt;&lt;a href="https://phiresky.github.io/blog/2021/hosting-sqlite-databases-on-github-pages/"&gt;Hosting SQLite databases on Github Pages&lt;/a&gt;&lt;/strong&gt;&lt;/p&gt;
I've seen the trick of running SQLite compiled to WASM in the browser &lt;a href="https://sql.js.org/"&gt;before&lt;/a&gt;, but here it comes with an incredibly clever bonus trick: it uses SQLite's page structure to fetch subsets of the database file via HTTP range requests, which means you can run indexed SQL queries against a 600MB database file while only fetching a few MBs of data over the wire. Absolutely brilliant. Tucked away at the end of the post is another neat trick: making the browser DOM available to SQLite as a virtual table, so you can query and update the DOM of the current page using SQL!

    &lt;p&gt;&lt;small&gt;Via &lt;a href="https://news.ycombinator.com/item?id=27016630"&gt;Hacker News&lt;/a&gt;&lt;/small&gt;&lt;/p&gt;


    &lt;p&gt;Tags: &lt;a href="https://simonwillison.net/tags/sqlite"&gt;sqlite&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/webassembly"&gt;webassembly&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/http-range-requests"&gt;http-range-requests&lt;/a&gt;&lt;/p&gt;



</summary><category term="sqlite"/><category term="webassembly"/><category term="http-range-requests"/></entry><entry><title>Random access to Web audio</title><link href="https://simonwillison.net/2004/May/18/random/#atom-tag" rel="alternate"/><published>2004-05-18T22:03:45+00:00</published><updated>2004-05-18T22:03:45+00:00</updated><id>https://simonwillison.net/2004/May/18/random/#atom-tag</id><summary type="html">
    
&lt;p&gt;&lt;strong&gt;&lt;a href="http://weblog.infoworld.com/udell/2004/05/18.html#a1003"&gt;Random access to Web audio&lt;/a&gt;&lt;/strong&gt;&lt;/p&gt;
Standard MP3s can be randomly accessed using HTTP’s range header—without any extra server software.


    &lt;p&gt;Tags: &lt;a href="https://simonwillison.net/tags/http-range-requests"&gt;http-range-requests&lt;/a&gt;&lt;/p&gt;



</summary><category term="http-range-requests"/></entry></feed>