Simon Willison’s Weblog

Subscribe

89 items tagged “http”

2023

Cloudflare does not consider vary values in caching decisions. Here’s the spot in Cloudflare’s documentation where they hide a crucially important detail:

“Cloudflare does not consider vary values in caching decisions. Nevertheless, vary values are respected when Vary for images is configured and when the vary header is vary: accept-encoding.”

This means you can’t deploy an application that uses content negotiation via the Accept header behind the Cloudflare CDN—for example serving JSON or HTML for the same URL depending on the incoming Accept header. If you do, Cloudflare may serve cached JSON to an HTML client or vice-versa.

There’s an exception for image files, which Cloudflare added support for in September 2021 (for Pro accounts only) in order to support formats such as WebP which may not have full support across all browsers. # 20th November 2023, 5:08 am

See this page fetch itself, byte by byte, over TLS (via) George MacKerron built a TLS 1.3 library in TypeScript and used it to construct this amazing educational demo, which performs a full HTTPS request for its own source code over a WebSocket and displays an annotated byte-by-byte representation of the entire exchange. This is the most useful illustration of how HTTPS actually works that I’ve ever seen. # 10th May 2023, 1:58 pm

urllib3 v2.0.0 is now generally available. urllib3 is 12 years old now, and is a common low-level dependency for packages like requests and httpx. The biggest new feature in v2 is a higher-level API: resp = urllib3.request(“GET”, “https://example.com”)—a very welcome addition to the library. # 26th April 2023, 10 pm

2022

RFC 7807: Problem Details for HTTP APIs (via) This RFC has been brewing for quite a while, and is currently in last call (ends 2022-11-03). I’m designing the JSON error messages for Datasette at the moment so this could not be more relevant for me. # 1st November 2022, 3:15 am

Introducing sqlite-http: A SQLite extension for making HTTP requests (via) Characteristically thoughtful SQLite extension from Alex, following his sqlite-html extension from a few days ago. sqlite-http lets you make HTTP requests from SQLite—both as a SQL function that returns a string, and as a table-valued SQL function that lets you independently access the body, headers and even the timing data for the request.

This write-up is excellent: it provides interactive demos but also shows how additional SQLite extensions such as the new-to-me “define” extension can be combined with sqlite-http to create custom functions for parsing and processing HTML. # 10th August 2022, 10:22 pm

curlconverter.com (via) This is pretty magic: paste in a “curl” command (including the ones you get from browser devtools using copy-as-curl) and this will convert that into code for making the same HTTP request... using Python, JavaScript, PHP, R, Go, Rust, Elixir, Java, MATLAB, Ansible URI, Strest, Dart or JSON. # 10th March 2022, 8:12 pm

2021

Hurl (via) Hurl is “a command line tool that runs HTTP requests defined in a simple plain text format”—written in Rust on top of curl, it lets you run HTTP requests and then execute assertions against the response, defined using JSONPath or XPath for HTML. It can even assert that responses were returned within a specified duration. # 22nd November 2021, 3:32 am

New HTTP standards for caching on the modern web (via) Cache-Status is a new HTTP header (RFC from August 2021) designed to provide better debugging information about which caches were involved in serving a request—“Cache-Status: Nginx; hit, Cloudflare; fwd=stale; fwd-status=304; collapsed; ttl=300” for example indicates that Nginx served a cache hit, then Cloudflare had a stale cached version so it revalidated from Nginx, got a 304 not modified, collapsed multiple requests (dogpile prevention) and plans to serve the new cached value for the next five minutes. Also described is $Target-Cache-Control: which allows different CDNs to respond to different headers and is already supported by Cloudflare and Akamai (Cloudflare-CDN-Cache-Control: and Akamai-Cache-Control:). # 21st October 2021, 10:40 pm

2020

Weeknotes: Archiving coronavirus.data.gov.uk, custom pages and directory configuration in Datasette, photos-to-sqlite

I mainly made progress on three projects this week: Datasette, photos-to-sqlite and a cleaner way of archiving data to a git repository.

[... 1132 words]

Async Support—HTTPX (via) HTTPX is the new async-friendly HTTP library for Python spearheaded by Tom Christie. It works in both async and non-async mode with an API very similar to requests. The async support is particularly interesting—it’s a really clean API, and now that Jupyter supports top-level await you can run ’(await httpx.AsyncClient().get(url)).text’ directly in a cell and get back the response. Most excitingly the library lets you pass an ASGI app directly to the client and then perform requests against it—ideal for unit tests. # 10th January 2020, 4:49 am

2018

Usage of ARIA attributes via HTTP Archive. A neat example of a Google BigQuery query you can run against the HTTP Archive public dataset (a crawl of the “top” websites run periodically by the Internet Archive, which captures the full details of every resource fetched) to see which ARIA attributes are used the most often. Linking to this because I used it successfully today as the basis for my own custom query—I love that it’s possible to analyze a huge representative sample of the modern web in this way. # 12th July 2018, 3:16 am

2017

How Balanced does Database Migrations with Zero-Downtime. I’m fascinated by the idea of “pausing” traffic during a blocking site maintenance activity (like a database migration) and then un-pausing when the operation is complete—so end clients just see some of their requests taking a few seconds longer than expected. I first saw this trick described by Braintree. Balanced wrote about a neat way of doing this just using HAproxy, which lets you live reconfigure the maxconns to your backend down to zero (causing traffic to be queued up) and then bring the setting back up again a few seconds later to un-pause those requests. # 7th November 2017, 11:36 am

2014

Whether 404 custom error page necessary for a website?

They aren’t required, but if you don’t have a custom 404 page you’re missing out on a very easy way of improving the user experience of your site, and protecting against expired or incorrect links from elsewhere on the web.

[... 98 words]

2012

What will HTTP be superseded by?

HTTP 1.x will likely never be completely replaced, but there is ongoing work at the moment to define HTTP 2.0. The first draft of this was released in November and is based on Google’s SPDY protocol, which is already widely deployed in Google Chrome and Google’s web properties (other browsers have experimented with support for SPDY as well): http://en.m.wikipedia.org/wiki/H...

[... 122 words]

How can I download a web server’s directory and all subdirectories with one command?

Use wget (you can install it with apt-get install wget)

[... 90 words]

2011

What are the best practices in Node.js to communicate with an existing Java backend?

Node speaks HTTP extremely well, and using HTTP means you can do things like put an HTTP load balancer or cache (such as varnish) between Node and your Java application server at a later date.

[... 55 words]

We can deploy new versions of our software, make database schema changes, or even rotate our primary database server, all without failing to respond to a single request. We can accomplish this because we gave ourselves the ability suspend our traffic, which gives us a window of a few seconds to make some changes before letting the requests through. To make this happen, we built a custom HTTP server and application dispatching infrastructure around Python’s Tornado and Redis.

Dan Manges, Braintree # 30th June 2011, 9:27 pm

On HTTP Load Testing. Mark Nottingham explains that running good HTTP benchmarks means understanding available network bandwidth, using dedicated physical hardware, testing at progressively higher loads and a whole lot more. # 18th May 2011, 10:17 am

The Inside Story of How Facebook Responded to Tunisian Hacks (via) “By January 5, it was clear that an entire country’s worth of passwords were in the process of being stolen right in the midst of the greatest political upheaval in two decades.”—which is why you shouldn’t serve your login form over HTTP even though it POSTs over HTTPS. # 24th January 2011, 6:06 pm

2010

gzip support for Amazon Web Services CloudFront. This would have saved me a bunch of work a few weeks ago. CloudFront can now be pointed at your own web server rather than S3, and you can ask it to forward on the Accept-Encoding header and cache multiple content versions based on the result. # 12th November 2010, 5:33 am

LWPx::ParanoidAgent. Every programming language needs an equivalent of this library—a robust, secure way to make HTTP requests against URLs from untrusted sources without risk of tarpits, internal network access, socket starvation, weird server errors, or other nastiness. # 31st August 2010, 2:30 am

nodejitsu’s node-http-proxy (via) Exactly what I’ve been waiting for—a robust HTTP proxy library for Node that makes it trivial to proxy requests to a backend with custom proxy behaviour added in JavaScript. The example app adds an artificial delay to every request to simulate a slow connection, but other exciting potential use cases could include rate limiting, API key restriction, logging, load balancing, lint testing and more besides. # 28th July 2010, 11:34 pm

python/trunk/Lib/httplib.py in 1994 (via) Python’s original httplib implementation, checked in by Guido 16 years and 4 months ago. Not much younger than the Web itself. # 4th July 2010, 11:25 pm

Mongrel2 is “Self-Hosting”. Zed Shaw’s Mongrel2 is shaping up to be a really interesting project. “A web server simply written in C that loves all languages equally”, the two most interesting new ideas are the ability to handle HTTP, Flash Sockets and WebSockets all on the same port (thanks to an extension to the Mongrel HTTP parser that can identify all three protocols) and the ability to hook Mongrel2 up to the backend servers using either TCP/IP or ZeroMQ. I’m guessing this means Mongrel2 could hold an HTTP request open, fire off some messages and wait for various backends to send messages back to construct the response, making async processing just as easy as a regular blocking request/response cycle. # 17th June 2010, 8:11 pm

ElasticSearch memcached module. Fascinating idea: the ElasticSearch search server provides an optional memcached protocol plugin for added performance which maps simple HTTP to memcached. GET is mapped to memcached get commands, POST is mapped to set commands. This means you can use any memcached client to communicate with the search server. # 15th May 2010, 10:17 am

A HTTP Proxy Server in 20 Lines of node.js. Proxying is definitely a sweet spot for Node.js. Peteris Krummins takes it a step further, adding host blacklists and an IP whitelist as configuration files and using Node’s watchFile method to automatically reload changes to them. # 28th April 2010, 1:24 pm

Introduction to nginx.conf scripting. Slideshow—hit left arrow to navigate through the slides. The nginx community is officially nuts. Starts out with a simple “Hello world” using the echo module, then rapidly descends down the rabbit hole in to array operations, sub-requests, memcached connection pooling and eventually non-blocking Drizzle SQL execution against a sharded cluster—all implemented in the nginx.conf configuration file. # 21st April 2010, 11:40 pm

Side-Channel Leaks in Web Applications. Interesting new security research. SSL web connections encrypt the content but an attacker can still see the size of the HTTP requests going back and forward—which can be enough to extract significant pieces of information, especially in applications that make a lot of Ajax requests. # 23rd March 2010, 4:24 pm

Node.js, redis, and resque (via) Paul Gross has been experimenting with Node.js proxies for allowing web applications to be upgraded without missing any requests. Here he places all incoming HTTP requests in a redis queue, then has his backend Rails servers consume requests from the queue and push the responses back on to a queue for Node to deliver. When the backend application is upgraded, requests remain in the queue and users see a few seconds of delay before their request is handled. It’s not production ready yet (POST requests aren’t handled, for example) but it’s a very interesting approach. # 28th February 2010, 11:02 pm

Elastic Search (via) Solr has competition! Like Solr, Elastic Search provides a RESTful JSON HTTP interface to Lucene. The focus here is on distribution, auto-sharding and high availability. It’s even easier to get started with than Solr, partly due to the focus on providing a schema-less document store, but it’s currently missing out on a bunch of useful Solr features (a web interface and faceting are the two that stand out). The high availability features look particularly interesting. UPDATE: I was incorrect, basic faceted queries are already supported. # 11th February 2010, 6:33 pm