Simon Willison’s Weblog

Blogmarks tagged http

Filters: Type: blogmark × http ×


Async Support—HTTPX (via) HTTPX is the new async-friendly HTTP library for Python spearheaded by Tom Christie. It works in both async and non-async mode with an API very similar to requests. The async support is particularly interesting—it’s a really clean API, and now that Jupyter supports top-level await you can run ’(await httpx.AsyncClient().get(url)).text’ directly in a cell and get back the response. Most excitingly the library lets you pass an ASGI app directly to the client and then perform requests against it—ideal for unit tests. # 10th January 2020, 4:49 am

Usage of ARIA attributes via HTTP Archive. A neat example of a Google BigQuery query you can run against the HTTP Archive public dataset (a crawl of the “top” websites run periodically by the Internet Archive, which captures the full details of every resource fetched) to see which ARIA attributes are used the most often. Linking to this because I used it successfully today as the basis for my own custom query—I love that it’s possible to analyze a huge representative sample of the modern web in this way. # 12th July 2018, 3:16 am

How Balanced does Database Migrations with Zero-Downtime. I’m fascinated by the idea of “pausing” traffic during a blocking site maintenance activity (like a database migration) and then un-pausing when the operation is complete—so end clients just see some of their requests taking a few seconds longer than expected. I first saw this trick described by Braintree. Balanced wrote about a neat way of doing this just using HAproxy, which lets you live reconfigure the maxconns to your backend down to zero (causing traffic to be queued up) and then bring the setting back up again a few seconds later to un-pause those requests. # 7th November 2017, 11:36 am

On HTTP Load Testing. Mark Nottingham explains that running good HTTP benchmarks means understanding available network bandwidth, using dedicated physical hardware, testing at progressively higher loads and a whole lot more. # 18th May 2011, 10:17 am

The Inside Story of How Facebook Responded to Tunisian Hacks (via) “By January 5, it was clear that an entire country’s worth of passwords were in the process of being stolen right in the midst of the greatest political upheaval in two decades.”—which is why you shouldn’t serve your login form over HTTP even though it POSTs over HTTPS. # 24th January 2011, 6:06 pm

gzip support for Amazon Web Services CloudFront. This would have saved me a bunch of work a few weeks ago. CloudFront can now be pointed at your own web server rather than S3, and you can ask it to forward on the Accept-Encoding header and cache multiple content versions based on the result. # 12th November 2010, 5:33 am

LWPx::ParanoidAgent. Every programming language needs an equivalent of this library—a robust, secure way to make HTTP requests against URLs from untrusted sources without risk of tarpits, internal network access, socket starvation, weird server errors, or other nastiness. # 31st August 2010, 2:30 am

nodejitsu’s node-http-proxy (via) Exactly what I’ve been waiting for—a robust HTTP proxy library for Node that makes it trivial to proxy requests to a backend with custom proxy behaviour added in JavaScript. The example app adds an artificial delay to every request to simulate a slow connection, but other exciting potential use cases could include rate limiting, API key restriction, logging, load balancing, lint testing and more besides. # 28th July 2010, 11:34 pm

python/trunk/Lib/httplib.py in 1994 (via) Python’s original httplib implementation, checked in by Guido 16 years and 4 months ago. Not much younger than the Web itself. # 4th July 2010, 11:25 pm

Mongrel2 is “Self-Hosting”. Zed Shaw’s Mongrel2 is shaping up to be a really interesting project. “A web server simply written in C that loves all languages equally”, the two most interesting new ideas are the ability to handle HTTP, Flash Sockets and WebSockets all on the same port (thanks to an extension to the Mongrel HTTP parser that can identify all three protocols) and the ability to hook Mongrel2 up to the backend servers using either TCP/IP or ZeroMQ. I’m guessing this means Mongrel2 could hold an HTTP request open, fire off some messages and wait for various backends to send messages back to construct the response, making async processing just as easy as a regular blocking request/response cycle. # 17th June 2010, 8:11 pm

ElasticSearch memcached module. Fascinating idea: the ElasticSearch search server provides an optional memcached protocol plugin for added performance which maps simple HTTP to memcached. GET is mapped to memcached get commands, POST is mapped to set commands. This means you can use any memcached client to communicate with the search server. # 15th May 2010, 10:17 am

A HTTP Proxy Server in 20 Lines of node.js. Proxying is definitely a sweet spot for Node.js. Peteris Krummins takes it a step further, adding host blacklists and an IP whitelist as configuration files and using Node’s watchFile method to automatically reload changes to them. # 28th April 2010, 1:24 pm

Introduction to nginx.conf scripting. Slideshow—hit left arrow to navigate through the slides. The nginx community is officially nuts. Starts out with a simple “Hello world” using the echo module, then rapidly descends down the rabbit hole in to array operations, sub-requests, memcached connection pooling and eventually non-blocking Drizzle SQL execution against a sharded cluster—all implemented in the nginx.conf configuration file. # 21st April 2010, 11:40 pm

Side-Channel Leaks in Web Applications. Interesting new security research. SSL web connections encrypt the content but an attacker can still see the size of the HTTP requests going back and forward—which can be enough to extract significant pieces of information, especially in applications that make a lot of Ajax requests. # 23rd March 2010, 4:24 pm

Node.js, redis, and resque (via) Paul Gross has been experimenting with Node.js proxies for allowing web applications to be upgraded without missing any requests. Here he places all incoming HTTP requests in a redis queue, then has his backend Rails servers consume requests from the queue and push the responses back on to a queue for Node to deliver. When the backend application is upgraded, requests remain in the queue and users see a few seconds of delay before their request is handled. It’s not production ready yet (POST requests aren’t handled, for example) but it’s a very interesting approach. # 28th February 2010, 11:02 pm

Elastic Search (via) Solr has competition! Like Solr, Elastic Search provides a RESTful JSON HTTP interface to Lucene. The focus here is on distribution, auto-sharding and high availability. It’s even easier to get started with than Solr, partly due to the focus on providing a schema-less document store, but it’s currently missing out on a bunch of useful Solr features (a web interface and faceting are the two that stand out). The high availability features look particularly interesting. UPDATE: I was incorrect, basic faceted queries are already supported. # 11th February 2010, 6:33 pm

HTTP + Politics = ? Mark Nottingham ponders the technical implications of Australia’s decision to apply a filter to all internet traffic. Australia is large enough (and far enough away from the northern hemisphere) that the speed of light is a performance issue, but filtering technologies play extremely poorly with optimisation technologies such as HTTP pipelining and Google’s SPDY proposal. # 15th December 2009, 3:36 pm

Real time online activity monitor example with node.js and WebSocket. A neat exploration of Node.js—first hooking a “tail -f” process up to an HTTP push stream, then combining that with HTML 5 WebSockets to achieve reliable streaming. # 8th December 2009, 11:07 pm

Language Detection: A Witch’s Brew? The Flickr team make the case for using the Accept-Language header over IP detection to pick a site’s language, with a simple UI for switching languages in case you get it wrong. They’ve been using this for two and a half years without any significant problems. # 5th December 2009, 5:30 pm

SPDY: The Web, Only Faster. Alex Russell explains the benefits of Google’s SPDF proposal (a protocol that upgrades HTTP)—including header compression, multiplexing, the ability to send additional resources such as images and stylesheets down without needing the data:uri hack and Comet support built in to the core assumptions of the protocol. # 13th November 2009, 1 pm

node.js. “Evented I/O for V8 JavaScript”—a JavaScript environment built on top of the super-fast V8 engine which provides event-based IO functionality for building highly concurrent TCP and HTTP servers. The API design is superb—everything is achieved using JavaScript events and callbacks (even regular file IO) and the small standard library ships with comprehensive support for HTTP and DNS. Overall it’s very similar to Twisted and friends, but JavaScript’s anonymous function syntax feels more natural than the Python equivalent. It compiles cleanly on Snow Leopard. Definitely a project to watch. # 9th November 2009, 11:25 pm

Traffic Server. Mark Nottingham explains the release of Traffic Server, a new Apache Incubator open source project donated by Yahoo! using code originally developed at Inktomi around a decade ago. Traffic Server is a HTTP proxy/cache, similar to Squid and Varnish (though Traffic Server acts as both a forward and reverse proxy, whereas Varnish only handles reverse). # 1st November 2009, 12:15 pm

High-end Varnish-tuning. Tuning the Varnish HTTP cache to serve 27K requests/second on a single core 2.2GHz Opteron. # 20th October 2009, 9:25 am

cloud-crowd. New parallel processing worker/job queue system with a strikingly elegant architecture. The central server is an HTTP server that manages job requests, which are farmed out to a number of node HTTP servers which fork off worker processes to do the work. All communication is webhook-style JSON, and the servers are implemented in Sinatra and Thin using a tiny amount of code. The web-based monitoring interface is simply beautiful, using canvas to display graphs showing the system’s overall activity. # 21st September 2009, 11:09 pm

PostBin. Handy debugging tool for webhooks—create a TinyURL-style URL, then see a log of any POST requests made to that address. # 21st September 2009, 11:03 pm

PubSub-over-Webhooks with RabbitHub. RabbitMQ, the Erlang-powered AMQP message queue, is growing an HTTP interface based on webhooks and PubSubHubBub. # 1st July 2009, 8:22 pm

The Resource Expert Droid. Like the HTML Validator but for your server’s HTTP headers—extremely useful. # 25th June 2009, 10:06 am

Styling buttons to look like links. Nat has a neat trick for styling submit buttons to look like regular links—so there’s absolutely no excuse for using a “delete” link when you should be using a POST request. # 10th June 2009, 2:11 am

A rev=“canonical” HTTP Header. Chris Shiflett proposes optionally exposing rev=canonical information in an HTTP header, thus allowing sites to discover shorter URLs using just a HEAD request and removing the need to parse HTML. The pingback specification also uses this shortcut. # 12th April 2009, 12:33 pm

Concurrence. Exciting: a Python framework for “creating massively concurrent network applications” (the tutorial benchmarks a Hello World web server at over 8,000 requests a second). It’s implemented on top of libevent using pyrex, can run on either Stackless Python or Greenlets from the py library and ships with a WSGI server, an HTTP client and a DBAPI 2.0 compliant MySQL driver. # 15th March 2009, 1:28 pm