Simon Willison’s Weblog

HttpClient PHP class

6th April 2003

I’ve been working in quite a roundabout fashion recently. My principal target is to build a collaborative blogging system. As part of this, I needed an RSS aggregator to allow a single blog to show the most recent entries from a number of other, related blogs. Then I needed a way of downloading RSS feeds from external sites. While thinking about this (although to be fair it’s pretty much a solved problem) I was inspired to build something that could cache whole sites. And that led me to need a PHP HTTP client class for retrieving information from the web. So I wrote one of those :)

HttpClient is similar in some ways to Snoopy (which I have been using and recommending for years) but takes a different approach and includes some interesting new features. Firstly, while Snoopy contains a bunch of code for parsing HTML (to extract forms, links and the like), HttpClient concentrates purely on the HTTP side of things, leaving HTML parsing to other classes. Secondly, HttpClient supports gzip encoding. And finally, HttpClient is designed to be used multiple times in a single session, and will store and resend cookies and referrer information between requests.

The HttpClient site has example code, a manual and a demo which shows the client accessing Amazon.com with debug mode turned on.

As an aside, I learnt a couple of useful things about HTTP while putting the class together, both of them from reading comments in the PHP Manual. Firstly, HTTP 1.1 is best avoided from a scripting point of view: it requires support for chunked encoding if you want to avoid random hex (the chunk sizes) being mixed in with your content, and provides no practical advantages over HTTP 1.0 (cookies, gzip encoding and the all-important Host: header work just fine without it). Secondly, if you want to uncompress gzip encoded content from an HTTP response you need to remove the first 10 characters (the gzip header) before running the gzinflate() function or it will fail with a mysterious error.
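
To make those two points concrete, here is a minimal sketch in PHP. It is not code from HttpClient itself: the function names build_http10_request() and inflate_gzip_body() are hypothetical, made up for illustration. It assumes the gzip header is the plain 10-byte form with no optional fields (which is what gzencode() produces), and it also trims the 8-byte gzip trailer, a precaution that goes beyond the 10-character rule above.

```php
<?php
// Hypothetical helper: build a bare HTTP/1.0 GET request.
// Host and Accept-Encoding are ordinary headers, so they can be sent
// with HTTP/1.0, and a 1.0 response will not use chunked encoding.
function build_http10_request(string $host, string $path): string
{
    return "GET {$path} HTTP/1.0\r\n"
         . "Host: {$host}\r\n"
         . "Accept-Encoding: gzip\r\n"
         . "Connection: close\r\n"
         . "\r\n";
}

// Hypothetical helper: inflate a gzip encoded response body.
// Assumption: the header is the fixed 10-byte form with no optional
// fields. The last 8 bytes (CRC32 and length) are dropped as well,
// which is an extra precaution not mentioned in the post.
function inflate_gzip_body(string $body)
{
    return gzinflate(substr($body, 10, -8));
}

// Round trip using gzencode() to stand in for a server's gzip response.
$original = str_repeat('Hello, HTTP! ', 20);
$gzipped  = gzencode($original);
$restored = inflate_gzip_body($gzipped);

$request = build_http10_request('example.com', '/');
```

If the server sets optional gzip header fields (a filename, for instance) the header will be longer than 10 bytes and this sketch will fail. On PHP 5.4 and later, gzdecode() handles the full gzip format and is probably the safer choice.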

This is HttpClient PHP class by Simon Willison, posted on 6th April 2003.

Previously hosted at http://simon.incutio.com/archive/2003/04/06/httpClient