
Simon Willison’s Weblog

Python Client Libraries

Three really useful-looking Python modules: ClientForm, ClientTable and ClientCookie. ClientForm looks like it provides similar functionality to the form handling part of the WWW::Mechanize Perl module, discussed previously. It essentially provides a very simple interface for loading an HTML page, parsing out the form information, then filling in the form and submitting it back to the server. The author recommends it for automated testing (I’ve always had trouble figuring out how to link unit testing into web applications) but I’m sure it could be useful for screen scraping tools as well. ClientTable is an early beta of a powerful-looking table parser, and ClientCookie sits on top of the standard urllib library and transparently persists cookies between requests.
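
As a rough illustration of that load, parse, fill in and submit cycle, here is a minimal sketch using only today's Python standard library. It is not ClientForm's API: `FormParser`, `build_submission`, the sample HTML and the example.com URL are all made up for illustration, and only the `<input>` elements of the first form on the page are handled.

```python
from html.parser import HTMLParser
from urllib.parse import urlencode, urljoin
from urllib.request import Request


class FormParser(HTMLParser):
    """Collect the action, method and input defaults of the first form."""

    def __init__(self):
        super().__init__()
        self.action = ""
        self.method = "GET"
        self.fields = {}
        self._in_form = False
        self._done = False

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "form" and not self._done:
            self._in_form = True
            self.action = attrs.get("action") or ""
            self.method = (attrs.get("method") or "GET").upper()
        elif tag == "input" and self._in_form and attrs.get("name"):
            # Remember each named input along with its default value.
            self.fields[attrs["name"]] = attrs.get("value") or ""

    def handle_endtag(self, tag):
        if tag == "form" and self._in_form:
            self._in_form = False
            self._done = True


def build_submission(page_url, html, values):
    """Parse the form in `html`, fill in `values`, return a Request."""
    parser = FormParser()
    parser.feed(html)
    data = dict(parser.fields)
    data.update(values)
    target = urljoin(page_url, parser.action)
    encoded = urlencode(data)
    if parser.method == "POST":
        return Request(target, data=encoded.encode("ascii"), method="POST")
    if encoded:
        separator = "&" if "?" in target else "?"
        target = target + separator + encoded
    return Request(target)


# Hypothetical page, standing in for HTML fetched from a server.
PAGE = """
<form action="/login" method="post">
  <input type="text" name="user">
  <input type="hidden" name="token" value="abc123">
  <input type="password" name="password">
</form>
"""

request = build_submission(
    "http://example.com/account/",
    PAGE,
    {"user": "simon", "password": "secret"},
)
```

Nothing is sent over the network here; the request would still need to be passed to `urllib.request.urlopen`. Opening it through an opener built with `urllib.request.build_opener(urllib.request.HTTPCookieProcessor(http.cookiejar.CookieJar()))` should also carry cookies from one request to the next, which is roughly the job described for ClientCookie.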

This is Python Client Libraries by Simon Willison, posted on 5th September 2003.



11 comments

  1. Oooh, thanks Simon, this post was quite timely for me, as I'm just "diving in" to unit testing Python web apps... these modules do look useful. I was using a very dumb class to mask a request, which works ok, at least with Zope (which I'm developing within).

    Gina - 5th September 2003 20:54 - #

  2. I'd love to know how you get on with unit testing in that case - I've been wanting to give it a go for months but I've been put off by the relative complexity of testing web and database applications. The best explanations of unit testing I've seen all focus on simple algorithms where a function has an expected input and an expected output, but web applications and database stuff aren't quite that simple.

    Simon Willison - 5th September 2003 21:00 - #

  3. I haven't got on to testing web interfaces yet, but since all of the web apps we create are loosely coupled to their XHTML interface, it's a simple matter of testing the core separately from the interface.

    In order to test databases, I developed a Python shelve caching system whilst creating my Celko/adjacency tree class. Have a look at the blog entry for unit testing databases.

    Tim Parkin - 6th September 2003 17:59 - #

  4. So have you tried my webunit framework? Mentioned it in the comments the last time you looked into these things. It has fetching, cookie handling, a complete DOM parser with extensions for pulling out a form and submitting it. It has browser-like behaviour - you may request that it both follow redirects and load all linked information automatically. All in the unittest framework. http://mechanicalcat.net/tech/webunit/

    Richard Jones - 6th September 2003 23:44 - #

  5. I tried some modules a long time ago. In the end, nothing beat the great httpunit library. It's in Java, and I have to use it from Jython. They've implemented forms, cookies, redirection, and even JavaScript! I believe that none of the Python HTTP client libraries are as complete as httpunit.

    Paulo Eduardo Neves - 8th September 2003 21:18 - #

  6. It implements *javascript*??? Yeah, that trumps webunit. My code does everything else (you didn't mention "load all the stuff referenced from a page like a browser does") but I *assume* httpunit does that.

    Richard Jones - 9th September 2003 08:03 - #

  7. Well, here's how I've been doing it (and this probably isn't the best way):

    I've developed a class which will be instantiated within a Zope template, and (obviously) accesses parameters on the current http request. (Zope makes it easy and accesses cookies, form posts and query string params the same exact way.) So I write a setRequest method for my class. This way, within the Zope template, the object is instantiated and the request object is set to the current http request (available through Zope). Then the object's methods do what they're supposed to do.

    When I write my unit test, I instantiate a request object (a simple dumb wrapper I wrote in only a few minutes) and set all the parameters my web object would expect (or not expect), and then call setRequest(dumbFakeRequest) on the object I'm testing. Then I write my test methods from there.

    It's worked so far.

    Gina - 12th September 2003 23:15 - #

  8. Richard Jones:

    It implements *javascript*??? Yeah, that trumps webunit. My code does everything else (you didn't mention "load all the stuff referenced from a page like a browser does") but I *assume* httpunit does that.

    Hah, you're falling behind, Richard <wink>. Actually, httpunit doesn't implement an interpreter (of course!), it just reuses Mozilla's Rhino JS interpreter. I think HTTPClient is also worth a mention, as are Mozilla, Konqueror and MSIE (all accessible from Python).

    I've just written some JavaScript support for Python. I used Mozilla's other JS implementation, SpiderMonkey. It sort of works, but it's still very early days.

    Of course, Python already does cookies, redirections, META refresh, HTML DOM etc. No nice unified browser-like API yet, though. I'm working on it...

    John - 26th September 2003 18:46 - #

  9. These modules sound like they could be used for reverse-screenscraping, i.e., automated form filling. I'm considering a project that would automate filling out a series of forms on a public web-site for a real estate company. The filings are mandated by law, but the municipality is not interested in creating an interface to allow for loading from other systems - just interactive. It's not an ideal solution, but better than having someone type in hundreds of submissions per day. Has anyone used these Python modules for anything like that?

    Michael Hayes - 30th September 2003 22:52 - #

  10. (rather delayed answer..) Yes. Pretty much exactly that situation, in fact (but many more submissions).

    John - 6th January 2004 18:42 - #

  11. I am using Perl's WWW::Mechanize to automate and screen scrape NYC gov websites (with permission). I can't seem to get one site to allow the scrape, and I think it has to do with JavaScript. What can I use on a non-Linux machine to do this better? Are there any modules for Perl that allow JavaScript?

    stephen - 28th January 2004 02:14 - #
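
The fake-request approach Gina describes in comment 7 can be sketched roughly as below. Only the `setRequest` method name comes from that comment; `FakeRequest`, `Greeter`, the `get` accessor and the `name` parameter are hypothetical stand-ins, and this is not Zope's actual request API.

```python
import unittest


class FakeRequest:
    """Dumb stand-in for the real HTTP request: it just holds parameters."""

    def __init__(self, **params):
        self._params = params

    def get(self, name, default=None):
        # Cookies, form posts and query string params all look the same here.
        return self._params.get(name, default)


class Greeter:
    """Hypothetical web object of the kind instantiated inside a template."""

    def setRequest(self, request):
        self.request = request

    def greeting(self):
        name = self.request.get("name")
        if name:
            return "Hello, %s!" % name
        return "Hello, stranger!"


class GreeterTest(unittest.TestCase):
    def make(self, **params):
        # Build the object under test and hand it a fake request.
        obj = Greeter()
        obj.setRequest(FakeRequest(**params))
        return obj

    def test_expected_parameter(self):
        self.assertEqual(self.make(name="Gina").greeting(), "Hello, Gina!")

    def test_missing_parameter(self):
        self.assertEqual(self.make().greeting(), "Hello, stranger!")
```

Saved as a module, the tests can be run with `python -m unittest`. Because the object only ever talks to whatever was passed to `setRequest`, the same class should work unchanged whether it is given the real request or the fake one.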

Comments are closed.

Previously hosted at http://simon.incutio.com/archive/2003/09/05/pythonClientLibraries
