Simon Willison’s Weblog

February 2003

Feb. 3, 2003

Mechanize the web

Via Keith Devens, Screen-scraping with WWW::Mechanize describes how Perl’s WWW::Mechanize module can be used to grab information from sites that require a user login. I’ve always dismissed screen scraping as something of a wasted effort, given that a major rewrite of the scraper is required whenever the target site tweaks its HTML. This article has encouraged me to reconsider, because some of the functionality in WWW::Mechanize is fantastic:

[... 262 words]

Feb. 4, 2003

Vellum on Windows

Via Paul Freeman, detailed instructions for installing Stuart’s Vellum Python blogging system on Windows using either IIS or Apache.

More on screen scraping

In response to yesterday’s screen scraping post, Richard Jones describes a screen scraping technique that uses PyWebPerf, a Python performance measuring tool.

[... 80 words]

Feb. 5, 2003

Zeldman and definition lists

I’m really liking Jeffrey Zeldman’s latest redesign. Aside from a pretty face, the markup holds some interesting ideas as well. For example, I’ve never seen a definition list used for a blogroll style list before:

[... 194 words]

The slashdot effect

Dave Winer asks why Joel Spolsky gets much more traffic when slashdotted than UserLand’s hosted sites tend to. Joel explains (it’s all down to network effects) and mpt kicks in a few ideas as well.

More YAML

Paul Tchistopolskii’s XML Alternatives reminded me to take another look at YAML. The specification has been updated since I last looked and seems to be a bit more complicated, but it’s still a very nicely designed format. Implementations are available for Perl, Python and Ruby with C and Java on the way but strangely no one seems to be doing one for PHP yet. I’m doing a course at Uni on compilers at the moment which includes quite a lot of stuff about writing parsers so I’m very tempted to have a go at a YAML implementation in the next few weeks just to try stuff out. The possibility of easily swapping relatively complex data structures between PHP and Python is pretty tempting as well.

Enhanced textareas

Via Leonard Lin, a nice demonstration of an enhanced HTML text area (with buttons to add tags) that works in IE, Mozilla and Phoenix. Until recently this had not been possible thanks to a long-standing Mozilla bug.

Feb. 6, 2003

Better mouse gestures

Optimoz have released Version 0.3.5 Release Candidate 3 of their mouse gestures add-on for Mozilla based browsers. I hadn’t tried the version 0.3.5 series before and the improvements are impressive to say the least:

[... 171 words]

A better phoenix icon

And over on Blogzilla, Lim Chee Aun has finally solved the niggling bug with Phoenix 0.5 on Windows where the icon shown in the taskbar is an ugly default Windows image.

Feb. 7, 2003

Meetup needs work

It looks like Scott got burned by a PHP MeetUp arranged at an out of business restaurant that then failed to materialise at all. From his comments it seems like he’s not the only person to hit problems. I have yet to attend a meetup (the Bristol UK ones rarely have anyone sign up for them) but I love the concept, so it’s a real shame to hear about problems like this. Hopefully the MeetUp team are working on ways to stop this kind of thing from happening—some kind of short-lived email mailing list for each location/event might go some way to ensuring people who sign up for them know what’s going on and bother to show up. At least their recent changes page shows that they have been actively trying to improve the overall experience.

Real girls eat beef

Cool-2B-Real is a site for teenage girls. Real girls are “keepin’ it real” by building strong bodies and strong minds... and they’re feeling great about themselves! It has health and fitness tips, tips on feeling good about yourself, a poll (“What type of beef do you most like to eat with your friends?”), and a set of Smart Snackin’ recipes such as Nacho Beef Dip, Beef on Bamboo, Easy Beef Chili and Roast Beef and Veggie Wrap. And beef games too.

[... 110 words]

Help needed

Does anyone know if it is possible to find an HTML element’s exact position on the page (in terms of pixels-from-the-top and pixels-from-the-left) using javascript? The element I have in mind is an image that has not had any positioning applied to it, but I imagine any solution will work for other elements as well. I need something that works in Mozilla/Phoenix, although a solution for IE would be nice as well. It’s for a bookmarklet I’m thinking of writing.

Feb. 8, 2003

Image Drag bookmarklet

I got a good response to yesterday’s call for help on finding an HTML element’s co-ordinates on a page. I ended up using PPK’s findPos functions which seemed to do the trick just fine.

[... 338 words]

Hashing client-side data

Via Scott, a clever PHP technique for ensuring data sent to the browser as a cookie or hidden form variable isn’t tampered with by the user:

[... 248 words]
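
The details of the linked PHP technique are not reproduced in this excerpt, so what follows is only a minimal sketch of one common way to get the same guarantee, written in Python rather than PHP. It appends an HMAC of the value, computed with a secret that never leaves the server, and rejects anything whose signature does not match. The names `SECRET_KEY`, `sign` and `verify` are hypothetical, and the choice of HMAC-SHA256 is an assumption rather than what the original post used.

```python
import hashlib
import hmac
from typing import Optional

# Hypothetical server-side secret; it would normally live in configuration
# and must never be sent to the browser.
SECRET_KEY = b"change-me"


def _digest(value: str) -> str:
    """Return the hex HMAC-SHA256 of value under the server secret."""
    return hmac.new(SECRET_KEY, value.encode("utf-8"), hashlib.sha256).hexdigest()


def sign(value: str) -> str:
    """Return 'value|signature', suitable for a cookie or hidden form field."""
    return "{}|{}".format(value, _digest(value))


def verify(signed: str) -> Optional[str]:
    """Return the original value if the signature matches, otherwise None."""
    value, separator, signature = signed.rpartition("|")
    if not separator:
        return None  # no signature present at all
    # compare_digest avoids leaking information through timing differences.
    if hmac.compare_digest(signature.encode("utf-8"), _digest(value).encode("utf-8")):
        return value
    return None
```

A value that comes back from the browser is passed through `verify` before it is trusted. If the user has edited either the value or the signature, `verify` returns `None`. Note that this only detects tampering: the value itself is still readable by the user.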

pngcrush

Mark invoked the lazy web earlier today in a bid to find a good way of bulk optimizing PNG files. Several people recommended pngcrush in the comments and it sounds like a fantastically useful piece of software: apparently it can run 114 different lossless compression methods on an image and automatically choose the most efficient one.
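
The "try every method and keep the smallest result" idea can be illustrated with a rough sketch using Python's standard `zlib` module. This is not pngcrush's code: it only varies the deflate level and strategy on a raw byte string, whereas pngcrush also varies PNG-specific filter types. The function name `smallest_deflate` and the parameter grid are assumptions made for illustration.

```python
import zlib
from itertools import product
from typing import Tuple

# Strategies to try; all of them are lossless, they only change how hard
# zlib searches for matches and how it encodes them.
STRATEGIES = (zlib.Z_DEFAULT_STRATEGY, zlib.Z_FILTERED, zlib.Z_HUFFMAN_ONLY)


def smallest_deflate(data: bytes) -> Tuple[Tuple[int, int], bytes]:
    """Compress data with every (level, strategy) pair and keep the smallest."""
    best_params = None
    best_output = None
    for level, strategy in product(range(1, 10), STRATEGIES):
        compressor = zlib.compressobj(
            level, zlib.DEFLATED, zlib.MAX_WBITS, zlib.DEF_MEM_LEVEL, strategy
        )
        output = compressor.compress(data) + compressor.flush()
        if best_output is None or len(output) < len(best_output):
            best_params, best_output = (level, strategy), output
    return best_params, best_output
```

Every candidate decompresses to exactly the same bytes, so picking the shortest one costs nothing but CPU time, which is why this brute-force approach suits a batch job.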

Feb. 11, 2003

Nice titles

Stuart has posted yet another unobtrusive DHTML gem: Nice Titles, inspired by a thread on web graphics.

Validity would be nice

In-Valids is an enjoyable rant by Joe Clark chastising the big guys on the web for being completely incapable of producing valid HTML.

Label elements

Peter Van Dijck asks why does hardly anyone use LABEL tags? It’s a very good question. In my opinion label tags, like title attributes on links, are a complete no-brainer. They’re well supported by all modern browsers, completely backwards compatible (in that there are no ill effects for older browsers), great for accessibility and easy to implement. They’re much more than just an accessibility issue: the usability of a form is dramatically increased by the addition of label tags, especially for radio buttons and checkboxes, where they greatly increase the “target area” for the user to click on.

[... 235 words]

Indexing hypertext

Dorothea Salo explains the thorny problem of indexing (the back-of-a-book kind rather than the search-engine-spider kind) marked up electronic documents. Another example of what my first year software engineering lecturer would call a “wicked problem”.

Feb. 13, 2003

Image Drag bookmarklet fixed

Boris Zbarsky offered a fix for my image drag bookmarklet’s problems in Strict doctype pages. The problem was due to Mozilla, when operating in strict mode, refusing to absolutely position elements that don’t have a unit of measurement specified. The bookmarklet now works perfectly on pretty much every page I’ve tried it on.

[... 61 words]

Feb. 15, 2003

Classes for pages

This weekend I started work on my latest web project, further details of which will no doubt follow soon. For the moment I’ll just say that it follows the classic news/articles/users with logins model—basically another small-to-medium sized PHP content management system.

[... 482 words]

micro_httpd

micro_httpd is a very small Unix-based HTTP server—so small in fact that it is implemented in just 150 lines of C. From the perspective of a relative C newbie the code makes fun reading.

Agent Frank

l.m.orchard has released the code for his oft-discussed personal web proxy in the form of Agent Frank. It looks really neat, but unfortunately as it’s written in Java and I don’t have space on my shiny new Linux install to get Java set up I can’t play with it yet (looks like I’ll have to finally shell out for the new hard drive I’ve been promising myself). Cute logo though :)

Feb. 16, 2003

Google acquire Blogger

Lots of analysis around the blogosphere today of Google’s surprise acquisition of Blogger. Cory Doctorow’s analysis is (in my opinion) especially worth reading. Personally, I just hope Google do something about Blogger’s revolting archive URLs :)

Eric Meyer’s colour blender

Eric Meyer’s Color Blender is an incredibly useful tool for picking colours for a web site. Give it two different hexadecimal colour codes and it will calculate and display between 1 and 10 “midpoint” colours. It’s fun to play with and great for tracking down that elusive perfect shade of green...
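
The arithmetic behind midpoint colours is easy to sketch. The Python below assumes straightforward linear interpolation of each RGB channel. The real Color Blender is a JavaScript tool and may round or format its results differently, and the function names `parse_hex` and `blend` are hypothetical.

```python
from typing import List, Tuple


def parse_hex(colour: str) -> Tuple[int, int, int]:
    """Convert 'RRGGBB' or '#RRGGBB' into an (r, g, b) tuple of integers."""
    colour = colour.lstrip("#")
    return (int(colour[0:2], 16), int(colour[2:4], 16), int(colour[4:6], 16))


def blend(start: str, end: str, midpoints: int) -> List[str]:
    """Return `midpoints` evenly spaced colours strictly between start and end."""
    a, b = parse_hex(start), parse_hex(end)
    steps = midpoints + 1  # number of gaps between start and end
    result = []
    for step in range(1, steps):
        # Interpolate each channel separately, then round to a whole number.
        channels = [round(x + (y - x) * step / steps) for x, y in zip(a, b)]
        result.append("#{:02x}{:02x}{:02x}".format(*channels))
    return result
```

For example, `blend("#000000", "#0000ff", 2)` returns `["#000055", "#0000aa"]`: the blue channel moves from 0 to 255 in three equal steps of 85.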

SQL slammer analysed

Robert Graham’s analysis of SQL Slammer cleared up quite a few things I had been wondering about the worm. It confirms that the majority of the infections were caused not by SQL Server (as reported widely by the press) but by the embedded MSDE component, which is far less likely to be patched (or firewalled off from the public internet) than SQL Server.

[... 128 words]

Feb. 20, 2003

DNS mess

As the recent lack of updates demonstrates, I’ve been getting stuck into a pretty time-consuming new project. It should have launched several days ago but I made a right royal hash of the DNS settings. Hopefully everything will be working fine in about 24 hours’ time.

[... 119 words]

Calendars and crawlers

Douglas Bowman has been having some amusing problems with robots and his calendar. The calendar, visible on every page of the site, automatically adds a “next month” and “previous month” link to allow surfers to browse through the archive in both directions. Unfortunately, Doug omitted the logic to stop showing a “previous month” link when there were no earlier entries. An enterprising crawler started following the links, and didn’t stop until it had reached 1542!

[... 113 words]
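
The missing check is small. Here is a minimal sketch in Python, assuming the site knows the dates of its earliest and latest entries. The function names and the example dates are hypothetical, and this is not Doug's actual code. A link is only produced when the target month falls within the range that actually contains entries.

```python
from datetime import date
from typing import Optional, Tuple


def shift_month(month: date, delta: int) -> date:
    """Return the first day of the month `delta` months away from `month`."""
    index = month.year * 12 + (month.month - 1) + delta
    return date(index // 12, index % 12 + 1, 1)


def calendar_links(
    current: date, earliest: date, latest: date
) -> Tuple[Optional[date], Optional[date]]:
    """Return (previous, next) months to link to, with None meaning 'no link'."""
    current = current.replace(day=1)
    previous_month = shift_month(current, -1)
    next_month = shift_month(current, 1)
    # Only link to months that can actually contain entries.
    previous = previous_month if previous_month >= earliest.replace(day=1) else None
    following = next_month if next_month <= latest.replace(day=1) else None
    return previous, following
```

With a first entry in June 2002, the page for June 2002 gets no “previous month” link at all, so a crawler following links runs out of calendar instead of wandering back to the sixteenth century.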

Get a better browser!

Via Scott, this oh-so-true quote from a Microsoft “next-generation technology” consultant:

[... 377 words]

Watch out for Javascript in referrals

Here’s a good reminder why you should always encode < and > as HTML entities when displaying content from an untrusted (i.e. external) source: Kasia in a nutshell was hit by a false referrer containing javascript deliberately aimed at hijacking the page the referrer was displayed on:

[... 76 words]
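
As a minimal sketch of the defence, here is how a referrer list item could be rendered in Python using the standard library's `html.escape`. The function name `render_referrer` and the surrounding `<li>` markup are assumptions for illustration; the same principle applies in whatever language generates the page.

```python
import html


def render_referrer(referrer: str) -> str:
    """Return an HTML list item that displays an untrusted referrer safely."""
    # html.escape converts &, < and > (and, by default, both quote characters)
    # into entities, so the browser displays the text instead of parsing it.
    return "<li>{}</li>".format(html.escape(referrer))
```

A referrer containing `<script>` then shows up on the page as literal text rather than running in the visitor's browser.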
