Simon Willison’s Weblog

Entries in 2004

Some notes on Wikipedia

I’ve been driving myself crazy with coursework over the past couple of weeks, and since it’s always good to have something to take your mind off things I’ve also been spending a fair amount of time lurking around the beautiful Wikipedia. Here are a few things about Wikipedia you may have missed:

[... 509 words]

A quote

From What’s next for Google, by Charles H. Ferguson.

[... 92 words]

Google Print

I’m probably late to the party on this one, but I just noticed that Google Print results are now included in any Google search that starts with “books on”. I can’t say I like the lousy discoverability of the interface much—a search box at would be a welcome addition—but the results are pretty impressive. It’s also a shame that they’re using a nasty obfuscation technique to disable copying and printing (based on serving book pages up as background images), if only because it will fuel yet more questions from newbie web developers asking how to do exactly that. Still, with today’s announcement that Google are to team up with five leading libraries to scan more books this service is going to get a whole lot more important over the next few years.

[... 153 words]

Casting out getters and setters

Python Is Not Java by Phillip J. Eby (via Ned) is the most useful article on programming I’ve read in ages. If you have any interest at all in either language, go and read it. It’s all good, but the part that really struck a nerve for me was this:

[... 221 words]

Blogmarks on

I’m horribly ill again: having defeated the mumps I now seem to have come down with some kind of ’flu thing. Lovely. In between whinging about my state of health and watching episodes of Frasier I’ve been playing with as part of my research into web annotation. The connection between the two isn’t particularly strong but it’s clear that something very exciting is happening over there.

[... 276 words]

Eclipse download hell

One of the many things the Mozilla/Firefox team have got right is the fantastic ease with which the application can be downloaded. Visitors to are greeted with a nice big “Free Download” link, aimed straight at the version for their (automatically detected) operating system hosted on a mirror geographically close to their IP address. It’s hard to think of any way they could improve on this.

[... 527 words]

No EU Software Patents

It would be nice if someone with some serious design credentials would knock up some more aesthetically pleasing banners.

[... 119 words]

The Register hit by XSS

Here’s a nasty one: popular tech news site The Register was hit on Saturday by the Bofra exploit, a nasty worm which uses an iframe vulnerability in (you guessed it) Internet Explorer to install nasty things on the victim’s PC. Where it gets interesting is that the attack wasn’t against the Register themselves; it came through their third party ad serving company, Falk AG.

[... 262 words]


I’ve become yet another statistic in the Bath Mumps epidemic of 2004. I’m quarantined until next Monday, and this afternoon we had a camera crew from ITV West come round to film some doom-and-gloom footage warning students to get vaccinated. Amusingly the camera man hadn’t had Mumps and took suitable precautions to avoid infection. I’m told that the piece will go out on ITV news for the south west at 6pm this evening.

[... 97 words]

Usability blunders

I stumbled across this today and thought it was just too good not to share.

[... 62 words]

Open source license help needed

Every now and then, I get an e-mail asking me to clarify the license associated with code that I’ve posted on this site, such as my date parsing script. I’m looking for an open source license that I can start slapping on things to assure people that they can use it for whatever they want, but wading through the list of licenses is no fun at all. Here are the features I’m looking for:

[... 294 words]

Let a thousand conspiracy theories bloom

I’m about to hit the sack, but current indications are that Bush has won Ohio by a couple of percentage points and thus has been re-elected as President of the United States.

[... 122 words]

Election endorsements

My ex-colleague Jacob Kaplan-Moss has put together a fantastic site listing the presidential endorsements published by American newspapers in the run up to the election. I was looking for something like this just the other day so it was great to find the answer so close to home. I was depressed but not at all surprised to see my former employer endorse Bush, but it’s interesting to see that of the four Kansan papers listed two endorsed Kerry, despite that state’s huge Republican majority.

[... 103 words]

Keeping up appearances

Wow, I think this is the longest gap in my blogging since I started! I wish I could say I’ve been enjoying the sunshine or taking up a new hobby, but the truth is that the weather’s been horrible and I’ve just been run off my feet readjusting to life in England and at University.

[... 222 words]

Back in England

And I’m back.

[... 80 words]

Running Pydoc under mod_python

I’ve written about pydoc before. In my opinion it’s one of Python’s best kept secrets: a way of instantly browsing the properties, methods and documentation strings of any module available to the Python environment. It can even run a local HTTP server to allow for easy browsing of available documentation.

[... 372 words]
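
As a rough illustration of the browsing described above, the sketch below asks pydoc for the plain-text documentation of a module from inside Python. It assumes a modern Python 3 interpreter, and the choice of the `json` module is arbitrary; it is not taken from the entry.

```python
import json
import pydoc

# render_doc() returns the text that the pydoc command-line tool would show.
# Passing the plaintext renderer avoids the backspace-based bold sequences
# that the default text renderer emits for terminals.
doc_text = pydoc.render_doc(json, renderer=pydoc.plaintext)

# Show only the start of the page: module name, summary and description.
print(doc_text[:300])
```

The local HTTP server mentioned above can be started from a shell with `python -m pydoc -p 8080`, where the port number is an arbitrary choice.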

Python2.4 highlights

A.M. Kuchling’s “What’s New in Python X” documents are always a treat, and his guide to the forthcoming Python 2.4 is no exception. Among other things, 2.4 elevates sets to built-in type status, dramatically improves the usability of Python’s list sort method (for easier application of DSU, aka the Schwartzian transform), makes reverse iteration easier and introduces an alternative string substitution method.

[... 231 words]
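
The four features named above can be sketched briefly as follows. The sample data is made up for illustration, and the code is written to run on a current Python 3 interpreter rather than on 2.4 itself.

```python
from string import Template

# Built-in sets: no import of the older "sets" module is needed.
tags = set(["python", "javascript", "python"])

# Decorate-sort-undecorate (DSU) written out by hand...
words = ["banana", "Apple", "cherry"]
decorated = [(w.lower(), w) for w in words]   # decorate with the sort key
decorated.sort()                              # sort on the key
by_dsu = [w for _, w in decorated]            # undecorate

# ...and the same case-insensitive sort using the key= argument.
by_key = sorted(words, key=str.lower)

# reversed() iterates backwards without modifying the original list.
countdown = list(reversed([1, 2, 3]))

# string.Template offers $-based substitution as an alternative to %.
greeting = Template("Hello, $name!").substitute(name="world")
```

The `key=` form computes the sort key once per element, which is the same work the hand-written DSU version does, without the intermediate list of tuples appearing in the calling code.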

Matching newlines in JavaScript

Just a quick note: the . character in a JavaScript regular expression will never match a newline character. If you want to match any character including newlines you can use the [\s\S] character class instead, which means “any character that’s either whitespace or not whitespace”.

[... 86 words]
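
The note above is about JavaScript, but Python’s `re` module has the same default behaviour for `.`, so the idiom can be sketched in Python as a stand-in. The sample string is invented for illustration.

```python
import re

text = "first line\nsecond line"

# By default "." stops at a newline, so this pattern cannot span both lines.
dot_match = re.search(r"first.*second", text)

# [\s\S] means "whitespace or not whitespace", i.e. any character at all,
# so this pattern does cross the line break.
any_match = re.search(r"first[\s\S]*second", text)

print(dot_match)            # no match across the newline
print(any_match.group(0))   # spans the newline
```

Python also provides the `re.DOTALL` flag for the same purpose; the character-class form is shown here because it is the one the note describes.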

Browser innovation is alive and well

Here’s a feature that caught me by surprise (maybe I haven’t been keeping my ear close enough to the ground): the new Firefox 1.0 preview release supports Live Bookmarks, a novel twist on RSS aggregators where feeds look just like bookmark folders, displaying a list of bookmarks corresponding to the headlines from the feed. Best of all, the feature supports RSS autodiscovery. Sites with auto-discoverable feeds display an attractive RSS icon on the right hand side of the status bar, allowing for one click subscriptions.

[... 230 words]

Command line blacklisting

Just over a year ago, I started blacklisting domain names from links featured in comment spam. My idea then was that these blacklists could become a shared resource: people would publish their own blacklist and subscribe to those of people they trust, thus making it much harder for spammers to operate. While the sheer volume of spam domains meant that the technique was much less useful than I originally anticipated, I’ve continued to maintain my blacklist ever since as a preventative measure against repeat spammers.

[... 721 words]

The bookmarklet solution to the password problem

Anyone who makes heavy use of the internet has run into the password problem: dozens of user accounts on sites with varying degrees of trustability, leading to an unmanageable proliferation of username and password combinations. The temptation is to use the same combination on multiple sites, but doing so opens you up to the horrifying prospect of a security flaw in one site compromising all of your other accounts.

[... 366 words]

How to track an RSS feed

According to the HTTP specification, RSS/Atom aggregators should obey the HTTP 301 Moved Permanently header by altering the stored subscription URL for the feed they are attempting to retrieve.

[... 199 words]
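
A minimal sketch of the behaviour described above, assuming the aggregator has already made the request and holds the status code and response headers. The function name `updated_feed_url` and the example URLs are hypothetical, not taken from the entry.

```python
from urllib.parse import urljoin


def updated_feed_url(stored_url, status, headers):
    """Return the URL an aggregator should store after one fetch.

    Hypothetical helper: `status` is the HTTP status code of the response
    and `headers` is a dict of its response headers.
    """
    location = headers.get("Location")
    if status == 301 and location:
        # Permanent move: replace the stored subscription URL, resolving
        # a relative Location value against the old address.
        return urljoin(stored_url, location)
    # Temporary redirects and all other responses: keep the stored URL.
    return stored_url


moved = updated_feed_url(
    "http://example.com/rss", 301, {"Location": "/feeds/atom"}
)
temporary = updated_feed_url(
    "http://example.com/rss", 302, {"Location": "/feeds/atom"}
)
```

Keeping the decision in a pure function separates it from the network code, so the same rule can be applied whichever HTTP client the aggregator happens to use.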

1000th Blogmark

I just posted my 1000th blogmark. I can’t emphasize enough how much of an impact this 15 minute hack has had on both my browsing and my blogging habits. While I still tend to leave browser windows open for days at a time, I now at least have a procedure for getting rid of the ones that still interest me. More importantly, having blogmarks has eliminated the temptation to write a full blog entry (with quotation) just to share a link. This has dramatically reduced my posting rate, but has meant that when I do post an entry I usually have something moderately interesting to say.

[... 181 words]

A snarky note from the administrator

No, you can’t have a Gmail invite. No, I won’t hack your email account for you. And if you can’t find your hotmail inbox, you shouldn’t be using a computer.

[... 44 words]

Site specific stylesheets in Mozilla

New in Mozilla 1.8 Alpha 3: bug 238099—implement at-rule for matching on site/document URL. Here’s the example:

[... 111 words]

Participatory journalism

Participatory (or citizen) journalism is getting a lot of coverage at the moment, thanks in part to Dan Gillmor’s new book We the Media. For a great example of participatory journalism in action, check out Wikipedia’s outstanding coverage of the 2004 Summer Olympics. It’s already a serious competitor to the official site in terms of content, and its wiki nature means it will only get better as the games continue. Hat tip: Gadgetopia.

[... 203 words]

Early adoption, and Airport Express cut-outs

I don’t know quite how I did it, but in the past 48 hours I’ve become an Apple early adopter. I spent the weekend in Minnesota, where a visit to the Mall of America (aka Unholy Temple to Consumerism) resulted in a visit to the Apple store, and a visit to the Apple store resulted in a shiny new fourth generation 20 GB iPod. Of course, the seven and a half hour journey back south would go so much faster with an iTrip to play with, so I picked one of those up as well.

[... 477 words]

Improving online credibility

If you’ve browsed Amazon’s product reviews recently you may have noticed an interesting new feature: Badges, little icons displayed below certain people’s names. This isn’t a new idea by any means—many online communities use special icons as rewards for members who make valuable contributions (SitePoint is a good example). What’s interesting about Amazon’s badges is that one of them is “Real Name”. Amazon’s Real Names FAQ explains the badge, and includes the following:

[... 230 words]

Jimmy Wales on battling wiki spam

Jimmy Wales of Wikipedia was interviewed recently by the Slashdot community. One of the questions regarded protecting Wikis from spammers:

[... 241 words]

Site-specific extensions

I’ve been thinking about per-site user stylesheets for a while now, but my colleague Adrian has gone one better: his All Music Guide Corrector extension for Firefox fixes their horrible JavaScript links, hides the useless Flash navigation and improves their unpopular “read more” links, causing them to load content on the current page rather than navigating to a new page entirely.

[... 211 words]