Simon Willison’s Weblog


Entries in Sep, 2003


Battling comment spam

It’s a sad state of affairs when you come back to your blog after a week elsewhere and have to add another 56 domains to your blacklist. I’m actually getting more comment spam than legitimate comments now, which makes this more than just a minor nuisance. I’m considering a number of improvements, including adding a moderation queue to comments on entries posted more than a month ago, disabling the comment form if the referrer is a search engine (as per Russell Beattie’s suggestion) and adding some kind of wildcard support to the blacklist file.

[... 114 words]
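The wildcard blacklist and the moderation queue for month-old entries are both easy enough to sketch. What follows is a rough Python illustration, not the code this site actually runs: the shell-style patterns (matched with the standard library’s `fnmatch` module), the 30-day stand-in for “a month”, the example domains and the helper names are all my own assumptions.

```python
from datetime import datetime, timedelta
from fnmatch import fnmatchcase
from urllib.parse import urlparse

# Hypothetical blacklist: plain domains plus shell-style wildcards.
EXAMPLE_BLACKLIST = """
spammy-pills.example
*.casino.example
cheap-*.example
"""

# "More than a month ago", approximated as 30 days.
MODERATION_AGE = timedelta(days=30)


def parse_patterns(text):
    """One pattern per line; blank lines are ignored."""
    return [line.strip().lower() for line in text.splitlines() if line.strip()]


def host_is_blacklisted(url, patterns):
    """True if the URL's hostname matches any pattern (case-insensitively)."""
    host = (urlparse(url).hostname or "").lower()
    return any(fnmatchcase(host, pattern) for pattern in patterns)


def comment_action(link_urls, entry_posted, now, patterns):
    """Decide what to do with a comment: 'reject', 'moderate' or 'publish'."""
    if any(host_is_blacklisted(url, patterns) for url in link_urls):
        return "reject"
    if now - entry_posted > MODERATION_AGE:
        return "moderate"
    return "publish"


patterns = parse_patterns(EXAMPLE_BLACKLIST)
now = datetime(2003, 9, 30)
print(comment_action(["http://www.casino.example/"], datetime(2003, 9, 29), now, patterns))
print(comment_action(["http://fine.example/"], datetime(2003, 7, 1), now, patterns))
print(comment_action([], datetime(2003, 9, 29), now, patterns))
```

Note that `*.casino.example` only catches subdomains; the bare `casino.example` would need its own line. Using `fnmatchcase` on lowercased input keeps matching case-insensitive on every platform.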

“Interactive Tabular Data”

Just spotted in a comment by Drew McLellan on Russell Beattie’s Notebook:

[... 147 words]

Auto-complete text boxes

There’s a great new article up on Sitepoint describing a technique for adding auto-complete functionality to normal HTML text input fields using Javascript. The script uses a whole bunch of browser-specific code, but it has to, thanks to the inconsistent ways in which different browsers handle text selection ranges. Unfortunately the article doesn’t actually provide a demo of the code in action, so I’ve posted one here. It’s a very nice effect.

[... 86 words]
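The browser-specific part is selecting text inside the input field; deciding what to select is simple. Here’s a rough Python model of that decision using a hypothetical `autocomplete()` helper: find the first option that starts with what’s been typed, fill in the rest, and select only the added characters so the next keystroke replaces them. It isn’t the Sitepoint article’s code, it leaves out all of the browser handling, and the case-insensitive matching is my own assumption.

```python
def autocomplete(typed, options):
    """Return (new_value, selection_start, selection_end) for the text box.

    Matching is a case-insensitive prefix match (an assumption). The
    selection covers only the suggested characters, so the next keystroke
    overwrites them. With no match, the value is left unchanged and the
    cursor sits at the end.
    """
    if typed:
        prefix = typed.lower()
        for option in options:
            if option.lower().startswith(prefix):
                # Keep what the user typed; append the rest of the suggestion.
                completed = typed + option[len(typed):]
                return completed, len(typed), len(completed)
    return typed, len(typed), len(typed)


fruit = ["Apple", "Apricot", "Banana", "Cherry"]
print(autocomplete("ap", fruit))   # ('apple', 2, 5)
print(autocomplete("apr", fruit))  # ('apricot', 3, 7)
print(autocomplete("x", fruit))    # ('x', 1, 1)
```

Keeping the user’s own characters (rather than the option’s) means the field never changes case underneath someone mid-word.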

The pirate’s code

So, now that “talk like a pirate” day has sadly come to an end, it’s time to reveal the five-minute code hack that rendered my front page semi-legible for the best part of a day. It was actually pretty simple:

[... 260 words]

New virus?

I don’t usually get more than 5 or 6 spams a day, but today I’ve been hammered with an additional 7 emails with executable attachments claiming to be the “latest critical patch” from Microsoft. The emails are HTML formatted and make a pretty convincing rendition of a page, so I can see how less savvy internet users could easily fall for them. Is this yet another virus outbreak? I make that the third in as many weeks.

[... 81 words]


It be International Pirate Day, ya scurvy landlubbers. Avast and be merry, for the day is young and the wenches plentiful. And remember t’chat up line of the day: Prepare to be boarded!

[... 45 words]

Dive Into Python reborn

Sweet. Mark Pilgrim is working on Dive Into Python again, funded by a dead-tree publisher for publication in 2004 (hopefully). The free version will stay available as well. I’ve always preferred reading paper to reading a screen so I’m definitely down for a copy.

[... 135 words]


Via Ned Batchelder, an article on Reversing Regular Expressions. Otherwise known as Sexeger, reversed regular expressions offer a performance boost over normal regular expressions for certain tasks. The basic idea is pretty simple: searching backwards through a string using a regular expression can be a messy business, but by reversing both the string and the expression, running the search, then reversing the result, far better performance can be achieved (reversing a string is a relatively inexpensive operation). The example code is in Perl, but I couldn’t resist trying it in Python. The challenge is to find the last number occurring in a string.

[... 384 words]
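The trick fits in a few lines. This is a fresh sketch rather than the code from the article: it puts a conventional forward search, anchored at the end of the string, next to the reverse, search, reverse approach. `\d+` happens to read the same in both directions, so no pattern rewriting is needed here; less symmetrical patterns have to be reversed by hand.

```python
import re

# \d+ reads the same backwards as forwards, so the reversed pattern is identical.
DIGITS = re.compile(r"\d+")


def last_number_forward(text):
    """Conventional approach: capture the final run of digits, anchored at the end."""
    match = re.search(r"(\d+)\D*$", text)
    return match.group(1) if match else None


def last_number_sexeger(text):
    """Sexeger approach: reverse the string, take the first match, reverse it back."""
    match = DIGITS.search(text[::-1])
    return match.group(0)[::-1] if match else None


sample = "order 12 shipped in 3 boxes, tracking number 98765"
print(last_number_forward(sample))  # 98765
print(last_number_sexeger(sample))  # 98765
```

The forward version has to try, and abandon, a match at every earlier run of digits before it reaches the last one, while the reversed version stops at the first run it finds. I haven’t included timings; `timeit` is the easy way to compare the two on your own data.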

Google conspiracy theories

Microdoc News have a poorly researched story suggesting that Google have been engineering their search results to favour their own properties:

[... 582 words]

Interactive Python

I adore the Python interactive interpreter. I use it for development (it’s amazing how many bugs you can avoid by testing your code line by line in the interactive environment), I use it for calculations, but recently I’ve also found myself using it just as a general tool for answering questions.

[... 983 words]


Paul Sowden is the blogger who inspired me to start my own blog over a year ago. He’s restarted his blog at a new domain; let’s hope the new site doesn’t live up to its name. Oh, and be sure to view source for Paul’s own special brand of minimalist valid HTML 4.

[... 70 words]

New content management blog

Ideas in Technology and Publishing is a great new blog covering content management, XML and other publishing related technologies. It’s less than a month old so it’s still possible to read through the archives in full, which I’ve just done and recommend to anyone with an interest in content management.

[... 54 words]

Curious emails

There follow two of the weirdest emails I have ever received through my contact form. The first is a fascinating rant against standards-compliant client-side scripting:

[... 288 words]

Python for teaching mathematics

Kirby Urner provides some great examples of how Python can be used as an aid to understanding mathematics on the marketing-python mailing list. I particularly liked this demonstration of Pascal’s triangle using Python generators:

[... 139 words]
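Kirby’s code is in the full entry; as an illustration of the same idea (and not his exact code), here is one way a generator can produce the rows: each new row comes from adding the previous row to a copy of itself shifted by one place.

```python
from itertools import islice


def pascal_rows():
    """Yield the rows of Pascal's triangle one at a time, forever."""
    row = [1]
    while True:
        yield row
        # Each entry is the sum of the two above it; padding with a 0 at
        # either end supplies the 1s along the edges.
        row = [left + right for left, right in zip([0] + row, row + [0])]


for row in islice(pascal_rows(), 5):
    print(row)
```

Because the generator never stops on its own, `islice` decides how many rows to take; the first five are `[1]`, `[1, 1]`, `[1, 2, 1]`, `[1, 3, 3, 1]` and `[1, 4, 6, 4, 1]`.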


Russ Weakley has followed up his excellent Listamatic with a useful set of tutorials on styling lists. The style of the tutorial looks worth emulating: each page adds a new property, explaining what it does and showing how it affects the list. I particularly liked the Icon lists demonstration.

[... 53 words]

Screen readers and display: none

I’ve long heard rumours that some screen readers fail to read out text hidden using the CSS display: none property, but I had never really investigated it as I don’t have access to a screen reader myself (I should really download the JAWS trial some day). Bob Easton’s What do screen readers really say? describes the problem and specifies a number of tests for screen reader abilities, the results of which are collated on this Wiki page. As a side note, quickly collecting the results of this kind of test is an excellent way to make use of a Wiki.

[... 331 words]

Prior Art

The most interesting thing to come out of this whole Eolas disaster could well turn out to be Ray Ozzie’s description of how Lotus Notes was demonstrating many of the fundamental abilities of today’s browsers, including dynamic application embedding remarkably similar to that covered by the patent, as far back as 1993. The patent was filed in 1994. Prior art? We can only hope.

[... 69 words]


Retro games, 80s music. Awesome. Via NTK.

[... 13 words]

PostgreSQL Performance Optimisation

Via the pgsql-performance mailing list, a great guide to Tuning PostgreSQL for performance, accompanied by a huge table of annotated configuration options.

[... 40 words]

Javascript free rollovers

I’ve talked about image rollovers on this site before, but I’ve never seen a technique I like half as much as Pixy’s Fast rollovers, no preload needed. Like all good techniques, it’s so simple I’m surprised no one has thought of it before. The trick is that a single image is loaded containing the different rollover states, then positioned as the background of a fixed pixel size link element in such a way that only one of the states is shown. The :hover style simply changes the offset of the background, revealing the secondary (or even tertiary) state.

[... 103 words]

Andy in the Garden

My friend Andy’s design skills have been recognised by the CSS Zen Garden. Congratulations! He’s number 42.

[... 22 words]

“Is Evil..” titles are evil

Two excellent articles on object-oriented design: Why extends is evil and Why getter and setter methods are evil. Ignore the inflammatory titles: the subheading of the second article, “Make your code more maintainable by avoiding accessors”, is a much better indication of their content. I picked up some great tips on proper use of OOP from reading them. In particular, the section on CRC cards made something click which hadn’t clicked when I looked at them earlier this year for my ill-fated university software project.

[... 93 words]


I’ve always wondered how fonts work. I now have a much better understanding of the technology involved thanks to Microsoft’s excellent Typography site, in particular this Introduction to hinting from 1997.

[... 35 words]

Short stories

Cory Doctorow has a new book of short stories coming out, and has released six of the nine under a Creative Commons license following the success of Down and Out in the Magic Kingdom. I just finished reading Craphound and I thoroughly enjoyed it.

[... 56 words]

Thunderbird 0.2

Thunderbird has to have the most misleading version numbers of any software I’ve ever used. I avoided version 0.1 for ages because I incorrectly assumed that a 0.1 release shouldn’t be trusted with my email. I’ve just upgraded from 0.1 to the new 0.2 and a good product has got even better: it’s noticeably faster and more responsive, and they’ve knocked 1.5MB off the size of the installer. I love the new direction the Mozilla organisation have been taking with their focus on separate applications; I wonder if we’ll be seeing a spin-off of Composer any time soon.

[... 104 words]

I guess I should hand in my passport

An example Britishness test based on proposals by the Home Office for a written test for immigrants applying for citizenship. I got 3 out of 10! (via Simon Brunning).

[... 36 words]

Python Client Libraries

Three really useful looking Python modules: ClientForm, ClientTable and ClientCookie. ClientForm looks like it provides similar functionality to the form handling part of the WWW::Mechanize Perl module, discussed previously. It essentially provides a very simple interface for loading an HTML page, parsing out the form information, then filling in the form and submitting it back to the server. The author recommends it for automated testing (I’ve always had trouble figuring out how to apply unit testing to web applications) but I’m sure it could be useful for screen scraping tools as well. ClientTable is an early beta of a powerful looking table parser, and ClientCookie sits on top of the standard urllib library and transparently persists cookies between requests.

[... 132 words]
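I won’t guess at ClientForm’s API here, but the ClientCookie idea of persisting cookies across urllib requests can be sketched with what ships with Python today: `http.cookiejar` plus `urllib.request.HTTPCookieProcessor`. This is a modern stand-in, not ClientCookie’s own interface, and the URLs in the usage comments are placeholders.

```python
import http.cookiejar
import urllib.request


def make_cookie_aware_opener():
    """Build an opener whose requests share one in-memory cookie jar."""
    jar = http.cookiejar.CookieJar()
    # HTTPCookieProcessor stores Set-Cookie responses in the jar and adds the
    # matching Cookie header to later requests made through this opener.
    opener = urllib.request.build_opener(urllib.request.HTTPCookieProcessor(jar))
    return opener, jar


opener, jar = make_cookie_aware_opener()
print(len(jar))  # 0: nothing has been fetched yet

# With network access (placeholder URLs):
# opener.open("http://example.com/login")    # cookies from the response land in jar
# opener.open("http://example.com/account")  # ...and are sent back automatically
```

Swapping `CookieJar` for `http.cookiejar.MozillaCookieJar` would let the cookies be saved to disk between runs, which is handy for screen scrapers that log in once.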

Installing PySQLite

Techno Weenie has a detailed guide to setting up PySQLite on boxes you don’t have root access to. SQLite looks ideal for small to medium-sized applications, so I can see this being really useful should I ever write something that uses it.

[... 48 words]
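PySQLite was later added to the Python standard library as the `sqlite3` module (Python 2.5), so on a current Python the usage side needs no installation at all. Here’s a minimal sketch of the DB-API style it exposes; the table and titles are made up for illustration, and the 2003-era PySQLite module name differed.

```python
import sqlite3

# ":memory:" keeps the database in RAM for this demo; a filename instead gives a
# single file in your own directory, with no server process or root access needed.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE entries (id INTEGER PRIMARY KEY, title TEXT)")
conn.executemany(
    "INSERT INTO entries (title) VALUES (?)",  # ? placeholders: no manual quoting
    [("Installing PySQLite",), ("Thunderbird 0.2",)],
)
conn.commit()

titles = [row[0] for row in conn.execute("SELECT title FROM entries ORDER BY id")]
print(titles)  # ['Installing PySQLite', 'Thunderbird 0.2']
conn.close()
```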


Russ Weakley’s Listamatic borrows a whole bunch of fun CSS list effects from around the web and shows how they can be applied to the same markup to produce a large range of different results.

[... 39 words]

Blacklisting Comment Spam

I’m fed up with comment spam. From now on, any comment I judge to be spam will be deleted, and the domains linked to from that comment will be blacklisted. Any future comments that contain links to those domains will be refused. My blacklist will be made available as a simple text file, one domain per line, at blacklist.txt. You are welcome to grab a copy of that file once every 24 hours and use it as part of your own comment spam prevention system. I will manually approve all domains that are added to it to ensure only domains of a dubious nature end up blacklisted.

[... 185 words]
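For anyone wiring blacklist.txt into their own system, here’s a minimal Python sketch of the checking side. The URL-extraction regex, the rule that a listed domain also covers its subdomains, and the hypothetical `CachedBlacklist` helper (which takes a caller-supplied `fetch` function, so no particular HTTP library is assumed) are all my own assumptions, not a description of the code behind this site.

```python
import re
import time
from urllib.parse import urlparse

# Rough URL matcher: good enough for links in comment text or href attributes.
URL_RE = re.compile(r"https?://[^\s\"'<>]+", re.IGNORECASE)


def parse_blacklist(text):
    """blacklist.txt format: one domain per line (blank lines skipped)."""
    return {line.strip().lower() for line in text.splitlines() if line.strip()}


def linked_hosts(comment):
    """Hostnames of every http(s) URL in the comment, lowercased by urlparse."""
    return {urlparse(url).hostname or "" for url in URL_RE.findall(comment)}


def is_spam(comment, blacklist):
    for host in linked_hosts(comment):
        # Assumption: a listed domain also covers its subdomains; the leading
        # dot stops "notspam.example" matching "spam.example".
        if any(host == domain or host.endswith("." + domain) for domain in blacklist):
            return True
    return False


class CachedBlacklist:
    """Calls fetch() for fresh blacklist text at most once every max_age seconds."""

    def __init__(self, fetch, max_age=24 * 60 * 60, clock=time.time):
        self._fetch = fetch
        self._max_age = max_age
        self._clock = clock
        self._domains = set()
        self._fetched_at = None

    def domains(self):
        now = self._clock()
        if self._fetched_at is None or now - self._fetched_at >= self._max_age:
            self._domains = parse_blacklist(self._fetch())
            self._fetched_at = now
        return self._domains


blacklist = parse_blacklist("spam.example\n\ncasino.example\n")
print(is_spam('Great post! <a href="http://www.spam.example/x">hi</a>', blacklist))  # True
print(is_spam("More at http://simonwillison.net/", blacklist))  # False
```

Passing the clock in as a parameter is only there so the once-a-day refresh can be checked without waiting a day.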



