Simon Willison’s Weblog

September 2003

Sept. 2, 2003

Fighting Filters and DDoS

Paul Graham’s essays on fighting spam are generally excellent; it was Paul who sparked the recent flurry of activity surrounding Bayesian statistical filters and inspired the creation of some of the best tools for fighting spam yet. Paul’s latest suggestion, Filters that fight back, seems to me to miss the mark in a big way. Paul suggests email servers should “follow” links in any email received. This would turn the tables on spam, as suddenly sending out a million spams would result in a million useless hits to the site being promoted, quickly bringing it to its knees. It’s a great concept, until some malicious script kiddie realises that they’ve been handed a tool to run massive distributed denial-of-service attacks on any domain they care to target. Not to mention that such a feature would make many legitimate mass email tools prohibitively expensive to run.

[... 190 words]

SQLObject

My new favourite toy is SQLObject, an object-relational mapper which makes heavy use of Python’s special method names to create objects which can be used to transparently access and modify data in a relational database. I tried to write something like this in PHP once before and failed miserably, but SQLObject has such an elegant design that I’m just annoyed I didn’t find out about it sooner. Here’s some example code, adapted from the SQLObject site:

[... 249 words]
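
The example from the SQLObject site is not reproduced in this excerpt. As a rough illustration of the underlying idea only (this is not SQLObject's API), here is a minimal sketch using just the standard library's sqlite3 module. The `Row` class, the `person` table and its columns are all hypothetical names invented for the sketch; it shows how Python's special methods `__getattr__` and `__setattr__` can turn attribute access into SQL.

```python
import sqlite3


class Row:
    """Hypothetical stand-in for a mapped object: one instance per table row."""

    def __init__(self, conn, table, row_id):
        # Store bookkeeping fields directly, bypassing our own __setattr__.
        object.__setattr__(self, "_conn", conn)
        object.__setattr__(self, "_table", table)
        object.__setattr__(self, "_id", row_id)

    @staticmethod
    def _check(name):
        # Column names are pasted into the SQL text, so only accept
        # plain public identifiers (a simplification for this sketch).
        if name.startswith("_") or not name.isidentifier():
            raise AttributeError(name)

    def __getattr__(self, name):
        # Only called when normal lookup fails, i.e. for column names.
        self._check(name)
        cur = self._conn.execute(
            f"SELECT {name} FROM {self._table} WHERE id = ?", (self._id,)
        )
        return cur.fetchone()[0]

    def __setattr__(self, name, value):
        # Every attribute assignment becomes an UPDATE statement.
        self._check(name)
        self._conn.execute(
            f"UPDATE {self._table} SET {name} = ? WHERE id = ?",
            (value, self._id),
        )


# Assumed schema, for demonstration only.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE person (id INTEGER PRIMARY KEY, name TEXT)")
conn.execute("INSERT INTO person (id, name) VALUES (1, 'Simon')")

person = Row(conn, "person", 1)
person.name = "Paul"  # runs an UPDATE behind the scenes
```

A real mapper would also cache values, create rows and manage transactions; the sketch leaves all of that out to keep the special-method trick visible.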

Googling for fun and profit

In the style of Mark Pilgrim, Googling for fun and profit.

Show less errors

The W3C Validator team are seeking help with the latest version of their validator, dubbed the “Zeldman Made Us Do It!” release. They want people to play with the beta and submit suggestions for error messages that would make more sense to the average user. They also have a new feature called “fussy mode” which acts a bit like a lint tool for checking code, highlighting problems that aren’t necessarily illegal markup but may not be best practice.

[... 501 words]

Blacklisting Comment Spam

I’m fed up with comment spam. From now on, any comment I judge to be spam will be deleted, and the domains linked to from that comment will be blacklisted. Any future comments that contain links to those domains will be refused. My blacklist will be made available as a simple text file, one domain per line, at blacklist.txt. You are welcome to grab a copy of that file once every 24 hours and use it as part of your own comment spam prevention system. I will manually approve all domains that are added to it to ensure only domains of a dubious nature end up blacklisted.
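
A minimal sketch of how a consumer of such a blacklist might check comments, assuming only the format described above (plain text, one domain per line). The function names, the example domains and the decision to treat subdomains of a listed domain as blacklisted are assumptions made for illustration, not details of the actual system.

```python
import re
from urllib.parse import urlparse

# Rough pattern for http(s) links inside comment text or HTML.
URL_PATTERN = re.compile(r"https?://[^\s\"'<>]+", re.IGNORECASE)


def parse_blacklist(text):
    """Return the set of lower-cased domains listed one per line."""
    return {line.strip().lower() for line in text.splitlines() if line.strip()}


def linked_domains(comment):
    """Extract the host name of every http(s) link in a comment."""
    hosts = set()
    for url in URL_PATTERN.findall(comment):
        host = urlparse(url).hostname
        if host:
            hosts.add(host.lower())
    return hosts


def is_blacklisted(comment, blacklist):
    """True if any linked host is a listed domain or a subdomain of one."""
    for host in linked_domains(comment):
        parts = host.split(".")
        # Check "www.example.com", then "example.com", then "com".
        for i in range(len(parts)):
            if ".".join(parts[i:]) in blacklist:
                return True
    return False


# Hypothetical blacklist contents, standing in for a downloaded blacklist.txt.
blacklist = parse_blacklist("spam-pills.example\ncheap-loans.example\n")
```

In use, a comment form handler would call `is_blacklisted(comment_text, blacklist)` and refuse the comment when it returns true, refreshing its copy of the list no more than once every 24 hours.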

[... 185 words]

Sept. 5, 2003

Listamatic

Russ Weakley’s Listamatic borrows a whole bunch of fun CSS list effects from around the web and shows how they can be applied to the same markup to produce a large range of different results.

Installing PySQLite

Techno Weenie has a detailed guide to setting up PySQLite on boxes you don’t have root access to. SQLite looks ideal for small to medium sized applications so I can see this being really useful should I ever write something that uses it.

Python Client Libraries

Three really useful looking Python modules: ClientForm, ClientTable and ClientCookie. ClientForm looks like it provides similar functionality to the form handling part of the WWW::Mechanize Perl module, discussed previously. It essentially provides a very simple interface for loading an HTML page, parsing out the form information, then filling in the form and submitting it back to the server. The author recommends it for automated testing (I’ve always had trouble figuring out how to link unit testing into web applications) but I’m sure it could be useful for screen scraping tools as well. ClientTable is an early beta of a powerful looking table parser, and ClientCookie sits on top of the standard urllib library and transparently persists cookies in between requests.

I guess I should hand in my passport

An example Britishness test based on proposals by the Home Office for a written test for immigrants applying for citizenship. I got 3 out of 10! (via Simon Brunning).

Thunderbird 0.2

Thunderbird has to have the most deceiving version numbers of any software I’ve ever used. I avoided version 0.1 for ages because I incorrectly assumed that a 0.1 release shouldn’t be trusted with my email. I’ve just upgraded from 0.1 to the new 0.2 and a good product has got even better: it’s noticeably faster and more responsive and they’ve knocked 1.5MB off the size of the installer. I love the new direction the Mozilla organisation have been taking with their focus on separate applications; I wonder if we’ll be seeing a spin off of Composer any time soon.

Sept. 8, 2003

Short stories

Cory Doctorow has a new book of short stories coming out, and has released six out of nine of them under a Creative Commons license following the success of Down and Out in the Magic Kingdom. I just finished reading Craphound and I thoroughly enjoyed it.

Hinting

I’ve always wondered how fonts work. I now have a much better understanding of the technology involved thanks to Microsoft’s excellent Typography site, in particular this Introduction to hinting from 1997 (via ).

“Is Evil..” titles are evil

Two excellent articles on Object Oriented Design: Why extends is evil and Why getter and setter methods are evil. Ignore the inflammatory titles: the subheading of the second article, “Make your code more maintainable by avoiding accessors”, is a much better indication of their content. I picked up some great tips on proper use of OOP from reading them. In particular, the section on CRC cards made something click which hadn’t clicked when I looked at them earlier this year for my ill-fated University software project.

Sept. 9, 2003

Andy in the Garden

My friend Andy’s design skills have been recognised by the CSS Zen Garden. Congratulations! He’s number 42.

Sept. 10, 2003

JavaScript-free rollovers

I’ve talked about image rollovers on this site before, but I’ve never seen a technique I like half as much as Pixy’s Fast rollovers, no preload needed. Like all good techniques, it’s so simple I’m surprised no one has thought of it before. The trick is that a single image is loaded containing the different rollover states, then positioned as the background of a fixed pixel size link element in such a way that only one of the states is shown. The :hover style simply changes the offset of the background, revealing the secondary (or even tertiary) state.

Sept. 11, 2003

PostgreSQL Performance Optimisation

Via the pgsql-performance mailing list, a great guide to Tuning PostgreSQL for performance, accompanied by a huge table of annotated configuration options.

Sept. 12, 2003

Jump!

Retro games, 80s music. Awesome. Via NTK.

Sept. 13, 2003

Prior Art

The most interesting thing to come out of this whole Eolas disaster could well turn out to be Ray Ozzie’s description of how Lotus Notes was demonstrating many of the fundamental abilities of today’s browsers, including dynamic application embedding remarkably similar to that covered by the patent, way back in 1993. The patent was filed in 1994. Prior art? We can only hope.

Screen readers and display: none

I’ve long heard rumours that some screen readers fail to read out text hidden using the CSS display: none property, but I had never really investigated it as I don’t have access to a screen reader myself (I should really download the JAWS trial some day). Bob Easton’s What do screen readers really say? describes the problem and specifies a number of tests for screen reader abilities, the results of which are collated on this Wiki page. As a side note, quickly collecting the results of this kind of test is an excellent way to make use of a Wiki.

[... 331 words]

Listutorial

Russ Weakley has followed up his excellent Listamatic with a useful set of tutorials on styling lists. The style of the tutorial looks worth emulating: each page adds a new property, explaining what it does and showing how it affects the list. I particularly liked the Icon lists demonstration.

Python for teaching mathematics

Kirby Urner provides some great examples of how Python can be used as an aid to understanding mathematics on the marketing-python mailing list. I particularly liked this demonstration of Pascal’s triangle using Python generators:

[... 139 words]
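
Kirby Urner's original code is not reproduced in this excerpt. As a stand-in, here is one minimal way a Python generator can produce Pascal's triangle; the function name `pascal` and the zero-padding approach are choices made for this sketch, not necessarily those of the mailing list post.

```python
from itertools import islice


def pascal():
    """Yield successive rows of Pascal's triangle, forever."""
    row = [1]
    while True:
        yield row
        # Each new entry is the sum of the two entries above it;
        # padding with 0 on either side takes care of the edges.
        row = [a + b for a, b in zip([0] + row, row + [0])]


# The generator is infinite, so take only as many rows as needed.
first_rows = list(islice(pascal(), 5))
```

Because the generator only computes a row when asked for it, the same function serves equally well for five rows or five hundred.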

Sept. 14, 2003

Curious emails

There follow two of the weirdest emails I have ever received through my contact form. The first is a fascinating rant against standards compliant client side scripting:

[... 288 words]

Sept. 15, 2003

New content management blog

Ideas in Technology and Publishing is a great new blog covering content management, XML and other publishing related technologies. It’s less than a month old so it’s still possible to read through the archives in full, which I’ve just done and recommend to anyone with an interest in content management.

Don’t delete.me

Paul Sowden is the blogger who inspired me to start my own blog over a year ago. He’s restarted his blog at a new domain: delete.me.uk. Let’s hope the new site doesn’t live up to its name. Oh, and be sure to view source for Paul’s own special brand of minimalist valid HTML 4.

Interactive Python

I adore the Python interactive interpreter. I use it for development (it’s amazing how many bugs you can skip by testing your code line by line in the interactive environment), I use it for calculations, but recently I’ve also found myself using it just as a general tool for answering questions.

[... 983 words]

Sept. 17, 2003

Google conspiracy theories

Microdoc News have a poorly researched story suggesting that Google have been engineering their search results to favour their own properties:

[... 582 words]

“sexeger”[::-1]

Via Ned Batchelder, an article on Reversing Regular Expressions from Perl.com. Otherwise known as Sexeger, these offer a performance boost over normal regular expressions for certain tasks. The basic idea is pretty simple: searching backwards through a string using a regular expression can be a messy business, but by reversing both the string and the expression, running it, then reversing the result, far better performance can be achieved (reversing a string is a relatively inexpensive operation). The example code is in Perl, but I couldn’t resist trying it in Python. The challenge is to find the last number occurring in a string.
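
A minimal sketch of the technique for this particular challenge, assuming "number" means a run of decimal digits. The function name `last_number` is made up for the sketch, and no performance claim is implied. Conveniently, the pattern `\d+` reads the same in both directions, so only the string and the result need reversing.

```python
import re


def last_number(text):
    """Return the last run of digits in text, or None if there is none."""
    # Reverse the string so the last number becomes the first match.
    match = re.search(r"\d+", text[::-1])
    if match is None:
        return None
    # The matched digits come out backwards, so reverse them again.
    return match.group(0)[::-1]
```

For a pattern that is not symmetric, the expression itself would have to be rewritten in reverse as well, which is where the approach needs more care.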

[... 384 words]

Dive Into Python reborn

Sweet. Mark Pilgrim is working on Dive Into Python again, funded by a dead tree publisher for publication in 2004 (hopefully). The free version will stay available as well. I’ve always preferred reading paper to reading a screen so I’m definitely down for a copy.

[... 135 words]

Sept. 19, 2003

Aaaaarr

It be International Pirate Day, ya scurvy landlubbers. Avast and be merry, for the day is young and the wenches plentiful. And remember t’chat up line of the day: Prepare to be boarded!

[... 45 words]

New virus?

I don’t usually get more than 5 or 6 spams a day, but today I’ve been hammered with an additional 7 emails with executable attachments claiming to be the “latest critical patch” from Microsoft. The emails are HTML formatted and make a pretty convincing rendition of a Microsoft.com page, so I can see how less savvy internet users could easily fall for them. Is this yet another virus outbreak? I make that the third in as many weeks.
