Simon Willison’s Weblog


Friday, 16th August 2002

A plan for spam

Paul Graham: A Plan for Spam. Paul suggests using content-based filters that learn from users explicitly marking messages as spam or legitimate mail. The system then picks emails apart looking for common terms (in both the body and the header of the message) that can later be used to identify spam messages. He claims his tests have let through only 5 per 1000 spams, with 0 false positives. Impressive stuff, and great reading for its excellent explanations of some advanced algorithmic and statistical techniques.
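As a rough illustration of the learn-from-marked-mail idea, here is a minimal Python sketch. It is not Paul's code: the class name SpamFilter, the tokenizer, the 0.4 score for unseen words, the 0.01 to 0.99 clamp and the 15-token cut-off are all assumptions chosen for the example, and it reads only a single text string rather than separate headers and body.

```python
import re
from collections import Counter

# Hypothetical tokenizer: runs of letters, digits and a few symbols.
TOKEN_RE = re.compile(r"[A-Za-z0-9$'-]+")


def tokenize(text):
    """Split a message into lowercase tokens."""
    return [token.lower() for token in TOKEN_RE.findall(text)]


class SpamFilter:
    """Toy word-probability filter trained on user-marked messages."""

    def __init__(self):
        self.spam_counts = Counter()  # messages marked spam containing each token
        self.ham_counts = Counter()   # legitimate messages containing each token
        self.n_spam = 0
        self.n_ham = 0

    def train(self, text, is_spam):
        """Record one message the user has marked as spam or legitimate."""
        tokens = set(tokenize(text))  # count each token once per message
        if is_spam:
            self.spam_counts.update(tokens)
            self.n_spam += 1
        else:
            self.ham_counts.update(tokens)
            self.n_ham += 1

    def token_probability(self, token):
        """Estimate how strongly a single token suggests spam."""
        spam = self.spam_counts[token]
        ham = self.ham_counts[token]
        if spam + ham == 0:
            return 0.4  # assumed score for a token never seen in training
        spam_rate = spam / max(self.n_spam, 1)
        ham_rate = ham / max(self.n_ham, 1)
        probability = spam_rate / (spam_rate + ham_rate)
        # Clamp so that no single token is treated as absolute proof.
        return min(0.99, max(0.01, probability))

    def score(self, text, top_n=15):
        """Combine the most telling tokens into one spam probability."""
        probabilities = [self.token_probability(t) for t in set(tokenize(text))]
        # Keep the tokens whose scores lie furthest from a neutral 0.5.
        probabilities.sort(key=lambda p: abs(p - 0.5), reverse=True)
        spam_product = 1.0
        ham_product = 1.0
        for p in probabilities[:top_n]:
            spam_product *= p
            ham_product *= 1 - p
        return spam_product / (spam_product + ham_product)
```

Training is a matter of calling train() once per marked message; score() then returns a value between 0 and 1, and where to draw the line between spam and legitimate mail is left to the caller.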

[... 89 words]

Fiendish markup quiz

Hixie has posed a fiendish markup quiz—spot the four markup errors in a document that validates. It’s harder than it sounds. I’ve mailed off my answers, but I’m not expecting to get full marks.

[... 43 words]

Comments improvement

I’ve improved the comment system at the behest of Adrian Holovaty. URLs posted in a comment (both those beginning with http:// and those beginning just with www.) will now be converted into links.
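For anyone curious how this kind of conversion can work, here is a minimal Python sketch. It is only an illustration, not the code running on this site: the function name linkify, the regular expression and the handling of trailing punctuation are all assumptions.

```python
import html
import re

# Matches URLs starting with "http://" or a bare "www.". The final character
# class keeps trailing punctuation such as a full stop out of the link, and
# the lookbehind avoids matching in the middle of a longer word or URL.
URL_RE = re.compile(r'(?<![\w/.])(?:http://|www\.)[^\s<>"]*[^\s<>".,;:!?)]')


def linkify(comment):
    """Return the comment as HTML with recognised URLs wrapped in links."""
    pieces = []
    position = 0
    for match in URL_RE.finditer(comment):
        url = match.group(0)
        # Escape the plain text between the previous link and this one.
        pieces.append(html.escape(comment[position:match.start()]))
        # Bare "www." addresses need a scheme before they work as an href.
        href = url if url.startswith("http://") else "http://" + url
        pieces.append(
            '<a href="{}">{}</a>'.format(html.escape(href), html.escape(url))
        )
        position = match.end()
    # Escape whatever follows the last link.
    pieces.append(html.escape(comment[position:]))
    return "".join(pieces)
```

Escaping the text around each match separately means the rest of the comment stays safe to display, while the link markup itself is the only HTML that gets through.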

[... 34 words]

Magic quotes solution

Pink Goblin (otherwise known as HarryF) explains why magic quotes are evil. This is an issue that every PHP developer should be aware of, as it can cause all kinds of problems in your scripts if you ignore it. He suggests using a custom myAddSlashes() function which only calls addslashes() if magic quotes are turned off. I have an alternative solution: choose your preferred setting (quotes on or off) and apply it at run time to all incoming data in one go. My code for doing this is available here. By a bizarre coincidence I wrote the script this morning, then spotted a link to the Pink Goblin article literally five minutes after finishing it.

[... 130 words]

New memes make Baby Jesus cry

Things that make Baby Jesus cry (stolen from Mark Pilgrim). Google as social commentary?

[... 21 words]

Python RSS locator

Mark Pilgrim has written an ultra-liberal RSS locator (in Python, naturally). I guess he had to scratch an itch. The amount of work it puts into locating an RSS feed for a site is astonishing, especially when you consider how short the actual code is.

[... 50 words]

Zeldman played by a stand up comic

Eric Meyer has confessed.

[... 7 words]

css-discuss rocks

css-discuss has seen some interesting threads in the past 24 hours and the new archive means I can link straight to them—so here goes. Kentaro Kaji kicked off the topic of techniques for aligning an image with the bottom of a block of text. In the same thread, Benn Nunn advocated avoiding width and height attributes on images and keeping that information in an external style sheet. Other topics included accessible navigation and a tricky absolute positioning problem with Opera. The most informative mailing list I’m currently subscribed to just keeps getting better.

[... 122 words]

Today’s required reading

10 Tips on Writing the Living Web is full of invaluable advice for anyone who wants their weblog to be of interest to other people.

[... 26 words]
