Simon Willison’s Weblog

A plan for spam

Paul Graham: A Plan for Spam. Paul suggests using content-based filters that learn from users explicitly marking messages as spam or legitimate mail. The system then picks emails apart looking for common terms (in both the body and the headers of the message) that can later be used to identify spam messages. He claims his tests have let through only 5 per 1000 spams, with 0 false positives. Impressive stuff, and great reading for the excellent explanations of some advanced algorithmic and statistical techniques.
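
To make the idea concrete, here is a minimal sketch of a filter that learns from messages a user has marked. It is not Paul's exact formula: the class name `SpamFilter`, the tokenizer, the add-one smoothing and the log-odds combination are all my own assumptions, chosen to keep the example short. It also treats headers and body as one block of text.

```python
import math
import re
from collections import Counter

# Assumed tokenizer: runs of letters, digits and a few punctuation marks.
TOKEN_RE = re.compile(r"[A-Za-z0-9$'-]+")


def tokenize(text):
    """Split a message (headers and body together) into lowercase tokens."""
    return [t.lower() for t in TOKEN_RE.findall(text)]


class SpamFilter:
    """Hypothetical naive-Bayes-style filter trained on user-marked mail."""

    def __init__(self):
        # Per label: how many messages contained each token.
        self.counts = {"spam": Counter(), "ham": Counter()}
        # Per label: how many messages have been marked.
        self.messages = {"spam": 0, "ham": 0}

    def train(self, text, label):
        """Record a message the user marked as 'spam' or 'ham'."""
        self.messages[label] += 1
        # Count each token once per message.
        self.counts[label].update(set(tokenize(text)))

    def token_spam_probability(self, token):
        """Estimate how strongly one token suggests spam (0 to 1)."""
        # Add-one smoothing keeps the result strictly between 0 and 1.
        spam_rate = (self.counts["spam"][token] + 1) / (self.messages["spam"] + 2)
        ham_rate = (self.counts["ham"][token] + 1) / (self.messages["ham"] + 2)
        return spam_rate / (spam_rate + ham_rate)

    def spam_probability(self, text):
        """Combine the per-token estimates into one score for the message."""
        log_odds = 0.0
        for token in set(tokenize(text)):
            p = self.token_spam_probability(token)
            log_odds += math.log(p) - math.log(1 - p)
        # Clamp so math.exp cannot overflow on very long messages.
        log_odds = max(-700.0, min(700.0, log_odds))
        return 1 / (1 + math.exp(-log_odds))


if __name__ == "__main__":
    f = SpamFilter()
    f.train("Buy cheap pills now", "spam")
    f.train("cheap pills free offer", "spam")
    f.train("Meeting notes for Monday", "ham")
    f.train("lunch on Monday?", "ham")
    print(f.spam_probability("cheap pills"))
    print(f.spam_probability("Monday meeting"))
```

With only four training messages the scores mean very little; the point is that every message a user marks shifts the per-token estimates, which is what lets the filter adapt to one person's mail.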

This is A plan for spam by Simon Willison, posted on 16th August 2002.

Previously hosted at http://simon.incutio.com/archive/2002/08/16/aPlanForSpam