Simon Willison’s Weblog

Subscribe

HTML entities for email addresses: don’t bother

2nd December 2003

I’ve suspected this for a long time, and now here’s the empirical evidence: Popular Spam Protection Technique Doesn’t Work. If you’re relying on HTML entities to protect your email address from spam harvesters—for example username@example.com—your email address may as well be in plain text. Chip Rosenthal downloaded a tool called “Web Data Extractor v4.0” and tried it on some test data to prove once and for all that the technique doesn’t work.

My advice is to use your common sense when analysing a potential spam protection technique. If you were a spammer, would you be able to outwit the method? Spammers aren’t always very smart, but the people who write spamming tools (and get paid big bucks for them) are. Also remember to think about the payoff—unencoding a bunch of entities is a cheap operation. Embedding a Javascript interpreter to decipher email addresses that are glued together using Javascript at the last possible moment is a lot harder and could slow down a tool, so it may not be worth the effort.

I’m still pretty confident in my own anti-spam harvester technique of hiding my email address behind a POST form, but even that could eventually be outsmarted by a really dedicated harvesting tool.

This is HTML entities for email addresses: don’t bother by Simon Willison, posted on 2nd December 2003.

Next: Downloading your hotmail inbox

Previous: Selectutorial

Previously hosted at http://simon.incutio.com/archive/2003/12/02/entitiesDontWork