HTML entities for email addresses: don’t bother
2nd December 2003
I’ve suspected this for a long time, and now here’s the empirical evidence: Popular Spam Protection Technique Doesn’t Work. If you’re relying on HTML entities to protect your email address from spam harvesters—for example username@example.com
—your email address may as well be in plain text. Chip Rosenthal downloaded a tool called “Web Data Extractor v4.0” and tried it on some test data to prove once and for all that the technique doesn’t work.
My advice is to use your common sense when analysing a potential spam protection technique. If you were a spammer, would you be able to outwit the method? Spammers aren’t always very smart, but the people who write spamming tools (and get paid big bucks for them) are. Also remember to think about the payoff—unencoding a bunch of entities is a cheap operation. Embedding a Javascript interpreter to decipher email addresses that are glued together using Javascript at the last possible moment is a lot harder and could slow down a tool, so it may not be worth the effort.
I’m still pretty confident in my own anti-spam harvester technique of hiding my email address behind a POST form, but even that could eventually be outsmarted by a really dedicated harvesting tool.
More recent articles
- Storing times for human events - 27th November 2024
- Ask questions of SQLite databases and CSV/JSON files in your terminal - 25th November 2024
- Weeknotes: asynchronous LLMs, synchronous embeddings, and I kind of started a podcast - 22nd November 2024