HTML entities for email addresses: don’t bother
I’ve suspected this for a long time, and now here’s the empirical evidence: Popular Spam Protection Technique Doesn’t Work. If you’re relying on HTML entities to protect your email address from spam harvesters—for example username@example.com—your email address may as well be in plain text. Chip Rosenthal downloaded a tool called “Web Data Extractor v4.0” and tried it on some test data to prove once and for all that the technique doesn’t work.
My advice is to use your common sense when analysing a potential spam protection technique. If you were a spammer, would you be able to outwit the method? Spammers aren’t always very smart, but the people who write spamming tools (and get paid big bucks for them) are. Also remember to think about the payoff—unencoding a bunch of entities is a cheap operation. Embedding a Javascript interpreter to decipher email addresses that are glued together using Javascript at the last possible moment is a lot harder and could slow down a tool, so it may not be worth the effort.
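To show just how cheap that unencoding step is, here is a minimal Python sketch. It is not how Web Data Extractor or any real harvester works internally. The regular expression, the function names and the sample pages are all my own assumptions for illustration. It encodes an address as numeric entities, decodes the page with the standard library's html.unescape, and scans the result. For contrast, it also runs the same scan over an address glued together in JavaScript.

```python
import html
import re

# Deliberately loose pattern (an assumption for this sketch, not a full RFC 5322 matcher).
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")


def encode_as_entities(address: str) -> str:
    """Encode every character as a decimal numeric character reference."""
    return "".join(f"&#{ord(ch)};" for ch in address)


def harvest(page_source: str) -> list[str]:
    """Decode HTML entities, then return everything that looks like an address."""
    return EMAIL_RE.findall(html.unescape(page_source))


# Hypothetical test page: the address is entity-encoded in both the href and the link text.
encoded = encode_as_entities("username@example.com")
entity_page = f'<a href="mailto:{encoded}">{encoded}</a>'

# Hypothetical test page: the address only exists once the script has run.
js_page = '<script>document.write("username" + "@" + "example.com")</script>'

print(EMAIL_RE.findall(entity_page))  # [] - a scan of the raw source finds nothing
print(harvest(entity_page))           # ['username@example.com', 'username@example.com']
print(harvest(js_page))               # [] - decoding entities does not execute JavaScript
```

The entity "protection" falls to a single library call, while the JavaScript version at least forces the harvester to execute or emulate the script.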
I’m still pretty confident in my own anti-spam harvester technique of hiding my email address behind a POST form, but even that could eventually be outsmarted by a really dedicated harvesting tool.
Graham - 2nd December 2003 04:09 - #
Luke - 2nd December 2003 04:45 - #
Devon - 2nd December 2003 05:46 - #
tim - 2nd December 2003 10:17 - #
Rich - 2nd December 2003 12:23 - #
I always assumed that spambot writers are at least as savvy as I am to these techniques and that trying to keep ahead of them or employing trivial obfuscation like entity encoding was a waste of time. Then I read a study that demonstrated otherwise. Much may have changed in the year or so since that study was run, but on the face of it simple techniques like entity encoding seem surprisingly effective.
Not that I use it.
Sam - 2nd December 2003 16:34 - #
Devon: That's extremely bad for accessibility. The problem with excluding bots that way is that you exclude blind or visually impaired people as well. Also, if you have the facilities for generating an image and checking the response like that, you should have the facilities to use a friendlier approach like Simon's email contact form.
Graham: Some people complain that a CC isn't filed in their e-mail system the way a sent message would be. That complaint isn't too hard to address either (assuming you've got a half-decent host). Set up an e-mail address that is delivered to a script rather than a person, have people e-mail that, and have the script reply with an e-mail containing the relevant e-mail addresses. Spammers won't benefit from this e-mail address, since they generally fake their own address and so never receive the reply.
Lach - 3rd December 2003 02:03 - #
What is a good net citizen to do?!
All these solutions are great, but they will fail on the next generation of harvester robots. Current methods rely on the fact that robots trawl websites for HTML source pages. That is why systems such as the Hiveware Enkoder successfully prevent the harvesting of addresses: the page has to be rendered and its JavaScript processed before the address is available.
However, new robots are now being built that (to give just one example) use browsers' rendering engines and JavaScript processors to construct the page as the end user would see it, and then analyse the resulting DOM data structure. Systems like the Hiveware Enkoder therefore become useless, because the robots can now see the email address that the JavaScript produces.
It is a shame that there are people out there willing to write software for this purpose. They must have no morals whatsoever!
I still think we need to fight this war at the protocol level. Just my 2 cents.
Paul - 3rd December 2003 16:46 - #
Still, I suspect it is only a matter of time before someone figures out how to crack that ... which is why I too use form-based email contacts on most of the sites I now develop.
Mean Dean - 4th December 2003 10:21 - #
Eric - 5th December 2003 02:21 - #
abhisheksi2005 - 29th October 2004 15:02 - #
Alejandro - 25th September 2005 07:09 - #
Hilary Caws-Elwitt - 8th January 2006 16:48 - #