Battling comment spam
It’s a sad state of affairs when you come back to your blog after a week elsewhere and have to add another 56 domains to your blacklist. I’m actually getting more comment spam than legitimate comments now—this is becoming more than just a minor nuisance. I’m considering a number of improvements, including adding a moderation queue to comments on entries posted more than a month ago, disabling the comment form if the referral is a search engine (as per Russell Beattie’s suggestion) and adding some kind of wildcard support to the blacklist file.
I’d really rather not do any of this, but the problem looks like it’s going to escalate.
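A minimal sketch of three of those improvements (wildcard blacklist matching, a moderation queue for entries older than a month, and hiding the form for search-engine referrals) might look like the following. The blacklist patterns, the search-engine host list, the 30-day cut-off and all function names are hypothetical, and wildcards are interpreted as shell-style `fnmatch` patterns, which is an assumption about what "wildcard support" would mean.

```python
from datetime import datetime, timedelta
from fnmatch import fnmatchcase
from typing import Optional
from urllib.parse import urlparse

# Hypothetical blacklist entries: one shell-style wildcard pattern per host.
BLACKLIST = ["*.cheap-pills.example", "casino*.example.net", "spamdomain.example"]
# Hypothetical list of search-engine referrer hosts.
SEARCH_ENGINES = ["*.google.*", "search.yahoo.com", "search.msn.com"]
# Assumption: "more than a month ago" is treated as 30 days.
MODERATION_AGE = timedelta(days=30)


def host_matches(url: Optional[str], patterns) -> bool:
    """True if the URL's hostname matches any wildcard pattern."""
    host = urlparse(url or "").hostname or ""
    return any(fnmatchcase(host, p) for p in patterns)


def show_comment_form(referer: Optional[str]) -> bool:
    """Hide the form for visitors who arrived straight from a search engine."""
    return not host_matches(referer, SEARCH_ENGINES)


def triage(comment_urls, entry_posted: datetime, now: datetime) -> str:
    """Decide what happens to a submitted comment."""
    if any(host_matches(u, BLACKLIST) for u in comment_urls):
        return "reject"
    if now - entry_posted > MODERATION_AGE:
        return "moderate"  # old entry: hold the comment in the moderation queue
    return "publish"
```

One wildcard line such as `*.cheap-pills.example` stands in for any number of subdomains that would otherwise each need their own blacklist entry.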
Lars - 30th September 2003 14:16 - #
http://jeremy.zawodny.com/blog/archives/000984.html
Danny Shepherd - 30th September 2003 14:31 - #
Sam - 30th September 2003 14:40 - #
Simon Willison - 30th September 2003 14:50 - #
Travholt - 30th September 2003 14:50 - #
Simon Willison - 30th September 2003 14:56 - #
Thorn Vandevelde - 30th September 2003 16:15 - #
Sam - 30th September 2003 16:39 - #
Bill Brandon - 30th September 2003 17:39 - #
Yoz - 30th September 2003 18:28 - #
markku - 30th September 2003 19:32 - #
What about a transparent Turing Test, one the author doesn't even realize they are completing? A lot of this is inspired directly by Yoz's comment...
First, dynamically switch the positions of the email and URI fields. I would bet that no-one would have any problems with this, and it would diversify your individual comment pages a bit, leading to greater security. If a URI is entered in the email field, flag the comment (don't display it, and mark it for possible blacklisting until it has been reviewed).
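A rough sketch of the position-swapping idea, assuming the fields keep their names and only their on-page order changes, so it mainly trips up bots that fill fields by position rather than by name. The field list, the URI regex and the function names are illustrative assumptions.

```python
import random
import re

# Assumption: the two fields whose on-page order gets shuffled per rendering.
SWAPPABLE = ["email", "url"]
# Rough check for something URI-like typed into the email field.
LOOKS_LIKE_URI = re.compile(r"^\s*(https?://|www\.)", re.IGNORECASE)


def field_order(rng: random.Random) -> list:
    """Return the email/URI fields in a random order for this page view."""
    order = SWAPPABLE[:]
    rng.shuffle(order)
    return order


def should_flag(posted: dict) -> bool:
    """Hold the comment for review if a URI landed in the email field."""
    return bool(LOOKS_LIKE_URI.match(posted.get("email", "")))
```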
Another idea would be to replace the standard input name/id attributes with codewords. You could even make them randomly encoded name/id attributes and have an algorithm decode them. If someone continues to submit the old form, blacklist them.
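One way to get "randomly encoded" names that the server can still decode is to derive each codeword from a secret key plus a per-rendering token using HMAC. This is a sketch under assumptions: the secret, the field names and the helper functions are hypothetical, and as Yoz points out further down, a bot that parses the form fresh each time will still pick up the current codewords. This mainly defeats bots replaying a cached copy of the form.

```python
import hashlib
import hmac
import secrets
from typing import Dict, Optional

SECRET = b"replace-with-a-real-secret"  # hypothetical server-side key
REAL_FIELDS = ["author", "email", "url", "body"]  # hypothetical field names


def new_token() -> str:
    """Per-rendering token, sent back in a hidden field with the form."""
    return secrets.token_hex(8)


def codeword(field: str, token: str) -> str:
    """Derive an opaque name/id attribute for one field on one rendering."""
    mac = hmac.new(SECRET, f"{token}:{field}".encode(), hashlib.sha256)
    return "f" + mac.hexdigest()[:12]  # leading letter keeps it a valid HTML id


def encode_form(token: str) -> Dict[str, str]:
    return {field: codeword(field, token) for field in REAL_FIELDS}


def decode_submission(token: str, posted: Dict[str, str]) -> Optional[Dict[str, str]]:
    """Map codewords back to real names; None means the old form was submitted."""
    if any(field in posted for field in REAL_FIELDS):
        return None  # old, un-encoded field names: a blacklist candidate
    names = encode_form(token)
    return {field: posted.get(code, "") for field, code in names.items()}
```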
And finally, create a set of hidden fields whose name/id attributes match the system you have now, but with junk values. If these values change, blacklist the sender.
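The decoy fields amount to what is usually called a honeypot. A minimal sketch, assuming the decoys reuse the old field names with throwaway values, and that a decoy missing from the submission counts the same as a changed one (the comment above only mentions changed values). The values and function names are made up for illustration.

```python
from html import escape

# Decoy hidden inputs under the old field names, with throwaway values.
DECOYS = {"author": "x7q-decoy", "email": "decoy@invalid.example", "url": "http://decoy.invalid/"}


def decoy_inputs() -> str:
    """HTML for the hidden decoy fields to embed in the comment form."""
    return "\n".join(
        f'<input type="hidden" name="{escape(n)}" value="{escape(v)}">'
        for n, v in DECOYS.items()
    )


def tampered(posted: dict) -> bool:
    """True if any decoy came back changed or missing (treated the same here)."""
    return any(posted.get(n) != v for n, v in DECOYS.items())
```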
My whole point with this was that diversity is key. Using this system across a large number of sites would raise the bar for spambots, as they would be unable to figure out what was going on. But I have no idea what to do about people manually entering these values. :\
Stephen - 30th September 2003 21:49 - #
(Bah - that's the second time I've managed to completely ignore Simon's comment-posting instructions and have my separate paragraphs gracelessly smooshed into one. Just to explain - my previous comment explains two separate schemes for blocking spammers, one targeted at robots and the other at humans)
Stephen nails it when he says that diversity is key in fooling robots. That said, I disagree with the methods he suggests, as most robots are already good enough to parse HTML forms and work out the required fields and values. I say that based on Shelley's experiences. But I still think it's far easier to outwit a spambot than to write one that can spam any blog.
But how many of these spammers are actually bots? I didn't think that manual spamming would be worth it, but Simon disagrees, and I can see his reasoning. To fight those, you need heavier artillery like moderation queues, blacklists, content analysis etc. I'm still biased against heavily complex solutions - I think we're better off being agile in our solutions than investing large amounts of time and code into big (and usually centralised) single projects. I like moderation queues because they can be quick to deal with if done right and there are ways of minimising interference with genuine posters (as I previously suggested), but more importantly because they should dissuade the persistent manual spammer once his first couple of posts never make it live.
But since we're dealing with a human spammer, how about this idea: Ask him not to do it. No, really. If I were browsing around blogs and having to manually spam each one, I'd want to minimise wasted time as much as possible.
I'm thinking mainly of this Slashdot post. The comment form equivalent would be default text in the comment body TEXTAREA saying something like
You can go one better by removing the Google incentive: stop Google from indexing comments on blog entries, and mention this in the warning text.
Yoz - 1st October 2003 02:00 - #
Addendum: I should make it clear that I don't think that merely asking spammers to stop is necessarily going to work. However, visibly removing the incentive to spend time spamming your blog may help, plus (as you can probably tell) I'm all in favour of scattergun approaches to stopping spam if those approaches are quick and easy to try.
Yoz - 1st October 2003 02:16 - #
I used to have a "shoutbox" as well as allowing comments on posts. I considered the shoutbox to be more ad hoc/free-form, and it would regularly get spammed. My comments, on the other hand, never (I think) got spammed. The spammers seemed to like the shoutbox for some reason, and at least that kept them all in one place.
Sometimes I'd alter the spam url slightly for fun. Hell, if someone wants cracked software they're going to have to work for it!
When I last redesigned my site I decided to experiment: I turned off all comments and got rid of the shoutbox, thinking people would just email me. Hehe, was that ever a mistake. Site/visitor interaction is probably in minus numbers.
pete - 1st October 2003 08:41 - #
This is only a thought, and probably wouldn't work, but it just flashed before my eyes before rapidly vanishing again. What if you had some sort of "peer support group", where people you trust have certain rights, like deleting comments for you, or at least removing them for your review? You must have enough people regularly coming through here to make this work, in theory.
Having said that, it's completely unviable for sites with only moderate traffic, as "trusted visitors" probably wouldn't come through often enough.
Without thinking about it, I can't see an easy way of implementing it. Having thought about it, I realise how silly the idea sounds. But I'll post it anyway, if only to waste some of my time and yours.
Andrew - 1st October 2003 09:49 - #
Actually, I really like the idea, Andrew, but I think the most practical way to do it is to have an RSS feed for blog comments and subscribe to it in your favorite aggregator. You would have it refresh every half hour, perhaps every hour, so you would quickly see when someone had spammed the comments, and you could go in and just click on a link that said 'remove for moderation'.
If, say, 10-15 bloggers did this for each other, they would quickly be able to get rid of the comment spam.
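A sketch of such a comments feed, built as RSS 2.0 with Python's standard `xml.etree.ElementTree`. The `/moderate` endpoint, the `helper_key` parameter that identifies a trusted volunteer, and the `Comment` record are all hypothetical; the blog engine would need a matching handler that hides the comment pending review.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass
from html import escape
from urllib.parse import urlencode


@dataclass
class Comment:
    id: int
    author: str
    entry_url: str
    text: str


def comments_feed(comments, site: str, helper_key: str) -> str:
    """RSS 2.0 feed of recent comments, each with a 'remove for moderation' link."""
    rss = ET.Element("rss", version="2.0")
    channel = ET.SubElement(rss, "channel")
    ET.SubElement(channel, "title").text = "Recent comments"
    ET.SubElement(channel, "link").text = site
    ET.SubElement(channel, "description").text = "New comments for spam patrol"
    for c in comments:
        item = ET.SubElement(channel, "item")
        ET.SubElement(item, "title").text = f"Comment by {c.author}"
        ET.SubElement(item, "link").text = f"{c.entry_url}#c{c.id}"
        # Hypothetical endpoint; helper_key identifies the trusted volunteer.
        remove = f"{site}/moderate?" + urlencode({"comment": c.id, "key": helper_key})
        # The HTML is escaped once here and again by ElementTree when serialised,
        # which is how RSS descriptions usually carry markup.
        ET.SubElement(item, "description").text = (
            f'<p>{escape(c.text)}</p><p><a href="{escape(remove)}">remove for moderation</a></p>'
        )
    return ET.tostring(rss, encoding="unicode")
```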
Eivind - 1st October 2003 10:46 - #
I love that idea as well. To completely eliminate comment spam you need to be sitting at your computer 24 hours a day, deleting spam comments as soon as they come in. This is obviously impractical, as even the most hard-core of geeks need to sleep, eat and occasionally interact with the outside world. However, given a group of a dozen or so bloggers from around the world, you can be pretty sure at least one of them will be at their computer at any one time. By helping each other out, that group of bloggers could achieve 24-hour surveillance against comment spam with very little individual effort.
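Whether a given group really achieves round-the-clock cover is easy to check by listing the UTC hours nobody is online. The schedule format (start hour in UTC plus hours online) is an assumption made for this sketch.

```python
def uncovered_hours(windows):
    """Return the UTC hours (0-23) when no volunteer is at their computer.

    windows: (start_hour_utc, hours_online) for each volunteer.
    """
    covered = set()
    for start, length in windows:
        # Wrap past midnight so a 22:00 start with 4 hours covers 22, 23, 0, 1.
        covered.update((start + h) % 24 for h in range(length))
    return sorted(set(range(24)) - covered)
```

For example, three volunteers online for eight hours each, starting at 00:00, 08:00 and 16:00 UTC, leave no gaps at all.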
The idea could probably be adapted to help fight other forms of abuse as well. In fact, it's already used on sites such as Wikipedia where a large community monitors the "recent changes" page and combats any negative activity almost as soon as it appears.
Simon Willison - 1st October 2003 10:56 - #
What about some kind of voting? Generally, most of the people who visit your site, or at least most of the people who would find it, are going to be 'more mature' and educated, and less likely to (and I really would love to have another word to use) spaz out and cause trouble with the system.
Some sort of voting system where, if a post is out of place, you simply mark it down. Once enough people have voted it down, it goes to review. This may start to lean towards the solutions that are just too big a project compared with the effort it takes to watch out for rogue spam comments.
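The core of such a scheme can be quite small. This sketch assumes a hypothetical threshold of three distinct voters and counts each voter once per comment. It does not weight votes or defend against one person voting under many identities.

```python
from collections import defaultdict


class VoteQueue:
    """Send a comment to review once enough distinct visitors vote it down."""

    def __init__(self, threshold: int = 3):
        self.threshold = threshold      # assumption: votes needed before review
        self.votes = defaultdict(set)   # comment id -> ids of visitors who voted
        self.review = set()             # comment ids waiting for the blog owner

    def vote_down(self, comment_id, voter_id) -> bool:
        """Record a vote; return True if the comment is now in the review queue."""
        self.votes[comment_id].add(voter_id)  # a set ignores repeat votes
        if len(self.votes[comment_id]) >= self.threshold:
            self.review.add(comment_id)
        return comment_id in self.review
```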
Do we even have a techie name for spam comments? Spamments!
Andrew Donaldson - 1st October 2003 12:15 - #
I just found this article on killing comment spam via Jeremy Zawodny's blog - I give you Killing Comment Spam for Dummies. It does seem to be aimed squarely at MT users, though.
sam - 1st October 2003 12:40 - #
What about using Bayesian filtering?
You could probably hook up an open-source email spam filter pretty easily to examine good and bad comment buckets (files?).
You'd have to train it, but with as much spam as you're currently dealing with, that should take little time.
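Rather than hooking up an existing email filter, here is a from-scratch sketch of the same idea: a two-bucket multinomial naive Bayes classifier with add-one smoothing. The tokenizer and class name are assumptions, and a real filter would need far more training data than the few comments used to exercise it.

```python
import math
import re
from collections import Counter

TOKEN = re.compile(r"[a-z0-9$'-]+")


def tokens(text: str) -> list:
    return TOKEN.findall(text.lower())


class CommentFilter:
    """Two-bucket multinomial naive Bayes with add-one smoothing."""

    def __init__(self):
        self.counts = {"spam": Counter(), "ham": Counter()}  # token counts per bucket
        self.docs = Counter()                                # comments seen per bucket

    def train(self, text: str, label: str) -> None:
        self.counts[label].update(tokens(text))
        self.docs[label] += 1

    def spam_probability(self, text: str) -> float:
        vocab = set(self.counts["spam"]) | set(self.counts["ham"])
        total_docs = sum(self.docs.values())
        log_score = {}
        for label, counts in self.counts.items():
            n = sum(counts.values())
            score = math.log((self.docs[label] + 1) / (total_docs + 2))  # smoothed prior
            for t in tokens(text):
                # Add-one smoothing; the extra +1 leaves room for unseen tokens.
                score += math.log((counts[t] + 1) / (n + len(vocab) + 1))
            log_score[label] = score
        # P(spam | text) = 1 / (1 + exp(log P(ham, text) - log P(spam, text)))
        diff = max(min(log_score["ham"] - log_score["spam"], 700.0), -700.0)
        return 1.0 / (1.0 + math.exp(diff))
```

Training is just a matter of feeding each deleted spam comment to `train(text, "spam")` and each published comment to `train(text, "ham")`.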
If a comment falls in the spam bucket, then... there's room for other people to make suggestions. ;)
I'd say, respond with a message that apologetically explains that the link text "click here" looks spammy to you, and could the commenter please choose something else?
Or, instead of asking them to change text, you could ask them to enter a nonce at that point. The vast majority of legit comments would never see this secondary screen.
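A sketch of that secondary screen: comments under a spam-probability threshold go straight through, while the rest are held until the commenter retypes a one-time code. The 0.9 threshold and the class name are assumptions. The sketch also leaves open how the nonce is displayed, and a bot that parses the page could copy a plain-text nonce.

```python
import secrets
from typing import Dict, Optional, Tuple


class NonceGate:
    """Publish clean comments directly; make suspicious ones retype a one-time code."""

    def __init__(self, threshold: float = 0.9):
        self.threshold = threshold       # assumption: spam-probability cut-off
        self.held: Dict[str, str] = {}   # nonce -> comment waiting for confirmation

    def submit(self, comment: str, spam_probability: float) -> Tuple[str, Optional[str]]:
        if spam_probability < self.threshold:
            return "published", None
        nonce = secrets.token_hex(3)     # six hex characters, easy to retype
        self.held[nonce] = comment
        return "challenge", nonce        # show the nonce on the secondary screen

    def confirm(self, typed: str) -> Optional[str]:
        """Release the held comment if the nonce matches; each nonce works once."""
        return self.held.pop(typed.strip().lower(), None)
```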
Jeremy Dunck - 1st October 2003 22:49 - #
Stephen - 2nd October 2003 11:02 - #
But the peer-review filtering doesn't scale.
If every blogger had a similar system, then most readers would be asked to filter multiple blogs-- and since we're an incestuous crew, most readers would also have their own blogs to be filtered.
It's a Club solution, not a LoJack solution. ;-)
Jeremy Dunck - 2nd October 2003 14:50 - #
Simon Willison - 2nd October 2003 18:23 - #
Simon,
OK, you're talking about a cooperative. I was thinking along the lines of random visits from admin-esque samaritans. Yes, I think that would work.
Hmm. What don't you like about the Bayesian approach? Less work, after implementing. ;)
Jeremy Dunck - 3rd October 2003 14:52 - #
Murphy - 16th February 2004 15:50 - #
windflash - 14th June 2004 14:31 - #