
Simon Willison’s Weblog

Blacklisting Comment Spam

I’m fed up with comment spam. From now on, any comment I judge to be spam will be deleted, and the domains linked to from that comment will be blacklisted. Any future comments that contain links to those domains will be refused. My blacklist will be made available as a simple text file, one domain per line, at blacklist.txt. You are welcome to grab a copy of that file once every 24 hours and use it as part of your own comment spam prevention system. I will manually approve all domains that are added to it to ensure only domains of a dubious nature end up blacklisted.
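A minimal sketch of the refusal check described above, in Python. It assumes the blacklist text has already been downloaded and that links appear in the comment as plain http:// or https:// URLs. The function names are hypothetical, not part of any existing comment system.

```python
import re
from urllib.parse import urlparse

# Rough pattern for http(s) links in comment text (an assumption; a real
# system would parse the comment's HTML properly).
LINK_RE = re.compile(r"https?://[^\s\"'<>]+", re.IGNORECASE)


def load_blacklist(text):
    """Parse blacklist.txt-style text: one domain per line, blank lines ignored."""
    return {line.strip().lower() for line in text.splitlines() if line.strip()}


def linked_domains(comment):
    """Return the set of host names linked to from a comment."""
    hosts = set()
    for url in LINK_RE.findall(comment):
        try:
            host = urlparse(url).hostname  # lowercased, with any port removed
        except ValueError:  # e.g. malformed IPv6 brackets
            continue
        if host:
            hosts.add(host)
    return hosts


def is_refused(comment, blacklist):
    """Refuse the comment if it links to any blacklisted domain."""
    return bool(linked_domains(comment) & blacklist)


blacklist = load_blacklist("spam.example\npills.example\n")
print(is_refused("Great post! http://pills.example/buy", blacklist))  # True
print(is_refused("Thanks, see http://example.org/", blacklist))      # False
```

This compares host names exactly, so www.spam.example would slip past an entry for spam.example; comment 23 below raises exactly that problem.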

If you start using a similar system, drop me a line and I will start using your blacklist as well (note that I will not merge it with my own public list). If I find you have been blacklisting innocent domains I will cancel my subscription to your blacklist. In this way, I hope to build a decentralised web of trust whereby other people’s recommendations help my system combat spam better.
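One way to model the subscription scheme, sketched under a few assumptions: each subscribed list is a plain text file at a URL, a `fetch` callable (hypothetical, injected so it can be swapped out) returns its text, and each list is re-fetched at most once every 24 hours. Subscribed lists are consulted but never merged into the published list, and dropping one is a single call.

```python
import time

REFRESH_SECONDS = 24 * 60 * 60  # fetch each subscribed list at most once a day


def parse_list(text):
    return {line.strip().lower() for line in text.splitlines() if line.strip()}


class Subscription:
    """A blacklist published by someone else, cached for REFRESH_SECONDS."""

    def __init__(self, url, fetch):
        self.url = url
        self.fetch = fetch  # hypothetical callable: url -> file contents as str
        self.domains = set()
        self.fetched_at = None

    def current(self, now):
        if self.fetched_at is None or now - self.fetched_at >= REFRESH_SECONDS:
            self.domains = parse_list(self.fetch(self.url))
            self.fetched_at = now
        return self.domains


class TrustWeb:
    """My own approved list plus subscriptions, which are never merged into it."""

    def __init__(self, own_domains):
        self.own = {d.lower() for d in own_domains}
        self.subscriptions = {}

    def subscribe(self, url, fetch):
        self.subscriptions[url] = Subscription(url, fetch)

    def unsubscribe(self, url):
        # e.g. after catching that list blacklisting innocent domains
        self.subscriptions.pop(url, None)

    def is_blocked(self, domain, now=None):
        now = time.time() if now is None else now
        domain = domain.lower()
        if domain in self.own:
            return True
        return any(domain in sub.current(now) for sub in self.subscriptions.values())

    def public_list(self):
        # Only my own, manually approved entries get published.
        return "".join(d + "\n" for d in sorted(self.own))
```

In real use `fetch` could be something like `lambda url: urllib.request.urlopen(url).read().decode("utf-8")`; keeping it injectable means the refresh logic can be checked without touching the network.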

This is Blacklisting Comment Spam by Simon Willison, posted on 2nd September 2003.


31 comments

  1. nice one.

    Oh BTW, here's your Python blacklist script.

    dee - 2nd September 2003 20:04 - #

  2. All the penis enlargement sites I could ever need. What a great resource! Cough.

    Tom Gilder - 2nd September 2003 20:16 - #

  3. I hadn't thought about that. My blacklist file will probably be the top search results on Google for penis enlargement within a week :/

    Simon Willison - 2nd September 2003 20:19 - #

  4. Just stick it in a directory and tell the robots not to go in there via a robots.txt file.

    Tim Fountain - 2nd September 2003 20:47 - #
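Tim's suggestion can be checked with Python's standard `urllib.robotparser`. The `/blacklist/` directory is a hypothetical location for the file. robots.txt is only advisory: it keeps out well-behaved crawlers such as Googlebot, but not anything that chooses to ignore it.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical layout: the blacklist moved into /blacklist/ so it can be excluded.
ROBOTS_TXT = """\
User-agent: *
Disallow: /blacklist/
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

print(parser.can_fetch("Googlebot", "http://example.org/blacklist/blacklist.txt"))  # False
print(parser.can_fetch("Googlebot", "http://example.org/archive/2003/09/02/"))     # True
```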

  5. The WordPress Development Blog was hit by this same comment spammer earlier (since cleaned up). We'll probably end up implementing a built-in admin interface for blacklists in a future release...

    Dougal Campbell - 2nd September 2003 21:47 - #

  6. Something I'd like to experiment with is using SpamBayes to filter comments. Taking it further, the same technique might be useful for automatically filtering spam and graffiti from wikis.

    Van Gale - 3rd September 2003 03:50 - #
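SpamBayes has its own tokenizer and scoring, and its API is not reproduced here. As a rough illustration of the same family of technique, this is a plain naive Bayes word classifier with add-one smoothing; the class name and training data are hypothetical.

```python
import math
import re
from collections import Counter


def tokens(text):
    return re.findall(r"[a-z0-9]+", text.lower())


class NaiveBayes:
    """Tiny two-class word model; a stand-in for SpamBayes, not its API."""

    def __init__(self):
        self.words = {"spam": Counter(), "ham": Counter()}
        self.docs = Counter()

    def train(self, text, label):
        self.words[label].update(tokens(text))
        self.docs[label] += 1

    def spam_probability(self, text):
        vocab = set(self.words["spam"]) | set(self.words["ham"])
        total_docs = sum(self.docs.values())
        log_score = {}
        for label in ("spam", "ham"):
            counts = self.words[label]
            n = sum(counts.values())
            # log prior plus log word likelihoods, with add-one (Laplace) smoothing
            score = math.log((self.docs[label] + 1) / (total_docs + 2))
            for tok in tokens(text):
                score += math.log((counts[tok] + 1) / (n + len(vocab)))
            log_score[label] = score
        # Turn the two log scores into P(spam | text) without overflowing.
        top = max(log_score.values())
        spam = math.exp(log_score["spam"] - top)
        ham = math.exp(log_score["ham"] - top)
        return spam / (spam + ham)
```

Train it on comments already judged by hand, then hold anything scoring above some threshold (an arbitrary choice, such as 0.9) for review.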

  7. Well, this is a new one on me - and I don't mean the comment spam thing, but the way that the Google results are displayed - with just the title attribute and, more importantly, a rank number - this is fantastic. Simon - how did you get it to display like this? I know that I can hack the URL but I'd love to know how you would normally do this directly from Google - advanced options presumably?

    Ian Lloyd - 3rd September 2003 08:55 - #

  8. Thank goodness it's not just me. My blog gets about 3 readers a day from what I can tell (and I'm not sure that figure doesn't include me), so I always keep a close eye on any comments. So far spam comments have matched real comments at about a 1-to-1 ratio, which is kind of annoying. Thinking about this, commenter authentication is going to have to be the way to go, or perhaps even moderated comments (although I dislike this idea as it greatly breaks up the flow of any free discussion, which, as you may have guessed, happens rarely on my blog :-) ).

    Sam Newman - 3rd September 2003 09:48 - #

  9. Make that 4 Sam - I just visited following your tale of woe ;)

    Ian Lloyd - 3rd September 2003 13:31 - #

  10. At last! Feeling sorry for myself has finally paid off :-)

    Sam Newman - 3rd September 2003 14:37 - #

  11. I set one up too. Let's get this Web of Trust started.

    Adrian Holovaty - 3rd September 2003 17:04 - #

  12. Why a flat file?

    Why not do the RBL-type thing and publish these as a DNS Zone? If an A record query returns "127.0.0.2", then that IP address is banned.

    (You can find implementation details in lots of places, including here.)

    Using DNS is surely a lot more light-weight than shipping a flat-file around (or even than using HTTP to query a database).

    Jacques Distler - 4th September 2003 04:52 - #
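A sketch of the lookup Jacques describes, with `bl.example.org` standing in for a hypothetical blacklist zone. IP-based lists reverse the address's octets before appending the zone; a domain list like this one could instead prepend the domain name itself. Only `127.0.0.2` is treated as a hit, as in the comment above; real lists may use other `127.0.0.x` answers.

```python
import ipaddress
import socket

LISTED = "127.0.0.2"  # the "banned" answer described in the comment


def ip_query_name(ip, zone):
    """IP-based lookup: reverse the IPv4 octets and append the zone."""
    octets = str(ipaddress.IPv4Address(ip)).split(".")
    return ".".join(reversed(octets)) + "." + zone


def domain_query_name(domain, zone):
    """Domain-based lookup: prepend the domain itself to the zone."""
    return domain.lower().rstrip(".") + "." + zone


def is_listed(query_name, resolve=socket.gethostbyname):
    """An A record of 127.0.0.2 means listed; no record means not listed."""
    try:
        return resolve(query_name) == LISTED
    except OSError:  # socket.gaierror (e.g. NXDOMAIN) is an OSError subclass
        return False


# e.g. is_listed(domain_query_name("spam.example", "bl.example.org"))
```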

  13. I recently added comments to my blog system, and was wondering whether my DHTML "add a comment" popup window would be accessible to these... "comment spam bots"? The way I imagine it works, they'd scan the page (minus styles) for any form elements, so my initial thought is that I'm no less exposed than the guy next door.

    coda - 4th September 2003 10:40 - #

  14. I think someone should host a web services app that manages such a blacklist. It would make it very easy for other users to check sites. That is a perfect example of what web services are for. http://www.johnsjottings.com/archives/2003/09/04/comment_spam_solutions.html

    john - 4th September 2003 13:51 - #
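A minimal version of such a service, written as a bare WSGI app using only Python's standard library. The `/check` path, the `domain` query parameter, the JSON response and the blacklist entries are all hypothetical choices; the SOAP service announced in comment 21 below may look nothing like this.

```python
import json
from urllib.parse import parse_qs

BLACKLIST = {"spam.example", "pills.example"}  # hypothetical entries


def app(environ, start_response):
    """GET /check?domain=foo.example -> {"domain": ..., "blacklisted": true/false}"""
    if environ.get("PATH_INFO") != "/check":
        start_response("404 Not Found", [("Content-Type", "text/plain")])
        return [b"not found\n"]
    params = parse_qs(environ.get("QUERY_STRING", ""))
    domain = params.get("domain", [""])[0].strip().lower()
    body = json.dumps({"domain": domain, "blacklisted": domain in BLACKLIST})
    start_response("200 OK", [("Content-Type", "application/json")])
    return [body.encode("utf-8")]
```

To try it locally, `wsgiref.simple_server.make_server("", 8000, app).serve_forever()` serves it on port 8000, after which `/check?domain=spam.example` should report the domain as blacklisted.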

  15. Assuming these are bots, couldn't Apache users just use an .htaccess file to block POST requests from certain user-agents (like libwww-perl)?

    Joe Grossberg - 6th September 2003 16:00 - #

  16. It could work, but only if the comment spammers are dumb enough to be using the default user-agent settings on their bots instead of modifying them to look like a real browser.

    Simon Willison - 6th September 2003 16:09 - #

  17. Yep, the moron who's been spamming me (for some zipcode thing!) uses libwww-perl ... for now, they're stupid enough for that to suffice.

    Joe Grossberg - 7th September 2003 06:30 - #
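Joe's idea, translated from `.htaccess` into Python WSGI middleware as an illustration (this is not Apache configuration). The blocked prefix list is an assumption, and as Simon points out, a bot only has to change its User-Agent header to get past it.

```python
BLOCKED_AGENTS = ("libwww-perl",)  # hypothetical list of default bot User-Agent prefixes


def block_bot_posts(app):
    """WSGI middleware: refuse POSTs whose User-Agent starts with a blocked prefix."""

    def wrapper(environ, start_response):
        agent = environ.get("HTTP_USER_AGENT", "").lower()
        if environ.get("REQUEST_METHOD") == "POST" and agent.startswith(BLOCKED_AGENTS):
            start_response("403 Forbidden", [("Content-Type", "text/plain")])
            return [b"Forbidden\n"]
        # Everything else (including GETs from the same bot) passes through.
        return app(environ, start_response)

    return wrapper
```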

  18. I think the problem of comment spam is a much easier one to solve than the email equivalent. While this blacklist system is certainly a good one, it's also quite complex and requires continual effort. Here are some simpler steps that MT users can take.

    Yoz - 9th September 2003 16:08 - #

  19. Hey Simon, nice to know someone else has been thinking about these things as well.

    Here is my comment URL blacklist. Up to the minute and replete with spammy goodness. We should really think about making this as standard as a robots.txt or foaf.rdf file... Maybe put it in meta tags for autodiscovery as well...

    Oh, also, I just created a non-hack MT solution for keeping the blacklisted comments from ever showing up on the site: Killing Comment Spam Dead. Would love to have your input. I am very seriously considering starting a site devoted to the comment spam clearinghouse idea as I mention on that entry. Contact me if you want to collaborate.

    Jay - 28th September 2003 00:29 - #
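Autodiscovery could work the way feed autodiscovery does, with a `<link>` element in the page head. No such convention exists for blacklists, so the `comment-blacklist` rel value below is purely hypothetical; the parsing uses Python's standard `html.parser`.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin

BLACKLIST_REL = "comment-blacklist"  # hypothetical rel value; not a real standard


class BlacklistLinkFinder(HTMLParser):
    """Collect href values of <link rel="comment-blacklist" href="..."> tags."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.found = []

    def handle_starttag(self, tag, attrs):
        if tag != "link":  # tag names arrive lowercased
            return
        attrs = dict(attrs)
        rels = (attrs.get("rel") or "").lower().split()
        if BLACKLIST_REL in rels and attrs.get("href"):
            # Resolve relative hrefs against the page's own URL.
            self.found.append(urljoin(self.base_url, attrs["href"]))


def discover_blacklists(html, base_url):
    finder = BlacklistLinkFinder(base_url)
    finder.feed(html)
    finder.close()
    return finder.found
```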

  20. I was contemplating writing an automatic system (for people without the technical knowhow to make one themselves). Would you have any objections to me including the above?

    Ben - 13th October 2003 10:28 - #

  21. I've started a SOAP service to deal with this. See this blog entry for the announcement, and last night I posted this entry about the first method being completed.
    Thanks for the blacklist file, that list formed the base entries in the service.

    A.Sleep - 12th November 2003 12:58 - #

  22. I think this might work for you. I have not yet implemented it, but it seems like a possible solution. I may be able to send you a full report (as I will be writing one for our site director anyway) in a week or so. http://www.jayallen.org/journey/2003/09/killing_comment_spam_for_dummies

    jerry - 25th December 2003 03:38 - #

  23. I've done some work on keyword filtering for blosxom in the same style as MT-Blacklist. You can find it at the url given above. I'd like to see something of a DNS-based approach too. The problem seems to be that it's not just foobar.dom that gets used as the spammer's url but also baz.foobar.dom and quix.foobar.dom. Anyway, using keyword based filtering seems to be effective...

    Doug Alcorn - 3rd February 2004 18:42 - #
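A sketch of both ideas from this comment: matching a host against every parent domain, so baz.foobar.dom and quix.foobar.dom hit an entry for foobar.dom, plus a simple case-insensitive keyword check. The patterns are hypothetical examples, and this is not MT-Blacklist's actual code.

```python
import re


def is_blacklisted_host(host, blacklist):
    """True if host or any parent domain is listed (baz.foobar.dom matches foobar.dom)."""
    labels = host.lower().rstrip(".").split(".")
    # Check "baz.foobar.dom", then "foobar.dom", then "dom".
    return any(".".join(labels[i:]) in blacklist for i in range(len(labels)))


def keyword_hit(text, patterns):
    """Return the first regex pattern found in the text (case-insensitive), else None."""
    for pattern in patterns:
        if re.search(pattern, text, re.IGNORECASE):
            return pattern
    return None
```

Comparing whole dot-separated labels, rather than testing whether the host ends with the entry, avoids blocking an unrelated notfoobar.dom.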

  24. I work for an ISP and this has made me think that it would be great to be able to supply a couple of alternative DNS servers for customers to use. These alternative DNS servers would basically resolve anything in the blacklisted domains to localhost, thus preventing their annoying ads from being downloaded. The list would have to be regularly downloaded by each ISP and then used to rebuild the DNS config but that's not hard. Thoughts?

    Kingsley Tart - 28th February 2004 14:29 - #
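If the resolver happened to be dnsmasq (an assumption; the comment names no DNS server), rebuilding the config from a downloaded list could be as simple as emitting one `address=/domain/127.0.0.1` line per entry, which makes that domain and its subdomains resolve to localhost. Entries containing anything other than letters, digits, hyphens and dots are skipped so a malformed line can't slip other characters into the config.

```python
def dnsmasq_config(blacklist_text, sink="127.0.0.1"):
    """Turn a one-domain-per-line blacklist into dnsmasq address= directives."""
    domains = set()
    for raw in blacklist_text.splitlines():
        domain = raw.strip().lower().rstrip(".")
        # Skip blanks and anything that isn't a plain host name (e.g. contains "/").
        if domain and all(c.isalnum() or c in "-." for c in domain):
            domains.add(domain)
    # Sorted and de-duplicated so the generated file is stable between runs.
    return "".join(f"address=/{d}/{sink}\n" for d in sorted(domains))
```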

  25. Hello Everyone! Just want to drop a line to say hi! I really enjoy reading your website! Thanx! Catch you all later!

    hgh - 4th May 2004 22:52 - #

  26. ^ Guess you better add that guy to the list.

    A. Visitor - 3rd July 2004 06:09 - #

  27. I came across your site while searching for information on a program called SpamCop. I have three friends that I email on a regular basis, whose mail keeps coming back with the message "your IP blocked by SpamCop". I have contacted my ISP and my friends' ISP, and they both tell me that it is not their problem. I have attempted on four different occasions to contact SpamCop with absolutely no results. Is anyone in this forum familiar with this type of problem? Thanks.

    Bill Anness - 11th March 2005 20:08 - #

  28. I've done some work on keyword filtering for blosxom in the same style as MT-Blacklist. You can find it at the url given above. I'd like to see something of a DNS-based approach too. The problem seems to be that it's not just foobar.dom that gets used as the spammer's url but also baz.foobar.dom and quix.foobar.dom. Anyway, using keyword based filtering seems to be effective...

    Rosa - 16th May 2005 16:01 - #

  29. You can also use mod_security with Apache to block comment spam (and attacks too!), and there are some huge, well-maintained lists out there for it. This technique also works for other software packages, not just MT, so you can use mod_security to protect forums, guestbooks, other blogging software, you name it. You can find one website (which has the biggest blacklist I've found) here: http://www.gotroot.com/mod_security+rules

    Linux guru - 25th December 2005 17:51 - #

  30. Use the "nofolow" tag and you're OK ;) I read about this on waregate.com

    WaLes - 12th January 2006 21:00 - #

  31. Two Ls!! It's nofollow. Better is:
    <a href="http://www.external_domain.com" rel="external nofollow">

    kibol - 1st February 2006 14:37 - #
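The attribute described in the last two comments can be added automatically. This sketch re-emits comment HTML through Python's standard `html.parser`, appending `nofollow` to each link's `rel` while keeping existing values such as `external`. It is a rewriter, not a sanitizer: it does not strip scripts or other unsafe markup, and it drops HTML comments and doctypes.

```python
from html import escape
from html.parser import HTMLParser


class NofollowRewriter(HTMLParser):
    """Re-emit comment HTML, adding rel="nofollow" to every <a> tag."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.out = []

    def _tag(self, tag, attrs, close=""):
        if tag == "a":
            rel = next((v or "" for k, v in attrs if k == "rel"), "")
            tokens = rel.split()
            if "nofollow" not in [t.lower() for t in tokens]:
                tokens.append("nofollow")
            attrs = [(k, v) for k, v in attrs if k != "rel"] + [("rel", " ".join(tokens))]
        # Attribute values were unescaped by the parser, so escape them again.
        parts = [tag] + [k if v is None else f'{k}="{escape(v)}"' for k, v in attrs]
        self.out.append("<" + " ".join(parts) + close + ">")

    def handle_starttag(self, tag, attrs):
        self._tag(tag, attrs)

    def handle_startendtag(self, tag, attrs):
        self._tag(tag, attrs, close=" /")

    def handle_endtag(self, tag):
        self.out.append(f"</{tag}>")

    def handle_data(self, data):
        self.out.append(escape(data, quote=False))


def add_nofollow(html):
    rewriter = NofollowRewriter()
    rewriter.feed(html)
    rewriter.close()
    return "".join(rewriter.out)
```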

Comments are closed.

Previously hosted at http://simon.incutio.com/archive/2003/09/02/blacklisting

A django site