Simon Willison’s Weblog

Command line blacklisting

Just over a year ago, I started blacklisting domain names from links featured in comment spam. My idea then was that these blacklists could become a shared resource: people would publish their own blacklist and subscribe to those of people they trust, thus making it much harder for spammers to operate. While the sheer volume of spam domains meant that the technique was much less useful than I originally anticipated, I’ve continued to maintain my blacklist ever since as a preventative measure against repeat spammers.

I have a confession to make: all of my blog administration (with the exception of adding entries and blogmarks) is performed using phpMyAdmin. The trouble with writing your own software is that it’s very easy to skimp on the backend tools, since you’re the only person who will ever see them. Incidentally, this is the main reason I plan to switch to WordPress just as soon as I find the inspiration to write the necessary import scripts. Comments are deleted in phpMyAdmin, and domains are blacklisted by manually editing the blacklist.txt file via FTP.

This has been really bugging me, especially since I have so little other use for FTP that my only installed client is an unregistered version of Transmit (closes after ten minutes, won’t save passwords along with account details). I’ve been muddling along with that for longer than I care to admit, but today I decided to take 10 minutes out to solve the problem once and for all. I could have put together a web interface for adding new domains but I wasn’t really in the mood, so I decided to put time spent reading The Art of Unix Programming to good use and knock out a simple command line application.

The result (minus my login details) can be found here. Sample usage: ./blacklist.py www.domain.org www.domain2.com. It follows the Unix ideal of being the simplest-thing-that-could-possibly-work, and ended up taking longer to write than I expected thanks mainly to the craziness of Python’s ftplib. I’ve seen complaints about this before, and it thoroughly deserves its bad reputation.

Here’s one example: retrlines is the method used to retrieve ASCII text from the server. Bizarrely, it doesn’t actually return the text received; instead, it expects you to provide it with a callback function that will be fed each line in turn, minus the newline. Sounds like a job for StringIO, but StringIO objects don’t have a writeline method (required to add the newline back on). I ended up writing my own extension of the StringIO class and adding a writeline method just to preserve the newlines returned from the server!
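For illustration, here is a minimal sketch of that workaround in modern Python 3 rather than the Python of 2004. It is not the code from the original script: the names LineBuffer and fetch_text are made up for this example, and the ftp argument is assumed to be an already connected and logged-in ftplib.FTP object.

```python
from io import StringIO


class LineBuffer(StringIO):
    """StringIO plus a writeline method (hypothetical name, not stdlib)."""

    def writeline(self, line):
        # retrlines strips the line ending, so put a newline back on.
        self.write(line + "\n")


def fetch_text(ftp, filename):
    """Return the text of a remote file, one newline per line received.

    `ftp` is assumed to be a connected, logged-in ftplib.FTP instance.
    """
    buffer = LineBuffer()
    # retrlines calls the callback once per line, without the newline.
    ftp.retrlines("RETR " + filename, buffer.writeline)
    return buffer.getvalue()
```

Calling fetch_text(ftp, "blacklist.txt") would then hand back the whole file as a single string. A plain list works too: pass lines.append as the callback and join the result with newlines afterwards.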

Strange APIs aside, I’m pretty pleased with the final result. It follows a bunch of Unix design patterns (and skips others such as those related to configuration, but I’m not overly bothered about those) including the following:

  1. A usage note is displayed if no arguments are provided.
  2. Multiple domains can be blacklisted at once, by providing them as multiple command line arguments.
  3. Domains that are already in the blacklist are skipped, and a message is written to standard error.
  4. If the script succeeds, it doesn’t say anything at all.
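The four behaviours above can be sketched roughly as follows. This is an assumption-laden illustration rather than the actual script: it works on an in-memory list instead of a file fetched over FTP, and the names add_domains and run, the usage text, and the skip message are all invented here.

```python
import sys

USAGE = "Usage: blacklist.py domain [domain ...]"  # hypothetical wording


def add_domains(existing, new_domains, err=None):
    """Return the blacklist with any genuinely new domains appended."""
    err = err or sys.stderr
    updated = list(existing)
    seen = set(existing)
    for domain in new_domains:
        if domain in seen:
            # Duplicates are skipped, with a note on standard error.
            print(domain + " is already blacklisted, skipping", file=err)
            continue
        updated.append(domain)
        seen.add(domain)
    return updated


def run(argv, existing, err=None):
    """Return (updated blacklist, exit status) for the given arguments."""
    err = err or sys.stderr
    if not argv:
        # No arguments: show the usage note and signal failure.
        print(USAGE, file=err)
        return list(existing), 1
    # Success is silent: nothing is printed to standard output.
    return add_domains(existing, argv, err), 0
```

Keeping the messages on standard error means standard output stays empty, so the script can sit in a pipeline without polluting it.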

It also uses the common Python idiom of wrapping the principal logic in a function and then calling that from a block that runs only if the file is executed directly (the __name__ == '__main__' idiom) so that other Python code can import the module and reuse its functionality if required.
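In outline, the idiom looks something like this. The blacklist function here is only a stand-in that tidies its input; the real script talks to an FTP server at that point, and the usage text is made up.

```python
import sys


def blacklist(domains):
    """Stand-in for the real work: tidy up the domains and return them."""
    return [d.strip().lower() for d in domains if d.strip()]


def main(argv):
    if not argv:
        print("Usage: blacklist.py domain [domain ...]", file=sys.stderr)
        return
    blacklist(argv)


if __name__ == "__main__":
    # Runs only when the file is executed directly, not when it is imported.
    main(sys.argv[1:])
```

Another module could then do "from blacklist import blacklist" and call the function without triggering the command line handling, assuming the file is saved as blacklist.py.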

There’s plenty of room for improvement: being able to pipe a list of domains in via standard input would be nice, and hard coding the (unencrypted) username and password is sloppy (as is expecting the blacklist.txt file to live in the FTP home directory). Even better, with SSH access the whole thing could be replaced with an infinitely more secure one-liner: echo www.domain-to-ban.org | ssh username@server "cat - >> blacklist.txt". I’m happy though: an irritating task has become much less irritating and I have some example code to fall back on next time I need to get mucky with ftplib.
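The standard input improvement might look something like the sketch below. It is a guess at one reasonable design, not something the script does today: read_domains is a hypothetical helper that prefers command line arguments, falls back to one domain per line on standard input when something is piped in, and returns an empty list otherwise so the caller can still show the usage note.

```python
import sys


def read_domains(argv, stdin=None):
    """Return domains from the arguments or, failing that, from stdin."""
    stdin = stdin or sys.stdin
    if argv:
        return list(argv)
    if stdin.isatty():
        # Nothing piped in and no arguments: let the caller print usage.
        return []
    # One domain per line; blank lines and stray whitespace are ignored.
    return [line.strip() for line in stdin if line.strip()]
```

With that in place, something like "cat domains.txt | ./blacklist.py" would work alongside the existing argument form.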

This is Command line blacklisting by Simon Willison, posted on 9th September 2004.

10 comments

  1. Good to know WP is in your thoughts :) Rolling your own must be fun, though I can't imagine doing the same, spoilt as I am.

    Carthik - 9th September 2004 06:31 - #

  2. That's all far too much effort :). Just hook it up to procmail and GPG, et voilà - you can add domains to your blacklist via email.

    PS: The preview screws up the à in this comment - it appears fine in the preview, but is altered in the textarea.

    Jim Dabell - 9th September 2004 15:50 - #

  3. You mirrored my own thoughts here exactly. I too crafted my own weblog system, and ego-points aside it was a nightmare to manage. Eventually I created a PHP command line hack: type in PHP code into a textarea, it's eval()'d and echo'd back out. When I started a second weblog in WP, it didn't even feel like weblogging (or at least the weblogging I was used to). As I've heard Matt say: code is easy, UI is hard. ;)

    Stephen - 10th September 2004 01:04 - #

  4. Have you ever thought about using Serendipity (s9y.org) over WP for your migration? IMHO it has far better upside than WP and it seems to be a tighter-knit community where your suggestions and work appear in the product in a short period of time. Some nice spam prevention work seems to be planned for the 0.8 release (0.7 is now in beta)... and the admin interface is fairly good... again IMHO.

    Samual - 10th September 2004 04:19 - #

  5. There's plenty of room for improvement: being able to pipe a list of domains in via standard input would be nice

    xargs blacklist.py
    :)

    Mark IJbema - 10th September 2004 19:55 - #

  6. Nice work Simon. I must admit that I still use FTP now and then and find NCFTP indispensable. It's available via Fink on Mac OSX if you want to try it out.

    Andy Todd - 10th September 2004 20:22 - #

  7. About your blacklist... Blogging tools should support blacklist subscription, much like Trackbacks and Pings work now. I could subscribe to your blacklist through my blog interface, let's say, and it would ping your site once a week, or whatever. That way, I can keep my blacklist up to date. This method would make it easier to spread the blacklist as it's done semi-automatically. You think it'd be easy to make something like this? You can call it Spamback.

    Phoat - 11th September 2004 13:40 - #

  8. Blocking out entire domains will only stop legitimate discussion. Is there something like Popfile for comment spam? That would help solve the problem, I think.

    David Hooper - 11th September 2004 23:07 - #

  9. I think it would be kind of "sad" if you replaced your weblog by one of these standardized weblogsystems spread all over the web.

    alain - 13th September 2004 10:18 - #

  10. FYI Simon, Transmit stores passwords unregistered. Perhaps it no longer does that after 1 month. I'm almost at that point with it so we'll see.

    jonahfish - 16th September 2004 13:37 - #

Comments are closed.

Previously hosted at http://simon.incutio.com/archive/2004/09/09/commandline
