Simon Willison’s Weblog

Command line blacklisting

Just over a year ago, I started blacklisting domain names from links featured in comment spam. My idea then was that these blacklists could become a shared resource: people would publish their own blacklist and subscribe to those of people they trust, thus making it much harder for spammers to operate. While the sheer volume of spam domains meant that the technique was much less useful than I originally anticipated, I’ve continued to maintain my blacklist ever since as a preventative measure against repeat spammers.

I have a confession to make: all of my blog administration (with the exception of adding entries and blogmarks) is performed using phpMyAdmin. The trouble with writing your own software is that it’s very easy to skimp on the backend tools, since you’re the only person who will ever see them. Incidentally, this is the main reason I plan to switch to WordPress just as soon as I find the inspiration to write the necessary import scripts. Comments are deleted in phpMyAdmin, and domains are blacklisted by manually editing the blacklist.txt file via FTP.

This has been really bugging me, especially since I have so little other use for FTP that my only installed client is an unregistered version of Transmit (closes after ten minutes, won’t save passwords along with account details). I’ve been muddling along with that for longer than I care to admit, but today I decided to take 10 minutes out to solve the problem once and for all. I could have put together a web interface for adding new domains but I wasn’t really in the mood, so I decided to put time spent reading The Art of Unix Programming to good use and knock out a simple command line application.

The result (minus my login details) can be found here. Sample usage: ./blacklist.py www.domain.org www.domain2.com. It follows the Unix ideal of being the simplest-thing-that-could-possibly-work, and ended up taking longer to write than I expected thanks mainly to the craziness of Python’s ftplib. I’ve seen complaints about this before, and it thoroughly deserves its bad reputation.

Here’s one example: retrlines is the method used to retrieve ascii text from the server. Bizzarely, it doesn’t actually return the text receieved; instead, it expects you to provide it with a callback function that will be fed each line in turn, minus the newline. Sounds like a job for StringIO, but StringIO objects don’y have a writeline method (required to add the newline back on). I ended up writing my own extension of the StringIO2 class and adding a writeline method just to preserve the newlines returned from the server!

Strange APIs aside, I’m pretty pleased with the final result. It follows a bunch of Unix design patterns (and skips others such as those related to configuration, but I’m not overly bothered about those) including the following:

  1. A usage note is displayed if no arguments are provided.
  2. Multiple domains can be blacklisted at once, by providing them as multiple command line arguments.
  3. Domains that are already in the blacklist are skipped, and a message is written to standard error.
  4. If the script suceeds, it doesn’t say anything at all.

It also uses the common Python idiom of wrapping the principle logic in a function and then calling that from a block that runs only if the file is executed directly (the __name__ == '__main__' idiom) so that other Python code can import the module and reuse its functionality if required.

There’s plenty of room for improvement: being able to pipe a list of domains in via standard input would be nice, and hard coding the (unencrypted) username and password is sloppy (as is expecting the blacklist.txt file to live in the FTP home directory). Even better, with SSH access the whole thing could be replaced with an infinitely more secure one-liner: echo www.domain-to-ban.org | ssh [email protected] "cat - >> blacklist.txt". I’m happy though: an irritating task has become much less irritating and I have some example code to fall back on next time I need to get mucky with ftplib.

This is Command line blacklisting by Simon Willison, posted on 9th September 2004.

Next: Browser innovation is alive and well

Previous: The bookmarklet solution to the password problem

Previously hosted at http://simon.incutio.com/archive/2004/09/09/commandline