Command line blacklisting
9th September 2004
Just over a year ago, I started blacklisting domain names from links featured in comment spam. My idea then was that these blacklists could become a shared resource: people would publish their own blacklist and subscribe to those of people they trust, thus making it much harder for spammers to operate. While the sheer volume of spam domains meant that the technique was much less useful than I originally anticipated, I’ve continued to maintain my blacklist ever since as a preventative measure against repeat spammers.
I have a confession to make: all of my blog administration (with the exception of adding entries and blogmarks) is performed using phpMyAdmin. The trouble with writing your own software is that it’s very easy to skimp on the backend tools, since you’re the only person who will ever see them. Incidentally, this is the main reason I plan to switch to WordPress just as soon as I find the inspiration to write the necessary import scripts. Comments are deleted in phpMyAdmin, and domains are blacklisted by manually editing the blacklist.txt file via FTP.
This has been really bugging me, especially since I have so little other use for FTP that my only installed client is an unregistered version of Transmit (closes after ten minutes, won’t save passwords along with account details). I’ve been muddling along with that for longer than I care to admit, but today I decided to take 10 minutes out to solve the problem once and for all. I could have put together a web interface for adding new domains but I wasn’t really in the mood, so I decided to put time spent reading The Art of Unix Programming to good use and knock out a simple command line application.
The result (minus my login details) can be found here. Sample usage: ./blacklist.py www.domain.org www.domain2.com. It follows the Unix ideal of being the simplest-thing-that-could-possibly-work, and ended up taking longer to write than I expected thanks mainly to the craziness of Python’s ftplib. I’ve seen complaints about this before, and it thoroughly deserves its bad reputation.
Here’s one example:
retrlines is the method used to retrieve ASCII text from the server. Bizarrely, it doesn’t actually return the text received; instead, it expects you to provide it with a callback function that will be fed each line in turn, minus the newline. Sounds like a job for StringIO, but StringIO objects don’t have a writeline method (required to add the newline back on). I ended up writing my own extension of the StringIO2 class and adding a writeline method just to preserve the newlines returned from the server!
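To illustrate that workaround, here is a minimal sketch written for modern Python 3 (so it uses io.StringIO rather than the old StringIO module). It is not the original script: the names LineBuffer, fetch_text and FakeFTP are made up for this example, and FakeFTP only imitates the one behaviour that matters here, namely that retrlines hands each line to a callback with the line ending stripped.

```python
import io


class LineBuffer(io.StringIO):
    """A StringIO with a writeline method that puts the newline back."""

    def writeline(self, line):
        self.write(line + "\n")


def fetch_text(ftp, filename):
    """Return the contents of a remote text file as a single string.

    `ftp` is assumed to be a connected ftplib.FTP instance, or anything
    else with a compatible retrlines(cmd, callback) method.
    """
    buf = LineBuffer()
    # retrlines calls the callback once per line, minus the line ending,
    # so the callback itself has to restore the newline.
    ftp.retrlines("RETR " + filename, buf.writeline)
    return buf.getvalue()


class FakeFTP:
    """Hypothetical stand-in for ftplib.FTP so the sketch runs offline."""

    def __init__(self, lines):
        self.lines = lines

    def retrlines(self, cmd, callback):
        for line in self.lines:
            callback(line)


if __name__ == "__main__":
    fake = FakeFTP(["spam.example", "junk.example"])
    print(fetch_text(fake, "blacklist.txt"), end="")
```

With a real server you would pass a logged-in ftplib.FTP object in place of FakeFTP; everything else should stay the same.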
Strange APIs aside, I’m pretty pleased with the final result. It follows a bunch of Unix design patterns (and skips others such as those related to configuration, but I’m not overly bothered about those) including the following:
- A usage note is displayed if no arguments are provided.
- Multiple domains can be blacklisted at once, by providing them as multiple command line arguments.
- Domains that are already in the blacklist are skipped, and a message is written to standard error.
- If the script succeeds, it doesn’t say anything at all.
It also uses the common Python idiom of wrapping the principal logic in a function and then calling that from a block that runs only if the file is executed directly (the __name__ == '__main__' idiom) so that other Python code can import the module and reuse its functionality if required.
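The conventions listed above can be sketched roughly as follows. This is an illustration rather than the script itself: it assumes the blacklist is a local blacklist.txt file with one domain per line instead of a file fetched over FTP, the function names add_domains and main are invented for the example, and exit status handling is left out to keep it short.

```python
import sys

# Assumption for this sketch: a local file, one domain per line.
BLACKLIST_PATH = "blacklist.txt"


def add_domains(existing, new_domains):
    """Return (updated list, list of domains skipped as duplicates)."""
    updated = list(existing)
    skipped = []
    for domain in new_domains:
        if domain in updated:
            skipped.append(domain)
        else:
            updated.append(domain)
    return updated, skipped


def main(argv):
    # Show a usage note if no domains were given.
    if len(argv) < 2:
        print("Usage: %s domain [domain ...]" % argv[0], file=sys.stderr)
        return
    try:
        with open(BLACKLIST_PATH) as f:
            existing = f.read().split()
    except FileNotFoundError:
        existing = []
    updated, skipped = add_domains(existing, argv[1:])
    # Duplicates are reported on standard error, not standard output.
    for domain in skipped:
        print("%s is already blacklisted" % domain, file=sys.stderr)
    with open(BLACKLIST_PATH, "w") as f:
        f.writelines(domain + "\n" for domain in updated)
    # On success, say nothing at all.


if __name__ == "__main__":
    # Runs only when executed directly, so other code can import
    # add_domains without triggering any file access.
    main(sys.argv)
```

Keeping the duplicate check in its own function, separate from the file handling, is what makes the module worth importing from elsewhere.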
There’s plenty of room for improvement: being able to pipe a list of domains in via standard input would be nice, and hard coding the (unencrypted) username and password is sloppy (as is expecting the blacklist.txt file to live in the FTP home directory). Even better, with SSH access the whole thing could be replaced with an infinitely more secure one-liner: echo www.domain-to-ban.org | ssh username@server "cat - >> blacklist.txt". I’m happy though: an irritating task has become much less irritating and I have some example code to fall back on next time I need to get mucky with