Social whitelisting with OpenID
22nd January 2007
A key feature of OpenID is that it provides a globally unique identifier for every user, no matter what site or service they are using on the Web.
This gives us a powerful tool to fight comment spam. If someone has logged in with an OpenID and we are confident that they are not a spammer (remember, spammers can create OpenIDs too) we can add them to a whitelist, allowing their comments to skip any moderation step or spam guard that we might have in place.
This weblog has a comment spam detection system based on simple heuristics. Comments are assigned a score; if the score exceeds a certain level the comment is placed in a queue for moderation. As of today, one of the heuristics is “does the comment author have an OpenID that is on the whitelist”. I’ve populated my whitelist with the OpenIDs of people who have posted two or more useful comments and do not appear to be using an anonymous provider. I’ll be adding to it regularly in the future.
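To make the scoring idea concrete, here is a minimal sketch in Python. It is not the code this weblog runs: the threshold, the weights, the link-counting heuristic and every name in it (`Comment`, `spam_score`, `needs_moderation`, `normalise_openid`) are assumptions chosen for illustration. The only part taken from the description above is the shape: heuristics add up to a score, a score over a certain level means moderation, and a whitelisted OpenID is one of the heuristics.

```python
from dataclasses import dataclass
from typing import Optional, Set

# Hypothetical values; the real threshold and weights are not stated in the post.
MODERATION_THRESHOLD = 5
WHITELIST_BONUS = 10


@dataclass
class Comment:
    body: str
    openid: Optional[str] = None  # None if the author did not log in with OpenID


def normalise_openid(url: str) -> str:
    """Simplified normalisation (an assumption): trim whitespace and a trailing slash."""
    return url.strip().rstrip("/")


def spam_score(comment: Comment, whitelist: Set[str]) -> int:
    """Higher scores look more like spam."""
    score = 0
    # Illustrative heuristic: each link in the body adds to the score.
    score += 3 * comment.body.lower().count("http://")
    if comment.openid is None:
        # Illustrative heuristic: anonymous comments are slightly more suspect.
        score += 2
    elif normalise_openid(comment.openid) in whitelist:
        # The whitelist heuristic: a trusted OpenID pulls the score down.
        score -= WHITELIST_BONUS
    return score


def needs_moderation(comment: Comment, whitelist: Set[str]) -> bool:
    """A comment is queued for moderation when its score exceeds the threshold."""
    return spam_score(comment, whitelist) > MODERATION_THRESHOLD
```

With these made-up weights, a comment containing two links from a whitelisted OpenID is published straight away, while the same text from an anonymous author lands in the moderation queue.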
Here comes the social part: I’m sharing my whitelist. If you run your own OpenID-enabled weblog you are welcome to include my whitelist in your comment spam heuristics. If you publish your own whitelist, I will happily do the same.
Social whitelisting benefits from being de-centralised, just like OpenID. If I find that you have whitelisted a spammer, I can unsubscribe from your whitelist. There’s no central authority or point of failure.
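Subscribing and unsubscribing could be as simple as the sketch below. The file format (one OpenID per line, with blank lines and `#` comments ignored) is an assumption, as are the function names `parse_whitelist` and `combined_whitelist`; the post does not specify how a shared whitelist is published. Fetching the lists over HTTP is left out to keep the example self-contained.

```python
from typing import Dict, Iterable, Set


def parse_whitelist(text: str) -> Set[str]:
    """Parse an assumed plain-text format: one OpenID per line.

    Blank lines and lines starting with '#' are skipped.
    """
    entries = set()
    for line in text.splitlines():
        line = line.strip()
        if line and not line.startswith("#"):
            entries.add(line)
    return entries


def combined_whitelist(
    own: Iterable[str],
    subscriptions: Dict[str, Iterable[str]],
    unsubscribed: Iterable[str] = (),
) -> Set[str]:
    """Merge my own whitelist with every list I still subscribe to.

    `subscriptions` maps a publisher's name to the OpenIDs on their list.
    Publishers named in `unsubscribed` are skipped entirely, so dropping a
    careless publisher removes everything they vouched for.
    """
    skipped = set(unsubscribed)
    combined = set(own)
    for publisher, entries in subscriptions.items():
        if publisher not in skipped:
            combined.update(entries)
    return combined
```

Because each site builds its own combined set, nobody has to agree on who is trustworthy: removing one publisher from my subscriptions changes my whitelist and nobody else's.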
Long-time readers may be feeling a strong sense of déjà vu. Way back in September 2003, I proposed shared comment blacklists as a solution to weblog comment spam. The idea was simple: every time you delete a spam comment, you add the link it was advertising to a public blacklist. Other blogs could then subscribe to your blacklist and block any new comments advertising the same site.
The blacklisting idea was flawed from the very start. It was a classic example of Marcus J. Ranum’s number one dumbest idea in computer security: Default Permit. Spam blacklists assume that if we don’t know a link is bad, it’s good. Spammers can create new bad links far faster than we can blacklist them.
Here’s Ranum’s suggested alternative:
The opposite of “Default Permit” is “Default Deny” and it is a really good idea. It takes dedication, thought, and understanding to implement a “Default Deny” policy, which is why it is so seldom done. It’s not that much harder to do than “Default Permit” but you’ll sleep much better at night.
Social whitelisting uses Default Deny. As such, I believe it has a much higher chance of making a useful impact on the comment spam problem.
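The difference between the two policies comes down to what happens to something you have never seen before. The sketch below is only an illustration with hypothetical function names; in it, "deny" means the comment gets no free pass and still goes through the usual moderation or spam guard, not that it is thrown away.

```python
from typing import Set


def blacklist_allows(link_domain: str, blacklist: Set[str]) -> bool:
    """Default Permit: anything not known to be bad is let through."""
    return link_domain not in blacklist


def whitelist_trusts(openid: str, whitelist: Set[str]) -> bool:
    """Default Deny: only identities known to be good skip moderation."""
    return openid in whitelist
```

A spammer who registers a fresh domain passes `blacklist_allows` until someone adds it to the list. A spammer who registers a fresh OpenID fails `whitelist_trusts` until someone chooses to vouch for it.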
Update: I should have mentioned that this idea developed over a number of discussions with Tom Coates, which totally slipped my mind when I was writing it up at 3am.