Punishing Bad Behavior
June 20th, 2005 by Michael Hampton
It’s been two months now since I started the Bad Behavior project. I’m stopping for a moment to take a look back to see how far it’s come, and to glance at the journey ahead.
In case you somehow don’t know what I’m talking about, let me fill you in. Bad Behavior is PHP-based software which blocks automated link spam. And link spam is the growing problem of spammers taking advantage of blogs, wikis, forums, guestbooks, CMS, and similar software to post spam. Link spam has been a serious problem for a couple of years, and many people have tackled it with varying degrees of success.
On the evening of 31 December 2004, I suffered what every blogger experiences eventually: my first comment spam attack. A spammer using automated software and open proxy servers sent 764 spam comments to my site, only half of which were caught by WordPress. The other half were scattered all over my site. After deleting all of the junk, I responded by writing some code, and thus was born the WordPress SpamAssassin plugin, which filters blog comments through SpamAssassin. It proved useful at stopping a lot of spam, but wasn’t able to catch all of it. The main thing I learned over the life of wp-spamassassin was that email spam and blog spam are two quite different creatures. I finally wound up having WordPress moderate all first-time commenters, and gave up further development of wp-spamassassin around mid-March, recognizing that it was not quite appropriate to the task.
At that point I began using the Spam Karma 2 WordPress plugin. It proved quite effective at keeping spam off my site, but it has two serious drawbacks: first, the spam is still there in your database and if you get a lot of spam you have to spend a lot of time managing it, and second, it invariably would catch legitimate comments and mark them as spam, making the spam management problem far worse than it would otherwise be.
I want to spend my time blogging. I don’t want to spend my time scouring through 1,000 or more spams a day for the three comments that were thrown in by mistake. But there simply was no other solution I could live with. They either required a captcha, which isn’t accessible to some people with disabilities, or required JavaScript, which many people turn off and also isn’t accessible to some people with disabilities, or had too high a false positive rate, or too high a false negative rate, or…
At some point I realized there was an approach it seemed no one had tried yet. I did several Google searches looking for any evidence that anyone had tried this approach with any software on any blog, forum, wiki or CMS, and came up empty. I began coding, and about a week later I put out the first release candidate of Bad Behavior.
The premise behind Bad Behavior is not to analyze the comment, but to analyze the visitor. My idea was that if the spam is automated, the spambot software must be distinguishable from actual people reading your site. But spambots typically fake the User-Agent and Referer fields in the HTTP request. What else is there to work with? As it turns out, there are quite a few other fields in the HTTP header that can be analyzed, if only you know what they are and how to get at them. And it turns out that spambots do have a fingerprint that allows them to be distinguished from the Web browsers they pretend to be.
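To make the idea concrete, here is a minimal sketch of checking request headers against the browser a visitor claims to be. It is written in Python for brevity rather than the PHP that Bad Behavior uses, and the function name `looks_automated` and both rules are hypothetical illustrations, not Bad Behavior's actual checks. It only flags a request when the headers contradict themselves, and lets everything else through.

```python
from typing import Mapping


def looks_automated(headers: Mapping[str, str]) -> bool:
    """Return True only when the headers contradict the claimed browser.

    Both rules below are hypothetical examples chosen for illustration;
    they are NOT taken from Bad Behavior's real rule set.
    """
    # HTTP header names are case-insensitive, so normalise them first.
    h = {name.lower(): value for name, value in headers.items()}

    # Mainstream browsers identify themselves with a "Mozilla/" prefix.
    claims_browser = h.get("user-agent", "").startswith("Mozilla/")

    # Hypothetical rule 1: a request that claims to come from a browser
    # but carries no Accept header at all.
    if claims_browser and "accept" not in h:
        return True

    # Hypothetical rule 2: a Connection header that asks for both
    # keep-alive and close, which is self-contradictory.
    tokens = {t.strip().lower() for t in h.get("connection", "").split(",")}
    if {"keep-alive", "close"} <= tokens:
        return True

    # When in doubt, let the visitor through to avoid false positives.
    return False
```

A request with a browser-like User-Agent, an Accept header and a consistent Connection header passes; the same User-Agent with no Accept header is flagged. Defaulting to `False` mirrors the goal of erring on the side of caution.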
I designed Bad Behavior to be fast and portable to PHP-based software other than WordPress and to err on the side of caution, allowing a user through if there is doubt as to whether it is a spambot, so as to minimize false positives. Accordingly, Bad Behavior became wildly successful in a very short time, even beyond my initial expectations. So many people have downloaded it and used it that I can’t even count them all in any reasonable manner. People are even writing plugins for it and porting it to other platforms. It now runs on WordPress, MediaWiki and Geeklog, and I’ve received reports of people using it on Drupal, ExpressionEngine and custom PHP-based sites in its generic mode.
It hasn’t been all sweetness and light, though. I’ve had days where I had to release twice to fix some stupid error I should have caught the first time round. I’ve had Microsoft do things which caused their search engine bot to get blocked. I’ve seen a sharp increase in spam directed here, both of the blog variety and the email variety.
But I also get to see new link spammer techniques as they develop, because they seem to want to test them here. This gives me a window of opportunity in the event that something new needs to be added to Bad Behavior, or something needs to be changed. Surprisingly, spambot software is not getting much better overall. While the spammers are beginning to adapt to Bad Behavior, they still have serious weaknesses in their delivery methods that I am able to take advantage of to keep them blocked out. For now, I’m far ahead of the spammers. They have a lot of catching up to do, but updating spambot software takes time and costs money, and most spammers won’t bother, since (unfortunately) there are still far too many sites out there without adequate protection, such as Bad Behavior.
The sophisticated link spammer technique in common use now is to use some sort of script to harvest comment forms from a group of sites, then to fill in the fields appropriately, and a few hours or days later, to use a network of open proxy servers to relay the spam comments to thousands — or hundreds of thousands — of sites which use the same type of software. Repeatedly.
As spambot software continues to improve, I am seeing more instances of spambot software which closely matches the fingerprint of legitimate user agents such as Internet Explorer or Firefox. Bad Behavior must continue to improve by analyzing the delivery method these spammers use, and the next step is to analyze the open proxy server. The Bad Behavior Blackhole is a spin-off project intended to do just that. Like Bad Behavior, it is designed to cause minimal or no inconvenience to actual humans by providing a fully automated, immediate removal process, but only for humans. (At the time of this writing, manual removal is implemented, and automated removal is in testing.) A lot of work has been done on the open proxy server problem already, and Bad Behavior Blackhole will build on this. When it is ready, Bad Behavior Blackhole will be integrated into Bad Behavior.
There are other ideas on the drawing board as well, but I don’t want to give the spammers (who actually do read my blog, but can’t seem to leave a comment) too much of a clue where I am going or how I will shut them down next. Like them, you will have to stay tuned.
I hope you found this little essay interesting, and if you haven’t installed Bad Behavior yet, what are you waiting for?
steve caturan Says
using php.ini, i use BB to prepend it for all of my hosted sites. i haven’t had a single complaint since i implemented the scheme along with mod_security. looking forward to future updates. keep up the good work!
Jun 20th, 2005 at 9:58 pm
Ajay Says
Need I even mention that I love your plugin so much that I wrote a plugin to display its stats?
Thanks for giving the community such an amazing plugin. Since I started using Bad Behavior, I have saved a lot of cleanup time and dedicated it to learning and blogging.
If I can help in any way, don’t hesitate to ask.
Jun 21st, 2005 at 1:08 am
Cyrris Says
Indeed. I installed Bad Behavior just yesterday on my own blog, and even after 24 hours I’m already pleased with the results.
You do awesome work, and the best part is that we all benefit so much from it – and for free! I’m impressed. Very impressed.
Jun 20th, 2005 at 11:55 pm
Michael Hampton Says
Just spread the word far and wide.
Jun 21st, 2005 at 1:44 am
Jeff Says
Thanks for bringing us BB. I’ve been looking for the ‘ideal’ combination of blog anti-spam measures, and BB seems to be high in the running.
I’m glad you have more ideas in the works… I’m concerned that at some point the spambots may be indistinguishable from regular browsers, and that blocking open proxies may shut out a percentage of legitimate users (as Captchas or JavaScript do).
I’m considering writing a SpamAssassin plugin for Nucleus which would make comment handling as easy as the junk mail handling in Thunderbird.
I’m curious about your experience with SA – did you find that it doesn’t process spam comments (made into mail format) as well as regular mail?
Jun 21st, 2005 at 8:14 am
skippy Says
Great work. I was initially skeptical as to the efficacy and the responsiveness of Bad Behavior. I’m glad I installed it though: I see nothing to complain about, and a lot to praise.
Thanks!
Jun 21st, 2005 at 10:00 am
Amy Stephen Says
Michael -
Your tool is *amazing*! It is literally saving my website from comment spam attacks. I appreciate very much you freely sharing your work.
Now, I have a question for you – is it possible to override the blocking by IP address? I have a couple of friends who have been blocked. I had to remove the WP plug-in while one of them posted, and I don’t want to let my guard down.
Maybe this is too specialized of a need — most wouldn’t know how to use it — for me, though, I sure would use such a feature. But, I love it just the way it is, too, and I thank you sincerely for sharing.
Amy
Dec 1st, 2006 at 6:41 am
Michael Curry Says
Thanks for the awesome work! Several of my Drupal-based sites have been under recent attack, and I’ve been able to block them using a variety of techniques; now it’s time to adopt Bad Behavior and prepare for the next wave. I’ll let you know how it performs.
I think that behavior-based monitoring is definitely the way to go. My sites have shown an interesting pattern leading up to spammer attacks, so I’m optimistic that these techniques can help us stay one step ahead of the spammers.
Cheers!
-Mike
Dec 26th, 2006 at 12:42 am