How Bad Behavior handles false positives

December 20th, 2008 by Michael Hampton

I’m seeing increased chatter lately about Bad Behavior and concerns about so-called false positives, where a user whose comment the site owner wants is prevented from commenting. Since Bad Behavior handles this issue in a completely different manner than other solutions, I think it’s time it was addressed in detail.

Obviously no anti-spam solution is going to last long if it prevents legitimate comments, yet all of them do have false positives, to some extent, or else they are almost entirely ineffective. Every email user has had the experience of seeing expected email messages in the Junk folder, and every WordPress user has plowed through the hundreds of spams caught by Akismet or Defensio to find the one or two legitimate comments they somehow flagged. It happens.

The reason it happens is that spammers are trying to get their spam past our defenses, with varying degrees of success, by making it appear legitimate, or pulling various tricks to try to confuse the spam filters. The converse of this is that every so often, a legitimate message will look like spam. It has become, unfortunately, an arms race between spammers and the anti-spammers, with you, the hapless user, caught in the middle.

This is just one reason that Bad Behavior doesn’t look at the content of a message when trying to decide whether it is spam. Instead, I recognize that spammers attempt to trick and confuse, and so Bad Behavior looks at the metadata which accompanies each HTTP request: the headers, the IP address, and so on. Each web browser has certain identifying characteristics which can be spotted in the HTTP headers, and spammers, whose bots pretend to be actual web browsers, often don’t get all of these characteristics right. Or they relay their spam through open proxy servers or botnets which leave their own identifying characteristics on the HTTP request. Similar characteristics apply for legitimate bots, such as search engines.

By comparing the HTTP request to what is expected of a legitimate request, Bad Behavior can not only block spammers without analyzing the message content, it can in many cases block the spammers’ robots from scraping the content of your site, or block malicious attacks against your application software (e.g. WordPress) — even if the attack is previously unknown.
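
As a rough illustration of this kind of comparison, here is a minimal Python sketch. It is not Bad Behavior's code (which is PHP). The function name `screen_request` and both rules are assumptions for illustration, loosely modelled on the "HTTP/1.0 client doing HTTP/1.1 things" and "Pragma without Cache-Control" checks quoted in the comments below. The point is only that the decision uses the protocol and headers, never the message text.

```python
from __future__ import annotations

from typing import Mapping


def screen_request(protocol: str, headers: Mapping[str, str],
                   strict: bool = False) -> list[str]:
    """Return the reasons a request looks inconsistent (empty list = none found).

    Hypothetical helper: the two rules below are illustrative only and are
    not Bad Behavior's actual rule set.
    """
    # HTTP header names are case-insensitive, so normalise them once.
    h = {name.lower(): value for name, value in headers.items()}
    reasons = []

    # A client claiming HTTP/1.0 should not use HTTP/1.1-only features.
    if protocol == "HTTP/1.0" and "100-continue" in h.get("expect", "").lower():
        reasons.append("Expect: 100-continue sent with HTTP/1.0")

    # Optional stricter rule: an HTTP/1.1 client that sends Pragma: no-cache
    # would normally send Cache-Control as well.
    if (strict and protocol == "HTTP/1.1"
            and "no-cache" in h.get("pragma", "").lower()
            and "cache-control" not in h):
        reasons.append("Pragma: no-cache without Cache-Control on HTTP/1.1")

    return reasons
```

Calling `screen_request("HTTP/1.0", {"Expect": "100-continue"})` returns one reason, while an ordinary HTTP/1.1 request with only an `Accept` header returns an empty list.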

Problems arise, though, when theory meets reality. In Bad Behavior’s history of more than three years, there have been many instances in which a request appeared illegitimate for what turned out to be innocuous reasons, such as improperly implemented web browsers or proxies. In all of these cases, I have either updated Bad Behavior, worked with the author of the software to help them resolve the problem, or both. For obvious reasons I strongly prefer getting the offending software fixed; it’s my opinion that software which doesn’t conform to the basic Internet standards (RFCs) should be updated to do so, and its users should demand better from those vendors or seek other solutions.

In cases where the interaction between Bad Behavior and a user’s web browser or proxy is at issue, my goal is zero false positives, even at the risk of allowing some spam to pass through. This is why I have always recommended that no anti-spam solution be used alone, and that Bad Behavior be paired with another solution which does analyze message content, such as Akismet.
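
The pairing described above can be sketched as two layers. This is a minimal Python sketch under stated assumptions: `triage_comment`, `has_user_agent` and `mentions_pills` are hypothetical names, and the two toy checks merely stand in for a request screener such as Bad Behavior and a content filter such as Akismet. Neither reflects how those tools actually decide.

```python
from typing import Callable, Mapping


def triage_comment(
    headers: Mapping[str, str],
    text: str,
    request_ok: Callable[[Mapping[str, str]], bool],
    looks_like_spam: Callable[[str], bool],
) -> str:
    """Decide what happens to a submitted comment (hypothetical helper)."""
    # Layer 1: request metadata only; the comment text is never inspected here.
    if not request_ok(headers):
        return "blocked"
    # Layer 2: content analysis catches spam sent by well-behaved clients.
    if looks_like_spam(text):
        return "held for moderation"
    return "published"


# Toy stand-ins for the two layers, for demonstration only.
def has_user_agent(headers: Mapping[str, str]) -> bool:
    return bool(headers.get("User-Agent"))


def mentions_pills(text: str) -> bool:
    return "cheap pills" in text.lower()
```

Because the first layer is tuned to let doubtful requests through rather than risk a false positive, the second layer is what catches spam that arrives with plausible headers.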

There is another class of “false” positives, though, and that is those where the user being blocked represents a potential threat. This includes computers which are stuffed full of viruses, botnets and other malware, and those from which spam is already pouring forth, with or without the user’s knowledge. Bad Behavior blocks these, even though the user may be someone from whom the site owner wants to receive a comment, for the safety of both the site and the blocked user.

In one egregious case, a user visited their favorite blog, which was running Bad Behavior. But because their computer was part of a botnet sending comment spam, it then started sending spam to the blog the user had just visited! Bad Behavior blocked this, but it incidentally also blocked the user, who happened to be reading the very same blog their computer was trying to deliver spam to. Later I found out that this person refused to use any sort of anti-virus software, even though it could be downloaded for free, and did not care that their computer was sending spam. Bad Behavior will continue to block this sort of recalcitrant user, even if some would consider it a “false” positive.

In these circumstances, Bad Behavior delivers a message to the user explaining that they were temporarily blocked and giving directions on how the user can resolve the problem. In most cases it involves merely removing the malware from the computer. In the rare case that the directions given do not resolve the problem, the user also gets contact information for the site owner for further help. If this happens, the site owner can review the Bad Behavior logs to determine what might be going on, and then forward the report to me. I go over these reports and help the site owner and original user resolve the problem.
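
The flow just described (tell the visitor what happened, offer a fix, give the site owner's contact details, and leave a log entry to review) might look roughly like the Python sketch below. Everything in it is an assumption for illustration: the function name `block_request`, the wording of the page, the log format, and the 403 status code are not taken from Bad Behavior.

```python
import json
from datetime import datetime, timezone


def block_request(client_ip, support_key, instructions, owner_contact, log):
    """Record a blocked request and build the page shown to the visitor.

    Hypothetical helper; `log` is any list-like object with append().
    """
    # Keep a log entry so the site owner can look the incident up later.
    log.append(json.dumps({
        "time": datetime.now(timezone.utc).isoformat(),
        "ip": client_ip,
        "key": support_key,
    }))
    page = "\n".join([
        "Your request was temporarily blocked.",
        f"How to fix it: {instructions}",
        f"If that does not help, contact {owner_contact} and quote key {support_key}.",
    ])
    return 403, page  # 403 Forbidden is an assumed status code
```

Quoting a key in both the page and the log gives the site owner something concrete to search for when a visitor writes in.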

Sometimes this results in a change to Bad Behavior, sometimes a third-party program is fixed, and sometimes the user’s computer gets some advanced anti-virus help. Even with all the thousands of people using Bad Behavior, and the countless millions visiting those sites, I only get approximately one such report a week.

By way of example, the last two such reports I received were interesting. One was from a major U.S. city newspaper which uses Bad Behavior to protect its blogs. The user who was blocked turned out to have had a third-party software component on their computer which, while legitimate in and of itself, is popular with many malware authors, but the user was still being blocked even after the malware and the third-party component were removed. The component had left traces of itself behind, causing the user to continue to be blocked. I gave the newspaper instructions on how to remove these traces, and also the third-party component was updated to no longer leave these traces behind.

The most recent report concerns a small business router appliance from a major vendor. Due to a design error in the router’s firmware, Bad Behavior would block accesses through this device if any of the router’s web filtering features were enabled. I was able to provide the router user with a workaround, and the vendor has opened an internal ticket to resolve the issue. A fix is expected in a future firmware release for the affected routers.

Compared to how widely Bad Behavior is used, these reports are quite rare, as even users who are blocked for reasons of having viruses on their computer are almost always able to resolve the issue by themselves. Nevertheless, I take every blocked user seriously and I put forth whatever effort is needed to make sure that Bad Behavior protects your site without needlessly inconveniencing your users.


25 Responses to “How Bad Behavior handles false positives”

  1. 1

    Eli Xenos Says

    Blah, blah, blah… Listen, I’ve stopped trying to post to Garth Turner’s blog because of your stupid program.

    Who do you think you are? Some gov’t agency?

    Your way or no way, eh?

    I’m not impressed and I’ve moved on.

  2. 2

    Michael Hampton Says

    I would love to know what you’re talking about, Eli, since if you were able to post here, then you shouldn’t have a problem anywhere else, since my “stupid program” is also running here. Obviously it isn’t my “stupid program” alone causing your problem. And if my “stupid program” is involved in your problem, then I would like to fix it. But your attitude needs a serious readjustment.

  3. 3

    Randy Says

    LOL – sorry, but i just have to laugh at poor Eli.

    I for one am impressed with the development of BB, and i use it on all my sites – and when combined with either Akismet or Mollom, i get virtually zero spam. The ancillary benefit of blocking scrapers is just icing on the cake. The ONLY problem i ever had or have been made aware of from legitimate users was one user that had a piece of third party software left over in his registry – probably the same one you referred to.

    keep up the good work.

  4. 4

    Jay Says

    Hi Michael, since you’re likely to hear complaints from the very small number of people who have problems and nothing from the large number of people who use BB with no problems, I just wanted to add my 2 cents and say that Bad Behavior, your labor of love, is greatly appreciated and works wonderfully for so many people.

    All the work you put into Bad Behavior has really made a difference and made the internet a better place.

    Cheers.

  5. 5

    n3wjack Says

    Hear hear!
    Interesting read on the botnet blocking bit. Didn’t know it also did that. Together with Akismet it’s definitely something that keeps the evil spammers away, so I can only applaud to that.
    A small blog such as mine would be completely pointless to keep if there were no tools around like yours, as I found out a few months after it was online, and it got into the spambot directories.
    I’d be spending more time killing spam comments than actually being able to write anything.

  6. 6

    Álvaro Degives-Más Says

    Well I’m posting a false positive comment here about Eli Xenos’ experience with bearded Canadians, and it went through so… ha – in yer face!

    But: what previous commenters said. BB rocks, end of story.

    Also: Merry Christmas to y’all.

  7. 7

    Michael Lane Says

    FYI,

    I just experienced a BB blockage when I tried to load a page that I master. BB blocked the Opera browser because of a suspicious user agent string. To remedy the problem I removed Opera’s site preferences for that particular site and then deleted all previous Opera sessions. Problem resolved.

  8. 8

    Ian M Says

    Thoughtful and insightful commentary. People who haven’t had to deal with writing spam-prevention tools really have no idea how complex and difficult it is. Hats off to you for approaching it in such a professional and comprehensive manner.

  9. 9

    Nobody Says

    I’m just curious how can I block proxy access to my website…….and come to here..wondering if your script can ban me…right before i’was using an open anonymous proxy getting from internet..
    Wow it works…….i wanna try to learn your code…
    ThanX.

  10. 10

    Nobody Says

    wait a minute…ah..
    I forgot to mention two things, one is i need to port your script to a generic php based board and second is that I already know how to read client head information to detect transparent proxies..but the pain of mine was to detect anonymous proxies having no information such as x_forwaded_for(?)..via..etc.

    maybe i should look carefully or thoroughly your site to find what i want to know…but if you happen to have a little time to share your technique to detect such (high) anonymous proxies…would you let me know the direction?

  11. 11

    Donace Says

    Hi Just a small question in regards to the new version; would it be possible for a checkbox for the setting?

    IE so you can select what to block, so:
    Useragents
    BL IP addresses
    etc etc

    Sometimes the headers of legitimate users or feed aggregation sites are slightly modified and BB blocks them; so I would like some varying control while I verify etc.

    Cheers

  12. 12

    Álvaro Degives-Más Says

    Donace, I’m not understanding you. What’s exactly the purpose of installing a piece of carefully deliberated but thorough protection against rogue clients, when you’re defeating it? BB has tons of options to lighten its scrutiny as it is; just read the documentation included in the files. Especially disabling the UA check is just beyond foolhardy, and quite contrary to the philosophy of BB. If you have enough visitors with whacky UA strings, investigate some other type of protection, as what BB is accomplishing is far, far too complicated to leave it to the typically clueless users’ discretion with an all too inviting checkbox. Your suggestion to disable UA checks is just an invitation to disaster, and arguably would lead to Michael Hampton getting tons of annoyed but ill-informed and unnecessary criticism to deal with.

    Anyway, you can whitelist IP addresses in the corresponding file. You can therefore also whitelist whole blocks – as silly as that is. It’s not that easy to get on the BLs checked by BB; whitelist at your own risk.

    Seriously, if your site attracts a risky profile audience, consider other alternatives. BB is a solidly constructed wall that keeps the riff-raff out, and is a great solution for site admins (like me) that don’t want to deal with at best marginally trustworthy visitors, and just show them the door to take their dangerous stuff elsewhere. Zero tolerance is a *good* thing when security is at stake; I’ve become a hardliner learning things the hard way (i.e. having to completely redo sites several times, until I figured out that I was spending 99% of my time on less than 1% of visitors).

    Live on the edge at your risk – the “don’t feed the animals” signs bear great wisdom.

  13. 13

    Donace Says

    Álvaro Degives-Más, thank you for the insightful response; the reasons I highlighted the ‘incomplete header’ and the such is because a number of small searchengine crawlers get blocked; as well as some firewalls mess with headers Kerio /Subelt Personal firewall being one.

    Though as you mention it is just 1% of users that are affected, so it might be better to let them spend the extra min whitelisting themselves using the internal verification provided by BB

  14. 14

    Andre Says

    I get a false positive on one of my websites with the code "a0105122"

    This occurs only rarely, mostly when i post something in my wiki for the first time after starting the browser.

    maybe it needs a strcmp($package['server_protocol'], "HTTP/1.1") check?

    The header looks like this (some data are black out):

    Array
    (
    [REMOTE_ADDR] => "…"
    [REMOTE_PORT] => "…"
    [REMOTE_USER] => "…"
    [REQUEST_METHOD] => "POST"
    [REQUEST_URI] => "…"
    [REQUEST_TIME] => "…"
    [QUERY_STRING] => ""
    [SERVER_PROTOCOL] => "HTTP/1.1"
    [HTTP_ACCEPT] => "…"
    [HTTP_ACCEPT_CHARSET] => "…"
    [HTTP_ACCEPT_LANGUAGE] => "…"
    [HTTP_CONNECTION] => "Keep-Alive, TE"
    [HTTP_COOKIE] => "…"
    [HTTP_COOKIE2] => "…"
    [HTTP_EXPECT] => "100-continue"
    [HTTP_HOST] => "…"
    [HTTP_TE] => "deflate, gzip, chunked, identity, trailers"
    [HTTP_USER_AGENT] => "…"
    )

  15. 15

    Michael Hampton Says

    Andre, that doesn’t look like a false positive. It looks like a problem with the software you’re using.

  16. 16

    Andre Says

    Thanks for your comment, Michael Hampton

    Hmmm… Ok…

    i use opera and a firewall, and the system is regularly virus checked.
    the error also occurs if i don’t post any text but abort the edit.
    (i have no idea where this EXPECT is coming from, maybe from the firewall? but as far as i know, EXPECT is ok with HTTP/1.1, true?)

    in the code common_tests.inc.php is written:
    // Is it claiming to be HTTP/1.0? Then it shouldn’t do HTTP/1.1 things

    if the HTTP_EXPECT header is ok in HTTP/1.1, but then the code don’t check this:
    if (array_key_exists('Expect', $package['headers_mixed']) && stripos($package['headers_mixed']['Expect'], "100-continue") !== FALSE) {

    in the next check the HTTP/1.1 is checked:
    if ($settings['strict'] && !strcmp($package['server_protocol'], "HTTP/1.1")) {
    if (array_key_exists('Pragma', $package['headers_mixed']) && strpos($package['headers_mixed']['Pragma'], "no-cache") !== FALSE && !array_key_exists('Cache-Control', $package['headers_mixed'])) {

    if it is my software, i have no idea what could cause this problem, there is nothing unusual. where i should search the cause?
    (i don’t want abuse this commend place as forum, if everything is ok with the code in your opinion, i will fix it for the wiki with a patch, thanks.)

  17. 17

    Andre Says

    Thanks for your comment, Michael Hampton

    Hmmm… Ok…

    i use opera and a firewall, and the system is regularly virus checked.
    the error also occurs if i don’t post any text but abort the edit.
    (i have no idea where this EXPECT is coming from, maybe from the firewall? but as far as i know, EXPECT is ok with HTTP/1.1, true?)

    in the code common_tests.inc.php is written:
    // Is it claiming to be HTTP/1.0? Then it shouldn’t do HTTP/1.1 things

    if the HTTP_EXPECT header is ok in HTTP/1.1, but then the code don’t check this:
    if (array_key_exists('Expect', $package['headers_mixed']) && stripos($package['headers_mixed']['Expect'], "100-continue") !== FALSE) {

    in the next check the HTTP/1.1 is checked:
    if ($settings['strict'] && !strcmp($package['server_protocol'], "HTTP/1.1")) {
    if (array_key_exists('Pragma', $package['headers_mixed']) && strpos($package['headers_mixed']['Pragma'], "no-cache") !== FALSE && !array_key_exists('Cache-Control', $package['headers_mixed'])) {

    if it is my software, i have no idea what could cause this problem, there is nothing unusual. where i should search the cause?
    (i don’t want abuse this commend place as forum, if everything is ok with the code in your opinion, i will fix it for the wiki with a patch, thanks.)

  18. 18

    Michael Hampton Says

    Andre, since you were able to post here, which also uses Bad Behavior, I suspect something is going on with your web server. But since you filled out fake information I can’t go any further.

  19. 19

    Andre Says

    Sorry, i got a little paranoid. a spammer recently tried to spam the wiki from many IPs, and since then, among other things, i don’t publish my email anymore. And sorry for the double Post. (You can delete my posts if you want.)

  20. 20

    Tamara Burks Says

    I tried posting on one site and it gave me an error saying it wouldn’t post because of your software. Then I went back and reloaded the page and saw it had gone through. I’m more than a bit confused. I was thinking that maybe that instance of fakealert virus that I had had about a week ago (and had originally thought I was just continually running across a very annoying autoscan) was cleared up. I have an antivirus, winpatrol and I use Glarysoft utilities to repeatedly clean out the temp files so if anything is hiding in there it gets cleaned out.

  21. 21

    Sebastian Says

    Hi there,

    I just wanted to say thanks! I love BB and it saved me tons of time and resources. It works really well (at least I didn’t get any bad feedback at all so far).

    What I hate are ignorant users or even worse ignorant users that don’t care at all – those who cause the web to drown in SPAM, Botnets and Phishing.

    Again, thanks for BB!

  22. 22

    joe Says

    Just added two robots to the whitelist, which may be interesting to have there by default.

    1. Simile (an MIT research project on Semantic Web subjects)
    Fixed IP address: "18.51.2.218", // Simile.MIT.EDU

    2. Sindice (a semantic web search engine with their own index robot).
    Allowing their UA seems not to work: "SindiceFetcher/0.1 (+http://sindice.com/developers/bot)",

    I notified the sindice people about the problem, see:

    http://forum.sindice.com/showthread.php?p=330#post330

  23. 23

    tigtog Says

    Hi, just installed BB today. The spam blocking appears impressive, but the only two people to contact me to say that they can’t see the site are just getting a blank page, no error message or self-help links.

    Their IPs are not showing up in the list of blocked IPs in the WordPress management page.

    For now I’ve just whitelisted them, until I figure my way around the php-MyAdmin logs. But why aren’t they getting any diagnostic message? I’m using WP SuperCache, and I added the code there as instructed, and other people are commenting OK (although not as many as usual).

    Any ideas?

  24. 24

    tigtog Says

    The IPs of these people didn’t show up in the PHP-MyAdmin logs either, and there’s now two more reports. Comment counts for the day are right down, so I’m going to have to disable BB until I can find a resolution.

    I’m still on an older version of WordPress because I can still use the Greasemonkey extension Akismet Auntie Spam with it. It hasn’t been updated for WP 2.7+ and I refuse to wade through spam without it, so unless BB can work to block the spam and still allow commenters I guess I’m stuck without upgrading that blog.

  25. 25

    Michael Hampton Says

    If people get a completely blank page, then there was probably a PHP error somewhere, which will be logged in the server’s error log. You might look for it there.