Comment spam killing MT weblogs?

Slashdot has an article on how comment spammers are killing Movable Type blogs by putting servers under heavy load. The report turns out to concern a bug in MT 3.x: the server comes under heavy load even when the comment is rejected by MT 3.x’s anti-spam measures.

What about MT 2.x users, though? Movable Type deliberately left MT 2.x users in the cold in this regard as gentle encouragement to upgrade to MT 3.x. For me and many others, upgrading is more hassle than it’s worth, but we certainly don’t want to keep fending off the torrent of spam coming through these days. These fuckers will do anything to get their links crawled in an effort to increase their PageRank.

Below, you’ll find a simple patch to MT 2.661, the last of the 2.6 releases. Once it is applied and someone tries to post a comment, MT reads two files, bad_words and bad_urls, and rejects the comment if the URL field matches any of the bad URLs, if the author name matches any of the bad words, or if a link in the comment text contains any of the bad words. In fact, it doesn’t just reject the comment; it also auto-bans the IP address of the prospective poster.

The bad_words and bad_urls files can actually contain regular expressions, one per line. These files should be installed in the same location as Comments.pm. In the code below, the path is hard-coded as /var/www/cgi-bin/lib/MT/App, so adjust it in both places if your installation lives elsewhere.
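As a rough, standalone illustration of how the files are used (it is not part of the patch itself), the sketch below writes a couple of made-up patterns to a temporary file, reads them back one per line, joins them into a single alternation and tests two URLs against it. The patterns, the URLs and the helper name load_patterns are all invented for the example.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Read one regular expression per line, skipping blank lines.
# load_patterns is a hypothetical helper, not something MT provides.
sub load_patterns {
    my ($path) = @_;
    open(my $fh, '<', $path) or die "Cannot open $path: $!";
    my @patterns;
    while (defined(my $line = <$fh>)) {
        chomp $line;
        push @patterns, $line if length $line;
    }
    close($fh);
    return @patterns;
}

# Made-up stand-in for a bad_urls file: two regular expressions.
my ($tmp_fh, $tmp_path) = tempfile(UNLINK => 1);
print {$tmp_fh} "cheap-pills\\.example\n", "casino\\d*\\.example\n";
close($tmp_fh);

my @bad_urls = load_patterns($tmp_path);
my $regex    = join '|', @bad_urls;

# The first URL matches the second pattern; the other matches neither.
for my $url ('http://casino77.example/win', 'http://www.example.org/') {
    my $verdict = (@bad_urls && $url =~ /$regex/) ? 'rejected' : 'accepted';
    print "$url: $verdict\n";
}
```

Because every line is treated as a regular expression, literal dots in host names need to be escaped, as in the sample patterns.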

It takes a very short while to build up the bad_words and bad_urls files. If you’re hit with spam anything like as often as I am, you’ll find this patch starts to save you a lot of arduous work very quickly.

By posting this here, I run the risk that spammers read the code and work around my current measures, but I think the benefit of posting the code outweighs the inconvenience of potentially tipping off the spammers.

One tip: when composing your bad_words file, include &#\d+; as one of your regular expressions, as this will stop spammers from using HTML entities to get around your traps for individual words.
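To see why that entry helps, here is a small sketch with invented data: an author name that spells a trapped word using numeric character references slips past the plain word pattern but is caught by &#\d+;. The word list and the sample names are made up for illustration.

```perl
use strict;
use warnings;

# Hypothetical bad_words entries; the second one is the tip from the text.
my @bad_words = ('poker', '&#\d+;');
my $regex     = join '|', @bad_words;

# The second name writes "poker" with numeric references for "o" and "k".
my @authors = ('online poker', 'p&#111;&#107;er', 'Alice');

for my $author (@authors) {
    my $verdict = $author =~ /$regex/i ? 'banned' : 'ok';
    print "$author: $verdict\n";
}
```

Note that this pattern only covers decimal references; hexadecimal ones such as &#x6f; would need an entry of their own.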

--- ./MT/App/Comments.pm.orig   2004-01-15 12:36:13.000000000 -0800
+++ ./MT/App/Comments.pm        2004-12-09 00:01:20.939025448 -0800
@@ -180,11 +180,73 @@
require MT::Util;
if (my $fixed = MT::Util::is_valid_url($url)) {
$url = $fixed;
+
+           # Check for bad URLS
+           my $bad_urls = '/var/www/cgi-bin/lib/MT/App/bad_urls';
+           if (-f $bad_urls) {
+               my @bad_urls;
+               my $bad_url;
+               open(URLS, $bad_urls);
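+               while (defined($bad_url = <URLS>)) { chomp $bad_url; push @bad_urls, $bad_url if length $bad_url; }  # skip blank lines: an empty alternative would match every URL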
+               close(URLS);
+
+               my $regex = join '|', @bad_urls;
+               if (@bad_urls && $url =~ /$regex/) {
+                   require MT::IPBanList;
+                   my $ipban = MT::IPBanList->new();
+                   $ipban->blog_id($entry->blog_id);
+                   $ipban->ip($user_ip);
+                   $ipban->save();
+                   $ipban->commit();
+                   $app->log("IP $user_ip banned, because of bad URL: $url");
+                   return $app->handle_error($app->translate(
+                       "You have been banned from posting comments: [_1]", $url));
+               }
+           }
} else {
return $app->handle_error($app->translate(
"Invalid URL '[_1]'", $url));
}
}
+
+    # Check for author spam
+    my $bad_words = '/var/www/cgi-bin/lib/MT/App/bad_words';
+    if (-f $bad_words) {
+       my @bad_words;
+       my $bad_word;
+       open(WORDS, $bad_words);
+       while (defined($bad_word = <WORDS>)) { chomp $bad_word; push @bad_words, $bad_word if length $bad_word; }  # skip blank lines
+       close(WORDS);
+       my $regex = join '|', @bad_words;
+       my $author = $comment->author;
+         if (@bad_words && $author && $author =~ /$regex/i) {
+           require MT::IPBanList;
+           my $ipban = MT::IPBanList->new();
+           $ipban->blog_id($entry->blog_id);
+           $ipban->ip($user_ip);
+           $ipban->save();
+           $ipban->commit();
+           $app->log("IP $user_ip banned, because of bad comment name: $author");
+           return $app->handle_error($app->translate(
+               "You have been banned from posting comments: [_1]", $author));
+         }
+
+       # Check for text spam by looking for bad words in hypertext anchors
+       my $text = $q->param('text');
+       if (@bad_words && $text =~ /<a\s+href=.+?>(([^<]*)($regex)(.*?))<\/a>/is) {
+           my $bad = $1;
+           require MT::IPBanList;
+           my $ipban = MT::IPBanList->new();
+           $ipban->blog_id($entry->blog_id);
+           $ipban->ip($user_ip);
+           $ipban->save();
+           $ipban->commit();
+           $app->log("IP $user_ip banned, because of bad comment text: $bad");
+           return $app->handle_error($app->translate(
+               "You have been banned from posting comments: [_1]", $bad));
+       }
+
+    }
$comment->url(remove_html($url));
$comment->text($q->param('text'));
$comment->save;

2 Responses to Comment spam killing MT weblogs?

  1. ana says:

    Hm, heya, I must be a little under-tech’d to have to ask, but: what do I do with the patch? is it code I should insert in my mt-comments.cgi file, or a file on its own?

    I thought the answer would be obvious if I opened mt-comments.cgi to edit it, but sadly, it’s not.

    I really look forward to being able to use this, it sounds nifty.

    Thank you!

    -a.

  2. I’ve responded to this with a personal e-mail already, but basically, you just need to use the UNIX patch(1) program to apply the patch to the Comments.pm module that came in your MT distribution.
