April 25, 2005
Keyword blacklisting for MT Trackback spam
Like many others, I've gotten really tired of trackback spam. Though comment moderation helps on that front, version 3.01D of Movable Type doesn't seem to have anything corresponding for trackbacks. Because these things are indexed by Google, I have long felt an obligation to delete them on a very timely basis. I'm sick of it.
Because I use MT with mod_perl, I understand that the standard add-ons for this don't work, so I decided to hack up my own. I am very weak with the whole MT infrastructure, so I elected for something simple that I could understand. This means "It's a hack". But it's been working for me quite well.
I found the code in mt/lib/MT/App/Trackback.pm that seems to process the trackbacks in realtime, and it seemed clear just how an entry could be rejected. So I added my own: the idea is that we know the URL and excerpt from the entry, and if either one contains certain keywords, we'll reject the trackback.
Adding this code to the start of the file:
    use File::Spec;
    use MT::TBPing;
    use MT::Trackback;
    use MT::Util qw( first_n_words ...);
    use MT::App;
    @MT::App::Trackback::ISA = qw( MT::App );

    #START ADDING HERE
    my $ban_patternfile = "/tmp/banpat";
    my $ban_patternlog = "/tmp/banlog";

    # THIS IS A HACK
    sub check_banned {
        my $url = shift;
        my $excerpt = shift;

        return 0 if not open(XX, $ban_patternfile);
        my @PATS = <XX>;
        close XX;

        my $banned = 0;

        foreach my $pat ( @PATS ) {
            $pat =~ s/#.*$//;    # dump comments
            $pat =~ s/\s+$//;    # dump trailing whitespace
            next if not $pat;

            if ( $url =~ m/$pat/i or $excerpt =~ m/$pat/i ) {
                $banned = 1;
                last;
            }
        }

        # log if possible
        if ( open (XX, ">>$ban_patternlog") ) {
            printf XX "%s %s\n", $banned ? "Banned" : "Passed", $url;
            close XX;
        }

        return $banned;
    }
    #END ADDITION

    sub init {
        my $app = shift;
        $app->SUPER::init(@_) or return;
        $app->add_methods(
    ...
... defines the function that does the banning and logging, and it must then be actually called by adding two lines to the ping sub:
    no_utf8($tb_id, $title, $excerpt, $url, $blog_name);

    return $app->_response(Error=>
        $app->translate("Need a Source URL (url).")) unless $url;

    # SJF HACK
    return $app->_response(Error=> $app->translate("Banned trackback."))  # NEW
        if check_banned($url, $excerpt);                                  # NEW

    if (my $fixed = MT::Util::is_valid_url($url || "")) {
        $url = $fixed;
    } else {
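The file /tmp/banpat - which you can put anywhere - should contain a one-per-line list of words, phrases, or URLs that should be banned, and this list usually comes together relatively quickly. Each line is used as a case-insensitive Perl regular expression against both the URL and the excerpt, so a character such as "." acts as a wildcard unless escaped, and anything from a "#" to the end of the line is dropped as a comment. The file is re-read on every ping, so edits take effect immediately. The file /tmp/banlog shows which trackbacks have been blocked or passed, though it shows only the URLs, not the excerpts.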
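For illustration, a pattern file might look something like the sketch below. Every pattern here is a made-up placeholder, not an entry from the actual ban list; the layout (one pattern per line, "#" comments, blank lines ignored) follows from how check_banned reads the file.

```
# /tmp/banpat: one pattern per line, matched case-insensitively
# (all entries below are hypothetical examples)

poker                       # plain keyword, matches anywhere in URL or excerpt
online casino               # phrase
pills\.example\.com         # URL fragment, with the dots escaped
cheap-(meds|pills)          # any regex syntax works
```

Because "#" starts a comment, a pattern cannot itself contain a "#", so a URL with a fragment part would be cut short at that point.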
There is nothing about this that is not a hack, and those who actually know the internals of MT can surely do better. I don't know MT internals, and every time I touch MT (including routine upgrades) it's a huge cluster, so the only thing time permitted was an ugly hack. Perhaps it will be useful to others.
... and am I the only one who finds that MT's "preview" doesn't actually do a real "preview"? Ugh.
Update: it's been some days now, and a quick check of the banlog shows 234 entries, with only four of them being Passed. The offending entries were easy to add to the ban pattern file (and delete in MT). I think that catching 98% of trackback spam has been worth the effort.
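A quick check like this can be scripted. The following is only a sketch, assuming the log keeps the "Banned <url>" / "Passed <url>" line format that check_banned writes; the sub names tally_banlog and banned_percent are invented for this example and are not part of MT.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Count "Banned" and "Passed" lines in log text of the form
# "<Banned|Passed> <url>". Lines in any other form are ignored.
# Returns the list (banned_count, passed_count).
sub tally_banlog {
    my @lines = @_;
    my %count = ( Banned => 0, Passed => 0 );
    foreach my $line (@lines) {
        $count{$1}++ if $line =~ /^(Banned|Passed)\s/;
    }
    return ( $count{Banned}, $count{Passed} );
}

# Percentage of logged trackbacks that were banned (0 for an empty log).
sub banned_percent {
    my ( $banned, $passed ) = @_;
    my $total = $banned + $passed;
    return $total ? 100 * $banned / $total : 0;
}

# When a log file is named on the command line, print a one-line summary.
if (@ARGV) {
    open( my $fh, '<', $ARGV[0] ) or die "Cannot open $ARGV[0]: $!\n";
    my ( $banned, $passed ) = tally_banlog(<$fh>);
    close $fh;
    printf "%d banned, %d passed (%.1f%% banned)\n",
        $banned, $passed, banned_percent( $banned, $passed );
}
```

Saved under some name of your choosing and run with /tmp/banlog as its argument, it prints the counts and the banned share. For the numbers above (234 entries, four Passed), that share is 230/234, or about 98.3%.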
Posted by steve at April 25, 2005 09:44 AM
Comments
When my blog started getting spammed, I went for a much simpler approach. Any comment with "http:" doesn't get posted. I don't get spam anymore. I posted this on Slashdot once and got a reply that the author loved me.
Posted by: MTS at April 25, 2005 03:32 PM