{"id":816,"date":"2003-09-29T10:59:59","date_gmt":"2003-09-29T10:59:59","guid":{"rendered":"http:\/\/sunpig.com\/mt-entry-816.html"},"modified":"2006-09-23T19:30:11","modified_gmt":"2006-09-23T19:30:11","slug":"bayesian-filter-for-blog-comments","status":"publish","type":"post","link":"https:\/\/sunpig.com\/martin\/2003\/09\/29\/bayesian-filter-for-blog-comments\/","title":{"rendered":"Bayesian filter for blog comments"},"content":{"rendered":"<p>I don&#8217;t get much comments spam myself right now (maybe a message a week or so), but the problem is definitely <a href=\"http:\/\/jeremy.zawodny.com\/blog\/archives\/000984.html\">getting<\/a> <a href=\"http:\/\/huminf.uib.no\/~jill\/archives\/blog_technical\/comment_spam.html\">worse<\/a>.   <\/p>\n<p>For Movable Type installations, there are several solutions available, such as an option to provide a <a href=\"http:\/\/cheerleader.yoz.com\/archives\/000849.html\">&#8220;delete this comment&#8221; link<\/a> with every &#8220;new comment&#8221; email, and a combined <a href=\"http:\/\/www.jayallen.org\/journey\/2003\/09\/killing_comment_spam_dead\">url blocker\/comments hider technique<\/a>.  Also, some people have proposed <a href=\"http:\/\/www.jacobsen.no\/anders\/blog\/archives\/2002\/11\/25\/mt_26_feature_suggestion_collaborative_spamblocking.html\">collaborative blacklists<\/a>, or <a href=\"http:\/\/simon.incutio.com\/archive\/2003\/07\/24\/commentAuthenticationPrototype\">collaborative authentication<\/a> for comments posters.<\/p>\n<p>I&#8217;m surprised that no-one seems to have suggested <a href=\"http:\/\/www.paulgraham.com\/spam.html\">Bayesian filtering<\/a> for comments, though.  I get about 15-20 spam messages via email every day, but the <a href=\"http:\/\/spambayes.sourceforge.net\/\">SpamBayes plugin for Outlook<\/a> routes almost all of them straight into a &#8220;Spam&#8221; folder.  I never see them in my inbox.  
Maybe one or two messages in a hundred make it through the filter, and I haven&#8217;t had any false positives for <em>ages<\/em>.  It doesn&#8217;t involve maintaining blacklists, and it&#8217;s a lot less effort than deleting every single junk message.<\/p>\n<p>In Movable Type, you could have a &#8220;bayesfilter&#8221; property on the MTComments template tag:  <code>&lt;MTComments bayesfilter=\"1\"&gt;<\/code>.  All comments would have to pass through the filter, and only those that were not spam would make it on to the page.<\/p>\n<p>You&#8217;d need some additional mechanism to &#8220;train&#8221; the system, and somewhere to put the statistical knowledge base the filter uses to tell spam from genuine comments.  Finally, you&#8217;d need a way of correcting the system after the initial training, so that any spam that does make it through can be deleted with prejudice, and so that false positives can be corrected.<\/p>\n<p>This would be a nice anti-spam comments system.  It would involve a Movable Type plugin, and some hacking to the Movable Type application itself.  Unfortunately I don&#8217;t have time to do this right now, and even if I did have time, I&#8217;ve sworn off perl.  (Did you know that &#8220;perl&#8221; is an anagram of &#8220;pain&#8221;?)  But I wonder if the <a href=\"http:\/\/www.lazyweb.org\/\">Lazyweb<\/a> could do it for me, or if the nice people at <a href=\"http:\/\/www.sixapart.com\/\">Six Apart<\/a> would be so kind as to include this feature in MT Pro?<\/p>\n","protected":false},"excerpt":{"rendered":"<p>I don&#8217;t get much comments spam myself right now (maybe a message a week or so), but the problem is definitely <a href=\"http:\/\/jeremy.zawodny.com\/blog\/archives\/000984.html\">getting<\/a> <a href=\"http:\/\/huminf.uib.no\/~jill\/archives\/blog_technical\/comment_spam.html\">worse<\/a>. 
<\/p>\n","protected":false},"author":1,"featured_media":0,"comment_status":"closed","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[24],"tags":[],"class_list":["post-816","post","type-post","status-publish","format-standard","hentry","category-blogging"],"_links":{"self":[{"href":"https:\/\/sunpig.com\/martin\/wp-json\/wp\/v2\/posts\/816","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/sunpig.com\/martin\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/sunpig.com\/martin\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/sunpig.com\/martin\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/sunpig.com\/martin\/wp-json\/wp\/v2\/comments?post=816"}],"version-history":[{"count":0,"href":"https:\/\/sunpig.com\/martin\/wp-json\/wp\/v2\/posts\/816\/revisions"}],"wp:attachment":[{"href":"https:\/\/sunpig.com\/martin\/wp-json\/wp\/v2\/media?parent=816"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/sunpig.com\/martin\/wp-json\/wp\/v2\/categories?post=816"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/sunpig.com\/martin\/wp-json\/wp\/v2\/tags?post=816"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}