[vox-tech] Training SpamAssassin's Bayesian filter

Thu, 6 Nov 2003 08:58:37 -0800

On Thu 06 Nov 03,  8:29 AM, R. Douglas Barbieri <doug@dooglio.net> said:
> On Wed, Nov 05, 2003 at 09:59:12PM -0800, Ryan Castellucci wrote:
> > -----BEGIN PGP SIGNED MESSAGE-----
> > Hash: SHA1
> > 
> > On Wednesday 05 November 2003 09:24 pm, Ken Bloom wrote:
> > > Will SpamAssassin's Bayesian filter be more effective if I train it
> > > on every message that comes through (even ones that its built-in
> > > tests have already rejected as spam) or only on false negatives?
> > 
> > Yes, it's much more effective if you train it on all messages.
> 
> Woah. Dumb question, but when did SpamAssassin go Bayesian? It's one of
> the reasons I switched away from it to Bogofilter.

i was wondering the same thing.  it's actually a little difficult
finding references to bayesian filtering on sa's website.  if you do a
google search, most of the results are on LUG mailing lists.

according to the sa site, version 2.5 had it.

the version i'm using on one of the accounts i own on someone else's
machine, 2.43, doesn't have it.

that's pretty cool.  maybe someday /. will have a "bayesian filter
shootout" to see which is most effective.   ;-)   but to be honest,
bayesian filtering along with lexical parsing seems to be the most
effective combination (incoming mail to dirac has both).  sa's lexical
filtering, for me at least, only catches the most obvious spams.  i've
had to bump up some of the rule scores to get anything resembling
effective filtering.  i'm glad they introduced this new functionality.
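
for anyone wondering what "bayesian" means in practice, here is a rough
sketch in python of a toy naive bayes token filter.  this is NOT sa's
actual implementation; the class name TinyBayes, its methods, and the
add-one smoothing are all assumptions made up for illustration.  it only
shows the general idea: count which words show up in spam vs. ham, then
combine the per-word odds for a new message.

```python
import math
from collections import Counter


class TinyBayes:
    """Toy naive Bayes spam filter (hypothetical; not SpamAssassin's code)."""

    def __init__(self):
        # per-class count of messages containing each token
        self.tokens = {"spam": Counter(), "ham": Counter()}
        # number of messages trained per class
        self.messages = {"spam": 0, "ham": 0}

    def train(self, text, label):
        # label must be "spam" or "ham"; each distinct token counts once
        self.messages[label] += 1
        self.tokens[label].update(set(text.lower().split()))

    def spam_probability(self, text):
        n_spam, n_ham = self.messages["spam"], self.messages["ham"]
        # start from the prior odds, with add-one smoothing
        log_odds = math.log((n_spam + 1) / (n_ham + 1))
        for tok in set(text.lower().split()):
            # smoothed fraction of spam/ham messages containing this token
            p_spam = (self.tokens["spam"][tok] + 1) / (n_spam + 2)
            p_ham = (self.tokens["ham"][tok] + 1) / (n_ham + 2)
            log_odds += math.log(p_spam / p_ham)
        # convert log-odds back to a probability
        return 1 / (1 + math.exp(-log_odds))


f = TinyBayes()
f.train("cheap pills buy now", "spam")
f.train("buy cheap watches now", "spam")
f.train("meeting notes for the lug tonight", "ham")
f.train("patch for the kernel build", "ham")
print(f.spam_probability("buy cheap pills"))
print(f.spam_probability("kernel meeting notes"))
```

in a sketch like this, every trained message updates the word counts,
so messages the other tests already caught still sharpen the estimates.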

pete

-- 
GPG Instructions: http://www.dirac.org/linux/gpg
GPG Fingerprint: B9F1 6CF3 47C4 7CD8 D33E 70A9 A3B9 1945 67EA 951D