[vox-tech] Training SpamAssassin's Bayesian filter

Ryan Castellucci vox-tech@lists.lugod.org
Thu, 6 Nov 2003 10:17:00 -0800


-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

On Thursday 06 November 2003 08:58 am, p@dirac.org wrote:
> On Thu 06 Nov 03,  8:29 AM, R. Douglas Barbieri <doug@dooglio.net> said:
> > On Wed, Nov 05, 2003 at 09:59:12PM -0800, Ryan Castellucci wrote:
> > > -----BEGIN PGP SIGNED MESSAGE-----
> > > Hash: SHA1
> > >
> > > On Wednesday 05 November 2003 09:24 pm, Ken Bloom wrote:
> > > > Will SpamAssassin's Bayesian filter be more effective if I train it
> > > > on every message that comes through (even ones that its built-in
> > > > tests have already rejected as spam) or only on false negatives?
> > >
> > > Yes, it's much more effective if you train it on all messages.
> >
> > Woah. Dumb question, but when did SpamAssassin go Bayesian? It's one of
> > the reasons I switched away from it to Bogofilter.
>
> i was wondering the same thing.  it's actually a little difficult
> finding references to bayesian filtering on sa's website.  if you do a
> google search, most of the results are on LUG mailing lists.
>
> according to the sa site, version 2.5 had it.
>
>
> the version i'm using on one of the accounts i own on someone else's
> machine, 2.43, didn't have it.
>
> that's pretty cool.  maybe someday /. will have a "bayesian filter
> shootout" to see who's most effective.   ;-)   but to be honest,
> bayesian filtering along with lexical parsing seems to be the most
> effective (incoming mail to dirac has both).  sa's lexical filtering,
> for me at least, only catches the most obvious spams.  i've had to bump
> up some of the score results to get anything resembling effective.  i'm
> glad they introduced this new functionality.
>
> pete

The other neat thing SpamAssassin can do with Bayesian filtering is
autolearning. If a message's score is above or below a configurable level,
it automatically trains on that message as spam or ham, respectively.
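
A rough sketch of that threshold decision, in Python. This is not
SpamAssassin's actual code: the function name and the two threshold values
are assumptions chosen for illustration, and the real autolearner may apply
further conditions that are not modelled here.

```python
# Illustrative sketch only; not SpamAssassin's real implementation.
# Both thresholds are assumed values for this example. Check your own
# SpamAssassin configuration for the settings actually in effect.
SPAM_THRESHOLD = 15.0   # assumed: learn as spam at or above this score
HAM_THRESHOLD = -2.0    # assumed: learn as ham at or below this score


def autolearn_decision(score,
                       spam_threshold=SPAM_THRESHOLD,
                       ham_threshold=HAM_THRESHOLD):
    """Return 'spam', 'ham', or 'no' for a given message score."""
    if score >= spam_threshold:
        return "spam"   # confidently spam: train the filter on it as spam
    if score <= ham_threshold:
        return "ham"    # confidently legitimate: train on it as ham
    return "no"         # in between: too uncertain to learn from


if __name__ == "__main__":
    for score in (-10.9, 3.2, 21.0):
        print(score, autolearn_decision(score))
```

Scores between the two thresholds are deliberately left alone, so the filter
only learns from messages the other tests are already confident about.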

For example....

X-Spam-Status: No, hits=-10.9 required=6.0
        tests=EMAIL_ATTRIBUTION,HABEAS_SWE,IN_REP_TO,KNOWN_MAILING_LIST,
              PGP_SIGNATURE,QUOTED_EMAIL_TEXT,REFERENCES,
              REPLY_WITH_QUOTES
        autolearn=ham version=2.55
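
To check what autolearning did with a given message, the header above can be
picked apart with a few lines of Python. This is a minimal sketch that
assumes the 2.5x-style layout shown here (a Yes/No verdict followed by
key=value fields); the function name parse_spam_status is made up for this
example, and other SpamAssassin versions may format the header differently.

```python
import re

# The header from the example above, with folded continuation lines.
HEADER = """X-Spam-Status: No, hits=-10.9 required=6.0
        tests=EMAIL_ATTRIBUTION,HABEAS_SWE,IN_REP_TO,KNOWN_MAILING_LIST,
              PGP_SIGNATURE,QUOTED_EMAIL_TEXT,REFERENCES,
              REPLY_WITH_QUOTES
        autolearn=ham version=2.55"""


def parse_spam_status(header):
    """Parse an X-Spam-Status header (hypothetical helper, assumed layout)."""
    # Drop the header name, then unfold by collapsing all whitespace runs.
    value = " ".join(header.split(":", 1)[1].split())
    # The verdict ("Yes" or "No") comes before the first comma.
    verdict, _, rest = value.partition(",")
    # Folding leaves spaces inside the comma-separated tests list; remove them.
    rest = re.sub(r",\s+", ",", rest.strip())
    fields = dict(tok.split("=", 1) for tok in rest.split() if "=" in tok)
    return {
        "is_spam": verdict.strip().lower() == "yes",
        "hits": float(fields["hits"]),
        "required": float(fields["required"]),
        "tests": fields.get("tests", "").split(","),
        "autolearn": fields.get("autolearn"),
        "version": fields.get("version"),
    }


if __name__ == "__main__":
    status = parse_spam_status(HEADER)
    print(status["hits"], status["autolearn"], len(status["tests"]))
```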

Unfortunately, there is no way for me to train the instance of SpamAssassin
running at my ISP.

- -- 
PGP/GPG Fingerprint: 3B30 C6BE B1C6 9526 7A90  34E7 11DF 44F3 7217 7BC7
On pgp.mit.edu, import with `gpg --keyserver pgp.mit.edu --recv-key 72177BC7`
Also available at http://www.cal.net/~ryan/ryan_at_mother_dot_com.asc
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.2.2 (GNU/Linux)

iD8DBQE/qpAcEd9E83IXe8cRAjcrAJ9DJhwHrHHEQROX2cEu0Cr8L1Tx4QCeJjF4
9suAKYZ1USRUSWdfK/x79XA=
=r3R6
-----END PGP SIGNATURE-----