[vox-tech] Training spamassassin's Bayesian filter
Ryan Castellucci
vox-tech@lists.lugod.org
Thu, 6 Nov 2003 10:17:00 -0800
On Thursday 06 November 2003 08:58 am, p@dirac.org wrote:
> On Thu 06 Nov 03, 8:29 AM, R. Douglas Barbieri <doug@dooglio.net> said:
> > On Wed, Nov 05, 2003 at 09:59:12PM -0800, Ryan Castellucci wrote:
> > >
> > > On Wednesday 05 November 2003 09:24 pm, Ken Bloom wrote:
> > > > Will SpamAssassin's Bayesian filter be more effective if I train it on
> > > > every message that comes through (even ones that its built-in tests
> > > > have already rejected as spam) or only on false negatives?
> > >
> > > Yes, it's much more effective if you train it on all messages.
> >
> > Woah. Dumb question, but when did SpamAssassin go Bayesian? It's one of
> > the reasons I switched away from it to Bogofilter.
>
> i was wondering the same thing. it's actually a little difficult
> finding references to bayesian filtering on sa's website. if you do a
> google search, most of the results are on LUG mailing lists.
>
> according to the sa site, version 2.5 had it.
>
>
> the version i'm using on one of the accounts i own on someone else's
> machine, 2.43, didn't have it.
>
> that's pretty cool. maybe someday /. will have a "bayesian filter
> shootout" to see who's most effective. ;-) but to be honest,
> bayesian filtering along with lexical parsing seems to be the most
> effective (incoming mail to dirac has both). sa's lexical filtering,
> for me at least, only catches the most obvious spams. i've had to bump
> up some of the score results to get anything resembling effective. i'm
> glad they introduced this new functionality.
>
> pete
The other neat thing SpamAssassin can do with Bayesian filtering is
autolearning. If a message's score is above a configurable spam threshold or
below a configurable ham threshold, it automatically trains on that message
as spam or ham, respectively.
For example:

X-Spam-Status: No, hits=-10.9 required=6.0
	tests=EMAIL_ATTRIBUTION,HABEAS_SWE,IN_REP_TO,KNOWN_MAILING_LIST,
	      PGP_SIGNATURE,QUOTED_EMAIL_TEXT,REFERENCES,
	      REPLY_WITH_QUOTES
	autolearn=ham version=2.55
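
To make the threshold idea concrete, here is a rough Python sketch. It is not
SpamAssassin's code: the function names and the threshold values (-2.0 and
15.0) are made up for illustration, and the real autolearner may apply further
conditions before it trains on a message. The first function just picks the
fields out of a header like the one above; the second shows the "above one
level or below another" decision.

```python
import re

# The X-Spam-Status value from the example above, unfolded onto one line.
HEADER = (
    "No, hits=-10.9 required=6.0 "
    "tests=EMAIL_ATTRIBUTION,HABEAS_SWE,IN_REP_TO,KNOWN_MAILING_LIST,"
    "PGP_SIGNATURE,QUOTED_EMAIL_TEXT,REFERENCES,REPLY_WITH_QUOTES "
    "autolearn=ham version=2.55"
)


def parse_spam_status(value):
    """Split an X-Spam-Status value into the verdict and its key=value fields."""
    verdict, _, rest = value.partition(",")
    fields = dict(re.findall(r"(\w+)=(\S+)", rest))
    fields["verdict"] = verdict.strip()
    fields["tests"] = fields.get("tests", "").split(",")
    return fields


def autolearn_decision(score, ham_below, spam_above):
    """Hypothetical threshold rule: learn only from clear-cut messages."""
    if score >= spam_above:
        return "spam"
    if score <= ham_below:
        return "ham"
    return "no"  # too close to call, so do not train on it


status = parse_spam_status(HEADER)
# Thresholds here are illustrative assumptions, not SpamAssassin defaults.
decision = autolearn_decision(float(status["hits"]), ham_below=-2.0, spam_above=15.0)
```

With the example header, the parsed autolearn field and the sketch's own
decision agree: a score of -10.9 is far enough below the (assumed) ham level
to be learned as ham.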
Unfortunately, there is no way for me to train the instance of SpamAssassin
running at my ISP.
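
For anyone who does run their own instance, training on everything (as
suggested earlier in the thread) could be scripted roughly as below. This is
only a sketch: the mbox paths are hypothetical, and it merely builds the
sa-learn command lines (using its --spam, --ham and --mbox options) rather
than running them.

```python
def sa_learn_command(kind, mbox_path):
    """Build an sa-learn argument list for one mbox file.

    kind is "spam" or "ham"; mbox_path is a hypothetical local mailbox.
    """
    if kind not in ("spam", "ham"):
        raise ValueError("kind must be 'spam' or 'ham'")
    return ["sa-learn", "--" + kind, "--mbox", mbox_path]


# Hypothetical folders: one with confirmed spam, one with confirmed ham.
commands = [
    sa_learn_command("spam", "Mail/spam"),
    sa_learn_command("ham", "Mail/inbox"),
]

# To actually train, each list could be handed to subprocess.run(cmd, check=True)
# on a machine where sa-learn is installed.
```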
-- 
PGP/GPG Fingerprint: 3B30 C6BE B1C6 9526 7A90 34E7 11DF 44F3 7217 7BC7
On pgp.mit.edu, import with `gpg --keyserver pgp.mit.edu --recv-key 72177BC7`
Also available at http://www.cal.net/~ryan/ryan_at_mother_dot_com.asc