[vox-tech] SpamAssassin training
Ken Bloom
kbloom at gmail.com
Mon Jan 1 13:01:44 PST 2007
SpamAssassin includes a naive Bayesian classifier that can be trained to
recognize spam from the words a message contains. The result of
classification using the Bayesian classifier is boiled down into one
of several rules: BAYES_00, BAYES_05, BAYES_20, ...,
BAYES_95, BAYES_99. These rules have statically assigned scores.
Along with a whole plethora of other, more complex rules (for things
like header bugs, DNSBLs, body formatting, etc.), the scores for all the
rules a message triggers are added up, and the total is used to determine
whether the message is actually spam.
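
To make the scoring step concrete, here is a minimal sketch in Python of
how the scores might be summed and compared against a threshold. The rule
names starting with HYPOTHETICAL_ and all of the score values are made up
for illustration, and the 5.0 threshold is only assumed to match the
customary default; none of this is SpamAssassin's own code.

```python
# Illustrative scores only; the values SpamAssassin ships with differ.
RULE_SCORES = {
    "BAYES_99": 3.5,
    "BAYES_00": -1.9,
    "HYPOTHETICAL_HEADER_RULE": 1.2,   # made-up rule name
    "HYPOTHETICAL_DNSBL_RULE": 2.0,    # made-up rule name
}

REQUIRED_SCORE = 5.0  # assumed threshold for calling a message spam


def total_score(hits, scores=RULE_SCORES):
    """Add up the static scores of every rule the message triggered."""
    return sum(scores.get(rule, 0.0) for rule in hits)


def is_spam(hits, scores=RULE_SCORES, required=REQUIRED_SCORE):
    """A message counts as spam when its total reaches the threshold."""
    return total_score(hits, scores) >= required


if __name__ == "__main__":
    print(is_spam(["BAYES_99", "HYPOTHETICAL_DNSBL_RULE"]))
    print(is_spam(["BAYES_00", "HYPOTHETICAL_HEADER_RULE"]))
```

The point is that the Bayesian result is just one term in the sum, weighted
by a fixed score like every other rule.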
The scores for these rules can be customized manually in
~/.spamassassin/user_prefs or systemwide in files in /etc/spamassassin.
Is there any utility for SpamAssassin that could be used to train the
scores for all of its rules automatically, in a Bayesian or
support-vector-machine kind of way? Note that I'm not talking about
training the Bayesian filter itself. As I just explained, I'm curious
about automatically training the step that comes after the Bayesian filter.
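
To illustrate the kind of training I mean, here is a rough sketch in Python
that learns one weight per rule from hand-labelled messages using a simple
perceptron (swapped in for the Bayesian or SVM approach because it is
shorter to write down). The functions train_scores and as_score_lines, the
rule name HYPOTHETICAL_DNSBL_RULE, and the toy corpus are all invented for
this example; this is not an existing SpamAssassin utility, and the learned
weights are on an arbitrary scale.

```python
def train_scores(messages, rules, epochs=50, lr=0.1):
    """Learn one weight per rule with a perceptron.

    messages: list of (set_of_rule_names_hit, is_spam) pairs.
    Returns (weights, bias); the bias plays the role of a shifted threshold.
    """
    weights = {rule: 0.0 for rule in rules}
    bias = 0.0
    for _ in range(epochs):
        for hits, spam in messages:
            activation = bias + sum(weights[r] for r in hits if r in weights)
            predicted = activation > 0
            if predicted != spam:
                # Nudge the weights of the triggered rules toward the label.
                step = lr if spam else -lr
                bias += step
                for r in hits:
                    if r in weights:
                        weights[r] += step
    return weights, bias


def as_score_lines(weights):
    """Format the weights as 'score RULE value' lines, as in user_prefs."""
    return [f"score {rule} {w:.2f}" for rule, w in sorted(weights.items())]


if __name__ == "__main__":
    rules = ["BAYES_00", "BAYES_99", "HYPOTHETICAL_DNSBL_RULE"]
    corpus = [  # toy data, hand-labelled
        ({"BAYES_99"}, True),
        ({"BAYES_00"}, False),
        ({"BAYES_99", "HYPOTHETICAL_DNSBL_RULE"}, True),
        (set(), False),
    ]
    weights, bias = train_scores(corpus, rules)
    print("\n".join(as_score_lines(weights)))
```

A real tool would need a large corpus of rule hits for known spam and
non-spam, and would have to rescale the weights to fit the spam threshold.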
--Ken Bloom
--
Ken Bloom. PhD candidate. Linguistic Cognition Laboratory.
Department of Computer Science. Illinois Institute of Technology.
http://www.iit.edu/~kbloom1/