[vox-tech] bogofilter newbie

vox-tech@lists.lugod.org
Tue, 23 Sep 2003 10:44:05 -0700


On Tue 23 Sep 03, 10:22 AM, Ken Herron <Kherron@newsguy.com> said:
> --On Tuesday, September 23, 2003 07:26:08 -0700 p@dirac.org wrote:
> 
> >1. update bogofilter's wordlists with every incoming message, using the
> >   -u option.  if i understand it, -u will first classify the spam, then
> >   update bogofilter's wordlist.  that seems like asking for trouble.
> >   if you filter to /dev/null based on bogofilter's output, how do you
> >   correct mistakes?  and it seems like mistakes here will cause more
> >   mistakes in the future.
> >
> >   i assume you do this with:
> >
> >   :0fw
> >   | bogofilter -f -p -u -l -e -v
> >
> >   also, shouldn't there be a "c" in the procmail colon line?  how does
> >   mail get past this recipe?  isn't it considered "delivered" when an
> >   email matches a recipe unless you use ":0c"?
> 
> A procmail recipe tagged with "f" is a filtering recipe. Procmail pipes 
> the message through the specified program, then continues on using the 
> filtered version of the message.  It's not a delivering recipe, so "c" 
> isn't needed.
 
aha, thanks!  i guess i didn't quite understand what "consider the pipe
as a filter" meant in man procmailrc, but you made it very clear indeed.
thanks!

> Incoming mail is piped through this set of rules:
> 
>        :0 fw
>        | /usr/bin/bogofilter -u -2 -p -e
 
fwiw, i just read a few hours ago that -u is a bad idea.  see the
bottom of this page:

   http://www.bgl.nu/bogofilter/tuning.html

that's one of the things i'm not so fond of about bogofilter.  so many
options.  so many ways of doing things.  it's a headache.  :(

> It's a good idea to collect your spam rather than deleting it. You might 
> want to delete your wordlist one day and build a new one; you'll need a 
> collection of current spam to do that. More important, any time 
> bogofilter makes a mistake you need to correct it, whether it was a false 
> positive or false negative. I can't remember the last time I found 
> non-spam in my spam folder, but it does happen from time to time.

yeah.  that was the reason why the guy says -u is bad.  he makes the
claim that mistakes snowball with -u.  he seemed pretty adamant about
it, so for now i'm doing it manually from mutt:

macro pager S "<pipe-entry>bogofilter -s"
macro index S "<pipe-entry>bogofilter -s"
macro pager N "<pipe-entry>bogofilter -n"
macro index N "<pipe-entry>bogofilter -n"

i still need to feed bogofilter the email, just so i can look at how the
spamicity levels are doing.  i just won't be auto-feeding bogo's
wordlist based on the spamicity; i think my rule will look something like:

   :0 fw
   | /usr/bin/bogofilter -p -e

but i'm still in the middle of reading docs and strategies.
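
for what it's worth, here is a rough sketch of how a classify-only setup
could file suspected spam into a folder instead of /dev/null, so mistakes
can still be corrected by hand.  the folder name "spam" and the exact
X-Bogosity header text are assumptions (the wording depends on the
bogofilter version and config), so check a filtered message before
relying on it:

```
# filter recipe: pipe the message through bogofilter and keep going.
#   f = treat the pipe as a filter, w = wait for the exit code
#   -p = passthrough (emit the message with an X-Bogosity header added)
#   -e = exit 0 for spam and non-spam in passthrough mode, so procmail
#        does not treat the classification result as a filter failure
# no -u here, so the wordlist is NOT updated automatically.
:0 fw
| /usr/bin/bogofilter -p -e

# delivering recipe: file anything tagged as spam into a folder.
# "spam" is a hypothetical mbox name under $MAILDIR; the trailing ":"
# asks procmail for a local lockfile.
# ASSUMPTION: the header starts with "Yes" or "Spam" for spam.
:0:
* ^X-Bogosity: (Yes|Spam)
spam
```

because the first recipe carries the f flag, the message continues on to
the second recipe; anything that doesn't match falls through to normal
delivery.  misfiled mail in either folder can then be trained by hand
with bogofilter -s or bogofilter -n.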

i've read that it might be better not to feed other people's wordlists
into your own wordlist.  the idea behind it is that bogofilter should
be auto-correcting, and identify spam based on the type of spam you're
currently receiving.  but i've also read the opposite, that it IS
good to share wordlists and spam.  trying to sort through all this
info...


thanks for the email ken!  bogofilter is a tremendous topic...

pete

-- 
GPG Instructions: http://www.dirac.org/linux/gpg
GPG Fingerprint: B9F1 6CF3 47C4 7CD8 D33E 70A9 A3B9 1945 67EA 951D