[vox-tech] bogofilter question

vox-tech@lists.lugod.org vox-tech@lists.lugod.org
Tue, 14 Oct 2003 11:11:47 -0700


dear fellow bogofilter users,

i'm in the process of training bogofilter.  i've decided not to do it
automatically with the -u (update) option, since the docs warn against
it.  instead, every time an email comes in, i press ^n to pass it to
bogofilter as non-spam or ^s to pass it to bogofilter as spam.  so far,
i've got about 600 spams and 600 "hams" (as the bogofilter docs call
them):

p@gabriel$ bogoutil -w ~/.bogofilter .MSG_COUNT
                       spam   good
.MSG_COUNT              559    575

side note: i was hardly surprised that half my incoming email is spam.
according to the docs, i'm lucky since you want these numbers to be
roughly the same.
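
the "half" figure can be checked from the two counts.  a small sketch
(spam_fraction is a made-up helper name, not part of bogofilter):

```shell
# spam_fraction SPAM GOOD: print spam as a percentage of all trained messages
spam_fraction() {
    awk -v s="$1" -v g="$2" 'BEGIN { printf "%.1f\n", 100 * s / (s + g) }'
}

spam_fraction 559 575   # the counts from the bogoutil output
```

with these counts it comes out a little under 50 percent.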


now at some point, i'm going to have to edit .procmailrc to start
sending email to /dev/null if it sees:

   * ^X-Bogosity:.*Yes

which means that i'll only be training bogofilter on false negatives
(and unfortunately, false positives will be gone, but the whole point is
to not see spam anymore, not just to save it in some file that i have
to sort through periodically).
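
before trusting that condition with /dev/null, the same test can be run
by hand on saved messages.  a rough sketch, assuming the X-Bogosity
header shown in the condition; is_flagged_spam is a made-up helper name,
and grep -i is used because procmail conditions are case-insensitive by
default:

```shell
# hypothetical helper: exit 0 if the message on stdin has a header line
# matching ^X-Bogosity:.*Yes.  only the header block (everything up to
# the first blank line) is examined, so body text cannot trigger it.
is_flagged_spam() {
    sed '/^$/q' | grep -qi '^X-Bogosity:.*Yes'
}
```

usage would be something like "is_flagged_spam < message.txt && echo spam".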

the bogofilter docs recommend doing this only after training on about
10,000 emails.  a website run by one of the bogofilter developers says
this number should be more like 20,000.

that's absurd.  i've only seen a false positive once or twice, when
i first started to use bogofilter.  false negatives are rare.  maybe one
or two a week.

are there any more experienced bogofilter people out there who have
thought about this issue?  if so, what was your conclusion?  i can't see
doing this much past 1000 emails in each ham/spam bin.


lastly, the docs recommend not sharing databases with other people
because the whole point is to tailor bogofilter to the type of spam and
ham that arrives in YOUR inbox, not other people's inboxes.  otherwise,
you might as well use a lexical analyzer like spamcop.  i suspect the
docs overstate this claim: we all get offered XXX videos, penis
enlargements and international bank transfers.  but then again, i'm
still something of a bogofilter newbie, so i'd like some guidance from
anybody who has actually thought about this.

thanks,
pete

-- 
GPG Instructions: http://www.dirac.org/linux/gpg
GPG Fingerprint: B9F1 6CF3 47C4 7CD8 D33E 70A9 A3B9 1945 67EA 951D