[vox-tech] bogofilter newbie
Ken Herron
vox-tech@lists.lugod.org
Tue, 23 Sep 2003 10:22:44 -0700
--On Tuesday, September 23, 2003 07:26:08 -0700 p@dirac.org wrote:
> 1. update bogofilter's wordlists with every incoming message, using the
> -u option. if i understand it, -u will first classify the spam, then
> update bogofilter's wordlist. that seems like asking for trouble.
> if you filter to /dev/null based on bogofilter's output, how do you
> correct mistakes? and it seems like mistakes here will cause more
> mistakes in the future.
>
> i assume you do this with:
>
> :0fw
> | bogofilter -f -p -u -l -e -v
>
> also, shouldn't there be a "c" in the procmail colon line? how does
> mail get past this recipe? isn't it considered "delivered" when an
> email matches a recipe unless you use ":0c"?
A procmail recipe tagged with "f" is a filtering recipe. Procmail pipes
the message through the specified program, then continues on using the
filtered version of the message. It's not a delivering recipe, so "c"
isn't needed.
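To illustrate, here is a minimal Python sketch of roughly what an "fw" recipe does. It is not procmail's actual implementation, and run_filter and the stand-in tagger command are hypothetical names made up for this example. It also hints at why -e is useful: with "w", procmail checks the filter's exit status and keeps the unfiltered message if the filter reports failure, and -e makes bogofilter exit with 0 whenever it could classify the message.

```python
from __future__ import annotations

import subprocess
import sys


def run_filter(message: bytes, command: list[str]) -> bytes:
    """Mimic a procmail ':0 fw' recipe: pipe the message through a program.

    If the program exits with status 0, its output replaces the message.
    Otherwise the original, unfiltered message is kept.
    """
    result = subprocess.run(command, input=message, stdout=subprocess.PIPE)
    if result.returncode == 0:
        return result.stdout
    return message


if __name__ == "__main__":
    # Hypothetical stand-in for "bogofilter -p -e": a tiny filter that
    # prepends one header line and passes the rest through unchanged.
    tagger = [
        sys.executable,
        "-c",
        "import sys; sys.stdout.buffer.write("
        "b'X-Bogosity: No\\n' + sys.stdin.buffer.read())",
    ]
    print(run_filter(b"Subject: hi\n\nbody\n", tagger).decode())
```

Whatever run_filter returns is what later recipes see, which is why no "c" flag is needed on the filtering recipe.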
I seeded bogofilter just like you did. I use maildirs for my email, so
every message is in a separate file. I built a big list of every
message less than a year old, divided them into spam and non-spam, and
piped each set into bogofilter.
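A minimal sketch of that kind of seeding in Python, assuming the spam and non-spam messages already sit in separate maildirs. The folder names, the helper names (recent_messages, register) and the use of file modification time as the message age are all assumptions for illustration. bogofilter's -s and -n options register a message read from standard input as spam or non-spam respectively.

```python
import subprocess
import time
from pathlib import Path

ONE_YEAR = 365 * 24 * 60 * 60  # seconds


def recent_messages(maildir, now=None, max_age=ONE_YEAR):
    """Return message files in maildir/cur and maildir/new younger than max_age."""
    now = time.time() if now is None else now
    found = []
    for sub in ("cur", "new"):
        folder = Path(maildir) / sub
        if not folder.is_dir():
            continue
        for path in sorted(folder.iterdir()):
            # Assumption: the file's mtime is a good enough proxy for message age.
            if path.is_file() and now - path.stat().st_mtime < max_age:
                found.append(path)
    return found


def register(paths, flag, bogofilter="/usr/bin/bogofilter"):
    """Feed each message to bogofilter; flag is "-s" (spam) or "-n" (non-spam)."""
    for path in paths:
        with open(path, "rb") as msg:
            subprocess.run([bogofilter, flag], stdin=msg, check=True)


if __name__ == "__main__":
    # Hypothetical folder names; substitute your own sorted maildirs.
    home = Path.home()
    register(recent_messages(home / "Maildir" / ".Spam"), "-s")
    register(recent_messages(home / "Maildir"), "-n")
```

Running one bogofilter process per message is slow for a large archive but keeps the sketch to options that are certain to exist.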
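Incoming mail is piped through this set of rules:

# Classify the message, update the wordlists (-u), and pass it through
# with an X-Bogosity header added (-p)
:0 fw
| /usr/bin/bogofilter -u -2 -p -e

# Spam? Save it in the spam folder
:0
* ^X-Bogosity: (yes|spam)
$SPAM
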
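For anyone who wants to apply the same test outside procmail, here is a small Python sketch of the condition in the second recipe. The function name is_tagged_spam and the sample header value in the usage lines are illustrative assumptions. The pattern is compiled case-insensitively because procmail matches conditions without regard to case by default.

```python
import re
from email import message_from_string

# Same alternatives as the recipe's condition, matched case-insensitively.
SPAM_VALUE = re.compile(r"(yes|spam)", re.IGNORECASE)


def is_tagged_spam(raw_message: str) -> bool:
    """Return True if the X-Bogosity header starts with "yes" or "spam"."""
    msg = message_from_string(raw_message)
    value = msg.get("X-Bogosity", "")
    return bool(SPAM_VALUE.match(value.strip()))


if __name__ == "__main__":
    # Illustrative header value; the real header carries more detail.
    sample = "X-Bogosity: Yes, tests=bogofilter\nSubject: cheap pills\n\nbody\n"
    print(is_tagged_spam(sample))
```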
It's a good idea to collect your spam rather than deleting it. You might
want to delete your wordlist one day and build a new one; you'll need a
collection of current spam to do that. More important, any time
bogofilter makes a mistake you need to correct it, whether it was a false
positive or false negative. I can't remember the last time I found
non-spam in my spam folder, but it does happen from time to time.
You'll need to find a method of feeding mail back into bogofilter that
works for you. I copy the mail into a special mailbox that's swept by a
cron job several times per day. These messages are fed back into procmail
using a special set of rules:

# Messages labelled spam. Tell bogofilter it's not, and save to INBOX
:0
* ^X-Bogosity: (Spam|Yes)
{
  :0 c
  | /usr/bin/bogofilter -Sn

  :0
  $DEFAULT
}

# Messages not labelled spam.
:0 E
{
  :0 c
  * ^X-Bogosity: (ham|no)
  | /usr/bin/bogofilter -Ns

  :0
  $SPAM
}

Note that I'm not using bogofilter as a filter this time. Without -p
(passthrough mode) it won't output a new copy of the message with the
corrected spam header, so the saved message keeps the header it arrived
with.
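A minimal sketch of the sweep described above, written in Python under some assumptions: the correction mailbox is a maildir, the correction rules live in their own rcfile, and procmail is invoked with that rcfile as its argument. The paths and the function names pending and sweep are hypothetical; this is one way to do it, not the actual cron script.

```python
import subprocess
from pathlib import Path


def pending(maildir):
    """List message files waiting in the correction maildir."""
    files = []
    for sub in ("new", "cur"):
        folder = Path(maildir) / sub
        if folder.is_dir():
            files.extend(p for p in sorted(folder.iterdir()) if p.is_file())
    return files


def sweep(maildir, rcfile, procmail="/usr/bin/procmail"):
    """Pipe each message through procmail with the correction rules.

    A message is removed from the correction mailbox only if procmail
    exits with status 0, so a failed run leaves it for the next sweep.
    """
    handled = 0
    for path in pending(maildir):
        with open(path, "rb") as msg:
            done = subprocess.run([procmail, str(rcfile)], stdin=msg)
        if done.returncode == 0:
            path.unlink()  # procmail has delivered its own copy
            handled += 1
    return handled


if __name__ == "__main__":
    # Hypothetical locations; adjust to your own setup.
    home = Path.home()
    sweep(home / "Maildir" / ".Correct", home / ".procmailrc-correct")
```

Run from cron several times a day, a script like this hands every queued message to the correction rules, which retrain bogofilter and refile the message.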
--
"We actually do 100,000 pages or more a day in Bork"
-- Marissa Mayer, Google
Kenneth Herron Kherron@newsguy.com 916-366-7338