[vox-tech] bogofilter newbie

Ken Herron vox-tech@lists.lugod.org
Tue, 23 Sep 2003 10:22:44 -0700


--On Tuesday, September 23, 2003 07:26:08 -0700 p@dirac.org wrote:

> 1. update bogofilter's wordlists with every incoming message, using the
>    -u option.  if i understand it, -u will first classify the spam, then
>    update bogofilter's wordlist.  that seems like asking for trouble.
>    if you filter to /dev/null based on bogofilter's output, how do you
>    correct mistakes?  and it seems like mistakes here will cause more
>    mistakes in the future.
>
>    i assume you do this with:
>
>    :0fw
>    | bogofilter -f -p -u -l -e -v
>
>    also, shouldn't there be a "c" in the procmail colon line?  how does
>    mail get past this recipe?  isn't it considered "delivered" when an
>    email matches a recipe unless you use ":0c"?

A procmail recipe tagged with "f" is a filtering recipe. Procmail pipes 
the message through the specified program, then continues on using the 
filtered version of the message.  It's not a delivering recipe, so "c" 
isn't needed.

I seeded bogofilter just like you did. I use maildirs for my email, so 
every message is in a separate file. I built a list of every message 
less than a year old, divided the list into spam and non-spam, and 
piped each set into bogofilter.
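
As a rough sketch of that seeding step (not necessarily the exact 
commands used here), the loop below feeds each recent message file in a 
maildir to bogofilter on standard input, with -s to register it as spam 
or -n to register it as non-spam. The names train_dir, BOGOFILTER and 
MAX_AGE_DAYS are made up for illustration, and it assumes the spam and 
non-spam messages already sit in separate maildirs.

```shell
#!/bin/sh
# Sketch only: seed bogofilter from maildir folders, one message per file.
# BOGOFILTER and MAX_AGE_DAYS are illustrative names, not bogofilter settings.
BOGOFILTER=${BOGOFILTER:-bogofilter}
MAX_AGE_DAYS=${MAX_AGE_DAYS:-365}

# train_dir FLAG DIR
#   FLAG is -s (register as spam) or -n (register as non-spam).
#   DIR is a maildir; messages live in its cur/ and new/ subdirectories.
train_dir() {
    flag=$1
    dir=$2
    # Select message files modified within the last MAX_AGE_DAYS days.
    find "$dir/cur" "$dir/new" -type f -mtime "-$MAX_AGE_DAYS" |
    while IFS= read -r msg; do
        # bogofilter reads one message on standard input.
        "$BOGOFILTER" "$flag" < "$msg"
    done
}
```

With hypothetical folder names, usage would be along the lines of:

        train_dir -s "$HOME/Maildir/.spam"
        train_dir -n "$HOME/Maildir"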

Incoming mail is piped through this set of rules:

        :0 fw
        | /usr/bin/bogofilter -u -2 -p -e

        # Spam? Save it in the spam folder
        :0
        * ^X-Bogosity: (yes|spam)
        $SPAM

It's a good idea to collect your spam rather than deleting it. You might 
want to delete your wordlist one day and build a new one; you'll need a 
collection of current spam to do that. More important, any time 
bogofilter makes a mistake you need to correct it, whether it was a false 
positive or false negative. I can't remember the last time I found 
non-spam in my spam folder, but it does happen from time to time.

You'll need to find a method of feeding mail back into bogofilter that 
works for you. I copy the mail into a special mailbox that's swept by a 
cron job several times per day. These messages are fed back into procmail 
using a special set of rules:

# Messages labelled spam. Tell bogofilter it's not, and save to INBOX
:0
* ^X-Bogosity: (Spam|Yes)
{
        :0 c
        | /usr/bin/bogofilter -Sn

        :0
        $DEFAULT
}

# Messages not labelled spam. Tell bogofilter it is, and save to $SPAM
:0 E
{
        :0 c
        * ^X-Bogosity: (ham|no)
        | /usr/bin/bogofilter -Ns

        :0
        $SPAM
}

Note that I'm not using bogofilter as a filter this time. Without -p 
(passthrough mode) it won't output a new copy of the message with the 
corrected spam header.
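
The cron sweep mentioned above could look something like the sketch 
below. It assumes the special mailbox is a maildir and that the 
retraining rules live in their own rcfile whose path is passed to 
procmail on the command line. The names sweep_maildir and PROCMAIL, and 
the paths in the usage line, are invented for illustration.

```shell
#!/bin/sh
# Sketch only: sweep a "retrain" maildir and feed each message to procmail
# with a separate rcfile. PROCMAIL is an illustrative variable name.
PROCMAIL=${PROCMAIL:-procmail}

# sweep_maildir BOX RCFILE
#   BOX is the maildir that misclassified mail gets copied into.
#   RCFILE holds the retraining recipes (use an absolute path).
sweep_maildir() {
    box=$1
    rcfile=$2
    for msg in "$box"/cur/* "$box"/new/*; do
        [ -f "$msg" ] || continue     # glob matched nothing
        # Delete the swept copy only after procmail exits successfully,
        # so a failed run is retried on the next sweep.
        if "$PROCMAIL" "$rcfile" < "$msg"; then
            rm -f "$msg"
        fi
    done
}
```

A script run from cron would then end with a call such as:

        sweep_maildir "$HOME/Maildir/.retrain" "$HOME/.procmailrc.retrain"
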
-- 
"We actually do 100,000 pages or more a day in Bork"
    -- Marissa Mayer, Google
Kenneth Herron  Kherron@newsguy.com     916-366-7338