[vox-tech] spam to defeat bayesian filtering?

Peter Jay Salzman vox-tech@lists.lugod.org
Thu, 18 Dec 2003 08:48:06 -0800


hi all,

between bl.spamcop, ORDB and bogofilter, the only spam i'm getting these
days is stuff like the message forwarded below.  it appears to be an
attempt to pollute bayesian spam filters like bogofilter and SpamAssassin.
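
a rough sketch of what i mean by "pollute", assuming a graham-style naive
bayes combination of per-token probabilities.  bogofilter's real scoring
is different, so treat this purely as an illustration; the function name
and all the token probabilities are made up:

```python
from math import prod

def combined_spamicity(token_probs):
    """Naive Bayes combination of per-token spam probabilities (illustrative only)."""
    spam = prod(token_probs)                    # evidence that the message is spam
    ham = prod(1.0 - p for p in token_probs)    # evidence that it is not
    return spam / (spam + ham)

# hypothetical probabilities for the actual pitch, e.g. "free", "cable", "tv"
spammy = [0.99, 0.95, 0.9]
# hypothetical probabilities for innocuous dictionary words used as padding
padding = [0.1] * 6

print(combined_spamicity(spammy))             # pitch on its own
print(combined_spamicity(spammy + padding))   # same pitch, diluted by padding
```

with these invented numbers the padded message scores far lower than the
bare pitch, which is presumably the whole point of the random words.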

i don't do automatic training anymore (conventional wisdom says to train
manually so your database doesn't drift due to false positives and false
negatives).  i have NOT been training bogofilter on spams like this.
i've mostly been forwarding them to spamcop and for particularly
egregious spamhauses, dropping their IP blocks into hosts.deny (i've
wrapped exim with tcpd).

but honestly, i haven't given much thought to this.

has anybody thought about these types of spams in relation to bayesian
filters?  or perhaps read an article written by someone who's given the
matter some thought?

should we train on these types of emails or not?

and if not, are there ways to combat this type of spam besides
SpamAssassin's lexical parsing?

one thing i've noticed about these types of spam: they don't have
sentences.  no punctuation, no capitalization (oops!), and no sense of
grammar.  i'm wondering if the next tool to combat spam will look
something like the Z-interpreter used by the old-style infocom text
adventures.  ;-)
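
nothing as fancy as a z-interpreter, but here's a minimal sketch of the
idea: flag text that has almost no sentence-ending punctuation and almost
no capitals.  the function name, the minimum word count and both
thresholds are made up for illustration and not tuned against real mail:

```python
import re

def looks_like_word_salad(text, min_words=20):
    """Crude guess: many words, almost no sentence punctuation, almost no capitals."""
    words = re.findall(r"[A-Za-z']+", text)
    if len(words) < min_words:
        return False                      # too short to judge
    enders = len(re.findall(r"[.!?]", text))
    capitalized = sum(1 for w in words if w[0].isupper())
    # hypothetical thresholds: fewer than 2% enders and 5% capitalized words
    return enders / len(words) < 0.02 and capitalized / len(words) < 0.05

salad = ("delphi apex turtleback tartary carmichael isotopic bonito house atop "
         "deducible radiography cultivable barricade epiphysis set hettie "
         "consent patrician baritone meyer nashua browse")
prose = ("Hello there. This is a normal message. I hope you are well. "
         "Please call me when you get this. Thanks a lot. "
         "See you on Friday at the meeting.")

print(looks_like_word_salad(salad))
print(looks_like_word_salad(prose))
```

requiring both conditions is deliberate: it keeps the check from flagging
people (like me) who punctuate but don't bother with capitals.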

thanks,
pete



----- Forwarded message from Winnie  <qaxdyrr@tom.com> -----

Return-path: qaxdyrr@tom.com
Envelope-to: p@dirac.org
Delivery-date: Thu, 18 Dec 2003 08:30:50 -0800
Received: from 213.213.239.124.brutele.be ([213.213.239.124] ident=xitjpqrvop)
	by gabriel.localdomain with smtp (Exim 3.36 #1 (Debian))
	id 1AX131-0007di-00
	for <p@dirac.org>; Thu, 18 Dec 2003 08:30:50 -0800
Received: from [213.213.239.124] by 101.30.124.94 with HTTP;
	Wed, 17 Dec 2003 22:27:50 -0600
From: Winnie  <qaxdyrr@tom.com>
To: p@dirac.org
Subject: Re: HPRDD, the master threw
Mime-Version: 1.0
X-Mailer: mPOP Web-Mail 2.19
X-Originating-IP: [80.28.112.19]
Date: Thu, 18 Dec 2003 09:31:50 +0500
Reply-To: Winnie  <qaxdyrr@tom.com>
Content-Type: multipart/alternative;
	boundary="--ALT--PIML01481750780296"
Message-Id: <KRONGLR-0009246905811@response>
X-Bogosity: No, tests=bogofilter, spamicity=0.609282, version=0.15.8
   int  cnt   prob  spamicity histogram
  0.00    8 0.031480 0.009831 ########
  0.10    1 0.176640 0.016506 #
  0.20    1 0.253007 0.026157 #
  0.30    6 0.373008 0.102732 ######
  0.40    0 0.000000 0.102732 
  0.50    0 0.000000 0.102732 
  0.60    2 0.612572 0.145412 ##
  0.70    4 0.737999 0.239923 ####
  0.80    3 0.852553 0.311769 ###
  0.90   21 0.987941 0.583047 #####################

   Free Cable# TV

   [IMG] delphi apex turtleback tartary carmichael isotopic bonito house atop
   deducible radiography
   cultivable barricade epiphysis set hettie consent patrician baritone meyer
   nashua browse indecipherable steelmake caputo learn trimer wu tambourine
   beef borate marina benthic allah bass descent enid avery airpark tithe
   union million federal genteel cabaret

----- End forwarded message -----

-- 
Make everything as simple as possible, but no simpler.  -- Albert Einstein
GPG Instructions: http://www.dirac.org/linux/gpg
GPG Fingerprint: B9F1 6CF3 47C4 7CD8 D33E 70A9 A3B9 1945 67EA 951D