[vox-tech] The Great Spam Investigation

Mark K. Kim vox-tech@lists.lugod.org
Sun, 25 Apr 2004 21:13:17 -0700 (PDT)


Do you know how many of the spams that got filtered out were false
positives?  I sometimes get curious about that.

BTW, if you reject all e-mails sent from you to you, do you not get copies
of e-mails you send to LUGOD mailing lists?  Did you count those e-mails
in your spam stats?

Bill, are you receiving this e-mail?!  Whitelist me!  =)

-Mark


On Sun, 25 Apr 2004, Peter Jay Salzman wrote:

> Introduction
> ============
>
> For years, I've used Exim 3, just because that's what got installed when I
> first installed Debian, back in the "slink and a half"/Potato days.
>
> A few days ago, I installed Postfix 2.0, mostly because every time Rod Roark
> posts an email to vox-tech about Postfix, I get jealous.  Initially, I was
> going to install Exim 4, but Exim 4's configuration looked very obfuscated.
> I might have stayed with Exim had there been an easy way to convert my Exim 3
> configuration to Exim 4 that didn't require reading.  But alas, that looked
> complicated too.  So I converted to Postfix.
>
> It has been a while since I really looked at all the spam coming into
> dirac.org, so to celebrate installing Postfix, I did a 24-hour test.  This is
> the result of that test.
>
> A bit about dirac.org.  It's a domain connected to the internet via DSL.
> There are only two users: me and my wife.  Being the founder and past
> president of the Linux Users' Group of Davis, and having a strong presence on
> the web and USENET, I get a lot of spam.  I think it's downright cute when
> she complains about the 2 or 3 pieces she gets a day...  :)
>
> Mail at dirac.org comes from two paths:
>
> 	1. Directly to dirac.org
> 	2. From my school account at lifshitz.ucdavis.edu
>
> Mail at lifshitz passes through SpamAssassin.  If it gets marked as spam, it
> gets deleted outright.  If not, it's forwarded to dirac.org.  Therefore, in
> the following statistics, spam caught on lifshitz is not included.  My real
> email-to-spam ratio is actually lower than advertised.  Keep that in mind.
>
> Mail at dirac.org passes through my new Postfix spam controls.  Any email
> that remains gets delivered to procmail, which first sends it to Bogofilter, a
> spam filter based on Bayesian statistics.  Then it goes through a few
> procmail recipes that I wrote.  Then it gets delivered to my inbox.
>
> In what follows, keep in mind that I list the tests in order.  For
> example, the RBL bl.spamcop.net gets "first crack" at incoming mail, so
> it's bound to catch more spam than any other RBLs.  No doubt if
> cbl.abuseat.org got "first crack", that RBL would have the highest spam
> catching rate.  Keep that in mind when looking at the numbers.  Procmail
> and Bogofilter are both powerful; they just get the "leftovers" after
> Postfix does its stuff.
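
To make the "first crack" point concrete, here is a small Python sketch.
The message list is made up for illustration (only the two RBL names come
from the post), and tally() is a hypothetical helper, not anything Postfix
provides.  It credits each message to the first list that flags it, so
swapping the order changes the per-list counts but not the total.

```python
from collections import Counter

def tally(messages, checks):
    """Credit each message to the first check that flags it."""
    counts = Counter()
    for listed_on in messages:
        for name in checks:
            if name in listed_on:
                counts[name] += 1
                break  # later checks never see this message
    return counts

# Hypothetical data: each message is the set of RBLs listing its sender.
messages = [
    {"bl.spamcop.net", "cbl.abuseat.org"},
    {"bl.spamcop.net", "cbl.abuseat.org"},
    {"bl.spamcop.net"},
    {"cbl.abuseat.org"},
    set(),  # listed nowhere, falls through to the later filters
]

spamcop_first = tally(messages, ["bl.spamcop.net", "cbl.abuseat.org"])
cbl_first = tally(messages, ["cbl.abuseat.org", "bl.spamcop.net"])
print(spamcop_first)
print(cbl_first)
```

Whichever list runs first gets credit for the overlap, which is why the
per-RBL numbers say more about ordering than about list quality.
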
>
> Lastly, note that only 4 spams to dirac.org actually made it into my inbox
> within a 24-hour period.  I list their "spamicity", determined by Bogofilter.
> You might wonder how "viagra" or, more accurately, "v1agra" makes its way
> past Bogofilter.  I've been receiving spam that contains poetry from such
> giants as John Keats and even the lyrics to "Stairway to Heaven".  I am
> loath to pass those spams on to Bogofilter.
>
> Enough yapping.  I have a lot of important work to get finished.  On to
> the interesting stuff...
>
>
>
>
>
> Raw Data
> ========
>
> I) SMTP Conversation Dropped Before Spam Gets Delivered
>
> 	A) HELO rejected
>
> 		1. Sender claimed he was "dirac.org" or "localhost":        51
> 		2. RBL: bl.spamcop.net:                                    179
> 		3. RBL: list.dsbl.org:                                      20
> 		4. RBL: relays.ordb.org:                                     0
> 		5. RBL: cbl.abuseat.org:                                     7
> 		6. RBL: sbl.spamhaus.org:                                    0
> 		7. RBL: opm.blitzed.org:                                     0
> 		8. RBL: dul.dnsbl.sorbs.net:                                 3
>
> 	B) MAIL FROM rejected
>
> 		1. Sender did not use fully qualified hostname:             65
> 		2. Sender did not use fully qualified address:               1
> 		3. Sender domain does not exist:                             7
>
> 	C) RCPT TO rejected
>
> 		1. Sender attempted to have spam relayed:                    1
> 		2. Attempt to deliver to unknown dirac.org account:          4
>
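
For anyone wondering what an RBL check actually looks up, here is a rough
Python sketch of the usual DNSBL convention: reverse the IPv4 octets, append
the list's zone, and see whether the name resolves.  The function names are
mine, not Postfix's, and is_listed() is only illustrative since several of
the zones above may no longer answer queries.

```python
import ipaddress
import socket

def dnsbl_query_name(ip, zone):
    """Build the DNS name queried for an IPv4 address in a DNSBL zone."""
    octets = str(ipaddress.IPv4Address(ip)).split(".")
    return ".".join(reversed(octets)) + "." + zone

def is_listed(ip, zone):
    """Return True if the DNSBL zone has an A record for this address.

    Needs network access; a failed lookup is treated as "not listed".
    """
    try:
        socket.gethostbyname(dnsbl_query_name(ip, zone))
        return True
    except socket.gaierror:
        return False

# 127.0.0.2 is the conventional DNSBL test address.
print(dnsbl_query_name("127.0.0.2", "bl.spamcop.net"))
```

Because the lookup only needs the connecting IP, the MTA can reject before
any message data is transferred.
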
>
> II) SMTP Conversation Completed, But MTA Discards Spam Before
> 	 Delivery to MUA.
>
> 	A) Body rule /^TVqQAAMAAAAEAAAA\/\/8AALg/, which matches the
> 		base64-encoded start of a win32 executable.  Nobody should
> 		be sending me win32 executables, so this must be a virus:    9
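
The pattern in that body rule is not arbitrary.  As a quick check in Python,
assuming the typical 18 leading bytes of a DOS "MZ" executable header (the
bytes below are my assumption of the common layout; the post does not show
them), base64-encoding them gives exactly the string the rule looks for at
the start of a line:

```python
import base64
import re

# Assumed first 18 bytes of a typical "MZ" header; not taken from the post.
MZ_HEADER = bytes.fromhex("4d5a90000300000004000000ffff0000b800")

# Same pattern as the body rule, written as a Python regex.
BODY_RULE = re.compile(r"^TVqQAAMAAAAEAAAA\/\/8AALg")

encoded = base64.b64encode(MZ_HEADER).decode("ascii")
print(encoded)                         # TVqQAAMAAAAEAAAA//8AALgA
print(bool(BODY_RULE.match(encoded)))  # True
```

An executable with an unusual header, or one wrapped in an archive, would
not match, so this catches the common case rather than every binary.
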
>
>
> III) Spam Delivered to MUA But Not Delivered To My Inbox
>
> 	A) Spam caught by Bogofilter:                                   7
> 	B) Spam caught by procmail rule
> 		* charset=.*(koi8|windows-125[01345678]|big-?5)              1
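
Here is a rough Python equivalent of that procmail condition, to show which
charsets it does and does not trip on.  The sample header lines are invented,
and looks_foreign() is a hypothetical name.  procmail matches
case-insensitively by default, so the sketch uses re.IGNORECASE.

```python
import re

# Same expression as the procmail condition, case-insensitive like procmail.
CHARSET_RULE = re.compile(
    r"charset=.*(koi8|windows-125[01345678]|big-?5)", re.IGNORECASE
)

def looks_foreign(header_line):
    """Return True if the header line names one of the targeted charsets."""
    return CHARSET_RULE.search(header_line) is not None

samples = [
    'Content-Type: text/plain; charset="koi8-r"',
    "Content-Type: text/html; charset=windows-1251",
    "Content-Type: text/plain; charset=Big5",
    "Content-Type: text/plain; charset=windows-1252",
    "Content-Type: text/plain; charset=us-ascii",
]
for line in samples:
    print(looks_foreign(line), line)
```

Note that the character class skips the digit 2, so windows-1252 (the common
Western European charset) is deliberately let through.
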
>
>
> IV) Non-UCE Delivered To My Inbox
>
> 	A) Real Email (slow email day!):                               19
> 	B) Bounces because of a virus forging its "From:" header
> 		to say it came from p@dirac.org:                             5
>
> V) UCE Delivered To My Inbox
>
> 	A) Spam delivered directly to dirac.org                         4
>
> 		1. spamicity: 0.519249, unknown language
> 		2. spamicity: 0.501567, unknown language
> 		3. spamicity: 0.919377, viagra
> 		4. spamicity: 0.500561, VCD's, unknown language
>
> 	B) UCE delivered from psalzman@lifshitz.ucdavis.edu             3
>
>
>
>
> Results
> =======
>
> The spam totals below include bounce messages due to viruses forging their
> headers to make it look like they're from dirac.org, as well as the uhhh....
> "helpful" messages I get from hosts that tell me that "my" email was not
> delivered because it contained a virus.  I consider the idiotic
> administrators of these systems to be another source of unwanted email, and
> therefore, not much different from UCE.  Honestly, this is a DoS waiting to
> happen.  Sheesh.
>
>
> Total emails sent to dirac.org:               386
>
> 	Total spams sent to dirac.org:             367
>
> 		Total spams caught                      355
>
> 			Total spam caught by Postfix:        347
> 				Total spam caught by RBL:         209
> 			Total spam caught by Bogofilter:       7
> 			Total spam caught by procmail:         1
>
> 		Total spams uncaught                     12
>
> 	Total "real" email delivered:               19
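
The totals can be re-derived from the Raw Data section.  This Python
sketch just re-adds the counts listed there; the variable names are mine,
and the grouping (for example, counting the 5 forged bounces as uncaught
spam) follows the note at the top of the Results section.

```python
# Counts copied from the Raw Data section, in the order listed there.
helo = [51, 179, 20, 0, 7, 0, 0, 3]   # first entry is the forged HELO check
mail_from = [65, 1, 7]
rcpt_to = [1, 4]
body_rule = 9
bogofilter = 7
procmail = 1
real_email = 19
forged_bounces = 5
uce_direct = 4
uce_via_lifshitz = 3

rbl = sum(helo[1:])  # every HELO-stage entry except the first is an RBL
postfix = sum(helo) + sum(mail_from) + sum(rcpt_to) + body_rule
caught = postfix + bogofilter + procmail
uncaught = forged_bounces + uce_direct + uce_via_lifshitz
spam = caught + uncaught
total = spam + real_email

print(rbl, postfix, caught, uncaught, spam, total)
```
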
>
>
>
>
> Email that is spam:                     95%
> Email that is not spam:                  5%
>
> Spam caught by the MTA (Postfix):      95%
> Spam caught before delivery to inbox:  97%
> Spam delivered to my inbox:              3%    <-- what I care about
>
> Spam caught by RBLs:                    57%    <-- nice!
> Spam claiming it came from "me":        15%
> Spam with improper SMTP envelope:       18%
> Spam giving nonexistent domain
> 	in SMTP envelope:                     2%    <-- dumbest of the dumb
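
Most of the percentages can be reproduced from the totals with plain
rounding, as in this Python sketch.  The labels are shorthand of mine.  The
15% figure is the one I had to guess at: 51 forged HELOs alone give 14%, so
the sketch assumes the 5 forged bounces were counted too.

```python
def pct(part, whole):
    """Percentage rounded to the nearest whole number."""
    return round(100 * part / whole)

total, spam = 386, 367

results = {
    "email that is spam": pct(spam, total),
    "email that is not spam": pct(total - spam, total),
    "caught by Postfix": pct(347, spam),
    "caught before inbox": pct(355, spam),
    "delivered to inbox": pct(12, spam),
    "caught by RBLs": pct(209, spam),
    # Assumption: 51 forged HELOs plus the 5 forged bounces.
    "claiming to be me": pct(51 + 5, spam),
    # 65 non-FQDN hostnames plus 1 non-FQDN address.
    "improper envelope": pct(65 + 1, spam),
    "nonexistent domain": pct(7, spam),
}

for label, value in results.items():
    print(f"{label:24s} {value:3d}%")
```
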
>
>
>
> Conclusions
> ===========
>
> First, I knew that I had a high spam to email ratio, but I was shocked
> to see that my spam to ham ratio was roughly 20 to 1.
>
> Second, I'm quite pleased with the results.  Postfix along with RBLs
> shot down most of the crud.  Only a very small trickle passed through.
> I'm more convinced than ever that Postfix + RBL is the way to go for
> spam control.  This is preferable to relying on SpamAssassin,
> Bogofilter and procmail as a first line of defense, since they eat up
> more system resources.
>
> As a last note, I'm nearly certain that if I had SpamAssassin installed on
> dirac.org, my total spam delivered count would've been truly, truly zero.
>
>
>
> Thanks
> ======
>
> First, thanks to the authors of all the open source software that enables me
> to protect my inbox and valuable time.  You guys rock.  No, seriously, you
> guys are really awesome.  Thank you.
>
> I'd like to thank Rod Roark for getting me to use Postfix in the first
> place.  Henry House introduced me to Bogofilter.  Mike Egan and Henry
> House introduced me to Procmail, oh so long ago.

-- 
Mark K. Kim
AIM: markus kimius
Homepage: http://www.cbreak.org/
Xanga: http://www.xanga.com/vindaci
Friendster: http://www.friendster.com/user.jsp?id=13046
PGP key fingerprint: 7324 BACA 53AD E504 A76E  5167 6822 94F0 F298 5DCE
PGP key available on the homepage