[vox-tech] The Great Spam Investigation
Mark K. Kim
vox-tech@lists.lugod.org
Sun, 25 Apr 2004 21:13:17 -0700 (PDT)
Do you know how many of the messages filtered out as spam are false
positives? I sometimes get curious about that.
BTW, if you reject all e-mails sent from you to you, do you not get copies
of e-mails you send to LUGOD mailing lists? Did you count those e-mails
in your spam stats?
Bill, are you receiving this e-mail?! Whitelist me! =)
-Mark
On Sun, 25 Apr 2004, Peter Jay Salzman wrote:
> Introduction
> ============
>
> For years, I've used Exim 3, just because that's what got installed when I
> first installed Debian, back in the "slink and a half"/Potato days.
>
> A few days ago, I installed Postfix 2.0, mostly because every time Rod Roark
> posts an email to vox-tech about Postfix, I get jealous. Initially, I was
> going to install Exim 4, but Exim 4's configuration looked very obfuscated.
> I might have stayed with Exim had there been an easy way to convert my Exim 3
> configuration to Exim 4 that didn't require reading. But alas, that looked
> complicated too. So I converted to Postfix.
>
> It has been a while since I really looked at all the spam coming into
> dirac.org, so to celebrate installing Postfix, I did a 24 hour test. This is
> the result of that test.
>
> A bit about dirac.org. It's a domain connected to the internet via DSL.
> There are only two users: me and my wife. Being the founder and past
> president of the Linux Users' Group of Davis, and having a strong presence on
> the web and USENET, I get a lot of spam. I think it's downright cute when
> she complains about the 2 or 3 pieces she gets a day... :)
>
> Mail at dirac.org comes from two paths:
>
> 1. Directly to dirac.org
> 2. From my school account at lifshitz.ucdavis.edu
>
> Mail at lifshitz passes through SpamAssassin. If it gets marked as spam, it
> gets deleted outright. If not, it's forwarded to dirac.org. Therefore, in
> the following statistics, spam caught on lifshitz is not included. My real
> email to spam ratio is actually lower than advertised. Keep that in mind.
>
> Mail at dirac.org passes through my new Postfix spam controls. Any email
> that remains gets delivered to procmail, which first sends it to Bogofilter, a
> spam filter based on Bayesian statistics. Then it goes through a few
> procmail recipes that I wrote. Then it gets delivered to my inbox.
>
> In what follows, keep in mind that I list the tests in order. For
> example, the RBL bl.spamcop.net gets "first crack" at incoming mail, so
> it's bound to catch more spam than any other RBLs. No doubt if
> cbl.abuseat.org got "first crack", that RBL would have the highest spam
> catching rate. Keep that in mind when looking at the numbers. Procmail
> and Bogofilter are both powerful; they just get the "leftovers" after
> Postfix does its stuff.
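
A rough sketch of that "first crack" effect, in Python. The sample messages and the helper name first_match_counts are made up for illustration; only the two list names come from the post. Each message is credited to the first check that flags it, so swapping the order swaps the counts.

```python
def first_match_counts(messages, checks):
    """Credit each message to the first check (in order) that flags it."""
    counts = {name: 0 for name in checks}
    counts["passed"] = 0
    for listed_on in messages:
        # Pick the first list, in configured order, that knows this sender.
        hit = next((name for name in checks if name in listed_on), "passed")
        counts[hit] += 1
    return counts


# Hypothetical sample: each set names the lists that would flag that sender.
sample = [
    {"bl.spamcop.net", "cbl.abuseat.org"},
    {"bl.spamcop.net", "cbl.abuseat.org"},
    {"bl.spamcop.net"},
    {"cbl.abuseat.org"},
    set(),
]

spamcop_first = first_match_counts(sample, ["bl.spamcop.net", "cbl.abuseat.org"])
cbl_first = first_match_counts(sample, ["cbl.abuseat.org", "bl.spamcop.net"])
print(spamcop_first)
print(cbl_first)
```

The same five messages are blocked either way; only the attribution changes, which is why the per-list numbers say more about ordering than about list quality.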
>
> Lastly, note that only 4 spams to dirac.org actually made it into my inbox
> within a 24 hour period. I list their "spamicity", determined by Bogofilter.
> You might wonder how "viagra" or, more accurately, "v1agra" makes its way
> past Bogofilter. I've been receiving spam that contains poetry from such
> giants as John Keats and even the lyrics to "Stairway to Heaven". I am
> loath to pass those spams on to Bogofilter.
>
> Enough yapping. I have a lot of important work to get finished. On to
> the interesting stuff...
>
>
>
>
>
> Raw Data
> ========
>
> I) SMTP Conversation Dropped Before Spam Gets Delivered
>
> A) HELO rejected
>
> 1. Sender claimed he was "dirac.org" or "localhost": 51
> 2. RBL: bl.spamcop.net: 179
> 3. RBL: list.dsbl.org: 20
> 4. RBL: relays.ordb.org: 0
> 5. RBL: cbl.abuseat.org: 7
> 6. RBL: sbl.spamhaus.org: 0
> 7. RBL: opm.blitzed.org: 0
> 8. RBL: dul.dnsbl.sorbs.net: 3
>
> B) MAIL FROM rejected
>
> 1. Sender did not use fully qualified hostname: 65
> 2. Sender did not use fully qualified address: 1
> 3. Sender domain does not exist: 7
>
> C) RCPT TO rejected
>
> 1. Sender attempted to have spam relayed: 1
> 2. Attempt to deliver to unknown dirac.org account: 4
>
>
> II) SMTP Conversation Completed, But MTA Discards Spam Before
> Delivery to MUA.
>
> A) Body rule /^TVqQAAMAAAAEAAAA\/\/8AALg/, which matches the
> base64-encoded header at the start of a typical win32 program. Nobody
> should be sending me win32 executables, so this must be a virus: 9
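
For the curious, here is a small Python check of where that pattern comes from. It assumes the usual 18-byte start of the DOS "MZ" stub header that many Windows linkers emit (the byte values below are my assumption, not something from the post), base64-encodes it the way a mail attachment would be, and tests it against the rule. The slashes need no escaping in Python.

```python
import base64
import re

# Assumption: first 18 bytes of the common DOS "MZ" stub header.
MZ_HEADER = bytes.fromhex("4d5a90000300000004000000ffff0000b800")

# Same pattern as the body rule, without the escaped slashes.
BODY_RULE = re.compile(r"^TVqQAAMAAAAEAAAA//8AALg")

encoded = base64.b64encode(MZ_HEADER).decode("ascii")
print(encoded)
print(bool(BODY_RULE.match(encoded)))

# An ordinary text attachment encodes to something quite different.
harmless = base64.b64encode(b"Hello, vox-tech!").decode("ascii")
print(bool(BODY_RULE.match(harmless)))
```

Since the rule is anchored at the start of a line, it fires on the first line of a base64-encoded executable attachment and ignores the rest of the body.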
>
>
> III) Spam Delivered to MUA But Not Delivered To My Inbox
>
> A) Spam caught by Bogofilter: 7
> B) Spam caught by procmail rule
> * charset=.*(koi8|windows-125[01345678]|big-?5): 1
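
A quick way to see what that charset condition does and does not catch, sketched in Python. I'm assuming the recipe is matched case-insensitively against the headers (procmail's default behaviour); the helper name looks_foreign and the sample header lines are hypothetical.

```python
import re

# Same expression as the procmail condition, compiled case-insensitively.
CHARSET_RULE = re.compile(
    r"charset=.*(koi8|windows-125[01345678]|big-?5)", re.IGNORECASE
)


def looks_foreign(header_line):
    """Return True if the header line names one of the targeted charsets."""
    return CHARSET_RULE.search(header_line) is not None


samples = [
    'Content-Type: text/plain; charset="koi8-r"',
    "Content-Type: text/html; charset=windows-1251",
    "Content-Type: text/plain; charset=Big5",
    "Content-Type: text/plain; charset=windows-1252",
    "Content-Type: text/plain; charset=iso-8859-1",
]
for line in samples:
    print(looks_foreign(line), line)
```

Note that the character class skips 2, so windows-1252 (the common Western European code page) is deliberately let through.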
>
>
> IV) Non-UCE Delivered To My Inbox
>
> A) Real Email (slow email day!): 19
> B) Bounces because of a virus forging its "From:" header
> to say it came from p@dirac.org: 5
>
> V) UCE Delivered To My Inbox
>
> A) Spam delivered directly to dirac.org: 4
>
> 1. spamicity: 0.519249, unknown language
> 2. spamicity: 0.501567, unknown language
> 3. spamicity: 0.919377, viagra
> 4. spamicity: 0.500561, VCD's, unknown language
>
> B) UCE delivered from psalzman@lifshitz.ucdavis.edu: 3
>
>
>
>
> Results
> =======
>
> Spams will include bounce messages due to viruses forging their headers to
> make it look like they're from dirac.org, as well as the uhhh.... "helpful"
> messages I get from hosts that tell me that "my" email was not delivered
> because it contained a virus. I consider the idiotic administrators of these
> systems to be another source of unwanted email, and therefore, not much
> different from UCE. Honestly, this is a DoS waiting to happen. Sheesh.
>
>
> Total emails sent to dirac.org: 386
>
> Total spams sent to dirac.org: 367
>
> Total spams caught: 355
>
> Total spam caught by Postfix: 347
> Total spam caught by RBL: 209
> Total spam caught by Bogofilter: 7
> Total spam caught by procmail: 1
>
> Total spams uncaught: 12
>
> Total "real" email delivered: 19
>
>
>
>
> Email that is spam: 95%
> Email that is not spam: 5%
>
> Spam caught before delivery to MUA: 95%
> Spam caught before delivery to inbox: 97%
> Spam delivered to my inbox: 3% <-- what I care about
>
> Spam caught by RBLs: 57% <-- nice!
> Spam claiming it came from "me": 15%
> Spam with improper SMTP envelope: 18%
> Spam giving nonexistent domain
> in SMTP envelope: 2% <-- dumbest of the dumb
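
The totals and most of the percentages can be re-derived from the raw counts. The Python below is only a cross-check: the variable names are mine, the split of the 12 uncaught spams into 4 + 3 + 5 is my reading of sections IV and V, and treating "improper SMTP envelope" as the two non-fully-qualified MAIL FROM rows is likewise an assumption.

```python
def pct(part, whole):
    """Percentage rounded to the nearest whole number."""
    return round(100 * part / whole)


# Counts copied from the Raw Data section.
rbl = sum([179, 20, 0, 7, 0, 0, 3])
helo_forged = 51
non_fqdn = 65 + 1            # hostname and address not fully qualified
bad_domain = 7
rcpt_to = 1 + 4
body_rule = 9
bogofilter = 7
procmail = 1
uncaught = 4 + 3 + 5         # assumed split: direct UCE, via lifshitz, bounces
real_mail = 19

postfix = helo_forged + rbl + non_fqdn + bad_domain + rcpt_to + body_rule
caught = postfix + bogofilter + procmail
total_spam = caught + uncaught
total_mail = total_spam + real_mail

print("postfix", postfix, "caught", caught)
print("total spam", total_spam, "total mail", total_mail)
print("spam share", pct(total_spam, total_mail))
print("caught by Postfix", pct(postfix, total_spam))
print("caught before inbox", pct(caught, total_spam))
print("reached inbox", pct(uncaught, total_spam))
print("caught by RBLs", pct(rbl, total_spam))
print("improper envelope", pct(non_fqdn, total_spam))
print("nonexistent domain", pct(bad_domain, total_spam))
```

Under those assumptions the sums come out to the 347, 355, 367 and 386 quoted in the Results section.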
>
>
>
> Conclusions
> ===========
>
> First, I knew that I had a high spam to email ratio, but I was shocked
> to see that my spam to ham ratio was about 20 to 1 (367 spams to 19
> real emails).
>
> Second, I'm quite pleased with the results. Postfix along with RBLs
> shot down most of the crud. Only a very small trickle passed through.
> I'm convinced more than ever that Postfix + RBL is the way to go for
> spam control. This is preferable to relying on SpamAssassin,
> Bogofilter and procmail as a first line of defense, since they use
> more system resources.
>
> As a last note, I'm nearly certain that if I had SpamAssassin installed on
> dirac.org, my total spam delivered count would've been truly, truly zero.
>
>
>
> Thanks
> ======
>
> First, thanks to the authors of all the open source software that enables me
> to protect my inbox and valuable time. You guys rock. No, seriously, you
> guys are really awesome. Thank you.
>
> I'd like to thank Rod Roark for getting me to use Postfix in the first
> place. Henry House introduced me to Bogofilter. Mike Egan and Henry
> House introduced me to Procmail, oh so long ago.
> _______________________________________________
> vox-tech mailing list
> vox-tech@lists.lugod.org
> http://lists.lugod.org/mailman/listinfo/vox-tech
>
--
Mark K. Kim
AIM: markus kimius
Homepage: http://www.cbreak.org/
Xanga: http://www.xanga.com/vindaci
Friendster: http://www.friendster.com/user.jsp?id=13046
PGP key fingerprint: 7324 BACA 53AD E504 A76E 5167 6822 94F0 F298 5DCE
PGP key available on the homepage