[vox-tech] Greylisting and LUGOD

Karsten M. Self kmself at ix.netcom.com
Sun Sep 19 19:37:39 PDT 2004


on Wed, Sep 15, 2004 at 09:38:56AM -0700, Rod Roark (rod at sunsetsystems.com) wrote:
> On Tuesday 14 September 2004 11:38 pm, Karsten M. Self wrote:
> > on Mon, Sep 13, 2004 at 09:39:15AM -0700, Rod Roark (rod at sunsetsystems.com) wrote:
> 
> [...]
> > >     Mostly this is of interest to the
> > >     officers, as the mailing lists already require
> > >     registration in order to post; however spammers might
> > >     easily forge the FROM header to abuse this.
> > 
> > Note that the greylisting is based on a tuple of which at least one
> > element (immediate upstream IP) is difficult or impossible to reliably
> > forge.
> 
> Not sure if we are on the same page here.  I was referring to the fact
> that (not considering spam filtering) it's trivial to post to one of
> the mailing lists by forging the "from:" header.

Sure.

Which is why a split content/context filter's more reliable.

I've received a number of spams (or virms) in the past couple of months
from known, whitelisted addresses.  I'm pretty sure, say, that Don Marti
hasn't taken up spamming and isn't running an MS OS.
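The greylisting tuple discussed above can be sketched as a small in-memory table. This is a minimal sketch that assumes the conventional triplet: the connecting client IP, the envelope sender and the envelope recipient. It also assumes a configurable minimum delay, such as Rod's 15 seconds. The class and method names are hypothetical. A real implementation would persist the table and expire stale entries.

```python
import time
from typing import Callable, Dict, Tuple

# Hypothetical key: (connecting client IP, envelope sender, envelope recipient)
Triplet = Tuple[str, str, str]


class Greylist:
    """In-memory greylist sketch: no persistence, no expiry of old entries."""

    def __init__(self, delay_seconds: float, clock: Callable[[], float] = time.time):
        self.delay = delay_seconds
        self.clock = clock
        self.first_seen: Dict[Triplet, float] = {}

    def check(self, client_ip: str, sender: str, recipient: str) -> str:
        """Return the SMTP reply for this delivery attempt (at RCPT time)."""
        # Addresses are lower-cased so case variations map to one entry.
        key = (client_ip, sender.lower(), recipient.lower())
        now = self.clock()
        first = self.first_seen.setdefault(key, now)  # records the first attempt
        if now - first < self.delay:
            # Temporary failure: a compliant MTA keeps the message and retries.
            return "451 4.7.1 Greylisted, please try again later"
        return "250 OK"
```

The forged From: header never enters the key. The client IP is whatever actually connected, and that is the hard-to-forge element.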
 
> > > (2) Mail from first-time posters, or from those who post
> > >     less frequently than once per month, would likely be
> > >     delayed by an hour or so.
> > 
> > Possibly.
> 
> Currently I'm experimenting with a 15-second period for
> greylisting.  So far it appears that most MTA clients are
> set to retry after either 1 minute or 1 hour.  The really
> busy ones are quite unpredictable; worst case I've seen is
> about 3 hours.

Check your retry interval in your MTA.  Exim, if typical, uses the
following:

    # This single retry rule applies to all domains and all errors. It
    # specifies retries every 15 minutes for 2 hours, then increasing
    # retry intervals, starting at 2 hours and increasing each time by a
    # factor of 1.5, up to 16 hours, then retries every 8 hours until 4
    # days have passed since the first failed delivery.

...so 15 _minutes_ might be a better value.  I haven't empirically
tested this, however.
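To see why the sender's retry schedule matters more than the exact greylist window, here is a simplified model of the rule quoted above. The model has three phases:

- retries every 15 minutes up to the 2-hour mark;
- geometric intervals starting at 2 hours (x1.5) until 16 hours after the first failure;
- retries every 8 hours up to the 4-day mark.

The model assumes that each phase's cutoff is measured from the first failure and that the last retry of a phase may overshoot its cutoff. Real Exim scheduling has more nuance, such as queue-runner timing and per-host retry records, so treat the output as an approximation. The function names are hypothetical.

```python
from typing import List, Tuple

# (cutoff minutes since first failure, first interval minutes, growth factor)
# Values taken from the retry comment quoted above; factor 1.0 means fixed.
RULES: List[Tuple[float, float, float]] = [
    (2 * 60, 15, 1.0),        # every 15 minutes for 2 hours
    (16 * 60, 120, 1.5),      # from 2 hours, x1.5 each time, until 16 hours
    (4 * 24 * 60, 480, 1.0),  # every 8 hours until 4 days
]


def retry_times(rules=RULES, limit=4 * 24 * 60) -> List[float]:
    """Cumulative retry times in minutes after the first failed attempt."""
    times: List[float] = []
    t = 0.0
    for cutoff, interval, factor in rules:
        while t < cutoff:
            t += interval
            if t > limit:
                return times
            times.append(t)
            interval *= factor
    return times


def greylist_delay(window_minutes: float, rules=RULES) -> float:
    """Delay until the first retry that lands outside the greylist window."""
    for t in retry_times(rules):
        if t >= window_minutes:
            return t
    return float("inf")
```

Under this model, `greylist_delay(0.25)` (15 seconds) and `greylist_delay(15)` both come out at 15 minutes. An Exim-like sender therefore pays the same delay either way, and a 16-minute window would cost 30.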
 
> [insightful but long analysis of aggregation snipped]
> 
> > Which suggests a very cheap mode of cutting into spam volumes markedly
> > by employing ASNs, CIDRs, or similar IP aggregates (though I'm aware of
> > none) in generating reputation data, and effecting firewalling,
> > probabilistic rejection (you reject traffic from an ASN directly
> > proportional to the probability it's spam), rate-limiting, etc.
> > Backing off from a black-and-white allow/deny mode gives legit mail a
> > fighting chance....
> 
> So this "probability" would necessarily only be part of a
> SpamAssassin-style weighting system.  Most of us hate to lose any
> legitimate mail at all, so rejecting all mail from some IP block
> solely because, say, 75% of that block's mail is spam, would be quite
> unacceptable.
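The probabilistic rejection idea quoted above might look like the sketch below. The per-block spam fractions are made-up placeholders that use documentation address ranges, not real reputation data. The rejection probability is assumed to equal the block's spam fraction, which is a proportionality constant of 1. The function names and the reply text are hypothetical.

```python
import ipaddress
import random
from typing import Optional

# Hypothetical reputation data (documentation ranges): block -> spam fraction.
SPAM_FRACTION = {
    "192.0.2.0/24": 0.98,
    "198.51.100.0/24": 0.40,
}
NETWORKS = [(ipaddress.ip_network(cidr), p) for cidr, p in SPAM_FRACTION.items()]


def spam_fraction(client_ip: str) -> Optional[float]:
    """Spam fraction of the most specific listed block containing client_ip."""
    addr = ipaddress.ip_address(client_ip)
    matches = [(net.prefixlen, p) for net, p in NETWORKS if addr in net]
    return max(matches)[1] if matches else None


def decide(client_ip: str, rng: random.Random) -> str:
    """Reject with probability equal to the block's spam fraction."""
    p = spam_fraction(client_ip)
    if p is not None and rng.random() < p:
        return "550 5.7.1 Network has a poor reputation"
    return "250 OK"
```

The 550 keeps the failure explicit. Returning a 4xx code instead would let a retrying legitimate sender through eventually, because each attempt independently succeeds with probability 1 - p. The cost is added delay.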

It's data.  How you use it is up to you.
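Rod's alternative is to use the block-level spam fraction as one weighted input rather than as a verdict. That could look like the following sketch. The weight and the linear mapping from fraction to points are hypothetical choices for illustration. The 5.0 threshold is SpamAssassin's default required score. This is not how SpamAssassin itself computes scores.

```python
from typing import Dict, Optional

REQUIRED_SCORE = 5.0     # SpamAssassin's default threshold
REPUTATION_WEIGHT = 4.0  # hypothetical: most points a block's reputation can add


def reputation_points(spam_fraction: Optional[float]) -> float:
    """Map a block's spam fraction (0..1) to points; unknown blocks score 0.

    Hypothetical linear mapping: below 0.5 subtracts points, above 0.5 adds.
    """
    if spam_fraction is None:
        return 0.0
    return REPUTATION_WEIGHT * (2 * spam_fraction - 1)


def classify(rule_hits: Dict[str, float], spam_fraction: Optional[float]) -> str:
    """Sum content-rule points and reputation points, then apply the threshold."""
    score = sum(rule_hits.values()) + reputation_points(spam_fraction)
    return "spam" if score >= REQUIRED_SCORE else "ham"
```

With these numbers, a clean message from a 98%-spam block scores 3.84 and still gets through. The same block pushes a message with a modest 2-point content score over the threshold.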

Point being that for, say, Kornet, the Bayes probability associated with
it was IIRC ~98%+ (and most of the non-spam was likely admin bounce
messages from attempts to deliver to abuse/reporting addresses).  For
_many_ of the high-spam originating ASNs / CIDRs, you'll find similar
stats, and if I understand SA's Bayes database correctly, the
data should be available to you.  I'm having a little trouble with this
at present, but 'sa-learn --dump <option>' should give you the current
token set.
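Block-level figures like the Kornet one can be generated from mail you've already classified. This sketch does not read SpamAssassin's Bayes database. It assumes you have (connecting IP, verdict) pairs from your own logs. It groups IPv4 addresses by /16, which is an arbitrary stand-in for real ASN or CIDR allocation data, and it skips blocks with too few messages to judge. The function name and thresholds are hypothetical.

```python
import ipaddress
from collections import defaultdict
from typing import Dict, Iterable, Tuple


def block_spam_fractions(
    observations: Iterable[Tuple[str, bool]], prefix: int = 16, min_count: int = 20
) -> Dict[str, float]:
    """Return {CIDR: spam fraction} for blocks with at least min_count messages.

    observations: (client IPv4 address, True if the message was spam).
    """
    counts = defaultdict(lambda: [0, 0])  # block -> [spam, total]
    for ip, is_spam in observations:
        block = ipaddress.ip_network(f"{ip}/{prefix}", strict=False)
        counts[block][0] += int(is_spam)
        counts[block][1] += 1
    return {
        str(block): spam / total
        for block, (spam, total) in counts.items()
        if total >= min_count  # too little data to judge otherwise
    }
```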

Other alternatives are to block _all_ mail from some points of origin,
as I'd recommend doing for the top spam sources.  They simply are so
badly managed, or so overtly and intentionally promoting spammers, that
they have no business serving legitimate traffic.  The Internet Death
Penalty has been applied in the past; it's a harsh, blunt tool.  It's
also highly effective.

Or you could use an in-between option.  Explicitly whitelist known good
point sources, throttle or rate-limit other known addresses.
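These tiers can be combined into one policy check: block the worst sources outright, whitelist known-good ones, and rate-limit everything else. A minimal sketch follows. The CIDR lists are hypothetical placeholders. Each IP gets a token bucket of 10 messages that refills at one message per minute, which is an arbitrary choice. The 450 asks well-behaved senders to retry later, while the 550 is a hard rejection.

```python
import ipaddress
import time
from typing import Callable, Dict

# Hypothetical lists (documentation ranges); substitute your own data.
BLOCKED = [ipaddress.ip_network("192.0.2.0/24")]         # worst spam sources
WHITELISTED = [ipaddress.ip_network("198.51.100.0/24")]  # known-good senders


class TokenBucket:
    """Allows bursts of `capacity` messages, refilled at `rate` tokens/second."""

    def __init__(self, capacity: float, rate: float, now: float):
        self.capacity, self.rate = capacity, rate
        self.tokens, self.last = float(capacity), now

    def take(self, now: float) -> bool:
        # Refill for the elapsed time, never beyond capacity.
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False


class Policy:
    def __init__(self, capacity: float = 10, rate: float = 1 / 60,
                 clock: Callable[[], float] = time.time):
        self.capacity, self.rate, self.clock = capacity, rate, clock
        self.buckets: Dict[str, TokenBucket] = {}

    def check(self, client_ip: str) -> str:
        addr = ipaddress.ip_address(client_ip)
        if any(addr in net for net in BLOCKED):
            return "550 5.7.1 Mail from your network is refused"
        if any(addr in net for net in WHITELISTED):
            return "250 OK"
        now = self.clock()
        if client_ip not in self.buckets:
            self.buckets[client_ip] = TokenBucket(self.capacity, self.rate, now)
        if self.buckets[client_ip].take(now):
            return "250 OK"
        return "450 4.7.1 Too much mail from your address, try again later"
```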

While the "don't lose a single good email" mantra is popular, it's
unrealistic.  Example:  I've recently recovered from a basically
unmediated mail experience:  every single email received was being
dumped into a single folder (the result of systems issues and of not
having any mail filtering, let alone spam filtering, in place).

Over the course of some six weeks, over 28k mails piled up.  Think about
that.

On recovering my systems, I ran the 28k+ mails through procmail for
filtering, spam assessment, etc.  I run some intensive checks and
numerous remote lookups, resulting in a rather slow process chain.
It took over six days to complete (my processing limit appears to be
about 2-4k mails per day).
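The backlog arithmetic is easy to check. This is a back-of-envelope sketch. It assumes a strictly serial filter chain at the 10-20 seconds per message mentioned later in this thread. It ignores both parallelism and new mail arriving during the backlog run.

```python
SECONDS_PER_DAY = 24 * 60 * 60


def messages_per_day(seconds_per_message: float) -> float:
    """Serial throughput: messages one filter chain finishes per day."""
    return SECONDS_PER_DAY / seconds_per_message


def days_to_drain(backlog: int, seconds_per_message: float) -> float:
    """Days to work through a backlog, ignoring mail that keeps arriving."""
    return backlog / messages_per_day(seconds_per_message)


for secs in (10, 20):
    print(f"{secs:>2} s/msg: {messages_per_day(secs):,.0f}/day, "
          f"28k backlog in {days_to_drain(28_000, secs):.1f} days")
```

At 20 seconds per message, this gives 4,320 messages per day and about 6.5 days for 28k. That fits the six-plus days observed.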

I found myself responding to several messages sent during that interval,
including some from known addresses, many of them several weeks old.
The senders of these mails had no idea if the mail was lost, in
transit, ignored, or what.  This is what's known in the biz as "silent
failure mode".  A Very Bad Thing[tm].

Even where it's annoying as all hell, explicit IP (or CIDR or ASN)
rejection serves two useful functions:

  - It's immediate.

  - For well-constructed, standards-based mail clients, it results in a
    well-defined error message.  While the basis for rejection may not
    be appropriate, it's very clear that mail was, in fact, rejected.
    This allows the sender to, in a timely fashion, attempt some other
    means of contacting you.

A partially effective but explicit system is better than none at all,
or than one which is partially effective but has soft failure modes
(e.g. challenge-response).
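Explicit rejection is useful because SMTP reply codes have well-defined classes (RFC 5321):

- 2xx: the message was accepted.
- 4xx: transient failure. The sender keeps the message queued and retries, which is what greylisting relies on.
- 5xx: permanent failure. The sending MTA gives up and returns a bounce to the sender.

The following is a minimal sketch of how a sending MTA treats each class. The function name and return strings are hypothetical.

```python
def sender_action(reply: str) -> str:
    """What a standards-compliant sending MTA does with a final SMTP reply."""
    code = reply[:3]
    if len(code) != 3 or not code.isdigit():
        raise ValueError(f"not an SMTP reply: {reply!r}")
    if code[0] == "2":
        return "delivered"                # accepted; nothing more to do
    if code[0] == "4":
        return "requeue and retry later"  # transient: greylisting, rate limits
    if code[0] == "5":
        return "bounce to sender"         # permanent: sender is told promptly
    return "not a final reply"            # e.g. 354 is an intermediate reply
```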


 
> > Which all sounds well and good.
> > 
> > The question, though, is how much spam are you getting?
> 
> It varies a *lot* from day to day.  Stats for yesterday:
> 
>   917 incoming messages
>   706 of these blocked via DNSBLs and custom blacklists
>    45 blocked by the newly-implemented greylisting (never re-sent)
>    85 delayed by greylisting and later delivered
>    81 delivered without delay

 
> I have not inspected all of the delivered messages, as many
> of them are not mine to view.  But based on my own portion
> of these I estimate that about 5% are spam.  Without the
> greylisting it would have been about 21% (and without any
> filtering at all, 82%).

Sounds like 81.9% of messages blocked and 18.1% delivered, with about 5%
of the delivered mail still being spam (roughly 1% of all spam slipping
through).  Since you're already using DNSBLs pretty extensively, I suspect
we're largely in violent agreement here.
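Here is the arithmetic behind those percentages, using Rod's counts. It makes two assumptions. First, every blocked message counts as spam, so DNSBL false positives are ignored. Second, his 5% estimate applies to all 166 delivered messages, which extrapolates from his own mailbox.

```python
stats = {
    "dnsbl_or_blacklist": 706,
    "greylist_never_resent": 45,
    "greylist_delayed_then_delivered": 85,
    "delivered_without_delay": 81,
}
total = sum(stats.values())                      # 917
blocked = stats["dnsbl_or_blacklist"] + stats["greylist_never_resent"]  # 751
delivered = total - blocked                      # 166
residual_spam = 0.05 * delivered                 # ~8.3 messages, by estimate
all_spam = blocked + residual_spam               # assumes all blocked mail is spam

print(f"blocked:   {blocked / total:.1%}")
print(f"delivered: {delivered / total:.1%}")
print(f"spam before filtering: {all_spam / total:.1%}")
print(f"share of spam that got through: {residual_spam / all_spam:.1%}")
```

The script prints 81.9% blocked and 18.1% delivered. About 82.8% of the traffic is spam before filtering, and roughly 1.1% of that spam gets through.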

 
> [...]
> > On the other hand, content/context based filtering gets expensive both
> > CPU and time-wise, particularly if you're making extensive use of DNSBLs
> > (they're useful data sources, but they're time-intensive).  It takes me
> > 10-20 seconds to determine spam or ham on my own system, on a high-speed
> > line, via SpamAssassin.  I'm faster doing it manually, but I'm not going
> > to sit there hour after hour, day in and day out.  So the machine does it.
> 
> Actually I find that use of DNSBLs is very fast, on the order of a
> second or so per message.  This is probably helped greatly by the fact
> that I run DNS on the same machine as the mail server.

Interesting.  I've got caching DNS here, but get about 6-15 seconds per
message in SpamAssassin.  Could be that the large volume of mail you
reject up front would take much longer if it had to run through the
DNSBL checks in SA.
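A DNSBL check is a DNS A-record lookup on the IP's reversed octets under the list's zone, following the convention documented in RFC 5782. An answer in 127.0.0.0/8 means the address is listed, and NXDOMAIN means it is not. That is why a nearby caching resolver helps: repeat lookups never leave the machine. A minimal IPv4-only sketch follows. `dnsbl.example.org` is a placeholder zone, not a real list. The resolver is injectable so the logic can be exercised without network access.

```python
import socket
import time
from typing import Callable, Tuple


def dnsbl_query_name(ip: str, zone: str) -> str:
    """192.0.2.99 + zone -> 99.2.0.192.<zone> (IPv4 only)."""
    octets = ip.split(".")
    if len(octets) != 4:
        raise ValueError(f"expected an IPv4 address, got {ip!r}")
    return ".".join(reversed(octets)) + "." + zone


def dnsbl_listed(
    ip: str,
    zone: str = "dnsbl.example.org",  # placeholder, not a real list
    resolve: Callable[[str], str] = socket.gethostbyname,
) -> Tuple[bool, float]:
    """Return (listed?, seconds spent on the lookup)."""
    start = time.perf_counter()
    try:
        answer = resolve(dnsbl_query_name(ip, zone))
        listed = answer.startswith("127.")
    except socket.gaierror:  # NXDOMAIN or other lookup failure: not listed
        listed = False
    return listed, time.perf_counter() - start
```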


Peace.

-- 
Karsten M. Self <kmself at ix.netcom.com>        http://kmself.home.netcom.com/
 What Part of "Gestalt" don't you understand?
    gconf-editor:  reimplementation of the MS Windows Registry for
    GNU/Linux, with the concomitant problems of undocumented settings,
    cryptic keys, inability to comment settings, and use of a single,
    specialized application to access the configuration settings.