[vox-tech] Greylisting and LUGOD

Wed Sep 15 09:38:56 PDT 2004

On Tuesday 14 September 2004 11:38 pm, Karsten M. Self wrote:
> on Mon, Sep 13, 2004 at 09:39:15AM -0700, Rod Roark (rod at sunsetsystems.com) wrote:

[...]
> >     Mostly this is of interest to the
> >     officers, as the mailing lists already require
> >     registration in order to post; however spammers might
> >     easily forge the FROM header to abuse this.
> 
> Note that the greylisting is based on a tuple of which at least one
> element (immediate upstream IP) is difficult or impossible to reliably
> forge.

Not sure if we are on the same page here.  I was referring
to the fact that (not considering spam filtering) it's
trivial to post to one of the mailing lists by forging the
"From:" header.

> > (2) Mail from first-time posters, or from those who post
> >     less frequently than once per month, would likely be
> >     delayed by an hour or so.
> 
> Possibly.

Currently I'm experimenting with a 15-second period for
greylisting.  So far it appears that most MTA clients are
set to retry after either 1 minute or 1 hour.  The really
busy ones are quite unpredictable; worst case I've seen is
about 3 hours.
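
For readers unfamiliar with the mechanism, here is a minimal sketch of the
greylisting decision with a 15-second period. It is only an illustration, not
the setup running on this server: the triplet of (client IP, envelope sender,
envelope recipient) is the usual choice but is an assumption here, and the
function name, in-memory dict, and reply strings are hypothetical.

```python
import time

GREYLIST_DELAY = 15   # seconds; the period being experimented with above
first_seen = {}       # triplet -> time of the first delivery attempt


def check(client_ip, sender, recipient, now=None):
    """Return an SMTP-style reply for one delivery attempt."""
    now = time.time() if now is None else now
    # Assumed triplet; the client IP is the element that is hard to forge.
    triplet = (client_ip, sender.lower(), recipient.lower())
    # Record the first attempt, or fetch the time it was recorded.
    started = first_seen.setdefault(triplet, now)
    if now - started < GREYLIST_DELAY:
        return "450 try again later"   # temporary failure; a real MTA retries
    return "250 ok"


args = ("192.0.2.10", "someone@example.com", "list@example.org")
print(check(*args, now=0))    # first attempt is deferred
print(check(*args, now=60))   # a retry one minute later is accepted
```

A sender that never retries simply never gets past the 450 reply, which is
the whole trick. A real implementation would also expire old entries.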

> > (3) This *might* allow me to eliminate the current blocking
> >     of mail from dynamic IPs.
> 
> ...iff (sic) the IP isn't a candidate for blocking under other criteria.

Of course.

> > Comments?
> 
> Sure.

[insightful but long analysis of aggregation snipped]

> Which suggests a very cheap mode of cutting into spam volumes markedly
> by employing ASNs, CIDRs, or similar IP aggregates (though I'm aware of
> none) in generating reputation data, and effecting firewalling,
> probabilistic rejection (you reject traffic from an ASN directly
> proportional to the probability it's spam), rate-limiting, etc.
> Backing off from a black-and-white allow/deny mode gives legit mail a
> fighting chance....

So this "probability" would necessarily only be part of a
SpamAssassin-style weighting system.  Most of us hate to
lose any legitimate mail at all, so rejecting all mail from
some IP block solely because, say, 75% of that block's mail
is spam, would be quite unacceptable.
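
To make the weighting idea concrete, here is a rough sketch in which an IP
block's spam ratio contributes to a score instead of causing rejection by
itself. The weight, the threshold, and the function names are all made up
for illustration; this is not SpamAssassin's actual scoring.

```python
BLOCK_WEIGHT = 2.0   # hypothetical weight given to block reputation
THRESHOLD = 5.0      # hypothetical score at which mail is treated as spam


def spam_score(block_spam_ratio, other_scores):
    """Combine block reputation (0.0 to 1.0) with scores from other tests."""
    return BLOCK_WEIGHT * block_spam_ratio + sum(other_scores)


def is_spam(block_spam_ratio, other_scores):
    return spam_score(block_spam_ratio, other_scores) >= THRESHOLD


# A block that is 75% spam does not condemn a message on its own...
print(is_spam(0.75, []))
# ...but it can tip the balance when other tests also fire.
print(is_spam(0.75, [2.0, 2.0]))
```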

> Which all sounds well and good.
> 
> The question, though, is how much spam are you getting?

It varies a *lot* from day to day.  Stats for yesterday:

  917 incoming messages
  706 of these blocked via DNSBLs and custom blacklists
   45 blocked by the newly-implemented greylisting (never re-sent)
   85 delayed by greylisting and later delivered
   81 delivered without delay

I have not inspected all of the delivered messages, as many
of them are not mine to view.  But based on my own portion
of these I estimate that about 5% are spam.  Without the
greylisting it would have been about 21% (and without any
filtering at all, 82%).
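
The 21% and 82% figures can be reproduced from the counts above under one
simplifying assumption, which is not stated in the message itself: every
blocked message is counted as spam, and the estimated 5% of spam among
delivered messages is left out. A quick check of the arithmetic:

```python
incoming = 917
blocked_lists = 706   # DNSBLs and custom blacklists
blocked_grey = 45     # greylisted and never re-sent
delayed = 85          # greylisted, later delivered
undelayed = 81        # delivered without delay

delivered = delayed + undelayed
assert blocked_lists + blocked_grey + delivered == incoming

# Assumption: all blocked mail was spam; residual spam in delivered mail ignored.
without_grey = blocked_grey / (blocked_grey + delivered)    # 45 / 211
without_any = (blocked_lists + blocked_grey) / incoming     # 751 / 917

print(f"without greylisting: {without_grey:.0%}")   # 21%
print(f"without any filtering: {without_any:.0%}")  # 82%
```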

[...]
> On the other hand, content/context based filtering gets expensive both
> CPU and time-wise, particularly if you're making extensive use of DNSBLs
> (they're useful data sources, but they're time-intensive).  It takes me
> 10-20 seconds to determine spam or ham on my own system, on a high-speed
> line, via SpamAssassin.  I'm faster doing it manually, but I'm not going
> to sit in hour after hour, day in and day out.  So the machine does it.

Actually I find that use of DNSBLs is very fast, on the
order of a second or so per message.  This is probably
helped greatly by the fact that I run DNS on the same
machine as the mail server.
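
A DNSBL check is a single DNS query per list, which is why a local caching
resolver helps so much. The sketch below shows the usual query form (the
IPv4 octets reversed and prepended to the list's zone). The zone name
"dnsbl.example.org" is a placeholder, not a real list, and the function names
are hypothetical.

```python
import ipaddress
import socket


def dnsbl_name(ip, zone):
    """Build the query name: reversed IPv4 octets followed by the zone."""
    octets = str(ipaddress.IPv4Address(ip)).split(".")
    return ".".join(reversed(octets)) + "." + zone


def is_listed(ip, zone):
    """True if the zone returns an address record for this IP."""
    try:
        socket.gethostbyname(dnsbl_name(ip, zone))
        return True
    except socket.gaierror:
        # No such name (or the lookup failed): treated as not listed.
        return False


print(dnsbl_name("192.0.2.99", "dnsbl.example.org"))
# 99.2.0.192.dnsbl.example.org
```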

[...]
> You're going to
> need content filtering.

So far my above-mentioned results are without any content
filtering at all, other than some Postfix body checks to
catch common viruses and executable attachments.  Of course
that's just today; the future, as you note, will become
vastly more complex.

Content filtering is something I *really* want to avoid as
long as possible.

> Rod, does that answer your question ;-)

I forget... did I have a question?  ;-)

Thanks,

-- Rod