Re: [Dovecot] Spam filtering (was: Re: Sieve mails with decoded subject)

11 Dec 2009


On 12/10/2009 2:28 PM, Johannes Bauer wrote:
> ...
> Eduardo M KALINOWSKI wrote:
>> ...
>> On Thu, 10 Dec 2009, Johannes Bauer wrote:
>>> ...
>>> I'm thinking about filtering all such encoded subjects (as there's no
>>> reason to encode them as US-ASCII), but suppose they were UTF-8 or some
>>> other charset: how can I filter on the actual content, not the encoded
>>> subject? Surely someone has solved that problem already?
>> Yes: the people behind SpamAssassin, or dspam, or any of the many other
>> spam filtering programs out there. In fact, they make much more
>> sophisticated decisions than just looking for bad words in the Subject
>> field. I'd suggest you try installing one of them.
> I had SpamAssassin running once and was pretty disappointed. All those
> complicated rules and scoring and "smart" Bayesian filtering did not
> work very well, even though I trained it on around 50k mails, telling
> right from wrong. I had lots of both false positives and false
> negatives, which was rather annoying.
> However, after analyzing the 274 spam mails I deleted over the last 5
> months, I can conclude that that extremely simple filter list would have
> caught 258 of them (that's 94%). So I'd like to stick to KISS in this case.
From what I've seen, SA has been extremely good and accurate for us.
We use amavisd-new as the interface to it, but SA sits at the end of a
long chain of checks.
Between the three HELO checks, clamav-milter, and an SPF policy daemon,
we're killing ~60% of all connections at SMTP time.  (When I analyzed that
in November, I found that instead of 65/day hitting my inbox, I would have
seen 6x that amount if it weren't for those checks.  So ~80% of all spam
was being blocked at SMTP time.)  If we were to pay for the Spamhaus Zen
list, we could probably boost that percentage to 90%.
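As a rough check of the November figures, assuming "6x that amount" means about 390 spams/day would have reached the inbox without the SMTP-time checks, the blocked fraction works out as below. The helper name is made up for illustration.

```python
def smtp_block_rate(inbox_per_day: float, multiplier: float) -> float:
    """Fraction of spam stopped at SMTP time, given the daily spam that still
    reaches the inbox and how many times larger that volume would be
    without the SMTP-time checks (hypothetical helper)."""
    without_checks = inbox_per_day * multiplier  # 65 * 6 = 390/day
    blocked = without_checks - inbox_per_day     # 390 - 65 = 325/day
    return blocked / without_checks


if __name__ == "__main__":
    print(f"{smtp_block_rate(65, 6):.0%}")  # 83%
```

The result depends only on the multiplier (1 - 1/6, about 83%), which is consistent with the "~80%" estimate.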
All of the domains we do business with get a -2 or -4 score adjustment in
amavisd-new.  Specific addresses get a larger negative adjustment.  I
trained the SA Bayes filter on a few thousand spam and ham messages, then
turned it on.  We tag messages with a [spam] flag at a score of 5.0 and
quarantine them at 9.0.  Tagged messages go to the user's Inbox;
quarantined messages get sieved into a sub-folder of the user's mailbox.
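The tagging and quarantine policy above can be modeled as a small decision function. This is a hypothetical Python sketch, not amavisd-new or Sieve configuration: the 5.0 and 9.0 thresholds and the -2/-4 domain adjustments come from the post, while the domain names, the per-address value, the folder name, and the rule that an address entry replaces its domain's entry are all assumptions.

```python
from dataclasses import dataclass

TAG_LEVEL = 5.0         # from the post: tag with [spam] at 5.0
QUARANTINE_LEVEL = 9.0  # from the post: quarantine at 9.0

# Hypothetical whitelist adjustments: business domains get -2 or -4,
# specific addresses a larger negative value (all names are made up).
DOMAIN_ADJUST = {"partner.example": -2.0, "supplier.example": -4.0}
ADDRESS_ADJUST = {"boss@partner.example": -8.0}

QUARANTINE_FOLDER = "Quarantine"  # hypothetical sub-folder name


@dataclass
class Delivery:
    folder: str
    subject: str
    score: float


def adjusted_score(sa_score: float, sender: str) -> float:
    """Apply the sender adjustment; an address entry replaces its domain's
    entry (an assumption of this sketch)."""
    sender = sender.lower()
    if sender in ADDRESS_ADJUST:
        return sa_score + ADDRESS_ADJUST[sender]
    domain = sender.rpartition("@")[2]
    return sa_score + DOMAIN_ADJUST.get(domain, 0.0)


def route(sa_score: float, sender: str, subject: str) -> Delivery:
    """Tag at TAG_LEVEL; file into the quarantine folder at QUARANTINE_LEVEL.
    Quarantined mail also carries the tag here (assumed)."""
    score = adjusted_score(sa_score, sender)
    if score >= TAG_LEVEL:
        subject = "[spam] " + subject
    folder = QUARANTINE_FOLDER if score >= QUARANTINE_LEVEL else "INBOX"
    return Delivery(folder, subject, score)


if __name__ == "__main__":
    print(route(6.0, "someone@unknown.example", "Hi"))
```

A whitelisted sender can pull an otherwise-quarantined message all the way back to an untagged Inbox delivery, which is why the negative adjustments matter as much as the thresholds.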
So far (one month in), no false positives, or at least none that people
have complained were quarantined when they should not have been.  I'm
considering lowering the quarantine threshold next month.
It's been nice to have my Inbox back, without 65 spams/day cluttering it
up.  Now I might see 2-5 per day that slip through without being tagged
as borderline spam (a score of 5.0 or higher).  Those are mostly zero-day
spams that haven't made it onto the URIBLs or DNSBLs yet.
I'm still debating greylisting, Razor, DCC, or paying for the Spamhaus
Zen list.
Compared to another (commercial) product that we were using a few years
ago, SA is very, very good.  Not perfect, but it really does a good job of
classifying things with decent accuracy.

Thomas Harold