[Dovecot] Spam filtering (was: Re: Sieve mails with decoded subject)

11 Dec 2009


Hi,
On Thu, 10 Dec 2009 20:28:27 +0100, Johannes Bauer wrote:
...
Eduardo M KALINOWSKI wrote:
...
On Thu, 10 Dec 2009, Johannes Bauer wrote:
...
I'm thinking about filtering all such encoded subjects (as there's no
reason to encode them US-ASCII), but suppose it were UTF-8 or
something: how can I filter on the actual content, not the encoded
subject? Surely someone has solved that problem already?
Yes, such as the guys behind SpamAssassin, or dspam, or any of the many
spam filtering programs that exist. Actually, they make much more
complicated decisions instead of only looking for bad words in the
subject field. I'd suggest you try installing one of them.
I had SpamAssassin running once and was pretty disappointed. All those
complicated rules and scoring and "smart" bayesian filtering did not
work very well, although I taught it in around 50k mails right from
wrong. I had both lots of false-positives and lots of false-negatives,
which was kind of annoying.
However, analyzing 274 spam mails I deleted in the last 5 months I can
conclude that by using that extremely simple filter list I'd catch 258
of them (that's 94%). So I'd like to stick to KISS in this case.
That must have been a configuration issue - SpamAssassin works pretty
well, if configured correctly - but I admit, it's a monster (both in terms
of configuration and resource usage).
You could go for bogofilter (purely Bayesian). I've been using it for years
on my private mail server with very good results. I like to use the tri-state
filtering, where there is not only one threshold value, but two. A
certainty of a mail being spam ("bogosity") of 0.35 and below goes into my
inbox, mails with a bogosity value between 0.35 and 0.65 go into
Spam/Unsure, and everything above 0.65 goes directly into Spam. That way I
have something like 10-20 mails per week in Spam/Unsure that are usually
false negatives, rarely false positives (currently around 1000 mails per
week end up in Spam). To my knowledge there has never been a false positive
in Spam.
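To make the two-threshold idea concrete, here is a minimal Python sketch of
the routing decision. The function names and the folder name "INBOX" are made
up for illustration; the cutoffs are the ones quoted above. It also assumes
the score is read from an X-Bogosity header containing a "spamicity=<number>"
field, as bogofilter adds in passthrough mode; check the header your own
version writes before relying on the pattern.

```python
import re

# Cutoffs from the description above; adjust to taste.
HAM_CUTOFF = 0.35
SPAM_CUTOFF = 0.65

# Assumption: the header looks roughly like
#   X-Bogosity: Unsure, tests=bogofilter, spamicity=0.500000, version=...
SPAMICITY_RE = re.compile(r"spamicity=([0-9]*\.?[0-9]+)")


def parse_bogosity(header_value):
    """Extract the numeric score from an X-Bogosity header value."""
    match = SPAMICITY_RE.search(header_value)
    if match is None:
        raise ValueError("no spamicity field in header: %r" % header_value)
    return float(match.group(1))


def target_folder(bogosity, ham_cutoff=HAM_CUTOFF, spam_cutoff=SPAM_CUTOFF):
    """Map a bogosity score in [0, 1] to one of three folders."""
    if not 0.0 <= bogosity <= 1.0:
        raise ValueError("bogosity must be between 0 and 1")
    if bogosity <= ham_cutoff:
        return "INBOX"          # 0.35 and below
    if bogosity <= spam_cutoff:
        return "Spam/Unsure"    # between the two cutoffs
    return "Spam"               # everything above 0.65
```

In practice the same split can usually be done by the filter itself or by a
delivery rule matching on the header; the sketch only spells out the
comparison logic.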
Of course initial training is necessary. For ongoing training / feedback I
have set up a Spam/Learn-Spam and Spam/Learn-Ham mailbox into which I move
false negatives/positives. A cron script then runs the mails found in those
(maildir) mailboxes through bogofilter again, with the command line option
for classifying the mail as Spam/Ham and moves them to the correct mailbox
(Spam/inbox) afterwards. This works well in all MUAs, because it only
requires IMAP functionality to train the filter.
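A rough Python sketch of such a cron job follows. It is not the actual
script: the function name, the maildir paths in the usage comment and the
injectable "run" parameter are assumptions for illustration. It feeds each
message to bogofilter on stdin with -s (register as spam) or -n (register as
non-spam) and then moves the file into the destination maildir.

```python
import shutil
import subprocess
from pathlib import Path


def retrain(learn_maildir, dest_maildir, flag, run=subprocess.run):
    """Feed every message in learn_maildir to bogofilter, then move it.

    flag is "-s" (register as spam) or "-n" (register as non-spam).
    run is injectable so the logic can be tried without bogofilter.
    Returns the number of messages processed.
    """
    learn = Path(learn_maildir)
    dest = Path(dest_maildir)
    moved = 0
    # A maildir keeps unseen mail in new/ and seen mail in cur/.
    for sub in ("new", "cur"):
        src_dir = learn / sub
        if not src_dir.is_dir():
            continue
        dst_dir = dest / sub
        dst_dir.mkdir(parents=True, exist_ok=True)
        for msg in sorted(src_dir.iterdir()):
            if not msg.is_file():
                continue
            with msg.open("rb") as fh:
                # check=True stops the run if bogofilter reports an error,
                # so a message is only moved after it has been registered.
                run(["bogofilter", flag], stdin=fh, check=True)
            shutil.move(str(msg), str(dst_dir / msg.name))
            moved += 1
    return moved


# Hypothetical usage from a cron job (paths are placeholders):
#   retrain("/home/user/Maildir/.Spam.Learn-Spam", "/home/user/Maildir/.Spam", "-s")
#   retrain("/home/user/Maildir/.Spam.Learn-Ham", "/home/user/Maildir", "-n")
```

If the filter registers every message automatically at delivery time, the
earlier wrong registration has to be undone as well; bogofilter has -N and -S
for unregistering, so "-Ns" or "-Sn" would take the place of the plain flags.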
The solution was inspired by a Gentoo Wiki article
(http://www.gentoo-wiki.info/Bogofilter).
Patrick.
--
STAR Software (Shanghai) Co., Ltd.            http://www.star-group.net/
Phone:    +86 (21) 3462 7688 x 826             Fax:   +86 (21) 3462 7779
PGP key E883A005 https://stshacom1.star-china.net/keys/patrick_nagel.asc
Fingerprint:           E09A D65E 855F B334 E5C3 5386 EF23 20FC E883 A005

Patrick Nagel