[Dovecot] Spam filtering (was: Re: Sieve mails with decoded subject)

Patrick Nagel patrick.nagel at star-group.net
Fri Dec 11 06:51:19 EET 2009


Hi,

On Thu, 10 Dec 2009 20:28:27 +0100, Johannes Bauer wrote:
> Eduardo M KALINOWSKI schrieb:
>> On Qui, 10 Dez 2009, Johannes Bauer wrote:
>>> I'm thinking about filtering all such encoded subjects (as there's no
>>> reason to encode them US-ASCII), but suppose it were UTF-8 or something:
>>> how can I filter on the actual content, not the encoded subject? Surely
>>> someone has solved that problem already?
>> 
>> Yes, such as the guys behind SpamAssassin, or dspam, or any of the many
>> spam filtering programs that exist. Actually, they make much more
>> complicated decisions instead of only looking for bad words in the
>> subject field. I'd suggest you try installing one of them.
> 
> I had SpamAssassin running once and was pretty disappointed. All those
> complicated rules and scoring and "smart" bayesian filtering did not
> work very well, although I taught it in around 50k mails right from
> wrong. I had both lots of false-positives and lots of false-negatives,
> which was kind of annoying.
> 
> However, analyzing 274 spam mails I deleted in the last 5 months I can
> conclude that by using that extremely simple filter list I'd catch 258
> of them (that's 94%). So I'd like to stick to KISS in this case.

That must have been a configuration issue - SpamAssassin works pretty
well, if configured correctly - but I admit, it's a monster (both in terms
of configuration and resource usage).

You could go for bogofilter (purely Bayesian). I have been using it for
years on my private mail server with very good results. I like to use
tri-state filtering, where there is not only one threshold value, but two.
Mails whose estimated probability of being spam ("bogosity") is 0.35 or
below go into my inbox, mails with a bogosity between 0.35 and 0.65 go into
Spam/Unsure, and everything above 0.65 goes directly into Spam. That way I
have something like 10-20 mails per week in Spam/Unsure; these are usually
false negatives (spam), rarely false positives (ham). For comparison,
around 1000 mails per week currently end up in Spam. To my knowledge there
has never been a false positive in Spam.
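
The two-threshold routing described above can be sketched in a few lines
of Python. This is only an illustration of the decision rule, not how
bogofilter implements it internally; the function name target_mailbox and
the constant names are made up for this example, and the mailbox names are
the ones from my setup.

```python
# Thresholds from the description above; the names are illustrative only.
HAM_CUTOFF = 0.35   # at or below this value: treated as ham
SPAM_CUTOFF = 0.65  # above this value: treated as spam


def target_mailbox(bogosity: float) -> str:
    """Map a bogosity score in [0, 1] to one of three mailboxes."""
    if not 0.0 <= bogosity <= 1.0:
        raise ValueError(f"bogosity out of range: {bogosity!r}")
    if bogosity <= HAM_CUTOFF:
        return "INBOX"
    if bogosity > SPAM_CUTOFF:
        return "Spam"
    # Everything in between is left for manual review.
    return "Spam/Unsure"
```

The point of the middle band is that borderline mails are neither
delivered to the inbox nor buried among the clear spam, so only a small
folder needs to be checked by hand.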

Of course initial training is necessary. For ongoing training / feedback I
have set up a Spam/Learn-Spam and a Spam/Learn-Ham mailbox, into which I
move false negatives and false positives respectively. A cron script then
runs the mails found in those (maildir) mailboxes through bogofilter again,
with the command line option for registering the mail as spam or ham, and
moves them to the correct mailbox (Spam or inbox) afterwards. This works
with any MUA, because training the filter only requires IMAP functionality.
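
A rough sketch of such a cron job in Python follows; it is not the actual
script. Assumptions: the mailboxes are stored in Dovecot's Maildir++ layout
under ~/Maildir (so Spam/Learn-Spam is the directory .Spam.Learn-Spam), and
each message is fed to bogofilter on stdin with -s (register as spam) or -n
(register as non-spam). The function names retrain and bogofilter_register
are made up for this example.

```python
import shutil
import subprocess
from pathlib import Path


def bogofilter_register(message: Path, flag: str) -> None:
    """Feed one message to bogofilter on stdin with a registration flag."""
    with message.open("rb") as fh:
        subprocess.run(["bogofilter", flag], stdin=fh, check=True)


def retrain(learn_dir: Path, dest_dir: Path, flag: str,
            register=bogofilter_register) -> int:
    """Register every mail in learn_dir, then move it to dest_dir.

    Both arguments are maildir directories. Returns the number of
    messages processed.
    """
    moved = 0
    for sub in ("new", "cur"):
        src = learn_dir / sub
        if not src.is_dir():
            continue
        for message in sorted(src.iterdir()):
            if not message.is_file():
                continue
            register(message, flag)
            # Keep the message in the same maildir subdirectory (new/cur).
            dest = dest_dir / sub
            dest.mkdir(parents=True, exist_ok=True)
            shutil.move(str(message), str(dest / message.name))
            moved += 1
    return moved


if __name__ == "__main__":
    maildir = Path.home() / "Maildir"  # assumed location
    retrain(maildir / ".Spam.Learn-Spam", maildir / ".Spam", "-s")
    retrain(maildir / ".Spam.Learn-Ham", maildir, "-n")
```

Since the mails in the learn folders have already been registered once
under the wrong class, one might prefer -Ns and -Sn here, which also
unregister the earlier classification; which of the two is better depends
on how the filter was trained in the first place.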

The solution was inspired by a Gentoo Wiki article
(http://www.gentoo-wiki.info/Bogofilter).

Patrick.

-- 
STAR Software (Shanghai) Co., Ltd.            http://www.star-group.net/
Phone:    +86 (21) 3462 7688 x 826             Fax:   +86 (21) 3462 7779

PGP key E883A005 https://stshacom1.star-china.net/keys/patrick_nagel.asc
Fingerprint:           E09A D65E 855F B334 E5C3 5386 EF23 20FC E883 A005

