[Dovecot] Using MySQL to store email?
Eric S. Johansson
esj at harvee.org
Wed Jun 7 23:58:44 EEST 2006
Jan Kundrát wrote:
> Marc Perkel wrote:
>> For example, a new message comes in and you find that sender matches
>> email in 100 people's spam folders and none in any other folder? It can
>> be classified as spam. If, however, the from address matches ham in people's
>> folders and no spam, then you can probably deliver it without spam scanning.
>
> It's called auto-whitelisting and smart spam scanners should do that.
>
Actually, auto-whitelisting is any one of a number of techniques used
to eliminate false positives from "known parties". I use one in camram
where anyone you send e-mail to is automatically whitelisted. To
distinguish it from the often confusing auto-whitelisting
terminology, I call it a "friends list". It works exceedingly well, and I
haven't had any significant problems even when the site has been
infected with zombies. With any automatic whitelisting tool, you need
human feedback that says "this is spam". That feedback
enables automatic removal of the entry from the auto-whitelist and
blacklisting of the IP address the message came from (you did preserve the
source IP address as a new header in the message, didn't you?).
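A minimal sketch of the friends-list idea, not camram's actual code: every recipient of outgoing mail is whitelisted, and a human "this is spam" report drops the sender from the list and blacklists the source IP. It assumes an in-memory store and a hypothetical `X-Source-IP` header holding the preserved source address; both the class name and the header name are illustrative only.

```python
from __future__ import annotations

from email.message import EmailMessage
from email.utils import getaddresses, parseaddr

# Hypothetical header in which the receiving MTA preserved the source IP.
SOURCE_IP_HEADER = "X-Source-IP"


class FriendsList:
    """Hypothetical friends list: anyone you send mail to is whitelisted."""

    def __init__(self) -> None:
        self.friends: set[str] = set()
        self.blacklisted_ips: set[str] = set()

    def record_outgoing(self, msg: EmailMessage) -> None:
        # Every address in To/Cc/Bcc of an outgoing message becomes a "friend".
        values = []
        for field in ("To", "Cc", "Bcc"):
            values.extend(str(v) for v in msg.get_all(field, []))
        for _, addr in getaddresses(values):
            if addr:
                self.friends.add(addr.lower())

    def is_friend(self, sender: str) -> bool:
        # Compare bare, lower-cased addresses so "Bob <BOB@x>" matches "bob@x".
        _, addr = parseaddr(sender)
        return addr.lower() in self.friends

    def report_spam(self, msg: EmailMessage) -> None:
        # Human feedback: remove the sender from the friends list and
        # blacklist the IP address preserved in the (hypothetical) header.
        _, sender = parseaddr(str(msg.get("From", "")))
        self.friends.discard(sender.lower())
        source_ip = msg.get(SOURCE_IP_HEADER)
        if source_ip:
            self.blacklisted_ips.add(str(source_ip).strip())
```

Keying on the bare, lower-cased address is a simplifying assumption; a real deployment would also need persistence and some policy for forged From lines.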
The analysis technique originally suggested is classically naïve. A
technique I'm playing with that appears to work much better is to use
the output of the content filter to predict whether a message is good or
bad. All of the bad messages are placed into a dumpster and expired
after five days. If a message is still in the dumpster when it expires,
its source IP address is listed as a "bad source".
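The dumpster step could be sketched roughly as follows, assuming the content filter's good/bad verdict has already been computed and each message carries its source IP. `Dumpster`, `rescue`, and `expire` are hypothetical names; "rescue" stands for the user pulling a message back out before it expires, so that its IP is not counted against it.

```python
from __future__ import annotations

from collections import Counter
from dataclasses import dataclass, field
from datetime import datetime, timedelta

EXPIRY = timedelta(days=5)  # messages expire after five days


@dataclass
class DumpsterEntry:
    message_id: str
    source_ip: str
    arrived: datetime


@dataclass
class Dumpster:
    entries: dict[str, DumpsterEntry] = field(default_factory=dict)
    bad_sources: Counter[str] = field(default_factory=Counter)

    def add(self, message_id: str, source_ip: str, arrived: datetime) -> None:
        # The content filter predicted "bad": park the message here.
        self.entries[message_id] = DumpsterEntry(message_id, source_ip, arrived)

    def rescue(self, message_id: str) -> DumpsterEntry | None:
        # The user pulled the message out, so it never counts against its IP.
        return self.entries.pop(message_id, None)

    def expire(self, now: datetime) -> list[str]:
        # Anything still here after five days marks its IP as a "bad source".
        expired = [e for e in self.entries.values() if now - e.arrived >= EXPIRY]
        for e in expired:
            self.bad_sources[e.source_ip] += 1
            del self.entries[e.message_id]
        return [e.message_id for e in expired]
```

Running `expire` periodically (for example from a daily job) is an assumption about scheduling; the post only says that messages expire after five days.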
Any message that passes the content filter, friends filter, or spam
filter is recorded as a "good source". If the ratio of good-source to
bad-source messages drops below 80%, the site is listed as contaminated and
its mail is automatically dumped in the spam trap for human analysis. If the
ratio drops below 40%, it's listed as spam and all its messages are brownlisted.
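The two thresholds can be expressed as a small classifier. This sketch assumes the "ratio" means the fraction of a source's messages that were good, good / (good + bad), which is one reading of the percentages above. It also treats a source with no history as clean, which is an assumption of the sketch and not something stated in the post.

```python
from enum import Enum


class SourceStatus(Enum):
    CLEAN = "clean"                # deliver normally
    CONTAMINATED = "contaminated"  # route to the spam trap for human review
    SPAM = "spam"                  # brownlist all messages from this source


def classify_source(good: int, bad: int,
                    contaminated_below: float = 0.80,
                    spam_below: float = 0.40) -> SourceStatus:
    """Classify a source IP from its good/bad message counts.

    Assumes the ratio is the fraction of good messages, good / (good + bad).
    """
    total = good + bad
    if total == 0:
        return SourceStatus.CLEAN  # no history yet (assumption: treat as clean)
    ratio = good / total
    if ratio < spam_below:
        return SourceStatus.SPAM
    if ratio < contaminated_below:
        return SourceStatus.CONTAMINATED
    return SourceStatus.CLEAN
```

For example, a source with 7 good and 3 bad messages has a ratio of 0.7 and lands in the spam trap as contaminated, while one with 3 good and 7 bad falls below 40% and is treated as spam.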
The main downside of this technique is that it does increase the
workload for the user (more content in the spam trap), and it does seem
to work better if you have multiple sources feeding the good/bad
ratio analysis.
My two cents' worth.