[Dovecot] deploying dspam
Tom Allison
tallison at tacocat.net
Thu Dec 16 13:13:36 EET 2004
Mark E. Mallett wrote:
> On Thu, Dec 16, 2004 at 09:58:53AM +1100, Curtis Maloney wrote:
>
>>It never came across to me that you were wanting something specific with
>>dspam... more that you wanted an explicit trigger for when a user decided
>>something was/wasn't SPAM. And I, personally, love the idea.
>
>
> I'd still like to see more general hooks on moving into and out of
> folders, or ways to "redeliver" email, or folders that could act as
> pipes, e.g. as mentioned in this thread:
>
> http://www.dovecot.org/list/dovecot/2003-July/001973.html
>
> mm
Here's how I use training with dovecot. It's hardly related to dovecot,
but we've strayed this far, I thought I would attempt something that
might become related again.
bogofilter tests each email without making any database updates. This
keeps the database smaller, and since it doesn't change I believe it's
cached.
bogofilter sorts mail into three categories: (H)am, (U)nsure, (S)pam.
Ham is copied into a folder, "Ham", and delivered as usual.
Unsure is copied into a folder, "Unsure", and delivered as usual.
Spam is delivered into a folder, "Spam".
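
As a rough sketch of the delivery rules above (not the actual setup, which isn't shown in this message), the mapping from bogofilter's verdict to folders might look like this in Python. The function name `folders_for` and the use of "INBOX" to stand for "delivered as usual" are assumptions made for illustration.

```python
# Hypothetical sketch: map a bogofilter verdict letter to destination folders.
# "INBOX" stands in for "delivered as usual"; that name is an assumption.

def folders_for(verdict):
    """Return the folders a message should land in for verdict H, U or S."""
    if verdict == "H":
        return ["Ham", "INBOX"]      # copy to Ham, then normal delivery
    if verdict == "U":
        return ["Unsure", "INBOX"]   # copy to Unsure, then normal delivery
    if verdict == "S":
        return ["Spam"]              # spam goes only to the Spam folder
    raise ValueError("unknown verdict: %r" % (verdict,))
```

Only spam skips normal delivery, so a wrong Ham or Unsure verdict never hides a message from the user.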
The rest is done through crontabs.
crontab: All email in Ham or Spam that is more than 4 days old is
automatically moved out of the IMAP system (it goes into an mbox,
actually, but it's no longer accessible through IMAP).
the human: moves mail from Ham/Spam/Unsure into separate folders, NewHam
and NewSpam.
crontab: All email in NewHam and NewSpam is checked for learning. If the
bogofilter score (H/U/S) doesn't match the folder the message was placed
in, it's used for training. In other words, if the score is Unsure or Ham
and the message is in folder NewSpam, then $score != $folder and it's used
for retraining.
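
The "$score != $folder" rule can be sketched in Python as follows. This is only an illustration of the comparison; the names `FOLDER_CLASS`, `needs_training` and `messages_to_train` are hypothetical, and how the scores are obtained from bogofilter is left out.

```python
# Hypothetical sketch of the nightly check on NewHam / NewSpam.
# Each folder implies the class the human assigned to the message.
FOLDER_CLASS = {"NewHam": "H", "NewSpam": "S"}

def needs_training(score, folder):
    """True when bogofilter's score (H/U/S) disagrees with the folder."""
    return score != FOLDER_CLASS[folder]

def messages_to_train(scored):
    """scored: iterable of (message_id, score, folder) tuples.

    Returns (message_id, folder) pairs that should be used for training.
    """
    return [(mid, folder) for mid, score, folder in scored
            if needs_training(score, folder)]
```

An Unsure score never matches either folder, so anything the human sorts out of Unsure is always trained on.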
I like this method because the crontabs can be run at night when the
load is small.
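
The nightly expiry step described earlier (mail older than 4 days leaves the Ham and Spam folders) could be sketched like this. It assumes each message's arrival time is already known; the names `MAX_AGE`, `is_expired` and `split_expired` are hypothetical, and actually moving messages between mbox files is not shown.

```python
from datetime import datetime, timedelta

MAX_AGE = timedelta(days=4)  # the ">4 days" rule from the text

def is_expired(received, now):
    """True when a message is strictly older than MAX_AGE."""
    return now - received > MAX_AGE

def split_expired(messages, now):
    """messages: iterable of (message_id, received_datetime) pairs.

    Returns (keep, expire) lists of message ids.
    """
    keep, expire = [], []
    for mid, received in messages:
        (expire if is_expired(received, now) else keep).append(mid)
    return keep, expire
```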
If you trigger training based on a mail copy, what happens when someone
dumps 400 emails into a folder all at once? What happens when 30 people
do this all at the same time? It might not suit a smaller system at
peak hours to have this done.
I would prefer to implement a system where you can queue up the training
in large numbers, but the actual training is done in a managed way.
Over time, the amount of training that actually occurs settles to
fewer than one message per week, so it's not time critical that training
be done. At first, I ran it hourly. Now I run it at midnight only.
But on a large system, I would never deploy something without an initial
wordlist to provide some filtering, which would also make hourly jobs
unnecessary.
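
One way to picture "queue in large numbers, train in a managed way" is a queue that each scheduled run drains only up to a fixed limit. The sketch below is an assumption about how that could be done, not a description of the setup above; `take_batch` and the limit of 50 in the usage note are made up for illustration.

```python
from collections import deque

def take_batch(queue, limit):
    """Remove and return up to `limit` items from the front of the queue.

    Anything beyond the limit stays queued for the next scheduled run.
    """
    batch = []
    while queue and len(batch) < limit:
        batch.append(queue.popleft())
    return batch
```

If someone dumps 400 messages at once, a run with `take_batch(queue, 50)` trains on 50 and leaves 350 for later runs, so the load per run stays bounded.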
So where does dovecot fall into all of this?
I don't know. I really can't make an argument for doing anything to an
IMAP server that would help with any of this without also making for
potential problems. Dumping mail into pipes would lead to an
unrecoverable condition if there was a human error (wrong pipe).
Perhaps the only thing would be to ask whether moving email around through
the file system will really screw up the dovecot indexes. Sometimes dovecot
reports some pretty strange message counts for these folders.