Mark E. Mallett wrote:
On Thu, Dec 16, 2004 at 09:58:53AM +1100, Curtis Maloney wrote:
It never came across to me that you wanted something specific to dspam... more that you wanted an explicit trigger for when a user decided something was/wasn't SPAM. And I, personally, love the idea.
I'd still like to see more general hooks on moving into and out of folders, or ways to "redeliver" email, or folders that could act as pipes, e.g. as mentioned in this thread:
http://www.dovecot.org/list/dovecot/2003-July/001973.html
mm
Here's how I use training with dovecot. It's hardly related to dovecot, but since we've strayed this far, I thought I would attempt something that might become related again.
bogofilter tests each email without any database updates. This keeps the database smaller, and since the database doesn't change, I believe it stays cached.
bogofilter sorts mail into three categories: (H)am, (U)nsure, (S)pam.
Ham is copied into a folder, "Ham", and delivered as usual. Unsure is copied into a folder, "Unsure", and delivered as usual. Spam is delivered only into a folder, "Spam".
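
A minimal sketch of that routing rule, for illustration only. The folder names come from the description above; "INBOX" is an assumed stand-in for "delivered as usual", and the function name destinations is hypothetical, not part of bogofilter or dovecot.

```python
# Routing table for the three bogofilter verdicts described above.
# "INBOX" is an assumption standing in for normal delivery.
ROUTES = {
    "H": ["Ham", "INBOX"],      # ham: copy to Ham, deliver as usual
    "U": ["Unsure", "INBOX"],   # unsure: copy to Unsure, deliver as usual
    "S": ["Spam"],              # spam: delivered only to Spam
}


def destinations(verdict: str) -> list[str]:
    """Return the folders a message should be written to for a verdict."""
    try:
        return list(ROUTES[verdict])
    except KeyError:
        raise ValueError(f"unknown bogofilter verdict: {verdict!r}") from None
```

A delivery script would call destinations() once per message and write one copy to each returned folder.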
The rest is done through crontabs.
crontab: All email in Ham and Spam that is more than 4 days old is automatically moved out of the IMAP system (it stays in an mbox, but it's no longer IMAP accessible).
the human: moves mail from Ham/Spam/Unsure into separate folders, NewHam and NewSpam.
crontab: All email in NewHam and NewSpam is checked for learning. If the bogofilter score (H/U/S) doesn't match the folder a message was placed in, the message is used for training. In other words, if the score is Unsure or Ham and the message is in folder NewSpam, then $score != $folder and it's used for retraining.
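
The "$score != $folder" check can be sketched as below. This is an illustration under assumptions: the verdict letter is taken as already obtained from bogofilter, and training_label is a hypothetical helper name.

```python
# Verdict each training folder is expected to produce once the filter
# has learned correctly. Folder names are the ones used in the post.
EXPECTED = {"NewHam": "H", "NewSpam": "S"}


def training_label(folder, verdict):
    """Return "ham" or "spam" if the message should be used for
    training, or None if the filter already agrees with the folder."""
    expected = EXPECTED[folder]
    if verdict == expected:
        return None  # score matches folder: nothing to learn
    return "spam" if expected == "S" else "ham"
```

With this rule, an Unsure or Ham message sitting in NewSpam returns "spam" and gets trained, while a message already scored Spam is skipped.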
I like this method because the crontabs can be run at night when the load is small.
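
As an illustration of the nightly expiry job (mail older than 4 days leaving the Ham and Spam folders), here is a rough sketch using Python's standard mailbox module. It assumes age is judged by the Date header, which may not be what the real cron job does, and the function names and paths are hypothetical.

```python
import mailbox
from datetime import datetime, timedelta, timezone
from email.utils import parsedate_to_datetime

MAX_AGE = timedelta(days=4)


def is_expired(date_header, now, max_age=MAX_AGE):
    """True if the Date header is older than max_age relative to now.
    Missing or unparseable dates are treated as not expired."""
    if not date_header:
        return False
    try:
        sent = parsedate_to_datetime(str(date_header))
    except (TypeError, ValueError):
        return False
    if sent.tzinfo is None:
        # Assumption: treat dates without a zone as UTC.
        sent = sent.replace(tzinfo=timezone.utc)
    return now - sent > max_age


def archive_old(src_path, dst_path, now=None):
    """Move expired messages from one mbox file to another.
    Returns the number of messages moved."""
    now = now or datetime.now(timezone.utc)
    src = mailbox.mbox(src_path)
    dst = mailbox.mbox(dst_path)
    moved = 0
    src.lock()
    dst.lock()
    try:
        for key in list(src.keys()):
            msg = src[key]
            if is_expired(msg["Date"], now):
                dst.add(msg)
                src.remove(key)
                moved += 1
        src.flush()
        dst.flush()
    finally:
        src.unlock()
        dst.unlock()
        src.close()
        dst.close()
    return moved
```

Taking the mbox lock matters here because the IMAP server may be reading the same file while the job runs.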
If you trigger training based on a mail copy, what happens when someone dumps 400 emails into a folder all at once? What happens when 30 people do this all at the same time? It might not suit a smaller system at peak hours to have this done.
I would prefer to implement a system where you can queue up the training in large numbers, but the actual training is done in a managed way. Over time, the amount of training that actually occurs drops to the order of <1 message per week, so it's not time critical that training be done. At first, I ran it hourly. Now I run it at midnight only. But on a large system, I would never deploy something without an initial wordlist to provide some filtering, which would also make hourly jobs unnecessary.
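
One way to picture "queue now, train later" is the small sketch below. It is only an illustration: TrainingQueue and its methods are hypothetical names, the queue lives in memory rather than on disk, and the train callback stands in for whatever actually invokes the filter.

```python
from collections import deque


class TrainingQueue:
    """Cheap to append to at peak hours; drained in bounded batches."""

    def __init__(self):
        self._pending = deque()

    def enqueue(self, message_id, label):
        # Recording the request is all that happens when a user moves mail.
        self._pending.append((message_id, label))

    def __len__(self):
        return len(self._pending)

    def drain(self, train, limit):
        """Train on at most `limit` queued messages, oldest first.
        Returns how many were processed; the rest wait for the next run."""
        done = 0
        while self._pending and done < limit:
            message_id, label = self._pending.popleft()
            train(message_id, label)
            done += 1
        return done
```

If someone dumps 400 messages into a folder, the queue just grows by 400 entries; a nightly job calling drain() with a fixed limit decides how much load the training actually generates.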
So where does dovecot fall into all of this?
I don't know. I really can't make an argument for doing anything to an IMAP server that would help with any of this without also creating potential problems. Dumping mail into pipes would lead to an unrecoverable condition if there was a human error (wrong pipe).
Perhaps the only thing would be to ask whether moving email through the file system will really screw up the dovecot indexes. Sometimes dovecot reports some pretty strange message counts for these folders.