On 8/11/2014 11:42 AM, Jeff Rice wrote:
Hello, I'm trying to work out a way to have my Sieve filter save a "pristine" version of email messages as a backup, primarily to use for training the spam filter. What I would like is to have every message saved into a single, site-wide directory (in the global sieve) before being processed further and delivered. The messages in that directory will be used to train the spam filter without having to worry about removing SpamAssassin headers and so forth.
Provided I understand you correctly, my first thought is that saving a duplicate copy of every single message that arrives on this system seems wasteful.
Why not save only the messages that would actually be useful for spam training purposes?
I thought fileinto :copy might do what I wanted, but this creates a backup directory individually for each user. That's unmanageable for the spam training process I use. redirect *could* work, but that adds a header during the process so the email saved would not be "pristine".
I'm thinking of using the extprograms plugin to pipe to a program that will do a simple copy. That feels very hackish, however, and I'm hoping there is a more elegant solution.
There is: the Dovecot Antispam plug-in. It does what you describe, and it avoids the problem of storing a duplicate copy of every message.
In short, when a user drags a message from any folder to "Junk", you'll receive a "pristine" copy of the message at any local address you specify, delivered to any folder you specify (e.g., "Train as SPAM") within that "training user's" mailbox.
Conversely, when a user drags a message from "Junk" to any other folder, you'll receive a copy of the message in your "Train as HAM" folder.
Then, you can point your anti-spam solution's training executable to these two "pristine master corpus" folders.
If you ever need to reclassify messages, or expunge them, doing so is trivial with this master corpus approach.
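To make the "master corpus" idea concrete, here is a minimal sketch in Python of a helper that files an unmodified message into one of two training folders. It is an illustration only, not the script I use and not part of the antispam plug-in. The function name, the "--spam"/"--ham" flags, and the Maildir location are all assumptions; the folder names simply mirror the examples above. It assumes a POSIX system, where Python's mailbox module writes the bytes through unchanged.

```python
import mailbox
from pathlib import Path

# Assumed flag-to-folder mapping; adjust to the training mailbox's real folders.
FOLDERS = {"--spam": "Train as SPAM", "--ham": "Train as HAM"}


def store_message(raw: bytes, flag: str, maildir_root: Path) -> str:
    """Save the raw message bytes into the training folder selected by flag.

    Returns the Maildir key of the stored message.
    """
    if flag not in FOLDERS:
        raise ValueError(f"expected one of {sorted(FOLDERS)}, got {flag!r}")
    # create=True makes the top-level Maildir if it does not exist yet.
    root = mailbox.Maildir(str(maildir_root), create=True)
    # add_folder() creates the subfolder, or reopens it if already present.
    folder = root.add_folder(FOLDERS[flag])
    # Passing bytes (not a parsed Message) avoids any header rewriting.
    return folder.add(raw)
```

A wrapper that reads the message from standard input could call store_message(sys.stdin.buffer.read(), flag, root), with the flag and the Maildir root taken from the command line. The training tool is then run against the two folders whenever convenient.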
Am I missing something obvious here?
Thanks! Jeff
Happy to provide a sample script for the antispam plugin's mailtrain back-end, as that's the one I use.
Cheers,
-Ben