Ben Johnson mailto:ben@indietorrent.org August 11, 2014 at 5:52 PM
On 8/11/2014 11:42 AM, Jeff Rice wrote:
I'm trying to work out a way to have my Sieve filter save a "pristine" version of email messages as a backup, primarily to use for training the spam filter. What I would like is to have every message saved into a single, site-wide directory (in the global sieve) before it is processed further and delivered. The messages in that directory will be used to train the spam filter without having to worry about removing Spamassassin headers and so forth.
Provided I understand you correctly, my first thought is that saving a duplicate copy of every single message that arrives on this system seems wasteful.
A bit wasteful, but disk space is cheap and it's a limited, rolling backup. The value of retraining goes down significantly as time passes, so I'm not planning on keeping messages there for an extended period of time.
Cron will clean out older messages after a set period of time.
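As a rough illustration of that cleanup step, here is a minimal sketch of a script cron could run. The function name `purge_old_messages`, the directory layout, and the retention period are all hypothetical; nothing in this thread specifies them.

```python
import os
import time


def purge_old_messages(backup_dir, max_age_days, now=None):
    """Delete regular files in backup_dir whose mtime is older than max_age_days.

    Returns the sorted list of file names that were removed.
    `now` can be supplied (seconds since the epoch) to make the cutoff testable.
    """
    now = time.time() if now is None else now
    cutoff = now - max_age_days * 86400  # 86400 seconds per day

    # Snapshot the directory first so we never delete while iterating.
    with os.scandir(backup_dir) as entries:
        candidates = list(entries)

    removed = []
    for entry in candidates:
        # Skip subdirectories and symlinks; only plain message files are purged.
        if not entry.is_file(follow_symlinks=False):
            continue
        if entry.stat(follow_symlinks=False).st_mtime < cutoff:
            os.remove(entry.path)
            removed.append(entry.name)
    return sorted(removed)
```

A cron job would call something like `purge_old_messages("/path/to/backup", 14)`, where both the path and the 14-day window are placeholders to adjust.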
I'm thinking of using the extprograms plugin to pipe to a program that will do a simple copy. That feels very hackish, however, and I'm hoping there is a more elegant solution.
There is; the Dovecot Antispam plug-in. It does exactly what you describe, and it addresses the problem of storing a duplicate copy of all messages.
In short, when a user drags a message from any folder to "Junk", you'll receive a "pristine" copy of the message at any local address you specify, delivered to any folder you specify (e.g., "Train as SPAM") within that "training user's" mailbox.
Hmm. Perhaps I'm just dense, but I don't see this behavior documented in the Antispam plugin docs. I'm happy to be corrected if I've misunderstood. I'd rather use an existing tool if possible.
What I can see is that Antispam will train on the version of the message the user drags into the "Junk" folder. But that message may have had headers added by a sieve filter or Spamassassin, for example. By "pristine", I mean "as received" by the LDA.
CRM114's "reaver_cache" is along the lines of what I'm thinking of.
Jeff
Jeff Rice mailto:list1@jrice.me August 11, 2014 at 11:42 AM
Hello, I'm trying to work out a way to have my Sieve filter save a "pristine" version of email messages as a backup, primarily to use for training the spam filter. What I would like is to have every message saved into a single, site-wide directory (in the global sieve) before it is processed further and delivered. The messages in that directory will be used to train the spam filter without having to worry about removing Spamassassin headers and so forth.
I thought fileinto :copy might do what I wanted, but this creates a backup directory individually for each user. That's unmanageable for the spam training process I use. redirect *could* work, but that adds a header during the process so the email saved would not be "pristine".
I'm thinking of using the extprograms plugin to pipe to a program that will do a simple copy. That feels very hackish, however, and I'm hoping there is a more elegant solution.
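For reference, the "simple copy" program could be as small as the sketch below. It assumes the message arrives unmodified on standard input, which is how a piped program would normally receive it, and it writes the bytes to a uniquely named file in one shared directory. The names `save_pristine` and `main`, the `.eml` suffix, and the file-naming scheme are illustrative assumptions, not anything prescribed by the extprograms plugin.

```python
import os
import tempfile
import time


def save_pristine(data, backup_dir):
    """Write raw message bytes to a new, uniquely named file in backup_dir.

    The data is written to a hidden temporary file first and then renamed,
    so a reader of the directory never sees a half-written message.
    Returns the final path.
    """
    os.makedirs(backup_dir, exist_ok=True)

    # mkstemp guarantees a unique name even with concurrent deliveries.
    fd, tmp_path = tempfile.mkstemp(prefix=".incoming-", dir=backup_dir)
    with os.fdopen(fd, "wb") as handle:
        handle.write(data)

    # Hypothetical naming scheme: timestamp plus the unique temp suffix.
    unique = os.path.basename(tmp_path)[len(".incoming-"):]
    final_path = os.path.join(backup_dir, "%d.%s.eml" % (time.time_ns(), unique))
    os.rename(tmp_path, final_path)  # atomic within one filesystem
    return final_path


def main(argv, stream):
    """Entry point: argv[1] is the backup directory, stream is binary stdin."""
    if len(argv) != 2:
        return 64  # conventional "usage error" exit status
    save_pristine(stream.read(), argv[1])
    return 0
```

To use it as a standalone script, the file would end with `import sys` and `sys.exit(main(sys.argv, sys.stdin.buffer))`. Whether the copy is truly "as received" still depends on what has touched the message before Sieve runs.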
Am I missing something obvious here?
Thanks! Jeff