Sieve: Saving "pristine" messages for backups and spam training

Jeff Rice list1 at jrice.me
Mon Aug 11 23:17:09 UTC 2014



> Ben Johnson <mailto:ben at indietorrent.org>
> August 11, 2014 at 5:52 PM
> On 8/11/2014 11:42 AM, Jeff Rice wrote:
>> I'm trying to work out a way to have my Sieve filter save a "pristine"
>> version of email messages as a backup, primarily to use for training the
>> spam filter.  What I would like is to have every message saved into a
>> single, site-wide directory (in the global sieve) before being processed
>> further and delivered.  The messages in that directory will be used
>> to train the spam filter without having to worry about removing
>> SpamAssassin headers and so forth.
>
> Provided I understand you correctly, my first thought is that saving a
> duplicate copy of every single message that arrives on this system seems
> wasteful.
>
A bit wasteful, but disk space is cheap and it's a limited, rolling 
backup.  The value of retraining goes down significantly as time passes, 
so I'm not planning on keeping messages there for an extended period of 
time.

Cron will clean out older messages after a set period of time.
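
For illustration, the cron cleanup could be as small as the sketch below. The
function name, the directory argument, and the age limit are placeholders
chosen for this example, not anything Dovecot provides.

```python
#!/usr/bin/env python3
"""Prune old files from a rolling backup directory (illustrative sketch)."""
import sys
import time
from pathlib import Path


def prune(directory, max_age_days, now=None):
    """Delete regular files whose mtime is older than max_age_days.

    Returns the sorted names of the files that were removed.
    """
    now = time.time() if now is None else now
    cutoff = now - max_age_days * 86400
    removed = []
    for entry in Path(directory).iterdir():
        # Only plain files are touched; subdirectories are left alone.
        if entry.is_file() and entry.stat().st_mtime < cutoff:
            entry.unlink()
            removed.append(entry.name)
    return sorted(removed)


# Hypothetical invocation from cron: prune.py <directory> <max-age-in-days>
if __name__ == "__main__" and len(sys.argv) == 3:
    prune(sys.argv[1], float(sys.argv[2]))
```

Cron would run this once a day against the backup directory, so the disk
usage stays bounded by the retention window.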

>> I'm thinking of using the extprograms plugin to pipe to a program that
>> will do a simple copy.  That feels very hackish, however, and I'm hoping
>> there is a more elegant solution.
>
> There is; the Dovecot Antispam plug-in. It does exactly what you
> describe, and it addresses the problem of storing a duplicate copy of
> all messages.
>
> In short, when a user drags a message from any folder to "Junk", you'll
> receive a "pristine" copy of the message at any local address you
> specify, delivered to any folder you specify (e.g., "Train as SPAM")
> within that "training user's" mailbox.
>
Hmm.  Perhaps I'm just dense, but I don't see this behavior documented 
in the Antispam plugin docs.  I'm happy to be corrected if I've 
misunderstood.  I'd rather use an existing tool if possible.

What I can see is that Antispam will train on the version of the message 
the user drags into the "Junk" folder.  But that message may have had 
headers added by a Sieve filter or SpamAssassin, for example.  By 
"pristine", I mean "as received" by the LDA.

CRM114's "reaver_cache" is along the lines of what I'm thinking of.
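
To make the "simple copy" idea concrete, here is a rough sketch of the kind of
program the global Sieve script could pipe each message to. Everything in it
(the save_message name, the file naming scheme, taking the target directory
as an argument) is an assumption for illustration. It just writes whatever
arrives on stdin, byte for byte, into one shared directory.

```python
#!/usr/bin/env python3
"""Save a message read from stdin, unmodified, into a shared directory."""
import os
import sys
import time
import uuid
from pathlib import Path


def save_message(raw, directory):
    """Write raw bytes to a uniquely named file and return its path."""
    directory = Path(directory)
    name = f"{int(time.time())}.{uuid.uuid4().hex}.eml"
    tmp = directory / f".tmp.{name}"
    final = directory / name
    # Write under a temporary name first so a reader never sees a partial file.
    with open(tmp, "wb") as fh:
        fh.write(raw)
        fh.flush()
        os.fsync(fh.fileno())
    # Renaming within the same directory is atomic on POSIX filesystems.
    os.replace(tmp, final)
    return final


# Hypothetical invocation: save-pristine.py <directory>, message on stdin.
if __name__ == "__main__" and len(sys.argv) == 2:
    save_message(sys.stdin.buffer.read(), sys.argv[1])
```

Whether the bytes on stdin are truly "as received" depends on where in the
delivery chain the pipe runs, so that part would still need checking.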

Jeff

> Jeff Rice <mailto:list1 at jrice.me>
> August 11, 2014 at 11:42 AM
> Hello,
> I'm trying to work out a way to have my Sieve filter save a "pristine" 
> version of email messages as a backup, primarily to use for training 
> the spam filter.  What I would like is to have every message saved into 
> a single, site-wide directory (in the global sieve) before being 
> processed further and delivered.  The messages in that directory 
> will be used to train the spam filter without having to worry about 
> removing SpamAssassin headers and so forth.
>
> I thought fileinto :copy might do what I wanted, but this creates a 
> backup directory individually for each user.  That's unmanageable for 
> the spam training process I use. redirect *could* work, but that adds 
> a header during the process so the email saved would not be "pristine".
>
> I'm thinking of using the extprograms plugin to pipe to a program that 
> will do a simple copy.  That feels very hackish, however, and I'm 
> hoping there is a more elegant solution.
>
> Am I missing something obvious here?
>
> Thanks!
> Jeff

