On Tue, 2009-10-27 at 19:28 -0400, Timo Sirainen wrote:
On Tue, 2009-09-01 at 22:20 +0200, Karsten Bräckelmann wrote:
The mail that is being trained differs from its respective source in the mbox file. The trained one shows added trailing carriage-return characters on all headers, which are not present in the headers in the mbox file.
This breaks sa-learn -- the two variants differ, and SA would learn *both* when run against each one separately.
How come? Any insight?
Probably because incoming mails have CRLF line endings. The antispam plugin could drop the CRs by wrapping the input stream returned by mail_get_stream() in i_stream_create_lf().
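To illustrate what dropping the carriage returns would amount to, here is a minimal sketch in Python rather than Dovecot's C. The helper name strip_cr_before_lf and the sample message are made up for illustration; this is not the antispam plugin's code, and the SHA-1 digest is only a stand-in for "the two byte streams are not identical", not a claim about how SA identifies messages.

```python
import hashlib


def strip_cr_before_lf(data: bytes) -> bytes:
    """Convert CRLF line endings to bare LF; a lone CR is left untouched."""
    return data.replace(b"\r\n", b"\n")


def digest(data: bytes) -> str:
    """Hex digest used here only to compare two byte streams."""
    return hashlib.sha1(data).hexdigest()


# Hypothetical sample: the same message as stored in the mbox (LF only)
# and as handed to training (CRLF on the header lines).
stored = b"From: a@example.org\nSubject: test\n\nbody\n"
trained = b"From: a@example.org\r\nSubject: test\r\n\r\nbody\n"

# The raw variants differ byte for byte ...
same_before = digest(stored) == digest(trained)
# ... but match once the CRs in front of LFs are dropped.
same_after = digest(stored) == digest(strip_cr_before_lf(trained))

print(same_before, same_after)
```

Note that this normalizes the whole message, body included, which is exactly the trade-off between a consistent copy and a pristine one.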
I'm not sure this is what we want -- shouldn't we keep it as pristine as possible?
However, I don't quite follow Karsten anyway: which message is "the trained one"? Karsten, please list the three relevant messages: the one first handed to SA _before_ Dovecot gets involved, the one stored, and the one handed to SA via antispam.
johannes