[Dovecot] antispam-plugin 1.2 and trailing carriage-returns
Guys,
I'm running Dovecot 1.0.15 [1] and just built the latest antispam-plugin 1.2 (tarball) for testing, using the mailtrain backend for SA integration. Both were built from custom spec files.
The mail that is being trained differs from its source in the mbox file. The trained copy has trailing carriage-return chars added to all headers, which are not in the headers in the mbox file.
This breaks sa-learn -- both these variations are different, and SA would learn *both* when run against each one separately.
How come? Any insight? How could I fix this, other than wrapping sa-learn inside another shell script and having sed strip off the noise? This becomes more of an issue once I switch from sa-learn to the lightning-fast spamc training variant.
TIA
guenther
[1] Yes, I know, sorry. Don't want to change everything at the same time, and the target system I'm experimenting for runs that version, too.
-- char *t="\10pse\0r\0dtu\0.@ghno\x4e\xc8\x79\xf4\xab\x51\x8a\x10\xf4\xf4\xc4"; main(){ char h,m=h=*t++,*x=t+2*h,c,i,l=*x,s=0; for (i=0;i<l;i++){ i%8? c<<=1: (c=*++x); c&128 && (s+=h); if (!(h>>=1)||!t[s+h]){ putchar(t[s]);h=m;s=0; }}}
*nudge* Anyone? Since Timo seems to be on a list processing spree lately, here's hoping. :)
On Tue, 2009-09-01 at 22:20 +0200, Karsten Bräckelmann wrote:
> The mail that is being trained differs from its source in the mbox file. The trained copy has trailing carriage-return chars added to all headers, which are not in the headers in the mbox file.
> This breaks sa-learn -- both these variations are different, and SA would learn *both* when run against each one separately.
> How come? Any insight?
Probably because incoming mails have CRLF linefeeds. The antispam plugin could drop these by wrapping the input stream returned by mail_get_stream() in i_stream_create_lf().
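As a stopgap outside the plugin, the wrapper approach Karsten mentions can be sketched in a few lines. This is a minimal Python sketch, not antispam-plugin or Dovecot code: the names crlf_to_lf and train are made up for illustration, and the trainer command is supplied by the caller, so nothing here depends on a particular sa-learn or spamc invocation.

```python
import subprocess
from typing import Sequence


def crlf_to_lf(raw: bytes) -> bytes:
    """Replace every CRLF pair with a bare LF; lone CRs are left untouched."""
    return raw.replace(b"\r\n", b"\n")


def train(raw: bytes, command: Sequence[str]) -> int:
    """Feed the LF-normalised message to `command` on stdin.

    `command` is whatever trainer the caller wants to run (hypothetical
    here); the function returns the trainer's exit status.
    """
    result = subprocess.run(list(command), input=crlf_to_lf(raw), check=False)
    return result.returncode
```

A mailtrain-style script could then call something like train(sys.stdin.buffer.read(), ["sa-learn", "--spam"]), assuming the trainer accepts a single message on standard input; that assumption should be checked against the tool's own documentation.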
On Tue, 2009-10-27 at 19:28 -0400, Timo Sirainen wrote:
> On Tue, 2009-09-01 at 22:20 +0200, Karsten Bräckelmann wrote:
> > The mail that is being trained differs from its source in the mbox file. The trained copy has trailing carriage-return chars added to all headers, which are not in the headers in the mbox file.
> > This breaks sa-learn -- both these variations are different, and SA would learn *both* when run against each one separately.
> > How come? Any insight?
> Probably because incoming mails have CRLF linefeeds. The antispam plugin could drop these by wrapping the input stream returned by mail_get_stream() in i_stream_create_lf().
I'm not sure this is what we want -- shouldn't we keep it as pristine as possible?
However, I don't understand Karsten anyway: which message is "the trained one"? Karsten, please list the three relevant messages: the one first handed to SA _before_ dovecot gets involved, the one stored, and the one handed to SA via antispam.
johannes
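To compare the three copies Johannes asks for, it can help to count line endings rather than eyeball them. The following is a small Python sketch; the function name count_line_endings is hypothetical, and it only inspects raw bytes, assuming nothing about Dovecot or SA.

```python
from typing import Dict


def count_line_endings(raw: bytes) -> Dict[str, int]:
    """Count CRLF pairs, bare LFs and bare CRs in a raw message."""
    crlf = raw.count(b"\r\n")
    bare_lf = raw.count(b"\n") - crlf   # LFs not preceded by CR
    bare_cr = raw.count(b"\r") - crlf   # CRs not followed by LF
    return {"crlf": crlf, "lf": bare_lf, "cr": bare_cr}
```

Running this over each saved copy (for example, open(path, "rb").read() for a hypothetical path) shows which of the three actually carries the carriage returns.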
On Oct 28, 2009, at 3:02 AM, Johannes Berg wrote:
> > Probably because incoming mails have CRLF linefeeds. The antispam plugin could drop these by wrapping the input stream returned by mail_get_stream() in i_stream_create_lf().
> I'm not sure this is what we want -- shouldn't we keep it as
> pristine as possible?
The linefeeds don't really matter much. For example IMAP and SMTP
require sending CRLF linefeeds and Dovecot always converts them to
just LFs before saving the messages. Then depending on how the input
comes, it may have CRLF or LF linefeeds.
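The round trip Timo describes can be illustrated with plain byte strings. This is only a Python sketch of the conversion in each direction, not Dovecot's actual implementation; to_wire and to_stored are hypothetical names, and the sample message is invented.

```python
import re


def to_wire(raw: bytes) -> bytes:
    """CRLF form: add a CR before every LF that does not already have one."""
    return re.sub(rb"(?<!\r)\n", b"\r\n", raw)


def to_stored(raw: bytes) -> bytes:
    """LF-only form: collapse every CRLF pair to a single LF."""
    return raw.replace(b"\r\n", b"\n")


# Invented sample message in LF-only form.
stored = b"Subject: test\n\nhello\n"
wire = to_wire(stored)

# The two forms are byte-for-byte different, but converting the CRLF
# form back yields the LF-only copy again.
round_trip_ok = to_stored(wire) == stored
```

The point of the sketch is that a tool comparing raw bytes sees two different messages unless both copies are normalised to the same form first.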
On Wed, 2009-10-28 at 03:07 -0400, Timo Sirainen wrote:
> On Oct 28, 2009, at 3:02 AM, Johannes Berg wrote:
> > > Probably because incoming mails have CRLF linefeeds. The antispam plugin could drop these by wrapping the input stream returned by mail_get_stream() in i_stream_create_lf().
> > I'm not sure this is what we want -- shouldn't we keep it as
> > pristine as possible?
> The linefeeds don't really matter much. For example IMAP and SMTP
> require sending CRLF linefeeds and Dovecot always converts them to
> just LFs before saving the messages. Then depending on how the input
> comes, it may have CRLF or LF linefeeds.
Indeed. But I think Karsten is saying that his mail comes with CRLF via SMTP, and is seen by SA with CRLF, and then when it comes back to SA via antispam, it now has just LF. In a sense, dovecot is normalising linefeeds to LF, which seems to be causing him problems.
johannes
participants (3): Johannes Berg, Karsten Bräckelmann, Timo Sirainen