Hi,
I'm using Dovecot for my mail with Pigeonhole's Sieve plugin, from Debian stable (2.3.4.1 (f79e8e7e4)).
One of the sieve stanzas that I use is for duplicate elimination:

    if duplicate {
        discard;
        stop;
    }

This mostly works, but duplicates of some messages from certain domains are not eliminated.
From /var/log/mail.log, I see:

    Feb 18 12:48:50 mail dovecot: lmtp(24320:linux@xxx.armlinux.org.uk): sieve: msgid=? VI1PR04MB513558BF77192255CBE12102B0110@VI1PR04MB5135.eurprd04.prod.outlook.com: stored mail into mailbox 'INBOX'
    Feb 18 12:49:42 mail dovecot: lmtp(24320:linux@xxx.armlinux.org.uk): sieve: msgid=VI1PR04MB513558BF77192255CBE12102B0110@VI1PR04MB5135.eurprd04.prod.outlook.com: stored mail into mailbox 'INBOX'

The first was received directly from the sender, with a Message-ID header formatted like this:

    Message-ID:
     <VI1PR04MB513558BF77192255CBE12102B0110@VI1PR04MB5135.eurprd04.prod.outlook.com>

The second was received via the mailing list, with a Message-ID header formatted like this:

    Message-ID: <VI1PR04MB513558BF77192255CBE12102B0110@VI1PR04MB5135.eurprd04.prod.outlook.com>

It appears that the parsed Message-ID value Dovecot uses includes whitespace, including newline characters.
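To make the suspected failure mode concrete, here is a minimal Python sketch. It is not Dovecot's code; it only assumes, as the log output suggests, that the duplicate key is the raw Message-ID field body with its folding whitespace kept. The sample messages `DIRECT` and `VIA_LIST` and the helper `raw_key()` are hypothetical. Python's `email` parser under its default `compat32` policy happens to keep the fold, which makes it a convenient stand-in.

```python
import email

MSGID = "VI1PR04MB513558BF77192255CBE12102B0110@VI1PR04MB5135.eurprd04.prod.outlook.com"

# Hypothetical samples: the same msg-id, once folded onto a continuation
# line and once written on the same line as the field name.
DIRECT = "Message-ID:\n <" + MSGID + ">\nSubject: test\n\nbody\n"
VIA_LIST = "Message-ID: <" + MSGID + ">\nSubject: test\n\nbody\n"


def raw_key(raw_message: str) -> str:
    """Return the Message-ID field body as parsed, folding whitespace included.

    email.message_from_string() uses the compat32 policy by default, which
    keeps the line break and the continuation line's leading space.
    """
    msg = email.message_from_string(raw_message)
    return msg["Message-ID"]


# Naive duplicate tracking keyed on the raw value (an assumption, see above).
seen = set()
for raw in (DIRECT, VIA_LIST):
    key = raw_key(raw)
    print("duplicate" if key in seen else "new", repr(key))
    seen.add(key)
```

Both messages are reported as new, because the two keys differ only in the leading line break and space.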
RFC 5322 gives the Message-ID header format as:

    message-id = "Message-ID:" msg-id CRLF
    msg-id     = [CFWS] "<" id-left "@" id-right ">" [CFWS]

It goes on to say:

    The message identifier (msg-id) itself MUST be a globally unique
    identifier for a message.

and:

    Semantically, the angle bracket characters are not part of the
    msg-id; the msg-id is what is contained between the two angle bracket
    characters.

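The RFC 5322 semantics quoted above suggest keying on the text between the angle brackets only. The following Python sketch, built around a hypothetical helper `msg_id()`, unfolds the field, strips CFWS comments and extracts that text. It is a simplification (no quoted-pair escapes in comments, no obsolete syntax), not a complete RFC 5322 parser, and not what Pigeonhole actually does; the trailing "(via list)" comment in the example is made up for illustration.

```python
import re
from typing import Optional

# An innermost CFWS comment: "(" ... ")" with no parentheses inside.
_COMMENT = re.compile(r"\([^()]*\)")
# An angle-bracketed msg-id: id-left "@" id-right, no whitespace or brackets.
_MSG_ID = re.compile(r"<([^<>\s@]+@[^<>\s]+)>")


def msg_id(field_body: str) -> Optional[str]:
    """Return the text between "<" and ">" in a Message-ID field body.

    Simplified reading of RFC 5322, section 3.6.4: unfold the field, drop
    comments, keep only what lies inside the angle brackets. Quoted-pair
    escapes in comments and obsolete syntax are not handled.
    """
    # Unfolding: a line break followed by whitespace is just whitespace.
    body = re.sub(r"\r?\n(?=[ \t])", "", field_body)
    # Comments may nest, so strip innermost ones until nothing changes.
    previous = None
    while previous != body:
        previous, body = body, _COMMENT.sub(" ", body)
    match = _MSG_ID.search(body)
    return match.group(1) if match else None


ID = "VI1PR04MB513558BF77192255CBE12102B0110@VI1PR04MB5135.eurprd04.prod.outlook.com"
folded = "\r\n <" + ID + ">"            # continuation-line form
unfolded = "<" + ID + "> (via list)"    # same line, plus a CFWS comment
print(msg_id(folded) == msg_id(unfolded) == ID)  # True
```

With this normalisation both the folded and the unfolded form yield the same key, so the second copy would be recognised as a duplicate.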
However, Dovecot's Sieve implementation seems to use the entire Message-ID field body, including the CFWS (and therefore any folding whitespace), as the identifier for duplicate detection. This looks wrong: it causes duplicates to go undetected, and so seems like a bug.
Is there a workaround for this, and/or can it be changed?
-- 
Russell King