[Dovecot] maildir and end-of-line encoding
Hi.
I was just wondering about the following:
My MDA may get mails that use LF or CR/LF end-of-line encodings and deliver them into maildirs.
I couldn't find any information about whether one should or must convert everything into one format, because AFAIK CR/LF is always used, at least on the IMAP side.
How does this work on the maildir/backend side of Dovecot? Can it work with both and simply convert LF into CR/LF automatically?
Thanks, Chris.
On 31.10.2012, at 3.50, Christoph Anton Mitterer wrote:
I was just wondering about the following:
My MDA may get mails that use LF or CR/LF end-of-line encodings and deliver them into maildirs.
I couldn't find any information about whether one should or must convert everything into one format, because AFAIK CR/LF is always used, at least on the IMAP side.
How does this work on the maildir/backend side of Dovecot? Can it work with both and simply convert LF into CR/LF automatically?
Dovecot automatically adds CRs where necessary. Even within the same file there can be mixed LF/CRLF lines.
On Wed, 2012-11-07 at 17:33 +0200, Timo Sirainen wrote:
Dovecot automatically adds CRs where necessary. Even within the same file there can be mixed LF/CRLF lines.
Can you detail this a bit, or point me to the specific code areas?
Is only CR added? Or also LF?
What happens e.g. when LFCR is found? Is that then "doubled" to CRLFCR or even CRLFCRLF?
When does it "add" these chars? Only when using dovecot-lda? Or also when some other MDA places files into e.g. a maildir?
I did some reading of RFC 5322, which says:
new mails must not contain a lone CR or LF; both may only occur together as CRLF
but, carried over from the previous RFCs, it allows existing messages to contain CR and LF alone, in which case they are not newlines like CRLF, but rather the CR and LF characters in their meaning as control characters.
So from that point of view... automatic conversion may actually "corrupt" things in a strict sense. (One should hope of course, that only few people use(d) CR or LF alone to get their control character meaning... but rather that these are just cases of accidents.)
I agree with you that mails should be stored with CRLF, as this is their native format... and I found nothing in the maildir[++] standards that would forbid that (nor anything that would encourage it). But for mbox there are "definitions" saying that LF is _always_ used (AFAIU, even on non-UNIX platforms).
I went through my mails and basically found everything: CR, LF, CRLF and even LFCR. Now I have no real idea how to deal with that. Keep everything as is? Turn all LFs into CRLFs and/or all CRs into CRLFs? What about the LFCRs? Handle them as a group and perhaps swap them to CRLF? Or do the same as with single LFs and CRs?
Cheers, Chris.
On 8.11.2012, at 4.57, Christoph Anton Mitterer wrote:
On Wed, 2012-11-07 at 17:33 +0200, Timo Sirainen wrote:
Dovecot automatically adds CRs where necessary. Even within the same file there can be mixed LF/CRLF lines.
Can you detail this a bit, or point me to the specific code areas?
- Is only CR added? Or also LF?
A lone CR is not treated as a newline. So the only change made is that a CR may be added before an LF.
- What happens e.g. when LFCR is found? Is that then "doubled" to CRLFCR or even CRLFCRLF?
CRLFCR
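The rule described in these two answers can be illustrated with a minimal sketch. This is not Dovecot's actual code; the function name `add_missing_crs` is hypothetical, and the sketch only assumes what is stated above: a CR is inserted before any LF that lacks one, and a lone CR is left untouched.

```python
import re

# An LF that is not already preceded by a CR.
_BARE_LF = re.compile(rb"(?<!\r)\n")


def add_missing_crs(data: bytes) -> bytes:
    """Insert a CR before every bare LF; lone CRs are not treated as newlines."""
    return _BARE_LF.sub(b"\r\n", data)


print(add_missing_crs(b"a\nb\r\nc"))  # mixed LF/CRLF -> b'a\r\nb\r\nc'
print(add_missing_crs(b"a\n\rb"))     # LFCR -> CRLFCR: b'a\r\n\rb'
print(add_missing_crs(b"a\rb"))       # lone CR unchanged: b'a\rb'
```

Under this rule LFCR becomes CRLFCR rather than CRLFCRLF, because the trailing CR is not followed by an LF and is therefore passed through as ordinary data.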
- When does it "add" these chars? Only when using dovecot-lda? Or also when some other MDA places files into e.g. a maildir?
When saving a mail, depending on the mail_save_crlf setting, CRs are either added or removed when the mail is written to disk. When reading a mail and sending it to an IMAP/POP3 client, CRs are always added. (doveadm fetch text doesn't add/remove CRs, I think.)
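As a rough illustration of the save-time behaviour described here, the following sketch models the two directions with a boolean flag. The function `normalize_for_save` and its `save_crlf` parameter are hypothetical stand-ins for the effect of mail_save_crlf, not Dovecot internals, and it is an assumption of this sketch that "removing CRs" means dropping only the CR of a CRLF pair.

```python
import re


def normalize_for_save(data: bytes, save_crlf: bool) -> bytes:
    """Model of writing a mail to disk with CRLF or LF-only line endings."""
    if save_crlf:
        # Add a CR before every LF that does not already have one.
        return re.sub(rb"(?<!\r)\n", b"\r\n", data)
    # Assumption: only the CR of a CRLF pair is dropped; lone CRs stay.
    return data.replace(b"\r\n", b"\n")


msg = b"Subject: test\n\nline one\r\nline two\n"
print(normalize_for_save(msg, True))
print(normalize_for_save(msg, False))
```

Either way, the message sent to an IMAP/POP3 client ends up with CRLF line endings, since CRs are added again on read where they are missing.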
I did some reading of RFC 5322, which says:
new mails must not contain a lone CR or LF; both may only occur together as CRLF
but, carried over from the previous RFCs, it allows existing messages to contain CR and LF alone, in which case they are not newlines like CRLF, but rather the CR and LF characters in their meaning as control characters.
- So from that point of view... automatic conversion may actually "corrupt" things in a strict sense. (One should hope of course, that only few people use(d) CR or LF alone to get their control character meaning... but rather that these are just cases of accidents.)
The SMTP and IMAP protocols are the only normal ways to get messages into a system. Both of them require CRLF newlines. So there's really no way for Dovecot to ever see valid LF-only newlines. One exception is Content-Transfer-Encoding: binary, but that's not really supported by Dovecot (or by any commonly used SMTP servers either, I think).
- I agree with you that mails should be stored with CRLF, as this is their native format... and I found nothing in the maildir[++] standards that would forbid that (nor anything that would encourage it). But for mbox there are "definitions" saying that LF is _always_ used (AFAIU, even on non-UNIX platforms).
mbox isn't really standardized. Anyway, storing mails with CRLF allows some optimizations, but if the mails aren't stored compressed it wastes a bit of disk space.
- I went through my mails and basically found everything: CR, LF, CRLF and even LFCR. Now I have no real idea how to deal with that. Keep everything as is? Turn all LFs into CRLFs and/or all CRs into CRLFs? What about the LFCRs? Handle them as a group and perhaps swap them to CRLF? Or do the same as with single LFs and CRs?
Why do you need to do something about them? Dovecot should handle all of them fine.
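For anyone who wants to repeat the kind of survey mentioned in the question (which line-ending sequences actually occur in stored mails), a minimal sketch follows. The function name `count_line_endings` is hypothetical, and how ambiguous runs are classified is a choice made for this sketch: CRLF is preferred, so LF CR LF is counted as one LF followed by one CRLF rather than as LFCR plus LF.

```python
import re
from collections import Counter

# CRLF first; LFCR only when its CR does not start a CRLF; then lone CR / LF.
_EOL = re.compile(rb"\r\n|\n\r(?!\n)|\r|\n")
_NAMES = {b"\r\n": "CRLF", b"\n\r": "LFCR", b"\r": "CR", b"\n": "LF"}


def count_line_endings(data: bytes) -> Counter:
    """Count CRLF, LFCR, lone CR and lone LF sequences in a raw message."""
    return Counter(_NAMES[m.group()] for m in _EOL.finditer(data))


sample = b"a\r\nb\nc\rd\n\re"
print(count_line_endings(sample))
```

To survey a maildir, one could read each file under its cur/ and new/ directories in binary mode and sum the resulting counters. The sketch only reports what is there; it does not rewrite anything, which matches the advice above that the files can be left as they are.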
Participants (2): Christoph Anton Mitterer, Timo Sirainen