[Dovecot] SORT(DATE) and missing Date headers
Hello,
Occasionally, I get mail that's missing a Date header. The usual suspects are iTunes weekly mailings and NYTimes email-to-a-friend articles. I use qmail, which doesn't "fix" these malformed emails by adding a Date header (like Sendmail does), so when they get to my mailbox, they're just as badly formed as they were when they were sent, which is to say, they have no Date header. Unfortunately, this means that SORT (DATE) puts them at the beginning of the list---and in a mailbox with lots of old mail, that's usually the wrong spot.
Now, my email clients will do some basic guesswork about when the message was sent, usually by falling back on the timestamp of the first-added Received header. So, they will display a date. But they then have to sort all messages themselves in order to get a sensible ordering. Would it be possible to get Dovecot to fall back to Received header parsing whenever a message is missing a Date header? The idea is that SORT (DATE) would then become useful even in the face of commonly malformed email. How hard would this be to hack into the current Dovecot source? Does this maybe exist already (e.g. as SORT (X-DATE))?
~Kyle
There are four boxes to be used in defense of liberty: soap, ballot, jury, and ammo. Please use in that order. -- Ed Howdershelt
We recently encountered this with a new VOIP voicemail system. However, using Thunderbird at least, the time the message file was written is used (probably using "INTERNALDATE").
This might only apply if you're using Maildirs, but it suggests that the behavior you're seeing might be specific to Mutt.
Rich
On Monday, September 24 at 04:19 PM, quoth Rich at Whidbey Telecom:
We recently encountered this with a new VOIP voicemail system. However, using Thunderbird at least, the time the message file was written is used (probably using "INTERNALDATE").
This might only apply if you're using Maildirs, but it suggests that the behavior you're seeing might be specific to Mutt.
It's not mutt, actually. The client where this became an issue is RoundCube webmail. Mutt falls back to INTERNALDATE, but think about it: it requires sorting on the client's side, because you're combining SORT(DATE) and SORT(ARRIVAL), which completely negates the whole point of server-side sorting. Even then, though, INTERNALDATE is a bad approximation if I want to sort by time *sent* (compare, for example, a message that I bounced from my work address to my home mailbox). The best approximation would be to use the first-added Received header (which could be several days different from INTERNALDATE).
Yes, I know *most* mail clients have a workaround for this problem. Is that really the best solution, though? To simply apply a workaround to all IMAP clients, rather than fix it in the IMAP server?
~Kyle
Never think that war, no matter how necessary, no matter how justified, is not a crime. -- Ernest Hemingway
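To make the ordering problem above concrete, here is a minimal, self-contained C sketch of a sort key that prefers the Date header, then the first-added Received timestamp, and only then the arrival time. The struct msg_times record, its field names, and the sort_by_sent() helper are hypothetical illustrations, not Dovecot types or functions; a real server would fill such values from its own index.

```c
#include <stdlib.h>
#include <time.h>

/* Hypothetical per-message record; field names are illustrative only. */
struct msg_times {
    unsigned int uid;
    time_t date;            /* parsed Date header, (time_t)-1 if missing */
    time_t first_received;  /* earliest Received timestamp, (time_t)-1 if none */
    time_t internal_date;   /* arrival time at this server (INTERNALDATE) */
};

/* Best available estimate of when the message was sent. */
static time_t sent_key(const struct msg_times *m)
{
    if (m->date != (time_t)-1)
        return m->date;
    if (m->first_received != (time_t)-1)
        return m->first_received;
    return m->internal_date;
}

/* qsort() comparator: ascending by estimated sent time, UID breaks ties. */
static int cmp_sent(const void *a, const void *b)
{
    const struct msg_times *ma = a, *mb = b;
    time_t ka = sent_key(ma), kb = sent_key(mb);

    if (ka != kb)
        return (ka > kb) - (ka < kb);
    return (ma->uid > mb->uid) - (ma->uid < mb->uid);
}

static void sort_by_sent(struct msg_times *msgs, size_t n)
{
    qsort(msgs, n, sizeof(msgs[0]), cmp_sent);
}
```

With a key like this, a message lacking a Date header lands next to mail sent around the same time instead of at the very start of the list.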
On Mon, 2007-09-24 at 14:38 -0500, Kyle Wheeler wrote:
Would it be possible to get Dovecot to fall back to Received header parsing whenever a message is missing a Date header? The idea is that SORT (DATE) would then become useful even in the face of commonly malformed email.
Unfortunately that would violate the SORT spec.
How hard would this be to hack into the current Dovecot source?
Replace mail_get_date() calls in src/imap/imap-sort.c with something like:
    t = mail_get_date(..);
    if (t == (time_t)-1)
        t = mail_get_received_date(..);
On Tuesday, September 25 at 12:39 PM, quoth Timo Sirainen:
How hard would this be to hack into the current Dovecot source?
Replace mail_get_date() calls in src/imap/imap-sort.c with something like:
    t = mail_get_date(..);
    if (t == (time_t)-1)
        t = mail_get_received_date(..);
That's not *quite* what I meant. ARRIVAL is "when did this mail get here", while DATE is supposed to be "when was this mail sent". My thought here is that "when was this mail sent" can be approximated in the absence of a Date header by checking the earliest timestamp in the Received headers. mail_get_received_date() returns the *latest* timestamp in the Received headers (actually, in a Maildir backend, it just returns the fstat of the message file), so in a folder full of messages without Date headers, SORT(DATE) and SORT(ARRIVAL) would be identical, which is not what I'm aiming for.
But, looking at it, I guess this becomes pretty difficult. To work around the fstat, I'd have to do something even more complicated than an mbox _read() to find the Received header with the oldest timestamp. Is that correct?
~Kyle
The sacred rights of mankind are not to be rummaged for, among old parchments, or musty records. They are written, as with a sun beam in the whole volume of human nature, by the hand of the divinity itself; and can never be erased or obscured by mortal power. -- Alexander Hamilton, 1775
On Tue, 2007-09-25 at 09:58 -0500, Kyle Wheeler wrote:
On Tuesday, September 25 at 12:39 PM, quoth Timo Sirainen:
How hard would this be to hack into the current Dovecot source?
Replace mail_get_date() calls in src/imap/imap-sort.c with something like:
    t = mail_get_date(..);
    if (t == (time_t)-1)
        t = mail_get_received_date(..);
That's not *quite* what I meant. ARRIVAL is "when did this mail get here", while DATE is supposed to be "when was this mail sent". My thought here is that "when was this mail sent" can be approximated in the absence of a Date header by checking the earliest timestamp in the Received headers.
So, something like:
    const char *const *headers = mail_get_headers(mail, "Received");
    if (headers != NULL && headers[0] != NULL) {
        while (headers[1] != NULL)
            headers++;
        // do your Received header parsing magic for headers[0]
    }
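The reason for walking to the end of the list: each mail server prepends its own Received header, so the last one listed in the message is the first one that was added. As a standalone illustration of that loop, here is a small sketch that needs no Dovecot headers. The helper name last_entry() is made up for this example.

```c
#include <stddef.h>

/* Return the last entry of a NULL-terminated string array, or NULL if the
 * array itself is NULL or has no entries. */
static const char *last_entry(const char *const *list)
{
    if (list == NULL || list[0] == NULL)
        return NULL;
    while (list[1] != NULL)
        list++;             /* advance until the next slot is the terminator */
    return list[0];
}
```

Given the Received values in the order they appear in the message, this returns the one written by the first server to handle the mail.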
On Tuesday, September 25 at 06:26 PM, quoth Timo Sirainen:
That's not *quite* what I meant. ARRIVAL is "when did this mail get here", while DATE is supposed to be "when was this mail sent". My thought here is that "when was this mail sent" can be approximated in the absence of a Date header by checking the earliest timestamp in the Received headers.
So, something like:
    const char *const *headers = mail_get_headers(mail, "Received");
    if (headers != NULL && headers[0] != NULL) {
        while (headers[1] != NULL)
            headers++;
        // do your Received header parsing magic for headers[0]
    }
Aha! That's perfect! (and so simple!)
If anyone in the future is interested in the code for this, here's what I did that works for me. This goes in all three places that mail_get_date() is used in the code:
    t = mail_get_date(mail, NULL);
    if (t == (time_t)-1 || t == 0) {
        const char *const *headers = mail_get_headers(mail, "Received");
        if (headers != NULL && headers[0] != NULL) {
            const char *curs;
            int tz;

            /* the last Received header listed is the first one added */
            while (headers[1] != NULL)
                headers++;
            /* find the semicolon, skipping (comments) */
            curs = headers[0];
            while (curs[0] != ';' && curs[0] != 0) {
                if (curs[0] == '(') {
                    while (curs[0] != ')' && curs[0] != 0)
                        curs++;
                }
                if (curs[0] != 0)
                    curs++;
            }
            /* parse the date that follows the semicolon */
            if (curs[0] == ';' && curs[1] != 0) {
                curs++;
                message_date_parse((const unsigned char *)curs,
                                   strlen(curs), &t, &tz);
            }
        }
    }
Thanks, Timo!
~Kyle
Coffee is the common man's gold, and like gold, it brings to every person the feeling of luxury and nobility. -- Sheik Abd-al-Kadir
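For anyone who wants to try the semicolon search outside of Dovecot, here is a minimal standalone sketch of that step only. It returns a pointer to the text after the first ';' that is not inside a parenthesized comment, and it also handles nested comments and backslash escapes. The function name received_date_text() is hypothetical, and quoted strings containing ';' are not handled, so treat it as an approximation rather than a full header parser. The returned text would still need to go through a date parser such as message_date_parse().

```c
#include <stddef.h>

/* Return a pointer to the date-time text that follows the ';' in a Received
 * header value, or NULL if no ';' is found outside of comments.
 * Parenthesized comments may nest and may contain ';'; inside a comment a
 * backslash escapes the next character. */
static const char *received_date_text(const char *value)
{
    int depth = 0;          /* current comment nesting level */
    const char *p;

    if (value == NULL)
        return NULL;
    for (p = value; *p != '\0'; p++) {
        if (depth > 0) {
            if (*p == '\\' && p[1] != '\0')
                p++;        /* skip the escaped character */
            else if (*p == '(')
                depth++;
            else if (*p == ')')
                depth--;
        } else if (*p == '(') {
            depth = 1;
        } else if (*p == ';') {
            p++;
            /* skip whitespace, including folded-line breaks */
            while (*p == ' ' || *p == '\t' || *p == '\r' || *p == '\n')
                p++;
            return p;
        }
    }
    return NULL;
}
```

Note that a header ending in a bare ';' yields an empty string rather than NULL, so a caller should check for that before parsing.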
On Tuesday, October 2 at 09:49 AM, quoth Kyle Wheeler:
Aha! That's perfect! (and so simple!)
If anyone in the future is interested in the code for this, here's what I did that works for me. This goes in all three places that mail_get_date() is used in the code:
I put a patch for this (and the two other things I regularly patch Dovecot to do) up on the web, in case other people find them useful: http://www.memoryhole.net/dovecot/
~Kyle
The average Ph.D thesis is nothing but the transference of bones from one graveyard to another. -- J. Frank Dobie, "A Texan in England"
participants (3)
- Kyle Wheeler
- Rich at Whidbey Telecom
- Timo Sirainen