[Dovecot] IMAP threading - the THREAD command
Hi folks, there's an extension [1] to IMAP specifying the THREAD command, which returns a list of message sequence numbers grouped to indicate their threading. The draft currently defines two algorithms: one based solely on Subjects (which is rather dumb), and a second one that combines Subjects with In-Reply-To (etc.) headers.
The second method is actually advertised and supported by Dovecot. However, I've got quite a lot of friends who lack the knowledge of what the "Subject" field is for, so they just leave it empty. The problem comes when I want Dovecot to return the threads: it follows the IETF draft, so it checks not only the references in message headers but also the subjects. The result is that mails with empty subjects in particular are incorrectly grouped together.
So, I'd like to see a feature that would eliminate this problem. It seems to me that there are currently two methods:
a) Don't look at the Subject field at all - although it might increase the speed a bit, it has some disadvantages as well, namely that you'll lose threading if someone uses a broken mail client (but broken only in such a way that it preserves subjects)
b) Ignore the Subject field if it is empty (after stripping stuff like "re:" etc - just as that draft specifies). Seems reasonable, IMHO.
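A minimal sketch of option b), written in Python for brevity rather than in Dovecot's C. The prefix stripping here is a simplified assumption (only "Re:", "Fw:" and "Fwd:"), not the draft's full base-subject extraction, and the names base_subject and group_by_subject are hypothetical:

```python
import re
from collections import defaultdict

# Simplified assumption: strip only leading "Re:", "Fw:" and "Fwd:" prefixes
# (any case, possibly repeated). The draft's real rules cover more cases.
_PREFIX = re.compile(r"^\s*(?:(?:re|fwd?)\s*:\s*)+", re.IGNORECASE)


def base_subject(subject):
    """Return the subject with reply/forward prefixes removed,
    whitespace collapsed, and case folded to lowercase."""
    stripped = _PREFIX.sub("", subject or "")
    return " ".join(stripped.split()).lower()


def group_by_subject(messages):
    """Group (msg_id, subject) pairs by base subject.

    Messages whose base subject is empty are never merged with each
    other; they are returned separately as 'ungrouped'.
    """
    groups = defaultdict(list)
    ungrouped = []
    for msg_id, subject in messages:
        key = base_subject(subject)
        if key:
            groups[key].append(msg_id)
        else:
            ungrouped.append(msg_id)
    return dict(groups), ungrouped
```

With this, a mail with no subject and a mail whose subject is just "Re:" both end up ungrouped instead of being merged into one bogus thread.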
Of course we can't modify the result of a THREAD command (that would break the standard and it generally isn't a proper way to go), so we'd have to introduce another one.
I'm pretty sure that especially method a) is very simple (and even simpler if I decided to break the standard by "fixing" the THREAD command), but I really don't like <rant>portable assembler :)</rant> <politely>C</politely>, so are there any chances for this "new and fixed" command to be included into Dovecot? I mean, I might do that in a flawed and ugly way just for me and post a patch so that possibly interested persons can pick it up, or I might persuade someone to do it, actually :). Do you think that it's worth the effort?
Does anyone have any experience with talking to the IETF about a possible extension of their draft?
Ideas?
[1] http://www.ietf.org/internet-drafts/draft-ietf-imapext-sort-17.txt
Cheers, -jkt
-- cd /local/pub && more beer > /dev/mouth
On Fri, 2006-04-07 at 15:50 +0200, Jan Kundrát wrote:
The second method is actually advertised and supported by Dovecot. However, I've got quite a lot of friends who lack the knowledge of what the "Subject" field is for, so they just leave it empty. The problem comes when I want Dovecot to return the threads: it follows the IETF draft, so it checks not only the references in message headers but also the subjects. The result is that mails with empty subjects in particular are incorrectly grouped together.
I don't really like that either. I don't have the problem with empty subjects, but the same generic subjects are used by different people once in a while, such as "Dovecot".
a) Don't look at the Subject field at all - although it might increase the speed a bit, it has some disadvantages as well, namely that you'll lose threading if someone uses a broken mail client (but broken only in such a way that it preserves subjects)
b) Ignore the Subject field if it is empty (after stripping stuff like "re:" etc - just as that draft specifies). Seems reasonable, IMHO.
I've been thinking about adding some time limits, like the subjects are grouped only if their date difference is less than, say, 3 days.
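A rough Python sketch of that time-limit idea, under one possible reading: messages that already share a base subject are sorted by date, and a new group starts whenever the gap to the previous message is 3 days or more. Both that interpretation and the name split_by_time_gap are assumptions for illustration only:

```python
from datetime import datetime, timedelta


def split_by_time_gap(dated_ids, max_gap=timedelta(days=3)):
    """Split messages that share a base subject into time-based clusters.

    dated_ids: list of (msg_id, datetime) pairs.
    Consecutive messages (in date order) stay in the same cluster only
    if their date difference is less than max_gap.
    """
    clusters = []
    previous = None
    for msg_id, when in sorted(dated_ids, key=lambda pair: pair[1]):
        if previous is None or when - previous >= max_gap:
            clusters.append([])  # gap too large: start a new cluster
        clusters[-1].append(msg_id)
        previous = when
    return clusters
```

Two unrelated mails both titled "Dovecot" but sent weeks apart would then land in separate clusters, while a quick back-and-forth would stay together.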
Of course we can't modify the result of a THREAD command (that would break the standard and it generally isn't a proper way to go), so we'd have to introduce another one.
THREAD=REFERENCES2 or something. :)
I'm pretty sure that especially method a) is very simple (and even simpler if I decided to break the standard by "fixing" the THREAD command),
b) is probably easier since the subject merging is done only after removing the Re:, etc.
but I really don't like <rant>portable assembler :)</rant> <politely>C</politely>, so are there any chances for this "new and fixed" command to be included into Dovecot? I mean, I might do that in a flawed and ugly way just for me and post a patch so that possibly interested persons can pick it up, or I might persuade someone to do it, actually :). Do you think that it's worth the effort?
While at it I think another thing that needs fixing is that threads should be sorted by their latest mail's received-date, not the first mail's date-header.
If others agree with this, the REFERENCES2 algorithm could do both of these.
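To illustrate the proposed ordering, here is a small Python sketch that sorts threads by the newest received date found in each thread. The data layout (a thread as a list of (msg_id, received) tuples) and the function name are assumptions made up for this example:

```python
from datetime import datetime
from typing import List, Tuple

Thread = List[Tuple[int, datetime]]  # (msg_id, received date) pairs


def sort_threads_by_latest(threads: List[Thread]) -> List[Thread]:
    """Order threads by the most recent received date they contain,
    oldest thread activity first."""
    return sorted(
        threads,
        key=lambda thread: max(received for _, received in thread),
    )
```

A thread that started long ago but got a reply today would then sort after a thread whose last mail arrived yesterday, which matches how most people expect a threaded mailbox to look.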
Does anyone have any experience with talking to the IETF about a possible extension of their draft?
I don't think that draft can be updated much anymore. Instead, a completely new draft could be made which depends on it. I was just talking about this on the imap-proto list a few days ago.
Timo Sirainen wrote:
I don't really like that either. I don't have the problem with empty subjects, but the same generic subjects are used by different people once in a while, such as "Dovecot".
That's indeed true, but you'd also lose threading from those sucky clients. I don't have any statistics handy but I'd say that it wouldn't matter, though, so yup, my vote goes to completely removing checks for subjects.
I've been thinking about adding some time limits, like the subjects are grouped only if their date difference is less than, say, 3 days.
That won't be very effective, I'd say, but it's surely better than nothing. I'd rather see a more robust solution, though.
b) is probably easier since the subject merging is done only after removing the Re:, etc.
I meant that entirely skipping the processing of subjects seems to be easier, but I haven't looked at the Dovecot code.
While at it I think another thing that needs fixing is that threads should be sorted by their latest mail's received-date, not the first mail's date-header.
Oh, they aren't? I should probably re-read the draft then.
If others agree with this, the REFERENCES2 algorithm could do both of these.
Agreed.
Cheers, -jkt
-- cd /local/pub && more beer > /dev/mouth
On 07-04-2006 17:21:08 +0200, Jan Kundrát wrote:
Timo Sirainen wrote:
I don't really like that either. I don't have the problem with empty subjects, but the same generic subjects are used by different people once in a while, such as "Dovecot".
That's indeed true, but you'd also lose threading from those sucky clients. I don't have any statistics handy but I'd say that it wouldn't matter, though, so yup, my vote goes to completely removing checks for subjects.
This might sound a bit weird, but Mutt for example tackled this problem by having (in threaded mode) a "strict_threads" option that toggles between doing (some intelligent) subject matching and looking only at the In-Reply-To headers.
I find that almost any email client these days sends proper In-Reply-To headers, and hence strict threading works fine (and if it doesn't, Mutt of course allows you to join and break threads manually). So, I also think subject matching is not necessary any more.
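As an illustration of threading without any subject matching, here is a Python sketch that groups messages purely by their reference headers using a small union-find. It is neither Mutt's nor Dovecot's implementation; it only produces flat groups rather than the reply tree a THREAD response carries, and the message layout (dicts with "id" and "references" keys) is an assumption:

```python
def thread_by_references(messages):
    """Group messages into threads using only Message-ID references.

    messages: list of dicts with an 'id' key (Message-ID) and an
    optional 'references' key (list of Message-IDs taken from the
    In-Reply-To / References headers).
    Returns a list of threads, each a list of ids in input order.
    """
    parent = {}

    def find(node):
        # Union-find lookup with path halving.
        parent.setdefault(node, node)
        while parent[node] != node:
            parent[node] = parent[parent[node]]
            node = parent[node]
        return node

    for msg in messages:
        root = find(msg["id"])
        for ref in msg.get("references", []):
            parent[find(ref)] = root  # merge the referenced thread into ours

    threads = {}
    for msg in messages:
        threads.setdefault(find(msg["id"]), []).append(msg["id"])
    return list(threads.values())
```

Subjects never enter the picture here, so two unrelated mails with empty (or identical) subjects stay in separate threads unless one actually references the other.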
-- Fabian Groffen Gentoo for Mac OS X Project
Timo Sirainen wrote:
While at it I think another thing that needs fixing is that threads should be sorted by their latest mail's received-date, not the first mail's date-header.
If others agree with this, the REFERENCES2 algorithm could do both of these.
Okay, so the differences/improvements are these:
- Ignore Subjects completely
- Use the max(INTERNALDATE) from all messages in the thread as the "thread date"
- Maybe add some mechanism to split/join threads by hand (tricky one)
It's exam time for me right now, but I might be able to try to make a draft or something in a few weeks.
Comments, opinions or monetary support would be appreciated :).
Cheers, -jkt
-- cd /local/pub && more beer > /dev/mouth
participants (3)
- Grobian
- Jan Kundrát
- Timo Sirainen