On Tue, Oct 22, 2002 at 03:40:07PM +0200, Thomas Wouters wrote:
The only bigger thing left to do is to parse the headers and convert the =?xxx?yyy?= encoded words. I think everything should either go through UTF-8, or skip conversion entirely if the header and search charsets are the same.
I assume you only want to convert to UTF-8 or the other character sets when it's really necessary, rather than storing all data internally as UTF-8 or wchar_t?
Well, there's not much stored in memory, and the index files store mostly just the FETCH ENVELOPE data. The envelope is better kept in a format that can be sent directly to the IMAP client, and the few things that are stored in memory aren't used by search at all. I think it'll be easier if everything is just converted when needed. That only costs more CPU (and maybe memory), and there should be plenty of that left :)
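
To make the "convert only when needed" idea concrete, here is a rough sketch in Python (not the server's actual code). The function name header_contains and its parameters are made up for illustration. It decodes the =?charset?encoding?text?= words with the standard library, compares raw bytes when the header and search charsets match, and otherwise takes both sides through UTF-8. It is deliberately simplified: matching is case-sensitive and a key spanning two differently encoded words is not found.

```python
from email.header import decode_header


def header_contains(raw_header: str, key: bytes, search_charset: str) -> bool:
    """Hypothetical helper: does the decoded header contain the search key?

    key is the search string as bytes in search_charset.
    """
    search_charset = search_charset.lower()
    key_utf8 = None  # converted lazily, only if some chunk needs it

    for chunk, charset in decode_header(raw_header):
        if isinstance(chunk, str):
            # A header without encoded words comes back as a plain str.
            chunk = chunk.encode("ascii", "replace")
            charset = None
        charset = (charset or "us-ascii").lower()

        if charset == search_charset:
            # Same charset on both sides: no conversion at all.
            if key in chunk:
                return True
            continue

        # Charsets differ: take both sides through UTF-8.
        if key_utf8 is None:
            key_utf8 = key.decode(search_charset).encode("utf-8")
        chunk_utf8 = chunk.decode(charset, "replace").encode("utf-8")
        if key_utf8 in chunk_utf8:
            return True

    return False
```

For example, header_contains("=?iso-8859-1?q?p=F6stal?=", b"p\xf6st", "iso-8859-1") takes the no-conversion path, while the same header searched with a UTF-8 key goes through UTF-8.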
I think 2) might be an option if you're dealing with very specific SORTs. SquirrelMail, for instance, allows sorting on date, from, subject, arrival and to (but the last one only in the 'sent-mail' mailbox, oddly enough), all of them reversed as well, and in various orders, via little buttony things on the mailbox-index page... easy to play with. (Don't forget, you can
. SORT (SUBJECT REVERSE FROM REVERSE TO REVERSE DATE ARRIVAL) UTF-8 ALL
and which btrees would you use, and how, in that case? :) Anyway, in
Storing just the primary condition in the btree could be enough. The other conditions are used only when the primary one compares equal between mails, so we can read just those mails into memory and then apply the rest of the sorting. That's still faster and takes less memory than reading everything into memory and then sorting.
Or the btree could be fully sorted by some set of conditions, and if that isn't exactly the order we want, we could still use just its primary condition.
(uh, a bit badly said, hope it makes some sense :)
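
Maybe a small sketch says it better. This is Python just for illustration, and the names (sort_with_primary_index, primary_key, rest_key) are made up. The input list stands in for walking a btree that is already ordered by the primary condition; only runs of mails whose primary key compares equal are collected and sorted by the remaining conditions.

```python
from itertools import groupby


def sort_with_primary_index(primary_sorted, primary_key, rest_key):
    """Finish a multi-key sort given mails already ordered by the primary key.

    primary_sorted: iterable of mails in primary-key order (assumed to come
                    from an in-order walk of the btree).
    primary_key:    function returning the primary sort key of a mail.
    rest_key:       function returning the remaining sort keys as a tuple.
    """
    result = []
    for _, group in groupby(primary_sorted, key=primary_key):
        tied = list(group)  # only mails with an equal primary key are held
        if len(tied) > 1:
            tied.sort(key=rest_key)  # apply the rest of the sort conditions
        result.extend(tied)
    return result


# Usage: primary condition SUBJECT (already sorted), secondary condition FROM.
mails = [
    {"uid": 3, "subject": "alpha", "from": "zed"},
    {"uid": 1, "subject": "alpha", "from": "bob"},
    {"uid": 2, "subject": "beta", "from": "amy"},
]
ordered = sort_with_primary_index(
    mails,
    primary_key=lambda m: m["subject"],
    rest_key=lambda m: (m["from"],),
)
print([m["uid"] for m in ordered])  # [1, 3, 2]
```

Memory use is bounded by the largest run of equal primary keys rather than by the whole mailbox, which is the point of keeping only the primary condition in the btree.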