[Dovecot] v1.2 development tree started
I merged all the new features and latest v1.1 changes under one tree:
http://hg.dovecot.org/dovecot-1.2/
The changeset order is kind of weird though. The timestamps jump all around. Wonder if there's a way to reorder them so that it looks more sane..
The status of the new features:
- CONDSTORE extension is probably the largest change. It adds new "modification sequences" for messages that increase whenever the message's metadata changes.
I'll probably have to reimplement the way modseqs are calculated, because modseqs will be very useful when implementing replication and the current way just doesn't work with it. If modseq-supporting clients see the current modseqs and later the server gets upgraded to new modseqs, the clients will most likely break. So this change should be done for v1.2.
QRESYNC, ESEARCH, SEARCHRES and WITHIN extensions are fully implemented. QRESYNC might need some cleanups and optimizations though.
CONTEXT=SEARCH extension is fully implemented, but could use a few optimizations (caching PARTIAL and CONTEXT searches, maybe others too).
Virtual mailboxes should work fast after mailbox is opened. The initial opening could use several optimizations though. It could probably share some code with QRESYNC to avoid the full initial search (storing each backend's modseq to index header). Also if search parameters don't contain any dynamically changing data, there's no point in searching the old messages.
The current design doesn't allow changing the search parameters or list of mailboxes, otherwise it breaks more or less badly. I guess I could add code to check if the dovecot-virtual file's mtime has changed and if so make it do a full resync. This anyway means that there's no way to support wildcard mailbox names (e.g. "all mailboxes"). But does anyone really want that (yet)? It'll anyway be faster/easier to implement once mailbox list indexes are implemented.
I'll still have to add a new X-MAILBOX search parameter which can be used to test what the backend mailbox name is. This will be especially useful with INTHREAD extension. I guess it wouldn't hurt to have FETCH X-MAILBOX if someone wants it.
Thread indexes should work, but some optimizations are missing and there are some small bugs left. The X-REFERENCES2 thread algorithm works.
INTHREAD extension isn't started yet, but I'll start it soon. Hopefully won't be too tricky to get it working with virtual mailboxes and CONTEXT=SEARCH..
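To make the CONDSTORE "modification sequences" from the status list above concrete, here is a minimal sketch. It is not Dovecot's implementation: the `Mailbox` and `Message` classes and their method names are hypothetical, and it assumes a single per-mailbox counter that is bumped on every metadata change.

```python
from dataclasses import dataclass, field


@dataclass
class Message:
    uid: int
    flags: set = field(default_factory=set)
    modseq: int = 0


class Mailbox:
    """Toy mailbox (hypothetical): one shared, ever-increasing modseq counter."""

    def __init__(self):
        self.highest_modseq = 0
        self.messages = {}

    def _bump(self, msg):
        # Every metadata change gets a fresh, strictly larger modseq.
        self.highest_modseq += 1
        msg.modseq = self.highest_modseq

    def append(self, uid):
        msg = Message(uid)
        self.messages[uid] = msg
        self._bump(msg)
        return msg

    def add_flag(self, uid, flag):
        msg = self.messages[uid]
        if flag not in msg.flags:
            msg.flags.add(flag)
            self._bump(msg)

    def changed_since(self, modseq):
        # Roughly what a CHANGEDSINCE query needs: only messages whose
        # metadata changed after the client's last known modseq.
        return sorted(uid for uid, m in self.messages.items() if m.modseq > modseq)
```

A client that remembers the highest modseq it has seen can later ask only for what changed since then, instead of refetching all flags.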
Updates:
On Mon, 2008-06-09 at 05:51 +0300, Timo Sirainen wrote:
> I merged all the new features and latest v1.1 changes under one tree:
Nightly snapshots are also from v1.2 code tree nowadays.
> - CONDSTORE extension is probably the largest change. It adds new "modification sequences" for messages that increase whenever the message's metadata changes.
> I'll probably have to reimplement the way modseqs are calculated, because modseqs will be very useful when implementing replication and the current way just doesn't work with it. If modseq-supporting clients see the current modseqs and later the server gets upgraded to new modseqs, the clients will most likely break. So this change should be done for v1.2.
Modseq changes are implemented. The only issue with CONDSTORE is that STORE UNCHANGEDSINCE command doesn't atomically check-and-update. Implementing the atomicity should be pretty easy since there is a similar check already in the code. The largest issue with it is changing APIs enough to support returning back which messages failed the STORE. Still should be pretty easy.
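The atomic check-and-update described above could look roughly like the following sketch. Everything here is an assumption for illustration: `ModseqStore` and `store_unchangedsince` are hypothetical names, a single in-process lock stands in for whatever locking the real index code uses, and the returned list corresponds to the messages a server would report as having failed the conditional STORE.

```python
import threading


class ModseqStore:
    """Toy flag store (hypothetical) with an atomic conditional STORE."""

    def __init__(self, uids):
        self._lock = threading.Lock()
        self.highest_modseq = 1
        self.flags = {uid: set() for uid in uids}
        self.modseq = {uid: 1 for uid in uids}

    def store_unchangedsince(self, uids, unchanged_since, add_flags):
        """Add flags only to messages not modified after unchanged_since.

        Returns the UIDs that failed the check, so the caller can
        report them back to the client.
        """
        failed = []
        # Check and update happen under one lock, so no other writer
        # can change a message between the comparison and the update.
        with self._lock:
            for uid in uids:
                if self.modseq[uid] > unchanged_since:
                    failed.append(uid)
                    continue
                self.flags[uid] |= set(add_flags)
                self.highest_modseq += 1
                self.modseq[uid] = self.highest_modseq
        return failed
```

The API point from the text shows up in the return value: the store operation has to hand back which messages were skipped, not just succeed or fail as a whole.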
> - Virtual mailboxes should work fast after mailbox is opened. The initial opening could use several optimizations though. It could probably share some code with QRESYNC to avoid the full initial search (storing each backend's modseq to index header). Also if search parameters don't contain any dynamically changing data, there's no point in searching the old messages.
Implemented initial opening optimizations. I haven't done much testing though, other than it appears not to crash and appears to work with simple tests. :) So the current implementation should be as fast as it's possible to make it.
> The current design doesn't allow changing the search parameters or list of mailboxes, otherwise it breaks more or less badly. I guess I could add code to check if the dovecot-virtual file's mtime has changed and if so make it do a full resync. This anyway means that there's no way to support wildcard mailbox names (e.g. "all mailboxes"). But does anyone really want that (yet)? It'll anyway be faster/easier to implement once mailbox list indexes are implemented.
Changing mailbox list is now detected and handled, as well as UIDVALIDITY changing in mailboxes. Mailbox list wildcards wouldn't be all that difficult to implement anymore if someone wants them, but until then I don't think I'll bother.
Changing search parameters still isn't detected though. Maybe it could store a MD5 sum of the search parameters in the header and if it changes rebuild the entire mailbox.
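The MD5 idea could be sketched like this. It only illustrates the comparison: the `header` dict stands in for the virtual mailbox's index header, and the `search_md5` key and both function names are made up for the example.

```python
import hashlib


def params_digest(search_params: str) -> str:
    # Hash the search parameters exactly as written in the config.
    return hashlib.md5(search_params.encode("utf-8")).hexdigest()


def needs_rebuild(header: dict, search_params: str) -> bool:
    """Return True if the stored digest differs from the current one.

    The header (a plain dict here, hypothetical) is updated so the
    next open with unchanged parameters skips the rebuild.
    """
    digest = params_digest(search_params)
    if header.get("search_md5") == digest:
        return False
    header["search_md5"] = digest
    return True
```

Hashing the raw text means even a whitespace-only edit triggers a rebuild; that is wasteful but safe, since a false positive only costs one extra full search.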
> I'll still have to add a new X-MAILBOX search parameter which can be used to test what the backend mailbox name is. This will be especially useful with INTHREAD extension. I guess it wouldn't hurt to have FETCH X-MAILBOX if someone wants it.
Oh, almost forgot about this one.
> - INTHREAD extension isn't started yet, but I'll start it soon. Hopefully won't be too tricky to get it working with virtual mailboxes and CONTEXT=SEARCH..
This one is the last major unimplemented v1.2 feature. After that I'll start finishing, optimizing and stabilizing the features for a v1.2 release (as well as start v2.0/replication coding). I'm hoping for v1.2.0 release by the end of this summer.
Hi Timo,
First of all, dovecot is great! :)
Question on CONDSTORE. I haven't re-read the RFC to confirm, but doesn't CONDSTORE operate in a switched mode via the ENABLE command, so that the IMAP client needs to request the capability? Maybe I mixed it up with another IMAP command.
Thanks Joseph
On Jun 18, 2008, at 5:56 PM, Joseph Yee wrote:
> Hi Timo,
> First of all, dovecot is great! :)
> Question on CONDSTORE. I haven't re-read the RFC to confirm, but doesn't
> CONDSTORE operate in a switched mode via the ENABLE command, so that the
> IMAP client needs to request the capability? Maybe I mixed it up with
> another IMAP command.
ENABLE was created after CONDSTORE. But you're correct that it's not
enabled until client does so using a "CONDSTORE enabling command", one
of which is ENABLE command but there are others. I also implemented
this to Dovecot so that it doesn't waste disk space on storing modseqs
unless CONDSTORE has been enabled (or some other feature needs
modseqs, like virtual mailboxes). Although it does waste some CPU and
probably some disk I/O keeping track of the highest modseq even if
they're not enabled. Otherwise it's a bit difficult to reliably enable
modseqs. Hmm. I think. Or maybe it wouldn't be. Maybe I should look
into it. :)
Timo Sirainen wrote:
> This one is the last major unimplemented v1.2 feature.
Can I make a very weak suggestion to look at that ZLIB compression extension I think you mentioned in the past?
The motivation is that my "8 mbit broadband" link seems to saturate at quite low numbers of headers per second when Thunderbird is pulling down new mailbox messages. As you know, on most of my machines I use our compression proxy application, which very noticeably increases my mailbox access speeds even on cutting-edge broadband (for Europe).
Now, whilst probably zero clients implement the compression extension, this is also a chicken-and-egg thing, so we could start by having a working implementation on the server end at least.
The second reason is that this suggests a typical rented server with a meagre 100mbit connection could be network limited while replicating, rather than CPU bound. A lightly compressed protocol *might* be a win even on fairly fast connections, simply because many of the IMAP command outputs seem to compress extremely well (13:1 is typical based on the rather inefficient way OE accesses IMAP, and 4:1 average is very normal even for more efficient implementations - YMMV).
Anyway, just a thought - I'm assuming that the probable implementation is going to be fairly simple. I would think that zlib and/or lzo would be good compressors if there is a choice of implementations? Certainly LZO would be a good choice for faster 100mbit connections
Ed W
On Jun 20, 2008, at 3:02 PM, Ed W wrote:
> Timo Sirainen wrote:
> > This one is the last major unimplemented v1.2 feature.
> Can I make a very weak suggestion to look at that ZLIB compression
> extension I think you mentioned in the past?
It would have to be done by proxying in imap-login similar to how SSL
connections are handled. But aren't you using SSL already, and why
not? Using that would give compression for free. Although I haven't
really looked at if it's already automatically enabled or if I or
clients should do something special..
> Anyway, just a thought - I'm assuming that the probable
> implementation is going to be fairly simple. I would think that
> zlib and/or lzo would be good compressors if there is a choice of
> implementations? Certainly LZO would be a good choice for faster
> 100mbit connections
http://www.ietf.org/rfc/rfc4978.txt specifies DEFLATE format that can
be implemented using zlib.
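As a rough illustration of what a DEFLATE-compressed connection involves, here is a sketch using Python's `zlib` module. The helper names are made up, the negative `wbits` value (raw DEFLATE without the zlib header and trailer) is my reading of how the RFC's stream is framed, and the sample FETCH line is only test data.

```python
import zlib


def make_deflate_pair():
    # wbits=-15 selects a raw DEFLATE stream (no zlib header/trailer).
    compressor = zlib.compressobj(level=6, method=zlib.DEFLATED, wbits=-15)
    decompressor = zlib.decompressobj(wbits=-15)
    return compressor, decompressor


def send(compressor, data: bytes) -> bytes:
    # Z_SYNC_FLUSH emits everything buffered so far, so the peer can
    # decode this chunk immediately without waiting for more input.
    return compressor.compress(data) + compressor.flush(zlib.Z_SYNC_FLUSH)


if __name__ == "__main__":
    comp, decomp = make_deflate_pair()
    line = b"* 1 FETCH (FLAGS (\\Seen))\r\n"
    total_plain = total_wire = 0
    for _ in range(50):
        wire = send(comp, line)
        assert decomp.decompress(wire) == line
        total_plain += len(line)
        total_wire += len(wire)
    print(total_plain, "bytes plain,", total_wire, "bytes on the wire")
```

Because the compressor keeps its history across calls, repetitive protocol output compresses better the longer the connection lives, at the cost of a few bytes of flush overhead per write.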
> > Can I make a very weak suggestion to look at that ZLIB compression extension I think you mentioned in the past?
> It would have to be done by proxying in imap-login similar to how SSL connections are handled. But aren't you using SSL already, and why not? Using that would give compression for free. Although I haven't really looked at if it's already automatically enabled or if I or clients should do something special..
I don't think that SSL in general has compression enabled? I could be wrong, but I believe it's an option that is badly supported. I'm not an expert though, so I don't know that for sure... I would be interested if someone had a recipe for enabling compression on TLS.
Also, if you use SSL then you can no longer compress after the fact. By definition, encryption done well produces output which cannot be compressed, so it's even more important to compress before encryption.
> > Anyway, just a thought - I'm assuming that the probable implementation is going to be fairly simple. I would think that zlib and/or lzo would be good compressors if there is a choice of implementations? Certainly LZO would be a good choice for faster 100mbit connections
> http://www.ietf.org/rfc/rfc4978.txt specifies DEFLATE format that can be implemented using zlib.
I think this is probably what you referenced before.
My own experience is with a very powerful (CPU-hungry) compressor, where it doesn't seem to matter all that much whether stuff is base64 encoded or not. Long story short: whilst all that reflushing sounds really nice, I think it's just icing compared with simply doing blind compression of everything...
My guess is that with the replication stuff you are going to see a 5x-10x speedup on exchanging long lists of guids to compare folders, etc. Compression on the actual mailbodies may be much less. In my case even with an incompressible jpg file which is base64 encoded we still knock off the expected 1/3 in file size due to the base64 encoding so it's a nice benefit
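The base64 point is easy to check with a small experiment. This sketch uses seeded pseudo-random bytes as a stand-in for an incompressible JPEG and plain `zlib` rather than the compressor mentioned above; the function name and sizes are arbitrary.

```python
import base64
import random
import zlib


def compression_ratios(n_bytes=30000, seed=0):
    """Return compressed/original size ratios for raw and base64 data."""
    rng = random.Random(seed)
    # Pseudo-random bytes: effectively incompressible, like a JPEG body.
    raw = bytes(rng.getrandbits(8) for _ in range(n_bytes))
    encoded = base64.b64encode(raw)
    return {
        "raw": len(zlib.compress(raw)) / len(raw),
        "base64": len(zlib.compress(encoded)) / len(encoded),
    }


if __name__ == "__main__":
    for name, ratio in compression_ratios().items():
        print(f"{name}: compressed to {ratio:.0%} of original size")
```

Base64 carries 6 bits of data in every 8-bit character, so a general-purpose compressor can win back most of that expansion even when the underlying payload does not compress at all.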
(My customers are on dialup connections of just 2,400 baud... ie 20KB per *minute* http://www.mailasail.com )
For replication I would have thought support of an optional non RFC LZO compressor would be beneficial on anything under gigabit links..?
Ed W
On Fri, 2008-06-20 at 13:39 +0100, Ed W wrote:
> > > Can I make a very weak suggestion to look at that ZLIB compression extension I think you mentioned in the past?
> > It would have to be done by proxying in imap-login similar to how SSL connections are handled. But aren't you using SSL already, and why not? Using that would give compression for free. Although I haven't really looked at if it's already automatically enabled or if I or clients should do something special..
> I don't think that SSL in general has compression enabled? Could be wrong, but I believe it's a option, but badly supported? I'm not an expert though so I don't know that for sure... I would be interested if someone had a recipe for enabling compression on TLS?
Personally, just as a data point, I use SSH (dovecot --exec-mail imap) to connect to my IMAP host and enable compression using -C, which seems to have a good effect.
johannes
participants (4): Ed W, Johannes Berg, Joseph Yee, Timo Sirainen