[Dovecot] Speed and memory probs writing large Maildir
I have Dovecot 0.99.9.1 compiled from the source, without SSL, running on a Red Hat 9.0 system - EXT3 file system, Pentium Pro 233 MHz, Intel motherboard, 128M RAM, IBM 40Gig 7200 RPM HD. I find reading and searching to be nice and fast, and comparing it with Courier IMAP on a Celeron 824 MHz, I think Dovecot is generally not much slower, which means it may well be faster than Courier in some or many ways if it was running on the same machine.
My only problem is when writing mailboxes with large numbers of messages. I use Maildir - the same arrangement as Courier IMAP. (I also find that Netscape 4.77 can't use mailboxes which are subfolders of others, but that is probably a Netscape config problem.)
I am using Netscape 4.77 (Windows 2000 1.3 GHz Celeron, via 100Mbps switched Ethernet to the Linux machines) to do these tests since I find Netscape 7.02 to be much slower. Netscape 7.02 seems to want to resynch its notion of the mailbox repeatedly as it is writing to it. Without looking at the IMAP traffic (I don't know how) I would say that Netscape 4.77 is not doing this at all. Indeed, after the write, looking at the mailbox causes Netscape 4.77 to read the headers from the IMAP server - so I feel sure that its way of writing the messages is purely "write".
Writing a mailbox with a small number of large messages, such as 20 x 100k messages is fine. There is no obvious speed problem or excessive memory use.
When writing 2000 x 1k messages, it is totally different. Here are some times for writing from a Netscape 4.77 local mailbox to Dovecot and to Courier IMAP.
Message   Number of   Total   Time        CPU max     Memory RSS
size      messages    size    (seconds)   (approx.)   (Megs, max)

Dovecot
100k         100      10M        23       15%         ~2
100k          20       2M        ~5       ?           1.2
10k          200       2M        20       15%         4      !
1k          2000       2M       230       60          32     !!!!

Courier IMAP
100k         100      10M        14       5%          ~1
100k          20       2M        ~2       ?           ?
10k          200       2M         9       5%          ~0.6
1k          2000       2M        88       2%          ~0.6
This 2000 message test is small compared to some of the mailboxes I have.
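To make the table easier to compare, the timings can be turned into messages per second. The sketch below only does arithmetic on figures copied from the table; the names TIMINGS, messages_per_second and slowdown are made up for this illustration, and the rows marked "~" or "?" are left out.

```python
# (message count, seconds) copied from the timing table; approximate rows omitted.
TIMINGS = {
    "Dovecot": {"100 x 100k": (100, 23), "200 x 10k": (200, 20), "2000 x 1k": (2000, 230)},
    "Courier IMAP": {"100 x 100k": (100, 14), "200 x 10k": (200, 9), "2000 x 1k": (2000, 88)},
}


def messages_per_second(server, test):
    """Average write rate for one row of the table."""
    count, seconds = TIMINGS[server][test]
    return count / seconds


def slowdown(test):
    """How many times longer Dovecot took than Courier IMAP on the same test."""
    return TIMINGS["Dovecot"][test][1] / TIMINGS["Courier IMAP"][test][1]


if __name__ == "__main__":
    for test in TIMINGS["Dovecot"]:
        print(test,
              round(messages_per_second("Dovecot", test), 1),
              round(messages_per_second("Courier IMAP", test), 1),
              round(slowdown(test), 2))
```

On these numbers the 2000 x 1k test is where the gap is widest, both in elapsed time and in memory.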
I used a real-life mailbox, 6878 messages - 44 Megabytes in a single Mbox for a more demanding test. Using Netscape 4.77 to copy this to a Dovecot Maildir took a very long time - over 30 minutes. The process was fast at first and got slower and slower. As the number of messages in the destination Maildir increased, the memory usage went up and the speed reduced. It seems that RSS memory usage (as reported by "top -d 0.3") hit some kind of limit at about 49 Megs after 4000 messages and then went down a little to about 48 or less. CPU usage was 30 to 82%, fluctuating according to the moment top looked. At the end, I think it was only writing 1.5 messages a second.
RSS is the "total amount of physical memory used by the task". The "SIZE" was even bigger - "code plus data plus stack". This got up to 114 Megs by the end of the writing process!
After the write had finished, I made Netscape 4.77 view the destination mailbox. This caused Dovecot to send it all the headers, and again Dovecot had these huge memory usage figures. It seemed to read the mailbox at a decent speed - 20 seconds or so.
But if I made Dovecot look at another mailbox, and then again at this big target mailbox (say with another client - Netscape 7), it would read the big mailbox at Dovecot's usual high speed - without the high memory usage and I think without high CPU usage as well.
Without knowing anything about how Dovecot works, I imagine that there is some kind of caching algorithm for writes, and that this is retained until Dovecot (or at least the instance serving this client) is asked to look at another Maildir. It seems that reading is not slowed by this process much or at all - but writing gets progressively slower as the cache gets bigger.
The memory usage - even 49 Megs for RSS - seems excessive. It reached that about halfway through writing the mailbox, so the entire message contents at that state would have been about 22 Megs. But the "SIZE" memory usage is even larger - perhaps limited by the available RAM.
This slow speed for writing Maildirs with large numbers of messages is currently a barrier to me using Dovecot. I am about to go back and try again to get Courier IMAP to do what I want . . .
The Courier IMAP documentation and what I regard as the difficult "configure -> build -> install -> wonder why it doesn't run" process has cost me so much time that I would be very happy to use something simpler and easier (for me, at least) to install.
Thanks for developing this new IMAP server!
- Robin
P.S. I have not altered the dovecot.conf file at all. The only deviation from standard is no SSL. I find it remarkable that Dovecot works out of the box and automagically finds the user's Maildirs!
- Robin
On Wed, 28 May 2003 15:18:22 +1000, Robin Whittle <rw@firstpr.com.au> wrote:
> My only problem is when writing mailboxes with large numbers of messages. I use Maildir - the same arrangement as Courier IMAP. (I also find that Netscape 4.77 can't use mailboxes which are subfolders of others, but that is probably a Netscape config problem.)
>
> When writing 2000 x 1k messages, it is totally different. Here are some times for writing from a Netscape 4.77 local mailbox to Dovecot and to Courier IMAP.
I had problems with big (2000+ messages) folders on OpenBSD. I updated to 0.99.10-test6 and it is fine.
On Wed, 2003-05-28 at 08:18, Robin Whittle wrote:
> When writing 2000 x 1k messages, it is totally different. Here are some times for writing from a Netscape 4.77 local mailbox to Dovecot and to Courier IMAP.
Thanks, this test helped to find several problems. First problem was a memory leak with APPEND command. This caused the huge memory usage. Here's a fix:

--- cmd-append.c 14 Feb 2003 08:00:52 -0000 1.24
+++ cmd-append.c 28 May 2003 11:01:03 -0000
@@ -197,6 +197,7 @@
 		count++;
 	}
+	imap_parser_destroy(save_parser);
 	if (!box->save_deinit(ctx, failed)) {
 		failed = TRUE;

Then I noticed that each APPEND command has to reopen and resync the mailbox every time which gets just slower and slower. I think I'll make it leave the mailbox open for a few seconds just in case another APPEND comes. That should speed it up quite a lot. Actually APPEND wouldn't even need the mailbox to be synced. I should try fixing that too..

And finally there's a nasty bug in 0.99.10-test's new maildir syncing code. I think I'll have to rewrite it a bit more to avoid wasting memory..
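Why reopening and resyncing on every APPEND "gets just slower and slower" can be shown with a toy cost model. This is not Dovecot code: it assumes, purely for illustration, that one resync costs one unit of work per message already in the mailbox, and the function name resync_work is invented here.

```python
def resync_work(n_appends, existing=0, keep_open=False):
    """Units of resync work for n_appends consecutive APPENDs.

    Assumption for this sketch: a resync touches every message already
    stored, so its cost equals the current message count.
    """
    if keep_open:
        # The mailbox is synced once when first opened, then reused.
        return existing
    # Reopen + resync before each APPEND: existing, existing+1, existing+2, ...
    return sum(existing + k for k in range(n_appends))


if __name__ == "__main__":
    for n in (200, 2000, 6878):
        print(n, resync_work(n), resync_work(n, keep_open=True))
```

Under that assumption the total work grows roughly with the square of the number of messages, which would fit a copy that starts fast and slows down as the destination fills up.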
On Wed, 2003-05-28 at 08:18, Robin Whittle wrote:
> When writing 2000 x 1k messages, it is totally different. Here are some times for writing from a Netscape 4.77 local mailbox to Dovecot and to Courier IMAP.
Want to try this with 0.99.10-test11? :) It now keeps the index file opened for 10 seconds before really closing it, that should help with the speed quite a lot. I just tested copying 4000 messages myself, I didn't notice any actual increase in speed, but CPU usage dropped from a few percent to 0.4%.
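The "keep it open for 10 seconds before really closing it" idea can be sketched as a small cache with a linger time. This is an illustration only, not how Dovecot implements it: the class DelayedCloseCache, its methods, and the opener/closer callbacks are hypothetical, and the clock is passed in explicitly so the behaviour is easy to follow.

```python
class DelayedCloseCache:
    """Keep released handles around for linger_seconds before closing them."""

    def __init__(self, opener, closer=None, linger_seconds=10.0):
        self._opener = opener          # called with a name, returns a handle
        self._closer = closer          # optional, called with an expired handle
        self._linger = linger_seconds
        self._idle = {}                # name -> (handle, time it was released)
        self.opens = 0                 # how many real opens were needed

    def acquire(self, name, now):
        self._expire(now)
        entry = self._idle.pop(name, None)
        if entry is not None:
            return entry[0]            # reuse the handle that was kept open
        self.opens += 1
        return self._opener(name)

    def release(self, name, handle, now):
        # Do not close yet; remember when the handle became idle.
        self._idle[name] = (handle, now)

    def _expire(self, now):
        expired = [n for n, (_, t) in self._idle.items() if now - t >= self._linger]
        for name in expired:
            handle, _ = self._idle.pop(name)
            if self._closer is not None:
                self._closer(handle)


if __name__ == "__main__":
    cache = DelayedCloseCache(opener=lambda name: object())
    for second in range(5):            # five APPENDs, one per second
        h = cache.acquire("INBOX", now=second)
        cache.release("INBOX", h, now=second)
    print(cache.opens)
```

A burst of APPENDs arriving within the linger window then shares one open, while an idle mailbox is still closed eventually.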
It still does a few unnecessary things every time mailbox is opened (ie. at every APPEND command). I think I shouldn't try to mkdir() the cur, new and tmp dirs immediately. Those could be more easily created if/when stat() fails while syncing.
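Creating cur, new and tmp only "if/when stat() fails while syncing" might look like the sketch below. It is a Python illustration under assumed names (sync_subdir is hypothetical), not Dovecot's C code: the directory is created only when stat() reports it missing, and the mailbox directory itself is never created.

```python
import os
import tempfile


def sync_subdir(maildir, sub):
    """List one of cur/new/tmp, creating it only if stat() says it is missing."""
    path = os.path.join(maildir, sub)
    try:
        os.stat(path)
    except FileNotFoundError:
        try:
            # Raises FileNotFoundError if the mailbox directory itself is absent.
            os.mkdir(path)
        except FileExistsError:
            pass  # another process created it in the meantime
    return sorted(os.listdir(path))


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as root:
        box = os.path.join(root, ".boxname")
        os.mkdir(box)                       # the "mkdir .boxname" case
        print(sync_subdir(box, "new"))      # directory is created on demand
        print(sorted(os.listdir(box)))
```

The common case (directories already present) costs one stat() and no mkdir() calls.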
On 30 May 2003, Timo Sirainen wrote:
> It still does a few unnecessary things every time mailbox is opened (ie. at every APPEND command). I think I shouldn't try to mkdir() the cur, new and tmp dirs immediately. Those could be more easily created if/when stat() fails while syncing.
Shouldn't the maildir only be created in response to a CREATE command? It certainly shouldn't be as the result of an APPEND:
If the destination mailbox does not exist, a server MUST return an error, and MUST NOT automatically create the mailbox.
I would interpret any of cur, new and tmp being missing as "mailbox does not exist".
-- Charlie Brady charlie_brady@mitel.com
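The stricter reading argued for above, where a maildir exists only if cur, new and tmp are all present and APPEND never creates anything, could be sketched like this. The function names and the returned status words are illustrative only and are not taken from any server's code.

```python
import os
import tempfile

MAILDIR_SUBDIRS = ("cur", "new", "tmp")


def mailbox_exists(path):
    """True only if all three maildir subdirectories are present."""
    return all(os.path.isdir(os.path.join(path, sub)) for sub in MAILDIR_SUBDIRS)


def check_append_target(path):
    """Return "OK" or "NO" for an APPEND target; never touches the disk."""
    return "OK" if mailbox_exists(path) else "NO"


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as root:
        box = os.path.join(root, ".partial")
        os.makedirs(os.path.join(box, "cur"))   # incomplete maildir
        print(check_append_target(box))
```

With this check a partial maildir is reported as missing and is left untouched rather than silently completed.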
On Fri, 2003-05-30 at 23:29, Charlie Brady wrote:
> On 30 May 2003, Timo Sirainen wrote:
>> It still does a few unnecessary things every time mailbox is opened (ie. at every APPEND command). I think I shouldn't try to mkdir() the cur, new and tmp dirs immediately. Those could be more easily created if/when stat() fails while syncing.
>
> Shouldn't the maildir only be created in response to a CREATE command? It certainly shouldn't be as the result of an APPEND:
>
> If the destination mailbox does not exist, a server MUST return an error, and MUST NOT automatically create the mailbox.
>
> I would interpret any of cur, new and tmp being missing as "mailbox does not exist".
Well, maybe. I was just thinking cases when a DELETE died in the middle of it, and left only some of the dirs there. In that case you couldn't then SELECT/APPEND the mailbox, but I guess you could DELETE it again..
Or maybe rather Dovecot should just create those dirs when it notices they don't exist while trying to use them. I like to be able to create new mailboxes with "mkdir .boxname".
On 4 Jun 2003, Timo Sirainen wrote:
> On Fri, 2003-05-30 at 23:29, Charlie Brady wrote:
>> I would interpret any of cur, new and tmp being missing as "mailbox does not exist".
>
> Well, maybe. I was just thinking cases when a DELETE died in the middle of it, and left only some of the dirs there. In that case you couldn't then SELECT/APPEND the mailbox, but I guess you could DELETE it again..
Can you DELETE a mailbox which doesn't exist?
If a DELETE dies in the middle, and the maildir isn't complete, then by my definition (and I'd claim, logically) the mailbox does not exist. If a mailbox doesn't exist, you can SELECT/APPEND or DELETE. But it shouldn't appear in a LIST, and you must be able to CREATE.
> Or maybe rather Dovecot should just create those dirs when it notices they don't exist while trying to use them.
I think a maildir must be complete to exist. Hence it's wrong to "by stealth" convert a partial maildir to a complete one.
> I like to be able to create new mailboxes with "mkdir .boxname".
Shouldn't that be "CREATE boxname"? :-)
-- Charlie
On Wed, 2003-06-04 at 21:07, Charlie Brady wrote:
>>> I would interpret any of cur, new and tmp being missing as "mailbox does not exist".
>>
>> Well, maybe. I was just thinking cases when a DELETE died in the middle of it, and left only some of the dirs there. In that case you couldn't then SELECT/APPEND the mailbox, but I guess you could DELETE it again..
Actually, this can't happen. Dovecot does atomic maildir deletion by renaming the ".mailbox" into "..mailbox" which is then deleted.
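The rename-then-delete approach described here can be sketched as follows. This is a Python illustration of the idea, not Dovecot's code: delete_mailbox is a hypothetical name, and the sketch assumes both names live in the same directory so that the rename is a single atomic step on a POSIX filesystem.

```python
import os
import shutil
import tempfile


def delete_mailbox(root, name):
    """Delete the maildir ".name" under root via an atomic rename to "..name"."""
    src = os.path.join(root, "." + name)
    doomed = os.path.join(root, ".." + name)
    # After this single rename the mailbox no longer exists under its own name,
    # even if the process dies before the slow recursive removal finishes.
    os.rename(src, doomed)
    shutil.rmtree(doomed)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as root:
        for sub in ("cur", "new", "tmp"):
            os.makedirs(os.path.join(root, ".mailbox", sub))
        delete_mailbox(root, "mailbox")
        print(os.listdir(root))
```

If the removal is interrupted, what is left behind is a partial "..mailbox" directory rather than a half-deleted ".mailbox".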
> Can you DELETE a mailbox which doesn't exist?
>
> If a DELETE dies in the middle, and the maildir isn't complete, then by my definition (and I'd claim, logically) the mailbox does not exist. If a mailbox doesn't exist, you can SELECT/APPEND or DELETE. But it shouldn't appear in a LIST, and you must be able to CREATE.
That'd be pretty broken. If mailbox doesn't exist, you can't select or delete it. Most clients wouldn't even let you try.
>> Or maybe rather Dovecot should just create those dirs when it notices they don't exist while trying to use them.
>
> I think a maildir must be complete to exist. Hence it's wrong to "by stealth" convert a partial maildir to a complete one.
Well, what harm could it cause?
>> I like to be able to create new mailboxes with "mkdir .boxname".
>
> Shouldn't that be "CREATE boxname"? :-)
Easier to play with filesystem directly when testing.
participants (4)
- Andrew Basterfield
- Charlie Brady
- Robin Whittle
- Timo Sirainen