[Dovecot] Large folders and timeouts
I have been having persistent issues with server timeouts on operations on large folders on our Dovecot install (1.0beta2 RPM that came with Fedora Core 5). Some observations from today...
I have a Maildir folder with 38,000 messages (it is my Spam folder, naturally :-). In Thunderbird 1.5.0.2, if I click on that folder, it slowly but steadily downloads the headers and after 3-4 minutes, shows me the contents. So far, so good. If I then select all the messages and click Delete to delete them, that operation times out after about 1 minute. If I then click on my Trash folder after some number of minutes has passed, I see that those messages are now both in my Trash and in the original folder - I believe this is because Thunderbird first asks for a COPY over to Trash, and then asks for a STORE +FLAGS \Deleted on the copy that is still in the original folder. It appears that the COPY commands are working but the STORE commands are not since the connection is timing out before they run.
I thought I'd try on another mail client, so I tried Squirrelmail. I set it to show 1000 messages per page. It does this just fine. Then I set it to 4,000 messages per page. Then I click on a folder with 12,000 messages, and I get this:
ERROR: Connection dropped by IMAP server. Query: SORT (ARRIVAL) ISO-8859-1 ALL
My questions are 1. what is the likely cause? and 2. will upgrading to a newer beta address any of this? I'd prefer to wait until FC5 has an updated RPM if possible. Our mail is stored on an NFS store. (I know NFS is slow and that this might be part of the cause.) However, it seems that the timeouts I am having are more related to times when there is no data flowing between the server and client. Note that the timeouts happened when the server was doing a COPY and when the server was doing a SORT but not when the client was downloading 38,000 headers. I'm just at a loss for what to look into next. I don't want to just say "NFS is slow" and live with it if this really has some other cause. :-)
So, anyone have any thoughts as to other possible causes?
Thunderbird has no option for setting the length of timeout from the client side, so there's nothing I can do from that end.
Thanks, Fran
-- Fran Fabrizio Senior Systems Analyst Department of Computer and Information Sciences University of Alabama at Birmingham http://www.cis.uab.edu/ 205.934.0653
Fran,
You don't mention the specifics of your setup, but here are some very general things to consider.
NFS is typically slow... definitely slower than local storage, although if it was typically *THAT* slow, it wouldn't be a viable storage solution for a majority of places that use it.
When dealing with any filesystem, write speeds will typically be slower than read speeds. The fact that the timeout is occurring during a COPY makes me think your problems may be related to write speeds.
However, unless the SORT command is doing disk writes (which I don't imagine it is), this may not be the cause; *unless* you're running a server that's really underpowered in the RAM department and you're doing a lot of paging, especially if you're paging to a single disk that isn't all that fast.
So, without further details of your system this is about as far as I can speculate, but it seems that either your system is significantly underpowered, or it's mis-configured.
hope this helps.
Alan
On Thursday 11 May 2006 06:27, Alan Premselaar wrote:
NFS is typically slow... definitely slower than local storage, although if it was typically *THAT* slow, it wouldn't be a viable storage solution for a majority of places that use it.
When dealing with any filesystem, write speeds will typically be slower than read speeds. The fact that the timeout is occurring during a COPY makes me think your problems may be related to write speeds.
NFS also defaults to synchronous operations.
I remember one site where I spent a long time understanding exactly how the NFS, and disk, writes were synchronised. In this case I was able to get write performance from pitiful to near wire speed, and if I'd had modern filesystems I'd have been able to get that without paying such a huge price in leaving data cached in RAM when it should have been safely on disk.
The most obvious sign of synchronous write issues with such systems is disk thrashing, which can usually be seen by watching the disk activity lights, or heard by listening to the disk drives.
On Wed, 2006-05-10 at 16:14 -0500, Fran Fabrizio wrote:
I have a Maildir folder with 38,000 messages (it is my Spam folder, naturally :-). In Thunderbird 1.5.0.2, if I click on that folder, it slowly but steadily downloads the headers and after 3-4 minutes, shows me the contents. So far, so good. If I then select all the messages and click Delete to delete them, that operation times out after about 1 minute. If I then click on my Trash folder after some number of minutes has passed, I see that those messages are now both in my Trash and in the original folder - I believe this is because Thunderbird first asks for a COPY over to Trash, and then asks for a STORE +FLAGS \Deleted on the copy that is still in the original folder. It appears that the COPY commands are working but the STORE commands are not since the connection is timing out before they run.
maildir_copy_with_hardlinks=yes will probably help you here.
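To illustrate why that setting is likely to help: with hard links, a "copy" only adds a directory entry pointing at the existing file, so no message data is rewritten over NFS. The following is a rough Python sketch of the idea only, not Dovecot's actual implementation; the function name copy_message and its fallback behaviour are assumptions made for illustration.

```python
import os
import shutil


def copy_message(src, dst, use_hardlinks=True):
    """Place the message file `src` at `dst`.

    Returns "hardlink" if a hard link was created, "copy" if the
    file contents had to be duplicated. Hypothetical helper, for
    illustration only.
    """
    if use_hardlinks:
        try:
            # Adds a second name for the same inode; no data is written.
            os.link(src, dst)
            return "hardlink"
        except OSError:
            # Hard links can fail, e.g. across filesystems; fall back.
            pass
    # Full copy: every byte of the message is read and written again.
    shutil.copyfile(src, dst)
    return "copy"
```

For 38,000 messages, the difference is between 38,000 link operations and reading and rewriting every message body.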
I thought I'd try on another mail client, so I tried Squirrelmail. I set it to show 1000 messages per page. It does this just fine. Then I set it to 4,000 messages per page. Then I click on a folder with 12,000 messages, and I get this:
ERROR: Connection dropped by IMAP server. Query: SORT (ARRIVAL) ISO-8859-1 ALL
Well, this makes it stat() all the files. I don't think it should take too long even for 12,000 messages, though. You could simulate this by running "du" on the maildir and seeing how long it takes.
However, after the first time this is done, Dovecot should have cached the results in its cache file, and the following SORTs should be fast.
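As an alternative to "du", the stat() pass can be timed directly. This is a minimal Python sketch that assumes the standard Maildir layout with cur/ and new/ subdirectories; the function name time_stat_pass is made up for this example, and the result only approximates what the server does.

```python
import os
import time


def time_stat_pass(maildir):
    """stat() every file in cur/ and new/ under `maildir`.

    Returns (file_count, elapsed_seconds).
    """
    start = time.perf_counter()
    count = 0
    for sub in ("cur", "new"):
        path = os.path.join(maildir, sub)
        if not os.path.isdir(path):
            continue
        with os.scandir(path) as entries:
            for entry in entries:
                # Explicit os.stat() so each file costs one stat call.
                os.stat(entry.path)
                count += 1
    return count, time.perf_counter() - start
```

Running it twice on the same folder (the path is whatever your maildir location is) gives a rough idea of how much of the delay is cold NFS attribute lookups: the second run is usually faster if the attributes are cached.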
Note that the timeouts happened when the server was doing a COPY and when the server was doing a SORT but not when the client was downloading 38,000 headers.
That's because while downloading the headers the client sees data arriving all the time, so it never sits idle long enough to time out.
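The distinction between total duration and idle time can be sketched as follows. This is a toy model in Python, not how Thunderbird or Squirrelmail actually implement their timeouts; the function name first_idle_timeout is invented here, the 60-second timeout approximates the "about 1 minute" reported above, and the 180-second COPY duration is purely an assumed figure.

```python
def first_idle_timeout(event_times, timeout):
    """Return the moment a client would give up, or None if it never does.

    event_times: sorted times in seconds (since the command was sent)
                 at which the client receives any data; the last entry
                 is the final response.
    timeout:     seconds of silence the client tolerates.
    """
    last = 0.0
    for t in event_times:
        if t - last > timeout:
            # The silence since the last data exceeded the limit.
            return last + timeout
        last = t
    return None


if __name__ == "__main__":
    # Header download: some data every half second for 200 seconds.
    headers = [i * 0.5 for i in range(1, 401)]
    print(first_idle_timeout(headers, 60))

    # COPY: nothing until a single response after 180 seconds (assumed).
    print(first_idle_timeout([180.0], 60))
```

In this model the long header download never times out, while the silent COPY is abandoned at the 60-second mark even though it takes less total time.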
Participants (4): Alan Premselaar, Fran Fabrizio, Simon Waters, Timo Sirainen