[Dovecot] Dovecot very slow on a mailbox with > 700 IMAP Folders
Hello, I am using Dovecot 1.2.9 on 32-bit Ubuntu 10.04.2 LTS.
We have one user who uses folders in an "excessive" way. He has 704 subscribed folders, which corresponds to nearly 3000 folders on the filesystem. For about a week the user has been unable to work because of timeouts when connecting to Dovecot (IMAP). Thunderbird doesn't show any folders (it times out), and the webmail system (Groupoffice) only presents the INBOX.
I did some debugging and found that the lsub "" "*" command used by Thunderbird takes > 5 minutes. Thunderbird disconnects after ~1 minute and reports a timeout. I entered the command directly over a raw connection to the IMAP port, so this is not a client-side problem. There is no error message or anything else in any of the logs.
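For anyone who wants to reproduce this measurement without a mail client, here is a minimal Python sketch that times an LSUB "" "*" call. It assumes a connection object with the interface of the standard-library `imaplib.IMAP4` class; the function name `time_lsub` is made up for illustration and nothing here is taken from the setup described above.

```python
import time


def time_lsub(conn, directory='""', pattern="*"):
    """Run LSUB on an imaplib-style connection.

    Returns (elapsed_seconds, number_of_subscribed_folders).
    """
    start = time.monotonic()
    status, data = conn.lsub(directory, pattern)
    elapsed = time.monotonic() - start
    if status != "OK":
        raise RuntimeError(f"LSUB failed: {status}")
    # imaplib returns [None] when the server sends no untagged LSUB replies.
    folders = [line for line in data if line is not None]
    return elapsed, len(folders)
```

With a real server one would create the connection with `imaplib.IMAP4(host)` (or `IMAP4_SSL`), call `login()` with the affected user's credentials, and pass the connection to `time_lsub`. Running it with the ACL plugin enabled and then disabled should show whether the plugin accounts for the delay.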
I did some debugging with strace, too. The trace is about 22 MB just for running the lsub command!? The mailbox size is about 2.2 GB, so this should be no problem. The maximum subfolder depth is about 13; could this be a problem? The folder structure (simplified) is attached.
Thanks for any help!!
Best regards Ronny Becker
-- Kind regards, Ronny Becker
Institut für Medizinische Diagnostik GmbH Ingelheim / Data Processing, Konrad Adenauer Strasse 17, 55218 Ingelheim. Phone: 06132 781 249 Fax: 06132 781 9 249
ronny.becker@bioscientia.de www.bioscientia.de
Limited liability company with registered office in 55218 Ingelheim am Rhein, registered with the commercial register of the local court of Mainz under HRB 21166. Managing directors: PD Dr. med. Markus Nauck and Dipl.-Vw. Johannes Brill. Chairman of the supervisory board: Prof. Dr. med. Bernd Heicke. This e-mail may contain confidential or otherwise privileged information. If you are not the intended recipient, please inform us immediately and delete the e-mail from your system. VAT-ID-Nr. DE 811138229
On 5.5.2011, at 10.04, Becker, Ronny wrote:
> I did some debugging and found that a lsub "" "*" that is used by Thunderbird takes > 5 minutes.
Something's very wrong. I just tested with 1000 folders in a subscriptions file and it lists it in less than a second.
> I did some debugging with strace, too. It's about 22MB only for running the lsub command !?
Run it with strace -tt and send me the output compressed?
Also dovecot -n output could be useful.
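A 22 MB trace is awkward to read by eye, so a short script can summarize it first. The sketch below assumes the usual `strace -tt` line layout (a wall-clock timestamp with microseconds, then the syscall name and an opening parenthesis, optionally preceded by a PID when `-f` is used). The function name `summarize_strace` is made up for illustration, and the script ignores traces that cross midnight.

```python
import re
from collections import Counter

# Matches lines such as: 10:19:03.123456 open("/some/path", O_RDONLY) = 5
LINE_RE = re.compile(
    r"^(?:\[pid\s+\d+\]\s+|\d+\s+)?(\d{2}):(\d{2}):(\d{2})\.(\d{6})\s+(\w+)\("
)


def summarize_strace(lines):
    """Return (syscall counts, seconds between first and last timestamp)."""
    counts = Counter()
    first = last = None
    for line in lines:
        m = LINE_RE.match(line)
        if not m:
            continue  # skip signals, "<... resumed>" lines, exit notices
        h, mi, s, us, name = m.groups()
        t = int(h) * 3600 + int(mi) * 60 + int(s) + int(us) / 1e6
        if first is None:
            first = t
        last = t
        counts[name] += 1
    span = (last - first) if first is not None else 0.0
    return counts, span
```

Feeding it an open trace file and printing `counts.most_common(10)` shows which syscalls dominate. Which names appear (for example `stat` versus `stat64`) depends on the platform.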
On 5.5.2011, at 10.19, Timo Sirainen wrote:
>> I did some debugging with strace, too. It's about 22MB only for running the lsub command !?
> Run it with strace -tt and send me the output compressed?
Ah, you have ACLs enabled. With ACLs it's looking up dovecot-acl file from each folder before returning it. This shouldn't be necessary with LSUB I think. I suppose some kind of an ACL cache could be a good idea some day too. And maybe a setting not to bother looking up ACLs for mailboxes in private namespaces.
I'll try to do something about those within a few days..
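To get a rough feel for what one ACL file lookup per folder costs on a given disk, the access pattern can be imitated outside Dovecot. The sketch below is not Dovecot's code: it assumes a Maildir-style layout in which every folder is a directory, takes the file name dovecot-acl from the explanation above, and uses a made-up function name, `probe_acl_files`.

```python
import os
import time


def probe_acl_files(mail_root, acl_name="dovecot-acl"):
    """Check every folder directory under mail_root for an ACL file.

    Returns (directories_checked, acl_files_found, seconds).
    """
    checked = found = 0
    start = time.monotonic()
    for dirpath, dirnames, _filenames in os.walk(mail_root):
        # Assumption: Maildir message directories hold no per-folder ACL
        # file, so they are pruned from the walk.
        dirnames[:] = [d for d in dirnames if d not in ("cur", "new", "tmp")]
        checked += 1
        if os.path.isfile(os.path.join(dirpath, acl_name)):
            found += 1
    return checked, found, time.monotonic() - start
```

Running it once on a cold cache and once again right afterwards gives an idea of how much of the time is disk seeks rather than CPU.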
On 05.05.2011 10:44, Timo Sirainen wrote:
> On 5.5.2011, at 10.19, Timo Sirainen wrote:
>>> I did some debugging with strace, too. It's about 22MB only for running the lsub command !?
>> Run it with strace -tt and send me the output compressed?
> Ah, you have ACLs enabled. With ACLs it's looking up dovecot-acl file from each folder before returning it. This shouldn't be necessary with LSUB I think. I suppose some kind of an ACL cache could be a good idea some day too. And maybe a setting not to bother looking up ACLs for mailboxes in private namespaces.

OK, so to work around this problem I will disable ACL support.

> I'll try to do something about those within a few days..

I think it would be a really good thing to optimize Dovecot this way; it would save a lot of I/O. I hope I can get this fix once it is available, given that I am using the Ubuntu packages!?
On 05.05.2011 10:44, Timo Sirainen wrote:
> On 5.5.2011, at 10.19, Timo Sirainen wrote:
>>> I did some debugging with strace, too. It's about 22MB only for running the lsub command !?
>> Run it with strace -tt and send me the output compressed?
> Ah, you have ACLs enabled. With ACLs it's looking up dovecot-acl file from each folder before returning it. This shouldn't be necessary with LSUB I think. I suppose some kind of an ACL cache could be a good idea some day too. And maybe a setting not to bother looking up ACLs for mailboxes in private namespaces.
> I'll try to do something about those within a few days..

Someone told me that this problem should not happen when acl_shared_dict is used, but that was already enabled in our setup. Is there any other workaround for using ACLs with such a large number of folders?
On 5/10/2011 1:19 AM, Becker, Ronny wrote:
> On 05.05.2011 10:44, Timo Sirainen wrote:
>> On 5.5.2011, at 10.19, Timo Sirainen wrote:
>>>> I did some debugging with strace, too. It's about 22MB only for running the lsub command !?
>>> Run it with strace -tt and send me the output compressed?
>> Ah, you have ACLs enabled. With ACLs it's looking up dovecot-acl file from each folder before returning it. This shouldn't be necessary with LSUB I think. I suppose some kind of an ACL cache could be a good idea some day too. And maybe a setting not to bother looking up ACLs for mailboxes in private namespaces.
>> I'll try to do something about those within a few days..
> Someone told me, that this problem should not happen when acl_shared_dict is used. But this was enabled in our setup. Is there any other workaround to use ACLs with such a large number of folders?!
In the absence of Timo producing the patches he mentioned in short order...
Install this 3.5" Vertex2 120GB SATA II SSD for less than $250 USD including shipping: http://www.newegg.com/Product/Product.aspx?Item=N82E16820227590 (If you're wary of using a 'consumer' marketed unit, buy the 64GB Intel SLC for ~$700.)
This Vertex2 will give you 50,000 random write IOPS (~same for read) with 0.1ms seek latency, approximately 3x the IOPS of a $75k 60x15k SAS drive Nexsan e60 FC SAN array and 50x lower seek latency--but with only 1/300th the capacity.
Format with EXT3/4, and simply move your problem user's entire mail directory to the SSD and change his mail location setting. Problem solved instantly, with authority. As you should have ~100GB of space left on the SSD after moving him/her over, move all user indexes to the SSD as well. This will yield an incredible speed boost for all users, and prevent any 'jealousy' politics when word spreads of 'Bob' getting his own super-duper fast drive in the server.
-- Stan
On 05/05/2011 09:44, Timo Sirainen wrote:
> On 5.5.2011, at 10.19, Timo Sirainen wrote:
>>> I did some debugging with strace, too. It's about 22MB only for running the lsub command !?
>> Run it with strace -tt and send me the output compressed?
> Ah, you have ACLs enabled. With ACLs it's looking up dovecot-acl file from each folder before returning it. This shouldn't be necessary with LSUB I think. I suppose some kind of an ACL cache could be a good idea some day too. And maybe a setting not to bother looking up ACLs for mailboxes in private namespaces.
Silly question, but I presume there is effectively no "negative caching" from the OS for missing files, but presumably the OS would cache a zero length file?
Would it be faster then to create 700 zero length acl files so that these lookups will at least be returned from the OS cache instead of causing IO seeks? First open will be slow, but faster thereafter?
Just an idea?
Ed W
On Thu, 2011-05-05 at 10:44 +0200, Timo Sirainen wrote:
> Ah, you have ACLs enabled. With ACLs it's looking up dovecot-acl file from each folder before returning it. This shouldn't be necessary with LSUB I think. I suppose some kind of an ACL cache could be a good idea some day too. And maybe a setting not to bother looking up ACLs for mailboxes in private namespaces.
http://hg.dovecot.org/dovecot-2.0/rev/a7f1980d250c should help for your specific problem.
On Thu, 5 May 2011, Timo Sirainen wrote:
> On 5.5.2011, at 10.04, Becker, Ronny wrote:
>> I did some debugging and found that a lsub "" "*" that is used by Thunderbird takes > 5 minutes.
> Something's very wrong. I just tested with 1000 folders in a subscriptions file and it lists it in less than a second.
Interesting. I just converted a system from Courier IMAP to Dovecot, and found a massive performance increase. My own personal account has 655 subscribed folders containing about 1.3 million messages. An LSUB "" "*" command takes on the order of 0.1 second (the IMAP server has SATA disks and Xeon E5345s at 2.33 GHz). Using alpine as the client, a full scan of all messages to find unread messages (";puz") takes about 10 seconds; with Courier, this took over a minute. I don't know of a way to do the same test in Thunderbird.
Steve
on 5/5/2011 1:04 AM Becker, Ronny spake the following:
> Hello, I am using Dovecot 1.2.9 on a 32bit on Ubuntu 10.04.2 LTS.
> We got one user who uses folders in an "exessive" way. He has got 704 subscribed folders. That means about nearly 3000 folders on the filesystem. Since ~ a week the user couldn't work because of timeouts when connecting to Dovecot (IMAP). Thunderbird doesn't show any folders - timeout - the webmail System (Groupoffice) only presents the INBOX.
> I did some debugging and found that a lsub "" "*" that is used by Thunderbird takes > 5 minutes. Thunderbird disconnects after ~1 Minute showing the timeout. I entered the command directly via port communication. So this is no clientside problem. But there is no error message or something else in any logs.
> I did some debugging with strace, too. It's about 22MB only for running the lsub command !? The Mailbox size is about ~2.2GB, so this should be no problem. The maximum subfolder level is about 13 - could this be a problem? You can find the structure (simply shown) attached.
> Thanks for any help!!
> Best regards Ronny Becker
What is the underlying filesystem, and if it's ext3, is dir_index set on?
On 05.05.2011 22:24, Scott Silva wrote:
> What is the underlying filesystem, and if its ext3, is dir_index set on?
The underlying filesystem is ext3 with dir_index enabled. As Timo Sirainen explained, the problem seems to be the ACL plugin. With ACLs enabled, every folder is checked for a dovecot-acl file, and that takes a lot of time. In this case there are >700 subscribed folders spread over >3600 folders on the filesystem.
participants (6)
- Becker, Ronny
- Ed W
- Scott Silva
- Stan Hoeppner
- Steve Thompson
- Timo Sirainen