multiple messages per second to a single mailbox
Dovecot 2.2.18 on CentOS 6
I have a pair of servers set up with MySQL, Postfix, and Dovecot. Replication is set up and working between the two Dovecot instances.
The problem I'm running into is that a single mailbox receives a lot of messages; at times the rate is multiple messages per second. Delivery from Postfix to Dovecot is via TCP-based LMTP. When I run 'ps -aef|grep lmtp|wc -l' I get 62, and the count does not appear to go higher than that. At the moment I have 4500 and 8300 messages queued on two Postfix instances waiting to deliver via LMTP to the same Dovecot instance. Deliveries only happen via LMTP, and only one of the two nodes actually gets the deliveries.
What I'm seeing is very high load on the system (40) and queues building on the Postfix side. Replication is keeping up. Looking at the logs now, I see anywhere from 4 to 7 messages per second delivered to this single mailbox. I would like to increase that rate a lot.
These machines are VMs hosted on XenServer 6.x. I have them set up with 8 vCPUs (2 sockets with 4 cores per socket). The dom0 machines have dual HBA connectors back to a SAN, have 128 CPUs and 256GB of RAM, and are not taxed. I added a second virtual disk that is used for storing mailbox data. It is ext4 and is mounted with noatime. /var is also mounted with noatime.
The performance graphs in XenCenter show nearly all 8 vCPUs at about 50%, and the writes on the mailbox data disk are about 20%. iostat is showing await times mostly under 5 ms for the disks, though I do see a 10 now and again.
I'm guessing that I'm hitting a mailbox locking issue, and I'm not sure how to reduce the contention and thereby increase the delivery rate to this mailbox.
-Chad
Could you provide the following info:
a) mailbox type (maildir/mbox/dbox/...) [mail_location in dovecot's config]
b) file system type (ext2/ext3/ext4/fat32/...) [provided by "df -T" command on my system]
-- A. Filip
On Aug 12, 2015, at 11:04 AM, Andrzej A. Filip andrzej.filip@gmail.com wrote:
<..snip..>
> Could you provide the following info: a) mailbox type (maildir/mbox/dbox/...)
maildir
> [mail_location in dovecot's config]
/srv/mail/<domain>/<user-mailbox>/
> b) file system type (ext2/ext3/ext4/fat32/...) [provided by "df -T" command on my system]
As I said, ext4.
Since I posted, I've changed a couple of things: I set ulimit -n 8192 and disabled fsync with mail_fsync = never. Given all the hardware abstraction layers, I'm not sure whether I'll put it back in the LMTP section or not.
-Chad
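For reference, putting fsync "back in the LMTP section" might look like the fragment below. This is only a sketch of the idea, assuming a stock Dovecot 2.2 configuration layout; the thread does not show the actual configuration in use, and whether the durability trade-off is acceptable depends on the storage underneath.

```
# dovecot.conf sketch (not taken from the poster's config)
# Skip fsync for ordinary mailbox access...
mail_fsync = never

# ...but restore Dovecot's default fsync behaviour for LMTP deliveries,
# so incoming mail is still synced where Dovecot considers it necessary.
protocol lmtp {
  mail_fsync = optimized
}
```

Leaving the protocol lmtp block out keeps fsync disabled for deliveries as well, which is the faster but less safe variant described above.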
On Aug 12, 2015, at 11:25 AM, Chad M Stewart cms@balius.com wrote:
<..snip..>
> Since I posted, I've changed a couple of things: I set ulimit -n 8192 and disabled fsync with mail_fsync = never. Given all the hardware abstraction layers, I'm not sure whether I'll put it back in the LMTP section or not.
I forgot to mention that on a clean start of Dovecot (after making those changes), I counted 25 messages delivered to this single mailbox (all INBOX) in one second. Then it slowed down. Maybe the answer is fewer LMTP connections and more messages per connection. I have Postfix configured to do outbound connection caching, and I've seen entries in the logs indicating 30 messages on a given connection. I'll have to look into how to limit Dovecot to a certain number of LMTP processes.
-Chad
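A cap on LMTP processes can be expressed in Dovecot's service settings. The fragment below is a sketch, assuming each lmtp process handles one connection at a time; the value 10 is a placeholder and not a figure recommended anywhere in this thread.

```
# dovecot.conf sketch: limit the number of simultaneous lmtp processes
service lmtp {
  # Placeholder value; tune against the observed delivery rate
  process_limit = 10
}
```

With a cap like this, it probably makes sense to keep Postfix's LMTP concurrency at or below the same number so that connections are not left waiting on the Dovecot side.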
Are Dovecot and Postfix located on the same server? Can Postfix access (deliver directly to) the maildir directory that Dovecot uses?
-- A. Filip
On Aug 14, 2015, at 1:01 PM, Andrzej A. Filip andrzej.filip@gmail.com wrote:
> Are Dovecot and Postfix located on the same server? Can Postfix access (deliver directly to) the maildir directory that Dovecot uses?
For the moment, yes, they are on the same server. I designed it to be modular: the various components can be placed on different systems with no configuration changes required, should that be necessary.
I'll note that Postfix does not appear to have any problems writing the files to its queue as fast as they are being delivered to it. Postfix and Dovecot are writing to different disks, though both are on the SAN, and both have noatime set.
-Chad
As I understand it:
- maildir is designed to allow fast "lockless" parallel deliveries
- Dovecot's LDA updates some Dovecot-specific index/cache files even for deliveries to maildir, so making Postfix deliver to the maildir directly, without updating the Dovecot-specific files, may solve your performance problems
-- A. Filip
On 14.08.2015 23:20, Andrzej A. Filip wrote:
> As I understand it:
> - maildir is designed to allow fast "lockless" parallel deliveries
> - Dovecot's LDA updates some Dovecot-specific index/cache files even for deliveries to maildir, so making Postfix deliver to the maildir directly, without updating the Dovecot-specific files, may solve your performance problems
...at the cost of bypassing Dovecot's index updates, Sieve filters, and so on. I would not configure the system this way.
Kind Regards, Christian
On 08/12/2015 17:19, Chad M Stewart wrote:
> What I'm seeing is very high load on the system (40) and queues building on the Postfix side.
High load means that there are a lot of processes waiting to run. The most likely cause for this is not CPU consumption, but I/O wait.
Please run vmstat and iostat and post their output.
Greetings Daniel
On Aug 12, 2015, at 11:58 AM, Daniel Tröder troeder@univention.de wrote:
> On 08/12/2015 17:19, Chad M Stewart wrote:
>> What I'm seeing is very high load on the system (40) and queues building on the Postfix side.
> High load means that there are a lot of processes waiting to run. The most likely cause for this is not CPU consumption, but I/O wait.
> Please run vmstat and iostat and post their output.
I was watching iostat and average service times, and maybe once every 30-45 seconds I'd see times of 10 ms, but otherwise they were below that. I achieved the biggest impact by limiting the number of outbound connections from Postfix to Dovecot. I limited Postfix to 5 connections on each of the two instances, which means a total of 10 inbound LMTP connections to Dovecot. Then I saw nearly 500 messages per LMTP connection.
I suspect the problem was a locking issue on the mailbox in question: too many simultaneous delivery attempts via too many LMTP sessions.
The backlog has cleared, so I'm done troubleshooting for now. If this happens again I'll resume looking into it. These are new servers, so I'm still tuning for the load, etc.
-Chad
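The thread does not say which Postfix setting was used for this limit. One way to get the same effect is the per-transport concurrency limit, sketched below under the assumption that deliveries to Dovecot go through a transport named "lmtp" in master.cf; a differently named transport would need the matching parameter prefix.

```
# main.cf sketch (assumes the delivery transport is named "lmtp")
# At most 5 parallel LMTP deliveries to the same destination per Postfix instance
lmtp_destination_concurrency_limit = 5
```

With two Postfix instances configured this way, Dovecot would see at most 10 inbound LMTP connections, which matches the numbers reported above.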
The problem happened again this morning. Removing fsync calls helped, but I'm not sure about leaving fsync disabled long term.
I still believe the problem is multiple Dovecot processes trying to write to a single folder at the same time. (If I could run dtrace I might be able to cobble together a script to prove it.)
I tried writing a Sieve script to direct the messages to a set of folders, but I'm not able to make the logic work. I was thinking of something like: generate a random number ($N) between, say, 1 and 10, then file the message into folder mail$N. But I didn't find a way to do that within Sieve.
My next thought was to try parsing the message-id header. If the first character is [0-5] then fileinto mail1, etc. Then I could go as far as having 36 subfolders that the messages could be written to. This mailbox only keeps messages for a rolling one-day window. Right now, for example, it has 260,186 messages in the INBOX.
The Sieve script I tried (with only about 4 hours of sleep) was
require ["fileinto","regex"];
if header :regex "message-id" "^1" {
    fileinto "mail1";
} else {
    keep;
}
If anyone has some suggestions on how I might spread the messages out over multiple folders, I'd like to hear your thoughts. Again, the servers are configured using maildir, so each folder should have its own index, and thus file locking contention should be lower; at least so goes the theory in my head.
Thank you, Chad
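One likely reason the test above never matched: a well-formed Message-ID value starts with "<", so the first character of the identifier itself is the second character of the header value. The sketch below anchors on "<" and spreads messages over three folders. The folder names and the character ranges are placeholders, the folders are assumed to exist already, and the split will only be as even as the Message-IDs that actually arrive.

```
require ["fileinto", "regex"];

# Bucket on the first character after the opening "<" of the Message-ID.
# With the default comparator the match is expected to be case-insensitive.
if header :regex "message-id" "^<[0-9]" {
    fileinto "mail1";
} elsif header :regex "message-id" "^<[a-m]" {
    fileinto "mail2";
} elsif header :regex "message-id" "^<[n-z]" {
    fileinto "mail3";
} else {
    # Missing or unusual Message-IDs stay in INBOX
    keep;
}
```

Because each maildir folder has its own index files, spreading deliveries this way should reduce the number of processes competing for any single index.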
On 2015-08-14 7:52 AM, Chad M Stewart wrote:
<..snip..>
With that many messages, it may be the index updates that are slowing things down. Depending on your needs or how you use this folder (it sounds automated to me), you may not need the indexes on disk at all. If you set the indexes to RAM, it may help.
mail_location = maildir:~/Maildir:INDEX=MEMORY
as per http://wiki2.dovecot.org/MailLocation#Index_files
If keeping indexes in RAM isn't an option, maybe just putting them on a separate spindle / SSD would help.
Please note I'm shooting from the hip with this advice; I don't have a system that pushes Dovecot that hard.
Greg
On Aug 17, 2015, at 9:30 AM, Gregory Finch gfinch@ldmltd.ca wrote:
<..snip..>
This particular mailbox is unique. It holds a copy of member-to-member communications, and it also receives a lot of unfiltered spam. You're right, it is the index updates that were limiting the message insertion rate. I wrote a Sieve script that divides the messages based on the first character of the message-id header. For now this appears to be enough to spread out the index updates. The performance metrics indicate I'm now seeing double the IOPS compared to before the Sieve script, with no noticeable increase in I/O wait times.
If the Sieve script ever stops being enough, then I'll look into moving this mailbox's index files to RAM.
Thank you for the tip!
Regards, -Chad
participants (5)
- Andrzej A. Filip
- Chad M Stewart
- Christian Schmidt
- Daniel Tröder
- Gregory Finch