[Dovecot] Dovecot Deliver CPU usage
Hi All,
We have upgraded the dovecot from 1.0 to 1.2.8. After upgradation we have noticed that dovecot deliver process is utilizing a lot of CPU. As a result the mailsystem will stop working. When we limit the no of procs in postfix for dovecot deliver the CPU usage will come down. But we are not facing this problem with the previous version. Below is the log files for dovecot when the error occurs
write_full(/home/vmail/dhanbank.co.in/vinukumarvm/temp.server.be3e5a849a909847) failed: No space left on device dovecot: deliver(user@example.com): stat(/tmp/Dovecot Delivery Mail) failed: No space left on device
But my system has 54gb of free space for root filesystem where tmp residing. Can anyone help me to solve this issue.
Rgds, Aravind
On 11/28/2009, Aravind Divakaran (aravind.divakaran@yukthi.com) wrote:
write_full(/home/vmail/dhanbank.co.in/vinukumarvm/temp.server.be3e5a849a909847) failed: No space left on device dovecot: deliver(user@example.com): stat(/tmp/Dovecot Delivery Mail) failed: No space left on device
But my system has 54gb of free space for root filesystem where tmp residing. Can anyone help me to solve this issue.
Its not looking at / for the temp, it is using the users home directory, according to the above...
Are you using quotas?
dovecot -n output?
On Sat, 28 Nov 2009 09:17:12 -0500, Charles Marcus <CMarcus@Media-Brokers.com> wrote:
On 11/28/2009, Aravind Divakaran (aravind.divakaran@yukthi.com) wrote:
write_full(/home/vmail/dhanbank.co.in/vinukumarvm/temp.server.be3e5a849a909847)
failed: No space left on device dovecot: deliver(user@example.com): stat(/tmp/Dovecot Delivery Mail) failed: No space left on device
But my system has 54gb of free space for root filesystem where tmp residing. Can anyone help me to solve this issue.
Its not looking at / for the temp, it is using the users home directory, according to the above...
Are you using quotas?
dovecot -n output?
Yes i am using quota, below is my configuration
# 1.2.8: /etc/dovecot/dovecot.conf # OS: Linux 2.6.30 x86_64 Debian 5.0 ocfs2 log_timestamp: %Y-%m-%d %H:%M:%S protocols: imap imaps pop3 pop3s managesieve disable_plaintext_auth: no login_dir: /var/run/dovecot/login login_executable(default): /usr/lib/dovecot/imap-login login_executable(imap): /usr/lib/dovecot/imap-login login_executable(pop3): /usr/lib/dovecot/pop3-login login_executable(managesieve): /usr/lib/dovecot/managesieve-login mail_privileged_group: mail mail_uid: 1003 mail_gid: 1003 mail_location: maildir:/home/vmail/%d/%n mbox_write_locks: fcntl dotlock mail_executable(default): /usr/lib/dovecot/imap mail_executable(pop3): /usr/lib/dovecot/pop3 mail_executable(managesieve): /usr/lib/dovecot/managesieve mail_plugins(default): quota imap_quota acl listescape mail_plugins(imap): quota imap_quota acl listescape mail_plugins(pop3): quota mail_plugins(managesieve): mail_plugin_dir(default): /usr/lib/dovecot/modules/imap mail_plugin_dir(imap): /usr/lib/dovecot/modules/imap mail_plugin_dir(pop3): /usr/lib/dovecot/modules/pop3 mail_plugin_dir(managesieve): /usr/lib/dovecot/modules/managesieve namespace: type: private separator: / inbox: yes list: yes subscriptions: yes lda: postmaster_address: postmaster@example.com mail_plugins: quota sieve auth default: mechanisms: login plain passdb: driver: ldap args: /etc/dovecot/dovecot-ldap.conf userdb: driver: ldap args: /etc/dovecot/dovecot-ldap.conf socket: type: listen client: path: /var/spool/postfix/private/auth mode: 432 user: postfix group: postfix master: path: /var/run/dovecot/auth-master mode: 384 user: vmail plugin: quota: maildir:storage=250000 sieve: /home/vmail/sieve/%d/%n/.dovecot.sieve sieve_storage: /home/vmail/sieve/%d/%n/sieve sieve_extensions: +imapflags acl: vfile:/etc/dovecot/acls
Users quota is getting from ldap.
-Aravind
On Nov 28, 2009, at 3:21 AM, Aravind Divakaran wrote:
We have upgraded the dovecot from 1.0 to 1.2.8. After upgradation we have noticed that dovecot deliver process is utilizing a lot of CPU. As a result the mailsystem will stop working. When we limit the no of procs in postfix for dovecot deliver the CPU usage will come down. But we are not facing this problem with the previous version. Below is the log files for dovecot when the error occurs
You mean a single deliver process can run a long time with 100% CPU? Getting a strace of the process could be helpful.
write_full(/home/vmail/dhanbank.co.in/vinukumarvm/temp.server.be3e5a849a909847) failed: No space left on device dovecot: deliver(user@example.com): stat(/tmp/Dovecot Delivery Mail) failed: No space left on device
But my system has 54gb of free space for root filesystem where tmp residing. Can anyone help me to solve this issue.
The /tmp/ part is wrong, I should get that fixed probably. Is there space in /home/vmail? Or do you use filesystem quotas? (I see you're also using Maildir++ quotas.)
On Nov 28, 2009, at 3:21 AM, Aravind Divakaran wrote:
We have upgraded the dovecot from 1.0 to 1.2.8. After upgradation we have noticed that dovecot deliver process is utilizing a lot of CPU. As a result the mailsystem will stop working. When we limit the no of procs in postfix for dovecot deliver the CPU usage will come down. But we are not facing this problem with the previous version. Below is the log files for dovecot when the error occurs
You mean a single deliver process can run a long time with 100% CPU? Getting a strace of the process could be helpful.
Yes, dovecot deliver process is eating a lot of CPU. please find my strace output on the below link.
write_full(/home/vmail/dhanbank.co.in/vinukumarvm/temp.server.be3e5a849a909847) failed: No space left on device dovecot: deliver(user@example.com): stat(/tmp/Dovecot Delivery Mail) failed: No space left on device
But my system has 54gb of free space for root filesystem where tmp residing. Can anyone help me to solve this issue.
The /tmp/ part is wrong, I should get that fixed probably. Is there space in /home/vmail? Or do you use filesystem quotas? (I see you're also using Maildir++ quotas.)
My home partition has 353gb of free space. I am using maildir quota not using filesystem quota.
On Dec 1, 2009, at 4:48 AM, Aravind Divakaran wrote:
You mean a single deliver process can run a long time with 100% CPU? Getting a strace of the process could be helpful.
Yes, dovecot deliver process is eating a lot of CPU. please find my strace output on the below link.
strace -tt would have actually been also more helpful, so it would have included timestamps. Are you using NFS or something similar for your filesystem in /home? The strace does show that Dovecot seems to be reading from a file that it just wrote to, so there's something non-optimal but it necessarily shouldn't be that slow.
On Dec 1, 2009, at 4:48 AM, Aravind Divakaran wrote:
You mean a single deliver process can run a long time with 100% CPU? Getting a strace of the process could be helpful.
Yes, dovecot deliver process is eating a lot of CPU. please find my strace output on the below link.
strace -tt would have actually been also more helpful, so it would have included timestamps. Are you using NFS or something similar for your filesystem in /home? The strace does show that Dovecot seems to be reading from a file that it just wrote to, so there's something non-optimal but it necessarily shouldn't be that slow.
You can find my strace with timestamp on the below link.
My /home partition using san storage with ocfs filesystem.
I have configured mailarchivel, where in which a copy of all mails will sent to one mailbox. In the archive mailbox using sieve script i am sorting the mail according to username, year, month and delivering it.
When ever i am removing my max procs option limit from 4 to default in postfix. System cpu usage become high. Then i have to put the limit again to make my cpu usage go down.
Rgds, Aravind
On 12/3/2009, Aravind Divakaran (aravind.divakaran@yukthi.com) wrote:
You can find my strace with timestamp on the below link.
Don't do that, it wastes time and breaks archives (when the pastebin file goes away.
Just paste it in the email body...
(yes this is a peeve of mine)
--
Best regards,
Charles
On Thu, 03 Dec 2009 06:59:31 -0500 Charles Marcus <CMarcus@Media-Brokers.com> replied:
On 12/3/2009, Aravind Divakaran (aravind.divakaran@yukthi.com) wrote:
You can find my strace with timestamp on the below link.
Don't do that, it wastes time and breaks archives (when the pastebin file goes away.
Just paste it in the email body...
(yes this is a peeve of mine)
Which is fine if the contents are of a relatively short length. However, if it is a multi-page document, I would much rather use 'pastebin' for viewing. Besides, the user can set the relative life span of the posted document.
And yes, this is a pet peeve of mine. Receiving reams of documentation in an email, that has not been edited to remove extraneous gibberish, combined with top, bottom and in-line notations.
-- Jerry gesbbb@yahoo.com
|::::======= |::::======= |=========== |=========== |
"Why are there interstate highways in Hawaii?"
On 12/3/2009, Jerry (gesbbb@yahoo.com) wrote:
Don't do that, it wastes time and breaks archives (when the pastebin file goes away.
Just paste it in the email body...
(yes this is a peeve of mine)
Which is fine if the contents are of a relatively short length.
Obviously, you can do what you want, but many people will just click 'Delete' on your message if you use this kind of service, so you are missing out on an unknown amount of help.
However, if it is a multi-page document, I would much rather use 'pastebin' for viewing. Besides, the user can set the relative life span of the posted document.
And yes, this is a pet peeve of mine. Receiving reams of documentation in an email, that has not been edited to remove extraneous gibberish,
Trimming replies is always helpful. Thankfully, on most of the better tech support lists, it is mandated. The postfix list is the best example of this - and they will tell you the same thing about using pastebin type services.
combined with top, bottom and in-line notations.
This has absolutely nothing to do with the top vs bottom posting argument, so don't know whay you even brought that into it.
Anyway, as I said, do what you want...
--
Best regards,
Charles
On Thu, 2009-12-03 at 08:07 -0500, Charles Marcus wrote:
On 12/3/2009, Jerry (gesbbb@yahoo.com) wrote:
Don't do that, it wastes time and breaks archives (when the pastebin file goes away.
Just paste it in the email body...
(yes this is a peeve of mine)
Which is fine if the contents are of a relatively short length.
Obviously, you can do what you want, but many people will just click 'Delete' on your message if you use this kind of service, so you are missing out on an unknown amount of help.
+1 I never go to sites like that or those pesky tiny urls sites
On Dec 3, 2009, at 5:50 AM, Aravind Divakaran wrote:
strace -tt would have actually been also more helpful, so it would have included timestamps. Are you using NFS or something similar for your filesystem in /home? The strace does show that Dovecot seems to be reading from a file that it just wrote to, so there's something non-optimal but it necessarily shouldn't be that slow.
You can find my strace with timestamp on the below link.
That's only 400 milliseconds. I thought the delivers would be running for at least a few seconds? Can you show such a strace -tt?
My /home partition using san storage with ocfs filesystem.
Anyway I suppose this could be the reason, if the problem is the reads..
You could also try if it helps when you increase this in src/deliver/deliver.c:
#define MAIL_MAX_MEMORY_BUFFER (1024*128)
to
#define MAIL_MAX_MEMORY_BUFFER (1024*1024*100)
i.e. 100 MB (or whatever is large enough for most mails).
On Thu, 3 Dec 2009 09:05:21 -0600, Timo Sirainen <tss@iki.fi> wrote:
On Dec 3, 2009, at 5:50 AM, Aravind Divakaran wrote:
strace -tt would have actually been also more helpful, so it would have included timestamps. Are you using NFS or something similar for your filesystem in /home? The strace does show that Dovecot seems to be reading from a file that it just wrote to, so there's something non-optimal but it necessarily shouldn't be that slow.
You can find my strace with timestamp on the below link.
That's only 400 milliseconds. I thought the delivers would be running for at least a few seconds? Can you show such a strace -tt?
My /home partition using san storage with ocfs filesystem.
Anyway I suppose this could be the reason, if the problem is the reads..
You could also try if it helps when you increase this in src/deliver/deliver.c:
#define MAIL_MAX_MEMORY_BUFFER (1024*128)
to
#define MAIL_MAX_MEMORY_BUFFER (1024*1024*100)
i.e. 100 MB (or whatever is large enough for most mails).
I had made the changes for MAIL_MAX_MEMORY_BUFFER and installed the new dovecot.
I was monitoring my mail server for past few days. Today i experienced the same problem
My log file shows
Dec 14 12:38:19 server01 dovecot: deliver(user1@example.com): Created dotlock file's timestamp is different than current time (1260774499 vs 1260774377): /home/vmail/example.com/user1/dovecot-uidlist Dec 14 12:38:19 server01 dovecot: deliver(user2@example.com): Created dotlock file's timestamp is different than current time (1260774499 vs 1260774377): /home/vmail/example.com/user2/dovecot-uidlist
The above erros will show when i change the maxproc from 4 to default in postfix. At this time dovecot deliver process is using lot of cpu.
Dec 14 12:46:04 server01 postfix/pipe[1870]: 57489E913: to=<user1@example.com>, orig_to=<user2@example.com>, relay=dovecot, delay=1703, delays=1369/328/0/5.9, dsn=5.7.0, status=bounced (permission denied. Command output: Not enough disk space )
As my hardisk has enough space, why dovecot is giving this error?
-Aravind
On Mon, 2009-12-14 at 13:04 +0530, Aravind Divakaran wrote:
You can find my strace with timestamp on the below link.
This strace shows that it succeeds.
My /home partition using san storage with ocfs filesystem. .. Dec 14 12:38:19 server01 dovecot: deliver(user1@example.com): Created dotlock file's timestamp is different than current time (1260774499 vs 1260774377): /home/vmail/example.com/user1/dovecot-uidlist
You should get this fixed. I don't know how exactly OCFS works, but apparently all of the servers accessing it don't have their system times synchronized. The above shows a 122 second difference.
Dec 14 12:46:04 server01 postfix/pipe[1870]: 57489E913: to=<user1@example.com>, orig_to=<user2@example.com>, relay=dovecot, delay=1703, delays=1369/328/0/5.9, dsn=5.7.0, status=bounced (permission denied. Command output: Not enough disk space )
Can you get a strace from this kind of a failed delivery? The strace should contain ENOSPC string.
On Tue, 2009-12-15 at 15:08 -0500, Timo Sirainen wrote:
Dec 14 12:46:04 server01 postfix/pipe[1870]: 57489E913: to=<user1@example.com>, orig_to=<user2@example.com>, relay=dovecot, delay=1703, delays=1369/328/0/5.9, dsn=5.7.0, status=bounced (permission denied. Command output: Not enough disk space )
Also what did deliver log? It should have logged something like:
rejected: Not enough disk space
On Tue, 2009-12-15 at 15:08 -0500, Timo Sirainen wrote:
Dec 14 12:46:04 server01 postfix/pipe[1870]: 57489E913: to=<user1@example.com>, orig_to=<user2@example.com>, relay=dovecot, delay=1703, delays=1369/328/0/5.9, dsn=5.7.0, status=bounced (permission denied. Command output: Not enough disk space )
Can you get a strace from this kind of a failed delivery? The strace should contain ENOSPC string.
Looking at the code .. I think that can happen with both EDQUOT and ENOSPC errors from a write, but nothing else..
On Tue, 2009-12-15 at 15:08 -0500, Timo Sirainen wrote:
Dec 14 12:46:04 server01 postfix/pipe[1870]: 57489E913: to=<user1@example.com>, orig_to=<user2@example.com>, relay=dovecot, delay=1703, delays=1369/328/0/5.9, dsn=5.7.0, status=bounced (permission denied. Command output: Not enough disk space )
Can you get a strace from this kind of a failed delivery? The strace should contain ENOSPC string.
Looking at the code .. I think that can happen with both EDQUOT and ENOSPC errors from a write, but nothing else..
Strace with ENOSPC
09:57:39.782569 write(10, "T5vfloaIeSOsJdT6IXB146wSWLNKqK\n4v"..., 28672) = -1 ENOSPC (No space left on device) 09:57:40.695074 fstat(10, {st_mode=S_IFREG|0644, st_size=2068480, ...}) = 0 09:57:40.707778 munmap(0x7fa35c97b000, 2101248) = 0 09:57:40.708119 close(10) = 0 09:57:40.708162 unlink("/home/vmail/example.com/user1/tmp/1261456056.M913860P17478.node01") = 0 09:57:40.711774 sendto(6, "<22>Dec 22 09:57:40 dovecot: deli"..., 201, MSG_NOSIGNAL, NULL, 0) = 201 09:57:40.712087 munmap(0x7fa35dc47000, 200704) = 0 09:57:40.712148 write(2, "Not enough disk space\n"..., 22) = 22 09:57:40.712235 sendto(6, "<22>Dec 22 09:57:40 dovecot: deli"..., 189, MSG_NOSIGNAL, NULL, 0) = 189 09:57:40.712353 exit_group(77)
On Tue, 2009-12-22 at 10:03 +0530, Aravind Divakaran wrote:
Strace with ENOSPC
09:57:39.782569 write(10, "T5vfloaIeSOsJdT6IXB146wSWLNKqK\n4v"..., 28672) = -1 ENOSPC (No space left on device) .. 09:57:40.708162 unlink("/home/vmail/example.com/user1/tmp/1261456056.M913860P17478.node01") = 0
Well, that really looks as if there was no disk space left.. That's what the kernel anyway told Dovecot.
participants (5)
-
Aravind Divakaran
-
Charles Marcus
-
Jerry
-
Noel Butler
-
Timo Sirainen