[Dovecot] deliver: command died with signal 6
We recently upgraded to dovecot v1.0.15 (from v1.0.0 + some local fixes), and after this upgrade we've started to get a couple of failures from deliver:
Jan 12 20:34:34 smtp1.ulh.mydomain.net deliver(someuser@somedomain.net): Raw backtrace: /usr/local/dovecot/libexec/dovecot/deliver(i_syslog_panic_handler+0x1c) [0x45577c] -> /usr/local/dovecot/libexec/dovecot/deliver [0x45537c] -> /usr/local/dovecot/libexec/dovecot/deliver(mail_index_sync_update_index+0x86f) [0x43eb8f] -> /usr/local/dovecot/libexec/dovecot/deliver(mail_index_sync_begin+0x245) [0x43c6e5] -> /usr/local/dovecot/libexec/dovecot/deliver(maildir_sync_index_begin+0x45) [0x4162d5] -> /usr/local/dovecot/libexec/dovecot/deliver(maildir_transaction_save_commit_pre+0x68) [0x41c638] -> /usr/local/dovecot/libexec/dovecot/deliver(maildir_transaction_commit+0x70) [0x417320] -> /usr/local/dovecot-1.0.15/lib/dovecot/lda/lib10_quota_plugin.so [0x2a9557d3a8] -> /usr/local/dovecot/libexec/dovecot/deliver(deliver_save+0x136) [0x410856] -> /usr/local/dovecot/libexec/dovecot/deliver(main+0x1023) [0x411c43] -> /lib64/tls/libc.so.6(__libc_start_main+0xdb) [0x3cbd81c40b] -> /usr/local/dovecot/libexec/dovecot/deliver [0x40ffaa]
Jan 12 20:38:53 smtp2.ulh.mydomain.net deliver(quarantine@mydomain.net): Raw backtrace: /usr/local/dovecot/libexec/dovecot/deliver(i_syslog_panic_handler+0x1c) [0x45577c] -> /usr/local/dovecot/libexec/dovecot/deliver [0x45537c] -> /usr/local/dovecot/libexec/dovecot/deliver(mail_index_sync_update_index+0x86f) [0x43eb8f] -> /usr/local/dovecot/libexec/dovecot/deliver(mail_index_sync_begin+0x245) [0x43c6e5] -> /usr/local/dovecot/libexec/dovecot/deliver(maildir_sync_index_begin+0x45) [0x4162d5] -> /usr/local/dovecot/libexec/dovecot/deliver [0x416ed5] -> /usr/local/dovecot/libexec/dovecot/deliver(maildir_sync_last_commit+0x4b) [0x4170bb] -> /usr/local/dovecot-1.0.15/lib/dovecot/lda/lib10_quota_plugin.so [0x2a9557d3a8] -> /usr/local/dovecot/libexec/dovecot/deliver(deliver_save+0x136) [0x410856] -> /usr/local/dovecot/libexec/dovecot/deliver(main+0x1023) [0x411c43] -> /lib64/tls/libc.so.6(__libc_start_main+0xdb) [0x322e11c40b] -> /usr/local/dovecot/libexec/dovecot/deliver [0x40ffaa]
Jan 13 10:37:59 atmail1.ulh.mydomain.net deliver(quarantine@mydomain.net): Raw backtrace: /usr/local/dovecot/libexec/dovecot/deliver(i_syslog_panic_handler+0x1c) [0x45577c] -> /usr/local/dovecot/libexec/dovecot/deliver [0x45537c] -> /usr/local/dovecot/libexec/dovecot/deliver(mail_index_sync_update_index+0x86f) [0x43eb8f] -> /usr/local/dovecot/libexec/dovecot/deliver(mail_index_sync_begin+0x245) [0x43c6e5] -> /usr/local/dovecot/libexec/dovecot/deliver(maildir_sync_index_begin+0x45) [0x4162d5] -> /usr/local/dovecot/libexec/dovecot/deliver(maildir_transaction_save_commit_pre+0x68) [0x41c638] -> /usr/local/dovecot/libexec/dovecot/deliver(maildir_transaction_commit+0x70) [0x417320] -> /usr/local/dovecot-1.0.15/lib/dovecot/lda/lib10_quota_plugin.so [0x2a9557d3a8] -> /usr/local/dovecot/libexec/dovecot/deliver(deliver_save+0x136) [0x410856] -> /usr/local/dovecot/libexec/dovecot/deliver(main+0x1023) [0x411c43] -> /lib64/tls/libc.so.6(__libc_start_main+0xdb) [0x3ac5a1c40b] -> /usr/local/dovecot/libexec/dovecot/deliver [0x40ffaa]
Unfortunately these lead to bounce messages to the sender:
Jan 13 10:37:59 atmail1.ulh.mydomain.net postfix/pipe[6320]: BB3337E4E7: to=<quarantine@mydomain.net>, relay=dovecot, delay=2.3, delays=0.06/0/0/2.2, dsn=5.3.0, status=bounced (Command died with signal 6: "/usr/local/dovecot/libexec/dovecot/deliver")
First, does anybody know if it's possible to change the behaviour here to not bounce, but retry instead?
And, is there anything we should do to debug further what the problem might be?
-jf
On Jan 13, 2009, at 5:00 AM, Jan-Frode Myklebust wrote:
We recently upgraded to dovecot v1.0.15 (from v1.0.0 + some local fixes), and after this upgrade we've started to get a couple of failures from deliver:
Jan 12 20:34:34 smtp1.ulh.mydomain.net deliver(someuser@somedomain.net): Raw backtrace: /usr/local/dovecot/libexec/dovecot/deliver(i_syslog_panic_handler+0x1c) [0x45577c] -> /usr/local/dovecot/libexec/dovecot/deliver [0x45537c] -> /usr/local/dovecot/libexec/dovecot/deliver(mail_index_sync_update_index+0x86f) [0x43eb8f] -> ...
Before this raw backtrace there should have been a "Panic: Something"
logged also. That's more important than the raw backtrace.
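A quick way to look for that panic line might be something like the following, assuming syslog writes the mail log to /var/log/maillog on these hosts (the path is a guess; adjust it for your syslog configuration):

    # Search the mail log for panic messages from deliver.
    grep -i 'deliver.*panic' /var/log/maillog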
Jan 13 10:37:59 atmail1.ulh.mydomain.net postfix/pipe[6320]: BB3337E4E7: to=<quarantine@mydomain.net>, relay=dovecot, delay=2.3, delays=0.06/0/0/2.2, dsn=5.3.0, status=bounced (Command died with signal 6: "/usr/local/dovecot/libexec/dovecot/deliver")
First, does anybody know if it's possible to change the behaviour here to not bounce, but retry instead?
You could at least write a wrapper script around deliver that detects when it crashes and then returns EX_TEMPFAIL.
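A minimal sketch of such a wrapper (the deliver path is taken from the logs above; the shell reports death by signal as an exit status above 128, and EX_TEMPFAIL is 75 from sysexits.h):

    #!/bin/sh
    # Pass all arguments straight through to the real deliver binary.
    /usr/local/dovecot/libexec/dovecot/deliver "$@"
    status=$?
    # An exit status > 128 means deliver was killed by a signal
    # (e.g. 134 = 128 + SIGABRT). Return EX_TEMPFAIL (75) so Postfix
    # queues the message and retries instead of bouncing it.
    if [ "$status" -gt 128 ]; then
        exit 75
    fi
    exit $status

Postfix's pipe service in master.cf would then point at the wrapper instead of at deliver itself.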
And, is there anything we should do to debug further what the problem might be?
Does it always crash for the same user?
On Tue, Jan 13, 2009 at 09:46:35AM -0500, Timo Sirainen wrote:
Before this raw backtrace there should have been a "Panic: Something"
logged also. That's more important than the raw backtrace.
I can't find that in the syslog. But I found this just before the backtrace:
deliver(someuser@mydomain.no): file mail-index-sync-update.c: line 854 (mail_index_sync_update_index): assertion failed: (view->hdr.messages_count == map->hdr.messages_count)
And, is there anything we should do to debug further what the problem might be?
Does it always crash for the same user?
No, it crashed:
4 times today: 3 for the same user and 1 for our "quarantine" user.
3 times yesterday: 2 for "quarantine", plus 1 for another user.
The "quarantine" user is a user that gets all suspect spam, and it got almost 4000 messages in the last 24 hours. So the failure is not something that happens very often.
I just tried piping a copy of one of the bounced emails to deliver, but I couldn't trigger a crash :-(
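The replay attempt presumably looked something like this (the saved filename is illustrative; deliver's -d option selects the destination user):

    # Feed a saved copy of a bounced message to the Dovecot LDA by hand.
    /usr/local/dovecot/libexec/dovecot/deliver -d quarantine@mydomain.net \
        < /tmp/bounced-message.eml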
-jf
On Tue, 2009-01-13 at 16:22 +0100, Jan-Frode Myklebust wrote:
On Tue, Jan 13, 2009 at 09:46:35AM -0500, Timo Sirainen wrote:
Before this raw backtrace there should have been a "Panic: Something" logged also. That's more important than the raw backtrace.
I can't find that in the syslog. But I found this just before the backtrace:
deliver(someuser@mydomain.no): file mail-index-sync-update.c: line 854 (mail_index_sync_update_index): assertion failed: (view->hdr.messages_count == map->hdr.messages_count)
Reading your old mails: are you still using GPFS? This crash just shouldn't be happening, so perhaps something randomly breaks with it. Are you using mmap_disable=yes? Can multiple servers access the same user's mails at the same time?
Anyway, Dovecot still shouldn't be crashing no matter what GPFS does, but I won't try to fix v1.0 indexing bugs anymore, since I've already done a ton of work to make them more robust in v1.1.
On Tue, Jan 13, 2009 at 12:46:04PM -0500, Timo Sirainen wrote:
Reading your old mails: are you still using GPFS? This crash just shouldn't be happening, so perhaps something randomly breaks with it. Are you using mmap_disable=yes? Can multiple servers access the same user's mails at the same time?
Yes, we're using GPFS, but haven't turned on mmap_disable=yes. I believe GPFS should handle mmap across the cluster, and it's been working fine without this setting with the old v1.0.0+.
Do you have any idea what the performance impact would be of turning on mmap_disable=yes?
Anyway, Dovecot still shouldn't be crashing no matter what GPFS does, but I won't try to fix v1.0 indexing bugs anymore, since I've already done a ton of work to make them more robust in v1.1.
Ok, we are planning to move to v1.1, but figured we needed to first move to a late v1.0 release to have the option of reverting from v1.1 if we hit any problems. Not quite sure how we'll handle this now.
-jf
On Tue, 2009-01-13 at 19:20 +0100, Jan-Frode Myklebust wrote:
On Tue, Jan 13, 2009 at 12:46:04PM -0500, Timo Sirainen wrote:
Reading your old mails: are you still using GPFS? This crash just shouldn't be happening, so perhaps something randomly breaks with it. Are you using mmap_disable=yes? Can multiple servers access the same user's mails at the same time?
Yes, we're using GPFS, but haven't turned on mmap_disable=yes. I believe GPFS should handle mmap across the cluster, and it's been working fine without this setting with the old v1.0.0+.
Do you have any idea what the performance impact would be of turning on mmap_disable=yes?
I don't think there would be much of a performance impact. With GPFS it might actually even be a positive impact (especially with v1.0).
I don't really have any idea why it was working with a previous v1.0 version but not with v1.0.15. Although perhaps I have just added more error checks, and the previous version simply handled the error condition wrong.
On 2009-01-13, Timo Sirainen <tss@iki.fi> wrote:
Reading your old mails: are you still using GPFS? This crash just shouldn't be happening, so perhaps something randomly breaks with it. Are you using mmap_disable=yes? Can multiple servers access the same user's mails at the same time?
I tried setting mmap_disable=yes today, but got two new failures 2.5 hours later :-(
Jan 20 10:38:32 smtp2.ulh.myinternaldomain.net deliver(quarantine@mydomain.net): Raw backtrace: /usr/local/dovecot/libexec/dovecot/deliver(i_syslog_panic_handler+0x1c) [0x45577c] -> /usr/local/dovecot/libexec/dovecot/deliver [0x45537c] -> /usr/local/dovecot/libexec/dovecot/deliver(mail_index_sync_update_index+0x86f) [0x43eb8f] -> /usr/local/dovecot/libexec/dovecot/deliver(mail_index_sync_begin+0x245) [0x43c6e5] -> /usr/local/dovecot/libexec/dovecot/deliver(maildir_sync_index_begin+0x45) [0x4162d5] -> /usr/local/dovecot/libexec/dovecot/deliver(maildir_transaction_save_commit_pre+0x68) [0x41c638] -> /usr/local/dovecot/libexec/dovecot/deliver(maildir_transaction_commit+0x70) [0x417320] -> /usr/local/dovecot-1.0.15/lib/dovecot/lda/lib10_quota_plugin.so [0x2a9557d3a8] -> /usr/local/dovecot/libexec/dovecot/deliver(deliver_save+0x136) [0x410856] -> /usr/local/dovecot/libexec/dovecot/deliver(main+0x1023) [0x411c43] -> /lib64/tls/libc.so.6(__libc_start_main+0xdb) [0x322e11c40b] -> /usr/local/dovecot/libexec/dovecot/deliver [0x40ffaa]
Jan 20 10:38:32 smtp2.ulh.myinternaldomain.net deliver(quarantine@mydomain.net): file mail-index-sync-update.c: line 854 (mail_index_sync_update_index): assertion failed: (view->hdr.messages_count == map->hdr.messages_count)
Jan 20 10:30:10 smtp1.ulh.myinternaldomain.net deliver(quarantine@mydomain.net): Raw backtrace: /usr/local/dovecot/libexec/dovecot/deliver(i_syslog_panic_handler+0x1c) [0x45577c] -> /usr/local/dovecot/libexec/dovecot/deliver [0x45537c] -> /usr/local/dovecot/libexec/dovecot/deliver(mail_index_sync_update_index+0x86f) [0x43eb8f] -> /usr/local/dovecot/libexec/dovecot/deliver(mail_index_sync_begin+0x245) [0x43c6e5] -> /usr/local/dovecot/libexec/dovecot/deliver(maildir_sync_index_begin+0x45) [0x4162d5] -> /usr/local/dovecot/libexec/dovecot/deliver(maildir_transaction_save_commit_pre+0x68) [0x41c638] -> /usr/local/dovecot/libexec/dovecot/deliver(maildir_transaction_commit+0x70) [0x417320] -> /usr/local/dovecot-1.0.15/lib/dovecot/lda/lib10_quota_plugin.so [0x2a9557d3a8] -> /usr/local/dovecot/libexec/dovecot/deliver(deliver_save+0x136) [0x410856] -> /usr/local/dovecot/libexec/dovecot/deliver(main+0x1023) [0x411c43] -> /lib64/tls/libc.so.6(__libc_start_main+0xdb) [0x3cbd81c40b] -> /usr/local/dovecot/libexec/dovecot/deliver [0x40ffaa]
Jan 20 10:30:10 smtp1.ulh.myinternaldomain.net deliver(quarantine@mydomain.net): file mail-index-sync-update.c: line 854 (mail_index_sync_update_index): assertion failed: (view->hdr.messages_count == map->hdr.messages_count)
This is with the following config. Any other suggestions for what we should try?
protocols = imap pop3
protocol imap {
  listen = *:143
}
protocol pop3 {
  listen = *:110
}
disable_plaintext_auth = no
ssl_disable = yes
login_user = dovecot
max_mail_processes = 512
namespace private {
  prefix = INBOX.
  inbox = yes
}
mmap_disable = yes
protocol imap {
  mail_plugins = quota imap_quota
  imap_client_workarounds = outlook-idle delay-newmail
}
protocol pop3 {
  mail_plugins = quota
  pop3_uidl_format = UID%u-%v
  pop3_client_workarounds = outlook-no-nuls oe-ns-eoh
}
protocol lda {
  postmaster_address = MAILER-DAEMON@mydomain.net
  mail_plugins = quota
  auth_socket_path = /var/run/dovecot/auth-master
  sendmail_path = /usr/sbin/sendmail
}
auth default {
  mechanisms = plain
  passdb sql {
    args = /usr/local/dovecot/etc/dovecot-sql.conf
  }
  userdb sql {
    args = /usr/local/dovecot/etc/dovecot-sql.conf
  }
  user = dovecot-auth
  socket listen {
    master {
      path = /var/run/dovecot/auth-master
      mode = 0660
      user = root
      group = atmail
    }
  }
}
auth_verbose = yes
% dovecot -n
# 1.0.15: /usr/local/dovecot-1.0.15/etc/dovecot.conf
protocols: imap pop3
listen(default): *:143
listen(imap): *:143
listen(pop3): *:110
ssl_disable: yes
disable_plaintext_auth: no
login_dir: /usr/local/dovecot-1.0.15/var/run/dovecot/login
login_executable(default): /usr/local/dovecot-1.0.15/libexec/dovecot/imap-login
login_executable(imap): /usr/local/dovecot-1.0.15/libexec/dovecot/imap-login
login_executable(pop3): /usr/local/dovecot-1.0.15/libexec/dovecot/pop3-login
max_mail_processes: 512
mmap_disable: yes
mail_executable(default): /usr/local/dovecot-1.0.15/libexec/dovecot/imap
mail_executable(imap): /usr/local/dovecot-1.0.15/libexec/dovecot/imap
mail_executable(pop3): /usr/local/dovecot-1.0.15/libexec/dovecot/pop3
mail_plugins(default): quota imap_quota
mail_plugins(imap): quota imap_quota
mail_plugins(pop3): quota
mail_plugin_dir(default): /usr/local/dovecot-1.0.15/lib/dovecot/imap
mail_plugin_dir(imap): /usr/local/dovecot-1.0.15/lib/dovecot/imap
mail_plugin_dir(pop3): /usr/local/dovecot-1.0.15/lib/dovecot/pop3
imap_client_workarounds(default): outlook-idle delay-newmail
imap_client_workarounds(imap): outlook-idle delay-newmail
imap_client_workarounds(pop3): outlook-idle
pop3_uidl_format(default):
pop3_uidl_format(imap):
pop3_uidl_format(pop3): UID%u-%v
pop3_client_workarounds(default):
pop3_client_workarounds(imap):
pop3_client_workarounds(pop3): outlook-no-nuls oe-ns-eoh
namespace:
  type: private
  prefix: INBOX.
  inbox: yes
auth default:
  user: dovecot-auth
  verbose: yes
  passdb:
    driver: sql
    args: /usr/local/dovecot/etc/dovecot-sql.conf
  userdb:
    driver: sql
    args: /usr/local/dovecot/etc/dovecot-sql.conf
  socket:
    type: listen
    master:
      path: /var/run/dovecot/auth-master
      mode: 432
      user: root
      group: atmail
-jf
On 2009-01-20, Jan-Frode Myklebust <janfrode@tanso.net> wrote:
I tried setting mmap_disable=yes today, but got two new failures 2.5 hours later :-(
I checked the logs for the last failure and see that at the same second two servers were trying to deliver separate messages to the same account. I'll try to avoid some of these parallel deliveries by changing the MX records to prefer one host. That might help as a workaround, but it might fail the next time this preferred host is too busy to process all requests.
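A sketch of what that MX preference could look like in the zone file (hostnames taken from the logs above; lower preference values are tried first):

    ; Route mail through smtp1 by default so that concurrent deliveries
    ; to the same mailbox from several hosts become less likely; smtp2
    ; only takes over when smtp1 is unreachable or overloaded.
    mydomain.net.    IN MX 10 smtp1.ulh.mydomain.net.
    mydomain.net.    IN MX 20 smtp2.ulh.mydomain.net.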
-jf
On Jan 20, 2009, at 5:35 AM, Jan-Frode Myklebust wrote:
On 2009-01-20, Jan-Frode Myklebust <janfrode@tanso.net> wrote:
I tried setting mmap_disable=yes today, but got two new failures 2.5 hours later :-(
I checked the logs for the last failure and see that at the same second two servers were trying to deliver separate messages to the same account. I'll try to avoid some of these parallel deliveries by changing the MX records to prefer one host. That might help as a workaround, but it might fail the next time this preferred host is too busy to process all requests.
That really sounds like the problem then is with GPFS; perhaps it has some internal caching that doesn't work as Dovecot expects. Maybe Dovecot v1.1 with the mail_nfs_*=yes settings would fix it too (I know it helps FUSE filesystems like glusterfs).
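A sketch of the relevant v1.1 settings (these options don't exist in v1.0; they make Dovecot flush caches so that index and mail changes made on other servers become visible):

    # dovecot.conf, Dovecot v1.1+
    mmap_disable = yes
    mail_nfs_storage = yes
    mail_nfs_index = yes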
For large mailboxes (64k+ messages) using maildir I got random "out of memory" errors.
[mail addresses hidden]
dovecot: Dec 05 18:25:27 Error: IMAP(****@****): block_alloc(): Out of memory
dovecot: Dec 05 18:25:27 Error: child 14142 (imap) returned error 83 (Out of memory)
dovecot: Dec 05 18:27:29 Error: IMAP(****@****): file maildir-uidlist.c: line 1117 (maildir_uidlist_sync_deinit): assertion failed: (ctx->locked)
dovecot: Dec 05 18:27:29 Error: IMAP(****@****): Raw backtrace: imap [0x5555555bd6fe] -> imap [0x5555555bd43e] -> imap [0x55555557dd57] -> imap [0x55555557c026] -> imap(maildir_storage_sync_force+0x43) [0x55555557c2b3] -> imap(maildir_storage_sync_init+0xc2) [0x55555557c3e2] -> imap(imap_sync_nonselected+0xf) [0x55555557533f] -> imap(_cmd_select_full+0xc5) [0x55555556da05] -> imap(cmd_select+0xb) [0x55555556db9b] -> imap [0x55555556eee7] -> imap [0x55555556ef79] -> imap(_client_input+0x6f) [0x55555556f61f] -> imap(io_loop_handler_run+0x108) [0x5555555c2f38] -> imap(io_loop_run+0x18) [0x5555555c20f8] -> imap(main+0x41b) [0x5555555770fb] -> /lib64/libc.so.6(__libc_start_main+0xf4) [0x2b02fcb30b54] -> imap [0x55555556a6e9]
dovecot: Dec 05 18:27:29 Error: child 14144 (imap) killed with signal 6
dovecot: Dec 05 18:27:37 Error: IMAP(****@****): block_alloc(): Out of memory
dovecot: Dec 05 18:27:37 Error: child 14211 (imap) returned error 83 (Out of memory)
dovecot: Dec 05 18:29:41 Error: IMAP(****@****): block_alloc(): Out of memory
dovecot: Dec 05 18:29:41 Error: child 14215 (imap) returned error 83 (Out of memory)
dovecot: Dec 05 18:31:46 Error: IMAP(****@****): block_alloc(): Out of memory
dovecot: Dec 05 18:31:46 Error: child 14248 (imap) returned error 83 (Out of memory)
dovecot: Dec 05 18:33:50 Error: IMAP(****@****): block_alloc(): Out of memory
dovecot: Dec 05 18:33:50 Error: child 14345 (imap) returned error 83 (Out of memory)
dovecot: Dec 05 18:35:55 Error: IMAP(****@****): block_alloc(): Out of memory
dovecot: Dec 05 18:35:55 Error: child 14406 (imap) returned error 83 (Out of memory)
dovecot: Dec 05 18:37:59 Error: IMAP(****@****): block_alloc(): Out of memory
dovecot 1.0.17, mail_process_size = 256
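Presumably this corresponds to a dovecot.conf entry like the following; raising the value (which is in megabytes, 512 being only an illustration) may be what mailboxes this large need:

    # Per mail process address space limit, in MB. 256 MB can be too
    # little for maildirs holding 64k+ messages.
    mail_process_size = 512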
ulimit -a:
core file size          (blocks, -c) 0
data seg size           (kbytes, -d) unlimited
scheduling priority             (-e) 0
file size               (blocks, -f) unlimited
pending signals                 (-i) 73728
max locked memory       (kbytes, -l) 32
max memory size         (kbytes, -m) 6959460
open files                      (-n) 1024
pipe size            (512 bytes, -p) 8
POSIX message queues     (bytes, -q) 819200
real-time priority              (-r) 0
stack size              (kbytes, -s) 8192
cpu time               (seconds, -t) unlimited
max user processes              (-u) 73728
virtual memory          (kbytes, -v) 9910880
file locks                      (-x) unlimited
cat /proc/meminfo:
MemTotal:     8187540 kB
MemFree:       353060 kB
Buffers:       737968 kB
Cached:       5094700 kB
SwapCached:        20 kB
Active:       4255100 kB
Inactive:     2787980 kB
SwapTotal:    4200988 kB
SwapFree:     4192956 kB
Dirty:          20184 kB
Writeback:        124 kB
AnonPages:    1165756 kB
Mapped:         60472 kB
Slab:          686100 kB
SReclaimable:  591232 kB
SUnreclaim:     94868 kB
PageTables:     31408 kB
NFS_Unstable:       0 kB
Bounce:             0 kB
CommitLimit:  8294756 kB
Committed_AS: 3417584 kB
VmallocTotal: 34359738367 kB
VmallocUsed:    91040 kB
VmallocChunk: 34359645611 kB
HugePages_Total:     0
HugePages_Free:      0
HugePages_Rsvd:      0
Hugepagesize:     2048 kB
What do I need to configure to increase the available resources? What else can I do?
Uldis
participants (4): Charles Marcus, Jan-Frode Myklebust, Timo Sirainen, Uldis Pakuls