[Dovecot] assertion failed
What is this ?
dovecot: Apr 24 13:03:29 Error: IMAP(tina.burdujan): file maildir-sync.c: line 1075 (maildir_sync_index): assertion failed: (uid > prev_uid)
dovecot: Apr 24 13:03:29 Error: IMAP(tina.burdujan): Raw backtrace: /usr/libexec/dovecot/imap [0x80b8741] -> /usr/libexec/dovecot/imap [0x80b819c] -> /usr/libexec/dovecot/imap(maildir_sync_index+0x898) [0x80695f8] -> /usr/libexec/dovecot/imap [0x80698b3] -> /usr/libexec/dovecot/imap(maildir_storage_sync_init+0x49) [0x8069aa9] -> /usr/libexec/dovecot/imap(imap_sync_init+0x40) [0x8062620] -> /usr/libexec/dovecot/imap(cmd_sync+0x71) [0x8062b81] -> /usr/libexec/dovecot/imap(cmd_noop+0x26) [0x8059c76] -> /usr/libexec/dovecot/imap [0x805be7f] -> /usr/libexec/dovecot/imap [0x805bf20] -> /usr/libexec/dovecot/imap(_client_input+0x6c) [0x805c0fc] -> /usr/libexec/dovecot/imap(io_loop_handler_run+0xff) [0x80bef6f] -> /usr/libexec/dovecot/imap(io_loop_run+0x36) [0x80be366] -> /usr/libexec/dovecot/imap(main+0x46c) [0x806448c] -> /lib/libc.so.6(__libc_start_main+0xb9) [0xb7eb4e19] -> /usr/libexec/dovecot/imap [0x80568b1]
dovecot: Apr 24 13:03:29 Error: child 25748 (imap) killed with signal 6
My setup is exim + dovecot + mysql. Exim delivers mail to the maildir directly, because I don't know how to use the LDA from dovecot. The maildirs are very large; a single account can hold 30000 mails. Dovecot is version 1.0.0. dovecot -n shows:

# /etc/dovecot.conf
base_dir: /var/run/dovecot/login
log_path: /var/log/mail.log
login_dir: /var/run/dovecot
login_executable: /usr/libexec/dovecot/imap-login
login_process_per_connection: no
login_process_size: 80
login_processes_count: 10
login_max_processes_count: 50
login_max_connections: 80
first_valid_uid: 1001
last_valid_uid: 1001
mail_extra_groups: exim,vmail
mail_location: maildir:%Lh/Maildir
mail_cache_min_mail_count: 1
dotlock_use_excl: yes
fsync_disable: yes
maildir_copy_with_hardlinks: yes
maildir_copy_preserve_filename: yes
mail_drop_priv_before_exec: yes
mail_executable: /bin/imap.sh
mail_plugins: quota imap_quota
auth default:
  mechanisms: PLAIN CRAM-MD5
  default_realm: testing.net
  cache_size: 256
  worker_max_count: 100
  passdb:
    driver: sql
    args: /etc/dovecot-crammd5.conf
  userdb:
    driver: prefetch
plugin:
  quota: maildir
On 2007-04-26, Adrian Stoica <adrian.stoica@dacris.net> wrote:
What is this ?
We saw the same fault today when we switched a large system from courier to dovecot. The ~username/dovecot-uidlist contained:
1 -1 0
and deleting this file plus its lockfile seems to have fixed the problem for the two users this happened to.
-jf
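For anyone wanting to look for the same damage across many accounts, here is a rough Python sketch. It assumes the v1.0 uidlist header is three integers, "version uidvalidity next_uid", so that "1 -1 0" above reads as uidvalidity -1 and next_uid 0. That layout is an assumption; verify it against your own files first. The function names are made up for illustration, and the script only reports files, it does not delete anything.

```python
import os


def uidlist_header_is_broken(header_line):
    """Return True if the first line of a dovecot-uidlist looks unusable.

    Assumed header layout (not confirmed in this thread):
    "<version> <uidvalidity> <next_uid>".
    """
    fields = header_line.split()
    if len(fields) != 3:
        return True
    try:
        _version, uid_validity, next_uid = (int(f) for f in fields)
    except ValueError:
        return True
    # "1 -1 0" fails both checks: uidvalidity is negative and next_uid is 0.
    return uid_validity <= 0 or next_uid <= 0


def find_broken_uidlists(root):
    """Walk root and return paths of dovecot-uidlist files with a bad header."""
    broken = []
    for dirpath, _dirnames, filenames in os.walk(root):
        if "dovecot-uidlist" in filenames:
            path = os.path.join(dirpath, "dovecot-uidlist")
            with open(path, "r", errors="replace") as handle:
                if uidlist_header_is_broken(handle.readline()):
                    broken.append(path)
    return sorted(broken)
```

Calling find_broken_uidlists() on the directory that holds the user homes gives a list to review by hand before removing anything.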
On Wed, 2007-05-02 at 14:10 +0200, Jan-Frode Myklebust wrote:
On 2007-04-26, Adrian Stoica <adrian.stoica@dacris.net> wrote:
What is this ?
We saw the same fault today when we switched a large system from courier to dovecot. The ~username/dovecot-uidlist contained:
1 -1 0
and deleting this file plus its lockfile seems to have fixed the problem for the two users this happened to.
Fixed it to log an error instead in such situations: http://dovecot.org/list/dovecot-cvs/2007-May/008728.html
On 2007-05-09, Timo Sirainen <tss@iki.fi> wrote:
Fixed it to log an error instead in such situations: http://dovecot.org/list/dovecot-cvs/2007-May/008728.html
Great, thanks!
We just moved a large cluster (100k+ active accounts) from courier pop/imap to dovecot (v1.0.0), and used the courier-dovecot-migrate.pl to do the conversion of maildirs.
A couple of other failures we've been hitting are:
#1: deliver(xxxxx@xxxxx): file mail-index-sync-update.c: line 841 (mail_index_sync_update_index): assertion failed: (view->hdr.messages_count == map->hdr.messages_count) deliver(xxxxx@xxxxx): Raw backtrace: /usr/local/dovecot/libexec/dovecot/deliver(i_syslog_panic_handler+0x1c) [0x45d67c] -> /usr/local/dovecot/libexec/dovecot/deliver [0x45d27c] -> /usr/local/dovecot/libexec/dovecot/deliver(mail_index_sync_update_index+0x86f) [0x446abf] -> /usr/local/dovecot/libexec/dovecot/deliver(mail_index_sync_begin+0x245) [0x444665] -> /usr/local/dovecot/libexec/dovecot/deliver(maildir_sync_index_begin+0x45) [0x416885] -> /usr/local/dovecot/libexec/dovecot/deliver(maildir_transaction_save_commit_pre+0x68) [0x41c778] -> /usr/local/dovecot/libexec/dovecot/deliver(maildir_transaction_commit+0x70) [0x417730] -> /usr/local/dovecot-1.0.0/lib/dovecot/lda/lib10_quota_plugin.so [0x2a9557c3a8] -> /usr/local/dovecot/libexec/dovecot/deliver(deliver_save+0x100) [0x411360] -> /usr/local/dovecot/libexec/dovecot/deliver(main+0xb62) [0x412132] -> /lib64/tls/libc.so.6(__libc_start_main+0xdb) [0x307b11c3fb] -> /usr/local/dovecot/libexec/dovecot/deliver [0x410b0a]
#2: deliver(xxxxxx@xxxxxx): file mail-index.c: line 983 (mail_index_sync_from_transactions): assertion failed: (hdr.messages_count == (*map)->hdr.messages_count) deliver(xxxxxx@xxxxxx): Raw backtrace: /usr/local/dovecot/libexec/dovecot/deliver(i_syslog_panic_handler+0x1c) [0x45d67c] -> /usr/local/dovecot/libexec/dovecot/deliver [0x45d27c] -> /usr/local/dovecot/libexec/dovecot/deliver(mail_index_map+0x87) [0x43e5f7] -> /usr/local/dovecot/libexec/dovecot/deliver(mail_index_sync_begin+0x9e) [0x4444be] -> /usr/local/dovecot/libexec/dovecot/deliver(maildir_sync_index_begin+0x45) [0x416885] -> /usr/local/dovecot/libexec/dovecot/deliver [0x4173aa] -> /usr/local/dovecot/libexec/dovecot/deliver(maildir_sync_last_commit+0x47) [0x4174c7] -> /usr/local/dovecot-1.0.0/lib/dovecot/lda/lib10_quota_plugin.so [0x2a9557c3a8] -> /usr/local/dovecot/libexec/dovecot/deliver(deliver_save+0x100) [0x411360] -> /usr/local/dovecot/libexec/dovecot/deliver(main+0xb62) [0x412132] -> /lib64/tls/libc.so.6(__libc_start_main+0xdb) [0x307b11c3fb] -> /usr/local/dovecot/libexec/dovecot/deliver [0x410b0a]
#3: deliver(xxxxxxxxxxx@xxxxx): file maildir-save.c: line 520 (maildir_transaction_save_commit_pre): assertion failed: (first_uid != 0) deliver(xxxxxxxxxxx@xxxxx): Raw backtrace: /usr/local/dovecot/libexec/dovecot/deliver(i_syslog_panic_handler+0x1c) [0x45d67c] -> /usr/local/dovecot/libexec/dovecot/deliver [0x45d27c] -> /usr/local/dovecot/libexec/dovecot/deliver [0x41c9ed] -> /usr/local/dovecot/libexec/dovecot/deliver(maildir_transaction_commit+0x70) [0x417730] -> /usr/local/dovecot-1.0.0/lib/dovecot/lda/lib10_quota_plugin.so [0x2a9557c3a8] -> /usr/local/dovecot/libexec/dovecot/deliver(deliver_save+0x100) [0x411360] -> /usr/local/dovecot/libexec/dovecot/deliver(main+0xb62) [0x412132] -> /lib64/tls/libc.so.6(__libc_start_main+0xdb) [0x307b11c3fb] -> /usr/local/dovecot/libexec/dovecot/deliver [0x410b0a]
#4: dovecot: POP3(xxxxx@xxxxxxx): file maildir-sync.c: line 1075 (maildir_sync_index): assertion failed: (uid > prev_uid) dovecot: POP3(xxxxx@xxxxxxx): Raw backtrace: /usr/local/dovecot/libexec/dovecot/pop3 [0x45d73c] -> /usr/local/dovecot/libexec/dovecot/pop3 [0x45d03c] -> /usr/local/dovecot/libexec/dovecot/pop3(maildir_sync_index+0x769) [0x417029] -> /usr/local/dovecot/libexec/dovecot/pop3 [0x417171] -> /usr/local/dovecot/libexec/dovecot/pop3(maildir_storage_sync_init+0x65) [0x4173c5] -> /usr/local/dovecot/libexec/dovecot/pop3(client_create+0x15d) [0x4111dd] -> /usr/local/dovecot/libexec/dovecot/pop3(main+0x554) [0x412fd4] -> /lib64/tls/libc.so.6(__libc_start_main+0xdb) [0x389e61c3fb] -> /usr/local/dovecot/libexec/dovecot/pop3 [0x410a2a]
The "deliver" bugs are quite bad, as they lead to incoming messages getting bounced..
-jf
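To see how often each of these assertions fires, the log can be tallied by source file, line and expression. The following is only a sketch: it assumes one log entry per line in the "file X: line N (func): assertion failed: (expr)" shape shown above, and the names ASSERT_RE and tally_assertions are made up for this example.

```python
import re
from collections import Counter

# Matches the assertion lines quoted above; the backtrace lines are ignored.
ASSERT_RE = re.compile(
    r"file (?P<file>\S+): line (?P<line>\d+) \((?P<func>\w+)\): "
    r"assertion failed: \((?P<expr>.*)\)\s*$"
)


def tally_assertions(log_lines):
    """Count assertion failures keyed by (source file, line number, expression)."""
    counts = Counter()
    for line in log_lines:
        match = ASSERT_RE.search(line)
        if match:
            key = (match["file"], int(match["line"]), match["expr"])
            counts[key] += 1
    return counts
```

Feeding it an open log file, for example tally_assertions(open("/var/log/mail.log")), and printing counts.most_common() shows which assertion dominates; the log path here is taken from the config earlier in the thread and may differ on other systems.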
On Wed, 2007-05-09 at 15:44 +0200, Jan-Frode Myklebust wrote:
On 2007-05-09, Timo Sirainen <tss@iki.fi> wrote:
Fixed it to log an error instead in such situations: http://dovecot.org/list/dovecot-cvs/2007-May/008728.html
Great, thanks!
We just moved a large cluster (100k+ active accounts) from courier pop/imap to dovecot (v1.0.0), and used the courier-dovecot-migrate.pl to do the conversion of maildirs.
Was it courier-dovecot-migrate.pl then that created those broken uidlist files? I guess I should fix it too then.
A couple of other failures we've been hitting are:
deliver(xxxxx@xxxxx): file mail-index-sync-update.c: line 841 (mail_index_sync_update_index): assertion failed: (view->hdr.messages_count == map->hdr.messages_count) .. deliver(xxxxxx@xxxxxx): file mail-index.c: line 983 (mail_index_sync_from_transactions): assertion failed: (hdr.messages_count == (*map)->hdr.messages_count)
I hoped these were completely fixed in v1.0. What filesystem do you use?
deliver(xxxxxxxxxxx@xxxxx): file maildir-save.c: line 520 (maildir_transaction_save_commit_pre): assertion failed: (first_uid != 0)
Hopefully fixed by the above patch. Or I think this should happen only if next_uid=0 in the uidlist header.
dovecot: POP3(xxxxx@xxxxxxx): file maildir-sync.c: line 1075 (maildir_sync_index): assertion failed: (uid > prev_uid)
I haven't seen this one before. I'll try to figure out how it could happen.
The "deliver" bugs are quite bad, as they lead to incoming messages getting bounced..
Those are all assertion failures. Doesn't your MTA treat deliver crashes as temporary failures which are retried?
On Sun, May 13, 2007 at 08:50:16PM +0300, Timo Sirainen wrote:
Was it courier-dovecot-migrate.pl then that created those broken uidlist files?
Yes, we cleaned these up manually.
deliver(xxxxx@xxxxx): file mail-index-sync-update.c: line 841 (mail_index_sync_update_index): assertion failed: (view->hdr.messages_count == map->hdr.messages_count) .. deliver(xxxxxx@xxxxxx): file mail-index.c: line 983 (mail_index_sync_from_transactions): assertion failed: (hdr.messages_count == (*map)->hdr.messages_count)
I hoped these were completely fixed in v1.0. What filesystem do you use?
IBM's GPFS on linux, which is a shared disk cluster fs.
The "deliver" bugs are quite bad, as they lead to incoming messages getting bounced..
Those are all assertion failures. Doesn't your MTA treat deliver crashes as temporary failures which are retried?
No, sorry.. postfix seems to be bouncing when deliver dies from signal 6:
postfix/pipe[21066]: 4D76F3B67E: to=<XXXXXXXXXXXXXXXXX@XXX.XXX>, relay=dovecot, delay=0.3, delays=0/0/0/0.3, dsn=5.3.0, status=bounced (Command died with signal 6: "/usr/local/dovecot/libexec/dovecot/deliver")
I guess postfix doesn't really have any way of knowing how far the delivery succeeded, but I'd prefer if postfix would freeze these instead.
-jf
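To find out which recipients lost mail this way, the postfix/pipe log lines can be picked apart. This is a sketch that assumes the exact line shape quoted above ("dsn=..., status=... (Command died with signal N"); the names PIPE_RE and parse_crash_bounce are invented for the example.

```python
import re

# Shape taken from the postfix/pipe log line quoted above.
PIPE_RE = re.compile(
    r"postfix/pipe\[\d+\]: (?P<queue_id>[0-9A-F]+): to=<(?P<rcpt>[^>]*)>,"
    r".*?dsn=(?P<dsn>\d\.\d+\.\d+), status=(?P<status>\w+)"
    r" \(Command died with signal (?P<signal>\d+)"
)


def parse_crash_bounce(line):
    """Return queue id, recipient, DSN, status and signal, or None if no match."""
    match = PIPE_RE.search(line)
    if match is None:
        return None
    info = match.groupdict()
    info["signal"] = int(info["signal"])
    return info
```

Lines for normal deliveries do not contain "Command died with signal" and return None, so the function can be run over the whole mail log.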
On Sun, 2007-05-13 at 22:10 +0200, Jan-Frode Myklebust wrote:
On Sun, May 13, 2007 at 08:50:16PM +0300, Timo Sirainen wrote:
Was it courier-dovecot-migrate.pl then that created those broken uidlist files?
Yes, we cleaned these up manually.
OK, updated the script so other people won't run into the same problem.
deliver(xxxxx@xxxxx): file mail-index-sync-update.c: line 841 (mail_index_sync_update_index): assertion failed: (view->hdr.messages_count == map->hdr.messages_count) .. deliver(xxxxxx@xxxxxx): file mail-index.c: line 983 (mail_index_sync_from_transactions): assertion failed: (hdr.messages_count == (*map)->hdr.messages_count)
I hoped these were completely fixed in v1.0. What filesystem do you use?
IBM's GPFS on linux, which is a shared disk cluster fs.
So either there's some problem that only occurs with GPFS or it adds enough latency that a race condition somewhere can cause problems. Before v1.0 release I was running imap stress testing for many hours (reading and modifying the same mailbox) without a single error, so I doubt I can reproduce this myself. And if I can't reproduce it, this is going to be pretty much impossible to fix.
For Dovecot v1.1 I'm going to simplify the index file code so at least then this error should hopefully go away.
Those are all assertion failures. Doesn't your MTA treat deliver crashes as temporary failures which are retried?
No, sorry.. postfix seems to be bouncing when deliver dies from signal 6:
So it seems. If you're using syslog, you could use the attached patch. But maybe this should be changed on the Postfix side? I guess I could try asking on the Postfix list whether they have anything against it. In any case, you could change that yourself by modifying src/global/pipe_command.c around line 630:
if (WIFSIGNALED(wait_status)) {
dsb_unix(why, "5.3.0", log_len ?
Change 5.3.0 to 4.3.0
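The reason the first digit matters: in enhanced status codes (RFC 3463) class 5 means permanent failure, which Postfix bounces, while class 4 means a transient failure, which is deferred and retried. A small Python illustration of that mapping follows; the function name dsn_class is made up here.

```python
def dsn_class(code):
    """Map an enhanced status code such as "5.3.0" to its RFC 3463 class."""
    classes = {
        "2": "success",
        "4": "temporary failure",   # mail stays queued and is retried
        "5": "permanent failure",   # mail is bounced to the sender
    }
    return classes.get(code.split(".")[0], "unknown")
```

So dsn_class("5.3.0") gives "permanent failure" and dsn_class("4.3.0") gives "temporary failure", which is the behaviour change the edit above is after.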
On Sun, May 13, 2007 at 11:40:12PM +0300, Timo Sirainen wrote:
IBM's GPFS on linux, which is a shared disk cluster fs.
So either there's some problem that only occurs with GPFS or it adds enough latency that a race condition somewhere can cause problems. Before v1.0 release I was running imap stress testing for many hours (reading and modifying the same mailbox) without a single error, so I doubt I can reproduce this myself. And if I can't reproduce it, this is going to be pretty much impossible to fix.
For Dovecot v1.1 I'm going to simplify the index file code so at least then this error should hopefully go away.
Any idea when v1.1 will be released? The mail_index_sync_update_index failure is happening about once a day, so we need to get something done about it. Hmmm, maybe setting the postfix soft_bounce=yes (as suggested by Jasper Slits) will be an acceptable workaround for us, as I don't think these servers should ever need to bounce mail (other servers in front of them should be handling that).
-jf
On Sun, 2007-05-13 at 22:53 +0200, Jan-Frode Myklebust wrote:
For Dovecot v1.1 I'm going to simplify the index file code so at least then this error should hopefully go away.
Any idea when v1.1 will be released ?
I haven't even started doing the index code cleanups. But I did write a small summary about it: http://dovecot.org/list/dovecot/2007-May/022591.html
I'm anyway hoping that I can get v1.1 mostly (if not completely) ready this summer. Unless I suddenly start wasting a lot of time with other things (such as paying work).
On Mon, May 14, 2007 at 12:20:43AM +0300, Timo Sirainen wrote:
I haven't even started doing the index code cleanups. But I did write a small summary about it: http://dovecot.org/list/dovecot/2007-May/022591.html
Which got me thinking.. Do you think changing the locking method might help with these assertion failures? Currently we're using the default lock_method, but maybe one of the others is more appropriate for shared file systems? ... maybe even just to change code paths, if this is a race we're seeing. Any thoughts?
-jf
On Sun, 2007-05-13 at 23:46 +0200, Jan-Frode Myklebust wrote:
On Mon, May 14, 2007 at 12:20:43AM +0300, Timo Sirainen wrote:
I haven't even started doing the index code cleanups. But I did write a small summary about it: http://dovecot.org/list/dovecot/2007-May/022591.html
Which got me thinking.. Do you think changing the locking method might help with these assertion failures? Currently we're using the default lock_method, but maybe one of the others is more appropriate for shared file systems? ... maybe even just to change code paths, if this is a race we're seeing. Any thoughts?
Code paths between fcntl and flock are pretty much the same. Unless there's a bug in GPFS it shouldn't make a difference which one you use. You can always try of course. Changing to dotlock would make it use a bit different code paths, but it also would make it slower.
The biggest difference is between mmap_disable=yes and =no, but unless GPFS supports shared mmaps it's probably not a good idea to set that to "no".
On Mon, May 14, 2007 at 12:54:31AM +0300, Timo Sirainen wrote:
The biggest difference is between mmap_disable=yes and =no, but unless GPFS supports shared mmaps it's probably not a good idea to set that to "no".
I verified that mmap is supported on GPFS, and changed to mmap_disable=no yesterday. This seems to have fixed our deliver-problems. Thanks!
-jf
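For others on a cluster filesystem who want a first sanity check before trying mmap_disable=no, here is a rough Python sketch. It writes through a shared mapping and reads the data back with an ordinary read on the same host. This is not how the check above was done (the thread does not say), and it cannot prove coherence between cluster nodes; for that the write and the read would have to run on different nodes. The function name is hypothetical, and the mmap flags used are Unix-only.

```python
import mmap
import os
import tempfile


def shared_mmap_roundtrip(directory):
    """Write via a MAP_SHARED mapping, then read back with a normal read().

    Returns True if the data written through the mapping is visible through
    the regular file interface on this host, False on any error or mismatch.
    """
    fd, path = tempfile.mkstemp(dir=directory)
    try:
        os.ftruncate(fd, mmap.PAGESIZE)
        with mmap.mmap(fd, mmap.PAGESIZE, flags=mmap.MAP_SHARED,
                       prot=mmap.PROT_READ | mmap.PROT_WRITE) as mapping:
            mapping[:5] = b"hello"
            mapping.flush()  # push the dirty page back to the file
        with open(path, "rb") as handle:
            return handle.read(5) == b"hello"
    except (OSError, ValueError):
        return False
    finally:
        os.close(fd)
        os.unlink(path)
```

Run it against a directory on the filesystem in question, for example shared_mmap_roundtrip("/path/on/gpfs"), where that path is a placeholder.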
On Sun, 2007-05-13 at 23:46 +0200, Jan-Frode Myklebust wrote:
On Mon, May 14, 2007 at 12:20:43AM +0300, Timo Sirainen wrote:
I haven't even started doing the index code cleanups. But I did write a small summary about it: http://dovecot.org/list/dovecot/2007-May/022591.html
Which got me thinking.. Do you think changing the locking method might help with these assertion failures? Currently we're using the default lock_method, but maybe one of the others is more appropriate for shared file systems? ... maybe even just to change code paths, if this is a race we're seeing. Any thoughts?
One possibility of course is to just disable index file updates completely with deliver. Those crashes were all related to index file handling.
participants (3): Adrian Stoica, Jan-Frode Myklebust, Timo Sirainen