[Dovecot] Some issues in Dovecot 1.2.5 after upgrade from 1.0.15

Chris Wakelin c.d.wakelin at reading.ac.uk
Wed Sep 30 16:54:08 EEST 2009


Timo Sirainen wrote:
> On Wed, 2009-09-30 at 11:06 +0100, Chris Wakelin wrote:
>>> Sep 30 09:57:11 IMAP 25384 xxxxxxxx 134.225.xx.xx : Panic: file mail-index-sync-update.c: line 933 (mail_index_sync_map): assertion failed: (map->hdr.indexid == index->indexid || map->hdr.indexid == 0)
> ..
>>> #5  0x98a00 in mail_index_sync_map (_map=0x1263e4, type=MAIL_INDEX_SYNC_HANDLER_HEAD, force=false) at mail-index-sync-update.c:802
> 
> Could you also do:
> 
> fr 5
> p *index
> p *map

OK:

> (gdb) fr 5
> #5  0x98a00 in mail_index_sync_map (_map=0x1263e4, type=MAIL_INDEX_SYNC_HANDLER_HEAD, force=false) at mail-index-sync-update.c:802
> 802                     return 0;
> (gdb) p *index
> $1 = {dir = 0x1264f8 "/export/indexes/a/120/xxxxxxxx/.imap/INBOX", prefix = 0x11add0 "dovecot.index", cache = 0x127930, log = 0x1271f0, fsync_mask = 0,
>   mode = 384, gid = -1, gid_origin = 0x1278b8 "/export/mail/a/120/xxxxxxxx/INBOX", extension_pool = 0x126540, extensions = {arr = {buffer = 0x126558,
>       element_size = 36}, v = 0x126558, v_modifiable = 0x126558}, sync_lost_handlers = {arr = {buffer = 0x11b578, element_size = 4}, v = 0x11b578,
>     v_modifiable = 0x11b578}, filepath = 0x1278e8 "/export/indexes/a/120/xxxxxxxx/.imap/INBOX/dovecot.index", fd = -1, map = 0x127ac0,
>   indexid = 1254301030, inconsistency_id = 0, last_read_log_file_seq = 0, last_read_log_file_head_offset = 0, last_read_log_file_tail_offset = 0,
>   last_read_stat = {st_dev = 0, st_pad1 = {0, 0, 0}, st_ino = 0, st_mode = 0, st_nlink = 0, st_uid = 0, st_gid = 0, st_rdev = 0, st_pad2 = {0, 0},
>     st_size = 0, st_atim = {tv_sec = 0, tv_nsec = 0}, st_mtim = {tv_sec = 0, tv_nsec = 0}, st_ctim = {tv_sec = 0, tv_nsec = 0}, st_blksize = 0,
>     st_blocks = 0, st_fstype = '\000' <repeats 15 times>, st_pad4 = {0, 0, 0, 0, 0, 0, 0, 0}}, fsck_log_head_file_seq = 0, fsck_log_head_file_offset = 0,
>   lock_type = 3, shared_lock_count = 0, excl_lock_count = 0, lock_id_counter = 2, lock_method = FILE_LOCK_METHOD_FCNTL, file_lock = 0x0, dotlock = 0x0,
>   keywords_pool = 0x126948, keywords = {arr = {buffer = 0x11b5c8, element_size = 4}, v = 0x11b5c8, v_modifiable = 0x11b5c8}, keywords_hash = 0x126b88,
>   keywords_ext_id = 0, modseq_ext_id = 1, module_contexts = {arr = {buffer = 0x11b5a0, element_size = 4}, v = 0x11b5a0, v_modifiable = 0x11b5a0},
>   error = 0x0, nodiskspace = 0, index_lock_timeout = 0, opened = 1, log_locked = 0, mmap_disable = 0, fsync_disable = 0, use_excl_dotlocks = 1,
>   nfs_flush = 0, readonly = 0, mapping = 1, syncing = 0, need_recreate = 0, index_min_write = 1, modseqs_enabled = 0, initial_create = 0,
>   initial_mapped = 1}
> (gdb) p *map
> $2 = {index = 0x1263a8, refcount = 2, hdr = {major_version = 7 '\a', minor_version = 2 '\002', base_header_size = 120, header_size = 120, record_size = 8,
>     compat_flags = 0 '\000', unused = "\000\000", indexid = 1254301031, flags = 0, uid_validity = 0, next_uid = 1, messages_count = 0,
>     unused_old_recent_messages_count = 0, seen_messages_count = 0, deleted_messages_count = 0, first_recent_uid = 1, first_unseen_uid_lowwater = 0,
>     first_deleted_uid_lowwater = 0, log_file_seq = 2, log_file_tail_offset = 0, log_file_head_offset = 40, sync_size = 0, sync_stamp = 0, day_stamp = 0,
>     day_first_uid = {0, 0, 0, 0, 0, 0, 0, 0}}, hdr_base = 0x127fc0, hdr_copy_buf = 0x11b640, extension_pool = 0x0, extensions = {arr = {buffer = 0x0,
>       element_size = 0}, v = 0x0, v_modifiable = 0x0}, ext_id_map = {arr = {buffer = 0x0, element_size = 0}, v = 0x0, v_modifiable = 0x0},
>   keyword_idx_map = {arr = {buffer = 0x0, element_size = 0}, v = 0x0, v_modifiable = 0x0}, rec_map = 0x127b78, write_base_header = 0, write_ext_header = 0,
>   write_atomic = 0}

I was wrong about it being Solaris 10; the mail server is still on
Solaris 8 (with the latest and last patch cluster installed) and Dovecot
was compiled with gcc 3.3.2.

I've got a backup of the user's INBOX available, but I haven't been able
to reproduce the crash on my test box (either with it put through
mboxcrypt.pl or without). I've seen no similar crashes (so far) either.
Let me know if you'd like me to send the anonymised version (4MB expands
to ~70MB).

> 
>>> Sep 30 10:37:06 auth(default): Error: Master request 15820.607 not found
> 
> Either timed out for some reason or some other strange bug..
> 
>>> Sep 30 10:51:27 dovecot: Error: child 29959 (login) killed with signal 11 (core not dumped) (latest ip=134.225.1.46)
> ..
>> How do I get the login (presumably imap-login) processes to dump core
> 
> With Linux this would be enough:
> 
> login_executable = .../imap-login -D
> 
> With Solaris you'll also have to somehow make setuid binaries dump core.
> http://publib.boulder.ibm.com/httpserv/ihsdiag/coredumps.html says you
> could do:
> 
> coreadm -e global-setid -e proc-setid -e global
> 
> Perhaps with those global core dumps it wouldn't even be necessary to
> add the -D parameter. Anyway the -D's point was to make
> the /var/run/dovecot/login/ directory writable to it, so it could write
> core dump there.
> 
>> (or do I need to insert gdbhelper)?
> 
> I'm not really sure anymore if that even works (anymore). :)

...

>>>> Sep 30 10:51:27 dovecot: Error: child 29959 (login) killed with signal 11 (core not dumped) (latest ip=134.225.1.46)
>> ..
>>>>   passdb:
>>>>     driver: passwd-file
>>>>     args: /export/mailconfig/%Ju/%u/proxy.%Ls
>> 
>> You're using proxying also? Apparently there's something wrong there,
>> perhaps with master user logins or something.. I haven't yet found out
>> what exactly it is, but these should fix the crashes and log more
>> information:
>> 
>> http://hg.dovecot.org/dovecot-1.2/rev/b1293abca645
>> http://hg.dovecot.org/dovecot-1.2/rev/c8e24c1b8468
>> 

Yes we're proxying to Exchange 2003 for those users that have a
proxy.imap or proxy.pop3 in their mailconfig directory (we migrated most
of our staff from Unix mailstore to Exchange without them noticing; all
the students, most functional accounts, and staff that have too much
mail are still on Unix).

I'll look at applying the patches this evening (there's probably no
point doing it on the test server as I guess it's triggered by load or
something odd in Exchange).

Strangely, though we haven't had any crashes since this morning:

> Sep 30 00:38:32 dovecot: Error: child 26387 (login) killed with signal 11 (core not dumped) (latest ip=134.225.1.46)
> Sep 30 00:55:02 dovecot: Error: child 9268 (login) killed with signal 11 (core not dumped) (ip=134.225.1.46)
> Sep 30 09:43:07 dovecot: Error: child 13660 (login) killed with signal 11 (core not dumped) (latest ip=134.225.1.46)
> Sep 30 10:51:27 dovecot: Error: child 29959 (login) killed with signal 11 (core not dumped) (latest ip=134.225.1.46)
> Sep 30 10:51:38 dovecot: Error: child 11162 (login) killed with signal 11 (core not dumped) (latest ip=134.225.1.46)
> Sep 30 11:18:05 dovecot: Error: child 23772 (login) killed with signal 11 (core not dumped) (latest ip=134.225.1.46)
> Sep 30 11:20:16 dovecot: Error: child 2120 (login) killed with signal 11 (core not dumped) (latest ip=134.225.1.46)

and all of them had latest ip=134.225.1.46, which is our Prayer Webmail
server and is used mostly by students (i.e. not people on Exchange).

Best Wishes,
Chris

-- 
--+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+-
Christopher Wakelin,                           c.d.wakelin at reading.ac.uk
IT Services Centre, The University of Reading,  Tel: +44 (0)118 378 8439
Whiteknights, Reading, RG6 2AF, UK              Fax: +44 (0)118 975 3094


More information about the dovecot mailing list