Timo et al.,
I haven't been reading the list in a while, but wanted to report back that we've transitioned to dovecot 1.0.15 and things look to be working very nicely. The platform is Solaris "Nevada" running on x86. I have been saving up a couple of issues to bring to your attention.
On our server we collect all core files, so we get a good sample of the different things that cause crashes. One problem we see on an occasional basis is a crash in which the internal fatal-error handling code itself goes horribly wrong. Here is an example. The complete stack trace is enormous; it appears that we recurse until we run entirely out of stack. In gdb this looks like the same cycle of calls repeated over and over:
...
#14507 0x080b22b6 in i_internal_fatal_handler (status=83, fmt=0x0, args=0x76b95d8 "\020") at failures.c:410
#14508 0x080b1e1e in i_fatal_status (status=83, format=0x80cdd64 "mem_block_alloc(): Out of memory when allocating %u bytes") at failures.c:201
#14509 0x080b1652 in mem_block_alloc (min_size=0) at data-stack.c:192
#14510 0x080b1811 in t_malloc_real (size=16, permanent=true) at data-stack.c:244
#14511 0x080b1890 in t_malloc0 (size=16) at data-stack.c:279
#14512 0x080b8715 in pool_datastack_create () at mempool-datastack.c:51
#14513 0x080bda2b in t_str_new (initial_size=512) at str.c:17
#14514 0x080b21f3 in internal_handler (log_type=70 'F', format=0x80cdd64 "mem_block_alloc(): Out of memory when allocating %u bytes", args=0x76b96d8 "\020") at failures.c:383
#14515 0x080b22b6 in i_internal_fatal_handler (status=83, fmt=0x0, args=0x76b96d8 "\020") at failures.c:410
#14516 0x080b1e1e in i_fatal_status (status=83, format=0x80cdd64 "mem_block_alloc(): Out of memory when allocating %u bytes") at failures.c:201
#14517 0x080b1652 in mem_block_alloc (min_size=0) at data-stack.c:192
#14518 0x080b1811 in t_malloc_real (size=16, permanent=true) at data-stack.c:244
#14519 0x080b1890 in t_malloc0 (size=16) at data-stack.c:279
#14520 0x080b8715 in pool_datastack_create () at mempool-datastack.c:51
#14521 0x080bda2b in t_str_new (initial_size=512) at str.c:17
#14522 0x080b21f3 in internal_handler (log_type=70 'F', format=0x80cdd64 "mem_block_alloc(): Out of memory when allocating %u bytes", args=0x76b97d8 "\020") at failures.c:383
#14523 0x080b22b6 in i_internal_fatal_handler (status=83, fmt=0x0, args=0x76b97d8 "\020") at failures.c:410
#14524 0x080b1e1e in i_fatal_status (status=83, format=0x80cdd64 "mem_block_alloc(): Out of memory when allocating %u bytes") at failures.c:201
#14525 0x080b1652 in mem_block_alloc (min_size=0) at data-stack.c:192
#14526 0x080b1811 in t_malloc_real (size=16, permanent=true) at data-stack.c:244
#14527 0x080b1890 in t_malloc0 (size=16) at data-stack.c:279
#14528 0x080b8715 in pool_datastack_create () at mempool-datastack.c:51
#14529 0x080bda2b in t_str_new (initial_size=512) at str.c:17
#14530 0x080b21f3 in internal_handler (log_type=70 'F', format=0x80cdd64 "mem_block_alloc(): Out of memory when allocating %u bytes", args=0x76b98d8 "\020") at failures.c:383
#14531 0x080b22b6 in i_internal_fatal_handler (status=83, fmt=0x0, args=0x76b98d8 "\020") at failures.c:410
#14532 0x080b1e1e in i_fatal_status (status=83, format=0x80cdd64 "mem_block_alloc(): Out of memory when allocating %u bytes") at failures.c:201
#14533 0x080b1652 in mem_block_alloc (min_size=0) at data-stack.c:192
#14534 0x080b1811 in t_malloc_real (size=16, permanent=true) at data-stack.c:244
#14535 0x080b1890 in t_malloc0 (size=16) at data-stack.c:279
#14536 0x080b8715 in pool_datastack_create () at mempool-datastack.c:51
#14537 0x080bda2b in t_str_new (initial_size=512) at str.c:17
#14538 0x080b21f3 in internal_handler (log_type=70 'F', ...
At the outermost frames of the stack, where the recursion started, I see the following. Unfortunately I can't view this with gdb; gdb itself core dumps after printing about 50,000 frames. In dbx:
[327610] i_internal_fatal_handler(status = 83, fmt = (nil), args = 0x8047798 "^P"), line 410 in "failures.c"
[327611] i_fatal_status(status = 83, format = 0x80cdd64 "mem_block_alloc(): Out of memory when allocating %u bytes", ...), line 201 in "failures.c"
[327612] mem_block_alloc(min_size = 134510504U), line 192 in "data-stack.c"
[327613] t_malloc_real(size = 0, permanent = 0), line 244 in "data-stack.c"
[327614] t_buffer_get(size = 0), line 319 in "data-stack.c"
[327615] _vstrconcat(str1 = (nil), args = 0x8047834 "\x9d?^L^H", ret_len = 0x8047808), line 229 in "strfuncs.c"
[327616] t_strconcat(str1 = 0x100e77f0 "tmp/compiler/venus/handoff/build20.2/CDROM/sol-x86/kits/j2sdk_1.4.2_08/patches/J2SE_Solaris_8_x86_Recommended/108994-44/SUNWcsr/.svn/prop-base", ...), line 335 in "strfuncs.c"
[327617] mbox_list_next(ctx = 0x80f6348), line 273 in "mbox-list.c"
[327618] mbox_mailbox_list_next(_ctx = 0x8047928), line 209 in "mbox-list.c"
[327619] mail_storage_mailbox_list_next(ctx = 0x8047948), line 317 in "mail-storage.c"
[327620] cmd_list_continue(cmd = 0x80eff2c), line 184 in "cmd-list.c"
[327621] client_destroy(client = (nil), reason = 0x80479d8 "øy^D^Hc±^F^Hèþ^N^HÃx^L^H^Hz^D^Hm¨^K^H^P\x8b^N^H^B"), line 94 in "client.c"
[327622] idle_timeout(context = (nil)), line 522 in "client.c"
[327623] io_loop_handle_timeouts(ioloop = (nil), update_run_now = 1U), line 294 in "ioloop.c"
[327624] io_loop_handler_run(ioloop = 0x80ed6e0), line 159 in "ioloop-poll.c"
[327625] io_loop_run(ioloop = (nil)), line 329 in "ioloop.c"
[327626] main(argc = 1, argv = 0x8047afc, envp = 0x8047b04), line 290 in "main.c"
Has anyone else seen similar problems? It seems that once we fall into i_internal_fatal_handler(), dovecot should try hard to avoid any further dynamic memory allocation: if the fatal error was itself an allocation failure, or the heap is corrupt, every new allocation attempt just re-enters the handler, which is exactly the loop shown above.
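For what it's worth, the pattern I'd expect here is a reentrancy guard plus preallocated storage, so the handler never touches the allocator that just failed. Below is a minimal sketch of that idea in plain C. It is not dovecot's code; fatal_handler(), in_fatal, and buf are names I made up for illustration:

    #include <signal.h>
    #include <stdarg.h>
    #include <stdio.h>
    #include <stdlib.h>
    #include <string.h>
    #include <unistd.h>

    /* Hypothetical fatal handler: guards against re-entry and formats
     * into static storage, so it never allocates. */
    static void fatal_handler(const char *fmt, ...)
    {
        static volatile sig_atomic_t in_fatal = 0;
        static char buf[1024];   /* preallocated: no heap, no data stack */
        va_list args;

        if (in_fatal) {
            /* A second failure while reporting the first: emit a fixed
             * message and die rather than recursing. */
            static const char msg[] = "fatal handler re-entered, aborting\n";
            (void)write(STDERR_FILENO, msg, sizeof(msg) - 1);
            abort();
        }
        in_fatal = 1;

        va_start(args, fmt);
        vsnprintf(buf, sizeof(buf), fmt, args);
        va_end(args);

        (void)write(STDERR_FILENO, buf, strlen(buf));
        abort();
    }

    int main(void)
    {
        /* Simulate the failure from the trace above. */
        fatal_handler("mem_block_alloc(): Out of memory when allocating %u bytes\n", 16u);
        return 0;   /* never reached */
    }

In the traces above, the re-entry happens because internal_handler() builds its message via t_str_new(), which lands right back in the data-stack allocator that just reported the out-of-memory condition.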
As for the crash in mbox_list_next, we see that one too with some frequency.
A final problem, and this one affects me personally, is that we see some errors like this:
dovecot: Jul 09 11:52:10 Error: IMAP(dp): FETCH for mailbox mail/tmp-inbox UID 92086 got too little data: 6054 vs 6057
dovecot: Jul 09 11:52:10 Info: IMAP(dp): Disconnected: Disconnected
This is frustrating because I wind up with "poisonous" messages in my inbox that cause dovecot to disconnect me every time I try to read them. Across all of our users, we see this happen about 1,000 times per month.
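In case it helps triage: my guess, and it is only a guess, is that the two numbers are the bytes dovecot actually read for the message versus the size it had cached, and a 3-byte delta smells like a line-ending (LF vs. CRLF) discrepancy somewhere rather than random corruption. Here is a trivial sketch of the check I mean; the filename argument and the interpretation of the two numbers are my assumptions, not anything I've confirmed in the dovecot source:

    #include <stdio.h>

    /* Count a message file's physical size (raw bytes) and its virtual
     * size (as if every bare LF were CRLF).  If a cached virtual size is
     * ever compared against physical bytes, the mismatch shows up as a
     * small "X vs Y" delta like the one in the log above. */
    int main(int argc, char **argv)
    {
        FILE *f;
        long physical = 0, virtual_size = 0;
        int c, prev = 0;

        if (argc != 2 || (f = fopen(argv[1], "rb")) == NULL) {
            fprintf(stderr, "usage: %s <message-file>\n", argv[0]);
            return 1;
        }
        while ((c = getc(f)) != EOF) {
            physical++;
            virtual_size++;
            if (c == '\n' && prev != '\r')
                virtual_size++;   /* bare LF counted as two bytes (CRLF) */
            prev = c;
        }
        fclose(f);
        printf("physical=%ld virtual=%ld delta=%ld\n",
               physical, virtual_size, virtual_size - physical);
        return 0;
    }

Running something like that against a copy of the offending message would at least tell us whether the 6054 vs 6057 gap matches the number of bare LFs in it.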
-dp
-- Daniel Price - Solaris Kernel Engineering - dp@eng.sun.com - blogs.sun.com/dp