[Dovecot] WG: on High Load using IMAPSYNC : Panic: file ostream-lzma.c: line 147: unreached. Dovecot 2.2.12 with zlib/XZ compression
when syncing Mailboxes with IMAPSYNC, using xz compression the dbox file gets corrupted. the same configuration was working ok on low load with xz compression.
Dovecot Version 2.2.12 Linux Mint 16 EXT4 Filesystem
another dovecot user already reported the same problem on the mailinglist,
using dsync. That user switched to bz2 compression what worked for him.
the answer was :
are you on latest patchlevel ? there are some dsync fixes since 2.2.12
but it is not a problem of dsync. when looking at the code, we get here an unexpected returnvalue when flushing.
I propose it would be good practice to write some logging in the default switch cases, to see the unexpected returnvalues in the errorlog.
something like :
ostream-lzma.c : -- Line 147 : i_unreached(); ++ Line 147 : i_fatal("unexpected lzma errorcode when flushing : %s", ret);
after that modification the errorlog shows :
May 6 14:03:05 vm-imap dovecot: imap(test): Fatal: lzma errorcode when flushing : (null)
maybe You can also investigate and let me know when fixed.
Here the log :
May 5 14:53:10 vm-imap dovecot: imap(rrernboeck): Panic: file ostream-lzma.c: line 147: unreached May 5 14:53:10 vm-imap dovecot: imap(rrernboeck): Error: Raw backtrace: /usr/local/lib/dovecot/libdovecot.so.0(+0x67450) [0x7ff99d8ce450] -> /usr/local/lib/dovecot/libdovecot.so.0(+0x6752e) [0x7ff99d8ce52e] -> /usr/local/lib/dovecot/libdovecot.so.0(i_fatal+0) [0x7ff99d88955e] -> /usr/local/lib/dovecot/lib20_zlib_plugin.so(+0x4f0a) [0x7ff99ce87f0a] -> /usr/local/lib/dovecot/libdovecot.so.0(o_stream_flush+0x3e) [0x7ff99d8e6dbe] -> /usr/local/lib/dovecot/libdovecot.so.0(o_stream_nflush+0xf) [0x7ff99d8e71df] -> /usr/local/lib/dovecot/libdovecot.so.0(o_stream_nfinish+0x9) [0x7ff99d8e7229] -> /usr/local/lib/dovecot/libdovecot-storage.so.0(dbox_save_end+0x29) [0x7ff99db7cfa9] -> /usr/local/lib/dovecot/libdovecot-storage.so.0(mdbox_save_finish+0x39) [0x7ff99db76619] -> /usr/local/lib/dovecot/libdovecot-storage.so.0(mailbox_save_finish+0x4d) [0x7ff99dbb6a4d] -> dovecot/imap rrernboeck 127.0.0.1 APPEND [0x40be09] -> dovecot/imap rrernboeck 127.0.0.1 APPEND [0x416cfc] -> dovecot/imap rrernboeck 127.0.0.1 APPEND [0x40b491] -> /usr/local/lib/dovecot/libdovecot.so.0(io_loop_call_io+0x27) [0x7ff99d8de527] -> /usr/local/lib/dovecot/libdovecot.so.0(io_loop_handler_run+0xd7) [0x7ff99d8df2b7] -> /usr/local/lib/dovecot/libdovecot.so.0(io_loop_run+0x38) [0x7ff99d8de0c8] -> /usr/local/lib/dovecot/libdovecot.so.0(master_service_run+0x13) [0x7ff99d88e753] -> dovecot/imap rrernboeck 127.0.0.1 APPEND
[0x40b06f] -> /lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0xf5) [0x7ff99d4c0de5] -> dovecot/imap rrernboeck 127.0.0.1 APPEND [0x40b1cc] May 5 14:53:10 vm-imap dovecot: imap(rrernboeck): Fatal: master: service(imap): child 47957 killed with signal 6 (core dumped) May 5 14:53:10 vm-imap dovecot: imap(rrernboeck): Error: Corrupted dbox file /home/vmail/virtualmailboxes/rrernboeck/storage/m.1 (around offset=1302): msg header has bad magic value
On 05/06/2014 04:39 PM, Robert Nowotny wrote:
I propose it would be good practice to write some logging in the default switch cases, to see the unexpected returnvalues in the errorlog.
something like :
ostream-lzma.c : -- Line 147 : i_unreached(); ++ Line 147 : i_fatal("unexpected lzma errorcode when flushing : %s", ret);
after that modification the errorlog shows :
May 6 14:03:05 vm-imap dovecot: imap(test): Fatal: lzma errorcode when flushing : (null) The code change should be i_fatal("unexpected lzma errorcode when flushing : %d", ret); since ret is an integer.
A logging change like that is a good idea and we have been meaning to do something similar. It would help debugging, if you could continue running it in your setup, where the error seems to occur quite frequently. Also tell me if you figure out what data or action triggers it.
br, Teemu Huovila
On 05/06/2014 04:39 PM, Robert Nowotny wrote:
I propose it would be good practice to write some logging in the default
switch cases, to see the unexpected returnvalues in the errorlog.
something like :
ostream-lzma.c : -- Line 147 : i_unreached(); ++ Line 147 : i_fatal("unexpected lzma errorcode when flushing : %s", ret);
after that modification the errorlog shows :
May 6 14:03:05 vm-imap dovecot: imap(test): Fatal: lzma errorcode when flushing : (null)
The code change should be i_fatal("unexpected lzma errorcode when flushing : %d", ret); since ret is an integer.
sure - You are Right.
After changing the code, the Error Log shows : lzma errorcode when flushing : 0
0 means LZMA_OK
from the description in the LMZA base.h :
LZMA_FINISH = 3 /**< * \brief Finish the coding operation * * All the input data must have been given to the encoder * (the last bytes can still be pending in next_in). * Call lzma_code() with LZMA_FINISH until it returns * LZMA_STREAM_END. Once LZMA_FINISH has been used, * the amount of input must no longer be changed by * the application. * * When decoding, using LZMA_FINISH is optional unless the * LZMA_CONCATENATED flag was used when the decoder was * initialized. When LZMA_CONCATENATED was not used, the only * effect of LZMA_FINISH is that the amount of input must not * be changed just like in the encoder. */
A logging change like that is a good idea and we have been meaning to do something similar. It would help debugging, if you could continue running it in your setup, where the error seems to occur quite frequently. Also tell me if you figure out what data or action triggers it.
The error only occurs if I sync a lot (thousands) of emails through IMAPSYNC to a new mailbox. It seems calling lzma_code(zs, LZMA_FINISH); returns LZMA_OK - but is still not ready. I guess the call gives back LZMA_OK (meaning - ok I will do that) but needs some more time to LZMA_FINISH
On low load it seems to be able to finish in time - so that error never occurs. But on high (write to SSD Drive) load, it returns LZMA_OK instead of LZMA_FINISH.
I changed the code now as follows and it seems to work now. I will convert some other huge boxes and lat You know if we will have some otrher negative effects on that ...
This seems to work:
ostream-lzma.c :
Line 137 fff :
ret = lzma_code(zs, LZMA_FINISH);
switch (ret) {
case LZMA_STREAM_END:
done = TRUE;
break;
/* LZMA_FINISH should be called until LZMA_STREAM_END */
case LZMA_OK:
o_stream_get_name(&zstream->ostream.ostream)); default:break; case LZMA_MEM_ERROR: i_fatal_status(FATAL_OUTOFMEM, "lzma.write(%s): Out of memory",
/* got error on high load syncing through imap */
i_fatal("lzma errorcode when flushing : %d", ret); i_unreached(); } } while (zs->avail_out != sizeof(zstream->outbuf));
since I am not into lmza programming - is the while clause correct, or do we need another loop until lzma_code(zs, LZMA_FINISH) returns LZMA_STREAM_END ???
br
Ing. Robert Nowotny Vienna
when using xz compression, on high load (syncing thousands of emails) dovecot panics with : Panic: file ostream-lzma.c: line 147: unreached
after some investigation I found out that the call
ret = lzma_code(zs, LZMA_FINISH);
returns LZMA_OK (what is enumerated integer 0) sometimes.
again - this error will only show up under heavy load, for instance when syncing a lot of emails to a new mailbox with imapsync or dsync.
After patching the file ostream-lzma.c as follows, it seems to work now correctly.
Someone with more knowledge about the lzma library and that codepart should probably look over it, before merging it into the code.
I did extensive testing with huge mailboxes successfully- just insert the lines marked with '++' and recompile.
-- code snipped of ostream-lzma.c, around line 124 --
do {
len = sizeof(zstream->outbuf) - zs->avail_out;
if (len != 0) {
zs->next_out = zstream->outbuf;
zs->avail_out = sizeof(zstream->outbuf);
zstream->outbuf_used = len;
if ((ret = o_stream_zlib_send_outbuf(zstream)) <=
++ /* LZMA_FINISH should be called until LZMA_STREAM_END */ ++ /* got LZMA_OK on high server load when syncing through imap */ ++ case LZMA_OK: ++ break; case LZMA_MEM_ERROR: i_fatal_status(FATAL_OUTOFMEM, "lzma.write(%s): Out of memory", o_stream_get_name(&zstream->ostream.ostream)); default: ++ /* log the unexpected error code if any */ ++ i_fatal("lzma returncode when flushing : %d", ret); i_unreached(); } } while (zs->avail_out != sizeof(zstream->outbuf));return ret; if (done) break; } ret = lzma_code(zs, LZMA_FINISH); switch (ret) { case LZMA_STREAM_END: done = TRUE; break;
-- /code snipped --
yours sincerely
Ing. Robert Nowotny Rotek GmbH Vienna, Austria
On 6.5.2014, at 23.24, Robert Nowotny <rnowotny@rotek.at> wrote:
when using xz compression, on high load (syncing thousands of emails) dovecot panics with : Panic: file ostream-lzma.c: line 147: unreached
after some investigation I found out that the call
ret = lzma_code(zs, LZMA_FINISH);
returns LZMA_OK (what is enumerated integer 0) sometimes.
again - this error will only show up under heavy load, for instance when syncing a lot of emails to a new mailbox with imapsync or dsync.
After patching the file ostream-lzma.c as follows, it seems to work now correctly.
Someone with more knowledge about the lzma library and that codepart should probably look over it, before merging it into the code.
I think your change is ok, but I did a bit more changes to make it work exactly the same as the example included in xz-utils, just in case something else might be still wrong. http://hg.dovecot.org/dovecot-2.2/rev/8e6d98d34dbf
participants (3)
-
Robert Nowotny
-
Teemu Huovila
-
Timo Sirainen