Fatal: master: service(indexer-worker): child 493 killed with signal 11 (core dumped)

Tim Mohlmann tim at usrpro.com
Fri Dec 21 15:01:13 EET 2018


Hi, I now this is kinda impolite to ask, but I would like some feedback
on this issue. I'm representing the Mailu user group here, plus the fact
that my own production server is affected and starts to crash due to
file corruption as a result of the indexer-worker segfaults. Last night
my server became unavailable with logs like:

Error: imap(tim at usrpro.com)<131><2RGyoYd9Jq/AqMsO>: Corrupted
transaction log file /mail/tim at usrpro.com/dovecot.index.log seq 6:
record size too small (type=0x2272560, offset=26340, size=0)
(sync_offset=26592)

Deleting the corrupted files brought it back, but I don't really like
the prospect of cleaning corrupted files during christmas. ;)

I would like to know: do you consider this a real dovecot bug? Or is
this something as a result of the way  Alpine configures the package?
Should I cross post the report to Alpine? Or is it worth changing the
image to Debian instead?

Or maybe you don't consider the trace good enough to work on? Also that
I would like to know.

Many thanks in advance!

Tim

On 12/20/18 9:49 AM, Tim Mohlmann via dovecot wrote:
> I'm really sorry, but it was a long day yesterday. I forgot to mention
> that I used alpine:edge (based on the alpine master branch) image to set
> up the build environment. So there were some version changes.
>
> dovecot --version: 2.3.3 (dcead646b)
>
> apk info musl:
>
> musl-1.1.20-r2 description:
> the musl c library (libc) implementation
>
> Configuration was unchanged wrt to my first mail. Maybe it's helpful to
> share the build flags?
>
> export CFLAGS="-O0"
> export CXXFLAGS="$CFLAGS"
>
> export CPPFLAGS="$CFLAGS"
>
> export LDFLAGS="-Wl,--as-needed"
>
> In the APKBUILD file can be found here:
> https://git.alpinelinux.org/cgit/aports/tree/main/dovecot/APKBUILD and
> I've modified the following:
>
> options="libtool !strip"
>
> subpackages="
>
>      ....
>
>      $pkgname-dbg
>
> "
>
> And if it is of any interest, I've done a `docker commit` on the
> container and pushed the resulting image as muhlemmer/dovecot:debug.
> https://cloud.docker.com/repository/docker/muhlemmer/dovecot/tags. Don't
> know if its of much use by itself, without the Mailu ecosystem around.
> Just in case.
>
> Cheers!
>
> On 12/20/18 2:04 AM, Tim Mohlmann via dovecot wrote:
>> I've managed to re-compile the alpine packages inside the container,
>> creating the debug symbols and disabling compile time optimizations, as
>> they also disturbed the backtrace. The following output looks much more
>> meaningful. But unfortunately there are still some symbols missing for
>> lucene-core. Let me know if this is enough to work with. (It doesn't
>> seem to be musl after all?)
>>
>> Reading symbols from /usr/libexec/dovecot/indexer-worker...Reading
>> symbols from
>> /usr/lib/debug//usr/libexec/dovecot/indexer-worker.debug...done.
>> done.
>> [New LWP 30417]
>>
>> warning: Can't read pathname for load map: No error information.
>> Core was generated by `dovecot/indexer-worker'.
>> Program terminated with signal SIGSEGV, Segmentation fault.
>> #0  0x00007f1653f12c31 in ?? () from /usr/lib/libclucene-core.so.1
>> (gdb) bt full
>> #0  0x00007f1653f12c31 in ?? () from /usr/lib/libclucene-core.so.1
>> No symbol table info available.
>> #1  0x00007f1653edeba2 in
>> lucene::index::KeepOnlyLastCommitDeletionPolicy::onCommit(std::vector<lucene::index::IndexCommitPoint*,
>> std::allocator<lucene::index::IndexCommitPoint*> >&) ()
>>    from /usr/lib/libclucene-core.so.1
>> No symbol table info available.
>> #2  0x00007f1653f15312 in ?? () from /usr/lib/libclucene-core.so.1
>> No symbol table info available.
>> #3  0x00007f1653f325e3 in lucene::index::IndexWriter::doFlush(bool) ()
>> from /usr/lib/libclucene-core.so.1
>> No symbol table info available.
>> #4  0x00007f1653f32f9c in lucene::index::IndexWriter::flush(bool, bool)
>> () from /usr/lib/libclucene-core.so.1
>> No symbol table info available.
>> #5  0x00007f1653f33052 in
>> lucene::index::IndexWriter::closeInternal(bool) () from
>> /usr/lib/libclucene-core.so.1
>> No symbol table info available.
>> #6  0x00007f1653f337c4 in lucene::index::IndexWriter::close(bool) ()
>> from /usr/lib/libclucene-core.so.1
>> No symbol table info available.
>> #7  0x00007f16541fc8c6 in lucene_index_build_deinit
>> (index=0x55945fe7a860) at lucene-wrapper.cc:646
>>         ret = 0
>> #8  0x00007f16541f9ba4 in fts_backend_lucene_update_deinit
>> (_ctx=0x55945fe910a0) at fts-backend-lucene.c:259
>>         ctx = 0x55945fe910a0
>>         backend = 0x55945fe7a760
>>         ret = 0
>>         __func__ = "fts_backend_lucene_update_deinit"
>> #9  0x00007f1654649baa in fts_backend_update_deinit
>> (_ctx=0x55945fe7a480) at fts-api.c:144
>>         ctx = 0x55945fe910a0
>>         backend = 0x55945fe7a760
>>         ret = 0
>> #10 0x00007f1654655d83 in fts_transaction_end (t=0x55945fe88820,
>> error_r=0x7fff1b28d1b8) at fts-storage.c:613
>>         ft = 0x55945fe84400
>>         flist = 0x55945fe7a3a0
>>         ret = 0
>>         __func__ = "fts_transaction_end"
>> #11 0x00007f16546562e8 in fts_transaction_commit (t=0x55945fe88820,
>> changes_r=0x7fff1b28d2c0) at fts-storage.c:696
>>         ft = 0x55945fe84400
>> --Type <RET> for more, q to quit, c to continue without paging--c
>>         fbox = 0x55945fe7f798
>>         box = 0x55945fe7efa8
>>         autoindex = false
>>         ret = 0
>>         error = 0x7f1654813665 <p_strdup_vprintf+90>
>> "H\213E\350\353\067\213E\344\211\302H\213E\330H\211\326H\211\307\350\301\367\377\377H\211E\360\213E\344\203\350\001\211\302H\213M\350H\213E\360H\211\316H\211\307\350\302\357\354\377H\213E\360H\213M\370dH3\f%("
>> #12 0x00007f165469b83e in quota_mailbox_transaction_commit
>> (ctx=0x55945fe88820, changes_r=0x7fff1b28d2c0) at quota-storage.c:174
>>         qbox = 0x55945fe7f600
>>         qt = 0x55945fe8f160
>>         __func__ = "quota_mailbox_transaction_commit"
>> #13 0x00007f16548fefad in mailbox_transaction_commit_get_changes
>> (_t=0x7fff1b28d348, changes_r=0x7fff1b28d2c0) at mail-storage.c:2215
>>         _data_stack_cur_id = 4
>>         t = 0x55945fe88820
>>         box = 0x55945fe7efa8
>>         save_count = 0
>>         ret = 21908
>>         __func__ = "mailbox_transaction_commit_get_changes"
>> #14 0x00007f16548feefc in mailbox_transaction_commit (t=0x7fff1b28d348)
>> at mail-storage.c:2197
>>         changes = {pool = 0x0, uid_validity = 1418718114, saved_uids =
>> {arr = {buffer = 0x7f16548e5d9e <__x86_return_thunk+5>, element_size =
>> 140733649048408}, v = 0x7f16548e5d9e <__x86_return_thunk+5>,
>> v_modifiable = 0x7f16548e5d9e <__x86_return_thunk+5>},
>> ignored_modseq_changes = 455660384, changes_mask = (unknown: 0),
>> no_read_perm = false}
>>         ret = 21908
>> #15 0x000055945e4abdcb in index_mailbox_precache (conn=0x55945fe57440,
>> box=0x55945fe7efa8) at master-connection.c:126
>>         storage = 0x55945fe77f28
>>         username = 0x55945fe73ee0 "admin at usrpro.io"
>>         box_vname = 0x55945fe7f310 "Sent"
>>         errstr = 0x7f165469cb53 <quota_mailbox_sync_deinit+164>
>> "\213E\344H\203\304([]\351\320f\377\377UH\211\345H\203\354\060H\211}\330dH\213\004%("
>>         error = MAIL_ERROR_TEMP
>>         status = {messages = 10, recent = 0, unseen = 0, uidvalidity =
>> 1545140227, uidnext = 11, first_unseen_seq = 0, first_recent_uid = 3,
>> last_cached_seq = 1, highest_modseq = 0, highest_pvt_modseq = 0,
>> keywords = 0x0, permanent_flags = 0, flags = 0, permanent_keywords =
>> false, allow_new_keywords = false, nonpermanent_modseqs = false,
>> no_modseq_tracking = false, have_guids = true, have_save_guids = true,
>> have_only_guid128 = false}
>>         trans = 0x0
>>         search_args = 0x0
>>         ctx = 0x0
>>         mail = 0x0
>>         metadata = {guid = '\000' <repeats 15 times>, virtual_size = 0,
>> physical_size = 0, first_save_date = 0, cache_fields = 0x0,
>> precache_fields = (MAIL_FETCH_STREAM_HEADER | MAIL_FETCH_STREAM_BODY |
>> MAIL_FETCH_RECEIVED_DATE | MAIL_FETCH_SAVE_DATE |
>> MAIL_FETCH_PHYSICAL_SIZE | MAIL_FETCH_VIRTUAL_SIZE |
>> MAIL_FETCH_UIDL_BACKEND | MAIL_FETCH_GUID | MAIL_FETCH_POP3_ORDER),
>> backend_ns_prefix = 0x0, backend_ns_type = (unknown: 0)}
>>         seq = 2
>>         first_uid = 2
>>         last_uid = 10
>>         percentage_str = "\026\177\000"
>>         counter = 9
>>         max = 9
>>         percentage = 32534
>>         percentage_sent = 0
>>         ret = 0
>>         uids = 0x55945fe48938 " (UIDs 2..10)"
>> #16 0x000055945e4ac1c1 in index_mailbox (conn=0x55945fe57440,
>> user=0x55945fe73dc8, mailbox=0x55945fe46058 "Sent", max_recent_msgs=0,
>> what=0x55945fe46070 "i") at master-connection.c:205
>>         ns = 0x55945fe77da0
>>         box = 0x55945fe7efa8
>>         status = {messages = 1608873868, recent = 0, unseen =
>> 1581972288, uidvalidity = 21908, uidnext = 455660880, first_unseen_seq =
>> 32767, first_recent_uid = 1608810272, last_cached_seq = 21908,
>> highest_modseq = 139733882519610, highest_pvt_modseq = 94095725476155,
>> keywords = 0x7fff1b28d540, permanent_flags = 0, flags = 0,
>> permanent_keywords = false, allow_new_keywords = false,
>> nonpermanent_modseqs = false, no_modseq_tracking = true, have_guids =
>> true, have_save_guids = false, have_only_guid128 = true}
>>         path = 0x55945fe47f38 "/mail/admin at usrpro.io/.Sent"
>>         errstr = 0x7f16546e403a <__x86_return_thunk+5>
>> "\363\220\017\256\350\353\371H\215d$\b\303\350\a"
>>         error = 32534
>>         sync_flags = MAILBOX_SYNC_FLAG_FULL_READ
>>         ret = 0
>> #17 0x000055945e4ac3d8 in master_connection_input_line
>> (conn=0x55945fe57440, line=0x55945fe5778c
>> "admin at usrpro.io\tSent\t2HvXT2h9epPAqMsG\t0\ti") at master-connection.c:247
>>         args = 0x55945fe46078
>>         input = {parent_event = 0x0, module = 0x55945e4ad2ec "mail",
>> service = 0x55945e4ad2f1 "indexer-worker", username = 0x55945fe46048
>> "admin at usrpro.io", session_id = 0x0, session_id_prefix = 0x55945fe4605d
>> "2HvXT2h9epPAqMsG", session_create_time = 0, local_ip = {family = 0, u =
>> {ip6 = {__in6_union = {__s6_addr = '\000' <repeats 15 times>,
>> __s6_addr16 = {0, 0, 0, 0, 0, 0, 0, 0}, __s6_addr32 = {0, 0, 0, 0}}},
>> ip4 = {s_addr = 0}}}, remote_ip = {family = 0, u = {ip6 = {__in6_union =
>> {__s6_addr = '\000' <repeats 15 times>, __s6_addr16 = {0, 0, 0, 0, 0, 0,
>> 0, 0}, __s6_addr32 = {0, 0, 0, 0}}}, ip4 = {s_addr = 0}}}, local_port =
>> 0, remote_port = 0, userdb_fields = 0x0, flags_override_add = (unknown:
>> 0), flags_override_remove = (unknown: 0), no_userdb_lookup = false,
>> debug = false, conn_secured = false, conn_ssl_secured = false}
>>         service_user = 0x55945fe59958
>>         user = 0x55945fe73dc8
>>         str = 0x7f16547f6231 <pool_system_free+28>
>> "\220\311\351\375\335\356\377UH\211\345H\203\354
>> H\211}\370H\211u\360H\211U\350H\211M\340H\203", <incomplete sequence \340>
>>         error = 0x0
>>         max_recent_msgs = 0
>>         ret = 32767
>> #18 0x000055945e4ac58f in master_connection_input (conn=0x55945fe57440)
>> at master-connection.c:287
>>         _data_stack_cur_id = 3
>>         line = 0x55945fe5778c
>> "admin at usrpro.io\tSent\t2HvXT2h9epPAqMsG\t0\ti"
>>         ret = 32534
>> #19 0x00007f16547e73c8 in io_loop_call_io (io=0x55945fe57480) at
>> ioloop.c:698
>>         ioloop = 0x55945e4afca0
>>         t_id = 2
>>         __func__ = "io_loop_call_io"
>> #20 0x00007f16547ea409 in io_loop_handler_run_internal
>> (ioloop=0x55945e4afca0) at ioloop-epoll.c:221
>>         ctx = 0x7f1654b0dda0
>>         events = 0x55945fe548c0
>>         event = 0x55945fe548c0
>>         list = 0x55945fe57500
>>         io = 0x55945fe57480
>>         tv = {tv_sec = 2147483, tv_usec = 0}
>>         events_count = 5
>>         msecs = -1
>>         ret = 1
>>         i = 0
>>         j = 0
>>         call = true
>>         __func__ = "io_loop_handler_run_internal"
>> #21 0x00007f16547e7657 in io_loop_handler_run (ioloop=0x55945e4afca0) at
>> ioloop.c:750
>>         __func__ = "io_loop_handler_run"
>> #22 0x00007f16547e752e in io_loop_run (ioloop=0x55945e4afca0) at
>> ioloop.c:723
>>         __func__ = "io_loop_run"
>> #23 0x00007f165471b3d5 in master_service_run (service=0x55945e4afb40,
>> callback=0x55945e4ab5ed <client_connected>) at master-service.c:775
>> No locals.
>> #24 0x000055945e4ab7ce in main (argc=1, argv=0x55945e4af8e0) at
>> indexer-worker.c:78
>>         service_flags = (MASTER_SERVICE_FLAG_KEEP_CONFIG_OPEN |
>> MASTER_SERVICE_FLAG_SEND_STATS)
>>         storage_service_flags = (MAIL_STORAGE_SERVICE_FLAG_USERDB_LOOKUP
>> | MAIL_STORAGE_SERVICE_FLAG_TEMP_PRIV_DROP |
>> MAIL_STORAGE_SERVICE_FLAG_NO_IDLE_TIMEOUT)
>>         c = -1
>>
>>
>> On 12/19/18 8:15 PM, Tim Mohlmann via dovecot wrote:
>>
>> <Deleted due to message size limit>
>>
>>



More information about the dovecot mailing list