On 19 December 2018 at 20:00 Tim Mohlmann via dovecot < dovecot@dovecot.org> wrote:


Dear list,

We been having some issues where the indexer-worker is crashing. This
happens on production servers which are handling a slight amount of
mail, but is also reproducible by moving messages. Also, users on my
server are complaining about "Trashed" items coming back etc.

Some details:

- Dovecot 2.3.2.1 in alpine:3.8 based Docker container. As part of the
Mailu distribution. ( https://github.com/Mailu/Mailu)

- I've first seen this issues on my production server, which stores mail
on GlusterFS

- I've been able to reproduce running the Docker container on a Virtual
machine, using local storage.

- There is another Mailu user reporting the same problem on a different
VM provider / disk infrastructure: https://github.com/Mailu/Mailu/issues/751

- Libc: musl-1.1.19-r10

Output of dovecot -n:

# dovecot -n
# 2.3.2.1 (0719df592): /etc/dovecot/dovecot.conf
# Pigeonhole version 0.5.2 (7704de5e)
# OS: Linux 4.16.3-301.fc28.x86_64 x86_64  ext4
# Hostname: 98a2726271d3
auth_mechanisms = plain login
disable_plaintext_auth = no
first_valid_gid = 8
first_valid_uid = 8
hostname = mail.usrpro.io
log_path = /dev/stderr
mail_access_groups = mail
mail_gid = mail
mail_home = /mail/%u
mail_location = maildir:/mail/%u
mail_plugins = " fts fts_lucene quota quota_clone zlib"
mail_privileged_group = mail
mail_uid = mail
mail_vsize_bg_after_count = 100
maildir_stat_dirs = yes
managesieve_notify_capability = mailto
managesieve_sieve_capability = fileinto reject envelope
encoded-character vacation subaddress comparator-i;ascii-numeric
relational regex imap4flags copy include variables body enotify
environment mailbox date index ihave duplicate mime foreverypart
extracttext spamtest spamtestplus editheader imapsieve vnd.dovecot.imapsieve
namespace inbox {
  inbox = yes
  location =
  mailbox Drafts {
    auto = subscribe
    special_use = \Drafts
  }
  mailbox Junk {
    auto = subscribe
    special_use = \Junk
  }
  mailbox Sent {
    auto = subscribe
    special_use = \Sent
  }
  mailbox Trash {
    auto = subscribe
    special_use = \Trash
  }
  prefix =
}
passdb {
  args = /etc/dovecot/auth.conf
  driver = dict
}
plugin {
  fts = lucene
  fts_autoindex = yes
  fts_autoindex_exclude = \Junk
  fts_lucene = whitespace_chars=@.
  imapsieve_mailbox1_before = file:/conf/report-spam.sieve
  imapsieve_mailbox1_causes = COPY
  imapsieve_mailbox1_name = Junk
  imapsieve_mailbox2_before = file:/conf/report-ham.sieve
  imapsieve_mailbox2_causes = COPY
  imapsieve_mailbox2_from = Junk
  imapsieve_mailbox2_name = *
  quota = count:User quota
  quota_clone_dict = proxy:/tmp/podop.socket:quota
  quota_vsizes = yes
  sieve = file:~/sieve;active=~/.dovecot.sieve
  sieve_before = dict:proxy:/tmp/podop.socket:sieve
  sieve_execute_bin_dir = /conf/bin
  sieve_extensions = +spamtest +spamtestplus +editheader
  sieve_global_extensions = +vnd.dovecot.execute
  sieve_plugins = sieve_imapsieve sieve_extprograms
  sieve_spamtest_max_value = 15
  sieve_spamtest_status_header = X-Spam-Level
  sieve_spamtest_status_type = strlen
  sieve_vacation_dont_check_recipient = yes
  sieve_vacation_send_from_recipient = yes
}
postmaster_address = admin@usrpro.io
protocols = imap pop3 lmtp sieve
service auth-worker {
  unix_listener auth-worker {
    group = mail
    mode = 0660
    user = dovecot
  }
  user = mail
}
service auth {
  user = dovecot
}
service imap-login {
  inet_listener imap {
    port = 143
  }
}
service lmtp {
  inet_listener lmtp {
    port = 2525
  }
}
service managesieve-login {
  inet_listener sieve {
    port = 4190
  }
}
submission_host = 192.168.203.6
userdb {
  args = /etc/dovecot/auth.conf
  driver = dict
}
protocol imap {
  mail_plugins = " fts fts_lucene quota quota_clone zlib imap_quota
imap_sieve"
}
protocol lmtp {
  mail_plugins = " fts fts_lucene quota quota_clone zlib sieve"
  recipient_delimiter = +
}

And the actual error log:

imap_1       | Dec 19 16:31:08
indexer-worker( admin@usrpro.io)<490><m+t5VmJ93K7AqMsG:grc9HUxyGlzqAQAANEhNiw>:
Fatal: master: service(indexer-worker): child 490 killed with signal 11
(core dumped)
imap_1       | Dec 19 16:31:09
indexer-worker( admin@usrpro.io)<493><m+t5VmJ93K7AqMsG:HRLEK0xyGlztAQAANEhNiw>:
Error: lucene index /mail/admin@usrpro.io/lucene-indexes: IndexWriter()
failed (#1): Lock obtain timed out
imap_1       | Dec 19 16:31:09
indexer-worker( admin@usrpro.io)<493><m+t5VmJ93K7AqMsG:HRLEK0xyGlztAQAANEhNiw>:
Error: Mailbox INBOX: Mail search failed: Internal error occurred. Refer
to server log for more information. [2018-12-19 16:31:08]
imap_1       | Dec 19 16:31:09
indexer-worker( admin@usrpro.io)<493><m+t5VmJ93K7AqMsG:HRLEK0xyGlztAQAANEhNiw>:
Error: Mailbox INBOX: Transaction commit failed: FTS transaction commit
failed: backend deinit (attempted to index 1 messages (UIDs 1299..1299))
imap_1       | Dec 19 16:31:10
indexer-worker( admin@usrpro.io)<493><m+t5VmJ93K7AqMsG:GKWdMU1yGlztAQAANEhNiw>:
Fatal: master: service(indexer-worker): child 493 killed with signal 11
(core dumped)
imap_1       | Dec 19 16:31:10 indexer: Error: Indexer worker
disconnected, discarding 1 requests for admin@usrpro.io
imap_1       | Dec 19 16:31:11
indexer-worker( admin@usrpro.io)<494><m+t5VmJ93K7AqMsG:MRCzBE5yGlzuAQAANEhNiw>:
Error: lucene index /mail/admin@usrpro.io/lucene-indexes: IndexWriter()
failed (#1): Lock obtain timed out
imap_1       | Dec 19 16:31:11
indexer-worker( admin@usrpro.io)<494><m+t5VmJ93K7AqMsG:MRCzBE5yGlzuAQAANEhNiw>:
Error: Mailbox INBOX: Mail search failed: Internal error occurred. Refer
to server log for more information. [2018-12-19 16:31:10]
imap_1       | Dec 19 16:31:11
indexer-worker( admin@usrpro.io)<494><m+t5VmJ93K7AqMsG:MRCzBE5yGlzuAQAANEhNiw>:
Error: Mailbox INBOX: Transaction commit failed: FTS transaction commit
failed: backend deinit (attempted to index 1 messages (UIDs 1310..1310))
imap_1       | Dec 19 16:31:11 indexer: Error: Indexer worker
disconnected, discarding 1 requests for admin@usrpro.io

I managed to find a core dump file, which appeared outside of the
container. So I copied it back in, installed and ran gdb:

GNU gdb (GDB) 8.0.1
Copyright (C) 2017 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later
< http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.  Type "show copying"
and "show warranty" for details.
This GDB was configured as "x86_64-alpine-linux-musl".
Type "show configuration" for configuration details.
For bug reporting instructions, please see:
< ">http://www.gnu.org/software/gdb/bugs/>.
Find the GDB manual and other documentation resources online at:
< ">http://www.gnu.org/software/gdb/documentation/>.
For help, type "help".
Type "apropos word" to search for commands related to "word"...
Reading symbols from /usr/libexec/dovecot/indexer-worker...(no debugging
symbols found)...done.
[New LWP 1075]

warning: Can't read pathname for load map: No error information.
Core was generated by `dovecot/indexer-worker'.
Program terminated with signal SIGSEGV, Segmentation fault.
#0  0x00007fbd9a31c11a in free () from /lib/ld-musl-x86_64.so.1

So this seems musl related. I installed musl-dbg and ran again:

GNU gdb (GDB) 8.0.1
Copyright (C) 2017 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later
< http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.  Type "show copying"
and "show warranty" for details.
This GDB was configured as "x86_64-alpine-linux-musl".
Type "show configuration" for configuration details.
For bug reporting instructions, please see:
< ">http://www.gnu.org/software/gdb/bugs/>.
Find the GDB manual and other documentation resources online at:
< ">http://www.gnu.org/software/gdb/documentation/>.
For help, type "help".
Type "apropos word" to search for commands related to "word"...
Reading symbols from /usr/libexec/dovecot/indexer-worker...(no debugging
symbols found)...done.
[New LWP 1075]

warning: Can't read pathname for load map: No error information.
Core was generated by `dovecot/indexer-worker'.
Program terminated with signal SIGSEGV, Segmentation fault.
#0  a_crash () at ./arch/x86_64/atomic_arch.h:108
108     ./arch/x86_64/atomic_arch.h: No such file or directory.

Now I'm kinda lost in space. I don't know where that header file is.
Tried running "find" on the filesystem and a google search. But nothing
specific showed up.

I am starting to feel this bug is more musl related than Dovecot. Since
this has bitten our project more in the past, I'm considering to move
the project to Debian based images. But I want to be 100% sure this is
not a dovecot bug.

Note, earlier I created an image with Alpine:edge with musl 1.1.20-r2
and Dovecot 2.3.3. Running that image I saw the same error symptoms (Was
just a hopeful trail and error). I did not do any debugging on that one.

Thanks in advance! Tim


Can you run bt full on gdb and post that?
---
Aki Tuomi