Dear list,
We been having some issues where the indexer-worker is crashing. This happens on production servers which are handling a slight amount of mail, but is also reproducible by moving messages. Also, users on my server are complaining about "Trashed" items coming back etc.
Some details:
Dovecot 2.3.2.1 in alpine:3.8 based Docker container. As part of the Mailu distribution. (https://github.com/Mailu/Mailu)
I've first seen this issues on my production server, which stores mail on GlusterFS
I've been able to reproduce running the Docker container on a Virtual machine, using local storage.
There is another Mailu user reporting the same problem on a different VM provider / disk infrastructure: https://github.com/Mailu/Mailu/issues/751
Libc: musl-1.1.19-r10
Output of dovecot -n:
# dovecot -n # 2.3.2.1 (0719df592): /etc/dovecot/dovecot.conf # Pigeonhole version 0.5.2 (7704de5e) # OS: Linux 4.16.3-301.fc28.x86_64 x86_64 ext4 # Hostname: 98a2726271d3 auth_mechanisms = plain login disable_plaintext_auth = no first_valid_gid = 8 first_valid_uid = 8 hostname = mail.usrpro.io log_path = /dev/stderr mail_access_groups = mail mail_gid = mail mail_home = /mail/%u mail_location = maildir:/mail/%u mail_plugins = " fts fts_lucene quota quota_clone zlib" mail_privileged_group = mail mail_uid = mail mail_vsize_bg_after_count = 100 maildir_stat_dirs = yes managesieve_notify_capability = mailto managesieve_sieve_capability = fileinto reject envelope encoded-character vacation subaddress comparator-i;ascii-numeric relational regex imap4flags copy include variables body enotify environment mailbox date index ihave duplicate mime foreverypart extracttext spamtest spamtestplus editheader imapsieve vnd.dovecot.imapsieve namespace inbox { inbox = yes location = mailbox Drafts { auto = subscribe special_use = \Drafts } mailbox Junk { auto = subscribe special_use = \Junk } mailbox Sent { auto = subscribe special_use = \Sent } mailbox Trash { auto = subscribe special_use = \Trash } prefix = } passdb { args = /etc/dovecot/auth.conf driver = dict } plugin { fts = lucene fts_autoindex = yes fts_autoindex_exclude = \Junk fts_lucene = whitespace_chars=@. imapsieve_mailbox1_before = file:/conf/report-spam.sieve imapsieve_mailbox1_causes = COPY imapsieve_mailbox1_name = Junk imapsieve_mailbox2_before = file:/conf/report-ham.sieve imapsieve_mailbox2_causes = COPY imapsieve_mailbox2_from = Junk imapsieve_mailbox2_name = * quota = count:User quota quota_clone_dict = proxy:/tmp/podop.socket:quota quota_vsizes = yes sieve = file:~/sieve;active=~/.dovecot.sieve sieve_before = dict:proxy:/tmp/podop.socket:sieve sieve_execute_bin_dir = /conf/bin sieve_extensions = +spamtest +spamtestplus +editheader sieve_global_extensions = +vnd.dovecot.execute sieve_plugins = sieve_imapsieve sieve_extprograms sieve_spamtest_max_value = 15 sieve_spamtest_status_header = X-Spam-Level sieve_spamtest_status_type = strlen sieve_vacation_dont_check_recipient = yes sieve_vacation_send_from_recipient = yes } postmaster_address = admin@usrpro.io protocols = imap pop3 lmtp sieve service auth-worker { unix_listener auth-worker { group = mail mode = 0660 user = dovecot } user = mail } service auth { user = dovecot } service imap-login { inet_listener imap { port = 143 } } service lmtp { inet_listener lmtp { port = 2525 } } service managesieve-login { inet_listener sieve { port = 4190 } } submission_host = 192.168.203.6 userdb { args = /etc/dovecot/auth.conf driver = dict } protocol imap { mail_plugins = " fts fts_lucene quota quota_clone zlib imap_quota imap_sieve" } protocol lmtp { mail_plugins = " fts fts_lucene quota quota_clone zlib sieve" recipient_delimiter = + }
And the actual error log:
imap_1 | Dec 19 16:31:08 indexer-worker(admin@usrpro.io)<490><m+t5VmJ93K7AqMsG:grc9HUxyGlzqAQAANEhNiw>: Fatal: master: service(indexer-worker): child 490 killed with signal 11 (core dumped) imap_1 | Dec 19 16:31:09 indexer-worker(admin@usrpro.io)<493><m+t5VmJ93K7AqMsG:HRLEK0xyGlztAQAANEhNiw>: Error: lucene index /mail/admin@usrpro.io/lucene-indexes: IndexWriter() failed (#1): Lock obtain timed out imap_1 | Dec 19 16:31:09 indexer-worker(admin@usrpro.io)<493><m+t5VmJ93K7AqMsG:HRLEK0xyGlztAQAANEhNiw>: Error: Mailbox INBOX: Mail search failed: Internal error occurred. Refer to server log for more information. [2018-12-19 16:31:08] imap_1 | Dec 19 16:31:09 indexer-worker(admin@usrpro.io)<493><m+t5VmJ93K7AqMsG:HRLEK0xyGlztAQAANEhNiw>: Error: Mailbox INBOX: Transaction commit failed: FTS transaction commit failed: backend deinit (attempted to index 1 messages (UIDs 1299..1299)) imap_1 | Dec 19 16:31:10 indexer-worker(admin@usrpro.io)<493><m+t5VmJ93K7AqMsG:GKWdMU1yGlztAQAANEhNiw>: Fatal: master: service(indexer-worker): child 493 killed with signal 11 (core dumped) imap_1 | Dec 19 16:31:10 indexer: Error: Indexer worker disconnected, discarding 1 requests for admin@usrpro.io imap_1 | Dec 19 16:31:11 indexer-worker(admin@usrpro.io)<494><m+t5VmJ93K7AqMsG:MRCzBE5yGlzuAQAANEhNiw>: Error: lucene index /mail/admin@usrpro.io/lucene-indexes: IndexWriter() failed (#1): Lock obtain timed out imap_1 | Dec 19 16:31:11 indexer-worker(admin@usrpro.io)<494><m+t5VmJ93K7AqMsG:MRCzBE5yGlzuAQAANEhNiw>: Error: Mailbox INBOX: Mail search failed: Internal error occurred. Refer to server log for more information. [2018-12-19 16:31:10] imap_1 | Dec 19 16:31:11 indexer-worker(admin@usrpro.io)<494><m+t5VmJ93K7AqMsG:MRCzBE5yGlzuAQAANEhNiw>: Error: Mailbox INBOX: Transaction commit failed: FTS transaction commit failed: backend deinit (attempted to index 1 messages (UIDs 1310..1310)) imap_1 | Dec 19 16:31:11 indexer: Error: Indexer worker disconnected, discarding 1 requests for admin@usrpro.io
I managed to find a core dump file, which appeared outside of the container. So I copied it back in, installed and ran gdb:
GNU gdb (GDB) 8.0.1 Copyright (C) 2017 Free Software Foundation, Inc. License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html> This is free software: you are free to change and redistribute it. There is NO WARRANTY, to the extent permitted by law. Type "show copying" and "show warranty" for details. This GDB was configured as "x86_64-alpine-linux-musl". Type "show configuration" for configuration details. For bug reporting instructions, please see: <http://www.gnu.org/software/gdb/bugs/>. Find the GDB manual and other documentation resources online at: <http://www.gnu.org/software/gdb/documentation/>. For help, type "help". Type "apropos word" to search for commands related to "word"... Reading symbols from /usr/libexec/dovecot/indexer-worker...(no debugging symbols found)...done. [New LWP 1075]
warning: Can't read pathname for load map: No error information. Core was generated by `dovecot/indexer-worker'. Program terminated with signal SIGSEGV, Segmentation fault. #0 0x00007fbd9a31c11a in free () from /lib/ld-musl-x86_64.so.1
So this seems musl related. I installed musl-dbg and ran again:
GNU gdb (GDB) 8.0.1 Copyright (C) 2017 Free Software Foundation, Inc. License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html> This is free software: you are free to change and redistribute it. There is NO WARRANTY, to the extent permitted by law. Type "show copying" and "show warranty" for details. This GDB was configured as "x86_64-alpine-linux-musl". Type "show configuration" for configuration details. For bug reporting instructions, please see: <http://www.gnu.org/software/gdb/bugs/>. Find the GDB manual and other documentation resources online at: <http://www.gnu.org/software/gdb/documentation/>. For help, type "help". Type "apropos word" to search for commands related to "word"... Reading symbols from /usr/libexec/dovecot/indexer-worker...(no debugging symbols found)...done. [New LWP 1075]
warning: Can't read pathname for load map: No error information. Core was generated by `dovecot/indexer-worker'. Program terminated with signal SIGSEGV, Segmentation fault. #0 a_crash () at ./arch/x86_64/atomic_arch.h:108 108 ./arch/x86_64/atomic_arch.h: No such file or directory.
Now I'm kinda lost in space. I don't know where that header file is. Tried running "find" on the filesystem and a google search. But nothing specific showed up.
I am starting to feel this bug is more musl related than Dovecot. Since this has bitten our project more in the past, I'm considering to move the project to Debian based images. But I want to be 100% sure this is not a dovecot bug.
Note, earlier I created an image with Alpine:edge with musl 1.1.20-r2 and Dovecot 2.3.3. Running that image I saw the same error symptoms (Was just a hopeful trail and error). I did not do any debugging on that one.
Thanks in advance! Tim