[Dovecot] Problem with fts lucene, on solaris 10
Hi all, I'm planning to migrate my courier-imap imap server to dovecot, but I'm experiencing a strange issue with fts-lucene plugin.
Basically, every time I start a search, the log starts to write:
Apr 05 19:30:53 indexer: Error: Indexer worker disconnected, discarding 1 requests for XXXXXX Apr 05 19:30:53 indexer-worker(XXXXX): Fatal: master: service(indexer-worker): child 809 killed with signal 11 (core not dumped) Apr 05 19:30:53 imap(XXXXXX): Error: indexer failed to index mailbox INBOX Apr 05 19:30:56 indexer-worker(XXXXX): Error: lucene index /home/XXXXX/Maildir/lucene-indexes: IndexWriter() failed (#1): Lock obtain timed out
When trussing the indexer-worker pid I can see it ends with: ...SNIP... 527: brk(0x005A7840) = 0 527: brk(0x005AD840) = 0 527: pread64(16, " I G E N E R I C I < /".., 8170, 24514) = 8170 527: brk(0x005AD840) = 0 527: brk(0x005B3840) = 0 527: Incurred fault #5, FLTACCESS %pc = 0xFEBD6DDC 527: siginfo: SIGBUS BUS_ADRALN addr=0x005B50BA 527: Received signal #10, SIGBUS [default] 527: siginfo: SIGBUS BUS_ADRALN addr=0x005B50BA
I cannot get a core, but either root ( user that is starting the dovecot ) than the user XXXXX have unlimited coresize XXXXX /home/XXXXXXX > ulimit -a time(seconds) unlimited file(blocks) unlimited data(kbytes) unlimited stack(kbytes) 8192 coredump(blocks) unlimited nofiles(descriptors) 256 vmemory(kbytes) unlimited
root # ulimit -a core file size (blocks, -c) unlimited data seg size (kbytes, -d) unlimited file size (blocks, -f) unlimited open files (-n) 1024 pipe size (512 bytes, -p) 10 stack size (kbytes, -s) 8192 cpu time (seconds, -t) unlimited max user processes (-u) 29995 virtual memory (kbytes, -v) unlimited
The mailbox for user XXXXXX ( Maildir stile, for speed-up migration ) has ~ 4000 messages.
The search ends, but I think it don't use the lucene plugin ( it crashes ).
I'm sure I'm missing something, but I can't really find what.
All stuff have been self compiled, using standard flags as per "configure" scripts.
Thanks for any hint.
Pigi
here the environment:
uname -a
SunOS puma 5.10 Generic_137137-09 sun4u sparc SUNW,Sun-Fire-V240
( yes, I know, it's quite old ;) )
======= dovecot --version 2.1.15
======= doveconf -n # 2.1.15: /usr/local/dovecot/etc/dovecot/dovecot.conf # OS: SunOS 5.10 sun4u debug_log_path = /usr/local/dovecot/var/log/dovecot-debug.log disable_plaintext_auth = no first_valid_uid = 100 info_log_path = /usr/local/dovecot/var/log/dovecot-info.log log_path = /usr/local/dovecot/var/log/dovecot.log mail_location = maildir:~/Maildir mail_plugins = " fts fts_lucene" namespace { inbox = yes location = prefix = INBOX. separator = . } passdb { driver = pam } plugin { fts = lucene fts_lucene = whitespace_chars=@. } service imap-login { inet_listener imap { port = 144 } } service indexer-worker { drop_priv_before_exec = yes } service pop3-login { inet_listener pop3 { port = 112 } } ssl = no userdb { driver = passwd }
============= gcc -v Reading specs from /usr/sfw/lib/gcc/sparc-sun-solaris2.10/3.4.3/specs Configured with: /sfw10/builds/build/sfw10-patch/usr/src/cmd/gcc/gcc-3.4.3/configure --prefix=/usr/sfw --with-as=/usr/ccs/bin/as --without-gnu-as --with-ld=/usr/ccs/bin/ld --without-gnu-ld --enable-languages=c,c++ --enable-shared Thread model: posix gcc version 3.4.3 (csl-sol210-3_4-branch+sol_rpath)
============= clucene clucene-core-2.3.3.4 libstemmer_c libtextcat-2.2 openssl-1.0.1e
On 5.4.2013, at 22.59, Pigi <pigi@frumar.it> wrote:
I'm planning to migrate my courier-imap imap server to dovecot, but I'm experiencing a strange issue with fts-lucene plugin.
Basically, every time I start a search, the log starts to write:
Apr 05 19:30:53 indexer: Error: Indexer worker disconnected, discarding 1 requests for XXXXXX Apr 05 19:30:53 indexer-worker(XXXXX): Fatal: master: service(indexer-worker): child 809 killed with signal 11 (core not dumped) Apr 05 19:30:53 imap(XXXXXX): Error: indexer failed to index mailbox INBOX Apr 05 19:30:56 indexer-worker(XXXXX): Error: lucene index /home/XXXXX/Maildir/lucene-indexes: IndexWriter() failed (#1): Lock obtain timed out
When trussing the indexer-worker pid I can see it ends with:
If you can truss it, you can also gdb it, which is much more helpful:
gdb -p <pid of index-worker> cont <make it crash> bt full
527: siginfo: SIGBUS BUS_ADRALN addr=0x005B50BA 527: Received signal #10, SIGBUS [default] 527: siginfo: SIGBUS BUS_ADRALN addr=0x005B50BA
That says it's a SIGBUS, but the Dovecot log said it's a SIGSEGV. Wonder what's the difference, or maybe you're getting both. Anyway I'd guess this is a clucene bug related to CPU alignment with Sparc CPU. gdb would confirm and could help with fixing it.
I haven't had too much luck with gdb:
0xfec4c554 in ?? () (gdb) cont Continuing.
Program received signal SIGSEGV, Segmentation fault. 0xfebd710c in ?? () (gdb) bt full #0 0xfebd710c in ?? () No symbol table info available. (gdb) quit The program is running. Quit anyway (and detach it)? (y or n) y Detaching from program: , process 16400
That's rather strange, as :
root# file /usr/local/dovecot/sbin/dovecot dovecot: ELF 32-bit MSB executable SPARC Version 1, dynamically linked, not stripped root# file /usr/local/clucene/lib/lib* /usr/local/clucene/lib/libclucene-core.so: ELF 32-bit MSB dynamic lib SPARC Version 1, dynamically linked, not stripped /usr/local/clucene/lib/libclucene-core.so.1: ELF 32-bit MSB dynamic lib SPARC Version 1, dynamically linked, not stripped /usr/local/clucene/lib/libclucene-core.so.2.3.3.4: ELF 32-bit MSB dynamic lib SPARC Version 1, dynamically linked, not stripped /usr/local/clucene/lib/libclucene-shared.so: ELF 32-bit MSB dynamic lib SPARC Version 1, dynamically linked, not stripped /usr/local/clucene/lib/libclucene-shared.so.1: ELF 32-bit MSB dynamic lib SPARC Version 1, dynamically linked, not stripped /usr/local/clucene/lib/libclucene-shared.so.2.3.3.4: ELF 32-bit MSB dynamic lib SPARC Version 1, dynamically linked, not stripped root #
Thus, every binary, or lib, is not stripped.
If I invoke gdb on those binary I can get every simbol: root # gdb /usr/local/dovecot/sbin/dovecot ... SNIP ... (gdb) list 736 "\n", IO_BLOCK_SIZE); 737 } 738 739 int main(int argc, char *argv[]) 740 { 741 struct master_settings *set; 742 const char *error, *doveconf_arg = NULL; 743 failure_callback_t *orig_info_callback, *orig_debug_callback; 744 bool foreground = FALSE, ask_key_pass = FALSE; 745 bool doubleopts[argc];
or even root # gdb /usr/local/clucene/lib/libclucene-core.so ... SNIP ... (gdb) list 1 /*------------------------------------------------------------------------------ 2 * Copyright (C) 2003-2006 Ben van Klinken and the CLucene Team 3 * 4 * Distributable under the terms of either the Apache License (Version 2.0) or 5 * the GNU Lesser General Public License, as specified in the COPYING file. 6 ------------------------------------------------------------------------------*/ 7 #include "CLucene/_ApiHeader.h" 8 #include "CLucene/util/Misc.h" 9 10 #include "CLucene/search/Sort.h" (gdb)
Should I compile with -g (and in case, how do I pass this flag to cmake ) ?
Thanks.
Pigi
On Sunday 07 April 2013, Timo Sirainen wrote:
On 5.4.2013, at 22.59, Pigi <pigi@frumar.it> wrote:
I'm planning to migrate my courier-imap imap server to dovecot, but I'm experiencing a strange issue with fts-lucene plugin.
Basically, every time I start a search, the log starts to write:
Apr 05 19:30:53 indexer: Error: Indexer worker disconnected, discarding 1 requests for XXXXXX Apr 05 19:30:53 indexer-worker(XXXXX): Fatal: master: service(indexer-worker): child 809 killed with signal 11 (core not dumped) Apr 05 19:30:53 imap(XXXXXX): Error: indexer failed to index mailbox INBOX Apr 05 19:30:56 indexer-worker(XXXXX): Error: lucene index /home/XXXXX/Maildir/lucene-indexes: IndexWriter() failed (#1): Lock obtain timed out
When trussing the indexer-worker pid I can see it ends with:
If you can truss it, you can also gdb it, which is much more helpful:
gdb -p <pid of index-worker> cont <make it crash> bt full
527: siginfo: SIGBUS BUS_ADRALN addr=0x005B50BA 527: Received signal #10, SIGBUS [default] 527: siginfo: SIGBUS BUS_ADRALN addr=0x005B50BA
That says it's a SIGBUS, but the Dovecot log said it's a SIGSEGV. Wonder what's the difference, or maybe you're getting both. Anyway I'd guess this is a clucene bug related to CPU alignment with Sparc CPU. gdb would confirm and could help with fixing it.
On 8.4.2013, at 11.54, Pigi <pigi@frumar.it> wrote:
I haven't had too much luck with gdb:
0xfec4c554 in ?? () (gdb) cont Continuing.
Program received signal SIGSEGV, Segmentation fault. 0xfebd710c in ?? () (gdb) bt full #0 0xfebd710c in ?? () No symbol table info available. (gdb) quit The program is running. Quit anyway (and detach it)? (y or n) y Detaching from program: , process 16400
gdb is just confused. Try another way if it works:
gdb /usr/local/libexec/dovecot/imap 16400 cont bt full
On Monday 08 April 2013, Timo Sirainen wrote:
On 8.4.2013, at 11.54, Pigi <pigi@frumar.it> wrote:
gdb is just confused. Try another way if it works:
gdb /usr/local/libexec/dovecot/imap 16400 cont bt full
Timo,
I took a sligtly different approach:
gdb /usr/local/dovecot/libexec/dovecot/indexer-worker ps -ef | grep indexer-worker | grep -v grep |awk '{print $2}'
thus attaching to indexer-worker, which I think ( but not sure ) it should be fine.
Here the result: GNU gdb 6.6 Copyright (C) 2006 Free Software Foundation, Inc. GDB is free software, covered by the GNU General Public License, and you are welcome to change it and/or distribute copies of it under certain conditions. Type "show copying" to see the conditions. There is absolutely no warranty for GDB. Type "show warranty" for details. This GDB was configured as "sparc-sun-solaris2.10"... Attaching to program `/usr/local/dovecot/libexec/dovecot/indexer-worker', process 21672 Reading symbols from /usr/local/dovecot/lib/dovecot/libdovecot-storage.so.0...done. Loaded symbols for /usr/local/dovecot/lib/dovecot/libdovecot-storage.so.0 Reading symbols from /usr/local/dovecot/lib/dovecot/libdovecot.so.0...done. Loaded symbols for /usr/local/dovecot/lib/dovecot/libdovecot.so.0 Reading symbols from /usr/local/openssl-1.0.1e/lib/libssl.so.1.0.0...done. Loaded symbols for /usr/local/openssl-1.0.1e/lib//libssl.so.1.0.0 Reading symbols from /usr/local/openssl-1.0.1e/lib/libcrypto.so.1.0.0...done. Loaded symbols for /usr/local/openssl-1.0.1e/lib//libcrypto.so.1.0.0 Reading symbols from /lib/libdl.so.1... warning: Lowest section in /lib/libdl.so.1 is .hash at 000000b4 done. Loaded symbols for /lib/libdl.so.1 Reading symbols from /usr/local/lib/libiconv.so.2...done. Loaded symbols for /usr/local/lib/libiconv.so.2 Reading symbols from /lib/libnsl.so.1...done. Loaded symbols for /lib/libnsl.so.1 Reading symbols from /lib/libsocket.so.1...done. Loaded symbols for /lib/libsocket.so.1 Reading symbols from /lib/librt.so.1...done. Loaded symbols for /lib/librt.so.1 Reading symbols from /lib/libsendfile.so.1...done. Loaded symbols for /lib/libsendfile.so.1 Reading symbols from /lib/libc.so.1...done. Loaded symbols for /lib/libc.so.1 Reading symbols from /usr/sfw/lib/libgcc_s.so.1...done. Loaded symbols for /usr/sfw/lib/libgcc_s.so.1 Reading symbols from /lib/libaio.so.1...done. Loaded symbols for /lib/libaio.so.1 Reading symbols from /lib/libmd.so.1...done. Loaded symbols for /lib/libmd.so.1 Reading symbols from /platform/SUNW,Sun-Fire-V240/lib/libc_psr.so.1...done. Loaded symbols for /platform/SUNW,Sun-Fire-V240/lib/libc_psr.so.1 Reading symbols from /usr/local/dovecot/lib/dovecot/lib20_fts_plugin.so...done. Loaded symbols for /usr/local/dovecot/lib/dovecot/lib20_fts_plugin.so Reading symbols from /lib/libscf.so.1...done. Loaded symbols for /lib/libscf.so.1 Reading symbols from /lib/libdoor.so.1...done. Loaded symbols for /lib/libdoor.so.1 Reading symbols from /lib/libuutil.so.1...done. Loaded symbols for /lib/libuutil.so.1 Reading symbols from /lib/libgen.so.1...done. Loaded symbols for /lib/libgen.so.1 Reading symbols from /platform/SUNW,Sun-Fire-V240/lib/libmd_psr.so.1...done. Loaded symbols for /platform/SUNW,Sun-Fire-V240/lib/libmd_psr.so.1 Reading symbols from /lib/libmp.so.2...done. Loaded symbols for /lib/libmp.so.2 Reading symbols from /usr/local/dovecot/lib/dovecot/lib21_fts_lucene_plugin.so...done. Loaded symbols for /usr/local/dovecot/lib/dovecot/lib21_fts_lucene_plugin.so Reading symbols from /usr/local/clucene/lib/libclucene-shared.so.1...done. Loaded symbols for /usr/local/clucene/lib//libclucene-shared.so.1 Reading symbols from /usr/local/clucene/lib/libclucene-core.so.1...done. Loaded symbols for /usr/local/clucene/lib//libclucene-core.so.1 Reading symbols from /usr/sfw/lib/libstdc++.so.6...done. Loaded symbols for /usr/sfw/lib/libstdc++.so.6 Reading symbols from /lib/libm.so.2...done. Loaded symbols for /lib/libm.so.2 Reading symbols from /usr/lib/libz.so...done. Loaded symbols for /usr/lib/libz.so Reading symbols from /lib/ld.so.1...done. Loaded symbols for /lib/ld.so.1 sol-thread active. Retry #1: Retry #2: Retry #3: Retry #4: [New LWP 1 ] [New Thread 1 (LWP 1)] Symbols already loaded for /usr/local/dovecot/lib/dovecot/libdovecot-storage.so.0 Symbols already loaded for /usr/local/dovecot/lib/dovecot/libdovecot.so.0 Symbols already loaded for /usr/local/openssl-1.0.1e/lib//libssl.so.1.0.0 Symbols already loaded for /usr/local/openssl-1.0.1e/lib//libcrypto.so.1.0.0 Symbols already loaded for /lib/libdl.so.1 Symbols already loaded for /usr/local/lib/libiconv.so.2 Symbols already loaded for /lib/libnsl.so.1 Symbols already loaded for /lib/libsocket.so.1 Symbols already loaded for /lib/librt.so.1 Symbols already loaded for /lib/libsendfile.so.1 Symbols already loaded for /lib/libc.so.1 Symbols already loaded for /usr/sfw/lib/libgcc_s.so.1 Symbols already loaded for /lib/libaio.so.1 Symbols already loaded for /lib/libmd.so.1 Symbols already loaded for /platform/SUNW,Sun-Fire-V240/lib/libc_psr.so.1 Symbols already loaded for /usr/local/dovecot/lib/dovecot/lib20_fts_plugin.so Symbols already loaded for /lib/libscf.so.1 Symbols already loaded for /lib/libdoor.so.1 Symbols already loaded for /lib/libuutil.so.1 Symbols already loaded for /lib/libgen.so.1 Symbols already loaded for /platform/SUNW,Sun-Fire-V240/lib/libmd_psr.so.1 Symbols already loaded for /lib/libmp.so.2 Symbols already loaded for /usr/local/dovecot/lib/dovecot/lib21_fts_lucene_plugin.so Symbols already loaded for /usr/local/clucene/lib//libclucene-shared.so.1 Symbols already loaded for /usr/local/clucene/lib//libclucene-core.so.1 Symbols already loaded for /usr/sfw/lib/libstdc++.so.6 Symbols already loaded for /lib/libm.so.2 Symbols already loaded for /usr/lib/libz.so Symbols already loaded for /lib/ld.so.1 [Switching to Thread 1 (LWP 1)] 0xfec4c554 in _pread64 () from /lib/libc.so.1 (gdb) cont Continuing.
Program received signal SIGSEGV, Segmentation fault. 0xfebd65dc in _malloc_unlocked () from /lib/libc.so.1 (gdb) bt full #0 0xfebd65dc in _malloc_unlocked () from /lib/libc.so.1 No symbol table info available. #1 0xfebd66e4 in realloc () from /lib/libc.so.1 No symbol table info available. #2 0xff1d9a70 in pool_system_realloc (pool=0xff21576c, mem=0xff21576c, old_size=0, new_size=8192) at mempool-system.c:115 No locals. #3 0xff1cda28 in i_realloc (mem=0x0, old_size=0, new_size=8192) at imem.c:14 No locals. #4 0xff1cf268 in i_stream_grow_buffer (stream=0x1df318, bytes=0) at istream.c:482 old_size = 0 __FUNCTION__ = "i_stream_grow_buffer" #5 0xff1cf33c in i_stream_get_buffer_space (stream=0x1df318, wanted_size=1, size_r=0xffbff35c) at istream.c:500 __FUNCTION__ = "i_stream_get_buffer_space" #6 0xff1d11c8 in i_stream_file_read (stream=0x1df318) at istream-file.c:58 size = 89 ret = 1962776 __FUNCTION__ = "i_stream_file_read" #7 0xff1ce8e4 in i_stream_read (stream=0x1df348) at istream.c:130 _stream = (struct istream_private *) 0x1df318 old_size = 0 ret = 1962824 __FUNCTION__ = "i_stream_read" #8 0xff1ceb9c in i_stream_read_copy_from_parent (istream=0x0) at istream.c:172 stream = (struct istream_private *) 0x194da8 pos = 0 ret = 0 __FUNCTION__ = "i_stream_read_copy_from_parent" #9 0xff30e258 in i_stream_mail_read (stream=0x194da8) at istream-mail.c:67 size = 4281606432 ret = 0 #10 0xff1ce9ac in i_stream_read (stream=0x194dd8) at istream.c:130 _stream = (struct istream_private *) 0x194da8 old_size = 0 ret = 1658328 __FUNCTION__ = "i_stream_read" #11 0xff1cf094 in i_stream_read_data (stream=0x194dd8, data_r=0xffbff614, size_r=0xffbff610, threshold=1) at istream.c:428 ret = -14592148 read_more = false __FUNCTION__ = "i_stream_read_data" #12 0xff1ad9e4 in message_parse_header_next (ctx=0x2a0c88, hdr_r=0xffbff694) at message-header-parser.c:89 msg = (const unsigned char *) 0x0 i = 4279949788 size = 0 startpos = 0 colon_pos = 4294967295 parse_size = 1 ret = 1658328 continued = false continues = false no_newline = false crlf_newline = false __FUNCTION__ = "message_parse_header_next" #13 0xff1af2a4 in parse_next_header (ctx=0x203cf8, block_r=0xffbff810) at message-parser.c:478 part = (struct message_part *) 0x245c0 hdr = (struct message_header_line *) 0x0 size = 0 ret = 2112760 __FUNCTION__ = "parse_next_header" #14 0xff1aff10 in message_parser_parse_next_block (ctx=0x203cf8, block_r=0xffbff810) at message-parser.c:784 ret = 2112760 eof = false full = false __FUNCTION__ = "message_parser_parse_next_block" #15 0xfecd5a4c in fts_build_mail (update_ctx=0x818a8, mail=0x7d0b8) at fts-build-mail.c:295 _data_stack_cur_id = 5 #16 0xfecda9cc in fts_mail_precache (_mail=0x7d0b8) at fts-storage.c:443 _data_stack_cur_id = 4 fmail = (struct fts_mail *) 0xfecee4cc ft = (struct fts_transaction_context *) 0x2bc00 #17 0xff2eff30 in mail_precache (mail=0x7d0b8) at mail.c:281 No locals. #18 0x00011cd0 in master_connection_input (conn=0x2baa0) at master-connection.c:86 _data_stack_cur_id = 3 line = 0xdea8 <Address 0xdea8 out of bounds> #19 0xff1d5090 in io_loop_call_io (io=0x30be8) at ioloop.c:379 ioloop = (struct ioloop *) 0x2c8f0 t_id = 2 #20 0xff1d5a9c in io_loop_handler_run (ioloop=0x2c8f0) at ioloop-poll.c:211 ctx = (struct ioloop_handler_context *) 0x2ba60 pollfd = (struct pollfd *) 0x2fc38 tv = {tv_sec = 2147483, tv_usec = 0} io = (struct io_file *) 0x30be8 msecs = -1 ret = 0 call = true #21 0xff1d4a84 in io_loop_run (ioloop=0x2c8f0) at ioloop.c:398 No locals. #22 0xff1bd98c in master_service_run (service=0x2c240, callback=0x117c0 <client_connected>) at master-service.c:544 No locals. #23 0x000118d8 in main (argc=1, argv=0xffbffde4) at indexer-worker.c:78 storage_service_flags = 547 c = 180800
In case I need to attach to imap, let me know.
Thanks
Pigi
On 8.4.2013, at 12.19, Pigi <pigi@frumar.it> wrote:
gdb /usr/local/libexec/dovecot/imap 16400 .. I took a sligtly different approach: gdb /usr/local/dovecot/libexec/dovecot/indexer-worker
ps -ef | grep indexer-worker | grep -v grep |awk '{print $2}'
thus attaching to indexer-worker, which I think ( but not sure ) it should be fine.
Oh, right, I forgot which process you were gdbing :)
Program received signal SIGSEGV, Segmentation fault. 0xfebd65dc in _malloc_unlocked () from /lib/libc.so.1 (gdb) bt full #0 0xfebd65dc in _malloc_unlocked () from /lib/libc.so.1 No symbol table info available. #1 0xfebd66e4 in realloc () from /lib/libc.so.1 No symbol table info available. #2 0xff1d9a70 in pool_system_realloc (pool=0xff21576c, mem=0xff21576c, old_size=0, new_size=8192) at mempool-system.c:115 No locals. #3 0xff1cda28 in i_realloc (mem=0x0, old_size=0, new_size=8192) at imem.c:14 No locals.
Looks to me like memory corruption, which is just about impossible to debug if you don't have valgrind or a similar tool (and I don't think Solaris has any such free tools).
Do you happen to get any other crashes where it doesn't crash in malloc code? You showed previously that it was crashing with SIGBUS somewhere.
On Monday 08 April 2013, Timo Sirainen wrote:
On 8.4.2013, at 12.19, Pigi <pigi@frumar.it> wrote:
Program received signal SIGSEGV, Segmentation fault. 0xfebd65dc in _malloc_unlocked () from /lib/libc.so.1 (gdb) bt full #0 0xfebd65dc in _malloc_unlocked () from /lib/libc.so.1 No symbol table info available. #1 0xfebd66e4 in realloc () from /lib/libc.so.1 No symbol table info available. #2 0xff1d9a70 in pool_system_realloc (pool=0xff21576c, mem=0xff21576c, old_size=0, new_size=8192) at mempool-system.c:115 No locals. #3 0xff1cda28 in i_realloc (mem=0x0, old_size=0, new_size=8192) at imem.c:14 No locals.
Looks to me like memory corruption, which is just about impossible to debug if you don't have valgrind or a similar tool (and I don't think Solaris has any such free tools).
Yes, you're right. There aren't ( already packaged ) tool for valgrinding or similar.
Do you happen to get any other crashes where it doesn't crash in malloc code? You showed previously that it was crashing with SIGBUS somewhere.
For what I can see, form dovecot.log I have a couple of error: Apr 08 12:02:18 indexer: Error: Indexer worker disconnected, discarding 1 requests for xxxx Apr 08 12:02:18 imap(xxxx): Error: indexer failed to index mailbox INBOX Apr 08 12:02:18 indexer-worker(xxxx): Fatal: master: service(indexer-worker): child 27242 killed with signal 11 (core not dumped) Apr 08 12:02:19 indexer-worker(xxxx): Error: lucene index /home/xxxx/Maildir/lucene-indexes: IndexWriter() failed (#1): Lock obtain timed out Apr 08 12:03:46 indexer: Error: Indexer worker disconnected, discarding 1 requests for xxxx Apr 08 12:03:46 imap(xxxx): Error: indexer failed to index mailbox INBOX Apr 08 12:03:46 indexer-worker(xxxx): Fatal: master: service(indexer-worker): child 27445 killed with signal 10 (core not dumped) Apr 08 12:06:57 indexer: Error: Indexer worker disconnected, discarding 1 requests for xxxx Apr 08 12:06:57 imap(xxxx): Error: indexer failed to index mailbox INBOX Apr 08 12:06:57 indexer-worker(xxxx): Fatal: master: service(indexer-worker): child 27761 killed with signal 11 (core not dumped) Apr 08 12:08:33 indexer: Error: Indexer worker disconnected, discarding 1 requests for xxxx
Here a couple or more of gdb"ed" process, but tied to SEGV ( I almost suspect that when gdbing I always catch a SEGV ) :
============================================================== First gdb
No symbol table info available. #1 0xfebd6884 in realloc () from /lib/libc.so.1 No symbol table info available. #2 0xfe83d270 in lucene::util::StringReader::init (this=0x8d328, _value=0x2bb288, _length=3958, copyData=true) at /extpool/PACKAGES/DOVECOT/clucene-core-2.3.3.4/src/core/CLucene/util/Reader.cpp:48 tmp = (TCHAR *) 0x2051e0 length = 3958 #3 0xfe8837f0 in lucene::index::DocumentsWriter::ThreadState::FieldData::invertField (this=0x92df8, field=0x8d238, analyzer=0x2cb60, maxFieldLength=10000) at /extpool/PACKAGES/DOVECOT/clucene-core-2.3.3.4/src/core/CLucene/index/DocumentsWriterThreadState.cpp:887 stringValue = (const TCHAR *) 0x2bb288 stringValueLength = 3958 stream = (class lucene::analysis::TokenStream *) 0x0 streamValue = (class lucene::analysis::TokenStream *) 0x8d328 #4 0xfe883a58 in lucene::index::DocumentsWriter::ThreadState::FieldData::processField (this=0x92df8, analyzer=0x2cb60) at /extpool/PACKAGES/DOVECOT/clucene-core-2.3.3.4/src/core/CLucene/index/DocumentsWriterThreadState.cpp:798 field = (class lucene::document::Field *) 0x8d238 #5 0xfe883d6c in lucene::index::DocumentsWriter::ThreadState::processDocument (this=0x8c0a8, analyzer=0x2cb60) at /extpool/PACKAGES/DOVECOT/clucene-core-2.3.3.4/src/core/CLucene/util/Array.h:90 i = 6 #6 0xfe87ea00 in lucene::index::DocumentsWriter::updateDocument (this=0x84790, doc=0x8b838, analyzer=0x2cb60, delTerm=0x0) at /extpool/PACKAGES/DOVECOT/clucene-core-2.3.3.4/src/core/CLucene/index/DocumentsWriter.cpp:946 state = (class lucene::index::DocumentsWriter::ThreadState *) 0x8c0a8 #7 0xfe87eb5c in lucene::index::DocumentsWriter::addDocument (this=0x84790, doc=0x8b838, analyzer=0x2cb60) at /extpool/PACKAGES/DOVECOT/clucene-core-2.3.3.4/src/core/CLucene/index/DocumentsWriter.cpp:930 No locals. #8 0xfe8b1ea0 in lucene::index::IndexWriter::addDocument (this=0x8cc18, doc=0x8b838, analyzer=0x2cb60) at /extpool/PACKAGES/DOVECOT/clucene-core-2.3.3.4/src/core/CLucene/index/IndexWriter.cpp:682 No locals. #9 0xfea18e04 in lucene_index_build_flush (index=0x37ba8) at lucene-wrapper.cc:484 ret = 0 #10 0xfea18f88 in lucene_index_build_more (index=0x37ba8, uid=2995, data=0x185d18 "<xxxxxx@xxxxxxxx.xx>", size=23, hdr_name=0x8ba60 "Return-Path") at lucene-wrapper.cc:507 id = {4, 0, 0, 0, -4196824, -15011172, 534768, -4196336, 516256, 0, 11560, -20155708, 9, 10, 36, 2727408, 516256, 148, 0, 441, 148864, 534768, 3} namesize = 9216 datasize = 9216 dest = Cannot access memory at address 0xbb3 (gdb) quit
============================================================== Second gdb
gdb) bt full #0 0xfebd6b24 in realfree () from /lib/libc.so.1 No symbol table info available. #1 0xfebd7354 in cleanfree () from /lib/libc.so.1 No symbol table info available. #2 0xfebd64ac in _malloc_unlocked () from /lib/libc.so.1 No symbol table info available. #3 0xfebd639c in malloc () from /lib/libc.so.1 No symbol table info available. #4 0xfe737b38 in operator new () from /usr/sfw/lib/libstdc++.so.6 No symbol table info available. #5 0xfe8613f4 in Document (this=0x8b838) at /extpool/PACKAGES/DOVECOT/clucene-core-2.3.3.4/src/core/CLucene/document/Document.cpp:57 No locals. #6 0xfea18fa8 in lucene_index_build_more (index=0x37ba8, uid=2995, data=0x8d7b0 "<xxxxxx@xxxxxxxx.xx>", size=23, hdr_name=0x8b9d0 "Return-Path") at lucene-wrapper.cc:511 id = {4, 0, 0, 0, -4196824, -15011172, 1605360, -4196336, 516256, 0, 11560, -20155708, 9, 10, 36, 2122656, 516256, 148, 0, 441, 148864, 1605360, 3} namesize = 9216 datasize = 9216 dest = Cannot access memory at address 0xbb3 (gdb) quit
=============================================== third gdb
(gdb) cont Continuing.
Program received signal SIGSEGV, Segmentation fault. 0xfebd710c in t_splay () from /lib/libc.so.1 (gdb) bt full #0 0xfebd710c in t_splay () from /lib/libc.so.1 No symbol table info available. #1 0xfebd6eb8 in t_delete () from /lib/libc.so.1 No symbol table info available. #2 0xfebd6acc in realfree () from /lib/libc.so.1 No symbol table info available. #3 0xfebd72c8 in _free_unlocked () from /lib/libc.so.1 No symbol table info available. #4 0xfebd7204 in free () from /lib/libc.so.1 No symbol table info available. #5 0xfe862144 in lucene::document::Field::_resetValue (this=0x71e288) at /extpool/PACKAGES/DOVECOT/clucene-core-2.3.3.4/src/core/CLucene/document/Field.cpp:300 No locals. #6 0xfe862248 in ~Field (this=0x71e288) at /extpool/PACKAGES/DOVECOT/clucene-core-2.3.3.4/src/core/CLucene/document/Field.cpp:103 No locals. #7 0xfe90dcc8 in ~CLVector (this=0x23c248) at /extpool/PACKAGES/DOVECOT/clucene-core-2.3.3.4/src/core/CLucene/util/Equators.h:227 itr = {_M_current = 0x71eb00} #8 0xfe860dfc in ~Document (this=0x832e0) at /extpool/PACKAGES/DOVECOT/clucene-core-2.3.3.4/src/core/CLucene/document/Document.cpp:69 No locals. #9 0xfea18e30 in lucene_index_build_flush (index=0x8adc0) at lucene-wrapper.cc:493 ret = 0 #10 0xfea18f88 in lucene_index_build_more (index=0x8adc0, uid=4128, data=0x6b2b00 "<YYYYYYYYY@yyyyyyyyyy.yyy>", size=23, hdr_name=0x1b6508 "Return-Path") at lucene-wrapper.cc:507 id = {4, 0, 0, 0, -4196824, -15011172, 3114656, -4196336, 584416, 0, 11560, 0, 0, 0, 36, 7697128, 584416, 845, 0, 2382, 148880, 3114656, 3} namesize = 9216 datasize = 9216 dest = Cannot access memory at address 0x1020 (gdb)
Here a couple of ( probably ) interesting stuff :
root # grep "signal 10" /usr/local/dovecot/var/log/dovecot.log |wc 33 284 2308 root # grep "signal 11" /usr/local/dovecot/var/log/dovecot.log |wc 69 549 4457 root # grep "Lock obtain timed out" /usr/local/dovecot/var/log/dovecot.log |wc 74 420 3976
Thus, 74 times of 102 seems to be tied to "IndexWriter() failed (#1): Lock obtain timed out" ( which I don't really understand, or at least I have not enough comprehension of code to understand). The path where lucene try to get a lock is not NFS, but is an automount on the same machine ( no NFS then ).
It seems that the "Lock obtain timed out" happens every time I read a message with the MUA.
This error ("Lock obtain") happens every time I read a message from my MUA, I think it's due to the indexing process running on READ imap command:
Apr 08 15:18:13 indexer: Error: Indexer worker disconnected, discarding 1 requests for xxxx Apr 08 15:18:13 imap(xxxx): Error: indexer failed to index mailbox INBOX Apr 08 15:18:13 indexer-worker(xxxx): Fatal: master: service(indexer-worker): child 18171 killed with signal 11 (core not dumped) Apr 08 15:18:15 indexer-worker(xxxx): Error: lucene index /home/xxxx/Maildir/lucene-indexes: IndexWriter() failed (#1): Lock obtain timed out
When searching, instead, get the "Lock timeout" followed by another error:
Apr 08 15:11:37 indexer: Error: Indexer worker disconnected, discarding 1 requests for xxxx Apr 08 15:11:37 imap(xxxx): Error: indexer failed to index mailbox INBOX Apr 08 15:11:37 indexer-worker(xxxx): Fatal: master: service(indexer-worker): child 17438 killed with signal 10 (core not dumped) Apr 08 15:11:38 indexer-worker(xxxx): Error: lucene index /home/xxxx/Maildir/lucene-indexes: IndexWriter() failed (#1): Lock obtain timed out Apr 08 15:11:51 indexer: Error: Indexer worker disconnected, discarding 1 requests for xxxx Apr 08 15:11:51 imap(xxxx): Error: indexer failed to index mailbox INBOX
In both case the signal ( 11 or 10, SEGV or SIGBUS ) it's indifferent ( sometimes is 10 sometimes is 11 ).
Hope I've been able to explain, as my english seems to get worst every time ;)
Pigi
participants (2)
-
Pigi
-
Timo Sirainen