On Monday 08 April 2013, Timo Sirainen wrote:
On 8.4.2013, at 12.19, Pigi pigi@frumar.it wrote:
Program received signal SIGSEGV, Segmentation fault.
0xfebd65dc in _malloc_unlocked () from /lib/libc.so.1
(gdb) bt full
#0  0xfebd65dc in _malloc_unlocked () from /lib/libc.so.1
No symbol table info available.
#1  0xfebd66e4 in realloc () from /lib/libc.so.1
No symbol table info available.
#2  0xff1d9a70 in pool_system_realloc (pool=0xff21576c, mem=0xff21576c, old_size=0, new_size=8192) at mempool-system.c:115
No locals.
#3  0xff1cda28 in i_realloc (mem=0x0, old_size=0, new_size=8192) at imem.c:14
No locals.
Looks to me like memory corruption, which is just about impossible to debug if you don't have valgrind or a similar tool (and I don't think Solaris has any such free tools).
Yes, you're right. There are no (already packaged) tools for valgrind-style checking or similar.
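(One thing I might still try: Solaris 10 ships libumem, which can do some valgrind-style heap checking. A rough sketch only; the dovecot binary path below is just a guess based on the log path later in this mail:)

# Preload libumem with debugging enabled before starting dovecot, so the
# indexer-worker children inherit it (binary path is a guess):
LD_PRELOAD=libumem.so.1 UMEM_DEBUG=default UMEM_LOGGING=transaction \
    /usr/local/dovecot/sbin/dovecot -F
# A resulting core could then be inspected with mdb's libumem dcmds:
#   mdb /path/to/indexer-worker /path/to/core
#   > ::umem_status
#   > ::umem_verify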
Do you happen to get any other crashes where it doesn't crash in malloc code? You showed previously that it was crashing with SIGBUS somewhere.
From what I can see in dovecot.log, I have a couple of errors:

Apr 08 12:02:18 indexer: Error: Indexer worker disconnected, discarding 1 requests for xxxx
Apr 08 12:02:18 imap(xxxx): Error: indexer failed to index mailbox INBOX
Apr 08 12:02:18 indexer-worker(xxxx): Fatal: master: service(indexer-worker): child 27242 killed with signal 11 (core not dumped)
Apr 08 12:02:19 indexer-worker(xxxx): Error: lucene index /home/xxxx/Maildir/lucene-indexes: IndexWriter() failed (#1): Lock obtain timed out
Apr 08 12:03:46 indexer: Error: Indexer worker disconnected, discarding 1 requests for xxxx
Apr 08 12:03:46 imap(xxxx): Error: indexer failed to index mailbox INBOX
Apr 08 12:03:46 indexer-worker(xxxx): Fatal: master: service(indexer-worker): child 27445 killed with signal 10 (core not dumped)
Apr 08 12:06:57 indexer: Error: Indexer worker disconnected, discarding 1 requests for xxxx
Apr 08 12:06:57 imap(xxxx): Error: indexer failed to index mailbox INBOX
Apr 08 12:06:57 indexer-worker(xxxx): Fatal: master: service(indexer-worker): child 27761 killed with signal 11 (core not dumped)
Apr 08 12:08:33 indexer: Error: Indexer worker disconnected, discarding 1 requests for xxxx
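(The "core not dumped" part is probably because the worker runs with a changed uid; if I understand Solaris coreadm right, cores for those crashes could be forced anyway with something like the following, core directory chosen just as an example:)

# Allow global core dumps, including for processes that changed uid,
# so a crashing indexer-worker leaves a core file to load into gdb:
mkdir -p /var/cores
coreadm -g /var/cores/core.%f.%p -e global -e global-setid
coreadm    # show the current settings to verify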
Here are a couple (or more) of gdb'ed processes, all tied to SEGV (I almost suspect that whenever I gdb one I always catch a SEGV):
============================================================== First gdb
No symbol table info available.
#1  0xfebd6884 in realloc () from /lib/libc.so.1
No symbol table info available.
#2  0xfe83d270 in lucene::util::StringReader::init (this=0x8d328, _value=0x2bb288, _length=3958, copyData=true)
    at /extpool/PACKAGES/DOVECOT/clucene-core-2.3.3.4/src/core/CLucene/util/Reader.cpp:48
        tmp = (TCHAR *) 0x2051e0
        length = 3958
#3  0xfe8837f0 in lucene::index::DocumentsWriter::ThreadState::FieldData::invertField (this=0x92df8, field=0x8d238, analyzer=0x2cb60, maxFieldLength=10000)
    at /extpool/PACKAGES/DOVECOT/clucene-core-2.3.3.4/src/core/CLucene/index/DocumentsWriterThreadState.cpp:887
        stringValue = (const TCHAR *) 0x2bb288
        stringValueLength = 3958
        stream = (class lucene::analysis::TokenStream *) 0x0
        streamValue = (class lucene::analysis::TokenStream *) 0x8d328
#4  0xfe883a58 in lucene::index::DocumentsWriter::ThreadState::FieldData::processField (this=0x92df8, analyzer=0x2cb60)
    at /extpool/PACKAGES/DOVECOT/clucene-core-2.3.3.4/src/core/CLucene/index/DocumentsWriterThreadState.cpp:798
        field = (class lucene::document::Field *) 0x8d238
#5  0xfe883d6c in lucene::index::DocumentsWriter::ThreadState::processDocument (this=0x8c0a8, analyzer=0x2cb60)
    at /extpool/PACKAGES/DOVECOT/clucene-core-2.3.3.4/src/core/CLucene/util/Array.h:90
        i = 6
#6  0xfe87ea00 in lucene::index::DocumentsWriter::updateDocument (this=0x84790, doc=0x8b838, analyzer=0x2cb60, delTerm=0x0)
    at /extpool/PACKAGES/DOVECOT/clucene-core-2.3.3.4/src/core/CLucene/index/DocumentsWriter.cpp:946
        state = (class lucene::index::DocumentsWriter::ThreadState *) 0x8c0a8
#7  0xfe87eb5c in lucene::index::DocumentsWriter::addDocument (this=0x84790, doc=0x8b838, analyzer=0x2cb60)
    at /extpool/PACKAGES/DOVECOT/clucene-core-2.3.3.4/src/core/CLucene/index/DocumentsWriter.cpp:930
No locals.
#8  0xfe8b1ea0 in lucene::index::IndexWriter::addDocument (this=0x8cc18, doc=0x8b838, analyzer=0x2cb60)
    at /extpool/PACKAGES/DOVECOT/clucene-core-2.3.3.4/src/core/CLucene/index/IndexWriter.cpp:682
No locals.
#9  0xfea18e04 in lucene_index_build_flush (index=0x37ba8) at lucene-wrapper.cc:484
        ret = 0
#10 0xfea18f88 in lucene_index_build_more (index=0x37ba8, uid=2995, data=0x185d18 "xxxxxx@xxxxxxxx.xx", size=23, hdr_name=0x8ba60 "Return-Path") at lucene-wrapper.cc:507
        id = {4, 0, 0, 0, -4196824, -15011172, 534768, -4196336, 516256, 0, 11560, -20155708, 9, 10, 36, 2727408, 516256, 148, 0, 441, 148864, 534768, 3}
        namesize = 9216
        datasize = 9216
        dest = Cannot access memory at address 0xbb3
(gdb) quit
============================================================== Second gdb
(gdb) bt full
#0  0xfebd6b24 in realfree () from /lib/libc.so.1
No symbol table info available.
#1  0xfebd7354 in cleanfree () from /lib/libc.so.1
No symbol table info available.
#2  0xfebd64ac in _malloc_unlocked () from /lib/libc.so.1
No symbol table info available.
#3  0xfebd639c in malloc () from /lib/libc.so.1
No symbol table info available.
#4  0xfe737b38 in operator new () from /usr/sfw/lib/libstdc++.so.6
No symbol table info available.
#5  0xfe8613f4 in Document (this=0x8b838)
    at /extpool/PACKAGES/DOVECOT/clucene-core-2.3.3.4/src/core/CLucene/document/Document.cpp:57
No locals.
#6  0xfea18fa8 in lucene_index_build_more (index=0x37ba8, uid=2995, data=0x8d7b0 "xxxxxx@xxxxxxxx.xx", size=23, hdr_name=0x8b9d0 "Return-Path") at lucene-wrapper.cc:511
        id = {4, 0, 0, 0, -4196824, -15011172, 1605360, -4196336, 516256, 0, 11560, -20155708, 9, 10, 36, 2122656, 516256, 148, 0, 441, 148864, 1605360, 3}
        namesize = 9216
        datasize = 9216
        dest = Cannot access memory at address 0xbb3
(gdb) quit
=============================================== third gdb
(gdb) cont
Continuing.
Program received signal SIGSEGV, Segmentation fault.
0xfebd710c in t_splay () from /lib/libc.so.1
(gdb) bt full
#0  0xfebd710c in t_splay () from /lib/libc.so.1
No symbol table info available.
#1  0xfebd6eb8 in t_delete () from /lib/libc.so.1
No symbol table info available.
#2  0xfebd6acc in realfree () from /lib/libc.so.1
No symbol table info available.
#3  0xfebd72c8 in _free_unlocked () from /lib/libc.so.1
No symbol table info available.
#4  0xfebd7204 in free () from /lib/libc.so.1
No symbol table info available.
#5  0xfe862144 in lucene::document::Field::_resetValue (this=0x71e288)
    at /extpool/PACKAGES/DOVECOT/clucene-core-2.3.3.4/src/core/CLucene/document/Field.cpp:300
No locals.
#6  0xfe862248 in ~Field (this=0x71e288)
    at /extpool/PACKAGES/DOVECOT/clucene-core-2.3.3.4/src/core/CLucene/document/Field.cpp:103
No locals.
#7  0xfe90dcc8 in ~CLVector (this=0x23c248)
    at /extpool/PACKAGES/DOVECOT/clucene-core-2.3.3.4/src/core/CLucene/util/Equators.h:227
        itr = {_M_current = 0x71eb00}
#8  0xfe860dfc in ~Document (this=0x832e0)
    at /extpool/PACKAGES/DOVECOT/clucene-core-2.3.3.4/src/core/CLucene/document/Document.cpp:69
No locals.
#9  0xfea18e30 in lucene_index_build_flush (index=0x8adc0) at lucene-wrapper.cc:493
        ret = 0
#10 0xfea18f88 in lucene_index_build_more (index=0x8adc0, uid=4128, data=0x6b2b00 "YYYYYYYYY@yyyyyyyyyy.yyy", size=23, hdr_name=0x1b6508 "Return-Path") at lucene-wrapper.cc:507
        id = {4, 0, 0, 0, -4196824, -15011172, 3114656, -4196336, 584416, 0, 11560, 0, 0, 0, 36, 7697128, 584416, 845, 0, 2382, 148880, 3114656, 3}
        namesize = 9216
        datasize = 9216
        dest = Cannot access memory at address 0x1020
(gdb)
Here are a couple of (probably) interesting things:
root # grep "signal 10" /usr/local/dovecot/var/log/dovecot.log |wc
      33     284    2308
root # grep "signal 11" /usr/local/dovecot/var/log/dovecot.log |wc
      69     549    4457
root # grep "Lock obtain timed out" /usr/local/dovecot/var/log/dovecot.log |wc
      74     420    3976
Thus, 74 out of the 102 crashes seem to be tied to "IndexWriter() failed (#1): Lock obtain timed out" (which I don't really understand, or at least I don't understand the code well enough). The path where lucene tries to obtain the lock is not on NFS; it is an automount on the same machine (so no NFS involved).
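(To check that correlation a bit more directly, a small ksh/bash sketch that prints every "killed with signal" line together with the next few log lines:)

LOG=/usr/local/dovecot/var/log/dovecot.log
# For each crash, print it plus the following three lines, to see whether
# a "Lock obtain timed out" follows every kill:
grep -n "killed with signal" "$LOG" | cut -d: -f1 |
while read lineno; do
    sed -n "${lineno},$((lineno + 3))p" "$LOG"
    echo "----"
done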
The "Lock obtain timed out" error happens every time I read a message with my MUA; I think it is due to the indexing that runs when the message is read over IMAP:
Apr 08 15:18:13 indexer: Error: Indexer worker disconnected, discarding 1 requests for xxxx
Apr 08 15:18:13 imap(xxxx): Error: indexer failed to index mailbox INBOX
Apr 08 15:18:13 indexer-worker(xxxx): Fatal: master: service(indexer-worker): child 18171 killed with signal 11 (core not dumped)
Apr 08 15:18:15 indexer-worker(xxxx): Error: lucene index /home/xxxx/Maildir/lucene-indexes: IndexWriter() failed (#1): Lock obtain timed out
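(Presumably the same indexing could also be triggered by hand, without the MUA, which might make it easier to catch under gdb; the doveadm path is only a guess for this install:)

# Force (re)indexing of INBOX for one user; this should go through the same
# indexer-worker / fts_lucene path that crashes when a message is read:
/usr/local/dovecot/bin/doveadm index -u xxxx INBOX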
When searching, instead, I get the "Lock obtain timed out" followed by further errors:
Apr 08 15:11:37 indexer: Error: Indexer worker disconnected, discarding 1 requests for xxxx
Apr 08 15:11:37 imap(xxxx): Error: indexer failed to index mailbox INBOX
Apr 08 15:11:37 indexer-worker(xxxx): Fatal: master: service(indexer-worker): child 17438 killed with signal 10 (core not dumped)
Apr 08 15:11:38 indexer-worker(xxxx): Error: lucene index /home/xxxx/Maildir/lucene-indexes: IndexWriter() failed (#1): Lock obtain timed out
Apr 08 15:11:51 indexer: Error: Indexer worker disconnected, discarding 1 requests for xxxx
Apr 08 15:11:51 imap(xxxx): Error: indexer failed to index mailbox INBOX
In both cases the signal (11 or 10, SIGSEGV or SIGBUS) seems indifferent: sometimes it is 10 and sometimes it is 11.
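(My guess, and it is only a guess since I don't know the CLucene locking code: the crashed indexer-worker leaves the Lucene lock file behind, and the next worker then times out trying to obtain it. If CLucene uses the usual Lucene-style write.lock inside the index directory, that would be easy to check:)

# With no indexer-worker running for this user...
pgrep -l indexer-worker
# ...a leftover lock file in the index directory would explain the
# repeated "Lock obtain timed out" errors:
ls -l /home/xxxx/Maildir/lucene-indexes/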
I hope I've been able to explain myself, as my English seems to get worse every time ;)
Pigi