[Dovecot] Problem with fts lucene, on solaris 10

Timo Sirainen tss at iki.fi
Thu Apr 11 19:36:31 EEST 2013


On 11.4.2013, at 18.29, Pigi <pigi at frumar.it> wrote:

> Sorry for top posting, but I have no additional info on this.
> BTW I need to refresh the question as I have tried squat, and I'm biting the dust with the already known problems 
> ( "Corrupted squat uidlist file" or "Panic: file squat-trie.c:" ).
> 
> Is there some pointer on where/how to find assist on fts-lucene to keep on analisys for this problem on sparc ?

Sorry, I don't have any good ideas on how to debug this further. I'd use valgrind on Linux to debug it, but Solaris doesn't have valgrind. Then again if there was a generic bug in CLucene it would be visible by running the same tests in Linux+valgrind, but I'm not aware of CLucene having such generic bugs, so it's probably somehow Solaris or Sparc specific.

> 
> T.I.A.
> 
> Pigi
> 
> On Monday 08 April 2013, Pigi wrote:
>> On Monday 08 April 2013, Timo Sirainen wrote:
>>> On 8.4.2013, at 12.19, Pigi <pigi at frumar.it> wrote:
>>> 
>>>> Program received signal SIGSEGV, Segmentation fault.
>>>> 0xfebd65dc in _malloc_unlocked () from /lib/libc.so.1
>>>> (gdb) bt full
>>>> #0  0xfebd65dc in _malloc_unlocked () from /lib/libc.so.1
>>>> No symbol table info available.
>>>> #1  0xfebd66e4 in realloc () from /lib/libc.so.1
>>>> No symbol table info available.
>>>> #2  0xff1d9a70 in pool_system_realloc (pool=0xff21576c, mem=0xff21576c, old_size=0, new_size=8192) at mempool-system.c:115
>>>> No locals.
>>>> #3  0xff1cda28 in i_realloc (mem=0x0, old_size=0, new_size=8192) at imem.c:14
>>>> No locals.
>>> 
>>> Looks to me like memory corruption, which is just about impossible to debug if you don't have valgrind or a similar tool (and I don't think Solaris has any such free tools).
>> 
>> Yes, you're right. There aren't ( already packaged ) tool for valgrinding or similar.
>> 
>>> Do you happen to get any other crashes where it doesn't crash in malloc code? You showed previously that it was crashing with SIGBUS somewhere.
>> 
>> For what I can see, form dovecot.log I have a couple of error:
>> Apr 08 12:02:18 indexer: Error: Indexer worker disconnected, discarding 1 requests for xxxx
>> Apr 08 12:02:18 imap(xxxx): Error: indexer failed to index mailbox INBOX
>> Apr 08 12:02:18 indexer-worker(xxxx): Fatal: master: service(indexer-worker): child 27242 killed with signal 11 (core not dumped)
>> Apr 08 12:02:19 indexer-worker(xxxx): Error: lucene index /home/xxxx/Maildir/lucene-indexes: IndexWriter() failed (#1): Lock obtain timed out
>> Apr 08 12:03:46 indexer: Error: Indexer worker disconnected, discarding 1 requests for xxxx
>> Apr 08 12:03:46 imap(xxxx): Error: indexer failed to index mailbox INBOX
>> Apr 08 12:03:46 indexer-worker(xxxx): Fatal: master: service(indexer-worker): child 27445 killed with signal 10 (core not dumped)
>> Apr 08 12:06:57 indexer: Error: Indexer worker disconnected, discarding 1 requests for xxxx
>> Apr 08 12:06:57 imap(xxxx): Error: indexer failed to index mailbox INBOX
>> Apr 08 12:06:57 indexer-worker(xxxx): Fatal: master: service(indexer-worker): child 27761 killed with signal 11 (core not dumped)
>> Apr 08 12:08:33 indexer: Error: Indexer worker disconnected, discarding 1 requests for xxxx
>> 
>> Here a couple or more of gdb"ed" process, but tied to SEGV ( I almost suspect that when gdbing I always catch a SEGV ) : 
>> 
>> ==============================================================
>> First gdb
>> ==============================================================
>> No symbol table info available.
>> #1  0xfebd6884 in realloc () from /lib/libc.so.1
>> No symbol table info available.
>> #2  0xfe83d270 in lucene::util::StringReader::init (this=0x8d328, _value=0x2bb288, _length=3958, copyData=true)
>>    at /extpool/PACKAGES/DOVECOT/clucene-core-2.3.3.4/src/core/CLucene/util/Reader.cpp:48
>>        tmp = (TCHAR *) 0x2051e0
>>        length = 3958
>> #3  0xfe8837f0 in lucene::index::DocumentsWriter::ThreadState::FieldData::invertField (this=0x92df8, field=0x8d238, analyzer=0x2cb60, maxFieldLength=10000)
>>    at /extpool/PACKAGES/DOVECOT/clucene-core-2.3.3.4/src/core/CLucene/index/DocumentsWriterThreadState.cpp:887
>>        stringValue = (const TCHAR *) 0x2bb288
>>        stringValueLength = 3958
>>        stream = (class lucene::analysis::TokenStream *) 0x0
>>        streamValue = (class lucene::analysis::TokenStream *) 0x8d328
>> #4  0xfe883a58 in lucene::index::DocumentsWriter::ThreadState::FieldData::processField (this=0x92df8, analyzer=0x2cb60)
>>    at /extpool/PACKAGES/DOVECOT/clucene-core-2.3.3.4/src/core/CLucene/index/DocumentsWriterThreadState.cpp:798
>>        field = (class lucene::document::Field *) 0x8d238
>> #5  0xfe883d6c in lucene::index::DocumentsWriter::ThreadState::processDocument (this=0x8c0a8, analyzer=0x2cb60)
>>    at /extpool/PACKAGES/DOVECOT/clucene-core-2.3.3.4/src/core/CLucene/util/Array.h:90
>>        i = 6
>> #6  0xfe87ea00 in lucene::index::DocumentsWriter::updateDocument (this=0x84790, doc=0x8b838, analyzer=0x2cb60, delTerm=0x0)
>>    at /extpool/PACKAGES/DOVECOT/clucene-core-2.3.3.4/src/core/CLucene/index/DocumentsWriter.cpp:946
>>        state = (class lucene::index::DocumentsWriter::ThreadState *) 0x8c0a8
>> #7  0xfe87eb5c in lucene::index::DocumentsWriter::addDocument (this=0x84790, doc=0x8b838, analyzer=0x2cb60)
>>    at /extpool/PACKAGES/DOVECOT/clucene-core-2.3.3.4/src/core/CLucene/index/DocumentsWriter.cpp:930
>> No locals.
>> #8  0xfe8b1ea0 in lucene::index::IndexWriter::addDocument (this=0x8cc18, doc=0x8b838, analyzer=0x2cb60)
>>    at /extpool/PACKAGES/DOVECOT/clucene-core-2.3.3.4/src/core/CLucene/index/IndexWriter.cpp:682
>> No locals.
>> #9  0xfea18e04 in lucene_index_build_flush (index=0x37ba8) at lucene-wrapper.cc:484
>>        ret = 0
>> #10 0xfea18f88 in lucene_index_build_more (index=0x37ba8, uid=2995, data=0x185d18 "<xxxxxx at xxxxxxxx.xx>", size=23, hdr_name=0x8ba60 "Return-Path")
>>    at lucene-wrapper.cc:507
>>        id = {4, 0, 0, 0, -4196824, -15011172, 534768, -4196336, 516256, 0, 11560, -20155708, 9, 10, 36, 2727408, 516256, 148, 0, 441, 148864, 534768, 3}
>>        namesize = 9216
>>        datasize = 9216
>>        dest = Cannot access memory at address 0xbb3
>> (gdb) quit
>> 
>> ==============================================================
>> Second gdb
>> ==============================================================
>> 
>> gdb) bt full
>> #0  0xfebd6b24 in realfree () from /lib/libc.so.1
>> No symbol table info available.
>> #1  0xfebd7354 in cleanfree () from /lib/libc.so.1
>> No symbol table info available.
>> #2  0xfebd64ac in _malloc_unlocked () from /lib/libc.so.1
>> No symbol table info available.
>> #3  0xfebd639c in malloc () from /lib/libc.so.1
>> No symbol table info available.
>> #4  0xfe737b38 in operator new () from /usr/sfw/lib/libstdc++.so.6
>> No symbol table info available.
>> #5  0xfe8613f4 in Document (this=0x8b838) at /extpool/PACKAGES/DOVECOT/clucene-core-2.3.3.4/src/core/CLucene/document/Document.cpp:57
>> No locals.
>> #6  0xfea18fa8 in lucene_index_build_more (index=0x37ba8, uid=2995, data=0x8d7b0 "<xxxxxx at xxxxxxxx.xx>", size=23, hdr_name=0x8b9d0 "Return-Path")
>>    at lucene-wrapper.cc:511
>>        id = {4, 0, 0, 0, -4196824, -15011172, 1605360, -4196336, 516256, 0, 11560, -20155708, 9, 10, 36, 2122656, 516256, 148, 0, 441, 148864, 1605360, 3}
>>        namesize = 9216
>>        datasize = 9216
>>        dest = Cannot access memory at address 0xbb3
>> (gdb) quit
>> 
>> ===============================================
>> third gdb 
>> =====================================================
>> (gdb) cont
>> Continuing.
>> 
>> Program received signal SIGSEGV, Segmentation fault.
>> 0xfebd710c in t_splay () from /lib/libc.so.1
>> (gdb) bt full
>> #0  0xfebd710c in t_splay () from /lib/libc.so.1
>> No symbol table info available.
>> #1  0xfebd6eb8 in t_delete () from /lib/libc.so.1
>> No symbol table info available.
>> #2  0xfebd6acc in realfree () from /lib/libc.so.1
>> No symbol table info available.
>> #3  0xfebd72c8 in _free_unlocked () from /lib/libc.so.1
>> No symbol table info available.
>> #4  0xfebd7204 in free () from /lib/libc.so.1
>> No symbol table info available.
>> #5  0xfe862144 in lucene::document::Field::_resetValue (this=0x71e288) at /extpool/PACKAGES/DOVECOT/clucene-core-2.3.3.4/src/core/CLucene/document/Field.cpp:300
>> No locals.
>> #6  0xfe862248 in ~Field (this=0x71e288) at /extpool/PACKAGES/DOVECOT/clucene-core-2.3.3.4/src/core/CLucene/document/Field.cpp:103
>> No locals.
>> #7  0xfe90dcc8 in ~CLVector (this=0x23c248) at /extpool/PACKAGES/DOVECOT/clucene-core-2.3.3.4/src/core/CLucene/util/Equators.h:227
>>        itr = {_M_current = 0x71eb00}
>> #8  0xfe860dfc in ~Document (this=0x832e0) at /extpool/PACKAGES/DOVECOT/clucene-core-2.3.3.4/src/core/CLucene/document/Document.cpp:69
>> No locals.
>> #9  0xfea18e30 in lucene_index_build_flush (index=0x8adc0) at lucene-wrapper.cc:493
>>        ret = 0
>> #10 0xfea18f88 in lucene_index_build_more (index=0x8adc0, uid=4128, data=0x6b2b00 "<YYYYYYYYY at yyyyyyyyyy.yyy>", size=23, hdr_name=0x1b6508 "Return-Path")
>>    at lucene-wrapper.cc:507
>>        id = {4, 0, 0, 0, -4196824, -15011172, 3114656, -4196336, 584416, 0, 11560, 0, 0, 0, 36, 7697128, 584416, 845, 0, 2382, 148880, 3114656, 3}
>>        namesize = 9216
>>        datasize = 9216
>>        dest = Cannot access memory at address 0x1020
>> (gdb)                                                                                   
>> 
>> 
>> 
>> Here a couple of ( probably ) interesting stuff :
>> 
>> root # grep "signal 10" /usr/local/dovecot/var/log/dovecot.log  |wc
>>      33     284    2308
>> root # grep "signal 11" /usr/local/dovecot/var/log/dovecot.log  |wc
>>      69     549    4457
>> root # grep "Lock obtain timed out" /usr/local/dovecot/var/log/dovecot.log  |wc
>>      74     420    3976
>> 
>> Thus, 74 times of 102 seems to be tied to "IndexWriter() failed (#1): Lock obtain timed out" ( which I don't really understand,
>> or at least I have not enough comprehension of code to understand).
>> The path where lucene try to get a lock is not NFS, but is an automount on the same machine ( no NFS then ).
>> 
>> It seems that the "Lock obtain timed out" happens every time I read a message with the MUA.
>> 
>> This error ("Lock obtain") happens every time I read a message from my MUA, I think it's due to the indexing process running on READ
>> imap command:
>> 
>> Apr 08 15:18:13 indexer: Error: Indexer worker disconnected, discarding 1 requests for xxxx
>> Apr 08 15:18:13 imap(xxxx): Error: indexer failed to index mailbox INBOX
>> Apr 08 15:18:13 indexer-worker(xxxx): Fatal: master: service(indexer-worker): child 18171 killed with signal 11 (core not dumped)
>> Apr 08 15:18:15 indexer-worker(xxxx): Error: lucene index /home/xxxx/Maildir/lucene-indexes: IndexWriter() failed (#1): Lock obtain timed out
>> 
>> 
>> When searching, instead, get the "Lock timeout" followed by another error:
>> 
>> Apr 08 15:11:37 indexer: Error: Indexer worker disconnected, discarding 1 requests for xxxx
>> Apr 08 15:11:37 imap(xxxx): Error: indexer failed to index mailbox INBOX
>> Apr 08 15:11:37 indexer-worker(xxxx): Fatal: master: service(indexer-worker): child 17438 killed with signal 10 (core not dumped)
>> Apr 08 15:11:38 indexer-worker(xxxx): Error: lucene index /home/xxxx/Maildir/lucene-indexes: IndexWriter() failed (#1): Lock obtain timed out
>> Apr 08 15:11:51 indexer: Error: Indexer worker disconnected, discarding 1 requests for xxxx
>> Apr 08 15:11:51 imap(xxxx): Error: indexer failed to index mailbox INBOX
>> 
>> 
>> 
>> In both case the signal ( 11 or 10, SEGV or SIGBUS ) it's indifferent ( sometimes is 10 sometimes is 11 ).
>> 
>> Hope I've been able to explain, as my english seems to get worst every time ;)
>> 
>> Pigi
>> 
>> 
> 



More information about the dovecot mailing list