On 5.2.2013, at 15.58, Valery V. Sedletski valerius@afterlogic.com wrote:
Hi, Timo and all!
I am trying to index mail in a test mailbox using the fts_solr plugin for full-text search. On most mailboxes it works fine, but on some big messages I get warnings like the following, then an Out of memory error from Solr, and then the indexer-worker process (or doveadm) crashes with an "assertion failed" error and a backtrace:
==========================================================
doveadm(valerius@test.afterlogic.com): Warning: fts-solr(valerius@test.afterlogic.com): Mailbox gmail.com UID=48 header size is huge
I'm not sure why Solr would run out of memory. If it handles huge message bodies then I don't really see why it couldn't handle huge headers.
doveadm(valerius@test.afterlogic.com): Panic: file ../../../../src/plugins/fts-solr/solr-connection.c: line 548 (solr_connection_post_more): assertion failed: (maxfd >= 0)
This is hopefully fixed by v2.2, which uses its own lib-http instead of libcurl (which I'm apparently not using correctly).
So, it seems that Dovecot tries to parse messages in the mailbox and can't correctly determine where the message header ends. So it thinks that the message header is big, and passes a very large amount of data to Solr. When trying to index it, Solr exhausts the available memory (though I have 8 GB of RAM on my machine, and java eats more than 2 GB when indexing). Then the connections to Solr get closed and maxfd is invalid, hence the assertion fails.
Note also the following error
==========================================================
SEVERE: org.apache.solr.common.SolrException: undefined field text
before an out of memory error.
I don't know about that one.
I also tried to tweak the decode2text.sh script to ignore all attachments bigger than 1 MB (just test if the file is bigger than 1 MB, and if so, return "1"). This didn't help. As I understand it, this is because of the big header, so attachments don't matter.
Yes.
I separated out the set of messages which cause this error (by their UIDs), so I can provide them as a test case; the size of them all in an archive is about 40 MB. The error can be reproduced if you put all these messages into an empty mailbox and reindex, via IMAP search or via "doveadm index -u ".
Is it really a message with a huge header? Also MIME headers are counted as headers.
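One rough way to check this outside Dovecot is to add up the header bytes of a message, including the headers of every MIME part. The sketch below uses Python's standard email package; the function name header_bytes is made up for illustration, and the count is only an approximation (it is not how Dovecot measures header size, and it does not reproduce whatever threshold triggers the "header size is huge" warning).

```python
import email
from email import policy


def header_bytes(raw: bytes) -> int:
    """Approximate total size of the top-level and MIME part headers.

    Hypothetical helper for illustration only; this is not Dovecot's
    own accounting.
    """
    msg = email.message_from_bytes(raw, policy=policy.compat32)
    total = 0
    # walk() yields the message itself and then every nested MIME part.
    for part in msg.walk():
        for name, value in part.items():
            # "Name: value" plus the line terminator (approximation).
            total += len(name) + 2 + len(str(value)) + 2
    return total
```

Running it over the raw files of the problem UIDs and comparing the result with each file's total size should show whether the headers (including MIME part headers) really dominate those messages.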
Anyway, http://hg.dovecot.org/dovecot-2.1/rev/0a932ba1f01f hopefully helps?