On 2019-01-30 07:33, Stephan Bosch wrote:
(forgot to CC mailing list)
On 26/01/2019 at 20:07, Joan Moreau via dovecot wrote:
*- Bugs so far*
-> Line 620 of the fts_solr dovecot plugin: the size of the header is improperly calculated (a "huge header" warning for a simple email, which kills the index for that email, so basically MOST emails, as the calculation is wrong) *You can check that regularly in the dovecot log file. My guess is that a mix of Unicode is not properly handled here.*
Does this happen with specific messages? Do you have a sample message for me? I don't see how Unicode could cause this.
MY ONLY GUESS IS THAT IT RELIES ON SOME 'STRLEN', WHICH IS OF COURSE WRONG FOR UNICODE EMAILS. THIS IS JUST A GUESS.
BUT DO A GREP FOR "HUGE" IN THE DOVECOT LOG OF A BUSY SERVER TO FIND EXAMPLES.
(SORRY, I SWITCHED TO XAPIAN, AS SOLR WAS CAUSING TOO MUCH TROUBLE ON MY SERVER, SO I HAVE NO MORE CONCRETE EXAMPLES)
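For anyone who wants to collect such examples, the grep suggested above can be sketched in Python as follows. This is only a sketch: the exact wording of the warning is not quoted in this thread, so it matches the bare word "huge" case-insensitively, and `find_huge_lines` is a hypothetical helper name, not anything shipped with Dovecot.

```python
import re
from typing import Iterable, List

# Case-insensitive match, like `grep -i huge`. Only the word itself is
# matched because the full warning text is not quoted in this thread.
HUGE = re.compile(r"huge", re.IGNORECASE)


def find_huge_lines(lines: Iterable[str]) -> List[str]:
    """Return the log lines that mention "huge", without trailing newlines."""
    return [line.rstrip("\n") for line in lines if HUGE.search(line)]


# Usage sketch (the log path is an assumption, adjust it to your setup):
#
#     with open("/var/log/dovecot.log", encoding="utf-8", errors="replace") as log:
#         for hit in find_huge_lines(log):
#             print(hit)
```

Opening the log with `errors="replace"` keeps the scan going even if a log line contains bytes that are not valid UTF-8.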
-> The UID returned by Solr is to be considered as a STRING (and that is maybe the source of the "out of bound" errors in fts_solr dovecot, as "long" is not enough) *This is just highly visible in Solr schema.xml. Switching it to "long" in schema.xml returns plenty of errors.*
I cannot reproduce this so far (see modified schema below). In a simple test I just get the desired results and no errors logged.
I got this with large mailboxes (where the UID seems not to be acceptable for Solr). The fault is not on the Dovecot side but in Solr: the UID(s) returned for a search are garbage instead of proper values -> Storing it as a string solves this
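To tell garbage UIDs from proper ones when looking at search results, a check along these lines may help. It is a minimal Python sketch, assuming the value arrives either as a string or as a number; `parse_uid` is a hypothetical helper and not part of Dovecot or Solr. The only hard fact it relies on is that IMAP UIDs are non-zero unsigned 32-bit numbers (RFC 3501), which also means a valid UID fits in a 64-bit long.

```python
UID_MAX = 2**32 - 1  # IMAP UIDs are non-zero unsigned 32-bit numbers (RFC 3501)


def parse_uid(raw):
    """Return the UID as an int, or None if the value is not a valid IMAP UID."""
    text = str(raw).strip()
    # isdigit() alone also accepts non-ASCII digit characters, so require ASCII.
    if not text.isascii() or not text.isdigit():
        return None
    uid = int(text)
    return uid if 1 <= uid <= UID_MAX else None
```

Anything for which `parse_uid` returns `None` (empty values, negative numbers, mixed text, values above 4294967295) would count as garbage in the sense described above.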
-> Java errors: a lot of nonsense to me, as I am no expert in Java. But with increased memory it seems not to crash, even if it complains quite a lot in the logs
Can you elaborate on the errors you have seen so far? When do these happen? How can I reproduce them? *Honestly, I have no clue what the problems are. I just increased the memory of the JVM and the systems stopped crashing. Log files are huge anyway.*
What errors do you see? I see only INFO entries in my /var/solr/logs/solr.log. Looks like Solr is pretty verbose by default (lots of INFO output), but there must be a way to reduce that.
I DELETED SOLR. NO MORE LOGS. MAYBE SOMEONE ELSE CAN TELL.
<?xml version="1.0" encoding="UTF-8"?>
<schema name="dovecot" version="2.0">
  <uniqueKey>id</uniqueKey>
  <fieldType name="long" class="solr.LongPointField" positionIncrementGap="0"/>
  <fieldType name="dovecottext" class="solr.TextField" autoGeneratePhraseQueries="true" positionIncrementGap="100">
    <analyzer type="index">
      <tokenizer class="solr.ClassicTokenizerFactory"/>
      <filter class="solr.WordDelimiterGraphFilterFactory" catenateNumbers="1" generateNumberParts="1" splitOnCaseChange="1" generateWordParts="1" splitOnNumerics="1" catenateAll="1" catenateWords="1" preserveOriginal="1"/>
      <filter class="solr.FlattenGraphFilterFactory"/>
      <filter class="solr.LowerCaseFilterFactory"/>
      <filter class="solr.TrimFilterFactory"/>
      <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
    </analyzer>
    <analyzer type="query">
      <tokenizer class="solr.KeywordTokenizerFactory"/>
      <filter class="solr.LowerCaseFilterFactory"/>
      <filter class="solr.TrimFilterFactory"/>
      <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
    </analyzer>
  </fieldType>
  <fieldType name="dovecotfield" class="solr.TextField" autoGeneratePhraseQueries="true">
    <analyzer type="index">
      <tokenizer class="solr.ClassicTokenizerFactory"/>
      <filter class="solr.NGramFilterFactory" minGramSize="3" maxGramSize="25"/>
      <filter class="solr.TrimFilterFactory"/>
      <filter class="solr.LowerCaseFilterFactory"/>
      <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
    </analyzer>
    <analyzer type="query">
      <tokenizer class="solr.KeywordTokenizerFactory"/>
      <filter class="solr.LowerCaseFilterFactory"/>
      <filter class="solr.TrimFilterFactory"/>
      <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
    </analyzer>
  </fieldType>
  <fieldType name="string" class="solr.StrField"/>
  <field name="_version_" type="string" indexed="true" stored="true"/>
  <field name="bcc" type="string" indexed="false" stored="false"/>
  <field name="body" type="dovecottext" indexed="true" stored="false"/>
  <field name="box" type="string" indexed="true" required="true" stored="true"/>
  <field name="cc" type="dovecotfield" indexed="true" stored="false"/>
  <field name="from" type="dovecotfield" indexed="true" stored="false"/>
  <field name="hdr" type="string" indexed="false" stored="false"/>
  <field name="id" type="string" indexed="true" required="true" stored="true"/>
  <field name="subject" type="dovecottext" indexed="true" stored="false"/>
  <field name="to" type="dovecotfield" indexed="true" stored="false"/>
  <field name="uid" type="long" indexed="true" required="true" stored="true"/>
  <field name="user" type="string" indexed="true" required="true" stored="true"/>
</schema>
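To double-check which Solr class a field such as uid ends up with in a given schema.xml, something like the following Python sketch could be used. The embedded excerpt is trimmed from the schema in this mail, and `field_classes` is a hypothetical helper name; it assumes a classic schema.xml where each `<field>` refers to a `<fieldType>` by name.

```python
import xml.etree.ElementTree as ET

# Trimmed excerpt of the schema from this mail: only what the check needs.
SCHEMA_EXCERPT = """<schema name="dovecot" version="2.0">
  <uniqueKey>id</uniqueKey>
  <fieldType name="long" class="solr.LongPointField" positionIncrementGap="0"/>
  <fieldType name="string" class="solr.StrField"/>
  <field name="id" type="string" indexed="true" required="true" stored="true"/>
  <field name="uid" type="long" indexed="true" required="true" stored="true"/>
</schema>"""


def field_classes(schema_xml):
    """Map each <field> name to the Solr class of its declared fieldType."""
    root = ET.fromstring(schema_xml)
    # fieldType name -> class attribute, e.g. "long" -> "solr.LongPointField"
    classes = {ft.get("name"): ft.get("class") for ft in root.iter("fieldType")}
    # A field whose type is not declared in the schema maps to None.
    return {f.get("name"): classes.get(f.get("type")) for f in root.iter("field")}


for name, cls in field_classes(SCHEMA_EXCERPT).items():
    print(name, "->", cls)
```

For a schema file on disk, `ET.parse(path).getroot()` can replace the `ET.fromstring` call; the path to the core's schema.xml depends on the installation.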