In the code, it says

#define SOLR_HEADER_MAX_SIZE (1024*1024)

and at line 620

if (!ctx->truncate_header &&
    str_len(ctx->cur_value) >= SOLR_HEADER_MAX_SIZE) {
	/* a large header */


so "huge" means a header of 1 MB or more. That is of course completely wrong for this email.
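To make the threshold concrete, here is a minimal sketch of the same comparison outside Dovecot. Only SOLR_HEADER_MAX_SIZE is taken from the code quoted above; header_value_is_huge is a hypothetical helper, and it assumes the check simply compares the byte length of the accumulated value against the limit.

```c
#include <stdbool.h>
#include <stddef.h>

/* Same value as the define quoted above: 1024 * 1024 = 1048576 bytes. */
#define SOLR_HEADER_MAX_SIZE (1024*1024)

/*
 * Hypothetical helper (not a Dovecot function): returns true when a value
 * of value_len bytes would satisfy the ">= SOLR_HEADER_MAX_SIZE" condition.
 */
static bool header_value_is_huge(size_t value_len)
{
	return value_len >= (size_t)SOLR_HEADER_MAX_SIZE;
}
```

Under that assumption, a header of a few kilobytes (for example header_value_is_huge(8192)) is far below the limit, so the warning would not be expected for an ordinary message.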


This may not be the root cause of the fts_solr errors, but it may help.

On 2019-01-02 18:00, Joan Moreau via dovecot wrote:

Another bug appearing today:

Jan 02 09:59:08 indexer-worker(jom@grosjo.net)<6777><MOgFATKLLFwPHAAA0thIag:oLJjJjKLLFx5GgAA0thIag>: Warning: fts-solr(jom@grosjo.net): Mailbox XXXXXX UID=121635 header size is huge, truncating

The header of that email is nowhere near "huge".
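One way to check such a claim is to measure the header block of the raw message directly. The sketch below is an illustration only: message_header_size is a hypothetical helper, not part of Dovecot, and it assumes the header ends at the first empty line and that the message has at least one header line. Whether fts-solr measures the same span is also an assumption.

```c
#include <stddef.h>

/*
 * Hypothetical helper: returns the number of bytes before the first empty
 * line (the header/body separator), or len if no empty line is found.
 * Handles both bare LF and CRLF line endings.
 */
static size_t message_header_size(const char *msg, size_t len)
{
	size_t i;

	for (i = 0; i < len; i++) {
		if (msg[i] != '\n')
			continue;
		/* "\n\n": empty line with bare LF endings */
		if (i + 1 < len && msg[i + 1] == '\n')
			return i + 1;
		/* "\n\r\n": empty line with CRLF endings */
		if (i + 2 < len && msg[i + 1] == '\r' && msg[i + 2] == '\n')
			return i + 1;
	}
	return len;
}
```

The result can then be compared against 1024*1024 bytes to see whether the message could plausibly trigger the "header size is huge" warning.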

 


On 2019-01-02 15:22, Joan Moreau via dovecot wrote:

Refinement of the schema.xml (below)


This, however, does not solve the "no results" and "Out of range" errors in Dovecot and Solr.


<?xml version="1.0" encoding="UTF-8"?>
<schema name="dovecot" version="2.0">
  <uniqueKey>id</uniqueKey>
  <fieldType name="booleans" class="solr.BoolField" sortMissingLast="true" multiValued="true"/>
  <fieldType name="gjlong" class="solr.LongPointField" positionIncrementGap="0"/>
  <fieldType name="gjtext" class="solr.TextField" autoGeneratePhraseQueries="true" positionIncrementGap="100">
    <analyzer type="index">
      <tokenizer class="solr.StandardTokenizerFactory"/>
      <filter class="solr.WordDelimiterGraphFilterFactory" catenateNumbers="1" generateNumberParts="1" splitOnCaseChange="1" generateWordParts="1" splitOnNumerics="1" catenateAll="1" catenateWords="1" preserveOriginal="1"/>
      <filter class="solr.FlattenGraphFilterFactory"/>
      <filter class="solr.LowerCaseFilterFactory"/>
      <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
    </analyzer>
    <analyzer type="query">
      <tokenizer class="solr.KeywordTokenizerFactory"/>
    </analyzer>
  </fieldType>
  <fieldType name="gjfield" class="solr.TextField" autoGeneratePhraseQueries="true">
    <analyzer type="index">
      <tokenizer class="solr.NGramTokenizerFactory" maxGramSize="25" minGramSize="3" />
      <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
    </analyzer>
    <analyzer type="query">
      <tokenizer class="solr.KeywordTokenizerFactory"/>
    </analyzer>
  </fieldType>

  <fieldType name="string" class="solr.StrField"/>
  <field name="_version_" type="string" indexed="true" stored="true"/>
  <field name="bcc" type="string" indexed="false" stored="false"/>
  <field name="body" type="gjtext" indexed="true" stored="false"/>
  <field name="box" type="string" indexed="true" required="true" stored="true"/>
  <field name="cc" type="gjfield" indexed="true" stored="false"/>
  <field name="from" type="gjfield" indexed="true" stored="false"/>
  <field name="hdr" type="string" indexed="false" stored="false"/>
  <field name="id" type="string" indexed="true" required="true" stored="true"/>
  <field name="subject" type="gjtext" indexed="true" stored="false"/>
  <field name="to" type="gjfield" indexed="true" stored="false"/>
  <field name="uid" type="string" indexed="true" required="true" stored="true"/>
  <field name="user" type="string" indexed="true" required="true" stored="true"/>
</schema>