[Dovecot] solr substring schema.xml

29 May 2011


I'm trying a modified schema.xml with Solr, and it appears I now have substring searches!
I took the schema.xml file shipped with Dovecot and modified the "text" field type definition to be:
<fieldType name="text" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt"/>
    <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1"
            generateNumberParts="1" catenateWords="1" catenateNumbers="1"
            catenateAll="0"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.SnowballPorterFilterFactory" protected="protwords.txt"/>
    <filter class="solr.NGramFilterFactory" minGramSize="3" maxGramSize="15"/>
    <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt"
            ignoreCase="true" expand="true"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt"/>
    <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1"
            generateNumberParts="1" catenateWords="0" catenateNumbers="0"
            catenateAll="0"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.SnowballPorterFilterFactory" protected="protwords.txt"/>
    <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
  </analyzer>
</fieldType>
Switching from the deprecated "EnglishPorter" filter to the newer "SnowballPorter" filter is probably a minor change; the magic is the "NGramFilterFactory". A minGramSize of 3 and a maxGramSize of 15 seemed like reasonable bounds on the length of the substrings to search on.
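To show why the NGram filter gives substring matching, here is a rough Python sketch (not Solr's or Lucene's code) of what a filter with minGramSize=3 and maxGramSize=15 emits for a single indexed token. It ignores the stemming and word-delimiter steps that run earlier in the index chain above, and the order in which Lucene actually emits grams may differ between versions. Because the query analyzer has no NGram filter, a query term is compared as-is against these grams. So in this simplified model, a term only matches if it is between 3 and 15 characters long and appears inside an indexed word. The helpers `ngrams` and `matches` are hypothetical names used only for illustration.

```python
def ngrams(token: str, min_gram: int = 3, max_gram: int = 15) -> list[str]:
    """Return every substring of token whose length is in [min_gram, max_gram].

    Simplified model of an n-gram token filter: grams are listed by start
    position, then by increasing length. A token shorter than min_gram
    yields no grams here.
    """
    grams = []
    for start in range(len(token)):
        for size in range(min_gram, max_gram + 1):
            end = start + size
            if end > len(token):
                break  # longer grams from this start would run past the token
            grams.append(token[start:end])
    return grams


def matches(indexed_token: str, query_term: str,
            min_gram: int = 3, max_gram: int = 15) -> bool:
    """A query term hits only if it equals one of the indexed grams,
    since the query side of the chain does not split terms into grams."""
    return query_term in ngrams(indexed_token, min_gram, max_gram)


if __name__ == "__main__":
    print(ngrams("dovecot"))
    print(matches("dovecot", "vec"))  # inner substring of length 3
    print(matches("dovecot", "do"))   # shorter than min_gram
```

Lowering minGramSize would allow shorter substring matches, and raising maxGramSize would allow longer ones. Either change emits more grams per indexed token.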
Daniel Miller