Solr - complete setup (update)

Joan Moreau jom at grosjo.net
Fri Jan 4 06:36:14 EET 2019


Hi 

This is a summary of my work with Solr and Dovecot, in my quest to
reproduce the previously excellent work of fts_squat.

@Aki: Given the time I have spent on this, I would love to see you
update the Wiki with these improvements, and add my name somewhere.

@All : Hope it helps 

- INSTALLATION: 

-> Create a clean install using the defaults (at least with the Arch Linux
package), then run "sudo -u solr solr create -c dovecot". The config
files are then in /opt/solr/server/solr/dovecot/conf and the data files in
/opt/solr/server/solr/dovecot/data

-> In /opt/solr/server/solr/dovecot/conf/solrconfig.xml: 

     * around line 313, change <openSearcher>false</openSearcher> to
<openSearcher>true</openSearcher> 

     * around line 147, set <writeLockTimeout>2000</writeLockTimeout>
(or above) 

     * around line 696 : uncomment <str name="df">hdr</str> 

     * around line 1127, before <updateProcessor
class="solr.UUIDUpdateProcessorFactory" name="uuid"/>, add
<schemaFactory class="ClassicIndexSchemaFactory"></schemaFactory> 

     * around line 1161, delete the whole <updateProcessor
class="solr.AddSchemaFieldsUpdateProcessorFactory"
name="add-schema-fields"> element, closing tag included

     * around line 1192, remove the whole <updateRequestProcessorChain
name="add-unknown-fields-to-the-schema" ... /> element
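As an illustration of the openSearcher edit above, here is a minimal Python sketch that patches the value of a simple element in the config text instead of hunting for the line number. The helper name set_xml_value is made up for this example, and it assumes the element is written on one line, is not commented out and has no attributes. It is a sketch under those assumptions, not a general XML editor.

```python
import re


def set_xml_value(config_text: str, tag: str, value: str) -> str:
    """Replace the text of every plain <tag>...</tag> element with `value`.

    Hypothetical helper: it only handles elements without attributes and
    does not look at whether the element sits inside an XML comment.
    """
    pattern = re.compile(r"(<%s>)[^<]*(</%s>)" % (re.escape(tag), re.escape(tag)))
    # A function is used as replacement so digits in `value` are never
    # mistaken for group references.
    return pattern.sub(lambda m: m.group(1) + value + m.group(2), config_text)


# Small stand-in for the relevant part of solrconfig.xml.
sample = """<autoCommit>
  <maxTime>15000</maxTime>
  <openSearcher>false</openSearcher>
</autoCommit>"""

print(set_xml_value(sample, "openSearcher", "true"))
```

To use it on the real file, read /opt/solr/server/solr/dovecot/conf/solrconfig.xml into a string, pass it through the function and write the result back, keeping a backup copy first.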

-> Remove /opt/solr/server/solr/dovecot/conf/managed-schema 

-> Replace "schema.xml" with the one below to reproduce fts_squat behavior
(equivalent to "fts_squat = partial=3 full=25" in dovecot.conf). (Note:
that is a lot of trouble to replace a single-line setup, but anyway...)

-> Move /opt/solr/server/solr (or the data subfolder) to a partition
with plenty of *space*, ideally on ext4 or a faster file system (it looks
like Solr is not considering using a simple MySQL database, which would
make sense to avoid all the fuss and let it move to a non-Java state, but
that is another story)

-> Configure dovecot.conf as shown below

-> The systemd unit should set high limits for open files and processes
(see below)

-> Increase the memory available to the Java VM (I set 12 GB as I have
plenty of room on my server, but adapt it to your specs): in
/opt/solr/bin/solr.in.sh, set SOLR_HEAP="12288m"

-> As Solr complains a lot, you may consider filtering its messages in
syslog-ng or journald, as they greatly pollute your logs

-> (Re)start Solr (first) and then Dovecot with systemctl

-> Launch the reindex (doveadm fts rescan -u <username>)

-> Wait a good while to let the system re-index all your mailboxes
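To get a rough idea of progress while waiting, one can ask Solr how many documents the core holds. The Python sketch below assumes the core answers on the same URL as in dovecot.conf further down (http://127.0.0.1:8983/solr/dovecot/) and that the default /select handler is still enabled. The function names are made up for this example.

```python
import json
import urllib.request

# Assumption: same base URL as the fts_solr setting in dovecot.conf.
SOLR_URL = "http://127.0.0.1:8983/solr/dovecot/"


def count_url(base_url: str = SOLR_URL) -> str:
    """Build a match-all query that returns no rows, only the total count."""
    return base_url.rstrip("/") + "/select?q=*:*&rows=0&wt=json"


def parse_num_found(body: str) -> int:
    """Extract the document count from a Solr JSON select response."""
    return json.loads(body)["response"]["numFound"]


def indexed_documents(base_url: str = SOLR_URL) -> int:
    """Ask the running Solr core how many documents it currently holds."""
    with urllib.request.urlopen(count_url(base_url), timeout=10) as resp:
        return parse_num_found(resp.read().decode("utf-8"))
```

Calling indexed_documents() every few minutes and watching the number grow is enough to see that the re-index is still moving. Keep in mind that the count only reflects documents that have been committed.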

- BUGS SO FAR 

-> Line 620 of the fts_solr Dovecot plugin: the size of the header is
improperly calculated ("huge header" warning for a simple email, which
kills the indexing of that email, so basically MOST emails, as
the calculation is wrong)

-> The UID returned by Solr has to be treated as a STRING (and that is
maybe the source of the "out of bound" errors in the fts_solr
Dovecot plugin, as "long" is not enough)

-> Java errors: a lot of them make no sense to me, as I am no expert in
Java. But with increased memory, Solr seems not to crash, even if it
complains quite a lot in the logs

------- schema.xml in /opt/solr/server/solr/dovecot/conf

<?xml version="1.0" encoding="UTF-8"?>
<schema name="dovecot" version="2.0">
<uniqueKey>id</uniqueKey>
<fieldType name="dovecottext" class="solr.TextField"
autoGeneratePhraseQueries="true" positionIncrementGap="100">
<analyzer type="index">
<tokenizer class="solr.ClassicTokenizerFactory"/>
<filter class="solr.WordDelimiterGraphFilterFactory" catenateNumbers="1"
generateNumberParts="1" splitOnCaseChange="1" generateWordParts="1"
splitOnNumerics="1" catenateAll="1" catenateWords="1"
preserveOriginal="1"/>
<filter class="solr.FlattenGraphFilterFactory"/>
<filter class="solr.LowerCaseFilterFactory"/>
<filter class="solr.TrimFilterFactory"/>
<filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
</analyzer>
<analyzer type="query">
<tokenizer class="solr.KeywordTokenizerFactory"/>
<filter class="solr.LowerCaseFilterFactory"/>
<filter class="solr.TrimFilterFactory"/>
<filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
</analyzer>
</fieldType>
<fieldType name="dovecotfield" class="solr.TextField"
autoGeneratePhraseQueries="true">
<analyzer type="index">
<tokenizer class="solr.ClassicTokenizerFactory"/>
<filter class="solr.NGramFilterFactory" minGramSize="3"
maxGramSize="25"/>
<filter class="solr.TrimFilterFactory"/>
<filter class="solr.LowerCaseFilterFactory"/>
<filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
</analyzer>
<analyzer type="query">
<tokenizer class="solr.KeywordTokenizerFactory"/>
<filter class="solr.LowerCaseFilterFactory"/>
<filter class="solr.TrimFilterFactory"/>
<filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
</analyzer>
</fieldType> 

<fieldType name="string" class="solr.StrField"/>
<field name="_version_" type="string" indexed="true" stored="true"/>
<field name="bcc" type="string" indexed="false" stored="false"/>
<field name="body" type="dovecottext" indexed="true" stored="false"/>
<field name="box" type="string" indexed="true" required="true"
stored="true"/>
<field name="cc" type="dovecotfield" indexed="true" stored="false"/>
<field name="from" type="dovecotfield" indexed="true" stored="false"/>
<field name="hdr" type="string" indexed="false" stored="false"/>
<field name="id" type="string" indexed="true" required="true"
stored="true"/>
<field name="subject" type="dovecottext" indexed="true" stored="false"/>
<field name="to" type="dovecotfield" indexed="true" stored="false"/>
<field name="uid" type="string" indexed="true" required="true"
stored="true"/>
<field name="user" type="string" indexed="true" required="true"
stored="true"/>
</schema> 
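To make the "partial=3 full=25" idea more concrete, here is a small Python sketch of what the "dovecotfield" index analyzer does to a single token: it emits every substring from 3 to 25 characters, lowercased. This is a simplified model for illustration only. It ignores the tokenizer, the Trim and RemoveDuplicates filters and the order in which Solr emits the grams, and the function names are made up.

```python
from typing import List


def ngrams(token: str, min_size: int = 3, max_size: int = 25) -> List[str]:
    """Return all substrings of `token` with min_size..max_size characters."""
    token = token.lower()
    grams = []
    for size in range(min_size, min(max_size, len(token)) + 1):
        for start in range(len(token) - size + 1):
            grams.append(token[start:start + size])
    return grams


def matches(query: str, token: str) -> bool:
    """Model of the query side: the whole lowercased query must equal one gram."""
    return query.strip().lower() in ngrams(token)


print(ngrams("Dovecot"))
print(matches("VEC", "dovecot"))  # 3 characters, substring of the token
print(matches("do", "dovecot"))   # shorter than minGramSize, so no gram exists
```

This also shows why the index grows so much: a 7-letter word alone yields 15 grams, and longer words yield many more.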

-- dovecot.conf

mail_plugins = fts fts_solr 

plugin {
plugin = fts fts_solr managesieve sieve 

fts = solr
fts_autoindex = yes
fts_enforced = yes
fts_solr = url=http://127.0.0.1:8983/solr/dovecot/ 

# (replace 127.0.0.1 with your Solr server's address if you want to use an
# external server)
# (...)

} 

-- /etc/systemd/system/multi-user.target.wants/solr.service

[Unit]
Description=Solr full text search engine
After=network.target 

[Service]
Type=simple
User=solr
Group=solr
PrivateTmp=yes
WorkingDirectory=/opt/solr
LimitNOFILE=65000
LimitNPROC=65000
ExecStart=/opt/solr/bin/solr start -f 

[Install]
WantedBy=multi-user.target
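
To check that the open-file and process limits from the unit above really apply to the running Solr process, one can read /proc/<pid>/limits on Linux. Below is a minimal Python sketch that parses that file. The function name parse_limits is made up, and the sample text only imitates the usual layout of the file.

```python
import re
from typing import Dict, Tuple


def parse_limits(text: str) -> Dict[str, Tuple[str, str]]:
    """Map each limit name to its (soft, hard) values as strings."""
    limits = {}
    for line in text.splitlines()[1:]:  # first line is the column header
        # Columns are separated by runs of two or more spaces.
        parts = re.split(r"\s{2,}", line.strip())
        if len(parts) >= 3:
            limits[parts[0]] = (parts[1], parts[2])
    return limits


# Stand-in for the content of /proc/<pid>/limits.
sample = "\n".join([
    "Limit                     Soft Limit           Hard Limit           Units     ",
    "Max cpu time              unlimited            unlimited            seconds   ",
    "Max open files            65000                65000                files     ",
    "Max processes             65000                65000                processes ",
])

print(parse_limits(sample)["Max open files"])
```

On the server, the PID can be taken from "systemctl show -p MainPID solr", and the file content passed to parse_limits() in place of the sample.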