Solr - complete setup (update)

Stephan Bosch stephan at rename-it.nl
Sat Jan 26 14:44:16 EET 2019


Hi Joan,

Op 14/01/2019 om 07:44 schreef Joan Moreau via dovecot:
>
> Hi Stephan,
>
> What's up with that ?
>
> Thank you so much
>
> On 2019-01-05 02:04, Stephan Bosch wrote:
>
>> Hi,
>>
>> Op 04/01/2019 om 05:36 schreef Joan Moreau via dovecot:
>>>
>>> Hi
>>>
>>> This is the summary of my work with SOLR-Dovecot, in my *quest to 
>>> reproduce the previously excellent work of fts_squat*
>>>
>>>
>>> @Aki : Based on the time I have spent on this, I would love to see 
>>> you updating the Wiki with those improvements, and adding my name 
>>> somewhere
>>>
>>> @All : Hope it helps
>>>


>>> *- Installation:*
>>>
>>> -> Create a clean install using the defaults (at least in the
>>> Arch Linux package), and run "sudo -u solr solr create -c dovecot".
>>> The config files are then in /opt/solr/server/solr/dovecot/conf and
>>> the data files in /opt/solr/server/solr/dovecot/data

On my system (Debian) these directories are wildly different (e.g. data 
is under /var), but other than that, this information is OK.

I used this as a side reference for the Debian installation:
https://tecadmin.net/install-apache-solr-on-debian/

I accessed http://solr-host.tld:8983/solr/ to check that everything was OK.
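
The same check can be scripted. The following is only a minimal sketch: it assumes the core is named "dovecot" as in the quoted instructions and that Solr's CoreAdmin STATUS request is reachable under the URL used above. The function names are made up for illustration.

```python
import json
import urllib.parse
import urllib.request


def core_status_url(base_url: str, core: str) -> str:
    """Build the CoreAdmin STATUS URL for a single core."""
    query = urllib.parse.urlencode({"action": "STATUS", "core": core, "wt": "json"})
    return f"{base_url.rstrip('/')}/admin/cores?{query}"


def core_exists(status: dict, core: str) -> bool:
    """Return True if a STATUS response describes the given core.

    Assumption: an unknown core shows up as an empty object under "status".
    """
    return bool(status.get("status", {}).get(core))


def fetch_core_status(base_url: str, core: str, timeout: float = 5.0) -> dict:
    """Fetch and decode the STATUS response (performs a real HTTP request)."""
    with urllib.request.urlopen(core_status_url(base_url, core), timeout=timeout) as resp:
        return json.load(resp)
```

For example, core_exists(fetch_core_status("http://solr-host.tld:8983/solr", "dovecot"), "dovecot") should be true once the core has been created.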

>>>
>>> -> In /opt/solr/server/solr/dovecot/conf/solrconfig.xml:
>>>
>>>      * around line 313, change <openSearcher>false</openSearcher> to 
>>> <openSearcher>true</openSearcher>
>>>
>>>      * around line 147, set 
>>> <writeLockTimeout>2000</writeLockTimeout> (or above)
>>>
>>>      * around line 696 : uncomment <str name="df">hdr</str>
>>>
>>>      * around line 1127, before <updateProcessor 
>>> class="solr.UUIDUpdateProcessorFactory" name="uuid"/>, add 
>>> <schemaFactory class="ClassicIndexSchemaFactory"></schemaFactory>
>>>
>>>      * around line 1161, delete the whole <updateProcessor 
>>> class="solr.AddSchemaFieldsUpdateProcessorFactory" 
>>> name="add-schema-fields">
>>>
>>>     * around line 1192, remove the whole 
>>> <updateRequestProcessorChain name="add-unknown-fields-to-the-schema" 
>>> ... />

Applied these changes. We should probably provide an example config file
on the Wiki that incorporates all this, or maybe a diff.
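
Producing such a diff is easy to script. This is a rough sketch using Python's difflib; the two strings are tiny stand-ins for the real file and show only the openSearcher change, and the helper name config_diff is hypothetical.

```python
import difflib


def config_diff(original: str, modified: str, name: str = "solrconfig.xml") -> str:
    """Return a unified diff between two versions of a config file."""
    lines = difflib.unified_diff(
        original.splitlines(keepends=True),
        modified.splitlines(keepends=True),
        fromfile=f"{name}.orig",
        tofile=name,
    )
    return "".join(lines)


# Stand-in content, not the real solrconfig.xml.
before = "<autoCommit>\n  <openSearcher>false</openSearcher>\n</autoCommit>\n"
after = "<autoCommit>\n  <openSearcher>true</openSearcher>\n</autoCommit>\n"
print(config_diff(before, after), end="")
```

Running it against the stock and the edited solrconfig.xml would give something that can be pasted on the Wiki and applied with patch.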

We also need to evaluate what the merit of all of this is. I did 
something similar in my previous effort, but it was all based on getting 
an error from Solr and then removing that section of the config file 
with the assumption it wasn't needed. So far, I have little clue what 
these things are and why these things are enabled by default. As I said 
in an earlier mail, there is an option to leave some of this cruft out 
at backend initialization, but I haven't tried that yet.

>>>
>>> -> Remove /opt/solr/server/solr/dovecot/conf/managed-schema
>>>
>>> -> Replace "schema.xml" with the one below to reproduce fts_squat
>>> behavior (equivalent to "fts_squat = partial=3 full=25" in
>>> dovecot.conf) (note: that is a lot of trouble to replace a
>>> single-line setup, anyway...)

Did that too.
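
To make the "partial=3 full=25" mapping concrete: the quoted schema uses an n-gram filter with minGramSize="3" and maxGramSize="25" on the address fields. The sketch below only illustrates what such a filter emits for one token; it is not Solr code, and the order of the grams may differ from what Solr produces.

```python
def ngrams(token: str, min_size: int = 3, max_size: int = 25) -> list[str]:
    """All substrings of token with a length between min_size and max_size."""
    grams = []
    for size in range(min_size, min(max_size, len(token)) + 1):
        for start in range(len(token) - size + 1):
            grams.append(token[start:start + size])
    return grams


print(ngrams("dovecot"))
```

Since the query analyzer in that schema keeps the search term as a single token, a term such as "vec" can match one of the indexed grams directly. By the same logic, terms shorter than 3 or longer than 25 characters would presumably not match on those fields.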

>>>
>>> -> Move /opt/solr/server/solr (or the subfolder data) to a partition
>>> with *space*, ideally ext4 or a faster file system (it looks like Solr
>>> is not considering using a simple MySQL database, which would make
>>> sense to avoid all the fuss and let it move away from Java,
>>> but that is another story)

Skipped that.
>>>
>>> -> Config of dovecot.conf is as below

I also enabled debug for fts_solr.

>>>
>>> -> The systemd unit should specify high ulimits for files and
>>> processes (see below)

Debian does something weird here. It doesn't use an explicit systemd 
unit. It is generated from the SysV init file. I ended up setting the 
ulimits in /etc/security/limits.conf for user solr.
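
Whether the limits actually took effect can be checked from a process started the same way as Solr. A small sketch, assuming the 65000 value from the quoted systemd unit as the target; the helper name is hypothetical and the script reports only the limits of the process it runs in.

```python
import resource

WANTED = 65000  # value taken from the quoted systemd unit


def soft_limit_ok(soft: int, wanted: int = WANTED) -> bool:
    """True if a soft limit is unlimited or at least `wanted`."""
    return soft == resource.RLIM_INFINITY or soft >= wanted


# Report the open-files and process limits of the current process.
for name in ("RLIMIT_NOFILE", "RLIMIT_NPROC"):
    soft, hard = resource.getrlimit(getattr(resource, name))
    print(f"{name}: soft={soft} hard={hard} ok={soft_limit_ok(soft)}")
```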

>>>
>>> -> Increase the memory available to the Java VM (I set 12 GB as I
>>> have plenty of memory on my server, but you may adapt it to your
>>> specs): in /opt/solr/bin/solr.in.sh, set SOLR_HEAP="12288m"

Skipped that.

>>>
>>> -> As Solr complains a lot, you may want to add a filter for it in
>>> your syslog-ng or journald, as it greatly pollutes your audit files

What does it complain about and when does it happen? I haven't seen much 
logging from Solr so far.

>>>
>>> -> (Re)start Solr (first) and Dovecot with systemctl
>>>
>>> -> Launch the reindex (doveadm fts rescan -u <username>)
>>>
>>> -> Wait a good while to let the system re-index all your mailboxes

Weirdly, rescan returns immediately here. When I perform `doveadm index 
INBOX` for my test user, I do see a lot of fts and HTTP activity.
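
Given that observation, a reindex seems to need both steps. The sketch below just chains the two doveadm commands mentioned in this thread for one user; the function names are made up, and it assumes doveadm is on the PATH and that the caller is allowed to run it for that user.

```python
import subprocess


def rescan_cmd(user: str) -> list[str]:
    """Command line for the rescan step quoted above."""
    return ["doveadm", "fts", "rescan", "-u", user]


def index_cmd(user: str, mailbox: str = "INBOX") -> list[str]:
    """Command line that triggers the actual indexing of one mailbox."""
    return ["doveadm", "index", "-u", user, mailbox]


def reindex(user: str, mailbox: str = "INBOX", run=subprocess.run) -> None:
    """Run rescan, then index; raise if either command fails."""
    for cmd in (rescan_cmd(user), index_cmd(user, mailbox)):
        run(cmd, check=True)
```

The run parameter exists only so the command sequence can be inspected without executing anything.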

>>> *- Bugs so far*
>>>
>>> -> Line 620 of the fts_solr Dovecot plugin: the size of the header is
>>> improperly calculated ("huge header" warning for a simple email,
>>> which kills the indexing of that email, so basically of MOST
>>> emails, as the calculation is wrong)
>>>
>>> -> The UID returned by Solr should be treated as a STRING (and
>>> that may be the source of the "out of bound" errors in
>>> fts_solr Dovecot, as "long" is not enough)
>>>
>>> -> Java errors: a lot of it makes no sense to me, as I am no Java
>>> expert. But with increased memory, it seems not to crash, even if
>>> it complains quite a lot in the logs

Can you elaborate on the errors you have seen so far? When do these 
happen? How can I reproduce them?

Regards,

Stephan.



>>>
>>> *-------SCHEMA.XML in /opt/solr/server/solr/dovecot/conf*
>>>
>>> <?xml version="1.0" encoding="UTF-8"?>
>>> <schema name="dovecot" version="2.0">
>>> <uniqueKey>id</uniqueKey>
>>> <fieldType name="dovecottext" class="solr.TextField" 
>>> autoGeneratePhraseQueries="true" positionIncrementGap="100">
>>> <analyzer type="index">
>>> <tokenizer class="solr.ClassicTokenizerFactory"/>
>>> <filter class="solr.WordDelimiterGraphFilterFactory" 
>>> catenateNumbers="1" generateNumberParts="1" splitOnCaseChange="1" 
>>> generateWordParts="1" splitOnNumerics="1" catenateAll="1" 
>>> catenateWords="1" preserveOriginal="1"/>
>>> <filter class="solr.FlattenGraphFilterFactory"/>
>>> <filter class="solr.LowerCaseFilterFactory"/>
>>> <filter class="solr.TrimFilterFactory"/>
>>> <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
>>> </analyzer>
>>> <analyzer type="query">
>>> <tokenizer class="solr.KeywordTokenizerFactory"/>
>>> <filter class="solr.LowerCaseFilterFactory"/>
>>> <filter class="solr.TrimFilterFactory"/>
>>> <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
>>> </analyzer>
>>> </fieldType>
>>> <fieldType name="dovecotfield" class="solr.TextField" 
>>> autoGeneratePhraseQueries="true">
>>> <analyzer type="index">
>>> <tokenizer class="solr.ClassicTokenizerFactory"/>
>>> <filter class="solr.NGramFilterFactory" minGramSize="3" maxGramSize="25"/>
>>> <filter class="solr.TrimFilterFactory"/>
>>> <filter class="solr.LowerCaseFilterFactory"/>
>>> <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
>>> </analyzer>
>>> <analyzer type="query">
>>> <tokenizer class="solr.KeywordTokenizerFactory"/>
>>> <filter class="solr.LowerCaseFilterFactory"/>
>>> <filter class="solr.TrimFilterFactory"/>
>>> <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
>>> </analyzer>
>>> </fieldType>
>>>
>>> <fieldType name="string" class="solr.StrField"/>
>>> <field name="_version_" type="string" indexed="true" stored="true"/>
>>> <field name="bcc" type="string" indexed="false" stored="false"/>
>>> <field name="body" type="dovecottext" indexed="true" stored="false"/>
>>> <field name="box" type="string" indexed="true" required="true" 
>>> stored="true"/>
>>> <field name="cc" type="dovecotfield" indexed="true" stored="false"/>
>>> <field name="from" type="dovecotfield" indexed="true" stored="false"/>
>>> <field name="hdr" type="string" indexed="false" stored="false"/>
>>> <field name="id" type="string" indexed="true" required="true" 
>>> stored="true"/>
>>> <field name="subject" type="dovecottext" indexed="true" stored="false"/>
>>> <field name="to" type="dovecotfield" indexed="true" stored="false"/>
>>> <field name="uid" type="string" indexed="true" required="true" 
>>> stored="true"/>
>>> <field name="user" type="string" indexed="true" required="true" 
>>> stored="true"/>
>>> </schema>
>>>
>>>
>>> *-- DOVECOT.CONF*
>>>
>>> mail_plugins = fts fts_solr
>>>
>>> plugin {
>>> plugin = fts fts_solr managesieve sieve
>>>
>>> fts = solr
>>> fts_autoindex = yes
>>> fts_enforced = yes
>>> fts_solr = url=http://127.0.0.1:8983/solr/dovecot/
>>>
>>> (replace 127.0.0.1 with your Solr server if you want to use an
>>> external server)
>>> (...)
>>>
>>> }
>>>
>>>
>>>
>>> *-- /etc/systemd/system/multi-user.target.wants/solr.service*
>>>
>>> [Unit]
>>> Description=Solr full text search engine
>>> After=network.target
>>>
>>> [Service]
>>> Type=simple
>>> User=solr
>>> Group=solr
>>> PrivateTmp=yes
>>> WorkingDirectory=/opt/solr
>>> LimitNOFILE=65000
>>> LimitNPROC=65000
>>> ExecStart=/opt/solr/bin/solr start -f
>>>
>>> [Install]
>>> WantedBy=multi-user.target
>>>
>>>


