Solr - complete setup (update)

Joan Moreau jom at grosjo.net
Fri Jan 18 18:28:03 EET 2019


Yes, the "-property update.autoCreateFields -value false" option looks
interesting.

However, we overwrite the generated schema right afterwards anyway.

On 2019-01-14 23:25, Stephan Bosch wrote:

> On 14/01/2019 at 07:44, Joan Moreau via dovecot wrote:
> 
>> Hi Stephan,
>> 
>> What's up with that ?
>> 
>> Thank you so much
> 
> Working on it, somewhat anyway.
> 
> BTW, did you see this ? :
> 
> """
> $ sudo -u solr /opt/solr/bin/solr create -c dovecot
> WARNING: Using _default configset with data driven schema functionality. NOT RECOMMENDED for production use.
> To turn off: bin/solr config -c dovecot -p 8983 -action set-user-property -property update.autoCreateFields -value false
> INFO  - 2019-01-14 23:19:56.831; org.apache.solr.util.configuration.SSLCredentialProviderFactory; Processing SSL Credential Provider chain: env;sysprop
> 
> Created new core 'dovecot'
> """
> 
> I'll be trying your steps first, but the mentioned command might at least get rid of some of the cruft in the default config file.
> 
> Regards,
> 
> Stephan.
> 
> On 2019-01-05 02:04, Stephan Bosch wrote:
> 
> Hi,
> 
> On 04/01/2019 at 05:36, Joan Moreau via dovecot wrote:
> Hi
> 
> This is the summary of my work with Solr and Dovecot, in my *quest to reproduce the previously excellent work of fts_squat*
> 
> @Aki : Based on the time I have spent on this, I would love to see you updating the Wiki with those improvements, and adding my name somewhere
> 
> @All : Hope it helps
> 
> I'll be going through the description below soon. I've recently independently installed fts-solr from scratch. Although this wasn't a flawless effort, I managed to get some basic indexing going. From this mail thread I understand that there are quite a few more problems than I've seen myself so far. Then again, I didn't perform extensive tests with actual searches.
> 
> Maybe we can turn all this into a test suite that we can run internally here at Dovecot. At the very least, the described Dovecot bugs need to be addressed and the wiki needs to be updated.
> 
> I'll get back to you.
> 
> Regards,
> 
> Stephan.
> 
> *- Installation:*
> 
> -> Create a clean install using the defaults (at least in the Archlinux package), and run "sudo -u solr solr create -c dovecot". The config files are then in /opt/solr/server/solr/dovecot/conf and the data files in /opt/solr/server/solr/dovecot/data
> 
> -> In /opt/solr/server/solr/dovecot/conf/solrconfig.xml:
> 
> * around line 313, change <openSearcher>false</openSearcher> to <openSearcher>true</openSearcher>
> 
> * around line 147, set <writeLockTimeout>2000</writeLockTimeout> (or above)
> 
> * around line 696 : uncomment <str name="df">hdr</str>
> 
> * around line 1127, before <updateProcessor class="solr.UUIDUpdateProcessorFactory" name="uuid"/>, add <schemaFactory class="ClassicIndexSchemaFactory"></schemaFactory>
> 
> * around line 1161, delete the whole <updateProcessor class="solr.AddSchemaFieldsUpdateProcessorFactory" name="add-schema-fields">
> 
> * around line 1192, remove the whole <updateRequestProcessorChain name="add-unknown-fields-to-the-schema" ... />
> 
> -> Remove /opt/solr/server/solr/dovecot/conf/managed-schema
> 
> -> Replace "schema.xml" with the one below to reproduce the fts_squat behavior (equivalent to "fts_squat = partial=3 full=25" in dovecot.conf) (note: quite a lot of trouble to replace a single-line setting, anyway...)
> 
> -> Move /opt/solr/server/solr (or the subfolder data) to a partition with plenty of *space*, ideally on ext4 or a faster file system (it looks like Solr is not considering using a simple mysql database, which would make sense to avoid all the fuss and let it move to a non-Java state, but that is another story)
> 
> -> Config of dovecot.conf is as below
> 
> -> The systemd unit should set high limits for open files and processes (see below)
> 
> -> Increase the memory available to the Java VM (I put 12 GB as I have plenty of memory on my server, but you may adapt it to your specs): in /opt/solr/bin/solr.in.sh, set SOLR_HEAP="12288m"
> 
> -> As Solr logs a lot of complaints, you may want to filter it in syslog-ng or journald, as it heavily pollutes your log files
> 
> -> (Re)start Solr (first) and then Dovecot with systemctl
> 
> -> Launch a reindex (doveadm fts rescan -u <username>)
> 
> -> Wait a good while to let the system re-index all your mailboxes
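
Since the line numbers in solrconfig.xml are only approximate, the edits listed above can be double-checked with a small script instead of by eye. This is only a sketch: the function name check_solrconfig is mine, and it checks nothing beyond the six changes described in this mail.

```python
import xml.etree.ElementTree as ET

def check_solrconfig(root):
    """Return a list of problems; an empty list means every check passed."""
    problems = []

    # Every <openSearcher> present must be "true", and there must be at least one.
    searchers = [(e.text or "").strip().lower() for e in root.iter("openSearcher")]
    if not searchers or any(v != "true" for v in searchers):
        problems.append("openSearcher is not true")

    # <writeLockTimeout> must be a plain integer of 2000 or more.
    timeouts = [(e.text or "").strip() for e in root.iter("writeLockTimeout")]
    if not timeouts or not all(t.isdigit() and int(t) >= 2000 for t in timeouts):
        problems.append("writeLockTimeout is missing or below 2000")

    # <str name="df">hdr</str> must be active (XML comments are ignored by the parser).
    if not any(e.get("name") == "df" and (e.text or "").strip() == "hdr"
               for e in root.iter("str")):
        problems.append('<str name="df">hdr</str> not found')

    # The classic schema factory must be declared so that schema.xml is read.
    if not any(e.get("class") == "ClassicIndexSchemaFactory"
               for e in root.iter("schemaFactory")):
        problems.append("ClassicIndexSchemaFactory not declared")

    # The schemaless parts must be gone.
    if any(e.get("class") == "solr.AddSchemaFieldsUpdateProcessorFactory"
           for e in root.iter("updateProcessor")):
        problems.append("AddSchemaFieldsUpdateProcessorFactory still present")
    if any(e.get("name") == "add-unknown-fields-to-the-schema"
           for e in root.iter("updateRequestProcessorChain")):
        problems.append("add-unknown-fields-to-the-schema chain still present")

    return problems
```

To run it against the real file, something like check_solrconfig(ET.parse("/opt/solr/server/solr/dovecot/conf/solrconfig.xml").getroot()) should do; an empty list means all six edits are in place.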
> 
> *- Bugs so far*
> 
> -> Line 620 of the fts_solr Dovecot plugin: the size of the header is improperly calculated (a "huge header" warning for a simple email, which kills the indexing of that email, so basically MOST emails, as the calculation is wrong)
> 
> -> The UID returned by Solr should be treated as a STRING (and that may be the source of the "out of bound" errors in the fts_solr plugin, as "long" is not enough)
> 
> -> Java errors: a lot of them make no sense to me, as I am no Java expert. But with increased memory it no longer seems to crash, even if it still complains quite a lot in the logs
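
On the UID point: IMAP UIDs are non-zero unsigned 32-bit numbers, while a signed 32-bit integer stops at 2147483647. Whether that is really what triggers the "out of bound" errors is a guess on my side. The sketch below only illustrates the idea of keeping the uid as a string and validating it before converting; it is not the plugin's code, and parse_uid is a made-up name.

```python
UINT32_MAX = 2**32 - 1

def parse_uid(raw):
    """Convert a uid string taken from a Solr response to an int, or return None."""
    text = raw.strip()
    # Accept plain ASCII decimal digits only: no sign, no decimals, not empty.
    if not text.isascii() or not text.isdigit():
        return None
    value = int(text)
    # IMAP UIDs are non-zero and fit in an unsigned 32-bit number.
    if value == 0 or value > UINT32_MAX:
        return None
    return value

print(parse_uid("3000000000"))  # above the signed 32-bit maximum, still a valid UID
print(parse_uid("4294967296"))  # one past the unsigned 32-bit maximum, rejected
```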
> 
> *-------SCHEMA.XML in /opt/solr/server/solr/dovecot/conf*
> 
> <?xml version="1.0" encoding="UTF-8"?>
> <schema name="dovecot" version="2.0">
> <uniqueKey>id</uniqueKey>
> <fieldType name="dovecottext" class="solr.TextField" autoGeneratePhraseQueries="true" positionIncrementGap="100">
> <analyzer type="index">
> <tokenizer class="solr.ClassicTokenizerFactory"/>
> <filter class="solr.WordDelimiterGraphFilterFactory" catenateNumbers="1" generateNumberParts="1" splitOnCaseChange="1" generateWordParts="1" splitOnNumerics="1" catenateAll="1" catenateWords="1" preserveOriginal="1"/>
> <filter class="solr.FlattenGraphFilterFactory"/>
> <filter class="solr.LowerCaseFilterFactory"/>
> <filter class="solr.TrimFilterFactory"/>
> <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
> </analyzer>
> <analyzer type="query">
> <tokenizer class="solr.KeywordTokenizerFactory"/>
> <filter class="solr.LowerCaseFilterFactory"/>
> <filter class="solr.TrimFilterFactory"/>
> <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
> </analyzer>
> </fieldType>
> <fieldType name="dovecotfield" class="solr.TextField" autoGeneratePhraseQueries="true">
> <analyzer type="index">
> <tokenizer class="solr.ClassicTokenizerFactory"/>
> <filter class="solr.NGramFilterFactory" minGramSize="3" maxGramSize="25"/>
> <filter class="solr.TrimFilterFactory"/>
> <filter class="solr.LowerCaseFilterFactory"/>
> <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
> </analyzer>
> <analyzer type="query">
> <tokenizer class="solr.KeywordTokenizerFactory"/>
> <filter class="solr.LowerCaseFilterFactory"/>
> <filter class="solr.TrimFilterFactory"/>
> <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
> </analyzer>
> </fieldType>
> 
> <fieldType name="string" class="solr.StrField"/>
> <field name="_version_" type="string" indexed="true" stored="true"/>
> <field name="bcc" type="string" indexed="false" stored="false"/>
> <field name="body" type="dovecottext" indexed="true" stored="false"/>
> <field name="box" type="string" indexed="true" required="true" stored="true"/>
> <field name="cc" type="dovecotfield" indexed="true" stored="false"/>
> <field name="from" type="dovecotfield" indexed="true" stored="false"/>
> <field name="hdr" type="string" indexed="false" stored="false"/>
> <field name="id" type="string" indexed="true" required="true" stored="true"/>
> <field name="subject" type="dovecottext" indexed="true" stored="false"/>
> <field name="to" type="dovecotfield" indexed="true" stored="false"/>
> <field name="uid" type="string" indexed="true" required="true" stored="true"/>
> <field name="user" type="string" indexed="true" required="true" stored="true"/>
> </schema>
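
To make the "dovecotfield" type less abstract: NGramFilterFactory with minGramSize="3" and maxGramSize="25" indexes every substring of 3 to 25 characters of each token, and the query side keeps the search string whole and lowercases it. That is what gives the fts_squat-like partial matching. The Python below only imitates that idea for a single token; it is not Solr code, and both function names are mine.

```python
def ngrams(token, min_size=3, max_size=25):
    """Return all substrings of token with length min_size..max_size, lowercased."""
    token = token.strip().lower()
    grams = set()
    for size in range(min_size, min(max_size, len(token)) + 1):
        for start in range(len(token) - size + 1):
            grams.add(token[start:start + size])
    return grams

def matches(query, token):
    """Imitate the query analyzer: keep the query whole, trim and lowercase it."""
    return query.strip().lower() in ngrams(token)

print(sorted(ngrams("solr")))       # ['olr', 'sol', 'solr']
print(matches("VECO", "dovecot"))   # True
print(matches("do", "dovecot"))     # False, shorter than minGramSize
```

The price of this is index size: in this sketch a 7-letter word like "dovecot" already produces 15 grams, which is one reason the data directory needs plenty of space.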
> 
> *-- DOVECOT.CONF*
> 
> mail_plugins = fts fts_solr
> 
> plugin {
> plugin = fts fts_solr managesieve sieve
> 
> fts = solr
> fts_autoindex = yes
> fts_enforced = yes
> fts_solr = url=http://127.0.0.1:8983/solr/dovecot/
> 
> # replace 127.0.0.1 with your Solr server if you want to use an external server
> (...)
> 
> }
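
Once some indexing has run, the core can be queried directly to see whether the mails are really in there, without going through Dovecot. The sketch below only builds the URL for Solr's standard /select handler, using the fts_solr url from the config and the user, box and uid fields from the schema. The function name, the example user and the search term are placeholders.

```python
from urllib.parse import urlencode

def solr_select_url(base_url, user, field, term, rows=10):
    """Build a /select URL that searches one field within one user's mail."""
    def quote(value):
        # Escape backslashes and double quotes so the value stays one quoted phrase.
        return '"' + value.replace("\\", "\\\\").replace('"', '\\"') + '"'

    params = {
        "q": f"{field}:{quote(term)}",
        "fq": f"user:{quote(user)}",   # restrict the result to one mailbox owner
        "fl": "uid,box",               # only stored fields can be returned
        "rows": rows,
        "wt": "json",
    }
    return base_url.rstrip("/") + "/select?" + urlencode(params)

url = solr_select_url("http://127.0.0.1:8983/solr/dovecot/",
                      "joan@example.org", "subject", "invoice")
print(url)
```

Opening the printed URL from the Solr host (with a browser or urllib.request.urlopen) should return the matching uid and box values, if any.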
> 
> *-- /etc/systemd/system/multi-user.target.wants/solr.service*
> 
> [Unit]
> Description=Solr full text search engine
> After=network.target
> 
> [Service]
> Type=simple
> User=solr
> Group=solr
> PrivateTmp=yes
> WorkingDirectory=/opt/solr
> LimitNOFILE=65000
> LimitNPROC=65000
> ExecStart=/opt/solr/bin/solr start -f
> 
> [Install]
> WantedBy=multi-user.target
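
To confirm that the two limits from the unit file are really applied to the running Solr process, /proc/<pid>/limits can be read after the restart. A small sketch that parses that file; the helper name is mine, and the sample text only mimics the layout the kernel prints.

```python
import re

def parse_limits(text):
    """Map limit name -> (soft, hard) from the contents of /proc/<pid>/limits."""
    limits = {}
    for line in text.splitlines()[1:]:            # skip the header row
        # Columns are separated by runs of two or more spaces.
        cols = re.split(r"\s{2,}", line.strip())
        if len(cols) >= 3:
            limits[cols[0]] = (cols[1], cols[2])  # kept as strings, may be "unlimited"
    return limits

sample = (
    "Limit                     Soft Limit           Hard Limit           Units     \n"
    "Max processes             65000                65000                processes \n"
    "Max open files            65000                65000                files     \n"
)
print(parse_limits(sample)["Max open files"])   # ('65000', '65000')
```

On the server itself, parse_limits(open(f"/proc/{pid}/limits").read()) with the PID of the Solr Java process gives the live values.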