[Dovecot] Solr FTS issues (Was fts-solr plugin issue (Marked invalid))
Dror Matalon
dror at zapatec.com
Wed May 6 23:19:49 EEST 2009
Hi,
Sorry for the change of thread. I just signed up to the list, so I couldn't reply to the earlier message.
Let me clarify the issue that Nikolai was describing.
We're running dovecot 1.1.11 and solr 1.4.
The issue is quite simple.
1. I run a search.
2. Dovecot sends a list of emails to solr.
3. Solr starts indexing them.
4. Solr runs into a "bad" email and we get: SEVERE: java.io.IOException: Mark invalid
org.apache.solr.common.SolrException log
SEVERE: java.io.IOException: Mark invalid
at java.io.BufferedReader.reset(BufferedReader.java:485)
at org.apache.solr.analysis.HTMLStripReader.restoreState(HTMLStripReader.java:171)
at org.apache.solr.analysis.HTMLStripReader.read(HTMLStripReader.java:728)
at org.apache.solr.analysis.HTMLStripReader.read(HTMLStripReader.java:742)
at java.io.Reader.read(Reader.java:123)
at org.apache.lucene.analysis.CharTokenizer.next(CharTokenizer.java:109)
at org.apache.lucene.index.DocInverterPerField.processFields(DocInverterPerField.java:159)
at org.apache.lucene.index.DocFieldConsumersPerField.processFields(DocFieldConsumersPerField.java:36)
at org.apache.lucene.index.DocFieldProcessorPerThread.processDocument(DocFieldProcessorPerThread.java:234)
at org.apache.lucene.index.DocumentsWriter.updateDocument(DocumentsWriter.java:765)
...
5. All the above email messages that were indexed are rolled back, and we're stuck.
I think that the solution should be simple too :-).
When solr runs into a bad email, it should just ignore it and keep indexing. This seems much more robust, since the emails are coming from a variety of sources and we can assume that some of them are going to be badly formatted. Having a few bad emails go unindexed seems much better than the current situation, where one bad email stops all searching.
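To illustrate the behavior I have in mind, here is a minimal sketch in Python. It is not Dovecot or Solr code: `BadMessageError`, `index_batch_tolerant` and `fake_index_one` are hypothetical names made up for this example, and the "BAD" marker just stands in for whatever makes a real message fail. The point is only that each message is handled on its own, so one failure is recorded and skipped instead of throwing away the whole batch.

```python
from typing import Callable, Iterable, List, Tuple


class BadMessageError(Exception):
    """Hypothetical error raised when a single message cannot be indexed."""


def index_batch_tolerant(
    messages: Iterable[Tuple[int, str]],
    index_one: Callable[[int, str], None],
) -> Tuple[List[int], List[int]]:
    """Index each message on its own; record failures instead of aborting."""
    indexed: List[int] = []
    skipped: List[int] = []
    for uid, body in messages:
        try:
            index_one(uid, body)
        except BadMessageError:
            # Leave this message unindexed and carry on with the rest.
            skipped.append(uid)
        else:
            indexed.append(uid)
    return indexed, skipped


def fake_index_one(uid: int, body: str) -> None:
    """Stand-in for the real indexer: rejects any body containing 'BAD'."""
    if "BAD" in body:
        raise BadMessageError(f"Mark invalid (uid={uid})")


batch = [(1, "hello"), (2, "<html>BAD"), (3, "world")]
indexed, skipped = index_batch_tolerant(batch, fake_index_one)
print(indexed, skipped)
```

With this approach messages 1 and 3 stay searchable and only message 2 is left out. The skipped list could be logged so the bad messages can be looked at later.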
I don't understand the architecture of fts, fts_solr and solr well enough to know where this should be solved. Ideally, it would be a simple directive to solr.
Regards,
Dror
-----
Dror Matalon
President
Zapatec Inc
866 522-7941 X 704
1700 MLK Way
Berkeley, CA 94709
http://www.6zap.com
http://twitter.com/drormata
http://www.zapatec.com