[Dovecot] Solr FTS issues (Was fts-solr plugin issue (Marked invalid))

Dror Matalon dror at zapatec.com
Wed May 6 23:19:49 EEST 2009


Hi,

Sorry for the change of thread. I just signed up to the list, so I couldn't reply to the earlier message.
Let me clarify the issue that Nikolai was describing.

We're running Dovecot 1.1.11 and Solr 1.4.
The issue is quite simple.
1. I run a search.
2. Dovecot sends a list of emails to Solr.
3. Solr starts indexing them.
4. Solr runs into a "bad" email and we get:
        org.apache.solr.common.SolrException log
        SEVERE: java.io.IOException: Mark invalid
        at java.io.BufferedReader.reset(BufferedReader.java:485)
        at org.apache.solr.analysis.HTMLStripReader.restoreState(HTMLStripReader.java:171)
        at org.apache.solr.analysis.HTMLStripReader.read(HTMLStripReader.java:728)
        at org.apache.solr.analysis.HTMLStripReader.read(HTMLStripReader.java:742)
        at java.io.Reader.read(Reader.java:123)
        at org.apache.lucene.analysis.CharTokenizer.next(CharTokenizer.java:109)
        at org.apache.lucene.index.DocInverterPerField.processFields(DocInverterPerField.java:159)
        at org.apache.lucene.index.DocFieldConsumersPerField.processFields(DocFieldConsumersPerField.java:36)
        at org.apache.lucene.index.DocFieldProcessorPerThread.processDocument(DocFieldProcessorPerThread.java:234)
        at org.apache.lucene.index.DocumentsWriter.updateDocument(DocumentsWriter.java:765)
        ...
5. All of the messages above that had already been indexed are rolled back, and we're stuck.

I think that the solution should be simple too :-).
When Solr runs into a bad email, it should just skip it and keep indexing. This seems much more robust: the emails come from a variety of sources, so we can assume that some of them will be badly formatted. Having a few bad emails left unindexed is much better than the current situation, where one bad email stops all searching.
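
To make the proposal concrete, here is a toy sketch in Python contrasting the two behaviours. It is only a model of the idea, not Dovecot or Solr code: `index_one`, `BadMessage`, and the `"<bad>"` marker are hypothetical stand-ins for the real indexer and for whatever input triggers "Mark invalid".

```python
# Toy model only. None of these names are Dovecot or Solr APIs.

class BadMessage(Exception):
    """Raised by the stand-in indexer for a message it cannot parse."""


def index_one(index, uid, body):
    # Hypothetical failure condition standing in for "Mark invalid".
    if "<bad>" in body:
        raise BadMessage(uid)
    index[uid] = body.lower().split()


def index_all_or_nothing(messages):
    """Behaviour described above: one failure discards the whole batch."""
    index = {}
    try:
        for uid, body in messages:
            index_one(index, uid, body)
    except BadMessage:
        return {}  # everything indexed so far is rolled back
    return index


def index_skip_bad(messages):
    """Proposed behaviour: note the bad message and keep indexing."""
    index, skipped = {}, []
    for uid, body in messages:
        try:
            index_one(index, uid, body)
        except BadMessage:
            skipped.append(uid)
    return index, skipped


messages = [(1, "Hello world"), (2, "<bad> markup"), (3, "Another mail")]
```

With this sample input, `index_all_or_nothing(messages)` ends up with nothing indexed, while `index_skip_bad(messages)` indexes messages 1 and 3 and reports message 2 as skipped, so search keeps working for everything else.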

I don't understand the architecture of fts, fts_solr and Solr well enough to know where this should be solved. Ideally, it would be a simple directive to Solr.


Regards,

Dror
 

-----
Dror Matalon
President
Zapatec Inc

866 522-7941 X 704
1700 MLK Way
Berkeley, CA 94709
http://www.6zap.com
http://twitter.com/drormata
http://www.zapatec.com

