Hi,
Sorry for starting a new thread; I just signed up to the list, so I couldn't reply to the earlier message. Let me clarify the issue that Nikolai was describing.
We're running Dovecot 1.1.11 and Solr 1.4. The issue is quite simple:
- I run a search.
- Dovecot sends a list of emails to Solr.
- Solr starts indexing them.
- Solr runs into a "bad" email and we get:
    SEVERE: java.io.IOException: Mark invalid
    org.apache.solr.common.SolrException log
    SEVERE: java.io.IOException: Mark invalid
        at java.io.BufferedReader.reset(BufferedReader.java:485)
        at org.apache.solr.analysis.HTMLStripReader.restoreState(HTMLStripReader.java:171)
        at org.apache.solr.analysis.HTMLStripReader.read(HTMLStripReader.java:728)
        at org.apache.solr.analysis.HTMLStripReader.read(HTMLStripReader.java:742)
        at java.io.Reader.read(Reader.java:123)
        at org.apache.lucene.analysis.CharTokenizer.next(CharTokenizer.java:109)
        at org.apache.lucene.index.DocInverterPerField.processFields(DocInverterPerField.java:159)
        at org.apache.lucene.index.DocFieldConsumersPerField.processFields(DocFieldConsumersPerField.java:36)
        at org.apache.lucene.index.DocFieldProcessorPerThread.processDocument(DocFieldProcessorPerThread.java:234)
        at org.apache.lucene.index.DocumentsWriter.updateDocument(DocumentsWriter.java:765)
        ...
- All of the email messages indexed above are rolled back, and we're stuck.
I think that the solution should be simple too :-). When Solr runs into a bad email, it should just ignore it and keep indexing. This seems much more robust: the emails come from a variety of sources, so we can assume that some of them will be badly formatted. Having a few bad emails left unindexed seems much better than the current situation, where a single bad email stops all searching.
I don't understand the architecture of fts, fts_solr and Solr well enough to know where this should be solved. Ideally, it would be a simple directive to Solr.
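To make the behaviour I have in mind concrete, here is a rough sketch in Python. It is not Dovecot or Solr code, and it says nothing about which layer should own the fix. The names index_tolerantly, index_one and fake_index_one are made up for illustration, and the "bad" message is simulated by a marker string. The only point is that each message is indexed on its own, so one failure is recorded and skipped instead of rolling back the whole batch.

```python
from typing import Callable, Iterable, List, Tuple


def index_tolerantly(
    messages: Iterable[Tuple[str, str]],
    index_one: Callable[[str, str], None],
) -> Tuple[List[str], List[Tuple[str, str]]]:
    """Index each (uid, body) pair on its own; record failures and keep going."""
    indexed: List[str] = []
    skipped: List[Tuple[str, str]] = []
    for uid, body in messages:
        try:
            index_one(uid, body)
        except Exception as exc:
            # One bad message must not abort the rest of the batch.
            skipped.append((uid, str(exc)))
            continue
        indexed.append(uid)
    return indexed, skipped


def fake_index_one(uid: str, body: str) -> None:
    """Hypothetical stand-in for the real indexing call (assumption)."""
    if "<bad-html" in body:
        # Simulates the "Mark invalid" failure from the trace above.
        raise IOError("Mark invalid")


if __name__ == "__main__":
    batch = [("1", "hello"), ("2", "<bad-html"), ("3", "world")]
    indexed, skipped = index_tolerantly(batch, fake_index_one)
    print("indexed:", indexed)
    print("skipped:", skipped)
```

With something like this, the skipped list could be logged so that the bad messages can be looked at later, while searching keeps working for everything else.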
Regards,
Dror
Dror Matalon President Zapatec Inc
866 522-7941 X 704 1700 MLK Way Berkeley, CA 94709 http://www.6zap.com http://twitter.com/drormata http://www.zapatec.com