On 27.11.2012, at 17.38, Daniel L. Miller wrote:
On 11/27/2012 7:28 AM, Daniel L. Miller wrote:
On 11/26/2012 10:08 PM, Timo Sirainen wrote:
On 27.11.2012, at 7.50, Timo Sirainen wrote:
Nov 26, 2012 8:49:29 PM org.apache.solr.common.SolrException log
SEVERE: org.apache.solr.common.SolrException: Illegal character ((CTRL-CHAR, code 8)) at [row,col {unknown-source}]: [1011144,197790]

Something's wrong. The Solr code was already supposed to catch all of these.
I took a brief look at the code - and as usual I'm probably wrong - but I believe the protection comes from the xml_encode functions. Could it be that some Solr writes don't go through those functions, on the assumption that the data in question doesn't need that processing? Things like mailbox names, field names, or UIDs SHOULDN'T contain any garbage, but maybe something is creeping in?
I did go through the code looking for that a few times already but didn't notice anything. I went through it once more, and finally found the problem. :) http://hg.dovecot.org/dovecot-2.1/rev/6a97faf3e500
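For readers following along, here is a minimal sketch of the kind of escaping discussed in this thread. It is not Dovecot's xml_encode code and not the change in the linked revision. The names xml_escape_safe and add_field are hypothetical, and replacing bad characters with U+FFFD is an assumption made for illustration. The idea is that every string written into the XML sent to Solr, including field names and mailbox names, goes through the same function, so a stray control character such as code 8 (backspace) cannot reach the parser.

```python
import re

# Matches anything outside the XML 1.0 "Char" production:
# #x9 | #xA | #xD | [#x20-#xD7FF] | [#xE000-#xFFFD] | [#x10000-#x10FFFF]
_ILLEGAL_XML_CHARS = re.compile(
    r"[^\u0009\u000A\u000D\u0020-\uD7FF\uE000-\uFFFD\U00010000-\U0010FFFF]"
)

# The five predefined XML entities.
_ENTITIES = {
    "&": "&amp;",
    "<": "&lt;",
    ">": "&gt;",
    '"': "&quot;",
    "'": "&apos;",
}


def xml_escape_safe(text: str, replacement: str = "\uFFFD") -> str:
    """Replace characters that XML 1.0 forbids, then escape markup characters.

    The replacement character is an assumption for this sketch; dropping the
    character entirely would work as well.
    """
    cleaned = _ILLEGAL_XML_CHARS.sub(replacement, text)
    return "".join(_ENTITIES.get(ch, ch) for ch in cleaned)


def add_field(name: str, value: str) -> str:
    """Build one <field> element (hypothetical helper).

    Both the name and the value are escaped, so no write path skips the check.
    """
    return f'<field name="{xml_escape_safe(name)}">{xml_escape_safe(value)}</field>'
```

With this sketch, add_field("subject", "hi\x08") yields a field whose value ends in U+FFFD instead of the raw backspace that Solr rejected in the log above.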