[Bug] FTS double escaping

azurit at pobox.sk azurit at pobox.sk
Thu Apr 6 14:58:12 EEST 2017


Hi,

I'm trying to resolve a few problems with indexing 'From' headers using
FTS/Solr. I was tcpdumping the communication between Dovecot and
Jetty/Solr and noticed that 'From' headers which also include the
sender's name are double escaped. This is what Dovecot was sending to
Solr:

</field><field name="from">Name Surname
&amp;lt;test at example.com&amp;gt;</field></doc></add>

As you can see, the characters < and > were escaped to &lt; and &gt;, which
were then escaped again to &amp;lt; and &amp;gt;. This causes problems
when trying to index the whole e-mail address, because Solr sees it as
'&lt;test at example.com&gt;'.
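
To illustrate the effect, here is a minimal Python sketch. It is not Dovecot's actual code; the function name and the sample header value are only assumptions made for the demonstration. It escapes a header value twice and then parses the resulting XML once, the way an XML parser on the receiving side would.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape


def escape_twice(value: str) -> str:
    """Apply XML escaping two times in a row (the suspected bug)."""
    return escape(escape(value))


# Hypothetical header value, written the same way as in the capture above.
header = "Name Surname <test at example.com>"

# What ends up on the wire after two rounds of escaping.
on_the_wire = escape_twice(header)

# An XML parser undoes only ONE level of escaping.
field = ET.fromstring('<field name="from">' + on_the_wire + "</field>")
seen_by_indexer = field.text

print(on_the_wire)      # Name Surname &amp;lt;test at example.com&amp;gt;
print(seen_by_indexer)  # Name Surname &lt;test at example.com&gt;
```

With a single round of escaping, the parsed text would be the original header again, angle brackets included.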

I spent hours trying to figure out why I was able to search in all parts
of e-mail addresses, while searching for the full, exact e-mail address
was successful ONLY for messages which don't include the sender's name
in the 'From' header. Finally, after I found this bug, the following
fixed all search problems:

<filter class="solr.PatternReplaceFilterFactory" pattern="&amp;lt;"
replacement=""/>
<filter class="solr.PatternReplaceFilterFactory" pattern="&amp;gt;"
replacement=""/>
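
What these two filters do to a token can be sketched in Python as follows. This only mimics the idea with Python's re module; it assumes that the token reaching the filters is the whole address with the leftover entity text around it, and that every occurrence of a pattern is replaced. The function name is hypothetical.

```python
import re

# The literal entity text left over in the token after one level of
# unescaping. Each pattern is replaced with an empty string.
LEFTOVER_PATTERNS = [r"&lt;", r"&gt;"]


def strip_leftover_entities(token: str) -> str:
    """Remove leftover '&lt;' and '&gt;' text from a single token."""
    for pattern in LEFTOVER_PATTERNS:
        token = re.sub(pattern, "", token)
    return token


# Assumed shape of the token as the indexer sees it.
token = "&lt;test at example.com&gt;"
cleaned = strip_leftover_entities(token)

print(cleaned)  # test at example.com
```

This is a workaround on the indexing side only; the double escaping itself still happens before the data reaches Solr.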

I hope that at least this bug will be fixed. Thank you.

azur



