[Bug] FTS double escaping
Hi,
I'm trying to resolve a few problems with indexing 'From' headers using
FTS/Solr. I ran tcpdump on the communication between Dovecot and
Jetty/Solr and noticed that 'From' headers that also include the
sender's name are double escaped. This is what Dovecot was sending to
Solr:
</field><field name="from">Name Surname &amp;lt;test@example.com&amp;gt;</field></doc></add>
As you can see, the characters < and > were escaped to &lt; and &gt;, which
were then escaped again to &amp;lt; and &amp;gt;. This causes problems
when trying to index the whole e-mail address, as Solr sees it as
'&lt;test@example.com&gt;'.
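To illustrate the effect described above, here is a minimal Python sketch (not Dovecot's code) that escapes the same header once and twice and then parses the result the way an XML consumer would. The helper names `build_field` and `parsed_text` are made up for this example; the only assumption is that Solr decodes the field like any conforming XML parser.

```python
from xml.sax.saxutils import escape
import xml.etree.ElementTree as ET


def build_field(value: str, times: int) -> str:
    """Escape `value` `times` times and wrap it in a Solr-style <field> element."""
    for _ in range(times):
        value = escape(value)
    return f'<field name="from">{value}</field>'


def parsed_text(xml: str) -> str:
    """Return the text an XML parser extracts from the element (one level of unescaping)."""
    return ET.fromstring(xml).text


header = "Name Surname <test@example.com>"

once = build_field(header, 1)   # correct: escaped exactly once
twice = build_field(header, 2)  # the reported behaviour: escaped twice

print(once)
print(parsed_text(once))
print(twice)
print(parsed_text(twice))
```

With a single escape the parser gives back the original header. With the double escape one layer of entities survives parsing, so the indexed text contains `&lt;` and `&gt;` instead of the angle brackets.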
I spent hours trying to figure out why I was able to search all parts
of e-mail addresses, while searching for the full, exact e-mail address
was successful ONLY for messages that don't include the sender's name
in the 'From' header. Finally, after I found this bug, the following
workaround fixed all search problems:
<filter class="solr.PatternReplaceFilterFactory" pattern="&lt;" replacement=""/>
<filter class="solr.PatternReplaceFilterFactory" pattern="&gt;" replacement=""/>
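For readers unfamiliar with these filters, the following is a rough Python approximation of what the two pattern-replace steps do to the tokens, not Solr code. It assumes whitespace tokenization and that the patterns end up matching the literal texts `&lt;` and `&gt;` left over from the double escaping; `filter_tokens` is a hypothetical name.

```python
import re

# Hypothetical stand-ins for the two filters. Assumption: each pattern matches
# the literal leftover entity text, and every match is replaced with "".
PATTERNS = [re.compile(r"&lt;"), re.compile(r"&gt;")]


def filter_tokens(tokens):
    """Apply every pattern to every token, removing all matches."""
    cleaned = []
    for token in tokens:
        for pattern in PATTERNS:
            token = pattern.sub("", token)
        cleaned.append(token)
    return cleaned


# Text as Solr sees it after parsing the double-escaped field.
tokens = "Name Surname &lt;test@example.com&gt;".split()
result = filter_tokens(tokens)
print(result)
```

After the leftover entities are stripped, the address token is the bare address, which is why an exact search for the full address starts matching again. It only hides the symptom on the Solr side; the double escaping itself still happens in what Dovecot sends.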
I hope that, at least, this bug, reported by me, will be fixed. Thank you.
azur
On 06.04.2017 14:58, azurit@pobox.sk wrote:
I hope that, at least, this bug, reported by me, will be fixed. Thank you.
azur
Hi!
Which dovecot version was this?
Aki
Quoting Aki Tuomi aki.tuomi@dovecot.fi:
Hi!
Which dovecot version was this?
Aki
Sorry, forgot to mention it, 2.2.27, Debian Jessie (backports), 64bit.
On 6 Apr 2017, at 14.58, azurit@pobox.sk wrote:
I hope that, at least, this bug, reported by me, will be fixed. Thank you.
The attached patch should also help.
Quoting Timo Sirainen tss@iki.fi:
The attached patch should also help.
Works fine, thank you!
Quoting azurit@pobox.sk:
Works fine, thank you!
Will this fix get into 2.2.29?
participants (3)
- Aki Tuomi
- azurit@pobox.sk
- Timo Sirainen