Strange indexing behavior on HTML emails ..
Akash
akbwiz+dovecot at gmail.com
Wed Oct 14 10:20:09 UTC 2015
Hi,
This is a follow-up to an issue I posted about some time ago:
http://www.dovecot.org/list/dovecot/2014-August/097362.html
I did further testing today on a fresh Debian install with the latest
Dovecot and observed some undesired behavior. I am using fts_lucene and
ran the following sequence of commands on an empty test account,
me at myself.com:
doveadm expunge -u 'me at myself.com' mailbox 'INBOX' all
cat test.eml | /usr/lib/dovecot/dovecot-lda -e -f you at yourself.com -d me at myself.com
doveadm search -u 'akash at ' mailbox 'INBOX' text ABCD
The search command finds or misses the email depending on slight
variations in the content of test.eml. Here are the results:
test.eml content:
-----------------------------
From: you at yourself.com
To: me at myself.com
Subject: Test Message
Content-Type: text/html
<div id="mydiv">ABCD 1234</div>
-----------------------------
RESULT: OK. The email is found.
test.eml content (double quotes inside the div tag replaced with single quotes):
-----------------------------
From: you at yourself.com
To: me at myself.com
Subject: Test Message
Content-Type: text/html
<div id='mydiv'>ABCD 1234</div>
-----------------------------
RESULT: None. The email isn't found.
test.eml content (single quotes in the div tag, but Content-Type header removed):
-----------------------------
From: you at yourself.com
To: me at myself.com
Subject: Test Message
<div id='mydiv'>ABCD 1234</div>
-----------------------------
RESULT: OK. The email is found.
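For anyone who wants to repeat the three cases, here is a rough sketch of
how they could be scripted. The function names make_eml and run_case and
the SENDER/RCPT environment variables are my own placeholders, not part of
Dovecot. The sketch also assumes the usual blank line between the headers
and the body, which does not show in the listings above. The Dovecot
commands are the same three as in the sequence earlier in this message and
only run when doveadm is installed and both variables are set.

```shell
#!/bin/sh
# Hypothetical helper: print one test message on stdout.
#   $1 = quote character used around the id attribute (" or ')
#   $2 = "yes" to include the Content-Type: text/html header
make_eml() {
    printf 'From: %s\n' "${SENDER:-you at yourself.com}"
    printf 'To: %s\n' "${RCPT:-me at myself.com}"
    printf 'Subject: Test Message\n'
    if [ "$2" = yes ]; then
        printf 'Content-Type: text/html\n'
    fi
    # Assumption: a blank line separates the headers from the body.
    printf '\n'
    printf '<div id=%smydiv%s>ABCD 1234</div>\n' "$1" "$1"
}

# Hypothetical helper: empty the mailbox, deliver one variant, search it.
run_case() {
    doveadm expunge -u "$RCPT" mailbox 'INBOX' all
    make_eml "$1" "$2" | /usr/lib/dovecot/dovecot-lda -e -f "$SENDER" -d "$RCPT"
    doveadm search -u "$RCPT" mailbox 'INBOX' text ABCD
}

# Only touch Dovecot when it is installed and real addresses were supplied.
if command -v doveadm >/dev/null 2>&1 && [ -n "${RCPT:-}" ] && [ -n "${SENDER:-}" ]; then
    run_case '"' yes   # double quotes, with Content-Type
    run_case "'" yes   # single quotes, with Content-Type
    run_case "'" no    # single quotes, no Content-Type
fi
```

Set SENDER and RCPT to real addresses before running it; the defaults in
make_eml are only the placeholders used in this message.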
What could be the reason for this?
-Akash