Strange indexing behavior on HTML emails ..

Akash akbwiz+dovecot at gmail.com
Wed Oct 14 10:20:09 UTC 2015


Hi,

Following up on an issue I posted about some time ago:

http://www.dovecot.org/list/dovecot/2014-August/097362.html

I did further testing today on a fresh Debian install with the latest 
Dovecot and observed an undesired behavior. I am using fts_lucene and ran 
the following sequence of commands on an empty test account, me at myself.com:

doveadm expunge -u 'me at myself.com' mailbox 'INBOX' all
cat test.eml | /usr/lib/dovecot/dovecot-lda -e -f you at yourself.com -d me at myself.com
doveadm search -u 'akash at ' mailbox 'INBOX' text ABCD

Whether the search command finds the email depends on slight variations 
in the content of test.eml. Here are the results:

test.eml content:
-----------------------------
 From: you at yourself.com
To: me at myself.com
Subject: Test Message
Content-Type: text/html

<div id="mydiv">ABCD 1234</div>
-----------------------------
RESULT: OK. The email is found.


test.eml content (double quotes inside the div tag replaced with single quotes):
-----------------------------
 From: you at yourself.com
To: me at myself.com
Subject: Test Message
Content-Type: text/html

<div id='mydiv'>ABCD 1234</div>
-----------------------------
RESULT: None. The email isn't found.


test.eml content (single quotes in the div tag, Content-Type header removed):
-----------------------------
 From: you at yourself.com
To: me at myself.com
Subject: Test Message

<div id='mydiv'>ABCD 1234</div>
-----------------------------
RESULT: OK. The email is found.
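
For anyone who wants to reproduce this, the three variants above can be 
generated with a small script. This is only a sketch: OUT_DIR, TEST_USER, 
the file names and the write_eml helper are made-up names for illustration, 
and the addresses are written with "@" on the assumption that the " at " 
in this post is just list-archive obfuscation. The delivery and search 
steps reuse the same commands as above and only run when TEST_USER is set 
and doveadm is installed. Note that the expunge step deletes all mail in 
that account's INBOX.

```shell
#!/bin/sh
# Sketch only: builds the three test.eml variants from this post.
# OUT_DIR, TEST_USER, the file names and write_eml are illustrative
# names, not anything provided by Dovecot.
OUT_DIR="${OUT_DIR:-.}"
mkdir -p "$OUT_DIR"

# write_eml FILE BODY [CONTENT_TYPE]
# Writes a minimal message. The Content-Type header is emitted only
# when a third argument is given.
write_eml() {
    {
        printf 'From: you@yourself.com\n'
        printf 'To: me@myself.com\n'
        printf 'Subject: Test Message\n'
        if [ -n "${3:-}" ]; then
            printf 'Content-Type: %s\n' "$3"
        fi
        printf '\n%s\n' "$2"
    } > "$1"
}

# Variant 1: double quotes, text/html (reported as found)
write_eml "$OUT_DIR/double_html.eml" '<div id="mydiv">ABCD 1234</div>' 'text/html'
# Variant 2: single quotes, text/html (reported as NOT found)
write_eml "$OUT_DIR/single_html.eml" "<div id='mydiv'>ABCD 1234</div>" 'text/html'
# Variant 3: single quotes, no Content-Type header (reported as found)
write_eml "$OUT_DIR/single_plain.eml" "<div id='mydiv'>ABCD 1234</div>"

# Optional: deliver and search each variant on a throwaway account.
# WARNING: the expunge removes every message in that account's INBOX.
if [ -n "${TEST_USER:-}" ] && command -v doveadm >/dev/null 2>&1; then
    for f in double_html single_html single_plain; do
        doveadm expunge -u "$TEST_USER" mailbox INBOX all
        /usr/lib/dovecot/dovecot-lda -e -f you@yourself.com -d "$TEST_USER" \
            < "$OUT_DIR/$f.eml"
        echo "== $f =="
        doveadm search -u "$TEST_USER" mailbox INBOX text ABCD
    done
fi
```

Run without TEST_USER it only writes the three .eml files, so they can be 
inspected or delivered by hand.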

What could be the reason for this?

-Akash



