A few days back, I sent an overview of this problem, but received no responses. Since then, I have run dozens of traces to isolate the problem, which was difficult because there are timing issues involved. I have finally nailed it down. If this is not the proper place to report such bugs or if someone knows that this bug has been fixed, please let me know. As I noted in my earlier post, we have been running Dovecot 2.2.10 on a pair of CentOS 7 boxes with replication for the past year. We have been quite happy with the performance and reliability.
Recently we received a report that emails could reappear in the INBOX after being deleted. After running a pile of traces, I determined that the problem was strangely related to replications. For the purposes of this discussion, I will refer to the two symmetric replicating servers as A and B. Further, let us assume that during "normal" operation, all the emails are delivered to A via SMTP and are replicated to B. Under those assumptions, if the IMAP user connects to A (where the messages were originally delivered), there is no problem, at least no problem I was able to find. The problem I am describing only arises if the IMAP user connects to B. Connecting to B has never presented any other problems that I am aware of.
The test for which I have provided the trace starts with a test mailbox containing only 3 unread messages in the INBOX. Moving 1 of the unread messages to Trash is all that is needed to reproduce the problem. Remember this is ONLY a problem if the IMAP sessions do not connect to the server to which the messages were originally delivered. Also, I found that there is a timing window. The critical IMAP commands are:
UID STORE xxx +FLAGS.SILENT (\Seen)
UID MOVE xxx Trash
If you introduce a large enough delay (I arbitrarily chose 5 seconds) between those two commands, there is no problem. Presumably this has to do with the two boxes syncing up some critical data structure.
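For anyone who wants to probe the timing window, here is a minimal Python sketch of the two critical commands with a configurable pause between them. It is only an illustration: `send` is a hypothetical callable standing in for whatever writes a command to an already authenticated IMAP session with INBOX selected, and `store_then_move` is a made-up helper name, not anything from Dovecot or a mail client.

```python
import time
from typing import Callable, List


def store_then_move(send: Callable[[str], None],
                    uid: int,
                    delay_seconds: float = 0.0,
                    sleep: Callable[[float], None] = time.sleep) -> List[str]:
    """Issue UID STORE then UID MOVE for one message, pausing in between.

    `send` is a hypothetical callable that writes one IMAP command
    (without tag or CRLF) to a session that already has INBOX selected.
    """
    commands = [
        f"UID STORE {uid} +FLAGS.SILENT (\\Seen)",  # mark the message as seen
        f"UID MOVE {uid} Trash",                    # then move it to Trash
    ]
    send(commands[0])
    if delay_seconds > 0:
        # A delay of about 5 seconds made the problem disappear in my tests.
        sleep(delay_seconds)
    send(commands[1])
    return commands
```

Calling it once with `delay_seconds=0` and once with `delay_seconds=5` against server B would be one way to check whether the window behaves the same on another installation.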
What mailbox format do you use? Are you able to reproduce this by running doveadm sync commands manually instead of letting replication do it? For example:
- doveadm sync -s "" -d -u user@domain > state
- Run the UID STORE & UID MOVE
- doveadm sync -s "`cat state`" -d -u user@domain

There have been some fixes, especially recently. https://github.com/dovecot/core/commit/950a6e61d6c2dac961ce031bdd8b2895bc32b... sounds a bit similar although I don't really see how it would apply here. Would be a good idea to try anyway with v2.2.22.rc1 (which seems to be stable enough that I'll make v2.2.22 release soon).
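The manual sync steps listed above could be scripted roughly as follows. This is a sketch, not a tested procedure: it assumes `doveadm` is on the PATH, uses only the `-s`, `-d` and `-u` options shown in the steps, and the helper names (`sync_argv`, `run_sync`, `reproduce`) as well as the `run_imap_commands` callback are hypothetical.

```python
import subprocess
from typing import Callable, List


def sync_argv(user: str, state: str) -> List[str]:
    """Build the argv for `doveadm sync -s <state> -d -u <user>`."""
    return ["doveadm", "sync", "-s", state, "-d", "-u", user]


def run_sync(user: str, state: str = "") -> str:
    """Run one stateful sync and return the state string printed on stdout."""
    result = subprocess.run(sync_argv(user, state), check=True,
                            capture_output=True, text=True)
    # Mirrors `> state` followed by `cat state` inside backticks,
    # which drops the trailing newline.
    return result.stdout.rstrip("\n")


def reproduce(user: str, run_imap_commands: Callable[[], None]) -> str:
    """Sync, run the IMAP commands, then sync again from the saved state."""
    state = run_sync(user)        # step 1: start from an empty state
    run_imap_commands()           # step 2: UID STORE and UID MOVE
    return run_sync(user, state)  # step 3: sync again with the saved state
```

Passing the state as a separate argv element avoids shell quoting, which is the only real difference from typing the commands by hand.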
Anyway, I attempted a few times to reproduce it with your test but wasn't able to.

I was out when you were kind enough to reply. To answer your question, we are using Maildir format. The trace I provided was based upon IMAP interactions with Roundcube (though the problem was reproducible with several mail clients). I left in a few more steps to make the trace look less contrived. However, I have since reduced it further to just a couple of connection sessions.

What I found in that exercise that was not apparent to me in my prior posting was that the "STATUS INBOX" command that ultimately reveals the problem (it shows the message reappearing) only becomes "wrong" when it is done in a subsequent session. That is, even if I inject an artificially large delay after the "UID STORE" / "UID MOVE" commands before the "STATUS INBOX" command in the same session, that result is never "wrong". But, as soon as I open a subsequent IMAP session, the "STATUS INBOX" command then shows the problematic results. I have never dug into the Dovecot code base, but I assume this relates to how the session data is cached and how replication updates it.

None of this is relevant if the problem has already been fixed, so I will endeavor to set up a couple of test boxes with the current version to verify. The link you provided does look quite hopeful. Thanks so much.
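To make the same-session versus new-session comparison less error-prone, something like the following Python sketch could diff the two "STATUS INBOX" responses. It is only an illustration: the function names are hypothetical, and the parser assumes an unquoted mailbox name such as INBOX and purely numeric status items.

```python
import re
from typing import Dict, Optional, Tuple

# Matches an untagged response such as "* STATUS <mailbox> (<items>)".
_STATUS_RE = re.compile(r"^\* STATUS (\S+) \((.*)\)\s*$")


def parse_status(line: str) -> Dict[str, int]:
    """Turn one untagged STATUS response into {item name: number}."""
    match = _STATUS_RE.match(line)
    if match is None:
        raise ValueError(f"not a STATUS response: {line!r}")
    tokens = match.group(2).split()
    # Items come as alternating name/value pairs.
    return {tokens[i]: int(tokens[i + 1])
            for i in range(0, len(tokens) - 1, 2)}


def changed_counters(same_session: str,
                     new_session: str) -> Dict[str, Tuple[int, Optional[int]]]:
    """Return the items whose value differs between the two responses."""
    before = parse_status(same_session)
    after = parse_status(new_session)
    return {name: (value, after.get(name))
            for name, value in before.items()
            if value != after.get(name)}
```

An empty result would mean the subsequent session agrees with the original one; any entry points at a counter that changed between sessions.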