Re: [Dovecot] Question about Dovecot and NFS
On Wed, 01 Feb 2006 11:45:16 +0200, Timo Sirainen <tss@iki.fi> wrote:
On Tue, 2006-01-31 at 22:42 -0800, Tony Kay wrote:
How does dovecot handle this kind of conflict? I.e. how would one dovecot IMAP on server A detect that dovecot on server B had rewritten something like flags on the mailbox they are both accessing? I'm sure it doesn't leave the dot lock sitting around, since that would block mail delivery.
It just checks if mbox's mtime has changed. If it has, it checks if there are new mails or if it needs to do some other synchronization (mbox_dirty_syncs / mbox_very_dirty_syncs causes it to delay it as long as possible).
UW-IMAP also checks if mtime has changed, but instead of trying to figure out what changed it just disconnects the client with an "unexpected mbox change" error.
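A rough illustration of the mtime check described above, as a minimal Python sketch. This is not Dovecot's or UW-IMAP's code: the names `MboxSnapshot`, `take_snapshot` and `classify_change` are hypothetical, and using the file size to guess between "probably an append" and "needs a full resync" is an assumption made for the example.

```python
import os
from dataclasses import dataclass


@dataclass
class MboxSnapshot:
    """What a server remembers about the mbox after its last sync."""
    mtime_ns: int
    size: int


def take_snapshot(path):
    """Record the current mtime and size of the mbox file."""
    st = os.stat(path)
    return MboxSnapshot(st.st_mtime_ns, st.st_size)


def classify_change(old, new):
    """Guess what happened between two snapshots.

    Returns 'unchanged', 'maybe-append' or 'resync'. This heuristic is an
    assumption for illustration; a real server would still verify the
    message boundaries before trusting 'maybe-append'.
    """
    if new.mtime_ns == old.mtime_ns and new.size == old.size:
        return "unchanged"
    if new.size > old.size:
        # File grew: possibly new mail was appended at the end.
        return "maybe-append"
    # Same size or smaller with a different mtime: flags were rewritten
    # or messages were expunged, so cached offsets cannot be trusted.
    return "resync"
```

A server would call `take_snapshot` after each sync and compare it with a fresh snapshot before the next command. Note that over NFS the attributes returned by `stat` may come from the client's attribute cache, so a change made on another host may not be visible immediately.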
UW uses a "kiss of death" signal for collaborative locking...the mtime changes are all assumed to be appends as far as I know. The problem (which I can reproduce) comes with a situation like this:
- IMAP host A and B access INBOX for the same user from different physical servers which share the inbox via NFS
- Neither sees the other (because UW imap assumes lingering opens come from c-client apps, and it uses signals/flock/tmp to sync them, which are not available when A and B are on separate hosts).
- Delete and expunge a message on B
- 'A' can still read the expunged message.
- Mark two messages for delete on A, including the one previously expunged by B
- New message delivered to inbox (possibly by yet another host)
- Expunge on A
- New message is lost
My theory is that the two IMAPs are both holding a fd open, and this causes one of the copies to become stale (in the sense it no longer has a filesystem name). The new mail is delivered to the non-stale box, and then the expunge on 'A' causes the stale open file to be "rewritten" over the top of the non-stale one. Result: mail loss. We also see things like message flags that magically reset themselves, etc.
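The "stale fd" idea in this theory can be demonstrated on a local POSIX filesystem with a short Python sketch. It only shows the general mechanism (an open descriptor keeps pointing at the old file after the name has been replaced by a rename); it does not show that UW-IMAP or Dovecot actually replace the mbox this way, and behaviour over NFS may differ. The helper names `fd_is_stale` and `demo` are hypothetical.

```python
import os
import tempfile


def fd_is_stale(fd, path):
    """True if fd no longer refers to the file currently named by path."""
    opened = os.fstat(fd)
    try:
        current = os.stat(path)
    except FileNotFoundError:
        return True
    return (opened.st_dev, opened.st_ino) != (current.st_dev, current.st_ino)


def demo():
    """Open a file, replace it by rename, and inspect the old descriptor."""
    with tempfile.TemporaryDirectory() as d:
        mbox = os.path.join(d, "inbox")
        with open(mbox, "w") as f:
            f.write("old contents\n")

        fd = os.open(mbox, os.O_RDONLY)  # "server A" keeps this open
        try:
            before = fd_is_stale(fd, mbox)

            # "Server B" writes a new copy and renames it over the name.
            tmp = os.path.join(d, "inbox.new")
            with open(tmp, "w") as f:
                f.write("new contents\n")
            os.replace(tmp, mbox)

            after = fd_is_stale(fd, mbox)
            stale_data = os.read(fd, 100)  # still the old file's data
        finally:
            os.close(fd)
        return before, after, stale_data
```

Comparing `(st_dev, st_ino)` from `os.fstat` and `os.stat` is one way a process could notice that the name it opened now refers to a different file.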
I did some testing w/dovecot 0.99 earlier in this same config, and it seems like it has some issues. For example a delete/expunge on IMAP A was "sort-of" seen by the B (fetch complained that the UIDs had changed, and would not work until a NOOP was done...this is technically OK), but B magically restored the deleted message with a new UID (which is not so good, since a client that caches stuff may now have a strange view of the world...I need to examine the set of "update" notices that went along with these events to better understand the implications).
This also worries me because the "magic" reappearance indicates a copy from a stale fd, which could exhibit the same mail loss I produced w/UW IMAP.
I am planning on updating to your latest source tree tomorrow (I am out of time today), and see if I can make it "lose" new mail. Opinions???
Thanks!
Tony
On Wed, 2006-02-01 at 13:17 -0800, Tony Kay wrote:
On Wed, 01 Feb 2006 11:45:16 +0200, Timo Sirainen <tss@iki.fi> wrote:
On Tue, 2006-01-31 at 22:42 -0800, Tony Kay wrote:
How does dovecot handle this kind of conflict? I.e. how would one dovecot IMAP on server A detect that dovecot on server B had rewritten something like flags on the mailbox they are both accessing? I'm sure it doesn't leave the dot lock sitting around, since that would block mail delivery.
It just checks if mbox's mtime has changed. If it has, it checks if there are new mails or if it needs to do some other synchronization (mbox_dirty_syncs / mbox_very_dirty_syncs causes it to delay it as long as possible).
UW-IMAP also checks if mtime has changed, but instead of trying to figure out what changed it just disconnects the client with an "unexpected mbox change" error.
UW uses a "kiss of death" signal for collaborative locking...
Right, although I'm not sure if locking is a good word for something which simply tries to kill the other process if it sees it.
the mtime changes are all assumed to be appends as far as I know.
Yes, but it also has some sanity checks where it can notice if mails have been moved. I don't know how good those are, but I've seen them happen at least.
The problem (which I can reproduce) comes with a situation like this:
- IMAP host A and B access INBOX for the same user from different physical servers which share the inbox via NFS
- Neither sees the other (because UW imap assumes lingering opens come from c-client apps, and it uses signals/flock/tmp to sync them, which are not available when A and B are on separate hosts).
- Delete and expunge a message on B
- 'A' can still read the expunged message.
- Mark two messages for delete on A, including the one previously expunged by B
- New message delivered to inbox (possibly by yet another host)
- Expunge on A
- New message is lost
My theory is that the two IMAPs are both holding a fd open, and this causes one of the copies to become stale (in the sense it no longer has a filesystem name). The new mail is delivered to the non-stale box, and then the expunge on 'A' causes the stale open file to be "rewritten" over the top of the non-stale one. Result: mail loss. We also see things like message flags that magically reset themselves, etc.
UW-IMAP (and Dovecot) don't rewrite the existing file, they only move the data inside it. So keeping the fd open shouldn't be an issue.
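To make "move the data inside it" concrete, here is a minimal Python sketch of removing a byte range from a file in place, so the file keeps its inode and any open fd stays valid. It is an illustration only, not the code either server uses; the function name `expunge_in_place` and its parameters are hypothetical, and locking is left out.

```python
import os


def expunge_in_place(path, start, end, chunk_size=64 * 1024):
    """Remove bytes [start, end) by shifting the tail down and truncating.

    The file is never replaced, so its inode stays the same.
    """
    with open(path, "r+b") as f:
        f.seek(0, os.SEEK_END)
        size = f.tell()
        if not 0 <= start <= end <= size:
            raise ValueError("byte range outside the file")

        read_pos, write_pos = end, start
        while read_pos < size:
            f.seek(read_pos)
            chunk = f.read(min(chunk_size, size - read_pos))
            if not chunk:
                break
            # write_pos never passes read_pos, so unread data is safe.
            f.seek(write_pos)
            f.write(chunk)
            read_pos += len(chunk)
            write_pos += len(chunk)

        # Cut off the now-duplicated tail.
        f.truncate(write_pos)
```

For example, on a file containing `AAAABBBBCCCCCC`, calling `expunge_in_place(path, 4, 8)` leaves `AAAACCCCCC` in the same file.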
Maybe it's simply that UW-IMAP doesn't do good enough sanity checks to notice if mbox has changed under it.
As long as both hosts keep using dotlocking, Dovecot should be able to handle the above without any corruption problems.
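For reference, dotlocking means creating `<mbox>.lock` next to the mailbox before touching it. Below is a minimal Python sketch of the traditional hard-link variant, which is commonly used because it does not depend on exclusive create being atomic on the shared filesystem. It is not Dovecot's implementation: `acquire_dotlock`, `release_dotlock`, the temp file naming and the timeout values are assumptions for the example, and stale-lock handling is left out.

```python
import os
import socket
import time


def acquire_dotlock(mbox_path, timeout=5.0, poll=0.1):
    """Try to create <mbox_path>.lock; return its path, or None on timeout."""
    lock_path = mbox_path + ".lock"
    # Unique temp name per host and process (naming is an assumption).
    unique = "%s.%s.%d.%d" % (
        lock_path, socket.gethostname(), os.getpid(), time.time_ns())
    fd = os.open(unique, os.O_WRONLY | os.O_CREAT | os.O_EXCL, 0o644)
    os.close(fd)
    try:
        deadline = time.monotonic() + timeout
        while True:
            try:
                os.link(unique, lock_path)
            except OSError:
                pass  # success is judged by the link count below
            if os.stat(unique).st_nlink == 2:
                return lock_path  # our temp file is now also the lock
            if time.monotonic() >= deadline:
                return None
            time.sleep(poll)
    finally:
        # The temp name is no longer needed whether or not we got the lock.
        os.unlink(unique)


def release_dotlock(lock_path):
    """Remove the lock file so other hosts can take the lock."""
    os.unlink(lock_path)
```

This only protects the mailbox if every writer, including the delivery agent, takes the same lock before modifying the file.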
I did some testing w/dovecot 0.99 earlier in this same config,
Forget those results and try with 1.0beta instead :)