[Dovecot] Synchronization error in NFS

Andy YB Hu ybhu at hk1.ibm.com
Tue Feb 7 08:26:43 EET 2012




Hi there,

Not sure whether it's a bug or env setting issue.

I am running some concurrent testings under NFS.

The dovecot (version 2.0.13) are deployed in 2 servers (same setup), whose
maildir are on the file server via NFS.  Here are the output of dovecot -n:
[root at MyMachine src]$dovecot -n
# 2.0.13: /usr/local/etc/dovecot/dovecot.conf
# OS: Linux 2.6.18-274.3.1.el5 i686 Red Hat Enterprise Linux Server release
5.7 (Tikanga) nfs
auth_anonymous_username = andy
auth_debug = yes
auth_debug_passwords = yes
auth_mechanisms = anonymous plain
auth_verbose = yes
auth_verbose_passwords = plain
disable_plaintext_auth = no
lock_method = dotlock
log_path = /tmp/log
mail_debug = yes
mail_fsync = always
mail_gid = andy
mail_location = maildir:/tmp/NFS
mail_nfs_index = yes
mail_nfs_storage = yes
mail_uid = andy
mmap_disable = yes
passdb {
  driver = pam
}
ssl = no
userdb {
  driver = passwd
}

Here are what I am doing:   One session running loop of COPY commands
(while(1) COPY...) connects to one dovecot server;  The other session
running loop of SELECT commands (while(1) SELECT...) connects to the other
dovecot server. Both are accessing the same mail box (/tmp/NFS);

After some while (not accurate duration, maybe 2 seconds, or 1 min), I
found the number of EXISTS returned from SELECT command is not correct(less
than the real number). Then I stop the both sessions.  In the /tmp/log:
Feb 07 03:44:59 imap(andy): Error: Corrupted transaction log
file /tmp/NFS/dovecot.index.log seq 2: Unexpected garbage at EOF
(sync_offset=2204)
Feb 07 03:44:59 imap(andy): Error: Index /tmp/NFS/dovecot.index: Lost log
for seq=2 offset=2204
Feb 07 03:44:59 imap(andy): Warning: fscking index
file /tmp/NFS/dovecot.index

Then I tried to dump the index, sometime it's failed because of the index
corruption.  If not, it says:
[root at MyMachine src]$doveadm dump /tmp/NFS > /tmp/dump ; vi /tmp/dump
doveadm(root): Error: Log synchronization error at seq=2,offset=744
for /tmp/NFS/dovecot.index: Broken extension introduction: Headersize too
large (2273345664)
doveadm(root): Warning: fscking index file /tmp/NFS/dovecot.index
doveadm(root): Error: fcntl(write-lock) locking failed for
file /tmp/NFS/dovecot.index.log: Bad file descriptor
doveadm(root): Error: mail_index_wait_lock_fd() failed with
file /tmp/NFS/dovecot.index.log: Bad file descriptor
doveadm(root): Error: Log synchronization error at seq=2,offset=744
for /tmp/NFS/dovecot.index: Broken extension introduction: Headersize too
large (2273345664)
doveadm(root): Warning: fscking index file /tmp/NFS/dovecot.index
doveadm(root): Error: fcntl(write-lock) locking failed for
file /tmp/NFS/dovecot.index.log: Bad file descriptor
doveadm(root): Error: mail_index_wait_lock_fd() failed with
file /tmp/NFS/dovecot.index.log: Bad file descriptor

And sometime from the dump output, in the RECORD part:
 -- RECORDS: 5
 RECORD: seq=1, uid=1, flags=0x00

RECORD: seq=2, uid=2, flags=0x00

RECORD: seq=3, uid=3, flags=0x00

RECORD: seq=4, uid=4, flags=0x00

RECORD: seq=5, uid=6, flags=0x00

The uid 5 is missed. but in uidlist file, it's there..

Here are all what I found. If you need additional information, pls let me
know.

The clock on the 3 machines are synchronized.

You can also reproduce it if the 2 sessions are APPEND and SELECT.

If both sessions are running towards the same dovecot server, even the
maildir are on the NFS, it works very well without any error.


More information about the dovecot mailing list