Dovecot on GlusterFS via NFS is... strange.

Frank Wall fw at moov.de
Wed Feb 7 00:49:26 EET 2018


Hi,

I've discovered some interesting behaviour when running Dovecot
on a GlusterFS volume (3.10) that is mounted via NFS 4.1 (NFS-Ganesha).

First of all, using the usual settings for NFS storage seemed like
an obvious choice for my setup (Dovecot 2.2.33.2, CentOS 7.4):

    mail_nfs_storage = yes
    mail_nfs_index = yes
    mail_fsync = always
    maildir_copy_with_hardlinks = yes

    mmap_disable = yes
    lock_method = dotlock

(FWIW, I've been using these settings for many years on various
NFS storage systems without any issues. Another instance of Dovecot
2.2.33 on FreeBSD 11.1 with a more traditional NFS storage is rock-solid.)

SPOILER: These settings caused terrible issues.

For example, when trying to clone a mailbox from an old server
to this new Dovecot-on-GlusterFS setup (using imapsync), the following
errors would show up frequently:

Feb  6 16:44:53 dovecot: IMAP(user at example.com): Panic: file maildir-uidlist.c: line 1262 (maildir_uidlist_write_fd): assertion failed: (first_idx == 0)

Feb  6 16:44:53 dovecot: IMAP(user at example.com): Error: Raw backtrace: 
/usr/lib64/dovecot/libdovecot.so.0(+0x9f3de) [0x7f7b6b44f3de] 
-> /usr/lib64/dovecot/libdovecot.so.0(+0x9f4be) [0x7f7b6b44f4be] 
-> /usr/lib64/dovecot/libdovecot.so.0(i_fatal+0) [0x7f7b6b3e077c] 
-> /usr/lib64/dovecot/libdovecot-storage.so.0(+0x79605) [0x7f7b6b75b605] 
-> /usr/lib64/dovecot/libdovecot-storage.so.0(maildir_uidlist_sync_finish+0x1ac) [0x7f7b6b75d61c] 
-> /usr/lib64/dovecot/libdovecot-storage.so.0(maildir_uidlist_sync_deinit+0x98) [0x7f7b6b75da48] 
-> /usr/lib64/dovecot/libdovecot-storage.so.0(maildir_transaction_save_commit_pre+0x41f) [0x7f7b6b75483f] 
-> /usr/lib64/dovecot/libdovecot-storage.so.0(+0xcccd0) [0x7f7b6b7aecd0] 
-> /usr/lib64/dovecot/libdovecot-storage.so.0(mail_index_transaction_commit_full+0x9d) [0x7f7b6b7cb3cd] 
-> /usr/lib64/dovecot/libdovecot-storage.so.0(index_transaction_commit+0x107) [0x7f7b6b7af217] 
-> /usr/lib64/dovecot/lib10_quota_plugin.so(+0xe17c) [0x7f7b6a9ca17c] 
-> /usr/lib64/dovecot/lib01_acl_plugin.so(+0xda0a) [0x7f7b6abe0a0a] 
-> /usr/lib64/dovecot/libdovecot-storage.so.0(mailbox_transaction_commit_get_changes+0x51) [0x7f7b6b72c161] 
-> dovecot/imap [user at example.com 10.0.0.1 APPEND](+0xf255) [0x55ca8238f255] 
-> dovecot/imap [user at example.com 10.0.0.1 APPEND](command_exec+0x5c) [0x55ca8239cf0c] 
-> dovecot/imap [user at example.com 10.0.0.1 APPEND](+0xe771) [0x55ca8238e771] 
-> /usr/lib64/dovecot/libdovecot.so.0(io_loop_call_io+0x52) [0x7f7b6b464cd2] 
-> /usr/lib64/dovecot/libdovecot.so.0(io_loop_handler_run_internal+0x10f) [0x7f7b6b4663bf] 
-> /usr/lib64/dovecot/libdovecot.so.0(io_loop_handler_run+0x3c) [0x7f7b6b464d6c] 
-> /usr/lib64/dovecot/libdovecot.so.0(io_loop_run+0x38) [0x7f7b6b464f28] 
-> /usr/lib64/dovecot/libdovecot.so.0(master_service_run+0x13) [0x7f7b6b3eafa3] 
-> dovecot/imap [user at example.com 10.0.0.1 APPEND](main+0x333) [0x55ca8238e2e3] 
-> /lib64/libc.so.6(__libc_start_main+0xf5) [0x7f7b6b00ec05] 
-> dovecot/imap [user at example.com 10.0.0.1 APPEND](+0xe476) [0x55ca8238e476]

Feb  6 16:44:53 dovecot: IMAP(user at example.com): Fatal: master: service(imap): child 84782 killed with signal 6 (core dumps disabled)

Feb  6 16:44:56 dovecot: IMAP(user at example.com): Error: readdir(/data/mail/abc/default/.Foldername 01_ Foo 1234/new) failed: Unknown error 523

Yes, the folder name is "special" in this example. (I'm not sure if
this readdir() error is actually related to the other issue.)

As a result, the imap process died repeatedly and the user was
disconnected many times. In this state the mail services were
not really usable.

Side note: The NFS share was mounted using the most conservative
mount options I could think of:
vers=4.1,rsize=131072,wsize=131072,nosharecache,noac,nordirplus,proto=tcp

And if you're wondering: Yes, I'm using Dovecot director for IMAP 
and LMTP (currently running 3 VMs for Dovecot director and 3 VMs for
Dovecot IMAP).

Then the fun began... I decided to change the following settings:

    mail_nfs_storage = no
    mail_nfs_index = no

I re-created the maildir and restarted the imapsync copy process.
All of a sudden the errors were gone! Now Dovecot is performing
pretty well on my GlusterFS volume.
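
For reference, the combined settings after this change would look roughly
like this (only a sketch, assuming the remaining settings from the first
block above were left untouched):

    # changed: NFS cache-flushing workarounds disabled
    mail_nfs_storage = no
    mail_nfs_index = no

    # unchanged from the original setup (assumption)
    mail_fsync = always
    maildir_copy_with_hardlinks = yes
    mmap_disable = yes
    lock_method = dotlock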

Does this make any sense to anyone?


Ciao
- Frank

