[Dovecot] mmap in GFS2 on rhel 6.1
Hello list, we are continuing our tests of Dovecot on a RHEL 6.1 cluster backend with GFS2, and we are also using Dovecot as a director for user-to-node persistence. Everything was fine until we started stress testing the solution with imaptest: we got many deadlocks, cluster filesystem corruptions, and hangs, especially on the index filesystem. We have configured the backend as if it were on an NFS-like setup, but this does not seem to work, at least on GFS2 on RHEL 6.1. We have a two-node cluster sharing two GFS2 filesystems:
- An index GFS2 filesystem to store the user indexes
- A GFS2 filesystem for the mailbox data
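For context on the director part of the setup, here is a minimal sketch of what a Dovecot 2.0 director proxy config could look like; this is an illustrative sketch, not the poster's actual config. The director IPs are hypothetical, while the backend IPs reuse the ones appearing later in this thread:

# hypothetical director ring members (assumption)
director_servers = 192.168.164.90 192.168.164.91
# GFS2 backend nodes (IPs taken from the imaptest runs later in the thread)
director_mail_servers = 192.168.164.95 192.168.164.96
service director {
  unix_listener login/director {
    mode = 0666
  }
  fifo_listener login/proxy-notify {
    mode = 0666
  }
  inet_listener {
    port = 9090
  }
}
service imap-login {
  executable = imap-login director
}

A real director setup would also need a passdb on the director host that returns proxy=y (or equivalent) so logins get forwarded to the backend chosen for that user.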
The specific configs we used for NFS or cluster filesystems:
mmap_disable = yes
mail_fsync = always
mail_nfs_storage = yes
mail_nfs_index = yes
fsync_disable = no
lock_method = fcntl
mail location :
mail_location = mdbox:/var/vmail/%d/%3n/%n/mdbox:INDEX=/var/indexes/%d/%3n/%n
But this does not seem to work for GFS2, even when doing user-to-node persistence: the maillog is plagued with errors and GFS2 hangs under stress testing with imaptest, with many corrupted indexes, transaction logs, etc. At this point we have many questions, the first one about mmap. In the Red Hat GFS2 docs we read:
Golden rules for performance:
- An inode is used in a read-only fashion across all nodes.
- An inode is written or modified from a single node only.
We have successfully achieved this using the Dovecot director.
Now, for mmap, Red Hat says:
... If you mmap() a file on GFS2 with a read/write mapping, but only read from it, this only counts as a read. On GFS though, it counts as a write, so GFS2 is much more scalable with mmap() I/O...
But in our config we are using mmap_disable=yes; do we have to use mmap_disable=no with GFS2?
Also, how does Dovecot manage cache flushing on a GFS2 filesystem?
Why do the Dovecot indexes get corrupted if we are doing user-to-node persistence?
What lock method do we have to use?
How should fsync be used?
We know we have many questions, but this is really complex stuff and we would appreciate any help you can give us.
Thank you all for the great work, especially Timo... best regards
On 6/10/2011 11:24 PM, Aliet Santiesteban Sifontes wrote:
Hello list, we are continuing our tests of Dovecot on a RHEL 6.1 cluster backend with GFS2, and we are also using Dovecot as a director for user-to-node persistence. Everything was fine until we started stress testing the solution with imaptest: we got many deadlocks, cluster filesystem corruptions, and hangs, especially on the index filesystem. We have configured the backend as if it were on an NFS-like setup, but this does not seem to work, at least on GFS2 on RHEL 6.1.
Actual _filesystem_ corruption is typically unrelated to user space applications. You should be looking at a lower level for the cause, i.e. kernel, device driver, hardware, etc. Please post details of your shared storage hardware environment, including HBAs, SAN array brand/type, if you're using GFS2 over DRBD, etc.
We have a two-node cluster sharing two GFS2 filesystems:
- An index GFS2 filesystem to store the user indexes
- A GFS2 filesystem for the mailbox data
Experience of many users has shown that neither popular cluster filesystems such as GFS2/OCFS, nor NFS, handle high metadata/IOPS workloads very well, especially those that make heavy use of locking.
The specific configs we used for NFS or cluster filesystems:
mmap_disable = yes
mail_fsync = always
mail_nfs_storage = yes
mail_nfs_index = yes
fsync_disable = no
lock_method = fcntl
mail location :
mail_location = mdbox:/var/vmail/%d/%3n/%n/mdbox:INDEX=/var/indexes/%d/%3n/%n
For a Dovecot cluster using shared storage, you are probably better off using a mailbox format for which indexes are independent of mailbox files and are automatically [re]generated if absent.
Try using mbox or maildir and store indexes on local node disk/SSD instead of on the cluster filesystem. Only store the mailboxes on the cluster filesystem. If for any reason a user login gets bumped to a node lacking the index files, they're automatically rebuilt.
Since dbox indexes aren't automatically generated if missing, you can't do what I describe above with dbox storage. Given the limitations of cluster filesystem (and NFS) metadata IOPS and locking, you'll likely achieve the best performance and stability using local disk index files and mbox format mailboxes on GFS2. Maildir format works in this setup as well, but the metadata load on the cluster filesystem is much higher, and thus peak performance will typically be lower.
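As a minimal sketch of what that could look like (the mailbox path mirrors the one used later in this thread; /var/local/indexes is a hypothetical local-disk path, not from the original post):

# INDEX points at a hypothetical local filesystem on each node (assumption), not at GFS2
mail_location = mbox:/var/vmail/mailbox/%d/%3n/%n:INDEX=/var/local/indexes/%d/%3n/%n

Here the mailbox path stays on the GFS2 mount while the INDEX path lives on each node's local disk or SSD, so the indexes are simply rebuilt if a user lands on the other node.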
-- Stan
https://bugzilla.redhat.com/show_bug.cgi?id=712139
Further investigating this bug, I have tested all kinds of configs with Dovecot, and all of them get GFS2 hung. I have tested this scenario with a bare-metal hardware cluster, with virtualized cluster guests on VMware ESXi 4.1, and with a test cluster in VMware Workstation, and I can reproduce the problem in all the tests, even in different environments. We are testing whether Dovecot can be deployed on a Red Hat cluster of active-active nodes doing user session persistence. This was my last test; I simplified the scenario with a cluster on my own laptop:
1- Used a two-node RHEL 6.1 cluster, virtualized in VMware Workstation.
2- Used two shared iSCSI devices from a NAS.
3- Used fence_scsi.
Cluster.conf
<?xml version="1.0"?>
<cluster config_version="9" name="MailCluster">
  <clusternodes>
    <clusternode name="node0.local" nodeid="1">
      <fence>
        <method name="fn_mt_scsi">
          <device name="fn_scsi"/>
        </method>
      </fence>
      <unfence>
        <device action="on" name="fn_scsi"/>
      </unfence>
    </clusternode>
    <clusternode name="node1.local" nodeid="2">
      <fence>
        <method name="fn_mt_scsi">
          <device name="fn_scsi"/>
        </method>
      </fence>
      <unfence>
        <device action="on" name="fn_scsi"/>
      </unfence>
    </clusternode>
  </clusternodes>
  <cman expected_votes="1" two_node="1"/>
  <fencedevices>
    <fencedevice agent="fence_scsi" logfile="/var/log/cluster/fence_scsi.log" name="fn_scsi"/>
  </fencedevices>
</cluster>
4- Used the iSCSI devices for the LVM layer and created the GFS2 filesystems on top of it.
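For reference, a rough sketch of what that step could look like. The physical device names /dev/sdb and /dev/sdc are placeholders, the VG/LV names come from the fstab fragment below, the MailCluster:indexes lock table name comes from the kernel messages later in this post, and MailCluster:mailbox is assumed by analogy:

# hypothetical iSCSI block devices; actual device names will differ
pvcreate /dev/sdb /dev/sdc
# with clvmd, the VGs would typically be created as clustered (vgcreate -cy)
vgcreate vg_indexes /dev/sdb
vgcreate vg_mailbox /dev/sdc
lvcreate -l 100%FREE -n lv_indexes vg_indexes
lvcreate -l 100%FREE -n lv_mailbox vg_mailbox
# lock_dlm for clustered access, 2 journals for the two nodes
mkfs.gfs2 -p lock_dlm -t MailCluster:indexes -j 2 /dev/vg_indexes/lv_indexes
mkfs.gfs2 -p lock_dlm -t MailCluster:mailbox -j 2 /dev/vg_mailbox/lv_mailbox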
fstab fragment
# GFS2 filesystem
/dev/vg_indexes/lv_indexes /var/vmail/indexes gfs2 noatime,quota=off,errors=withdraw 0 0
/dev/vg_mailbox/lv_mailbox /var/vmail/mailbox gfs2 noatime,quota=off,errors=withdraw 0 0
5- Dovecot configured with users in LDAP. In this case we tested the mbox mailbox format with fcntl and mmap_disable=yes; we have also tested all the other mailbox formats. Indexes and mailboxes are stored on the GFS2 filesystems. Here is the conf:
[root@node0 ~]# dovecot -n
# 2.0.9: /etc/dovecot/dovecot.conf
# OS: Linux 2.6.32-131.2.1.el6.x86_64 x86_64 Red Hat Enterprise Linux Server release 6.1 (Santiago) gfs2
auth_default_realm = example.com
auth_mechanisms = plain login
auth_worker_max_count = 60
disable_plaintext_auth = no
listen = *
mail_fsync = always
mail_gid = vmail
mail_location = mbox:/var/vmail/mailbox/%d/%3n/%n:INDEX=/var/vmail/indexes/%d/%3n/%n
mail_nfs_index = yes
mail_nfs_storage = yes
mail_uid = vmail
mbox_write_locks = fcntl
mmap_disable = yes
passdb {
  args = /etc/dovecot/dovecot-ldap.conf.ext
  driver = ldap
}
ssl_cert =
6- Started the dovecot service on the nodes.
7- From another host, ran imaptest against the first node:
imaptest host=192.168.164.95 userfile=userfile port=143 mbox=mail/dovecot-crlf no_tracking logout=0 clients=20 secs=30 seed=123
8- Repeated the test many times against that node.
9- Ran the test against the second node:
imaptest host=192.168.164.96 userfile=userfile port=143 mbox=mail/dovecot-crlf no_tracking logout=0 clients=20 secs=30 seed=123
10- The first node hangs:
GFS2: fsid=MailCluster:indexes.0: fatal: filesystem consistency error
GFS2: fsid=MailCluster:indexes.0: inode = 468 525144
GFS2: fsid=MailCluster:indexes.0: function = gfs2_dinode_dealloc, file = fs/gfs2/inode.c, line = 352
GFS2: fsid=MailCluster:indexes.0: about to withdraw this file system
GFS2: fsid=MailCluster:indexes.0: telling LM to unmount
GFS2: fsid=MailCluster:indexes.0: withdrawn
Pid: 3808, comm: delete_workqueu Not tainted 2.6.32-131.2.1.el6.x86_64 #1
Call Trace:
[<ffffffffa064bfd2>] ? gfs2_lm_withdraw+0x102/0x130 [gfs2]
[<ffffffffa0621209>] ? trunc_dealloc+0xa9/0x130 [gfs2]
[<ffffffffa064c1dd>] ? gfs2_consist_inode_i+0x5d/0x60 [gfs2]
[<ffffffffa0631584>] ? gfs2_dinode_dealloc+0x64/0x210 [gfs2]
[<ffffffffa064a1da>] ? gfs2_delete_inode+0x1ba/0x280 [gfs2]
[<ffffffffa064a0ad>] ? gfs2_delete_inode+0x8d/0x280 [gfs2]
[<ffffffffa064a020>] ? gfs2_delete_inode+0x0/0x280 [gfs2]
[<ffffffff8118cfbe>] ? generic_delete_inode+0xde/0x1d0
[<ffffffffa062e940>] ? delete_work_func+0x0/0x80 [gfs2]
[<ffffffff8118d115>] ? generic_drop_inode+0x65/0x80
[<ffffffffa0648c4e>] ? gfs2_drop_inode+0x2e/0x30 [gfs2]
[<ffffffff8118bf82>] ? iput+0x62/0x70
[<ffffffffa062e994>] ? delete_work_func+0x54/0x80 [gfs2]
[<ffffffff810887d0>] ? worker_thread+0x170/0x2a0
[<ffffffff8108e100>] ? autoremove_wake_function+0x0/0x40
[<ffffffff81088660>] ? worker_thread+0x0/0x2a0
[<ffffffff8108dd96>] ? kthread+0x96/0xa0
[<ffffffff8100c1ca>] ? child_rip+0xa/0x20
[<ffffffff8108dd00>] ? kthread+0x0/0xa0
[<ffffffff8100c1c0>] ? child_rip+0x0/0x20
no_formal_ino = 468
no_addr = 525144
i_disksize = 65536
blocks = 0
i_goal = 525170
i_diskflags = 0x00000000
i_height = 1
i_depth = 0
i_entries = 0
i_eattr = 0
GFS2: fsid=MailCluster:indexes.0: gfs2_delete_inode: -5
If I change to different mailbox formats, they also hang; only the messages in the kernel are a little different from those in the first post. Any ideas? Best regards
On Sat, 2011-06-11 at 00:24 -0400, Aliet Santiesteban Sifontes wrote:
Hello list, we are continuing our tests of Dovecot on a RHEL 6.1 cluster backend with GFS2, and we are also using Dovecot as a director for user-to-node persistence. Everything was fine until we started stress testing the solution with imaptest: we got many deadlocks, cluster filesystem corruptions, and hangs, especially on the index filesystem. We have configured the backend as if it were on an NFS-like setup, but this does not seem to work, at least on GFS2 on RHEL 6.1.
Since you're using director, you shouldn't really need any special Dovecot config.
The specific configs we used for NFS or cluster filesystems:
mmap_disable = yes
mail_fsync = always
mail_nfs_storage = yes
mail_nfs_index = yes
fsync_disable = no
lock_method = fcntl
fsync_disable is a deprecated setting; it was replaced by mail_fsync. The mail_nfs_* settings will only slow things down; you should keep them as "no".
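Put together, a minimal sketch of the cluster-related part of the config following this advice (keeping mmap_disable and lock_method as in the original post) would be:

mail_fsync = always      # mail_fsync replaces the deprecated fsync_disable setting
mail_nfs_storage = no    # per the advice above, the mail_nfs_* settings only slow things down
mail_nfs_index = no
mmap_disable = yes       # unchanged from the original config
lock_method = fcntl      # unchanged from the original config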
... If you mmap() a file on GFS2 with a read/write mapping, but only read from it, this only counts as a read. On GFS though, it counts as a write, so GFS2 is much more scalable with mmap() I/O...
But in our config we are using mmap_disable=yes; do we have to use mmap_disable=no with GFS2?
There are more potential bugs with mmap_disable=no, since it uses both read()/write() and mmap(), while with mmap_disable=yes it only uses read()/write().
Also, how does Dovecot manage cache flushing on a GFS2 filesystem?
There shouldn't be any need for that with directors.
Why do the Dovecot indexes get corrupted if we are doing user-to-node persistence?
Looks to me like GFS is still pretty buggy.
One thing you could test is whether running imaptest directly against one backend server for one user triggers this. If not, simultaneously run another imaptest against another user on another server. Maybe then? The point being: try to find the simplest test that can break GFS, and once you have that, try to get the Red Hat people to fix it.
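As an illustration of that procedure, reusing the imaptest invocation from earlier in the thread (userfile-node0 and userfile-node1 are hypothetical user files, each listing a single, different account):

# step 1: hammer backend node 1 with a single user and watch for GFS2 errors
imaptest host=192.168.164.95 userfile=userfile-node0 port=143 mbox=mail/dovecot-crlf no_tracking logout=0 clients=20 secs=30 seed=123
# step 2 (only if step 1 stays clean): at the same time, run a different single user against backend node 2
imaptest host=192.168.164.96 userfile=userfile-node1 port=143 mbox=mail/dovecot-crlf no_tracking logout=0 clients=20 secs=30 seed=123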
participants (3)
- Aliet Santiesteban Sifontes
- Stan Hoeppner
- Timo Sirainen