https://bugzilla.redhat.com/show_bug.cgi?id=712139
Further investigating this bug, I have tested all kinds of configurations with Dovecot, and all of them end up hanging GFS2. I have reproduced the problem on a bare-metal hardware cluster, on virtualized cluster guests in VMware ESXi 4.1, and on a test cluster in VMware Workstation, so it shows up in every environment we try. We are evaluating whether Dovecot can be deployed on a Red Hat Cluster of active-active nodes with user session persistence. For my last test I simplified the scenario to a cluster running on my own laptop:
1- Used a two-node RHEL 6.1 cluster, virtualized in VMware Workstation.
2- Used two shared iSCSI devices from a NAS.
3- Used fence_scsi.
Cluster.conf
<?xml version="1.0"?>
<cluster config_version="9" name="MailCluster">
  <clusternodes>
    <clusternode name="node0.local" nodeid="1">
      <fence>
        <method name="fn_mt_scsi">
          <device name="fn_scsi"/>
        </method>
      </fence>
      <unfence>
        <device action="on" name="fn_scsi"/>
      </unfence>
    </clusternode>
    <clusternode name="node1.local" nodeid="2">
      <fence>
        <method name="fn_mt_scsi">
          <device name="fn_scsi"/>
        </method>
      </fence>
      <unfence>
        <device action="on" name="fn_scsi"/>
      </unfence>
    </clusternode>
  </clusternodes>
  <cman expected_votes="1" two_node="1"/>
  <fencedevices>
    <fencedevice agent="fence_scsi" logfile="/var/log/cluster/fence_scsi.log" name="fn_scsi"/>
  </fencedevices>
</cluster>
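As a sanity check before mounting GFS2 I confirm both nodes are cluster members (these are generic RHEL 6 cluster commands, not taken from my original notes):

cman_tool nodes   # both nodes should be listed with status "M" (member)
clustat           # overall cluster membership and status view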
4- Used the iSCSI devices for the LVM layer and created the GFS2 filesystems on top of it (a rough sketch of the commands follows the fstab fragment).
fstab fragment
# GFS2 filesystems
/dev/vg_indexes/lv_indexes  /var/vmail/indexes  gfs2  noatime,quota=off,errors=withdraw  0 0
/dev/vg_mailbox/lv_mailbox  /var/vmail/mailbox  gfs2  noatime,quota=off,errors=withdraw  0 0
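For reference, roughly how the clustered volumes and GFS2 filesystems above were created (the device names /dev/sdb and /dev/sdc are placeholders for the iSCSI LUNs, the lock-table names follow the mount points, and -j 2 gives one journal per node):

pvcreate /dev/sdb /dev/sdc
vgcreate -cy vg_indexes /dev/sdb
vgcreate -cy vg_mailbox /dev/sdc
lvcreate -n lv_indexes -l 100%FREE vg_indexes
lvcreate -n lv_mailbox -l 100%FREE vg_mailbox
mkfs.gfs2 -p lock_dlm -t MailCluster:indexes -j 2 /dev/vg_indexes/lv_indexes
mkfs.gfs2 -p lock_dlm -t MailCluster:mailbox -j 2 /dev/vg_mailbox/lv_mailbox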
5- Dovecot configured with users in LDAP. In this case we tested the mbox mailbox format with fcntl locking and mmap_disable=yes; we have also tested all the other mailbox formats, with indexes and mailboxes stored on the GFS2 filesystems. Here is the configuration:
[root@node0 ~]# dovecot -n
# 2.0.9: /etc/dovecot/dovecot.conf
# OS: Linux 2.6.32-131.2.1.el6.x86_64 x86_64 Red Hat Enterprise Linux Server release 6.1 (Santiago) gfs2
auth_default_realm = example.com
auth_mechanisms = plain login
auth_worker_max_count = 60
disable_plaintext_auth = no
listen = *
mail_fsync = always
mail_gid = vmail
mail_location = mbox:/var/vmail/mailbox/%d/%3n/%n:INDEX=/var/vmail/indexes/%d/%3n/%n
mail_nfs_index = yes
mail_nfs_storage = yes
mail_uid = vmail
mbox_write_locks = fcntl
mmap_disable = yes
passdb {
  args = /etc/dovecot/dovecot-ldap.conf.ext
  driver = ldap
}
ssl_cert =
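The LDAP lookup details live in /etc/dovecot/dovecot-ldap.conf.ext; they are not relevant to the hang, but for completeness the file is along these lines (server, base DN and filters here are placeholders, not our real values):

uris = ldap://ldap.example.com
auth_bind = yes
base = ou=People,dc=example,dc=com
user_filter = (&(objectClass=posixAccount)(uid=%n))
pass_filter = (&(objectClass=posixAccount)(uid=%n))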
6- Started the dovecot service on both nodes.
7- From another host, ran imaptest against the first node:
imaptest host=192.168.164.95 userfile=userfile port=143 mbox=mail/dovecot-crlf no_tracking logout=0 clients=20 secs=30 seed=123
8- Repeated the test many times against that node.
9- Ran the test against the second node:
imaptest host=192.168.164.96 userfile=userfile port=143 mbox=mail/dovecot-crlf no_tracking logout=0 clients=20 secs=30 seed=123
10- The first node hangs:
GFS2: fsid=MailCluster:indexes.0: fatal: filesystem consistency error
GFS2: fsid=MailCluster:indexes.0:   inode = 468 525144
GFS2: fsid=MailCluster:indexes.0:   function = gfs2_dinode_dealloc, file = fs/gfs2/inode.c, line = 352
GFS2: fsid=MailCluster:indexes.0: about to withdraw this file system
GFS2: fsid=MailCluster:indexes.0: telling LM to unmount
GFS2: fsid=MailCluster:indexes.0: withdrawn
Pid: 3808, comm: delete_workqueu Not tainted 2.6.32-131.2.1.el6.x86_64 #1
Call Trace:
 [<ffffffffa064bfd2>] ? gfs2_lm_withdraw+0x102/0x130 [gfs2]
 [<ffffffffa0621209>] ? trunc_dealloc+0xa9/0x130 [gfs2]
 [<ffffffffa064c1dd>] ? gfs2_consist_inode_i+0x5d/0x60 [gfs2]
 [<ffffffffa0631584>] ? gfs2_dinode_dealloc+0x64/0x210 [gfs2]
 [<ffffffffa064a1da>] ? gfs2_delete_inode+0x1ba/0x280 [gfs2]
 [<ffffffffa064a0ad>] ? gfs2_delete_inode+0x8d/0x280 [gfs2]
 [<ffffffffa064a020>] ? gfs2_delete_inode+0x0/0x280 [gfs2]
 [<ffffffff8118cfbe>] ? generic_delete_inode+0xde/0x1d0
 [<ffffffffa062e940>] ? delete_work_func+0x0/0x80 [gfs2]
 [<ffffffff8118d115>] ? generic_drop_inode+0x65/0x80
 [<ffffffffa0648c4e>] ? gfs2_drop_inode+0x2e/0x30 [gfs2]
 [<ffffffff8118bf82>] ? iput+0x62/0x70
 [<ffffffffa062e994>] ? delete_work_func+0x54/0x80 [gfs2]
 [<ffffffff810887d0>] ? worker_thread+0x170/0x2a0
 [<ffffffff8108e100>] ? autoremove_wake_function+0x0/0x40
 [<ffffffff81088660>] ? worker_thread+0x0/0x2a0
 [<ffffffff8108dd96>] ? kthread+0x96/0xa0
 [<ffffffff8100c1ca>] ? child_rip+0xa/0x20
 [<ffffffff8108dd00>] ? kthread+0x0/0xa0
 [<ffffffff8100c1c0>] ? child_rip+0x0/0x20
  no_formal_ino = 468
  no_addr = 525144
  i_disksize = 65536
  blocks = 0
  i_goal = 525170
  i_diskflags = 0x00000000
  i_height = 1
  i_depth = 0
  i_entries = 0
  i_eattr = 0
GFS2: fsid=MailCluster:indexes.0: gfs2_delete_inode: -5
If I change to different mailbox formats they also hang; only the kernel messages are slightly different from the ones in the first post. Any ideas? Best regards
2011/6/11 Stan Hoeppner stan@hardwarefreak.com
On 6/10/2011 11:24 PM, Aliet Santiesteban Sifontes wrote:
Hello list, we continue our tests using Dovecot on a RHEL 6.1 cluster backend with GFS2; we are also using Dovecot as a director for user-node persistence. Everything was fine until we started stress testing the solution with imaptest: we got many deadlocks, cluster filesystem corruptions and hangs, especially on the index filesystem. We have configured the backend as if it were an NFS-like setup, but this does not seem to work, at least with GFS2 on RHEL 6.1.
Actual _filesystem_ corruption is typically unrelated to user space applications. You should be looking at a lower level for the cause, i.e. kernel, device driver, hardware, etc. Please post details of your shared storage hardware environment, including HBAs, SAN array brand/type, if you're using GFS2 over DRBD, etc.
We have a two-node cluster sharing two GFS2 filesystems:
- Index GFS2 filesystem to store users indexes
- Mailbox data on a GFS2 filesystem
Experience of many users has shown that neither popular cluster filesystems such as GFS2/OCFS, nor NFS, handle high metadata/IOPS workloads very well, especially those that make heavy use of locking.
The specific settings we used for the NFS/cluster-filesystem case:
mmap_disable = yes
mail_fsync = always
mail_nfs_storage = yes
mail_nfs_index = yes
fsync_disable = no
lock_method = fcntl
mail location:
mail_location = mdbox:/var/vmail/%d/%3n/%n/mdbox:INDEX=/var/indexes/%d/%3n/%n
For a Dovecot cluster using shared storage, you are probably better off using a mailbox format for which indexes are independent of mailbox files and are automatically [re]generated if absent.
Try using mbox or maildir and store indexes on local node disk/SSD instead of on the cluster filesystem. Only store the mailboxes on the cluster filesystem. If for any reason a user login gets bumped to a node lacking the index files they're automatically rebuilt.
Since dbox indexes aren't automatically generated if missing you can't do what I describe above with dbox storage. Given the limitations of cluster filesystem (and NFS) metadata IOPS and locking, you'll likely achieve best performance and stability using local disk index files and mbox format mailboxes on GFS2. Maildir format works in this setup as well, but the metadata load on the cluster filesystem is much higher, and thus peak performance will typically be lower.
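For example, something along these lines (a sketch only; the paths are illustrative and reuse your current layout, with only the mailbox tree on GFS2 and the index tree on each node's local disk):

# /var/vmail/mailbox = GFS2 (shared), /var/indexes = local disk/SSD on each node
mail_location = mbox:/var/vmail/mailbox/%d/%3n/%n:INDEX=/var/indexes/%d/%3n/%n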
-- Stan