[Dovecot] mmap in GFS2 on rhel 6.1

Aliet Santiesteban Sifontes alietsantiesteban at gmail.com
Sun Jun 12 20:27:28 EEST 2011


https://bugzilla.redhat.com/show_bug.cgi?id=712139

Further investigating this bug, I have tested all kinds of Dovecot configurations,
and all of them end with GFS2 hung. I have reproduced the problem on a bare-metal
hardware cluster, on virtualized cluster guests in VMware ESXi 4.1, and on a test
cluster in VMware Workstation, i.e. in every environment I tried. We are testing
whether Dovecot can be deployed on a Red Hat cluster of active-active nodes with
user session persistence.
For my latest test I simplified the scenario to a cluster on my own laptop:

1- Used a two-node RHEL 6.1 cluster, virtualized in VMware Workstation.
2- Used two shared iSCSI devices from a NAS (attached roughly as sketched below).
3- Used fence_scsi for fencing.
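
For reference, this is roughly how the shared devices were attached with
open-iscsi on each node; the portal IP here is only a placeholder:

iscsiadm -m discovery -t sendtargets -p 192.168.164.10   # discover targets on the NAS
iscsiadm -m node --login                                 # log in to every discovered target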

cluster.conf:
<?xml version="1.0"?>
<cluster config_version="9" name="MailCluster">
        <clusternodes>
                <clusternode name="node0.local" nodeid="1">
                        <fence>
                                <method name="fn_mt_scsi">
                                        <device name="fn_scsi"/>
                                </method>
                        </fence>
                        <unfence>
                                <device action="on" name="fn_scsi"/>
                        </unfence>
                </clusternode>
                <clusternode name="node1.local" nodeid="2">
                        <fence>
                                <method name="fn_mt_scsi">
                                        <device name="fn_scsi"/>
                                </method>
                        </fence>
                        <unfence>
                                <device action="on" name="fn_scsi"/>
                        </unfence>
                </clusternode>
        </clusternodes>
        <cman expected_votes="1" two_node="1"/>
        <fencedevices>
                <fencedevice agent="fence_scsi" logfile="/var/log/cluster/fence_scsi.log" name="fn_scsi"/>
        </fencedevices>
</cluster>
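
For completeness, the cluster stack has to be up on both nodes before the
shared storage is touched; on RHEL 6 that is roughly:

service cman start    # membership, fencing and the DLM
service clvmd start   # clustered LVM
service gfs2 start    # mounts the gfs2 entries from fstab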

4- Used the iSCSI devices as LVM physical volumes and created the GFS2
filesystems on top, roughly as sketched below.
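
A minimal sketch for one of the two filesystems (the device name is a
placeholder; -j 2 creates one journal per node, and the -t value must match
the cluster name in cluster.conf):

pvcreate /dev/sdb                                # shared iscsi device
vgcreate -cy vg_indexes /dev/sdb                 # -cy marks the VG as clustered
lvcreate -n lv_indexes -l 100%FREE vg_indexes
mkfs.gfs2 -p lock_dlm -t MailCluster:indexes -j 2 /dev/vg_indexes/lv_indexes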

fstab fragment

# GFS2 filesystems
/dev/vg_indexes/lv_indexes  /var/vmail/indexes  gfs2  noatime,quota=off,errors=withdraw  0 0
/dev/vg_mailbox/lv_mailbox  /var/vmail/mailbox  gfs2  noatime,quota=off,errors=withdraw  0 0

5- Dovecot configured with users in LDAP. In this case we tested the mbox
mailbox format with fcntl locking and mmap_disable=yes; we have also tested
all the other mailbox formats. Indexes and mailboxes are stored on the GFS2
filesystems. Here is the config:

[root@node0 ~]# dovecot -n
# 2.0.9: /etc/dovecot/dovecot.conf
# OS: Linux 2.6.32-131.2.1.el6.x86_64 x86_64 Red Hat Enterprise Linux Server release 6.1 (Santiago) gfs2
auth_default_realm = example.com
auth_mechanisms = plain login
auth_worker_max_count = 60
disable_plaintext_auth = no
listen = *
mail_fsync = always
mail_gid = vmail
mail_location = mbox:/var/vmail/mailbox/%d/%3n/%n:INDEX=/var/vmail/indexes/%d/%3n/%n
mail_nfs_index = yes
mail_nfs_storage = yes
mail_uid = vmail
mbox_write_locks = fcntl
mmap_disable = yes
passdb {
  args = /etc/dovecot/dovecot-ldap.conf.ext
  driver = ldap
}
ssl_cert = </etc/pki/dovecot/certs/dovecot.pem
ssl_key = </etc/pki/dovecot/private/dovecot.pem
userdb {
  args = /etc/dovecot/dovecot-ldap-userdb.conf.ext
  driver = ldap
}
[root@node0 ~]#

6- Started the Dovecot service on both nodes.
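
On RHEL 6 that amounts to the usual pair of commands on each node (enabling
the service at boot is optional here):

service dovecot start
chkconfig dovecot on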

7- From another host, ran imaptest against the first node:

imaptest host=192.168.164.95 userfile=userfile port=143 mbox=mail/dovecot-crlf no_tracking logout=0 clients=20 secs=30 seed=123

8- Repeated the test many times against that node.
9- Ran the same test against the second node:

imaptest host=192.168.164.96 userfile=userfile port=143 mbox=mail/dovecot-crlf no_tracking logout=0 clients=20 secs=30 seed=123

10- The first node hangs:

GFS2: fsid=MailCluster:indexes.0: fatal: filesystem consistency error
GFS2: fsid=MailCluster:indexes.0:   inode = 468 525144
GFS2: fsid=MailCluster:indexes.0:   function = gfs2_dinode_dealloc, file =
fs/gfs2/inode.c, line = 352
GFS2: fsid=MailCluster:indexes.0: about to withdraw this file system
GFS2: fsid=MailCluster:indexes.0: telling LM to unmount
GFS2: fsid=MailCluster:indexes.0: withdrawn
Pid: 3808, comm: delete_workqueu Not tainted 2.6.32-131.2.1.el6.x86_64 #1
Call Trace:
 [<ffffffffa064bfd2>] ? gfs2_lm_withdraw+0x102/0x130 [gfs2]
 [<ffffffffa0621209>] ? trunc_dealloc+0xa9/0x130 [gfs2]
 [<ffffffffa064c1dd>] ? gfs2_consist_inode_i+0x5d/0x60 [gfs2]
 [<ffffffffa0631584>] ? gfs2_dinode_dealloc+0x64/0x210 [gfs2]
 [<ffffffffa064a1da>] ? gfs2_delete_inode+0x1ba/0x280 [gfs2]
 [<ffffffffa064a0ad>] ? gfs2_delete_inode+0x8d/0x280 [gfs2]
 [<ffffffffa064a020>] ? gfs2_delete_inode+0x0/0x280 [gfs2]
 [<ffffffff8118cfbe>] ? generic_delete_inode+0xde/0x1d0
 [<ffffffffa062e940>] ? delete_work_func+0x0/0x80 [gfs2]
 [<ffffffff8118d115>] ? generic_drop_inode+0x65/0x80
 [<ffffffffa0648c4e>] ? gfs2_drop_inode+0x2e/0x30 [gfs2]
 [<ffffffff8118bf82>] ? iput+0x62/0x70
 [<ffffffffa062e994>] ? delete_work_func+0x54/0x80 [gfs2]
 [<ffffffff810887d0>] ? worker_thread+0x170/0x2a0
 [<ffffffff8108e100>] ? autoremove_wake_function+0x0/0x40
 [<ffffffff81088660>] ? worker_thread+0x0/0x2a0
 [<ffffffff8108dd96>] ? kthread+0x96/0xa0
 [<ffffffff8100c1ca>] ? child_rip+0xa/0x20
 [<ffffffff8108dd00>] ? kthread+0x0/0xa0
 [<ffffffff8100c1c0>] ? child_rip+0x0/0x20
  no_formal_ino = 468
  no_addr = 525144
  i_disksize = 65536
  blocks = 0
  i_goal = 525170
  i_diskflags = 0x00000000
  i_height = 1
  i_depth = 0
  i_entries = 0
  i_eattr = 0
GFS2: fsid=MailCluster:indexes.0: gfs2_delete_inode: -5

If I change to different mailbox formats, they also hang; only the kernel
messages differ slightly from the ones in my first post.
Any ideas?
Best regards



2011/6/11 Stan Hoeppner <stan at hardwarefreak.com>

> On 6/10/2011 11:24 PM, Aliet Santiesteban Sifontes wrote:
> > Hello list, we continue our tests using Dovecot on a RHEL 6.1 cluster
> > backend with GFS2; we are also using Dovecot as a director for user node
> > persistence. Everything was OK until we started stress testing the
> > solution with imaptest: we had many deadlocks, cluster filesystem
> > corruptions and hangs, especially in the index filesystem. We have
> > configured the backend as if it were on an NFS-like setup, but this
> > seems not to work, at least on GFS2 on RHEL 6.1.
>
> Actual _filesystem_ corruption is typically unrelated to user space
> applications.  You should be looking at a lower level for the cause,
> i.e. kernel, device driver, hardware, etc.  Please post details of your
> shared storage hardware environment, including HBAs, SAN array
> brand/type, if you're using GFS2 over DRBD, etc.
>
> > We have a two node cluster sharing two GFS2 filesystem
> > - Index GFS2 filesystem to store users indexes
> > - Mailbox data on a GFS2 filesystem
>
> Experience of many users has shown that neither popular cluster
> filesystems such as GFS2/OCFS2 nor NFS handles high metadata/IOPS
> workloads very well, especially those that make heavy use of locking.
>
> > The specific configs for NFS or cluster filesystem we used:
> >
> > mmap_disable = yes
> > mail_fsync = always
> > mail_nfs_storage = yes
> > mail_nfs_index = yes
> > fsync_disable=no
> > lock_method = fcntl
> >
> > mail location :
> >
> > mail_location = mdbox:/var/vmail/%d/%3n/%n/mdbox:INDEX=/var/indexes/%d/%3n/%n
>
> For a Dovecot cluster using shared storage, you are probably better off
> using a mailbox format for which indexes are independent of mailbox
> files and are automatically [re]generated if absent.
>
> Try using mbox or maildir and store indexes on local node disk/SSD
> instead of on the cluster filesystem.  Only store the mailboxes on the
> cluster filesystem.  If for any reason a user login gets bumped to a
> node lacking the index files they're automatically rebuilt.
>
> Since dbox indexes aren't automatically generated if missing, you can't
> do what I describe above with dbox storage.  Given the limitations of
> cluster filesystem (and NFS) metadata IOPS and locking, you'll likely
> achieve best performance and stability using local disk index files and
> mbox format mailboxes on GFS2.  Maildir format works in this setup as
> well, but the metadata load on the cluster filesystem is much higher,
> and thus peak performance will typically be lower.
>
> --
> Stan
>
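
A minimal sketch of what Stan suggests, keeping the mbox mailboxes on GFS2
and moving only the indexes to a node-local path (the local path below is
just an example):

mail_location = mbox:/var/vmail/mailbox/%d/%3n/%n:INDEX=/var/local/indexes/%d/%3n/%n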

