[Dovecot] mmap in GFS2 on rhel 6.1
Hello list, we are continuing our tests of Dovecot on a RHEL 6.1 cluster backend with GFS2, and we are also using Dovecot as a director for user-to-node persistence. Everything was fine until we started stress testing the solution with imaptest: we got many deadlocks, cluster filesystem corruptions, and hangs, especially on the index filesystem. We have configured the backend as if it were on an NFS-like setup, but this does not seem to work, at least on GFS2 on RHEL 6.1. We have a two-node cluster sharing two GFS2 filesystems:
- An index GFS2 filesystem to store the user indexes
- A GFS2 filesystem for the mailbox data
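For context on the director part of the setup, here is a minimal sketch of what a Dovecot 2.0 director proxy config could look like; this is an illustrative sketch, not the poster's actual config. The director IPs are hypothetical, while the backend IPs reuse the ones appearing later in this thread:

# hypothetical director ring members (assumption)
director_servers = 192.168.164.90 192.168.164.91
# GFS2 backend nodes (IPs taken from the imaptest runs later in the thread)
director_mail_servers = 192.168.164.95 192.168.164.96
service director {
  unix_listener login/director {
    mode = 0666
  }
  fifo_listener login/proxy-notify {
    mode = 0666
  }
  inet_listener {
    port = 9090
  }
}
service imap-login {
  executable = imap-login director
}

A real director setup would also need a passdb on the director host that returns proxy=y (or equivalent) so logins get forwarded to the backend chosen for that user.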
The specific configs we used for NFS or cluster filesystems:
mmap_disable = yes
mail_fsync = always
mail_nfs_storage = yes
mail_nfs_index = yes
fsync_disable = no
lock_method = fcntl
mail location :
mail_location = mdbox:/var/vmail/%d/%3n/%n/mdbox:INDEX=/var/indexes/%d/%3n/%n
But this does not seem to work for GFS2, even when doing user-to-node persistence: the maillog is plagued with errors and GFS2 hangs under stress testing with imaptest, with many corrupted indexes, transaction logs, etc. At this point we have many questions, the first one about mmap. In the Red Hat GFS2 docs we read:
Golden rules for performance:
- An inode is used in a read-only fashion across all nodes.
- An inode is written or modified from a single node only.
We have successfully achieved this using the Dovecot director.
Now, for mmap, Red Hat says:
... If you mmap() a file on GFS2 with a read/write mapping, but only read from it, this only counts as a read. On GFS though, it counts as a write, so GFS2 is much more scalable with mmap() I/O...
But in our config we are using mmap_disable=yes; do we have to use mmap_disable=no with GFS2?
Also, how does Dovecot manage cache flushing on a GFS2 filesystem?
Why do the Dovecot indexes get corrupted if we are doing user-to-node persistence?
What lock method do we have to use?
How should fsync be used?
We know we have many questions, but this is really complex stuff and we would appreciate any help you can give us.
Thank you all for the great work, especially Timo... best regards
On 6/10/2011 11:24 PM, Aliet Santiesteban Sifontes wrote:
Hello list, we are continuing our tests of Dovecot on a RHEL 6.1 cluster backend with GFS2, and we are also using Dovecot as a director for user-to-node persistence. Everything was fine until we started stress testing the solution with imaptest: we got many deadlocks, cluster filesystem corruptions, and hangs, especially on the index filesystem. We have configured the backend as if it were on an NFS-like setup, but this does not seem to work, at least on GFS2 on RHEL 6.1.
Actual _filesystem_ corruption is typically unrelated to user space applications. You should be looking at a lower level for the cause, i.e. kernel, device driver, hardware, etc. Please post details of your shared storage hardware environment, including HBAs, SAN array brand/type, if you're using GFS2 over DRBD, etc.
We have a two-node cluster sharing two GFS2 filesystems:
- An index GFS2 filesystem to store the user indexes
- A GFS2 filesystem for the mailbox data
Experience of many users has shown that neither popular cluster filesystems such as GFS2/OCFS, nor NFS, handle high metadata/IOPS workloads very well, especially those that make heavy use of locking.
The specific configs we used for NFS or cluster filesystems:
mmap_disable = yes
mail_fsync = always
mail_nfs_storage = yes
mail_nfs_index = yes
fsync_disable = no
lock_method = fcntl
mail location :
mail_location = mdbox:/var/vmail/%d/%3n/%n/mdbox:INDEX=/var/indexes/%d/%3n/%n
For a Dovecot cluster using shared storage, you are probably better off using a mailbox format for which indexes are independent of mailbox files and are automatically [re]generated if absent.
Try using mbox or maildir and store indexes on local node disk/SSD instead of on the cluster filesystem. Only store the mailboxes on the cluster filesystem. If for any reason a user login gets bumped to a node lacking the index files, they're automatically rebuilt.
Since dbox indexes aren't automatically generated if missing, you can't do what I describe above with dbox storage. Given the limitations of cluster filesystem (and NFS) metadata IOPS and locking, you'll likely achieve the best performance and stability using local disk index files and mbox format mailboxes on GFS2. Maildir format works in this setup as well, but the metadata load on the cluster filesystem is much higher, and thus peak performance will typically be lower.
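As a minimal sketch of what that could look like (the mailbox path mirrors the one used later in this thread; /var/local/indexes is a hypothetical local-disk path, not from the original post):

# INDEX points at a hypothetical local filesystem on each node (assumption), not at GFS2
mail_location = mbox:/var/vmail/mailbox/%d/%3n/%n:INDEX=/var/local/indexes/%d/%3n/%n

Here the mailbox path stays on the GFS2 mount while the INDEX path lives on each node's local disk or SSD, so the indexes are simply rebuilt if a user lands on the other node.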
-- Stan
https://bugzilla.redhat.com/show_bug.cgi?id=712139
Further investigating this bug, I have tested all kinds of configs with Dovecot, and all of them get GFS2 hung. I have tested this scenario with a bare-metal hardware cluster, with virtualized cluster guests on VMware ESXi 4.1, and with a test cluster in VMware Workstation, and I can reproduce the problem in all the tests, even in different environments. We are testing whether Dovecot can be deployed on a Red Hat cluster of active-active nodes doing user session persistence. This was my last test; I simplified the scenario with a cluster on my own laptop:
1- Used a two-node RHEL 6.1 cluster, virtualized in VMware Workstation.
2- Used two shared iSCSI devices from a NAS.
3- Used fence_scsi.
Cluster.conf
<?xml version="1.0"?>
<cluster config_version="9" name="MailCluster">
  <clusternodes>
    <clusternode name="node0.local" nodeid="1">
      <fence>
        <method name="fn_mt_scsi">
          <device name="fn_scsi"/>
        </method>
      </fence>
      <unfence>
        <device action="on" name="fn_scsi"/>
      </unfence>
    </clusternode>
    <clusternode name="node1.local" nodeid="2">
      <fence>
        <method name="fn_mt_scsi">
          <device name="fn_scsi"/>
        </method>
      </fence>
      <unfence>
        <device action="on" name="fn_scsi"/>
      </unfence>
    </clusternode>
  </clusternodes>
  <cman expected_votes="1" two_node="1"/>
  <fencedevices>
    <fencedevice agent="fence_scsi" logfile="/var/log/cluster/fence_scsi.log" name="fn_scsi"/>
  </fencedevices>
</cluster>
4- Used the iSCSI devices for the LVM layer and created the GFS2 filesystems on top of it.
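For reference, a rough sketch of what that step could look like. The physical device names /dev/sdb and /dev/sdc are placeholders, the VG/LV names come from the fstab fragment below, the MailCluster:indexes lock table name comes from the kernel messages later in this post, and MailCluster:mailbox is assumed by analogy:

# hypothetical iSCSI block devices; actual device names will differ
pvcreate /dev/sdb /dev/sdc
# with clvmd, the VGs would typically be created as clustered (vgcreate -cy)
vgcreate vg_indexes /dev/sdb
vgcreate vg_mailbox /dev/sdc
lvcreate -l 100%FREE -n lv_indexes vg_indexes
lvcreate -l 100%FREE -n lv_mailbox vg_mailbox
# lock_dlm for clustered access, 2 journals for the two nodes
mkfs.gfs2 -p lock_dlm -t MailCluster:indexes -j 2 /dev/vg_indexes/lv_indexes
mkfs.gfs2 -p lock_dlm -t MailCluster:mailbox -j 2 /dev/vg_mailbox/lv_mailbox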
fstab fragment
# GFS2 filesystem
/dev/vg_indexes/lv_indexes /var/vmail/indexes gfs2 noatime,quota=off,errors=withdraw 0 0
/dev/vg_mailbox/lv_mailbox /var/vmail/mailbox gfs2 noatime,quota=off,errors=withdraw 0 0
5- Dovecot configured with users in LDAP. In this case we tested the mbox mailbox format with fcntl and mmap_disable=yes; we have also tested all the other mailbox formats. Indexes and mailboxes are stored on the GFS2 filesystems. Here is the conf:
[root@node0 ~]# dovecot -n
# 2.0.9: /etc/dovecot/dovecot.conf
# OS: Linux 2.6.32-131.2.1.el6.x86_64 x86_64 Red Hat Enterprise Linux Server release 6.1 (Santiago) gfs2
auth_default_realm = example.com
auth_mechanisms = plain login
auth_worker_max_count = 60
disable_plaintext_auth = no
listen = *
mail_fsync = always
mail_gid = vmail
mail_location = mbox:/var/vmail/mailbox/%d/%3n/%n:INDEX=/var/vmail/indexes/%d/%3n/%n
mail_nfs_index = yes
mail_nfs_storage = yes
mail_uid = vmail
mbox_write_locks = fcntl
mmap_disable = yes
passdb {
  args = /etc/dovecot/dovecot-ldap.conf.ext
  driver = ldap
}
ssl_cert =
6- Started the dovecot service on the nodes.
7- From another host, ran imaptest against the first node:
imaptest host=192.168.164.95 userfile=userfile port=143 mbox=mail/dovecot-crlf no_tracking logout=0 clients=20 secs=30 seed=123
8- Repeated the test many times against that node.
9- Ran the test against the second node:
imaptest host=192.168.164.96 userfile=userfile port=143 mbox=mail/dovecot-crlf no_tracking logout=0 clients=20 secs=30 seed=123
10- The first node hangs:
GFS2: fsid=MailCluster:indexes.0: fatal: filesystem consistency error
GFS2: fsid=MailCluster:indexes.0: inode = 468 525144
GFS2: fsid=MailCluster:indexes.0: function = gfs2_dinode_dealloc, file = fs/gfs2/inode.c, line = 352
GFS2: fsid=MailCluster:indexes.0: about to withdraw this file system
GFS2: fsid=MailCluster:indexes.0: telling LM to unmount
GFS2: fsid=MailCluster:indexes.0: withdrawn
Pid: 3808, comm: delete_workqueu Not tainted 2.6.32-131.2.1.el6.x86_64 #1
Call Trace:
[<ffffffffa064bfd2>] ? gfs2_lm_withdraw+0x102/0x130 [gfs2]
[<ffffffffa0621209>] ? trunc_dealloc+0xa9/0x130 [gfs2]
[<ffffffffa064c1dd>] ? gfs2_consist_inode_i+0x5d/0x60 [gfs2]
[<ffffffffa0631584>] ? gfs2_dinode_dealloc+0x64/0x210 [gfs2]
[<ffffffffa064a1da>] ? gfs2_delete_inode+0x1ba/0x280 [gfs2]
[<ffffffffa064a0ad>] ? gfs2_delete_inode+0x8d/0x280 [gfs2]
[<ffffffffa064a020>] ? gfs2_delete_inode+0x0/0x280 [gfs2]
[<ffffffff8118cfbe>] ? generic_delete_inode+0xde/0x1d0
[<ffffffffa062e940>] ? delete_work_func+0x0/0x80 [gfs2]
[<ffffffff8118d115>] ? generic_drop_inode+0x65/0x80
[<ffffffffa0648c4e>] ? gfs2_drop_inode+0x2e/0x30 [gfs2]
[<ffffffff8118bf82>] ? iput+0x62/0x70
[<ffffffffa062e994>] ? delete_work_func+0x54/0x80 [gfs2]
[<ffffffff810887d0>] ? worker_thread+0x170/0x2a0
[<ffffffff8108e100>] ? autoremove_wake_function+0x0/0x40
[<ffffffff81088660>] ? worker_thread+0x0/0x2a0
[<ffffffff8108dd96>] ? kthread+0x96/0xa0
[<ffffffff8100c1ca>] ? child_rip+0xa/0x20
[<ffffffff8108dd00>] ? kthread+0x0/0xa0
[<ffffffff8100c1c0>] ? child_rip+0x0/0x20
no_formal_ino = 468
no_addr = 525144
i_disksize = 65536
blocks = 0
i_goal = 525170
i_diskflags = 0x00000000
i_height = 1
i_depth = 0
i_entries = 0
i_eattr = 0
GFS2: fsid=MailCluster:indexes.0: gfs2_delete_inode: -5
If I change to different mailbox formats, they also hang; only the messages in the kernel are a little different from those in the first post. Any ideas? Best regards
On Sat, 2011-06-11 at 00:24 -0400, Aliet Santiesteban Sifontes wrote:
Hello list, we are continuing our tests of Dovecot on a RHEL 6.1 cluster backend with GFS2, and we are also using Dovecot as a director for user-to-node persistence. Everything was fine until we started stress testing the solution with imaptest: we got many deadlocks, cluster filesystem corruptions, and hangs, especially on the index filesystem. We have configured the backend as if it were on an NFS-like setup, but this does not seem to work, at least on GFS2 on RHEL 6.1.
Since you're using director, you shouldn't really need any special Dovecot config.
The specific configs we used for NFS or cluster filesystems:
mmap_disable = yes
mail_fsync = always
mail_nfs_storage = yes
mail_nfs_index = yes
fsync_disable = no
lock_method = fcntl
fsync_disable is a deprecated setting; it was replaced by mail_fsync. The mail_nfs_* settings will only slow things down; you should keep them as "no".
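Put together, a minimal sketch of the cluster-related part of the config following this advice (keeping mmap_disable and lock_method as in the original post) would be:

mail_fsync = always      # mail_fsync replaces the deprecated fsync_disable setting
mail_nfs_storage = no    # per the advice above, the mail_nfs_* settings only slow things down
mail_nfs_index = no
mmap_disable = yes       # unchanged from the original config
lock_method = fcntl      # unchanged from the original config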
... If you mmap() a file on GFS2 with a read/write mapping, but only read from it, this only counts as a read. On GFS though, it counts as a write, so GFS2 is much more scalable with mmap() I/O...
But in our config we are using mmap_disable=yes; do we have to use mmap_disable=no with GFS2?
There are more potential bugs with mmap_disable=no, since it uses both read()/write() and mmap(), while with mmap_disable=yes it only uses read()/write().
Also, how does Dovecot manage cache flushing on a GFS2 filesystem?
There shouldn't be any need for that with directors.
Why do the Dovecot indexes get corrupted if we are doing user-to-node persistence?
Looks to me like GFS is still pretty buggy.
One thing you could test is whether running imaptest directly against one backend server for one user triggers this. If not, simultaneously run another imaptest against another user on another server. Maybe then? The point being: try to find the simplest test that can break GFS, and once you have that, try to get the Red Hat people to fix it.
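As an illustration of that procedure, reusing the imaptest invocation from earlier in the thread (userfile-node0 and userfile-node1 are hypothetical user files, each listing a single, different account):

# step 1: hammer backend node 1 with a single user and watch for GFS2 errors
imaptest host=192.168.164.95 userfile=userfile-node0 port=143 mbox=mail/dovecot-crlf no_tracking logout=0 clients=20 secs=30 seed=123
# step 2 (only if step 1 stays clean): at the same time, run a different single user against backend node 2
imaptest host=192.168.164.96 userfile=userfile-node1 port=143 mbox=mail/dovecot-crlf no_tracking logout=0 clients=20 secs=30 seed=123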
participants (3)
- Aliet Santiesteban Sifontes
- Stan Hoeppner
- Timo Sirainen