[Dovecot] Direct groups of users to pairs of backend servers
Hi All,
I am using Dovecot in the Director setup, with multiple proxies and backend mailstores, and user information stored in LDAP. I am aware users can be directed to a single backend server. It would be useful to be able to direct groups of users to pairs of backend servers, to give some fault tolerance against NFS issues and make the whole thing more scalable. Otherwise each backend mailstore will need all the NFS mounts, and the whole cluster will be affected if one NFS mount has an issue. I am not sure whether this is possible with the current Dovecot implementation; if not, it would be a great enhancement.
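For context, per-user backend assignment is already expressible in the passdb: any passdb that returns the proxy and host extra fields pins that login to a given backend. Below is a minimal dovecot-ldap.conf.ext sketch, where mailHost is a hypothetical custom LDAP attribute naming the backend for that user:

    # dovecot-ldap.conf.ext -- minimal sketch; 'mailHost' is a hypothetical
    # custom LDAP attribute holding the backend address for this user.
    hosts       = ldap.example.com
    auth_bind   = yes
    base        = ou=people,dc=example,dc=com
    pass_filter = (&(objectClass=inetOrgPerson)(uid=%u))
    # 'mailHost=host' maps the attribute to the proxy destination;
    # '=proxy=y' is a static extra field enabling proxying for every match.
    pass_attrs  = uid=user, mailHost=host, =proxy=y

Note that with Director in the ring the director normally chooses the backend itself; the static-host approach above is the plain-proxy variant of the idea rather than a Director feature.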
Thanks
Murray
On 3/24/2014 10:02 PM, Murray Trainer wrote:
> Hi All,
>
> I am using Dovecot in the Director setup, with multiple proxies and backend mailstores, and user information stored in LDAP. I am aware users can be directed to a single backend server. It would be useful to be able to direct groups of users to pairs of backend servers, to give some fault tolerance against NFS issues and make the whole thing more scalable.
Your description says you currently have a "shared nothing" storage architecture. You can't get any more scalable than that. To enable "groups of users" to be directed to "pairs of backend servers" you'll need each member of the pair to mount the NFS path of the partner server.
Then you will have two different mailbox locations to deal with. Do you have per-user mailbox paths configured in LDAP? You will have to do that for this "pairing" to work.
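For what it's worth, the per-user path can come straight from the userdb: Dovecot's LDAP userdb maps arbitrary attributes onto userdb fields, including mail. A minimal sketch for dovecot-ldap.conf.ext, where mailLocation is a hypothetical custom attribute holding a full location such as maildir:/srv/mail/nfs03/jsmith:

    # dovecot-ldap.conf.ext -- minimal sketch; 'mailLocation' is a
    # hypothetical custom attribute, e.g. "maildir:/srv/mail/nfs03/jsmith".
    user_filter = (&(objectClass=inetOrgPerson)(uid=%u))
    # map LDAP attributes onto Dovecot userdb fields:
    user_attrs  = homeDirectory=home, mailLocation=mail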
> Otherwise each backend mailstore will need all the NFS mounts, and the whole cluster will be affected if one NFS mount has an issue.
The whole cluster will not be affected. Only users whose mail is on the problem mount will be affected. This is no different than your current setup in that regard.
> I am not sure whether this is possible with the current Dovecot implementation; if not, it would be a great enhancement.
So, in a nutshell, you want Dovecot to be able to overcome faults in your NFS architecture because you did not build in redundancy? Is this correct?
Why are you concerned about NFS mount failures? Most folks running NFS Dovecot clusters share a single mount, with all mailboxes, among all the cluster nodes. You seem to have multiple mounts, one for each backend node. If mount failures were a common occurrence, we'd see frequent reports of them. But we don't. Did you home-brew your NFS servers, and are they unreliable?
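For reference, the commonly cited dovecot.conf settings for keeping mailboxes on NFS look roughly like the sketch below (Dovecot 2.x; treat it as a starting point and check the Dovecot NFS documentation rather than copying blindly):

    # dovecot.conf -- commonly cited NFS settings (Dovecot 2.x).
    mmap_disable     = yes     # mmap and NFS attribute caching don't mix well
    mail_fsync       = always  # flush writes so other nodes see them promptly
    mail_nfs_storage = yes     # flush NFS caches around mail file access
    mail_nfs_index   = yes     # same for index files, if indexes live on NFS
    lock_method      = fcntl   # fcntl locking works over NFS; flock may not

With a working Director in front, the documentation suggests the two mail_nfs_* flushes can usually stay off, since each user is only ever served by one backend at a time.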
Cheers,
-- Stan
On 25/03/14 15:06, Stan Hoeppner wrote:
> [snip]

Hi Stan,
Sorry, I didn't properly explain my setup. The backend mailstores each have the same set of 5 NFS mounts from EMC VNX storage, where the mailboxes are located. We don't share NFS filesystems directly between mailstores. There is no relation between the number of NFS mounts and the number of backend mailstores.

We are talking about migrating a large number of users and mailboxes - 100,000+ and 50TB+ - and don't want to put that all on one NFS filesystem. We want to break it down into redundant parts so that all the mailstores don't stop functioning if there is a problem with one NFS filesystem.

Our NFS storage should be pretty reliable, but the email below on this list about a week ago made me concerned about all our mailstores hanging if there is a problem with one of the NFS mounts. Hence the query about breaking up the NFS mounts into groups per pair of mailstores.

We will eventually set up mail servers and redundant EMC storage across separate data centres and use pNFS, which will make the whole thing more fault tolerant, but that won't happen for a while.
Thanks for your response.
Murray
[Dovecot] NFS not responding generates authentication crash

I am facing Dovecot authentication problems caused by an unresponsive NFS server. If there is even a short break in communication with the NFS server keeping the maildirs, Dovecot generates an avalanche of processes (dovecot/imap and dovecot/pop3). The real number of connections was about 50, and after the problem occurs it rises to 1000. After about 3 hours the limit of connections is filled up:

dovecot: master: Warning: service(auth): client_limit (1000) reached, client connections are being dropped

and next:

imap-login: Warning: Auth process not responding, delayed sending greeting
pop3-login: Warning: Error sending handshake to auth server: Broken pipe
imap-login: Warning: Error sending handshake to auth server: Broken pipe
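A side note on the report above: the 1000-connection ceiling it hits is the auth service's client_limit, which must cover every process that talks to the auth socket. Raising it, and capping how many mail processes can accumulate, only papers over the symptom rather than fixing the NFS hang, but a sketch with illustrative values would be:

    # dovecot.conf -- illustrative values only, not recommendations.
    service auth {
      client_limit = 5000      # default 1000: the ceiling hit in the report
    }
    service imap {
      process_limit = 1024     # cap how many imap processes can pile up
    }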
On 3/25/2014 8:18 AM, Murray Trainer wrote:
> On 25/03/14 15:06, Stan Hoeppner wrote:
>> [snip]
> Hi Stan,
> Sorry, I didn't properly explain my setup.
> The backend mailstores each have the same set of 5 NFS mounts from EMC VNX storage, where the mailboxes are located...
> There is no relation between the number of NFS mounts and the number of backend mailstores.
Surely you see the contradiction here.
You're talking in the present tense. Have you already set this up, or are these 5 mounts per mailstore host simply a potential architecture at this point?
> We are talking about migrating a large number of users and mailboxes - 100,000+ and 50TB+ - and don't want to put that all on one NFS filesystem. We want to break it down into redundant parts so that all the mailstores don't stop functioning if there is a problem with one NFS filesystem.
Sounds reasonable. But you just traded horses, going from a "mount point down" problem to an "NFS filesystem" problem. By that do you mean the actual proprietary EMC filesystem that is exported? Filesystem as in "run fsck if broken"? And if so, are you simply wanting to mirror those filesystems within the EMC, create a different export for each, and have the two servers in a "pair" each mount one of these mirrored filesystems?
Never heard of such a thing...
> Our NFS storage should be pretty reliable, but the email below on this list about a week ago made me concerned about all our mailstores hanging if there is a problem with one of the NFS mounts.
Mounts are client side; exports are server side. If a mount hangs, only that client host has a problem. Are you concerned about a mount failing or an export failing?
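Whether a dead server merely blocks that client or returns errors is largely a function of client-side mount options. A sketch of the two usual /etc/fstab variants (the server name and paths are hypothetical):

    # /etc/fstab -- illustrative only.
    # hard (the default): I/O blocks until the server returns; safest for mail.
    vnx1:/export/mail03  /srv/mail/nfs03  nfs4  hard,noatime,timeo=600,retrans=2  0 0
    # soft: I/O fails with EIO once the retries run out; risks corrupting
    # mail data mid-write, so it is generally discouraged for mail storage.
    #vnx1:/export/mail03  /srv/mail/nfs03  nfs4  soft,noatime,timeo=100,retrans=3  0 0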
> Hence the query about breaking up the NFS mounts into groups per pair of mailstores.
You need to explain this concept in technical detail. As stated it makes no sense, because both NFSv3 and v4 support export failover. Surely the EMC supports this. Actually, in v4 mode it must, because failover is part of the protocol itself.
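On a Linux NFS server, for example, exports(5) documents refer= and replicas= options that publish NFSv4 fs_locations, letting a client fail over to a replica of an export; whether the EMC VNX exposes an equivalent would need checking against its own documentation. A sketch (host names and paths are hypothetical):

    # /etc/exports -- sketch of an NFSv4 replication hint (Linux nfs-utils).
    # 'replicas=' advertises an alternate location clients may fail over to
    # if this server becomes unreachable; replicating the data itself is the
    # storage layer's job, not nfsd's.
    /srv/mail03  *(rw,sync,replicas=/srv/mail03@vnx2)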
> We will eventually set up mail servers and redundant EMC storage across separate data centres and use pNFS, which will make the whole thing more fault tolerant, but that won't happen for a while.
> Thanks for your response.
> Murray
> [Dovecot] NFS not responding generates authentication crash
> [snip]
NFSv4 has a 90-second failover grace period. If the user above had been using NFSv4 clustering, this breakage would not have happened, at least not to this degree.
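For a Linux knfsd server the lease and grace windows are tunable (the EMC presumably has its own equivalent knob); a sketch, with values in seconds, which must be written before nfsd starts:

    # Linux NFSv4 server only; paths assume knfsd.
    cat /proc/fs/nfsd/nfsv4leasetime     # default 90
    echo 30 > /proc/fs/nfsd/nfsv4leasetime
    echo 30 > /proc/fs/nfsd/nfsv4gracetime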
Cheers,
Stan