As someone who is about to begin the process of moving from maildir to mdbox on NFS (and therefore just about to start the 'director-ization' of everything) for ~6.5m mailboxes, I'm curious if anyone can share any experiences with it. The list is surprisingly quiet about this subject, and articles on Google are mainly just about setting director up. I've yet to stumble across an article about someone's experiences with it.
- How big of a director cluster do you use? I'm going to have millions of mailboxes behind 10 directors. I'm guessing that's plenty. It's actually split over two datacenters. In the larger, we've got about 200k connections currently, so in a perfectly balanced world, each director would have 20k connections on it. I'm guessing that's child's play. Any good rule of thumb for the ratio of 'backend servers::director servers'? In my larger DC, it's about 5::1.
- Do you use the Perl poolmon script or something else? The Perl script was being weird for me, so I rewrote it in Python, but it basically does the exact same things (roughly sketched at the end of this list).
- Seen any issues with director? In testing, I managed to wedge things by having my poolmon script running on all the cluster boxes (I think). I've since rewritten it to run *only* on the lowest-numbered director. When it wedged, I had piles (read: hundreds per second) of log entries that said:
Feb 12 06:25:03 director: Warning: director(10.1.20.5:9090/right): Host 10.1.17.3 is being updated before previous update had finished (down -> up) - setting to state=up vhosts=0
Feb 12 06:25:03 director: Warning: director(10.1.20.5:9090/right): Host 10.1.17.3 is being updated before previous update had finished (up -> down) - setting to state=down vhosts=0
Feb 12 06:25:03 director: Warning: director(10.1.20.3:9090/left): Host 10.1.17.3 is being updated before previous update had finished (down -> up) - setting to state=up vhosts=0
Feb 12 06:25:03 director: Warning: director(10.1.20.3:9090/left): Host 10.1.17.3 is being updated before previous update had finished (up -> down) - setting to state=down vhosts=0
Because it was in testing, I didn't notice it, and it stayed like this for several days until Dovecot was restarted on all the director nodes. I'm not 100% sure what happened, but my *guess* is that two boxes tried to update the status of the same backend server in rapid succession.
- Assuming you're using NFS, do you still see non-trivial amounts of indexes getting corrupted?
- Again, assuming NFS and at least some corrupted indexes, what's your guess at the success rate for Dovecot recovering them automatically? And the success rate for the ones Dovecot couldn't recover automatically, where you had to repair them with doveadm? Really what I'm trying to figure out is 1) how often sysops will need to manually recover indexes; and 2) how often admins *can't* manually recover indexes.
- If you have unrecoverable indexes (and assuming you have snapshots on your NFS server), does grabbing the most recent indexes from the snapshots always work for recovery (obviously, up to the point the snapshot was taken)?
- Any gotchas you've seen anywhere in a director-fied stack? I realize that's a broad question :)
- Does one of your director nodes going down cause any issues? E.g. issues with the left and right nodes syncing with each other? Or when the director node comes back up?
- Does a backend node going down cause a storm of reconnects? In the time between deploying director and getting mailboxes converted to mdbox, reconnects for us will mean cold local-disk Dovecot caches. But hopefully consistent hashing helps with that?
- Do you have consistent hashing turned on? I can't think of any reason not to have it turned on, but who knows.
- Any other configuration knobs (including sysctl) that you needed to futz with, versus the defaults?
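For reference, here's roughly the shape of what my Python rewrite does (a stripped-down sketch, not the actual script; the backend IPs, port and weight below are placeholders, and it just shells out to doveadm rather than talking to the director socket directly):

#!/usr/bin/env python
# Stripped-down sketch of a poolmon-style health check, not the real script.
# Assumes it runs on a director node; the backend IPs/port/weight are made up.
import socket
import subprocess

BACKENDS = ["10.1.17.1", "10.1.17.2", "10.1.17.3"]   # placeholder backend IPs
CHECK_PORT = 143        # IMAP; the real script checks more than one service
HEALTHY_WEIGHT = "100"  # vhost count to restore once a backend looks healthy
TIMEOUT = 5

def backend_alive(host):
    """A backend counts as up if it accepts a TCP connection and sends an IMAP greeting."""
    try:
        s = socket.create_connection((host, CHECK_PORT), timeout=TIMEOUT)
    except (socket.error, OSError):
        return False
    try:
        s.settimeout(TIMEOUT)
        return s.recv(128).startswith(b"* OK")
    except (socket.error, OSError):
        return False
    finally:
        s.close()

def set_weight(host, weight):
    # 'doveadm director update <host> <vhost count>' changes the weight;
    # 'doveadm director flush <host>' reassigns users away from a dead host.
    subprocess.call(["doveadm", "director", "update", host, weight])
    if weight == "0":
        subprocess.call(["doveadm", "director", "flush", host])

if __name__ == "__main__":
    for host in BACKENDS:
        set_weight(host, HEALTHY_WEIGHT if backend_alive(host) else "0")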
I appreciate any feedback!
On 24 Feb 2017, at 0.08, Mark Moseley moseleymark@gmail.com wrote:
- How big of a director cluster do you use? I'm going to have millions of mailboxes behind 10 directors.
I wouldn't use more than 10.
I'm guessing that's plenty. It's actually split over two datacenters.
Two datacenters in the same director ring? This is dangerous. If there's a network connectivity problem between them, they split into two separate rings and start redirecting users to different backends.
- Do you have consistent hashing turned on? I can't think of any reason not to have it turned on, but who knows
Definitely turn it on. The setting only exists because of backwards compatibility and will be removed at some point.
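In dovecot.conf on the directors that should just be:

director_consistent_hashing = yes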
On Thu, Feb 23, 2017 at 3:15 PM, Timo Sirainen tss@iki.fi wrote:
- How big of a director cluster do you use? I'm going to have millions of mailboxes behind 10 directors.
I wouldn't use more than 10.
Cool
I'm guessing that's plenty. It's actually split over two datacenters.
Two datacenters in the same director ring? This is dangerous. If there's a network connectivity problem between them, they split into two separate rings and start redirecting users to different backends.
I was unclear. The two director rings are unrelated and won't ever need to talk to each other. I only mentioned the two rings to point out that all 6.5m mailboxes weren't behind one ring, but rather split between two.
- Do you have consistent hashing turned on? I can't think of any reason not to have it turned on, but who knows
Definitely turn it on. The setting only exists because of backwards compatibility and will be removed at some point.
Out of curiosity (and possibly extremely naive), unless you've moved a mailbox via 'doveadm director', if someone is pointed to a box via consistent hashing, why would the directors need to share that mailbox mapping? Again, assuming they're not moved (I'm also assuming that the mailbox would always, by default, hash to the same value in the consistent hash), isn't their hashing all that's needed to get to the right backend? I.e. "I know what the mailbox hashes to, and I know what backend that hash points at, so I'm done", in which case, no need to communicate to the other directors. I could see that if you moved someone, it *would* need to communicate that mapping. Then the only maps traded by directors would be the consistent hash boundaries *plus* any "moved" mailboxes. Again, just curious.
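Just to make the mental model concrete, this is the kind of toy ring I'm picturing (purely illustrative, not how Dovecot actually implements it; the hash function, points-per-backend and 'moved' overrides are all invented):

# Toy consistent-hash ring, only to illustrate the reasoning above.
# NOT Dovecot's implementation: hashing, points per backend, etc. are made up.
import bisect
import hashlib

class ToyRing(object):
    def __init__(self, backends, points_per_backend=100):
        self._ring = []                      # sorted list of (hash value, backend)
        for backend in backends:
            for i in range(points_per_backend):
                self._ring.append((self._hash("%s-%d" % (backend, i)), backend))
        self._ring.sort()
        self._keys = [h for h, _ in self._ring]
        self.moved = {}                      # explicit overrides, e.g. a manual move

    @staticmethod
    def _hash(key):
        return int(hashlib.md5(key.encode("utf-8")).hexdigest(), 16)

    def backend_for(self, user):
        # The 'moved' overrides are the only state that would actually need to be
        # shared between directors; everything else is derivable by hashing alone.
        if user in self.moved:
            return self.moved[user]
        idx = bisect.bisect(self._keys, self._hash(user)) % len(self._ring)
        return self._ring[idx][1]

ring = ToyRing(["10.1.17.1", "10.1.17.2", "10.1.17.3"])
print(ring.backend_for("someuser@example.com"))   # same answer on every director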
Timo, incidentally, on that error I posted:
Feb 12 06:25:03 director: Warning: director(10.1.20.3:9090/left): Host 10.1.17.3 is being updated before previous update had finished (up -> down) - setting to state=down vhosts=0
Feb 12 06:25:03 director: Warning: director(10.1.20.3:9090/left): Host 10.1.17.3 is being updated before previous update had finished (down -> up) - setting to state=up vhosts=0
Any idea what would cause that? Is my guess that multiple directors tried to update the status simultaneously correct?
In our experience, a ring with more than 4 servers is bad; we had sync problems everywhere. Using 4 or fewer works perfectly.
On Fri, Feb 24, 2017 at 11:41 AM, Francisco Wagner C. Freire <wgrcunha@gmail.com> wrote:
In our experience, a ring with more than 4 servers is bad; we had sync problems everywhere. Using 4 or fewer works perfectly.
Interesting. That's good feedback. One of the things I wondered about is whether it'd be better to deploy a 10-node ring or split it into 2x 5-node rings. Sounds like splitting it up might not be a bad idea. How often would you see those sync problems (and were they the same errors as I posted or something else)? And were you running poolmon from every node when you were seeing sync errors?
On Fri Feb 24 2017 14:41:17 GMT-0500 (Eastern Standard Time), Francisco Wagner C. Freire wgrcunha@gmail.com wrote:
In our experience, a ring with more than 4 servers is bad; we had sync problems everywhere. Using 4 or fewer works perfectly.
Since this is much lower than Timo's suggested maximum of 10, it sounds to me like you either encountered a bug, or possibly it was not optimally deployed.
Did you ever come here and ask for help?
On 25 Feb 2017, at 17.52, Tanstaafl tanstaafl@libertytrek.org wrote:
Did you ever come here and ask for help?
There have of course been various bugs in director once in a while. It also depends a bit on how director was being used. Currently I'm not aware of any bugs related to it.
On 24 Feb 2017, at 21.29, Mark Moseley moseleymark@gmail.com wrote:
Any idea what would cause that? Is my guess that multiple directors tried to update the status simultaneously correct?
Most likely, yes. I'm not sure whether it might also happen if the same server issues conflicting commands rapidly.
On Feb 24, 2017, at 6:08 AM, Mark Moseley moseleymark@gmail.com wrote:
- Do you use the perl poolmon script or something else? The perl script was being weird for me, so I rewrote it in python but it basically does the exact same things.
Would you mind sharing it? :)
Zhang Huangbin, founder of iRedMail project: http://www.iredmail.org/ Time zone: GMT+8 (China/Beijing). Available on Telegram: https://t.me/iredmail
On Thu, Feb 23, 2017 at 3:45 PM, Zhang Huangbin zhb@iredmail.org wrote:
Would you mind sharing it? :)
Attached. No claims are made on the quality of my code :)
On Feb 25, 2017, at 3:28 AM, Mark Moseley moseleymark@gmail.com wrote:
Attached. No claims are made on the quality of my code :)
Thank you for sharing. :)
Some suggestions:
- Replace log() with the standard logging module, e.g. "logging.debug(xx)" (quick sketch after this list)
- Add ManageSieve support
- Add LMTP support
- How about storing the command-line options in a config file and dropping the 'optparse' module?
- Email notification support when a server goes up/down
- Lots of PEP8 style issues :)
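For the logging point, I mean something as simple as this (untested sketch; the message values are just examples):

import logging

logging.basicConfig(
    level=logging.INFO,
    format="%(asctime)s %(levelname)s poolmon: %(message)s",
)
logging.info("backend %s marked %s", "10.1.17.3", "down")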
Would you like to publish this code on GitHub/Bitbucket/…?
On 24 Feb 2017, at 21.28, Mark Moseley moseleymark@gmail.com wrote:
Attached. No claims are made on the quality of my code :) <poolmon>
With recent Dovecot versions you probably should not use set_host_weight(server, '0') to mark a backend down, but should instead use the director commands HOST-DOWN and HOST-UP in combination with HOST-FLUSH.
Sami
On 2017-02-27 10:40, Sami Ketola wrote:
With recent Dovecot versions you probably should not use set_host_weight(server, '0') to mark a backend down, but should instead use the director commands HOST-DOWN and HOST-UP in combination with HOST-FLUSH.
This is already the case in the latest version of Poolmon.
On 23/02/2017 23:08, Mark Moseley wrote:
Hi,
in the past I did some consulting for ISPs with 4-5 million mailboxes; they had "only" 6 directors and about 30 or more Dovecot backends.
Regarding NFS, I had some trouble with Maildir, director and NFSv4. I don't know whether it was a problem with the client (Debian 6) or the storage (NetApp ONTAP 8.1), but with NFSv3 it worked fine. Now we should try again with CentOS 6/7 and NFSv4.1.
--
Alessio Cecchi
Postmaster @ http://www.qboxmail.it
https://www.linkedin.com/in/alessice