We're a similar installation (60-70k users, FAS3050 cluster).
We have been using the "perdition" IMAP/POP redirector for a while. The IMAP/POP.ourdomain A records point to 2 front ends, whose only job is to redirect each IMAP/POP session to a specific mail server for that user, based on the mailhost entry in the user's LDAP record.
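To illustrate the redirect decision only (this is not perdition's code), here is a minimal Python sketch. The dictionary stands in for the LDAP directory, and `DIRECTORY`, `backend_for` and the hostnames are hypothetical names made up for the example.

```python
# Hypothetical stand-in for the LDAP directory: user -> mailhost attribute.
# A real front end would query LDAP instead of a dict.
DIRECTORY = {
    "alice": "mailsrv1.example.org",
    "bob": "mailsrv3.example.org",
}


def backend_for(user, directory=DIRECTORY):
    """Return the mail server the user's IMAP/POP session is redirected to."""
    try:
        return directory[user]
    except KeyError:
        # No mailhost entry: refuse rather than guess a backend.
        raise LookupError(f"no mailhost entry for {user!r}") from None
```

The point is that the front end holds no mail state of its own: every session for a given user lands on the same backend because the mapping lives in the directory.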
We use postfix to deliver mail, with procmail as the LDA. Each mail server (3 mail servers doing SMTP/POP/IMAP, and barely breaking a sweat - 2x quad-core Xeons with 8 GB each) runs a background process that monitors the maillog, and if dovecot reports an index corruption, the monitor fixes the problem (we used to see these errors when we still used mbox, not anymore though).
We run a periodic process which gathers usage statistics from the mail servers and reassigns users to mail servers in order to distribute the load better. Each newly created user gets a mail server from a random function which chooses one of the three.
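As an illustration only, a minimal Python sketch of those two steps. The greedy heaviest-first placement is an assumption chosen for the example, not a description of the actual job, and the server names, function names and the single "usage" figure per user are all hypothetical.

```python
import random

# Hypothetical names for the three mail servers.
MAIL_SERVERS = ["mailsrv1", "mailsrv2", "mailsrv3"]


def assign_new_user(rng=random):
    """Pick a mail server for a new user at random."""
    return rng.choice(MAIL_SERVERS)


def rebalance(usage):
    """Return a new user -> server mapping that evens out the load.

    usage maps each user to some load figure (assumed here to be one number).
    Greedy placement: heaviest users first, each onto the server that is
    currently least loaded.
    """
    load = {server: 0 for server in MAIL_SERVERS}
    assignment = {}
    for user in sorted(usage, key=usage.get, reverse=True):
        target = min(MAIL_SERVERS, key=load.get)
        assignment[user] = target
        load[target] += usage[user]
    return assignment
```

A real job would also want to limit how many users move per run, since each move means relocating or re-pointing a mailbox; the sketch ignores that.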
Each mail server in the user LDAP entry is in fact a virtual address on a load balancer, pointing to the real mail server behind it, but each virtual address also has a backup server in case the real server crashes; so if mailsrv1 crashes, mailsrv2 will take its clients.
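The failover rule can be sketched in Python as below. Only the mailsrv1 -> mailsrv2 pairing comes from the description above; the other pairings, the `-vip` names, and the `is_up` health check are assumptions made for the example, and a real load balancer would do this in its own configuration rather than in code.

```python
# Hypothetical virtual addresses: each maps to (primary, backup) real servers.
# Only the mailsrv1 -> mailsrv2 backup pairing is taken from the text.
VIRTUAL_SERVERS = {
    "mailsrv1-vip": ("mailsrv1", "mailsrv2"),
    "mailsrv2-vip": ("mailsrv2", "mailsrv3"),
    "mailsrv3-vip": ("mailsrv3", "mailsrv1"),
}


def resolve(vip, is_up):
    """Return the real server that should handle traffic for a virtual address.

    is_up is a callable taking a server name and returning True if that
    server currently passes its health check.
    """
    primary, backup = VIRTUAL_SERVERS[vip]
    if is_up(primary):
        return primary
    if is_up(backup):
        return backup
    raise RuntimeError(f"no live server behind {vip}")
```

For example, `resolve("mailsrv1-vip", lambda s: s != "mailsrv1")` hands mailsrv1's clients to mailsrv2 while mailsrv1 is down.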
The setup works rather well, within the limitations of maildir and netapp (mainly full body search being slowish on very large mailboxes made up of tens of thousands of files).
We did not use the "perdition" directors in the past, and once we started using them, they saved us a lot of problems on a few fronts:
- Index corruption issues
- SSL termination - since the front ends do the SSL termination, traffic from the front ends to the backend servers is clear text, saving CPU cycles on the backend servers.
I haven't taken a look yet at Dovecot's solution for the director, but I am writing this since I do think that it is addressing a real life problem for any medium++ or larger installation that uses NFS.
Oh, and Timo, I don't think we are just "a couple of NFS users". Maildir and NFS are not as uncommon as you'd think, even in very large installations.
--Ariel
Brandon Davidson wrote:
Noel,
On 8/26/10 9:59 PM, "Noel Butler" <noel.butler@ausics.net> wrote:
I fail to see advantage if anything it add in more point of failure, with
i agree with this and it is why we dont use it
we use dovecots deliver with postfix and have noticed no problems, not to say there was none, but if so, we dont notice it.
We might be a slightly larger install than you (60k users, mail on FAS 3170 Metrocluster), but we have noticed corruption issues and the director is definitely going to see use in our shop. We still use Sendmail+procmail for delivery, so no issue there... but we've got hordes of IMAP users that will leave a client running at home, at their desk, on their phone, and then will use Webmail on their laptop.
Without the director, all of these sessions end up on different backend mailservers, and it's basically a crapshoot which Dovecot instance notices a new message first. NFS locking being what it is, odds are an index will get corrupted sooner or later, and when this happens the user's mail 'disappears' until Dovecot can reindex it. The users inevitably freak out and call the helpdesk, who tells them to close and reopen their mail client. Maybe you're small enough to not run into problems, or maybe your users just have lower expectations or a higher pain threshold than ours. Either way, it's unpleasant for everyone involved, and quite easy to solve with the director proxy.
Timo has been saying for YEARS that you need user-node affinity if you're doing NFS, and now he's done something about it. If you've already got a load balancer, then just point the balancer at a pool of directors, and then point the directors at your existing mailserver pool.
<shameless plug> For health monitoring on the directors, check out: http://github.com/brandond/poolmon </shameless plug>
-Brad
--
Ariel Biener e-mail: ariel@post.tau.ac.il PGP: http://www.tau.ac.il/~ariel/pgp.html