Hi,
I asked earlier whether there are any big sites (50,000-100,000 users), and it seems there are a few, so I'd like to ask some fairly general questions.
I have inherited responsibility for a Cyrus mail store, at a UK university.
It is front-ended by a pair of mail gateways running Exim which handle spam, A/V etc.
Local delivery is a dedicated Suse Linux box running Postfix feeding Cyrus over LMTP. There are around 80,000 accounts, of which around 20,000 are active (one or more messages per day, often many more). I suspect we peak at around 500 simultaneous users. The message store is around 600GB.
Cyrus back-end storage is a fibre-channel SAN. We use most of the Cyrus functionality including Sieve, quotas and shared mailboxes. Clients access the mail store using their choice of client, predominantly IMAP/SSL from Horde/IMP or Outlook, although some of us use Thunderbird.
In theory we have a stand-by box with a similar configuration (but with a local RAID array). The two used to be connected by DRBD, which was replaced by rsync; I believe this was because the entire mail store had to be resynced after any comms failure.
Backups run over the net to tape and take around 24 hours to complete.
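For a sense of scale, the backup figures above imply a fairly modest sustained rate. This is only back-of-the-envelope arithmetic in Python, assuming the store is 600 decimal gigabytes and the backup streams evenly for the full 24 hours (neither of which I have measured):

```python
def backup_rate(store_bytes, hours):
    """Return the average backup rate as (MB/s, Mbit/s).

    Assumes a steady stream for the whole window, which real
    tape backups of many small files rarely achieve.
    """
    seconds = hours * 3600
    bytes_per_second = store_bytes / seconds
    return bytes_per_second / 1e6, bytes_per_second * 8 / 1e6


# 600 GB (decimal, an assumption) in 24 hours
mb_per_s, mbit_per_s = backup_rate(600e9, 24)
print(f"{mb_per_s:.1f} MB/s, {mbit_per_s:.0f} Mbit/s")
```

That works out at roughly 7 MB/s, so I suspect the limit is the per-file overhead of the mail store rather than the network.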
A small number of users are on an Exchange server instead of Cyrus. They will not be moving. User authentication runs over LDAP and there is an attribute in LDAP which identifies whether the user is a Cyrus user or an Exchange user, so that Exim knows which mail store to send their mail to, and Webmail knows whether to redirect them to Horde or to a Microsoft Outlook Web server.
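To make the routing described above concrete, here is a rough sketch in Python of the decision Exim and Webmail each make. The attribute name `mailStore`, its values and the target names are all made up for illustration; they are not our real schema or configuration.

```python
# Hypothetical sketch: choose a mail store from an LDAP-style attribute.
# "mailStore" and the route names below are illustrative assumptions.

ROUTES = {
    "cyrus": {"delivery": "lmtp-cyrus", "webmail": "horde"},
    "exchange": {"delivery": "smtp-exchange", "webmail": "owa"},
}


def route_for(entry, default="cyrus"):
    """Return the delivery target and webmail front end for one user.

    `entry` maps attribute names to lists of values, which is the
    general shape that LDAP client libraries hand back.
    """
    values = entry.get("mailStore", [])
    store = values[0].lower() if values else default
    if store not in ROUTES:
        raise ValueError(f"unknown mail store: {store!r}")
    return ROUTES[store]


print(route_for({"uid": ["jonathan"], "mailStore": ["Exchange"]}))
print(route_for({"uid": ["someone"]}))  # no attribute: falls back to Cyrus
```

Whatever replaces Cyrus would need to slot into the same lookup, ideally by adding one more value to the attribute rather than changing how the gateways work.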
It is time for a refresh which needs to take place seamlessly, and in short order (complete roll-out in the next couple of months). We need to add a few extras into the equation...
It is corporate policy to move all storage to a NetApp filer which is replicated using frequent snap-mirrors to a second site over a shared 1Gb link. (Due to possible bandwidth issues, the two filers do not update synchronously, but the backup NetApp should be no more than a couple of minutes behind, and this much loss of data would be tolerated in the event of a disaster recovery deployment.)
NFS is preferred over iSCSI, due to file recovery and disk space utilisation on the NetApp.
The two servers (or two clusters, if we go that way) will be sited one at each site. In the event of a data centre failure, we need to have quick and effective fail over to the other site (manual intervention is acceptable). It is possible that the redundant link between the sites could fail, leading to the servers losing touch with each other but both still running.
We have user communities at both sites. Currently they both talk to the single Cyrus server at "HQ".
Clustered servers would be preferred so we can do rolling upgrades by removing individual machines for OS patches etc. We have layer 4 load balancers available.
Our preferred corporate platform is Suse Linux Enterprise Server 9 running on Intel hardware.
Cyrus is generally seen as a very competent solution, and greatly preferred to the UW IMAP server it replaced (this may be to do with the NFS servers UW used). The reasons for leaving Cyrus are (1) NFS and (2) replication, although I understand the Cyrus 2.3 tree has some good support for keeping multiple servers loosely synchronised over a WAN.
I am very nervous about comments on this list concerning NFS lock-ups. This system has to be bullet-proof 24/7. I would consider Solaris x86 (or possibly FreeBSD) if the NFS implementation is robust out-of-the-box. Management would like the warm feeling that a vendor-supported operating system would give them (so Suse and Sun are preferred).
My gut feeling is that I would like to split the users into two communities, with half on each NetApp, and with the two NetApps mirroring to each other. In practice users will work from both sites (and remotely) but each one has a "home" site in terms of their home directory, etc. At each site, I'd like 2 identical Dovecot boxes. I'll call this a 2 x 2 solution.
All users (Exchange users excepted) have the same address wired into their e-mail client for IMAP/SSL and SMTP/SSL, so there would have to be some magic to ensure that each user ended up talking to a Dovecot server which could see the appropriate NetApp. I don't think the load balancers are clever enough to be able to do this. I think I've read it's possible for an IMAP server to hand a user off to a different IMAP server, but can Dovecot do this, and is there client support? Or should I just proxy users who hit the wrong server? Or should I just put everyone on the same NetApp and use 4 servers? I'll call this a 4 x 1 solution.
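The "proxy users who hit the wrong server" option boils down to a small decision per login. The following Python is only a sketch of that decision, not of anything Dovecot actually provides: the site names, host names and the idea of a per-user home site lookup are all hypothetical, and it assumes a surviving site has already mounted the mirror when the other site is down.

```python
import zlib

# Hypothetical back ends per site; these host names are made up.
BACKENDS = {
    "site-a": ["imap-a1.example.ac.uk", "imap-a2.example.ac.uk"],
    "site-b": ["imap-b1.example.ac.uk", "imap-b2.example.ac.uk"],
}


def pick_backend(username, home_site, local_site, sites_up=None):
    """Decide whether to serve a login locally or proxy it.

    Returns ("local", None) or ("proxy", host).
    `sites_up` is the set of sites currently reachable (default: all).
    """
    if home_site not in BACKENDS:
        raise KeyError(f"unknown site: {home_site!r}")
    sites_up = set(BACKENDS) if sites_up is None else set(sites_up)

    if home_site == local_site:
        return ("local", None)
    if home_site not in sites_up:
        # Assumption: this site has mounted the mirror of the lost store.
        return ("local", None)

    hosts = BACKENDS[home_site]
    # crc32 gives a stable choice per user across processes and restarts.
    index = zlib.crc32(username.encode("utf-8")) % len(hosts)
    return ("proxy", hosts[index])


print(pick_backend("jonathan", "site-b", "site-a"))
print(pick_backend("jonathan", "site-b", "site-a", sites_up={"site-a"}))
```

With 4 x 1 none of this is needed, since every server can see every mailbox, which is part of why I am torn between the two.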
If we lost a site with a live NetApp, I would expect the surviving site to mount the latest snap-mirror and serve it. If we were running 2 x 2 it would become 1 x 2. If we were running 4 x 1 it would become 2 x 1, which is arguably more robust.
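To spell out the failover arithmetic, here is a small sketch in Python. It reads "A x B" as A servers per mail store times B stores, and assumes that servers are split evenly between the two sites, that each server is tied to one store, and that load spreads evenly over my rough estimate of 500 peak sessions.

```python
from dataclasses import dataclass

PEAK_SESSIONS = 500  # rough estimate of simultaneous users, not a measurement


@dataclass(frozen=True)
class Layout:
    servers_per_store: int  # IMAP servers able to serve each mail store
    stores: int             # number of live mail stores (NetApp volumes)

    @property
    def total_servers(self) -> int:
        return self.servers_per_store * self.stores

    def after_site_loss(self) -> "Layout":
        # Half the servers disappear; the survivors serve every store
        # (their own plus the mounted mirror copy).
        remaining = self.total_servers // 2
        return Layout(remaining // self.stores, self.stores)

    def label(self) -> str:
        return f"{self.servers_per_store} x {self.stores}"

    def spare_servers_per_store(self) -> int:
        # Further server failures a store can absorb and stay reachable.
        return self.servers_per_store - 1

    def sessions_per_server(self) -> float:
        return PEAK_SESSIONS / self.total_servers


for start in (Layout(2, 2), Layout(4, 1)):
    degraded = start.after_site_loss()
    print(start.label(), "->", degraded.label(),
          "| spare per store:", degraded.spare_servers_per_store(),
          "| sessions per server:", degraded.sessions_per_server())
```

Both layouts end up with two servers carrying about 250 sessions each, but only 2 x 1 leaves a spare server for every user, which is what I mean by more robust.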
Does anyone have any comments on any of this? If it were your site, what would you be doing? What kit would you use? Which operating system? How will it play with our load balancers? 4 x 1 or 2 x 2? Would anyone else in UK academia like to compare notes?
Many thanks, Jonathan.