On 4/7/2012 9:43 AM, Emmanuel Noobadmin wrote:
On 4/7/12, Stan Hoeppner <stan@hardwarefreak.com> wrote:
Firstly, thanks for the comprehensive reply. :)
I'll assume "networked storage nodes" means NFS, not FC/iSCSI SAN, in which case you'd have said "SAN".
I haven't decided on that but it would either be NFS or iSCSI over Gigabit. I don't exactly get a big budget for this. iSCSI because I planned to do md/mpath over two separate switches so that if one switch explodes, the email service would still work.
So it seems you have two courses of action:
1. Identify individual current choke points and add individual systems and storage to eliminate those choke points.
2. Analyze your entire workflow and all systems, identifying all choke points, then design a completely new, well integrated storage architecture that solves all current problems and addresses future needs.
Adding an NFS server and moving infrequently accessed (old) files to alternate storage will alleviate your space problem. But it will probably not fix some of the other problems you mention, such as servers bogging down and becoming unresponsive, as that's not a space issue. The cause of that is more likely an IOPS issue, meaning you don't have enough storage spindles to service requests in a timely manner.
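Just to sketch what that offload looks like in practice, assuming you end up on mdbox with alt storage configured (the user and the cutoff date below are made up, not a recommendation):

  # move this user's mail saved before the cutoff from the main (fast)
  # mdbox storage to the alternate (big/slow) storage
  doveadm altmove -u someuser@example.com savedbefore 2011-10-01

Run it with -A instead of -u to walk all users, typically from cron.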
Less complexity and cost is always better. CPU throughput isn't a factor in mail workloads--it's all about IO latency. A 1U NFS server with a 12 drive JBOD is faster, cheaper, easier to set up and manage, sucks less juice, and dissipates less heat than 4 1U servers each w/ 4 drives.
My worry is that if that one server dies, everything is dead. With at least a pair of servers, I could keep things running, or if necessary, restore the accounts from the dead server out of backup, make some config changes, and have everything back up while waiting for replacement hardware.
You are a perfect candidate for VMware ESX. The HA feature will do exactly what you want. If one physical node in the cluster dies, HA automatically restarts the dead VMs on other nodes, transparently. Clients will have to reestablish connections, but everything else will pretty much be intact. Worst case scenario is a few corrupted mailboxes that were being written when the hardware crashed.
A SAN is required for such a setup. I had extensive experience with ESX and HA about 5 years ago and it works as advertised. After 5 years it can only have improved. It's not "cheap" but usually pays for itself due to being able to consolidate the workload of dozens of physical servers into just 2 or 3 boxes.
I don't recall seeing your user load or IOPS requirements so I'm making some educated guesses WRT your required performance and total storage.
I'm embarrassed to admit I don't have hard numbers on the user load, except the rapidly dwindling disk space count and the fact that when the web-based mail application tries to list and check disk quota, it can bring the servers to a crawl.
Maybe a description of your current hardware setup and the total number of users/mailboxes would be a good starting point. How many servers do you have, what storage is connected to each, what percentage of MUA POP/IMAP connections come from user PCs versus from webmail applications, etc, etc.
Probably the single most important piece of information would be the hardware specs of your current Dovecot server, CPUs/RAM/disk array, etc, and what version of Dovecot you're running.
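If it helps, the version and the live non-default config are easy to grab straight off the box (doveconf is the 2.x tool; on a 1.x install it's dovecot -n instead, and the mail path below is just an example):

  # Dovecot version and the non-default parts of the running config
  dovecot --version
  doveconf -n
  # free space on whatever partition actually holds the mail store
  df -h /var/mail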
The focus of your email is building a storage server strictly to offload old mail and free up space on the Dovecot server. From the sound of things, this may not be sufficient to solve all your problems.
My lame excuse is that I'm just the web dev who got caught holding the server admin potato.
Baptism by fire. Ouch. What doesn't kill you makes you stronger. ;)
Since CPU is nearly irrelevant for a mail workload, you can see it's much cheaper to scale capacity and IOPS with a single node w/ fat storage than with skinny nodes w/ thin storage. Ok, so here's the baseline config I threw together:
One of my concerns is that heavy IO on the same server slows the overall performance, even though the theoretical total IOPS of the drives is the same whether they sit in 1 server or in X servers. Right now, the servers are usually screeching to a halt, to the point of even locking out SSH access, due to iowait sending the load in top to triple digits.
If multiple servers are screeching to a halt due to iowait, either all of your servers' individual disks are overloaded, or you already have shared storage. We really need more info on your current architecture. Right now we don't know if we're talking about 4 servers or 40, 100 users or 10,000.
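A quick way to tell whether the disks themselves are the bottleneck is something like this (a sketch; iostat comes from the sysstat package, and device names will obviously differ on your boxes):

  # per-device stats every 5 seconds: sustained %util near 100 and large
  # await values on the mail spindles mean the disks can't keep up
  iostat -x 5
  # one-shot snapshot of the load average and the %wa (iowait) figure
  top -b -n 1 | head -5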
Some host failure redundancy is about all you'd gain from the farm setup. Dovecot shouldn't barf due to one NFS node being down, only hiccup. I.e. only the imap processes accessing files on the downed node would have trouble.
But if I only have one big storage node and it goes down, Dovecot would barf, wouldn't it? Or would the mdbox format mean Dovecot would still use the local storage, just that users can't access the offloaded messages?
If the big storage node is strictly alt storage, and it dies, Dovecot will still access its main mdbox storage just fine. It simply wouldn't be able to access the alt storage and would log errors for those requests.
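For reference, that main/alt split is just a mail_location setting; the paths and the NFS mount point below are made up for illustration:

  # main mdbox storage stays on fast local disk; ALT points at the big,
  # slower NFS-backed volume. If the alt mount is down, only mail already
  # moved there becomes unreachable -- everything else keeps working.
  mail_location = mdbox:~/mdbox:ALT=/mnt/nfs-archive/%u/mdbox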
If you design a whole new architecture from scratch, going with ESX and an iSCSI SAN, this whole line of thinking is moot, as there is no SPOF. This can be done with as little as two physical servers and one iSCSI SAN array, which has dual redundant controllers in the base config. Depending on your actual IOPS needs, you could possibly consolidate everything you have now into two physical servers and one iSCSI SAN array, for between $30-40K USD in hardware and $8-10K in ESX licenses. That's just a guess on ESX, as I don't know the current pricing. Even if it's that "high", it's far more than worth the price due to the capability.
Such a setup allows you to run all of your Exim, webmail, Dovecot, etc. servers on two machines, and you usually get much better performance than with individual boxes, especially if you manually place the VMs on the nodes for lowest network latency. For instance, if you place your webmail server VM on the same host as the Dovecot VM, TCP packet latency drops from the high microsecond/low millisecond range into the mid nanosecond range--roughly a 1000x decrease in latency. Why? The packet transfer is now a memory-to-memory copy through the hypervisor. The packets never reach a physical network interface. You can do some of these things with free Linux hypervisors, but AFAIK the poor management interfaces for them make the price of ESX seem like a bargain.
Also, I could possibly arrange them in a sort of network RAID 1 to gain redundancy against single machine failure.
Now you're sounding like Charles Marcus, but worse. ;) Stay where you are, and brush your hair away from your forehead. I'm coming over with my branding iron that says "K.I.S.S"
Lol, I have no idea who Charles is, but I always feel safer if there is some kind of backup. Especially since I don't have the time to dedicate myself to server administration; by the time I notice something is bad, it might be too late for anything but the backup.
Search the list archives for Charles' thread about bringing up a 2nd office site. His desire was/is to duplicate machines at the 2nd site for redundancy, when the proper thing to do is duplicate them at the primary site, and simply duplicate the network links between sites. My point to you and Charles is that you never add complexity for the sake of adding complexity.
Of course management and clients don't agree with me since backup/redundancy costs money. :)
So does gasoline, but even as the price has more than doubled in 3 years in the States, people haven't stopped buying it. Why? They have to have it. The case is the same for certain levels of redundancy. You simply have to have it. Your job is properly explaining that need. Ask the CEO/CFO how much money the company will lose in productivity if nobody has email for 1 workday, as that is how long it will take to rebuild it from scratch and restore all the data when it fails. Then ask what the cost would be if all the email were completely lost because they were too cheap to pay for a backup solution.
To executives, money in the bank is like the family jewels in their trousers. Kicking the family jewels and generating that level of pain seriously gets their attention. Likewise, if a failed server plus rebuild/restore costs $50K in lost productivity, spending $20K on a solution to prevent that from happening is a good investment. Explain it in terms execs understand. Have industry data to back your position. There's plenty of it available.
-- Stan