[Dovecot] Multiple locations, 2 servers - planning questions...
Hello all/Timo,
Up until now, my main client's office has consisted of a single location, and I have never had to deal with supporting multiple locations for a single company.
They have just told me that they are acquiring an additional floor at a building that is about 4 minutes away - but obviously far enough away that I now have to deal with supporting users in the same domain but at two disparate physical locations.
These two locations will be connected via a private Gb ethernet connection, and each location will have its own internet connection (I think - still waiting on some numbers to present to the owner to see what he wants to do in that regard, but that will be my recommendation), so bandwidth for replication won't be an issue.
I have a couple of months to get this done, and I am already planning on hiring Timo's new commercial support company to help with the final and actual design and implementation, but obviously first I need to know what my actual options are.
Just a rough idea of what I'd like to do is:
Set up one dovecot server at each location (these will be VMs), so users at each location are accessing the local server for that office
Full replication between the two for the mail/indexes, and configure them such that each can act as a failover for the other in case one goes down for whatever reason
This is my first/main question...
I recall that dsync-based replication is actually on the roadmap for 2.1, but since dsync apparently can't do this now: Timo, do you have even a rough idea of how much work it would be to get this working for just 2 locations, which you could then revisit later to make more scalable? (I'm assuming it *may* be easier to build the initial support for only 2 locations; my client may be willing to pay for it if it isn't a huge amount - feel free to reply privately on this question.) Or, if it would take more work than my client is willing to pay for (I'm hoping not, since you said it was on the roadmap for 2.1, not 2.2+), maybe the notify plugin could be leveraged in some way to provide something 'close enough' until it is fully implemented in dsync?
On that note (something 'close enough' until dsync fully supports this natively), would setting up a dsync cron job, say, every 5 or 10 minutes, be asking for trouble? Our mail server is not all that busy, really, so in 5 or 10 minutes, there wouldn't be many changes at all.
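To make that concrete, here's the sort of cron-driven wrapper I have in mind - just a rough Python sketch; the doveadm/dsync invocation and the file paths in it are placeholders I haven't verified against the actual 2.1 syntax:

#!/usr/bin/env python3
# Rough sketch of a periodic mailbox sync driven from cron.
# Assumptions (not verified): the per-user sync command and the remote
# location below are placeholders - check the real dsync/doveadm syntax
# for your Dovecot version before using anything like this.
import fcntl
import subprocess
import sys

LOCK_FILE = "/var/run/mail-replicate.lock"
USER_LIST = "/etc/dovecot/replicated-users.txt"  # one username per line (hypothetical path)
REMOTE = "ssh://mailsync@server2.example.com"    # placeholder for the peer server

def main():
    # Hold an exclusive lock so a slow run and the next cron run never overlap.
    lock = open(LOCK_FILE, "w")
    try:
        fcntl.flock(lock, fcntl.LOCK_EX | fcntl.LOCK_NB)
    except IOError:
        print("previous sync still running, skipping this interval")
        return 0

    with open(USER_LIST) as f:
        users = [line.strip() for line in f if line.strip()]

    failures = 0
    for user in users:
        # Placeholder invocation: two-way sync of one user's mailbox with the peer.
        cmd = ["doveadm", "sync", "-u", user, REMOTE]
        if subprocess.call(cmd) != 0:
            failures += 1
            print("sync failed for %s" % user, file=sys.stderr)

    return 1 if failures else 0

if __name__ == "__main__":
    sys.exit(main())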
If that is not recommended: I want to avoid the hassles of NFS if at all possible, but maybe there is another shared filesystem that would work OK - or, since I will always be forcing users to a single server anyway, maybe NFS or some other shared filesystem really is the best option here, and I should just let it take care of the syncing?
and
- Configure things such that each office's users are directed to the local server for that office, but connections will fail over to the remote server if one of them goes down for whatever reason?
I'm fairly sure that some combination of Dovecot proxy/director will accomplish this, but one concern: for internal users, my understanding is that it will redirect them via the private IP, which would result in lots of traffic across the Gb connection between the two locations, and I'd like to eliminate that if possible. So how will this work when they are accessing it from outside the office, where each office has its own public IP? I'd prefer not to rely on users using the correct hostname (currently we just use 'mail.example.com', and I know I could set up two new ones - office1.example.com and office2.example.com - but then I'd be relying on the users to get it right, and I'd prefer to avoid that can of worms). I guess a worst-case scenario (if there is no better way) would be to do it that way, then watch the logs for users who get it wrong and are using the inter-office connection, and deal with them on a case-by-case basis.
Thanks to any/all for reading this far and for any thoughts, suggestions and/or ideas...
--
Best regards,
Charles
On 27.02.2012 17:54, Charles Marcus wrote:
These two locations will be connected via a private Gb ethernet connection, and each location will have its own internet connection (I think - still waiting on some numbers to present to the owner to see what he wants to do in that regard, but that will be my recommendation), so bandwidth for replication won't be an issue. [cut]
I do have a basic question... How many users will be in this new, remote location? Will the traffic be so vast that a 1GbE link will not be enough, or are you using two servers for reliability?
The simpler the configuration, the better - almost always. Maybe you can stay with one server in your primary location?
-- Adam Szpakowski
On Mon, Feb 27, 2012 at 06:59:14PM +0100, Adam Szpakowski wrote:
On 27.02.2012 17:54, Charles Marcus wrote:
These two locations will be connected via a private Gb ethernet connection, and each location will have its own internet connection (I think - still waiting on some numbers to present to the owner to see what he wants to do in that regard, but that will be my recommendation), so bandwidth for replication won't be an issue. [cut]
I do have a basic question... How many users will be in this new, remote location? Will the traffic be so vast that a 1GbE link will not be enough, or are you using two servers for reliability?
The simpler the configuration, the better - almost always. Maybe you can stay with one server in your primary location?
This was exactly my thought as I was reading it.
If you have some control over client configuration, use "offline IMAP," where clients maintain a local copy of what's on the server. (That's a good idea anyway - distributed backups of mail, which is possibly important.)
http://rob0.nodns4.us/ -- system administration and consulting Offlist GMX mail is seen only if "/dev/rob0" is in the Subject:
On 2012-02-27 1:12 PM, /dev/rob0 <rob0@gmx.co.uk> wrote:
If you have some control over client configuration, use "offline IMAP," where clients maintain a local copy of what's on the server. (That's a good idea anyway - distributed backups of mail, which is possibly important.)
Most of these users have many GB of email, so keeping local cached copies of all of it is silly, in my opinion... and again, the *main* purpose of the two separate servers is high availability (redundancy/failover)...
--
Best regards,
Charles Marcus I.T. Director Media Brokers International, Inc. 678.514.6200 x224 | 678.514.6299 fax
On 2012-02-27 12:59 PM, Adam Szpakowski <as@3a.pl> wrote:
I do have a basic question... How many users will be in this new, remote location? Will the traffic be so vast that a 1GbE link will not be enough, or are you using two servers for reliability?
Yeah, I guess I should have mentioned this...
Each location is an entire floor of a 6 story building. The remote location has the capacity for about 60 users, the new location about 100. We only allow IMAP access to email, so if everyone is using email at the same time, that would be a lot of traffic over a single Gb link I think...
The simpler the configuration, the better - almost always. Maybe you can stay with one server in your primary location?
I had considered this, but like I said, this is not purely for performance - I'd also like to get redundancy in the deal (up until now, we haven't had any - admittedly - knock on wood - we haven't needed it, but I'd still like to implement it).
--
Best regards,
Charles
Charles Marcus <CMarcus@media-brokers.com> wrote:
On 2012-02-27 12:59 PM, Adam Szpakowski <as@3a.pl> wrote:
I do have a basic question... How many users will be in this new, remote location? Will the traffic be so vast that a 1GbE link will not be enough, or are you using two servers for reliability?
Yeah, I guess I should have mentioned this...
Each location is an entire floor of a 6 story building. The remote location has the capacity for about 60 users, the new location about 100. We only allow IMAP access to email, so if everyone is using email at the same time, that would be a lot of traffic over a single Gb link I think...
Naa, most clients download mail only once and then keep it cached locally (at least Thunderbird and Outlook do).
Looking at the bandwidth usage of the mail server at my small university (10,000 users, about 1,000 concurrently active during the daytime) shows a steady rate of roughly 5 Mbit/s, with peaks to 10 Mbit/s, in and out.
Remember: your outgoing bandwidth will be roughly the bandwidth of mail going into the server.
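In per-user terms that is tiny. A quick back-of-the-envelope check (the 160-user figure is just my guess from the office capacities mentioned earlier in the thread):

# Rough per-user IMAP bandwidth, based on the figures above.
steady_mbit = 5.0          # observed steady rate (Mbit/s)
active_users = 1000        # concurrently active during the daytime

per_user_kbit = steady_mbit * 1000 / active_users
print("average per active user: %.1f kbit/s" % per_user_kbit)   # ~5 kbit/s

# Scaled to roughly 160 users across both offices (an assumption):
office_users = 160
steady_office = per_user_kbit * office_users / 1000
print("estimated steady load: %.2f Mbit/s" % steady_office)          # ~0.8 Mbit/s
print("estimated peak (2x):   %.2f Mbit/s" % (2 * steady_office))    # ~1.6 Mbit/s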
Regards, Sven.
-- Sigmentation fault. Core dumped.
On 2012-02-27 1:34 PM, Sven Hartge <sven@svenhartge.de> wrote:
Charles Marcus<CMarcus@media-brokers.com> wrote:
Each location is an entire floor of a 6 story building. The remote location has the capacity for about 60 users, the new location about 100. We only allow IMAP access to email, so if everyone is using email at the same time, that would be a lot of traffic over a single Gb link I think...
Naa, most clients download mail only once and then keep it cached locally (at least Thunderbird and Outlook do).
Looking at the bandwidth usage of the mail server at my small university (10,000 users, about 1,000 concurrently active during the daytime) shows a steady rate of roughly 5 Mbit/s, with peaks to 10 Mbit/s, in and out.
Interesting - thanks for the numbers...
But, again, my main reason for 2 servers is not performance, it is for redundancy...
--
Best regards,
Charles
On Mon, 27 Feb 2012 13:38:39 -0500, Charles Marcus <CMarcus@Media-Brokers.com> wrote:
On 2012-02-27 1:34 PM, Sven Hartge <sven@svenhartge.de> wrote:
Charles Marcus<CMarcus@media-brokers.com> wrote:
Each location is an entire floor of a 6 story building. The remote location has the capacity for about 60 users, the new location about 100. We only allow IMAP access to email, so if everyone is using email at the same time, that would be a lot of traffic over a single Gb link I think...
Naa, most clients download mail only once and then keep it cached locally (at least Thunderbird and Outlook do).
Looking at the bandwidth usage of the mail server at my small university (10,000 users, about 1,000 concurrently active during the daytime) shows a steady rate of roughly 5 Mbit/s, with peaks to 10 Mbit/s, in and out.
Interesting - thanks for the numbers...
But, again, my main reason for 2 servers is not performance, it is for redundancy...
I too have been tasked with multisite redundancy, and have been experimenting with GlusterFS (http://www.gluster.org/community/documentation/index.php/Main_Page), which is a distributed file system. In our network we have a dedicated 10Gb link between two datacenters 100 miles apart, and I have a GlusterFS node at each site set up in Distributed Replicated mode with 2 replicas, which means the servers are mirrored. File writes go to all the replica servers (2 servers in this case), so depending on network latency the writes could potentially slow down. GlusterFS has its own file-serving protocol that allows automatic and immediate failover in the case that a storage node disappears, but there are some caveats to restoring a failed storage node (it takes forever to resync the data). I have not put this experiment into production, but I can say that it's extremely simple to manage, and performance testing has shown that it could handle mail traffic just fine. You could also look at GPFS (http://www-03.ibm.com/systems/software/gpfs/), which is not open source but it's apparently rock solid and I believe supports multisite clustering.
On Mon, Feb 27, 2012 at 02:51:54PM -0600, list@airstreamcomm.net wrote:
You could also look at GPFS (http://www-03.ibm.com/systems/software/gpfs/), which is not open source but it's apparently rock solid and I believe supports multisite clustering.
GPFS supports different modes of clustering. I think the appropriate solution here would be to deploy a single cluster spanning 3 sites (the 3rd site is needed for a quorum node; two sites can't work, because you can't protect against split brain). The simplest config would then be 3 nodes (but you could have any number of nodes at each site):
quorum node1 on site1 with a local disk (or local SAN-disk) as Network Shared Disk (NSD)
quorum node2 on site2 with a local disk (or local SAN-disk) as Network Shared Disk (NSD)
quorum node3 on site3
The filesystem would be replicated (over IP) between the disks on site1 and site2. Should one site go down, the other site would survive as long as it could still see the quorum node on site3. After a site has been down, one would need to sync up the NSDs (mmrestripefs) to re-establish the replication of any blocks that have changed while it was down.
-jf
On 27.02.2012 19:21, Charles Marcus wrote:
On 2012-02-27 12:59 PM, Adam Szpakowski <as@3a.pl> wrote:
I do have a basic question... How many users will be in this new, remote location? Will the traffic be so vast that a 1GbE link will not be enough, or are you using two servers for reliability?
Yeah, I guess I should have mentioned this...
Each location is an entire floor of a 6 story building. The remote location has the capacity for about 60 users, the new location about 100. We only allow IMAP access to email, so if everyone is using email at the same time, that would be a lot of traffic over a single Gb link I think...
I'm not sure that the bandwidth will be a problem. One of our clients is a civic design office: around 60 people and lots of multi-megabyte files in multiple copies - AutoCAD 3D files are flying all around ;). All accounts are IMAP, and there is also a local Samba service. The server has a 1GbE connection to almost all workstations, and the bandwidth utilization is low - very low, on average much less than 1%. We are talking about over 50 heavy-duty users.
The simpler the configuration, the better - almost always. Maybe you can stay with one server in your primary location?
I had considered this, but like I said, this is not purely for performance - I'd also like to get redundancy in the deal (up until now, we haven't had any - admittedly - knock on wood - we haven't needed it, but I'd still like to implement it).
IMHO, use something simple for redundancy, such as DRBD in active/passive mode in a single location - with manual migration, so you do not have to deal with split-brain problems. As an additional layer of security against a local cataclysm (fire in the building), use a nightly backup to the second office. You will not have automatic, 99.999% reliability, but for most clients that is OK. They do not need it. The market for highly available, redundant services is quite small.
-- Adam Szpakowski
On 27.2.2012, at 18.54, Charles Marcus wrote:
I recall that dsync-based replication is actually on the roadmap for 2.1, but since dsync apparently can't do this now: Timo, do you have even a rough idea of how much work it would be to get this working for just 2 locations, which you could then revisit later to make more scalable? (I'm assuming it *may* be easier to build the initial support for only 2 locations; my client may be willing to pay for it if it isn't a huge amount - feel free to reply privately on this question.)
I'll initially build it for only 2 locations, but I think it will be pretty simple to scale to more than 2.
If that is not recommended: I want to avoid the hassles of NFS if at all possible, but maybe there is another shared filesystem that would work OK - or, since I will always be forcing users to a single server anyway, maybe NFS or some other shared filesystem really is the best option here, and I should just let it take care of the syncing?
Synchronous DRBD replication for a master/slave setup should also work, since the latency between your servers is probably quite low. I wouldn't use asynchronous replication, since it can lose some of the last changes when a failure happens.
Then there are of course all the cluster filesystems, but I don't have much experience with them other than what I've read on this list. I think GPFS is the only one I haven't read any complaints about (but it could also be that so few people have actually used it).
- Configure things such that each office's users are directed to the local server for that office, but connections will fail over to the remote server if one of them goes down for whatever reason?
With a cluster-fs setup you could do this. With a dsync-replicated setup you could assign a primary location for each user, and proxy the connection there if the user got connected to the wrong server, except when the primary server is down.
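Roughly this decision, sketched as plain logic (only an illustration of the idea - the names and the health check are made up, and in practice this would be expressed through passdb/proxy settings rather than code):

# Illustrative sketch of the per-user routing decision described above.
# The mapping and health check are hypothetical; this is not Dovecot's API.

PRIMARY_SERVER = {                          # assumed user -> primary site mapping
    "alice": "mail-office1.example.com",
    "bob": "mail-office2.example.com",
}

def server_is_up(host):
    # Placeholder health check; a real setup would monitor the backends.
    return True

def route_login(username, local_server):
    """Return the server that should handle this login."""
    primary = PRIMARY_SERVER.get(username, local_server)
    if primary == local_server:
        return local_server            # already on the right server
    if server_is_up(primary):
        return primary                 # proxy the session to the user's primary
    return local_server                # primary down: fall back to the local copy

print(route_login("alice", "mail-office2.example.com"))  # -> proxied to office1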
I'm fairly sure that some combination of Dovecot proxy/director will accomplish this, but one concern: for internal users, my understanding is that it will redirect them via the private IP, which would result in lots of traffic across the Gb connection between the two locations, and I'd like to eliminate that if possible. So how will this work when they are accessing it from outside the office, where each office has its own public IP? I'd prefer not to rely on users using the correct hostname (currently we just use 'mail.example.com', and I know I could set up two new ones - office1.example.com and office2.example.com - but then I'd be relying on the users to get it right, and I'd prefer to avoid that can of worms). I guess a worst-case scenario (if there is no better way) would be to do it that way, then watch the logs for users who get it wrong and are using the inter-office connection, and deal with them on a case-by-case basis.
Like others mentioned, I don't think the cross-office traffic will be that much of a problem, especially for external connections from outside the office. For internal connections you should be able to mostly avoid it.
On 2/27/2012 10:54 AM, Charles Marcus wrote:
These two locations will be connected via a private Gb ethernet connection, and each location will have its own internet connection (I think - still waiting on some numbers to present to the owner to see what he wants to do in that regard, but that will be my recommendation), so bandwidth for replication won't be an issue.
Say you're a boutique mail services provider or some such. In your own datacenter you have a Dovecot server w/64 processors, 512GB RAM, and 4 dual port 8Gb fiber channel cards. It's connected via 8 redundant fiber channel links to 4 SAN array units, each housing 120 x15k SAS drives, 480 drives total, ~140,000 random IOPs. This gear eats 36U of a 40U rack, and about $400,000 USD out of your wallet. In the remaining 4U at the top of the rack you have a router, with two GbE links connected to the server, and an OC-12 SONET fiber link (~$15k-20k USD/month) to a national ISP backbone. Not many years ago OC-12s comprised the backbone links of the net. OC-48s handle that today. Today OC-12s are most often used to link midsized ISPs to national ISPs, act as the internal backbone of midsized ISPs, and link large ISPs' remote facilities to the backbone.
Q: How many concurrent IMAP clients could you serve with this setup before hitting a bottleneck at any point in the architecture? What is the first bottleneck you'd run into?
The correct answer to this question, and the subsequent discussion that will surely take place, may open your eyes a bit, and prompt you to rethink some of your assumptions that went into the architectural decisions you've presented here.
-- Stan
On 2012-02-29 9:15 PM, Stan Hoeppner <stan@hardwarefreak.com> wrote:
Q: How many concurrent IMAP clients could you serve with this setup before hitting a bottleneck at any point in the architecture?
No idea how to calculate it...
What is the first bottleneck you'd run into?
Unless this is a trick question, the OC-12 link (since it is only 622Mb), and the next bottleneck would be the 2 GbE server connections to the router (are these bonded? If so, what mode?)...
The correct answer to this question, and the subsequent discussion that will surely take place, may open your eyes a bit, and prompt you to rethink some of your assumptions that went into the architectural decisions you've presented here.
Since the vast majority of our connections will be *local*, I'm unconcerned about the internet connection speeds (one office has a 100/10Mb Cable (Comcast Business Class) connection, the other will have a 100/100Mb fiber/ethernet connection).
My main priority is that the user experience at each physical location be optimal, which is why I'm more focused on making sure each office's users are connected only to the local server for all services (file/print/mail).
I also neglected to mention how each server would be physically connected to the network, which I guess I should have done, since I'm fairly sure that will be the bottleneck I should mostly be concerned about...
My choices, as I see it, are single GbE connections, or adding some multiport GbE cards (these Dells support up to 3 PCIe cards) and bonding some ports together for each VM. 10GbE is simply not in our price range (and I don't think we need it anyway), although I did stumble on these while googling and am waiting on pricing, since they claim to be 'much cheaper':
http://www.mellanox.com/ethernet/
Since neither the multi-port GbE cards nor decent switches with reliable support for bonding/teaming are really that expensive (especially compared to 10GbE solutions), I don't really see any reason *not* to do this (at a minimum I'd get redundancy if one of the ports on the server failed) - but I'm also not sure which bonding mode would be best: round-robin or IEEE 802.3ad dynamic link aggregation?
Obviously, I don't have the experience or expertise to answer these questions myself (never analyzed IMAP traffic to have an idea of the bandwidth each user uses, and probably wouldn't trust my efforts if I made the attempt). Hopefully, there are some people here who have a rough idea, which is why I brought this question up here.
Oh - and I am/will be working with a local I.T. services company to help with the design and implementation (since obviously I don't have the experience to do this myself), and will be asking them these same questions; I just like to know the general answers to questions like this ahead of time, so that I know whether the guys I'm hiring know what they are doing and are giving me the best options for my budget.
Thanks for your thoughts...
--
Best regards,
Charles
On 3/1/2012 5:43 AM, Charles Marcus wrote:
On 2012-02-29 9:15 PM, Stan Hoeppner <stan@hardwarefreak.com> wrote:
Q: How many concurrent IMAP clients could you serve with this setup before hitting a bottleneck at any point in the architecture?
No idea how to calculate it...
The correct answer is approximately 120,000 concurrent users, based on an assumed average of ~3MB-5MB of RAM consumed across all processes for each user.
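The arithmetic behind that estimate, if you want to check it:

# Rough check of the concurrent-user estimate above.
ram_gb = 512
ram_mb = ram_gb * 1024                 # 524,288 MB total

for per_user_mb in (3, 4, 5):          # assumed RAM per logged-in user, all processes
    users = ram_mb // per_user_mb
    print("%d MB/user -> ~%d concurrent users" % (per_user_mb, users))
# 3 MB/user -> ~174,762
# 4 MB/user -> ~131,072
# 5 MB/user -> ~104,857
# i.e. on the order of 120,000 before leaving headroom for the OS and caches.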
What is the first bottleneck you'd run into?
Unless this is a trick question, the OC-12 link (since it is only 622Mb), and the next bottleneck would be the 2 GbE server connections to the router (are these bonded? If so, what mode?)...
It was a bit of a trick question, with a somewhat elaborate setup, designed to shift your focus/thinking. Apparently I failed in my effort here. The correct answer is that RAM will be the first bottleneck. Then disk IOPS, finally followed by the OC-12 assuming we beef up the others.
Since the vast majority of our connections will be *local*, I'm unconcerned about the internet connection speeds (one office has a 100/10Mb Cable (Comcast Business Class) connection, the other will have a 100/100Mb fiber/ethernet connection).
You didn't grasp why I used the OC-12 in my example. It had nothing to do with LAN/WAN, local or remote, but the total users/traffic a 600Mb/s link can carry.
My main priority is that the user experience at each physical location be optimal, which is why I'm more focused on making sure each office's users are connected only to the local server for all services (file/print/mail).
A single MAN (Metropolitan Area Network) 1000BASE-LX link, good for 5km, likely what you will have, is more than sufficient to carry the 2nd office site traffic while keeping all of your servers/etc where they are now.
My choices, as I see it, are single GbE connections, or adding some multiport GbE cards (these Dells support up to 3 PCIe cards) and bonding some ports together for each VM. 10GbE is simply not in our price range (and I don't think we need it anyway), although I did stumble on these while googling and am waiting on pricing, since they claim to be 'much cheaper':
With specs like that you must be supporting 100,000 users. ;)
Obviously, I don't have the experience or expertise to answer these questions myself (never analyzed IMAP traffic to have an idea of the bandwidth each user uses, and probably wouldn't trust my efforts if I made the attempt). Hopefully, there are some people here who have a rough idea, which is why I brought this question up here.
Your company/employer has less than 250 users IIRC. Is this right? You're a media company that works with files much larger than the average company. Is that correct? Let's cut to the chase shall we?
Your 1000BASE-LX MAN link has an after-overhead bandwidth of just over 100MB/s, full duplex. To put this into real-world perspective, you can copy a single 4.7GB DVD in 47 seconds, or one in each direction in the same time - 2 total, 9.4GB. You can copy 20 full DVDs over this link, 10 in each direction, in less than 8 minutes. Add heavy IMAP traffic for 500 concurrent users and it's still less than 10 minutes, and the IMAP users won't have a clue if the switch VLAN QOS is set up correctly.
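The numbers behind that, if you want to check them (assuming ~100MB/s of usable bandwidth per direction):

# Quick check of the 1000BASE-LX transfer-time figures above,
# assuming ~100 MB/s of usable bandwidth per direction (full duplex).
usable_mb_per_s = 100.0

dvd_mb = 4.7 * 1000          # one 4.7 GB DVD
print("one DVD, one direction: %.0f s" % (dvd_mb / usable_mb_per_s))   # ~47 s
# One in each direction takes the same ~47 s, since the link is full duplex.

# 20 DVDs total, 10 per direction:
total_s = 10 * dvd_mb / usable_mb_per_s
print("20 DVDs (10 each way): %.1f minutes" % (total_s / 60))          # ~7.8 minutes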
You see GbE as mundane, slow, because it has been ubiquitous for some time, being a freebie on both servers and desktops. This is why I used the OC-12 example at $15K/month, hoping you'd start to grasp that cost has little direct relationship to performance. GbE is "free" now because the cost of the silicon to drive a 1000MHz signal over 300 meters of copper wire is no longer higher than for 100BASE-T.
Here's another comparison. All internet backbone links are OC-48 at 2.5Gb/s. It takes only 2.5 GbE links to equal a backbone link. Backbone links carry the traffic of *millions* of users, all applications, all data stream types. And that's *only* 250MB/s.
So, the point is, a single 1000BASE-LX MAN link is far more than plenty to carry all of the traffic you'll throw at it, and quite a bit more, with some minor QOS configuration. Consider how much money, time, and duplication of services and servers you are going to save now that you realize you need nothing other than the 1000BASE-LX MAN link, and closet switches at the second office site?
Get yourself a qualified network architect. Pay for a full network traffic analysis. He'll attach sniffers at multiple points in your network to gather traffic/error/etc data. Then you'll discuss the new office and which employees/types will move there, and you'll be able to know almost precisely the average and peak bandwidth needs over the MAN link. He'll very likely tell you the same thing I have: that a single gigabit MAN link is plenty. If you hire him to do the work, he'll program the proper QOS setup to match the traffic patterns gleaned from the sniffers.
-- Stan
On 2012-03-01 8:38 PM, Stan Hoeppner <stan@hardwarefreak.com> wrote:
Get yourself a qualified network architect. Pay for a full network traffic analysis. He'll attach sniffers at multiple points in your network to gather traffic/error/etc data. Then you'll discuss the new office and which employees/types will move there, and you'll be able to know almost precisely the average and peak bandwidth needs over the MAN link. He'll very likely tell you the same thing I have: that a single gigabit MAN link is plenty. If you hire him to do the work, he'll program the proper QOS setup to match the traffic patterns gleaned from the sniffers.
Finally had time to properly review your answers here Stan.
The time you took for the in-depth reply is very much appreciated - and I'm sure you got a kick out of the level of my ignorance... ;)
As for hiring a network architect, I will absolutely be doing as you recommend (was already planning on it), but with the information I'm now armed with, at least I'll have a better chance of knowing if they know what they are doing/talking about...
I'm still planning for the two physical servers (one at each location), but you have convinced me that trying to run two live mail systems is an unnecessary and even unwanted level of complexity. The DC VM will still be hot (it is always best to have two DCs in a Windows domain environment anyway), so I'll get automatic, real-time off-site backup of all of the users' data (since it will all be on DFS); but for the mail services, I'll just designate one as live and one as the hot standby that is kept in sync using dsync. This way I'll automatically get off-site backup for each site of the users' data stored in DFS, and have a second mail system ready to go if something happens to the primary.
Again, thanks Stan... I am constantly amazed at the level of expertise and quality of advice available *for free* in the open source world, as is available on these lists.
--
Best regards,
Charles
On 3/15/2012 5:51 AM, Charles Marcus wrote:
On 2012-03-01 8:38 PM, Stan Hoeppner <stan@hardwarefreak.com> wrote:
Get yourself a qualified network architect. Pay for a full network traffic analysis. He'll attach sniffers at multiple points in your network to gather traffic/error/etc data. Then you'll discuss the new office and which employees/types will move there, and you'll be able to know almost precisely the average and peak bandwidth needs over the MAN link. He'll very likely tell you the same thing I have: that a single gigabit MAN link is plenty. If you hire him to do the work, he'll program the proper QOS setup to match the traffic patterns gleaned from the sniffers.
Finally had time to properly review your answers here Stan.
The time you took for the in-depth reply is very much appreciated - and
Multi-site setups can be tricky, as they often tempt folks to do unnecessary things they otherwise would not. Just trying to help keep your sails pointed in the right direction. :) #1 rule when building a multi-site network: only duplicate hardware and services at the remote site(s) when absolutely necessary.
I'm sure you got a kick out of the level of my ignorance... ;)
Not at all. I'm sure there is some subject or another where you would demonstrate my ignorance. From another perspective, if there was no ignorance left on the planet then there would be nothing left for anyone to learn. That would make for a boring world.
As for hiring a network architect, I will absolutely be doing as you recommend (was already planning on it), but with the information I'm now armed with, at least I'll have a better chance of knowing if they know what they are doing/talking about...
Now that you are aware of network analysis using sniffers, allow me to throw you a curve ball. For a network of your size, less than 70 users IIRC, with a typical application mix but with SMB/NFS traffic/file sizes a little above 'average', a qualified engineer probably won't need to plug sniffers into your network to determine what size MAN pipe and what traffic shaping you'll need. He'll have already done a near-identical setup dozens of times. The good news is this saves you a few grand. Analysis with sniffers ain't cheap, even for small networks. And sniffers are normally only deployed to identify the cause of network problems, not very often for architectural or capacity planning. But asking him about doing a full analysis using sniffers, and hearing his response, may lead to a valuable discussion nonetheless.
Have your MAN and internet providers' (if not the same company) pricing sheet(s) in hand when you meet with the engineer. Depending on fast ethernet MAN, GbE MAN, and internet pipe pricing, he may have some compelling options/recommendations for you, possibly quite different, less costly, and more redundant than what you have been considering up to this point.
I'm still planning for the two physical servers (one at each location),
Again, if you don't _need_ hardware and services at the 2nd site to achieve the current service level at the primary site, do not add these things to the 2nd site. I really want to put a bunch of exclamation points here but I hate exclamation points in technical emails--actually I just hate them, period. ;)
but you have convinced me that trying to run two live mail systems is an unnecessary and even unwanted level of complexity.
Running an active/active Dovecot cluster isn't necessarily an unnecessary or unwanted level of complexity. The need for clustering should go through a justification process just like anything else: what's the benefit, total 'cost', what's the ROI, etc. Lots of people here do active/active clustering every day with great success. Connecting the cluster nodes over a MAN link, however, does introduce unnecessary complexity. Locating one node in another building many blocks away is unnecessary. Putting the nodes in the same rack/room is smart and easily accomplished in your environment; it gives you the redundancy above, but without the potentially problematic MAN link as the cluster interconnect. Granted, you'll need to build two new (preferably identical) systems from scratch and set up shared storage (DRBD or a SAN array) and GFS2 or OCFS, etc. Given your environment, there are only two valid reasons for locating equipment and duplicating data and services at a remote site:
- Unrecoverable network failure (due to single MAN link)
- Unrecoverable primary site failure (natural or man made disaster)
#1 is taken care of by redundant MAN links.
#2 you've never planned for to this date (the probability is *low*), and it would require _everything_ to be duplicated at the remote site.
Duplicating servers for high(er) user throughput/lower latency to/from servers isn't a valid reason for remote site duplication in your case because you are able to afford plenty of bandwidth and link redundancy between the sites. The relative low cost and high bandwidth of the MAN link outweighs any benefit of service replication due to the latter's complexity level.
Here are some other 'rules':
- Don't duplicate servers at remote sites to mitigate network link failure when sites are close and redundant bandwidth is affordable
- Do duplicate network links to mitigate link failure when sites are close and bandwidth is affordable
- Implement and test a true disaster avoidance and recovery plan
The DC VM will still be hot (it is always best to have two DCs in a Windows domain environment anyway), so I'll get automatic, real-time off-site backup of all of the users' data (since it will all be on DFS); but for the mail services, I'll just designate one as live and one as the hot standby that is kept in sync using dsync. This way I'll automatically get off-site backup for each site of the users' data stored in DFS, and have a second mail system ready to go if something happens to the primary.
Again, you're not looking at this network design from the proper perspective. See rules 1-3 above.
Off-site backups/replication are used exclusively to mitigate data loss due to catastrophic facility failure, not server failure, enabling rapid system recovery once new equipment has arrived. Many business insurers have catastrophic IT equipment replacement plans and relationships with the big 5 hardware vendors, enabling you to get new equipment racked and begin your restore from offsite tape within as little as 24 hours of notification.
Think of how FEMA stages emergency supplies all around the country. Now think 10 times better, faster. Such services increase your premiums, but if you're serious about disaster avoidance and recovery, this is the only way to go. IBM, HP, maybe Dell, Sun (used to anyway), have dedicated account reps for disaster recovery. They work with you to keep an inventory of all of your systems and storage. Your records are constantly updated when your products are EOL'd or superseded or you replace or add hardware, and a list is maintained of current hardware best matched to replace all of your now burned, flooded, tornado shredded, hurricane blasted equipment, right down to bare metal restore capability, if possible/applicable.
You plan to replicate filesystem user data and mailbox data to a 2nd site to mitigate single server failures. Why does that need to be done to an offsite location/system? It doesn't. There is no benefit whatsoever. You can accomplish this in the same rack/room and get by with a smaller MAN pipe saving time, money, and administrative burden. The restore procedure will be faster if all machines are in the same rack/room and you're using tape, and you won't slow users down with restore traffic going over the MAN link.
If you really want off-site backup, for what it's meant to accomplish, get a network-attached tape library/silo, or a speedy high-cap LTO-4/5 tape drive in each server, put a real backup rotation and restore plan in place, and store backup tapes in a secure facility. A remote "hot site" is great when it's in a different city, better yet a different region, or in a hardened facility in any locale. Your hot site is only a few blocks away. If your primary site is taken out by anything other than fire, such as a tornado, earthquake, or hurricane (more likely in your case), chances are your hot site may go down soon after the primary. If you want/need a real off-site backup solution, rotate tapes to an everything-proof facility. Here are 3 companies in the Atlanta area that offer media rotation storage services. Watch the Offsite Tape Vaulting video at Iron Mountain:
http://www.ironmountain.com/Knowledge-Center/Reference-Library/View-by-Docum...
http://www.askads.net/media-rotation/ http://www.adamsdatamanagement.com/tape-rotation-atlanta-ga.htm
Again, thanks Stan... I am constantly amazed at the level of expertise and quality of advice available *for free* in the open source world, as is available on these lists.
Always glad to assist my brethren in this digital kingdom. Whichever architecture/topology you choose, remote replicated systems or not, I hope my input has given you some good information on which to base your decisions.
-- Stan
On 3/1/2012 5:43 AM, Charles Marcus wrote:
Obviously, I don't have the experience or expertise to answer these questions myself (never analyzed IMAP traffic to have an idea of the bandwidth each user uses, and probably wouldn't trust my efforts if I made the attempt). Hopefully, there are some people here who have a rough idea, which is why I brought this question up here.
Expanding on my previous statements, and hopefully answering some questions here, or at least getting in the ballpark, let's see what a single GbE link is capable of.
Let's assume the average transfer size of an SMTP/IMAP email, including headers, is roughly 4096 bytes, or 32768 bits.
TCP over GbE after all framing and protocol overhead:
= 992,697,000 bits/sec maximum bandwidth with jumbo frames
= 941,482,000 bits/sec maximum without jumbo frames
We'll go without jumbo frames in our example. Every GbE interface on one router segment must support jumbo frames or you can't enable them. If you enable them anyway, interfaces that don't do jumbo will have bad-to-horrible performance, or may not work at all. Many workstation NICs don't do jumbo frames, and neither do many commercial routers.
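For reference, here is roughly where the no-jumbo number comes from - my reconstruction, assuming a 1500-byte MTU, standard Ethernet framing, and TCP with the timestamp option; the jumbo figure shifts with whatever MTU you assume:

# Reconstruction of the ~941 Mbit/s usable-TCP-over-GbE figure (the framing
# model is an assumption; exact overheads depend on TCP options and MTU).
line_rate = 1_000_000_000                # GbE line rate, bits/sec

mtu = 1500                               # standard Ethernet MTU
eth_overhead = 7 + 1 + 14 + 4 + 12       # preamble, SFD, header, FCS, inter-frame gap
ip_tcp_overhead = 20 + 20 + 12           # IPv4 + TCP + TCP timestamp option

payload = mtu - ip_tcp_overhead          # 1448 bytes of application data per frame
on_wire = mtu + eth_overhead             # 1538 bytes on the wire per frame

usable = line_rate * payload / on_wire
print("usable TCP payload rate: %.0f bits/sec" % usable)   # close to 941,482,000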
Typical IMAP command payload is absolutely tiny, so we'll concentrate on response traffic. Theoretical steady state IMAP server to client 4KB message transfer rates:
= 28,731 msgs/sec
= 1,723,905 msgs/minute
= 103,434,301 msgs/hour
= 2,482,423,242 msgs/day
General file transfer bandwidth, 5MB JPG:
= 22 files/sec
= 1,346 files/minute
= 80,808 files/hour
= 1,939,393 files/day
General file transfer bandwidth, 100MB TIFF:
= 1 file/sec
= 67 files/minute
= 4,040 files/hour
= 96,969 files/day
General file transfer bandwidth, 500MB video file:
= 1 file in 4.5 seconds
= 10 files in 44.6 seconds
= 100 files in 7.4 minutes
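If you want to reproduce these figures, it's just the usable bit rate divided by the transfer size (small differences from the numbers above are only rounding):

# Reproduce the GbE (no jumbo frames) throughput figures above.
usable_bps = 941_482_000            # usable bits/sec after framing/protocol overhead

def rate(transfer_bits, label):
    per_sec = usable_bps / transfer_bits
    print("%-12s %10.0f/sec %12.0f/min %14.0f/hour"
          % (label, per_sec, per_sec * 60, per_sec * 3600))

rate(4096 * 8, "4KB message")             # ~28,731 msgs/sec
rate(5 * 1024 * 1024 * 8, "5MB JPG")      # ~22 files/sec
rate(100 * 1024 * 1024 * 8, "100MB TIFF") # ~1 file/sec

# Large single transfers:
video_bits = 500 * 1024 * 1024 * 8
print("500MB video: %.1f seconds each, 100 of them in %.1f minutes"
      % (video_bits / usable_bps, 100 * video_bits / usable_bps / 60))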
As you can see, a single GbE interface has serious capacity and will probably easily carry your inter-site traffic without needing duplicate servers at the second site. You mentioned putting multiple GbE interfaces on your servers. Very, very few servers *need* 900+ Mb/s of bandwidth; however, having two links is good for redundancy. So I wouldn't worry about aggregation performance, only about proper and seamless failover functionality.
I obviously haven't seen your workflows Charles, but I recall you do a lot of media work. By 'you' I mean Media Brokers. So obviously your users will be hitting the network harder than average office workers. I'm taking that into account.
My gut instinct, based on experience and the math, is that a single GbE inter-site MAN link will be plenty, without the need to duplicate server infrastructure. Again, have a qualified network architect sniff your current network traffic patterns, and discuss with you the anticipated user traffic at the 2nd site, to determine your average and peak inter-site bandwidth needs. The average will absolutely be much less than 1Gb/s, but the peak may be well above 1Gb/s. You can still avoid the myriad problems/costs of server duplication without incurring significant additional link costs. There are a couple of options that should be available to you:
A second fiber pair and GbE link: you might negotiate a burst contract. You pay a flat monthly rate for a base bit rate of X and pay extra for bursts. Burst contract availability will depend on the provider's network topology. If at any point they're aggregating multiple customers' traffic on a single trunk fiber pair, a burst contract should be available. Burst contracts allow them to oversubscribe their trunks, just as ISPs and broadband providers do. Your network architect should be able to assist you in figuring out what you'd want for your base and peak bit rates for such a contract. Why pay for 1000Mb/s from 8pm to 6am if you're only using 20Kb/s?
Add a second GbE link on a different transceiver wavelength using a prism on each end to transmit both links on one fiber pair. This is typically cheaper when the provider has limited fiber runs in a given area or to a given building. You may or may not be able to save money with a burst contract in this scenario. Talk to your provider and find out what your options are. Wait until your architect has finished your network analysis before speaking to the provider.
Treat this link as a traditional WAN link. Do NOT treat it as simply another switch segment. Put an IP router on each side of the GbE MAN link and create a separate IP subnet for hosts and devices in the new office. By doing this you keep broadcast traffic from traversing the link. This includes things like ARP discovery, DHCP, NTP broadcast, and most importantly: broadcast traffic from disk imaging software. If you don't make this an IP routed link, network disk imaging traffic will traverse the MAN link just as it traverses your entire switched LAN. This could be anywhere from 25-80MB/s (200-640Mb/s) of broadcast traffic. You obviously don't want this clogging the link. You *might* be able to eliminate broadcast traffic using special VLAN configurations on sufficiently advanced layer2-7 "switch routers", but it's cheaper and foolproof when done with standard IP routers. Again, chat with your architect.
With this being a routed connection, and broadcast traffic being eliminated, any services that rely on broadcast traffic will need to be duplicated or tweaked accordingly. You will need a DHCP server in the new office. The router should be able to serve DHCP, unless you're currently serving some custom scope it can't handle. If you rely on broadcast for WINS, or have any other Microsoft services that rely on broadcast, you will need to address those. If you currently use NTP broadcast for time updates you'll need another NTP server in the new office. Again, the router should be able to broadcast NTP updates. The solutions to these things have been around forever, so I'm not going to go into all of them, but you need to be aware of them. You'll need to discuss these things with your network architect or a qualified Microsoft consultant. If you run no MS servers and don't use broadcast, then there's no need to worry about it. And hooray for you, no MS! :)
This may be of interest given the topic. At a previous $dayjob a few years back, we ran the traffic of about 580 desktops/wireless laptops through a single GbE uplink into an 11 blade server farm backed by a small fiber channel SAN. Blade-blade IP traffic was through a dedicated 14x6 port GbE switch module, so things like vmotion, backups, etc worked at full boogie. But the uplink from the switch module in the BladeCenter to the Cisco 5000 core switch was a single copper GbE uplink. All user traffic flowed over this link. We never had performance issues. We'd configured QOS to keep the IP phones happy but that's about it for traffic shaping. Before I left I jacked in a 2nd GbE uplink for redundancy and configured Cisco's link aggregation protocol. We didn't notice a performance difference. I could have aggregated 6 GbE uplinks. One did the job, two gave resiliency, more would have just wasted ports on the core switch.
Hope you find this educational/informational/useful Charles, and maybe others.
-- Stan
Thanks very much for taking the time for your detailed reply, Stan, but I'll need more time to study it...
On 2012-03-02 4:17 AM, Stan Hoeppner <stan@hardwarefreak.com> wrote: <snip>
My gut instinct, based on experience and the math, is that a single GbE inter-site MAN link will be plenty, without the need to duplicate server infrastructure.
I just wanted to point out one thing - I have two primary goals - yes, one is to maximize performance, but the other is to accomplish a level of *redundancy*...
Also - I already have the servers (I have 3 Poweredge 2970's available to me, only one of which is currently being used)...
So, the only extra expenses involved will be relatively minor hardware expenses (multi-port Gb NICs), and some consulting services for making sure I implement the VM environment (including the routing) correctly.
So, honestly, we'd be incurring most of these expenses anyway, even if we didn't set up redundant servers, so I figure why not get redundancy too (now is the time to get the boss to pay for it)...
--
Best regards,
Charles
So, the only extra expenses involved will be relatively minor hardware expenses (multi-port Gb NICs), and some consulting services for making sure I implement the VM environment (including the routing) correctly.
Take into account the cost of administering a more complex environment, too.
-- Regards, Piotr Szafarczyk
On 3/3/2012 2:20 PM, Charles Marcus wrote:
Thanks very much for taking the time for your detailed reply, Stan, but I'll need more time to study it...
On 2012-03-02 4:17 AM, Stan Hoeppner <stan@hardwarefreak.com> wrote: <snip>
My gut instinct, based on experience and the math, is that a single GbE inter-site MAN link will be plenty, without the need to duplicate server infrastructure.
I just wanted to point out one thing - I have two primary goals - yes, one is to maximize performance, but the other is to accomplish a level of *redundancy*...
What type of redundancy are you looking for? I.e. is one reason for duplicating servers at site #2 to avoid disruption in the event the MAN link fails? Do you currently have redundant GbE links to each closet switch stack in site #1, and also redundant switches in the datacenter? I.e. do you skip a beat if a core or closet switch fails?
If you do not currently have, nor plan to create such network redundancy internally at site #1, then why build application redundancy with the single goal of mitigating failure of a single network link? Do you have reason to believe there is a higher probability of failure of the MAN link than any other single link in the current network?
Also - I already have the servers (I have 3 Poweredge 2970's available to me, only one of which is currently being used)...
So, the only extra expenses involved will be relatively minor hardware expenses (multi-port Gb NICs), and some consulting services for making sure I implement the VM environment (including the routing) correctly.
Again, you don't need multi-port GbE NICs or bonding for performance--a single GbE link is all each server needs. Your switches should be able to demonstrate that, without even needing a sniffer, assuming they're decent managed units. If you're after link redundancy, use two single-port NICs per server, or one mobo-mounted port and one single-port NIC. Most dual-port NICs duplicate the PHYs but not the Ethernet chip, power circuits, etc. Thus, when a dual-port NIC fails you usually lose both ports.
So, honestly, we'd be incurring most of these expenses anyway, even if we didn't set up redundant servers, so I figure why not get redundancy too (now is the time to get the boss to pay for it)...
Don't forget power backup at site #2. Probably not a huge cost in the overall scheme of things, but it's still another $5000 or so.
In summary, my advice is:
- One 1000Mb MAN link is plenty of bandwidth for all users at site #2, including running internet traffic through site #1, saving the cost of an internet pipe at site #2
- If truly concerned about link failure, get a backup 100Mb/s link, or get two GbE links with a burst contract, depending on price
- Keep your servers in one place. If you actually desire application-level redundancy (IMAP, SMB/CIFS, etc) unrelated to a network link failure, then do your clustering etc "within the rack". It will be much easier to manage and troubleshoot this than two datacenters with all kinds of replication between them
- If site #1 is not already link redundant, it makes little sense to make a big redundancy push to cover a possible single network link failure, regardless of which link
- Building a 2nd datacenter and using the MAN link for data replication gives you no performance advantage, and may actually increase overall utilization, vs using the link as a regular trunk
- *Set up QOS appropriately to maintain low latency for IMAP and other priority data, giving a back seat to SMB/CIFS/FTP/HTTP and other bulk-transfer protocols.* With proper QOS the single GbE MAN link will simply scream for everyone, regardless of saturation level
-- Stan
participants (9)
- /dev/rob0
- Adam Szpakowski
- Charles Marcus
- Jan-Frode Myklebust
- list@airstreamcomm.net
- Piotr NetExpert
- Stan Hoeppner
- Sven Hartge
- Timo Sirainen