[Dovecot] Highly Performance and Availability

Stan Hoeppner stan at hardwarefreak.com
Wed Feb 17 09:02:54 EET 2010

Wayne Thursby put forth on 2/16/2010 9:42 AM:

> I was planning on using EqualLogic because the devices seem competent,
> and we already have an account with Dell. Also, being on VMware's HCL is
> important as we have a support contract with them.

Using the standard GbE ports on the servers won't work, for two basic reasons:

1.  This would require using the ESX software iSCSI initiator, which isn't up to
the task: it consumes too many CPU cycles under intense disk workloads, stealing
those cycles from the guests and their applications, which, ironically, are the
ones causing the big disk I/O workload.

2.  GbE iSCSI has a maximum raw data rate of 125MB/s, about 100MB/s after TCP
overhead.  That is less than a single 15K rpm SAS disk can deliver, and you'll
have 14 of those in the array.  That spells Bottleneck with a capital B: a 14:1
bottleneck.  It's just not suitable for anything but low-demand file transfers
or very low-transaction databases.
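
A back-of-envelope sketch of the arithmetic behind the 14:1 figure.  The
per-disk rate is an assumption: the post only says a single 15K SAS disk
outruns usable GbE, so 100 MB/s is used here as a conservative floor.  The
function and constant names are illustrative.

```python
# Back-of-envelope check of the GbE iSCSI bottleneck described above.
# Assumption: each 15K SAS disk streams at least as fast as usable GbE,
# so 100 MB/s per disk is a conservative floor, not a measured figure.

def line_rate_mb_per_s(gbit_per_s: float) -> float:
    """Convert a link's data rate in Gb/s to MB/s (decimal units, 8 bits per byte)."""
    return gbit_per_s * 1000 / 8


def bottleneck_ratio(disks: int, per_disk_mb_s: float, link_mb_s: float) -> float:
    """How many times more data the spindles can deliver than the link can carry."""
    return disks * per_disk_mb_s / link_mb_s


GBE_RAW_MB_S = line_rate_mb_per_s(1.0)   # 125.0 MB/s before protocol overhead
GBE_USABLE_MB_S = 100.0                  # the post's figure after TCP/IP overhead
ASSUMED_DISK_MB_S = 100.0                # hypothetical per-disk streaming floor
DISKS = 14

if __name__ == "__main__":
    print(f"GbE raw: {GBE_RAW_MB_S:.0f} MB/s, usable: {GBE_USABLE_MB_S:.0f} MB/s")
    ratio = bottleneck_ratio(DISKS, ASSUMED_DISK_MB_S, GBE_USABLE_MB_S)
    print(f"{DISKS} disks vs one GbE link: {ratio:.0f}:1 or worse")
```

Because the per-disk figure is a floor, the real ratio is at least this large.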

Here's good news.  I just looked, and most of Nexsan's SAN arrays are now VMware
Ready certified, including all the ones I talk about here:


> Here's where I think you misunderstood me. I have no SAN at the moment.
> I'm running a monolithic Postfix/Dovecot virtual machine on an ESXi host
> that is comprised of a Dell 2950 directly attached via SAS to a Dell
> MD-1000 disk array. We have no Fiber Channel anything, so going that
> route would require purchasing a full complement of cards and switches.

Yes, I did misunderstand.  My apologies.  The way you worded your previous post
led me to believe your organization had a small SAN being used for other things,
and that you were consolidating some other applications to that SAN storage and
were thinking of moving some of this VMware stuff onto it.  I'm clear now that
this isn't the case.

I did, however, fully understand what your current ESXi SMTP/IMAP server
platform is and what you want to achieve moving forward.

>>> Is it as expensive as running my primary mailserver mounted from the SAN
>>> via Fiber Channel? Will that get me under 30ms latency?

Without actually testing the iSCSI solution I can't state the latency.  But
there is no doubt latency is going to be an order of magnitude higher with GbE
iSCSI than with 4Gb FC, especially under high load.  Make that 2-3 orders of
magnitude higher if using software initiators.  I can tell you that the
round-trip latency of an FC block request from HBA through QLogic switch to
Nexsan array and back will be less than 10ms, and over 90% of that latency is
the disk head seeks and reads, which you'll obviously have with any SAN.  The
magic is the low overhead of FC.  With 1GbE iSCSI, half or more of the total
latency will be in the Ethernet network and TCP processing.

>> I'm not sure what you mean by "expensive" in this context.
> Simply that purchasing FC cards and switches adds to the cost, whereas we
> already have GbE for iSCSI.

As I stated above, 1GbE Ethernet with a software initiator is woefully
inadequate for your needs.  Using 1GbE iSCSI HBAs would help slightly, 10-20%
maybe, but you're still shackled with a maximum 100MB/s data rate.  Again,
that's slower than a single 15K SAS drive.  That's not enough bandwidth for your
workload, if I understand it correctly.

>> I ran an entire 500 user environment, all systems, all applications, on two
>> relatively low end FC SAN boxen, and you're concerned about the performance of a
>> single mail SMTP/IMAP server over a SAN?  I don't think you need to worry about
>> performance, as long as all is setup correctly.  ;)
> I hope that is correct, thank you for sharing your experiences. I
> inherited a mail system that had capable hardware but was crippled by
> bad sysadmin-ing, so I'm trying to make sure I'm going down the right
> path here.

You're welcome.  There is no "hope" involved.  It's just fact.  These Nexsan
controllers with big cache and fast disks can easily pump 50K random IOPS to
cache and 2,500+ through to disk.  They really are beasts.  You would have to
put 5-10X your current workload, including full body searches, through one of
these Nexsan units before you'd come close to seeing any lag due to controller
or disk bottlenecking.
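
The array figures above can be turned into a rough per-spindle number and a
headroom estimate.  This is a sketch: the 2,500 IOPS and 14-drive figures come
from the post, while the current-workload value is a hypothetical placeholder
to be replaced with a measured peak (for example from iostat on the current
mail VM).

```python
# Rough per-spindle and headroom arithmetic for the Nexsan figures quoted above.
ARRAY_DISK_IOPS = 2500   # "2,500+ through to disk" (a lower bound, per the post)
DRIVES = 14              # 14 x 15K SAS drives in the proposed array


def per_drive_iops(array_iops: float, drives: int) -> float:
    """Random IOPS each spindle must sustain for the array to reach array_iops."""
    return array_iops / drives


def headroom(array_iops: float, current_iops: float) -> float:
    """How many times the current disk workload the array can absorb."""
    return array_iops / current_iops


if __name__ == "__main__":
    print(f"~{per_drive_iops(ARRAY_DISK_IOPS, DRIVES):.0f} IOPS per drive")
    current = 400.0  # hypothetical: substitute your measured peak disk IOPS
    print(f"headroom: {headroom(ARRAY_DISK_IOPS, current):.2f}x")
```

Any measured workload between 250 and 500 disk IOPS lands in the 5-10X range
described above.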

> My main concern is when Dovecot tries to run a body search on an inbox
> with 14,000 emails in it, that the rest of the users don't experience
> any performance degradation. This works beautifully in my current setup,
> however the MD-1000 is not supported by VMWare, doesn't do vMotion, etc,
> etc. It sounds like I have nothing to worry about if I go with Fiber
> Channel, any idea about iSCSI?

Like I said, you'd have to go with 10GbE iSCSI with HBAs and a 10GbE switch to
meet your needs.  1GbE software-initiator iSCSI will probably fall over under
your workload, and your users will very likely see latency effects.  And, as I
said, that makes your costs far greater than those of the FC solution I've
outlined.

> My current disk layout is as follows:
> Filesystem            Size  Used Avail Use% Mounted on
> /dev/sda1             9.5G  4.2G  4.8G  47% /
> /dev/sdb1             199G  134G   55G  71% /var/vmail
> /dev/sdc1              20G   13G  6.8G  65% /var/sdc1
> /dev/sdd1            1012M   20M  941M   3% /var/spool/postfix
> /dev/sda1 is a regular VMWare disk. The other three are independent
> persistent disks so that I can snapshot/restore the VM without
> destroying the queue or stored email.

It's been a while since I worked with the VMware ESX GUI.  Suffice it to say
that each LUN you expose on the Nexsan will appear to ESX as a big SCSI disk,
which you can format as VMFS to store guests, or assign as a raw LUN ("raw
device mapping" was, I think, the official VMware jargon) to a particular guest.
You've probably got more ESX experience at this point than I do.  At the very
least your experience is fresh, and mine is stale, from back in the 3.0 days.  I
recall there were a couple of "gotchas", where if you chose one type of
configuration for a LUN (disk) then you couldn't use some of the advanced
backup/snapshot features.  There were some trade-offs one had to make.
Man, it's been so long lol.  Read the best practices and all the VMware info
you can find on using Fibre Channel SANs with ESX.  Avoid any gotchas WRT HA and

> You certainly clarified a number of things for me by detailing your past
> setup. I suppose I should clarify exactly what the current plan is.
> We are migrating a number of other services to some kind of an HA setup
> using VMWare and vMotion, that much has been decided. My primary
> decision centers around choosing either iSCSI or Fiber Channel. We have
> *no* Fiber Channel infrastructure at the moment, so this would add
> significantly to the price of our setup (at least 2 cards + switch).

Nah, they're cheap; I'd say maybe $4K total.  Let's see...



Get 4 SFP LC transceivers (always have a spare).
You'll populate 3 switch ports with these, plugging the ESX servers into two of
them and FC port 0 on the Nexsan into the other.  With these products you'll
have end-to-end 4 Gb/s links: roughly 400 MB/s in each direction, or 800 MB/s
of total full-duplex throughput per link.
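
Where the 400 MB/s figure comes from: 4Gb Fibre Channel signals at 4.25 Gbaud
with 8b/10b encoding, so 8 of every 10 bits on the wire carry data.  A minimal
sketch of that arithmetic; the ~400 MB/s per direction is the nominal figure
usually quoted once framing overhead is taken into account.

```python
# Payload bandwidth of a 4Gb FC link from its line rate and encoding.
FC4_LINE_RATE_MBAUD = 4250    # 4GFC signals at 4.25 Gbaud
DATA_BITS, CODE_BITS = 8, 10  # 8b/10b encoding: 8 data bits per 10 line bits


def payload_mb_per_s(line_rate_mbaud: int, data_bits: int, code_bits: int) -> float:
    """MB/s left for data after line encoding, in one direction."""
    return line_rate_mbaud * data_bits / code_bits / 8


if __name__ == "__main__":
    one_way = payload_mb_per_s(FC4_LINE_RATE_MBAUD, DATA_BITS, CODE_BITS)
    nominal = 400  # commonly quoted per-direction figure after framing overhead
    print(f"encoded payload: {one_way:.0f} MB/s per direction")
    print(f"nominal full duplex: {2 * nominal} MB/s per link")
```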

So, let's see how close my guesstimate was:

Item                                            Qty   Unit price   Extended
QLogic SANbox 3810, 8-port 8/4/2 Gb/s FC switch   1       $1,880     $1,880
QLogic SANblade QLE2440 host bus adapter          2       $  790     $1,580
IBM 4Gbps SW SFP transceiver                      4       $  140     $  560
Total                                                                $4,020
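
Since the vendor prices are per unit, it is easy to misread the total.  A small
sketch that recomputes it from quantities and unit prices; the prices are the
ones quoted in the post, so refresh them from a current quote before relying on
the result.

```python
# Recompute the FC bill of materials from unit prices (figures quoted in the post).
from decimal import Decimal

PARTS = [
    # (description, quantity, unit price in USD)
    ("QLogic SANbox 3810 8-port FC switch", 1, Decimal("1880")),
    ("QLogic SANblade QLE2440 HBA", 2, Decimal("790")),
    ("IBM 4Gbps SW SFP transceiver", 4, Decimal("140")),
]


def total_cost(parts) -> Decimal:
    """Sum quantity x unit price over all line items."""
    return sum((qty * price for _, qty, price in parts), Decimal(0))


if __name__ == "__main__":
    for name, qty, price in PARTS:
        print(f"{name:40} {qty} x ${price:>5} = ${qty * price:>5}")
    print(f"{'Total':40} ${total_cost(PARTS)}")
```

Decimal avoids floating-point rounding surprises when the list grows to
include cents.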

Yep, about $4K.  I underestimated by $20, but then again, CDW is far from the
cheapest vendor; I used them as an example because I knew they carried all this
stuff.  They carry all the Nexsan arrays as well, but unfortunately, just like
everyone else, for SAN products in this price range you have to call to get a
quote.  Get yourself quotes from CDW and SANDirect.com on these standard
factory configurations:

Nexsan SASBoy, 2 FC, 2 iSCSI, 2GB cache, 14 x 300GB 15K SAS drives
Nexsan SATABoy, 2 FC, 2 iSCSI, 1GB cache, 14 x 500GB 7.2K SATA drives

The first will give you more performance than you can imagine, and will allow
for 10 years of performance growth, though at 4.2TB raw, you may run out of
space before 10 years.  It depends on whether you store digital X-rays etc. on
it.  These arrays would really shine in this application, BTW.  Nexsans have won
multiple performance awards for their streaming, and their random I/O is
fantastic as well.
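
The 4.2TB figure is raw capacity, before RAID parity and hot spares are
subtracted.  A quick sketch comparing the two quoted configurations; the
dictionary layout is purely illustrative.

```python
# Raw (pre-RAID) capacity of the two Nexsan configurations quoted above.
CONFIGS = {
    "SASBoy, 14 x 300GB 15K SAS": (14, 300),
    "SATABoy, 14 x 500GB 7.2K SATA": (14, 500),
}


def raw_tb(drives: int, drive_gb: int) -> float:
    """Raw capacity in decimal TB; usable space after RAID and spares is lower."""
    return drives * drive_gb / 1000


if __name__ == "__main__":
    for name, (drives, size_gb) in CONFIGS.items():
        print(f"{name}: {raw_tb(drives, size_gb):.1f} TB raw")
```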

> The other applications we are virtualizing are nowhere near as disk i/o
> intensive as our email server, so I feel confident that an iSCSI SAN
> would meet all performance requirements for everything *except* the
> email server.

One key point you are failing to realize is that ESX's own advanced storage and
backup features demand high-bandwidth, low-latency access to the SAN storage
arrays: snapshots, backup, etc.  VMware snapshots will fill FC links to capacity
until completed, unless you lower their priority (not sure if that's possible).
Anyway, if you want/need to use any of ESX's advanced capabilities, 1GbE iSCSI
isn't going to cut it.  We had 2Gb FC, and some operations I performed had to be
done at night or on weekends because they filled the SAN pipes.  You may run
into that even with 4Gb FC.  And if you do, you can pat yourself on the back for
going FC, as 1GbE iSCSI would take over 4 times as long to complete the same
storage operation. :)

> I'm really looking for a way to get some kind of redundancy/failover for
> Postfix/Dovecot using just iSCSI, but without killing the performance
> I'm experiencing using direct attached storage, but it sounds like
> you're saying I need FC.

To maintain the level of I/O performance you currently have, but in a SAN
environment which allows VMware magic, you will require either an FC SAN or a
10GbE iSCSI SAN.  The 10GbE iSCSI solution will probably be almost twice the
total price, will be more difficult to set up and troubleshoot, and will have no
more, and likely less, total performance than the 4Gb FC solution.

> Well, I've got the rest of my virtual infrastructure/SAN already figured
> out, so my questions are centering around providing redundancy for
> Dovecot/maildirs. I think you've answered all of my hardware questions
> (ya' freak). It really seems like Fiber Channel is the way to go if I
> want to have HA maildirs.

It's not just maildirs you're making HA but the entire Linux guest server, or
all your VM guests if you want.  All ESX servers connected to shared SAN storage
can start and run any VM guest in the environment residing on those SAN LUNs and
can access any raw device mappings (raw LUNs) associated with a VM.  This is
also what makes VMotion possible.  It's incredible technology, really.  Once you
start getting a good grasp on what VMware ESX, VMotion, HA, snapshots, etc. can
really do for you, you'll start buying additional machines and ESX licenses, and
you'll end up consolidating every possible standalone server you have onto
VMware.  The single overriding reason for this is single-point backup and
disaster recovery.

With consolidated backup, and a large enough tape library system, it's possible
to do a complete nightly backup of your entire VMware environment including all
data on the SAN array(s), and rotate the entire set of tapes off site for
catastrophic event recovery, for things such as fire, earthquake, flood, etc.
In the immediate aftermath, you can acquire one big machine with appropriate
HBAs, an identical SAN array, switch, tape library, etc, and restore the entire
system in less than 24 hours, bringing up only critical VMs until you're able to
get more new machines in and set up.  The beauty of ESX is that there is nothing
to restore onto all the ESX hosts. All you do is a fresh install of ESX and
configure it to see the SAN LUNs.  You can have a copy of the ESX host
configuration files sitting on the SAN, and thus in the DR backup.

Normally this is done in a temporary data center colocation facility with
internet access so at minimum principals within the organization (CEO, CFO, VPs,
etc) can get access to critical information to start rebuilding the
organization.  This is all basic business continuity 101 stuff, so I won't go
into more detail.  The key point is that with ESX, an FC SAN, a tape library and
consolidated backup, the time to get an organization back up and running after a
catastrophe is cut from possibly weeks to a couple of days, most of that time
being spent working with insurance folk and waiting on the emergency replacement
hardware to arrive.

The only substitute for off-site tape is a hot-standby remote datacenter, and
most can't afford that.  Thus, one needs a high-performance, high-capacity tape
library/silo.  Doing consolidated backup of one's VMware environment requires
fast access to the storage.  1GbE iSCSI is not even close to appropriate for
this purpose.  Case in point:  say you have 4TB of VMs and data LUNs on your
array.  Even if you could get 100% of the GbE iSCSI bandwidth for consolidated
backup (which you can't, because the VMs are going to be live at the time, and
you can't get 100% out of GbE anyway due to TCP), it would take 11 hours to back
up that 4TB, as it all has to come off the array via 1GbE iSCSI.  With 4Gb FC
you'd cut that time by a factor of 4, the 11 hours becoming a little under 3
hours.  An 11-hour backup window is disruptive to the business, and makes it
difficult to properly manage an off-site backup procedure (which everyone
should have).
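
The backup-window numbers follow from dividing the data size by the sustained
throughput.  A minimal sketch, assuming the whole link is available to the
backup (which, as noted, it won't be with live VMs) and using the 100 MB/s and
400 MB/s figures from earlier in this post.

```python
# Best-case backup window: data size divided by sustained link throughput.
SECONDS_PER_HOUR = 3600


def backup_hours(data_tb: float, link_mb_s: float) -> float:
    """Hours to move data_tb (decimal TB) at link_mb_s, ignoring contention from live VMs."""
    return data_tb * 1_000_000 / link_mb_s / SECONDS_PER_HOUR


if __name__ == "__main__":
    data_tb = 4.0
    for label, rate in [("1GbE iSCSI", 100.0), ("4Gb FC", 400.0)]:
        print(f"{label}: {backup_hours(data_tb, rate):.1f} h for {data_tb:.0f} TB")
```

Real-world windows will be longer than this on both transports, but the 4:1
ratio between them holds.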

> I just don't know if I can justify the extra cost of a FC infrastructure
> just because a single service would benefit, especially if there's a
> hybrid solution possible, or if iSCSI was sufficient, thus my questions
> for the list.

I covered the costs above, and again, FC beats iSCSI all around the block and on
Sunday, unless EqualLogic has dropped their prices considerably since 2006.  If
by hybrid you mean a SAN array that supports both 4Gb FC and 1GbE iSCSI, all of
the Nexsan units fit that bill with two 4Gb FC and two 1GbE iSCSI ports per

Sorry this is so freak'n long.  Hardware is my passion, and I guess verbosity is
my disease.  Hope the info helps in one way or another.  If nothing else it may
put you to sleep faster than a boring book. ;)

