[Dovecot] Providing shared folders with multiple backend servers

Stan Hoeppner stan at hardwarefreak.com
Tue Jan 10 04:19:22 EET 2012


On 1/9/2012 7:48 AM, Sven Hartge wrote:

> It seems my initial idea was not so bad after all ;) 

Yeah, but you didn't know how "not so bad" it really was until you had
me analyze it, flesh it out, and confirm it.  ;)

> Now I "just" need to build a little test setup, put some dummy users on
> it and see if anything bad happens while accessing the shared folders,
> and how the system reacts should the shared folder server be down.

It won't be down, because instead of using NFS you're going to use GFS2
for the shared folder LUN, so each user accesses the shared folders
locally, just as they do their mailbox.  Pat yourself on the back, Sven:
you just eliminated a SPOF. ;)
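
Just to sketch what I mean (a rough example only; the Shared/ prefix, the
/srv/gfs2 mount point and the index path are made-up names, not anything
from your setup), the public namespace over the cluster-mounted LUN would
look something like this in dovecot.conf:

  namespace {
    type = public
    separator = /
    prefix = Shared/
    # Mail data sits on the GFS2 mount every node sees locally;
    # per-user index files stay out of the shared tree.
    location = maildir:/srv/gfs2/shared:INDEX=/var/lib/dovecot/shared-index/%u
    subscriptions = no
  }

Since every backend mounts the same GFS2 filesystem, the namespace block
is identical on all nodes and there's no shared-folder server left to die.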

>> How many total cores per VMware node (all sockets)?
> 
> 8

Fairly beefy.  Dual socket quad core Xeons I'd guess.

> Here are the memory statistics at 14:30:
> 
>              total       used       free     shared    buffers     cached
> Mem:         12046      11199        847          0         88       7926
> -/+ buffers/cache:       3185       8861
> Swap:         5718         10       5707

That doesn't look too bad.  How many IMAP user connections were there at
that time?  Is that a high or a low point for that day?  The RAM numbers
in isolation only paint a partial picture...

> The SAN has plenty space. Over 70TiB at this time, with another 70TiB
> having just arrived and waiting to be connected.

140TB of 15k storage.  Wow, you're so underprivileged. ;)

> The iSCSI storage nodes (HP P4500) use 600GB SAS6 at 15k rpm with 12
> disks per node, configured in 2 RAID5 sets with 6 disks each.
> 
> But this is internal to each storage node, which is kind of a black box
> and has to be treated as such.

I cringe every time I hear 'black box'...

> The HP P4500 is a bit unique, since it does not consist of a head node
> with storage arrays connected to it, but of individual storage nodes
> forming a self-balancing iSCSI cluster. (The nodes are DL320s G2 servers.)

The 'black box' is Lefthand Networks' SAN/iQ software stack.  I wasn't
that impressed with it when I read about it 8 or so years ago.  IIRC,
load balancing across cluster nodes is accomplished by resending host
packets from a receiving node to another node after performing special
sauce calculations regarding cluster load.  Hence the need, apparently,
for a full-power, hot-running, multi-core x86 CPU instead of an embedded
low-power CPU such as MIPS, PPC, the i960-descended IOP3xx, or even the
Atom if they must stick with x86 binaries.  If this choice were merely
about economy of scale on their server boards, they could have gone with
a single-socket board instead of the dual, which would have saved money.
So the choice of a dual-socket Xeon board wasn't strictly based on cost
or ease of manufacture.

Many, if not most, purpose-built SAN arrays on the market don't use
full-power x86 chips but embedded RISC chips, to cut cost, power draw,
and heat.  These RISC chips are typically in-order designs without
branch-prediction or register-renaming logic, and they have tiny caches.
That's because block-moving code handles streams of data and typically
has few branches or conditionals.  For streaming workloads, data caches
mostly get in the way, although an instruction cache is beneficial.
HP's choice of full-power CPUs with those features suggests branchy,
conditional code is being run, which makes sense for algorithms that try
to calculate the least busy node.

This 'least busy node' calculation and packet shipping thus adds
non-trivial latency to host SCSI IO command completion compared to
traditional FC/iSCSI SAN arrays or DAS, which has implications for
high-IOPS workloads, especially those making heavy use of FSYNC, such as
SMTP and IMAP servers.  FSYNC performance may not be an issue if the
controller instantly acks the FSYNC before the data hits the platter,
but then you may run into bigger problems, as you have no guarantee the
data actually hit the disk.  Or you may not run into perceptible
performance issues at all, given the number of P4500s you have and the
proportionally light IO load of your 10K mail users.  Sheer horsepower
alone may prove sufficient.
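
If you want a quick, crude feel for how the P4500s handle synchronous
writes before Dovecot ever enters the picture, a dd with oflag=dsync on
the GFS2 mount gives a ballpark number (the path is just an example):

  # 1000 x 4 KiB writes, each forced to stable storage before the next
  dd if=/dev/zero of=/srv/gfs2/synctest bs=4k count=1000 oflag=dsync
  rm /srv/gfs2/synctest

Divide the elapsed time by 1000 and you get an approximate per-sync-write
latency.  If the numbers look too good for spinning disk, the controller
is almost certainly acking writes from cache.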

Just in case, it may prove beneficial to fire up ImapTest or some other
synthetic mail workload generator to see if array response times are
acceptable under heavy mail loads.
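
Something along these lines usually gives a first impression (the host,
user template and counts are placeholders; check the imaptest docs for
the exact parameters):

  imaptest host=imap-test.example.edu port=143 \
      user=testuser%d pass=testpass \
      clients=100 secs=600 msgs=1000 mbox=dovecot-crlf

where dovecot-crlf is a sample mbox file used as the append source.
Watch the array's latency counters while it runs, not just imaptest's
own output.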

> So far I have had no performance or other problems with this setup, and
> it scales quite nicely, as you <marketing> buy as you grow </marketing>.

I'm glad the Lefthand units are working well for you so far.  Are you
hitting the arrays with any high random IOPS workloads as of yet?

> And again, price was also a factor: deploying an FC-SAN would have cost
> us more than three times what the deployment of an iSCSI solution did,
> because the latter is "just" Ethernet, while the former would have
> needed a lot of totally new components.

I guess that depends on the features you need, such as PIT backups,
remote replication, etc.  I expanded a small FC SAN about 5 years ago for
the same cost as an iSCSI array, simply because the least expensive
_quality_ unit with a good reputation happened to include both iSCSI and
FC ports.  It was a 1U 8x500GB Nexsan Satablade, their smallest unit
(since discontinued).  Ran about $8K USD IIRC.  Nexsan continues to offer
excellent products.

For anyone interested in high-density, high-performance FC+iSCSI SAN
arrays at a midrange price, add Nexsan to your vendor research list:
http://www.nexsan.com

> No, at that time (2005/2006) nobody thought of a SAN. That is a fairly
> "new" idea here, first implemented for the VMware cluster in 2008.

You must have slower adoption on that side of the pond.  As I just
mentioned, I was expanding an already existing small FC SAN in 2006 that
had been in place since 2004 IIRC.  And this was at a small private 6-12
school with enrollment of about 500.  iSCSI SANs took off like a rocket
in the States around 06/07, in tandem with VMware ESX going viral here.

> More space. The IMAP usage became more prominent, which caused a steep
> rise in the space needed on the mail storage server. But 74GiB SCA drives
> were expensive and 130GiB SCA drives were not available at that time.

With 144TB of HP Lefthand 15K SAS drives it appears you're no longer
having trouble funding storage purchases. ;)

>>> And this is why I kind of hold this upgrade back until dovecot 2.1 is
>>> released, as it has some optimizations here.
> 
>> Sounds like it's going to be a bit more than an 'upgrade'. ;)
> 
> Well, yes. It is more a re-implementation than an upgrade.

It actually sounds like fun.  To me anyway.  ;)  I love this stuff.

> Central IT here these days only uses x86-based systems. There were some
> Sun SPARC systems, but both have been decommissioned. New SPARC hardware
> is just too expensive for our scale. And if you want to use
> virtualization, you can either use only SPARC systems and partition them,
> or use x86-based systems. And then there is the need to virtualize
> Windows, so x86 is the only option.

Definitely a trend for a while now.

> Most bigger universities in Germany make nearly exclusive use of SPARC
> systems, but they have had a central IT with big iron (IBM, HP, etc.)
> since back in the 1960s, so naturally they continue on that path.

Siemens/Fujitsu machines or SUN machines?  I've been under the
impression that Fujitsu sold more SPARC boxen in Europe, or at least
Germany, than SUN did, due to the Siemens partnership.  I could be wrong
here.

-- 
Stan




