[Dovecot] Providing shared folders with multiple backend servers

Sven Hartge sven at svenhartge.de
Mon Jan 9 15:48:22 EET 2012


Stan Hoeppner <stan at hardwarefreak.com> wrote:
> On 1/8/2012 9:39 AM, Sven Hartge wrote:

>> Memory size. I am a bit hesitant to deploy a VM with 16GB of RAM. My
>> cluster nodes each have 48GB, so no problem on this side though.

> Shouldn't be a problem if you're going to spread the load over 2 to 4
> cluster nodes.  16/2 = 8GB per VM, 16/4 = 4GB per Dovecot VM.  This,
> assuming you are able to evenly spread user load.

I think I will be able to do that. If I divide my users by using a hash
like MD5 or SHA1 over their username, this should give me an even
distribution.
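
Roughly, I am thinking of something like the following (a quick Python
sketch; the node names and their count are just placeholders, not our
real setup):

import hashlib

BACKEND_NODES = ["mailbox01", "mailbox02", "mailbox03", "mailbox04"]  # placeholders

def backend_for(username: str) -> str:
    """Map a username to one of the backend nodes via an MD5 hash.

    MD5 is enough here, since all we need is an even spread of users,
    not cryptographic strength.
    """
    digest = hashlib.md5(username.lower().encode("utf-8")).hexdigest()
    return BACKEND_NODES[int(digest, 16) % len(BACKEND_NODES)]

The same username always lands on the same node, and over 10,000 users
the buckets should come out roughly equal.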

>> So, this reads like my idea in the first place.
>> 
>> Only you place all the mails on the NFS server, whereas my idea was to
>> just share the shared folders from a central point and keep the normal
>> user dirs local to the different nodes, thus reducing network impact for
>> the way more common user access.

> To be quite honest, after thinking this through a bit, many traditional
> advantages of a single shared mail store start to disappear.  Whether
> you use NFS or a clusterFS, or 'local' disk (RDMs), all IO goes to the
> same array, so the traditional IO load balancing advantage disappears.
> The other main advantage, replacing a dead hardware node, simply mapping
> the LUNs to the new one and booting it up, also disappears due to
> VMware's unique abilities, including vmotion.  Efficient use of storage
> isn't an issue as you can just as easily slice off a small LUN to each
> of 2/4 Dovecot VMs as a larger one to the NFS VM.

Yes. Plus I can much more easily increase a LUN's size if the need
arises.

> So the only disadvantages I see are with the 'local' disk RDM mailstore
> location. 'manual' connection/mailbox/size balancing, all increasing
> administrator burden.

Well, I don't see size balancing as a problem, since I can increase the
size of the disk for a node very easily.

Load should be fairly even if I distribute the 10,000 users across the
nodes. Even if there is a slight imbalance, the systems should have
enough power to smooth that out.  I could measure the load each user
creates and use that as a distribution key, but I believe this to be a
wee bit over-engineered for my scenario.

Initial placement of a new user will be automatic, during the activation
of the account, so no administrative burden there.
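
For illustration, the activation step could just reuse the hash from
above and record the result, so a user's placement stays fixed
afterwards (table and column names here are invented for the sketch; in
reality this would live in our existing user database):

import hashlib
import sqlite3

NODES = ["mailbox01", "mailbox02", "mailbox03", "mailbox04"]  # placeholder names

def activate_mail_account(db: sqlite3.Connection, username: str) -> str:
    """Assign a backend node during account activation and remember it."""
    digest = hashlib.md5(username.lower().encode("utf-8")).hexdigest()
    node = NODES[int(digest, 16) % len(NODES)]
    # 'mail_placement' is a hypothetical table used only for this sketch.
    db.execute("INSERT INTO mail_placement (username, node) VALUES (?, ?)",
               (username, node))
    db.commit()
    return node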

It seems my initial idea was not so bad after all ;) Now I "just" need to
build a little test setup, put some dummy users on it and see if
anything bad happens while accessing the shared folders, and how the
system reacts should the shared-folder server be down.
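
Something along these lines should do for the dummy-user test (a Python
sketch using imaplib; the host name, credentials, "Shared/" namespace
prefix and folder name are assumptions about my test setup, nothing
Dovecot prescribes):

import imaplib

HOST = "imap-test.example.org"          # placeholder test node
USER, PASSWORD = "dummy0001", "secret"  # one of the dummy accounts
SHARED_PREFIX = "Shared/"               # assumed public namespace prefix

def check_shared_access() -> None:
    """Log in as a dummy user and try to read a shared folder.

    The interesting part is what happens when the shared-folder server
    is down: does this hang, fail cleanly, or break the whole login?
    """
    imap = imaplib.IMAP4_SSL(HOST)
    try:
        imap.login(USER, PASSWORD)
        status, folders = imap.list('"%s"' % SHARED_PREFIX, "*")
        print("LIST", status, folders)
        status, data = imap.select(SHARED_PREFIX + "announcements", readonly=True)
        print("SELECT", status, data)
    finally:
        imap.logout()

if __name__ == "__main__":
    check_shared_access()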

>> 2.3GHz for most VMware nodes.

> How many total cores per VMware node (all sockets)?

8

>> You got the numbers wrong. And I got a word wrong ;)
>> 
>> Should have read "900GB _of_ 1300GB used".

> My bad.  I misunderstood.

Here are the memory statistics at 14:30:

             total       used       free     shared    buffers     cached
Mem:         12046      11199        847          0         88       7926
-/+ buffers/cache:       3185       8861
Swap:         5718         10       5707

>> So not much wiggle room left.

> And that one is retiring anyway as you state below.  So do you have
> plenty of space on your VMware SAN arrays?  If not can you add disks
> or do you need another array chassis?

The SAN has plenty of space: over 70TiB at this time, with another 70TiB
having just arrived and waiting to be connected.

>> This is a Transtec Provigo 610. This is a 24 disk enclosure, 12 disks
>> with 150GB (7.200k) each for the main mail storage in RAID6 and
>> another 10 disks with 150GB (5.400k) for a backup LUN. I daily
>> rsnapshot my /home onto this local backup (20 days of retention),
>> because it is easier to restore from than firing up Bacula, which has
>> the long retention time of 90 days. But most users need a restore of
>> mails from $yesterday or $the_day_before.

> And your current iSCSI SAN array(s) backing the VMware farm?  Total
> disks?  Is it monolithic, or do you have multiple array chassis from
> one or multiple vendors?

The iSCSI storage nodes (HP P4500) use 600GB SAS6 at 15k rpm with 12
disks per node, configured in 2 RAID5 sets with 6 disks each.

But this is internal to each storage node; the nodes are kind of a black
box and have to be treated as such.

The HP P4500 is a bit unique, since it does not consist of a head node
with storage arrays connected to it, but of individual storage nodes
forming a self-balancing iSCSI cluster. (The nodes are based on DL320s
G2 servers.)

So far, I have had no performance or other problems with this setup, and
it scales quite nicely, as you <marketing>buy as you grow</marketing>.

And again, price was also a factor: deploying an FC SAN would have cost
us more than three times what the iSCSI solution did, because the latter
is "just" Ethernet, while the former would have required a lot of
completely new components.

>> Well, it was either Parallel-SCSI or FC back then, as far as I can
>> remember. The price difference between the U320 version and the FC one
>> was not so big and I wanted to avoid having to route those big SCSI-U320
>> through my racks.

> Can't blame you there.  I take it you hadn't built the iSCSI SAN yet at
> that point?

No, at that time (2005/2006) nobody thought of a SAN. That is a fairly
"new" idea here, first implemented for the VMware cluster in 2008.

>> Then we bought new hardware (the one previous to the current one),
>> this time with more RAM, better RAID controller, smarter disk setup.
>> We outgrew this one really fast and a disk upgrade was not possible;
>> it lasted only 2 years.

> Did you need more space or more spindles?

More space. The IMAP usage became more prominent, which caused a steep
rise in the space needed on the mail storage server. But 74GiB SCA drives
were expensive and 130GiB SCA drives were not available at that time.

>> And this is why I kind of hold this upgrade back until dovecot 2.1 is
>> released, as it has some optimizations here.

> Sounds like it's going to be a bit more than an 'upgrade'. ;)

Well, yes. It is more a re-implementation than an upgrade.

>> I have plenty of space for 2U systems and already use DL385 G7s, I am
>> not fixed on Intel or AMD, I'll gladly use the one which is the best
>> fit for a given job.

> Just out of curiosity do you have any Power or SPARC systems, or all
> x86?

Central IT here these days only uses x86-based systems. There were a
couple of Sun SPARC systems, but both have been decommissioned. New SPARC
hardware is just too expensive for our scale. And if you want to use
virtualization, you can either use only SPARC systems and partition them,
or use x86-based systems. And then there is the need to virtualize
Windows, so x86 is the only option.

Most bigger universities in Germany make nearly exclusive use of SPARC
systems, but they have had a central IT department with big iron (IBM,
HP, etc.) since back in the 1960s, so naturally they continue on that
path.

Regards,
Sven.

-- 
Sigmentation fault. Core dumped.



