[Dovecot] Providing shared folders with multiple backend servers

Stan Hoeppner stan at hardwarefreak.com
Mon Jan 9 14:28:55 EET 2012


On 1/8/2012 9:39 AM, Sven Hartge wrote:

> Memory size. I am a bit hesitant to deploy a VM with 16GB of RAM. My
> cluster nodes each have 48GB, so no problem on this side though.

Shouldn't be a problem if you're going to spread the load over 2 to 4
cluster nodes.  16/2 = 8GB per VM, 16/4 = 4GB per Dovecot VM.  That
assumes, of course, that you're able to spread user load evenly.

> And our VMware SAN is iSCSI based, so there's no way to plug FC-based
> storage into it.

There are standalone FC-iSCSI bridges, but they're marketed at bridging
FC SAN islands over an IP WAN.  Director-class SAN switches can connect
anything to anything; just buy the cards you need.  Both options are
rather pricey and wouldn't make sense in your environment.  I'm just
pointing out that it can be done.

> So, this reads like my idea in the first place.
> 
> Only you place all the mails on the NFS server, whereas my idea was to
> just share the shared folders from a central point and keep the normal
> user dirs local to the different nodes, thus reducing the network impact
> for the far more common user accesses.

To be quite honest, after thinking this through a bit, many traditional
advantages of a single shared mail store start to disappear.  Whether
you use NFS or a clusterFS, or 'local' disk (RDMs), all IO goes to the
same array, so the traditional IO load balancing advantage disappears.
The other main advantage, being able to replace a dead hardware node by
simply mapping the LUNs to a new one and booting it up, also disappears
due to VMware's unique abilities, vMotion included.  Efficient use of storage
isn't an issue as you can just as easily slice off a small LUN to each
of 2/4 Dovecot VMs as a larger one to the NFS VM.

So the only disadvantages I see are with the 'local' disk RDM mailstore
location: 'manual' connection/mailbox/size balancing, all of which
increase administrator burden.
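
For what it's worth, the split you're describing maps onto dovecot.conf
as something roughly like this (a rough sketch only, assuming maildir on
both ends; the paths and the Shared/ prefix are made up, and index and
seen-flag handling for the public namespace would still need thought):

  # user mailboxes stay on the node-local RDM storage
  mail_location = maildir:~/Maildir

  namespace {
    type = private
    separator = /
    prefix =
    inbox = yes
  }

  # shared folders come from the central NFS export
  namespace {
    type = public
    separator = /
    prefix = Shared/
    location = maildir:/srv/nfs/shared
    subscriptions = no
  }

If NFS index contention ever becomes a problem, the public namespace's
indexes can be pointed at local disk with :INDEX=, but that's a tuning
detail.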

> 2.3GHz for most VMware nodes.

How many total cores per VMware node (all sockets)?

> You got the numbers wrong. And I got a word wrong ;)
> 
> Should have read "900GB _of_ 1300GB used".

My bad.  I misunderstood.

> So not much wiggle room left.

And that one is retiring anyway as you state below.  So do you have
plenty of space on your VMware SAN arrays?  If not, can you add disks,
or do you need another array chassis?

> But modifications to our systems have been made which allow me to
> temp-disable a user, convert and move his mailbox, and re-enable him.
> That lets me move users one at a time from the old system to the new
> one, without losing a mail or disrupting service for too long or too
> often.

As it should be.

> This is a Transtec Provigo 610. This is a 24 disk enclosure, 12 disks
> with 150GB (7,200 rpm) each for the main mail storage in RAID6 and
> another 10 disks with 150GB (5,400 rpm) for a backup LUN. I rsnapshot
> my /home daily onto this local backup (20 days of retention), because
> it is easier to restore from than firing up Bacula, which has a longer
> retention time of 90 days. But most users need a restore of mails from
> $yesterday or $the_day_before.

And your current iSCSI SAN array(s) backing the VMware farm?  Total
disks?  Is it monolithic, or do you have multiple array chassis from one
or multiple vendors?

> Well, it was either Parallel-SCSI or FC back then, as far as I can
> remember. The price difference between the U320 version and the FC one
> was not so big, and I wanted to avoid having to route those big
> SCSI-U320 cables through my racks.

Can't blame you there.  I take it you hadn't built the iSCSI SAN yet at
that point?

> See above, not 1500GB disks, but 150GB ones. RAID6, because I wanted the
> double security. I have been kind of burned by the previous system and I
> tend to get nervous while thinking about data loss in my mail storage,
> because I know my users _will_ give me hell if that happens.

And as it turns out RAID10 wouldn't have provided you enough bytes.
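
Quick back-of-the-envelope on those 12 spindles:

  RAID6:  (12 - 2) x 150GB = ~1500GB raw
  RAID10: (12 / 2) x 150GB =  ~900GB

900GB usable against the 900GB you're already using leaves zero
headroom, so RAID6 was the right call on capacity alone.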

> I never used mbox as an admin. The box before the box before this one
> used uw-imapd with mbox, and I experienced the system as a user and it
> was horrific. Most users back then had never heard of IMAP folders and
> just stored their mails inside of INBOX, which of course got huge. If
> one of those users with a big mbox then deleted mails, it would
> literally lock the box up for everyone, as uw-imapd would copy (for
> example) a 600MB mbox file around just to delete one mail.

Yeah, ouch.  IMAP with mbox works pretty well when users are marginally
smart about organizing their mail, or in a POP-then-delete setup.  I'd
bet if that box had been running maildir in that era it would have
slowed things way down as well, especially if the filesystem was XFS,
which had horrible, abysmal really, unlink performance until 2.6.35
(2010).

> Of course, this was mostly because of the crappy uw-imapd and secondly
> because of some poor design choices in the server itself (underpowered
> RAID controller, too small a cache, a RAID5 setup, and too little RAM
> in the server).

That's a recipe for disaster.

> So the first thing we did back then, in 2004, was to change to Courier
> and convert from mbox to maildir, which made the mail system fly again,
> even on the same hardware; only the disk setup changed, to RAID10.

I wonder how much gain you'd have seen if you'd stuck with RAID5 instead...

> Then we bought new hardware (the one previous to the current one), this
> time with more RAM, better RAID controller, smarter disk setup. We
> outgrew this one really fast and a disk upgrade was not possible; it
> lasted only 2 years.

Did you need more space or more spindles?

> But Courier is showing its age and things like Sieve are only possible
> with great pain, so I want to avoid it.

Don't blame ya.  Lots of people migrate from Courier to Dovecot for
similar reasons.

> And this is why I kind of hold this upgrade back until dovecot 2.1 is
> released, as it has some optimizations here.

Sounds like it's going to be a bit more than an 'upgrade'. ;)

> That is a BPO kernel. Not yet Squeeze. I admin over 150 different
> systems here, plus I am the main VMware and SAN admin. So upgrades take
> some time until I grow an extra pair of eyes and arms. ;)

/me nods

> And since I have been planning to re-implement the mailsystem for some
> time now, I held the update to the storage backends back. No use in
> disrupting service for the end user if I'm going to replace the whole
> thing with a new one in the end.

/me nods

> Naa, I have been doing this for too long. While I am perfectly capable
> of building such a server myself, I am now the kind of guy who wants to
> "yell" at a vendor when their hardware fails.

At your scale it would simply be impractical, and impossible from a time
management standpoint.

> Personally built PCs and servers assembled from individual parts have
> been nothing but a nightmare for me. 

I've had nothing but good luck with "DIY" systems.  My background is
probably a bit different than most though.  Hardware has been in my
blood since I was a teenager in about '86.  I used to design and build
relatively high end custom -48vdc white box servers and SCSI arrays for
telcos back in the day, along with standard 115v servers for SMBs.
Also, note the RHS of my email address. ;)  That is a nickname given to
me about 13 years ago.  I decided to adopt it for my vanity domain.

> And: my coworkers need to be able to service
> them as well while I am not available, and they are not the hardware
> aficionados that I am.

That's the biggest reason right there.  DIY is only really feasible if
you run your own show and will likely continue running it for a while,
or if your staff is similarly skilled.  Most IT folks these days aren't
hardware-oriented people.

> So "professional" hardware with a 5 to 7 year support contract is the
> way to go for me.

Definitely.

> I have plenty of space for 2U systems and already use DL385 G7s. I am
> not fixed on Intel or AMD; I'll gladly use whichever is the better fit
> for a given job.

Just out of curiosity, do you have any Power or SPARC systems, or is it
all x86?

-- 
Stan


