[Dovecot] Configuration advice for a 50000 mailboxes server(s)

Stan Hoeppner stan at hardwarefreak.com
Sat Apr 21 03:22:05 EEST 2012


On 4/19/2012 4:40 AM, Stan Hoeppner wrote:
> On 4/17/2012 8:01 AM, Frank Bonnet wrote:
> 
>> have 4000/6000 imaps concurrent connections during working hours .

>>>> for approx 50K "intensives" users.
>>>>
>>>> The only mandatory thing will be I must use HP proliant servers
>>>>
>>>> The operating system will be FreeBSD or Linux

> I just made the wishlist public so it should be available tomorrow or
> Friday.  I'll provide the link when it's available.

And here it is:
http://secure.newegg.com/WishList/PublicWishDetail.aspx?WishListNumber=16797311

Since your requirement is for an HP solution, following is an HP server
and storage system solution of roughly identical performance and
redundancy to the SuperMicro based system I detailed.  The HP system
solution comes to $44,263, roughly double the SuperMicro build's cost,
about $20,000 more.  Due to the
stupidity of Newegg requiring all wish lists to be reviewed before going
live, I'll simply provide the links to all the products.

Yes boys and girls, Newegg isn't just consumer products.  They carry
nearly the entire line of HP Proliant servers and storage, including the
4-way 48-core Opteron DL585 G7 w/64GB, the P2000 fiber channel array,
and much more.  In this case they sell every product needed to assemble
this complete mail server solution:

 1x http://www.newegg.com/Product/Product.aspx?Item=N82E16859105807
 8x http://www.newegg.com/Product/Product.aspx?Item=N82E16820326150
 3x http://www.newegg.com/Product/Product.aspx?Item=N82E16816401143
80x http://www.newegg.com/Product/Product.aspx?Item=N82E16822332061
 3x http://www.newegg.com/Product/Product.aspx?Item=N82E16816118109
 3x http://www.newegg.com/Product/Product.aspx?Item=N82E16816118163
 2x http://www.newegg.com/Product/Product.aspx?Item=N82E16816133048
 2x http://www.newegg.com/Product/Product.aspx?Item=N82E16833106050

The 9280-8e RAID controllers are identical to the 9261-8i boards except
they have 2 external rather than 2 internal x4 6Gb/s SAS ports.  I spec
them instead of the Smart Array boards as they're far cheaper, easier
to work with, and offer
equal or superior performance.  Thus everything written below is valid
for this system as well, with the exception that you would configure 1
global hot spare in each chassis since these units have 25 drive bays
instead of 24.  The D2700 units come with 20" 8088 cables.  I spec'd
two additional 3ft cables to make sure we reach all 3 disk chassis from
the server, thinking the server would sit on top with the 3 disk
chassis below.

I hope this and my previous post are helpful in one aspect or another to
Frank and anyone else.  I spent more than a few minutes on these
designs. ;)  Days in fact on the SuperMicro design, only a couple of
hours on the HP.  It wouldn't have taken quite so long if all PCIe
slots were created equal (x8), which they're not.  Nor would it have,
if modern servers didn't require 4 different types of DIMMs depending
on how many slots you want to fill and how much expansion capacity you
need without throwing out all the previous memory, which many folks end
up doing out of ignorance.  Memory configuration is simply too darn
complicated on high capacity servers with 8 memory channels and 24
DIMM slots.

> The key to performance, and yielding a single file tree, is once again
> using XFS to take advantage of this large spindle count across 3 RAID
> controllers.  Unlike previous configurations where I recommended using a
> straight md concatenation of hardware RAID1 pairs, in this case we're
> going to use a concatenation of 6 hardware RAID10 arrays.  There are a
> couple of reasons for doing so in this case:
> 
> 1.  Using 36 device names in a single md command line is less than
> intuitive and possibly error prone.  Using 6 is more manageable.
> 
> 2.  We have 3 BBWC RAID controllers w/24 drives each.  This is a high
> performance server and will see a high IO load in production.  In many
> cases one would use an external filesystem journal, which we could
> easily do and get great performance with our mirrored SSDs.  However,
> the SSDs are not backed by BBWC, so a UPS failure or system crash could
> hose the log journal.  So we'll go with the default internal journal
> which will be backed by the BBWC.
> 
> Going internal with the log in this mail scenario can cause a serious
> amount of extra IOPS on the filesystem data section, this being
> Allocation Group 0.  If we did the "normal" RAID1 concat, all the log IO
> would hit the first RAID1 pair.  On this system, the load may hit that
> spindle pretty hard, making access to mailboxes in AG0 slower than
> others.  With 6 RAID10 arrays in a concat, the internal log writes will
> be striped across 6 spindles in the first array.  With 512MB BBWC
> backing that array and optimizing writeout, and with delaylog, this will
> yield optimal log write performance without slowing down mailbox file
> access in AG0.  To create such a setup we'd do something like this,
> assuming the mobo LSI controller yields sd[ab], and the 6 array devices
> on the PCIe LSI cards yield sd[cdefgh]
> 
> 1.  Create two RAID10 arrays, each of 12 drives, in the WebBIOS GUI of
> each LSI card, using a strip size of 32KB which should yield good random
> r/w performance for any mailbox format.  Use the following policies for
> each array:  RW, Normal, Wback, Direct, Disable, No, and use the full
> size.
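The same arrays can also be created from a running OS with LSI's
MegaCli utility instead of WebBIOS.  A rough sketch only, assuming
adapter 0, enclosure ID 252, and slots 0-11; the enclosure:slot IDs and
exact flag spelling vary by chassis and MegaCLI version, so verify your
IDs first:

```shell
# Sketch: one 12-drive RAID10 built as 6 two-drive mirrored spans,
# write-back cache, direct IO, 32KB strip -- approximating the WebBIOS
# policies above.  The 252:N enclosure:slot IDs are assumptions; list
# the real ones with:
#   MegaCli -PDList -a0
MegaCli -CfgSpanAdd -r10 \
  -Array0[252:0,252:1]  -Array1[252:2,252:3]  -Array2[252:4,252:5] \
  -Array3[252:6,252:7]  -Array4[252:8,252:9]  -Array5[252:10,252:11] \
  WB Direct -strpsz32 -a0
```

Repeat once per controller to get the two 12-drive arrays each.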
> 
> Create the concatenated md device:
> $ mdadm -C /dev/md0 -l linear -n 6 /dev/sd[cdefgh]
> 
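Before formatting, it's worth a quick sanity check that the linear
array assembled with all 6 members -- a sketch:

```shell
# The linear (concat) array should list all 6 member devices and a
# size equal to the sum of the 6 RAID10 arrays.
cat /proc/mdstat
mdadm --detail /dev/md0

# Record the array so it reassembles identically at boot
# (config file path varies by distro: /etc/mdadm.conf or
# /etc/mdadm/mdadm.conf).
mdadm --detail --scan >> /etc/mdadm.conf
```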
> Then we format it with XFS, optimizing the AG layout for our mailbox
> workload and aligning allocation/writes to each hardware array's
> stripe:
> $ mkfs.xfs -d agcount=24,su=32k,sw=6 /dev/md0
> 
> This yields 4 AGs per RAID10 array which will minimize the traditional
> inode64 head seeking overhead on striped arrays, while still yielding
> fantastic allocation parallelism with 24 AGs.
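After mkfs you can confirm the geometry took -- a sketch; the exact
block counts depend on your drive size:

```shell
# Expect agcount=24, and sunit/swidth matching su=32k,sw=6:
# at the default 4KB block size that's sunit=8 blocks, swidth=48 blocks.
xfs_info /dev/md0
```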
> 
> Optimal fstab for MTA queue/mailbox workload, assuming kernel 2.6.39+:
> /dev/md0   /mail   xfs   defaults,inode64,nobarrier   0   0
> 
> We disable write barriers as we have BBWC.  And that 1.5GB of total
> BBWC will yield extremely low Dovecot write latency and high
> throughput.
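Once mounted, it's easy to verify the options are actually in effect --
a sketch:

```shell
mount /mail
# The active mount options should include inode64 and nobarrier:
grep /mail /proc/mounts
```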
> 
> Given the throughput available, if you're running Postfix on this box,
> you will want to create a directory on this filesystem for the Postfix
> spool.  Postfix puts the spool files in many dozens, hundreds of
> subdirectories, so you'll get 100% parallelism across all AGs, thus all
> disks.
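Relocating the Postfix queue onto this filesystem is a short job -- a
sketch, assuming /mail/spool/postfix as the new location (the path is
illustrative; queue_directory is the standard Postfix parameter):

```shell
# Stop Postfix, copy the queue preserving ownership/permissions, and
# point main.cf at the new location.
postfix stop
mkdir -p /mail/spool
cp -a /var/spool/postfix /mail/spool/
postconf -e 'queue_directory = /mail/spool/postfix'
postfix start
```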
> 
> It's very likely none of you will decide to build this system.  My hope
> is that some of the design concepts and components used, along with the
> low cost but high performance of this machine, may be educational or
> simply give people new ideas, steer them in directions they may not have
> previously considered.

-- 
Stan


