[Dovecot] XFS vs EXT4 for mail storage

Stan Hoeppner stan at hardwarefreak.com
Sun May 5 13:00:57 EEST 2013


On 5/4/2013 9:52 AM, Charles Marcus wrote:
> On 2013-05-03 11:10 PM, Stan Hoeppner <stan at hardwarefreak.com> wrote:
>> On 5/3/2013 9:21 AM, Charles Marcus wrote:
>>> On 2013-05-03 8:34 AM, Stan Hoeppner <stan at hardwarefreak.com> wrote:
>>>> I assume /var will hold user mail dirs.
>>> Yes, in /var/vmail
>>>
>>>> Do /var/ and /snaps reside on the same RAID array, physical disks?
>>> Yes - vmware host is a Dell R515, with ESXi installed to mirrored
>>> internal SATA drives, with 8 drives in RAID 10 for all of the VMs. All
>>> storage is this local storage (no SAN/NAS).
>> Your RAID10 is on a PERC correct?
> 
> Correct... it is a PERC H700 (integrated)

Good.  512MB BBWC, LSI based IIRC.  Should yield good performance with
some margin of safety, though you're still vulnerable to guest fsync
being buffered/ignored.  Just make sure you disable all the individual
drive caches via the H700 BIOS, Dell's Linux software management utility
(if there is one), the Lifecycle Controller, etc.  I don't use Dell gear so I'm
unable to give instructions.  If the Dell RAID HBAs are worth their salt
they'll disable drive caches automatically when you enable the BBWC.
Some HBAs do this, some don't.  Just keep in mind the safety net of BBWC
is defeated if drive caches are enabled.
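
As a rough sketch: the H700 is LSI-based, so if Dell hasn't locked out
the stock LSI tool (an assumption on my part; the binary may be named
MegaCli, MegaCli64 or megacli depending on the package), checking and
disabling the drives' own caches looks something like:

  # show the current disk cache policy of all logical drives
  MegaCli -LDGetProp -DskCache -LAll -aAll

  # disable the physical drives' write caches (controller BBWC stays on)
  MegaCli -LDSetProp -DisDskCache -LAll -aAll

Whatever Dell tool you end up using, the end state you want is the
same: drive write caches off, controller write-back cache (BBWC) on.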

>> You have four 7.2K SATA stripe spindles.
> 
> Actually, no, I have 6 15k 450G SAS6G hard drives (Seagate Cheetah
> ST3450857SS) in this RAID10 array...

Directly up above you said 8 drives in RAID10.  So to make sure we're
all on the same page: you have 6x 450GB 15K SAS drives in RAID10, i.e.
3 stripe spindles and ~1.35TB usable (2.7TB raw).  That's going to
yield a non-power-of-2 stripe width, which I always try to avoid,
though it's not a show stopper.
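
To put numbers on that, with a hypothetical 64KB strip (purely an
example, not what the H700 actually defaulted to):

  stripe width = data spindles x strip size
  3 spindles x 64KB = 192KB stripe width   (not a power of 2)
  4 spindles x 64KB = 256KB stripe width   (power of 2)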

>> Do you mind posting the RAID10 strip/chunk size? The RAID geometry can
>> be critical, not just for mail, but your entire VM setup.
> 
> I just used the defaults when I created it (crossing fingers hoping that
> wasn't a huge mistake). 

With VMware, your workloads and user head count, it may make no visible
difference.  As a general rule for small random IO workloads (which
covers most of what you do), smaller strips are better, 32-64KB max.  If
it defaulted to a 512KB or 1MB strip that's bad.  Large strip sizes are
really only beneficial for streaming write workloads.  When you use
large strips with small IO workloads you generally end up sending a
disproportionate share of the reads/writes to individual drives in the
array, creating hotspots and eroding the performance advantage of
striping.  I.e. you can end up making one disk work harder while the
others sit idle more of the time.

> But - I'm not sure how to provide the answer to
> the question (is my ignorance showing yet?)...

Fire up whatever tool Dell provides to manage the H700.  You should be
able to view all the current parameters of the controller.
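
If the stock LSI tool works against the H700 (again, an assumption on
my part), this would show the strip size and cache settings without
changing anything:

  # logical drive geometry, including "Strip Size" and cache policies
  MegaCli -LDInfo -LAll -aAll

  # adapter-wide settings, including BBU status
  MegaCli -AdpAllInfo -aAll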

>>   Also, what's your mdbox max file size?
> 
> Haven't settled on that yet. I was thinking of using the defaults there
> too. I try to stay with defaults whenever possible, especially if I
> don't know enough to know why I would want to change something.

IIRC the default is 2MB.  The downside to a small value here is more
metadata operations, more IOs for full text searches and longer search
times, longer backup times, potentially greater filesystem
fragmentation, etc.  The two advantages I can think of are potentially
fewer locking collisions, and that a corrupted file affects fewer
emails.  There may be others.

With large mdbox sizes the negatives/positives above flip.  As you
increase the size the advantages become ever greater, up to a point.
You obviously don't want to specify 1GB mdboxes.  And if your users
regularly send emails with 5MB+ PDF or TIFF attachments then 2MB is
probably too small.

Best advice?  Take a poll of the list.  You'll likely find people using
between the 2MB default and 64MB.  Some brave souls may be going larger.
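
For reference, the setting in question is mdbox_rotate_size.  A
minimal sketch of raising it (the 16M value is only an example, not a
recommendation):

  # e.g. /etc/dovecot/conf.d/10-mail.conf
  # maximum size of one m.* storage file before Dovecot rotates to a
  # new one; the shipped default is 2M
  mdbox_rotate_size = 16M

You can verify what's actually in effect with 'doveconf
mdbox_rotate_size' after a reload.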
...
>> However, ISTR you mentioning that your users transfer multi-GB files, up
>> to 50GB, on a somewhat regular basis, to/from the file server over GbE
>> at ~80-100MB/s.  If these big copies hit the same 4 RAID10 spindles it
>> may tend to decrease IMAP response times due to seek contention.  This
>> has nothing to do with XFS.  It's the nature of shared storage.
> 
> I think you're confusing me/us with someone else. 

Highly possible, and I mean that sincerely.  I help a lot of people
across various lists.  But ISTR when we were discussing your metro
ethernet link the possibility of multi-GB file transfers causing
contention problems with normal user traffic.  Maybe that was your
backup process I'm thinking of.  That would make sense.

> This is definitely not
> something our users do, not even close. We do deal with a lot of large
> email attachments though. I used to have a max size of 50MB, but reduced
> it to 25MB about 8 months ago (equivalent of google's max size)...

Get a good idea of what the current max email size is and size mdbox
files accordingly.
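
If you want real numbers rather than just the configured cap,
something along these lines would do it.  This assumes Postfix as the
MTA and picks one representative user (both assumptions on my part):

  # the MTA's configured message size limit (Postfix)
  postconf message_size_limit

  # ten largest messages in one user's INBOX, sizes in bytes
  doveadm fetch -u someuser size.physical mailbox INBOX \
      | awk '{print $2}' | sort -rn | head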

> So, looks like I'm fine with what I have now...

You only have 3x 15K effective spindles, which seems a little light
generally, but you've got a decent RAID HBA with 512MB of BBWC, which
will help write latency tremendously.  And you only have ~70 users.
Your current setup may be fine, as long as the drive caches are
disabled.  Again, ask for other opinions on max mdbox size.

-- 
Stan


