On 8/17/2011 9:42 AM, Adrian Ulrich wrote:
> > I read that XFS is a good choice, but is not too reliable...
> Are you using Maildir or MBOX?
> In any case: XFS would be my last choice:
> XFS is nice if you are working with large files (> 2GB), but for E-Mail i'd stick with ext3 (or maybe even reiser3) as it works very well with small files.
XFS was designed for parallelism, whether with large files or small, though it has been optimized a bit more for large file throughput. In yet another attempt to dispel the XFS "small file problem" myth: XFS has never had a performance problem with "small" files as such. What it did have in the past was a performance problem with large metadata operations, due to the way metadata changes were journaled. The perennial example of this was the horrible unlink performance when whacking a kernel tree with 'rm -rf'. It used to take forever, tens of times slower than Reiser or EXT. This metadata bottleneck was largely resolved by Dave Chinner's delayed logging patch, which was experimental in 2.6.35 and is enabled by default in 2.6.39 and later. XFS metadata performance is now on par with that of EXT3/4.
Because of this, and because of XFS' use of allocation groups, one of the highest performance storage stack setups today for a busy IMAP server with lots of maildir mailboxen is the following:
- A dozen or more hardware or software RAID1 mirrors
- A linear concat over the mirrors
- XFS with 2*num_mirrors allocation groups, mounted with 'inode64'
- maildir mailboxes
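
As a rough sketch of how that stack could be put together, the Python below only builds the command lines as strings; it does not run anything. The device names (/dev/md1 through /dev/md12 for the mirrors, /dev/md0 for the concat) and the mount point /srv/mail are hypothetical, and it assumes the RAID1 mirrors already exist and are all the same size, so that allocation group boundaries line up with mirror boundaries.

```python
def build_commands(mirrors, concat_dev="/dev/md0", mount_point="/srv/mail"):
    """Return shell commands for a linear concat + XFS layout.

    mirrors: paths of existing, equally sized RAID1 devices
             (hypothetical names, adjust to your system).
    """
    agcount = 2 * len(mirrors)  # two allocation groups per mirror
    return [
        # Concatenate the mirrors end to end; no striping involved.
        "mdadm --create {} --level=linear --raid-devices={} {}".format(
            concat_dev, len(mirrors), " ".join(mirrors)),
        # Create XFS with an explicit allocation group count.
        "mkfs.xfs -d agcount={} {}".format(agcount, concat_dev),
        # inode64 lets inodes be allocated in every AG, not just the first ones.
        "mount -o inode64 {} {}".format(concat_dev, mount_point),
    ]


if __name__ == "__main__":
    for cmd in build_commands(["/dev/md{}".format(i) for i in range(1, 13)]):
        print(cmd)
```

With twelve mirrors this yields agcount=24, matching the 2*num_mirrors rule above.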
This setup will give you significantly higher real IOPS than any striped array setup with any filesystem atop, for a couple of reasons:
No partial stripe width writes, and no unnecessary full stripe reads. All reads and writes match the page size and filesystem block size of 4KB.
In the example above, you have two AGs per mirror pair, 24 total AGs on 12 mirrors. The first two maildir directories will be created in AGs 1 and 2 on the first mirror. The second two in AGs 3 & 4 on the 2nd mirror pair, and so on. The 25th/26th directories will 'wrap' back to AGs 1 & 2 and the directory creation pattern will continue.
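
The rotation just described can be modeled in a few lines of Python. This is a deliberately simplified sketch, not the real XFS allocator, which applies more heuristics (free space, for one). The function name place_directory is made up for illustration, and AGs and mirrors are numbered from 1 to match the text above, whereas XFS itself numbers AGs from 0.

```python
def place_directory(n, num_mirrors=12, ags_per_mirror=2):
    """Return (ag, mirror), both 1-based, for the n-th new directory (1-based).

    Simplified model: each new directory goes to the next AG in turn,
    and consecutive AGs sit on the same mirror in groups of ags_per_mirror.
    """
    total_ags = num_mirrors * ags_per_mirror
    ag = (n - 1) % total_ags + 1               # wrap around after the last AG
    mirror = (ag - 1) // ags_per_mirror + 1    # AGs 1-2 -> mirror 1, 3-4 -> mirror 2, ...
    return ag, mirror


if __name__ == "__main__":
    for n in (1, 2, 3, 4, 24, 25, 26):
        print(n, place_directory(n))
```

In this model directories 1 and 2 land on mirror 1, directories 3 and 4 on mirror 2, and directory 25 wraps back to AG 1 on mirror 1.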
Because of its allocation group design, XFS is the only filesystem that can accomplish this level of parallelism with a concatenated array and small email files. All others must rely on striped arrays, either RAID10 or RAID5/6. These come with the inefficiencies of writing and reading files as small as 2KB on a stripe ranging from 256KB to 1MB or larger, depending on the number of disks in the array and the chosen stripe size. If you have a high write load, the Linux allocator will pack multiple files into a single stripe, but one rarely sees 100% efficiency here. Even at 100% efficiency on writes, at low read rates you end up reading a lot of full 256KB-1MB stripes just to get a 2KB file, wasting bandwidth and filling up the buffer cache with unneeded data, not to mention any read cache on your hardware RAID controller or SAN head.
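
To put numbers on the read-side waste, here is a small Python sketch of the arithmetic. It assumes the worst case described above, where fetching one small file pulls in a whole stripe; whether a given RAID implementation really reads the full stripe depends on the controller and its read-ahead settings. The function name read_amplification is made up for illustration.

```python
KB = 1024


def read_amplification(file_bytes, io_unit_bytes):
    """Bytes fetched per useful byte when reads are rounded up to io_unit_bytes."""
    units = -(-file_bytes // io_unit_bytes)  # ceiling division
    return units * io_unit_bytes / file_bytes


if __name__ == "__main__":
    print(read_amplification(2 * KB, 256 * KB))   # 2KB file, 256KB stripe
    print(read_amplification(2 * KB, 1024 * KB))  # 2KB file, 1MB stripe
    print(read_amplification(2 * KB, 4 * KB))     # 2KB file, 4KB block on the concat
```

Under that assumption a 2KB message costs 128 times its size on a 256KB stripe and 512 times on a 1MB stripe, versus 2 times with 4KB blocks on the concat.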
The only potential downside to this setup is the rare situation where all of your currently logged in users have their mailboxes in the same AG, or in two AGs on the same spindle. It is a theoretical possibility, but the probability is extremely low and I've yet to see it happen.
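
For a back-of-the-envelope feel of how low, the Python sketch below assumes each active user's mailbox sits on a uniformly random mirror, independently of the others. That is an idealization: real mailbox placement follows creation order, and real usage is not uniform. The function name prob_all_on_one_mirror is made up for illustration.

```python
def prob_all_on_one_mirror(active_users, num_mirrors=12):
    """Probability that every active user's mailbox is on the same mirror,
    assuming independent, uniformly random placement (an idealization)."""
    if active_users < 1:
        raise ValueError("need at least one active user")
    # Pick which mirror (num_mirrors ways), then all users must hit it.
    return num_mirrors * (1.0 / num_mirrors) ** active_users


if __name__ == "__main__":
    for users in (2, 5, 10):
        print(users, prob_all_on_one_mirror(users))
```

Under these assumptions the chance falls by a factor of 12 with each additional concurrent user.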
-- Stan