[Dovecot] SSD drives are really fast running Dovecot

Stan Hoeppner stan at hardwarefreak.com
Fri Jan 21 07:49:27 EET 2011


Frank Cusack put forth on 1/20/2011 2:30 PM:
> On 1/20/11 12:06 AM -0600 Stan Hoeppner wrote:
>>  This is amusing considering XFS is hands down
>> the best filesystem available on any platform, including ZFS.  Others are
>> simply ignorant and repeat what they've heard without looking for current
>> information.

> Your pronouncement that others are simply ignorant is telling.

So is your intentionally quoting me out of context.  In context:

Me:
"Prior to 2007 there was a bug in XFS that caused filesystem corruption upon
power loss under some circumstances--actual FS corruption, not simply zeroing of
files that hadn't been fully committed to disk.  Many (uneducated) folk in the
Linux world still to this day tell others to NOT use XFS because "Power loss
will always corrupt your file system."  Some probably know better but are EXT or
JFS (or god forbid, BTRFS) fans and spread fud regarding XFS.  This is amusing
considering XFS is hands down the best filesystem available on any platform,
including ZFS.  Others are simply ignorant and repeat what they've heard without
looking for current information."

The "ignorant" are those who blindly accept the false words of others regarding
the 4+ year old "XFS corruption on power fail" issue as being true today.  They
accept it without verification.  Hence the "rumor" persists in many places.

>> In my desire to be brief I didn't fully/correctly explain how delayed
>> logging works.  I attempted a simplified explanation that I thought most
>> would understand.  Here is the design document:
>> http://oss.sgi.com/archives/xfs/2010-05/msg00329.html

> I guess I understand your championing of it if you consider that a
> design document.  That brief piece of email hardly describes it at
> all, and the performance numbers are pretty worthless (due to the
> caveat that barriers are disabled).

You quoted me out of context again, intentionally leaving out the double-paste
error I made with the same URL.

Me:
"In my desire to be brief I didn't fully/correctly explain how delayed logging
works.  I attempted a simplified explanation that I thought most would
understand.  Here is the design document:
http://oss.sgi.com/archives/xfs/2010-05/msg00329.html

Early performance numbers:
http://oss.sgi.com/archives/xfs/2010-05/msg00329.html"

Note the double URL paste error, Frank?  Why did you twist an honest mistake
into something it's not?  Here's the correct link:

http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=blob;f=Documentation/filesystems/xfs-delayed-logging-design.txt

> Given the paragraph in the "design document":

Stop being an ass, or get off yours and Google it instead of requiring me to
spoon-feed you.

>> The best IO behaviour comes from the delayed logging version of XFS,
>> with the lowest bandwidth and iops to sustain the highest
>> performance. All the IO is to the log - no metadata is written to
>> disk at all, which is the way this test should execute.  As a result,
>> the delayed logging code was the only configuration not limited by
>> the IO subsystem - instead it was completely CPU bound (8 CPUs
>> worth)...
> 
> it is indeed a "ram spooler", for metadata, which is a standard (and
> good) approach.  That's not a side effect, that's the design.  AFAICT
> from the brief description anyway.

As you'll see in the design doc, that's not the intention of the patch.  XFS
already had a delayed metadata update design, but its implementation was
terribly inefficient.  Dave increased the efficiency several fold.  The reason I
mentioned it on the Dovecot list is that it directly applies to large/busy
maildir-style mail stores.

XFS just clobbers all other filesystems in parallel workload performance, but
historically its metadata performance was pretty anemic, about half that of
other FSes.  Thus, parallel creates and deletes of large numbers of small files
were horrible.  This patch fixes that issue: it brings the metadata performance
of XFS up to the level of EXT3/4, Reiser, and others for single process/thread
workloads, and lets it far surpass them with large parallel process/thread
workloads, as shown in the email I linked.
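The kind of workload in question can be sketched in a few lines of shell.  The
directory, process count, and file counts below are arbitrary choices for
illustration, not the benchmark behind the linked numbers:

```shell
#!/bin/sh
# Metadata-heavy workload sketch: several processes creating and deleting
# many small files in parallel -- the pattern delayed logging speeds up.
# Every create/delete pair is almost pure metadata traffic (inode + dirent).
DIR=$(mktemp -d)
for p in 1 2 3 4; do
  (
    i=1
    while [ "$i" -le 250 ]; do
      f="$DIR/proc$p.file$i"
      echo "x" > "$f"    # create: inode allocation + directory entry
      rm -f "$f"         # delete: both updates journaled again
      i=$((i + 1))
    done
  ) &
done
wait                     # let all four workers finish
rmdir "$DIR"             # succeeds only if every file was removed
echo "completed 4 x 250 create/delete pairs"
```

Run it under time(1) on an XFS mount with and without the patch to see the
difference for yourself.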

This now makes XFS the perfect Linux FS for maildir and [s/m]dbox on moderate to
heavy load IMAP servers.  Actually, it's now the perfect filesystem for all
Linux server workloads.  Previously it was for all workloads but metadata-heavy
ones.
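For reference, the feature is enabled with the delaylog mount option on kernels
where the patch landed (2.6.35+, where it is still marked experimental).  The
device and mountpoint below are examples only:

```shell
# Mounting an XFS mail store with delayed logging enabled
# (example device and mountpoint; requires a kernel carrying the patch).
mount -t xfs -o delaylog /dev/sdb1 /var/vmail
```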

> This is guaranteed to lose data on power loss or drive failure.

On power loss, on a busy system, yes.  Due to a single drive failure?  That's
totally incorrect.  How are you coming to that conclusion?

As with every modern Linux filesystem that uses the kernel buffer cache (which
is all of them), you will lose in-flight data that is sitting in the buffer
cache when power drops.

Performance always has a trade-off.  The key here is that the filesystem isn't
corrupted by this metadata loss.  Solaris with ZFS has the same issue.  One
can't pipeline anything in a block device queue and not have some data loss on
power failure, period.  If one syncs every write, there is no performance.
Solaris and ZFS included.
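That trade-off is easy to demonstrate from userspace.  A rough illustration
using GNU dd (the file sizes and block counts are arbitrary):

```shell
# Buffered vs. synchronous writes: the same 1 MiB written two ways.
# The first lands in the page cache and is lost on power failure until
# flushed; oflag=sync forces every block to stable storage (GNU dd).
F1=$(mktemp); F2=$(mktemp)
dd if=/dev/zero of="$F1" bs=4k count=256 2>/dev/null            # buffered: fast
dd if=/dev/zero of="$F2" bs=4k count=256 oflag=sync 2>/dev/null # synced: slow
sync    # flush the buffered copy, as an application would via fsync(2)
```

Timing the two commands on a spinning disk shows a gap of an order of magnitude
or more; that gap is the price of guaranteeing durability on every write.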

-- 
Stan

