[Dovecot] Best filesystem?

Stan Hoeppner stan at hardwarefreak.com
Mon Jan 31 05:05:47 EET 2011


Timo Sirainen put forth on 1/30/2011 5:27 PM:

> But it's not just about how much data is lost. It's also about if any
> existing data is unexpectedly lost. That's why people were complaining
> about ext4, because suddenly renaming a file over another could lose
> both the old and the new file's contents if power was lost, while
> with ext3 either the old or the new data stayed behind. Then they did
> all kinds of things to ext4 to fix this / make it less likely.

People were complaining about EXT4 because EXT2/3 implemented features to "save
bad programmers from themselves", even though it is NOT the job of the
filesystem code to do so.  EXT4 removed these safeguards, and bad programmers
who relied on EXT2/3 to cross their Ts and dot their Is for them threw fits
when they realized EXT4 no longer did this for them.  Google "O_PONIES", and
see the blog entry from Eric Sandeen, an XFS developer, on the subject:
http://sandeen.net/wordpress/?p=42

XFS never had such "protections for bad programmers".  The bulk of IRIX
developers were well/over educated and well/over paid, usually working for the
US government in one form or another.  Such developers knew when to fsync or
take other measures to make sure critical data hit the disks.  I dare say the
average Linux developer didn't/doesn't have quite the same level of education
or the proper mindset of the IRIX devs.  If they'd had such skill, we'd not
have seen the EXT2/3 to EXT4 problem Ted describes.
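
To illustrate the discipline in question, here is a minimal sketch of an
fsync()-aware write.  The file name and payload are mine, purely for
illustration:

  /* Write a buffer and flush it to stable storage before trusting it.
   * Path and payload below are illustrative only. */
  #include <fcntl.h>
  #include <unistd.h>

  int write_durably(const char *path, const char *buf, size_t len)
  {
      int fd = open(path, O_WRONLY | O_CREAT | O_TRUNC, 0644);
      if (fd < 0)
          return -1;
      if (write(fd, buf, len) != (ssize_t)len || fsync(fd) != 0) {
          close(fd);
          return -1;
      }
      /* Only after fsync() succeeds may the application assume the
       * data would survive a power loss (honest disk caches assumed). */
      return close(fd);
  }

  int main(void)
  {
      return write_durably("critical.dat", "payload\n", 8) ? 1 : 0;
  }

Nothing exotic; it's just the step the "O_PONIES" crowd wanted the filesystem
to make unnecessary.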

> I don't know how likely that is with XFS. Probably one way to test would
> be something like:
> 
> 1. Create 100 files of 1 MB size.
> 2. sync
> 3. Create a new file of 2 MB size & rename() it over a file created in
> step 1.
> 4. Repeat 3 until all files are replaced
> 5. Kill the power immediately after done
> 
> Then you can compare filesystems based on how many files there are whose
> size or content doesn't match.

Depending on your hardware, you may need a much larger test set than 100 files.
If you don't sync between steps 4 and 5, you may not see anything except that
the 100 overwrites never occurred, as those writes may all still be in the
buffer cache when you pull the plug.
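
If anyone wants to try it, a rough, untested harness for the procedure above
might look like this.  The file names, the 4 KB write chunk, and the error
handling are my own choices, and the plug still has to be pulled by hand:

  /* Sketch of the rename-over test described above.  Run it on the
   * filesystem under test, then cut power externally. */
  #include <fcntl.h>
  #include <stdio.h>
  #include <stdlib.h>
  #include <string.h>
  #include <unistd.h>

  #define NFILES 100

  static void write_file(const char *path, size_t size)
  {
      char buf[4096];
      int fd = open(path, O_WRONLY | O_CREAT | O_TRUNC, 0644);
      if (fd < 0) { perror(path); exit(1); }
      memset(buf, 'x', sizeof(buf));
      for (size_t left = size; left > 0; left -= sizeof(buf)) {
          if (write(fd, buf, sizeof(buf)) != (ssize_t)sizeof(buf)) {
              perror("write"); exit(1);
          }
      }
      close(fd);
  }

  int main(void)
  {
      char name[64], tmp[64];
      for (int i = 0; i < NFILES; i++) {        /* step 1: 100 x 1 MB */
          snprintf(name, sizeof(name), "file.%03d", i);
          write_file(name, 1024 * 1024);
      }
      sync();                                   /* step 2 */
      for (int i = 0; i < NFILES; i++) {        /* steps 3 and 4 */
          snprintf(tmp, sizeof(tmp), "file.%03d.tmp", i);
          snprintf(name, sizeof(name), "file.%03d", i);
          write_file(tmp, 2 * 1024 * 1024);     /* 2 MB replacement */
          if (rename(tmp, name) != 0) { perror("rename"); exit(1); }
      }
      /* step 5: kill the power now; sync() here first if you want to
       * test the post-overwrite state instead of the buffer cache. */
      return 0;
  }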

Assuming you can call pull_the_plug with "perfect" timing, I can't tell you
what the exact results would be, as I've never tested this.  You'll likely lose
some pre-existing files, depending on which inodes were committed to the
journal without their respective file data having been written.  I'm pretty
sure this is one of those scenarios that prompt programming professors to teach
"create new/delete old/rename new to old" instead of editing and overwriting a
file in place.
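
In rough C, that classroom pattern looks something like the sketch below.  The
function name and paths are mine, and the fsync() before the rename() is the
step the lazy version skips:

  /* Replace a file by writing a complete new copy, flushing it, then
   * renaming it over the original.  Identifiers are illustrative. */
  #include <fcntl.h>
  #include <stdio.h>
  #include <unistd.h>

  int replace_file(const char *path, const void *data, size_t len)
  {
      char tmp[4096];
      int fd;

      snprintf(tmp, sizeof(tmp), "%s.tmp", path);
      fd = open(tmp, O_WRONLY | O_CREAT | O_TRUNC, 0644);
      if (fd < 0)
          return -1;
      if (write(fd, data, len) != (ssize_t)len || fsync(fd) != 0) {
          close(fd);
          unlink(tmp);
          return -1;
      }
      close(fd);
      /* rename() is atomic on POSIX filesystems: after a crash you
       * see either the old file or the complete new one, never a
       * torn mix, because the new data was flushed before the swap. */
      return rename(tmp, path);
  }

  int main(void)
  {
      const char msg[] = "new contents\n";
      return replace_file("config.dat", msg, sizeof(msg) - 1) ? 1 : 0;
  }

Skip the fsync() and you're back to trusting the filesystem to order the data
write before the rename, which is exactly the bet that went bad on early EXT4.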

I recall that back in the day WordPerfect and Microsoft Word both made a temp
copy of every file opened for editing, because crashes of those programs would
routinely corrupt the opened file.  I believe MS Word and OpenOffice Writer
still do the same today.

It seems some fundamentals haven't really changed that much in 25 years.

-- 
Stan

