[Dovecot] XFS vs EXT4 for mail storage
Hello,
I'm in the process of finalizing the spec for my new dovecot VM, and this is the last question I need to address...
I've read until I'm just about decided on XFS, but I have no experience with it (been using reiserfs on my old box (@ 8 yrs old now), and never had a problem (knock on wood), but considering its current situation (little to no development support for reasons everyone is aware of), I've decided now is the time to switch. It came down to XFS or EXT4, and I like what I've read about XFS, but am unsure how to tune it (or even if I should).
I've decided to use mdbox for storage (been using maildir), and will enable SIS for attachments.
So, anyone (Stan?) have any suggestions? Should I go with EXT4? Or XFS with just the defaults? Or XFS with one or more tuned parameters?
Appreciate any suggestions (including links to docs dealing with tuning XFS for my mail storage conditions that are written more at the layman level) or comments from anyone experienced using both...
Thanks,
--
Best regards,
Charles
Charles Marcus said the following on 02/05/2013 13:16:
So, anyone (Stan?) have any suggestions? Should I go with EXT4? Or XFS with just the defaults? Or XFS with one or more tuned parameters?
Especially when you are working in virtual environments, keep in mind the concept of "I/O cascading".
The bottleneck of virtual environments is often IOPS (I/O operations per second), so a VM that has a light IOPS footprint will perform better.
I/O cascading is, in essence, the multiplying factor of each disk write at the application level. Consider a SQL UPDATE statement: you have data written to the database and to the transaction log. Each file will have its mtime updated. If the underlying filesystem is journaled you get double writes again, for the actual file and the filesystem journal... And so on.
The first and obvious advice (quite a default nowadays with SSD storage) is to mount the FS with noatime. But I think that is as obvious as "do backups".
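A minimal sketch of what that looks like in practice, assuming the mail store lives on its own XFS volume mounted at /var/vmail (the device name is a placeholder):

# /etc/fstab
/dev/vg/mail   /var/vmail   xfs   noatime   0 2

# or switch it on for a live filesystem without a reboot
mount -o remount,noatime /var/vmail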
Ciao, luigi
/ +--[Luigi Rosa]-- \
If one morning I walked on top of the water across the Potomac River, the headline that afternoon would read "President Can't Swim". --Lyndon B. Johnson
Thanks for the replies...
On 2013-05-02 7:54 AM, Luigi Rosa lists@luigirosa.com wrote:
I/O cascading is, in essence, the multiplying factor of each disk write at the application level. Consider a SQL UPDATE statement: you have data written to the database and to the transaction log. Each file will have its mtime updated. If the underlying filesystem is journaled you get double writes again, for the actual file and the filesystem journal... And so on.
Well, this is purely for a mailstore. The only thing I use SQL for is my userdb, so 99.999% of that is just reads for user validation and user auth. Writes are only very occasional, and tiny when they happen, so basically no impact on the system.
On 2013-05-02 8:04 AM, Alessio Cecchi alessio@skye.it wrote:
My mount options are:
"rw,noatime,attr2,delaylog,nobarrier,inode64,noquota"
Hmmm... some questions...
man mount doesn't show delaylog, nobarrier or noquota as valid mount options... ?
But, assuming they are, since rw is the default for all fs types, and attr2 is default for xfs, I could accomplish the same with:
defaults,noatime,delaylog,nobarrier,inode64,noquota
I'm not using quotas, and understand what inode64 does and am fine with that, but what I'm still unsure of for a VM environment is the delaylog and nobarrier options.
Are these recommended/optimal for a VM? Running on ESXi (does it matter what hypervisor is being used)?
--
Best regards,
Charles
On 02.05.2013 15:17, Charles Marcus wrote:
but what I'm still unsure of for a VM environment is the delaylog and nobarrier options.
Are these recommended/optimal for a VM? Running on ESXi (does it matter what hypervisor is being used)?
Barriers do not help you much, or are effectively implicit, because you have no physical disk under the FS; the underlying storage should have battery-backed buffers and does not need to confirm the physical write to the disk for the data to be safe.
/dev/sdd1 on /storage type ext4 (rw,noexec,noatime,nodiratime,commit=45,inode_readahead_blks=64)
Default mount options: journal_data_writeback nobarrier
On 5/2/2013 8:21 AM, Reindl Harald wrote:
On 02.05.2013 15:17, Charles Marcus wrote:
but what I'm still unsure of for a VM environment is the delaylog and nobarrier options.
Delaylog is fine for VM guests. The barrier settings may all simply be useless because many hypervisors don't pass barriers down the stack from the guest. Which means things like fdatasync don't work, not just journal write barriers. See:
http://xfs.org/index.php/XFS_FAQ#Q:_Which_settings_are_best_with_virtualizat...
This has negatively affected EXT4 on ESXi, not just XFS.
Are these recommended/optimal for a VM? Running on ESXi (does it matter what hypervisor is being used)?
Barriers do not help you much, or are effectively implicit, because you have no physical disk under the FS; the underlying storage should have battery-backed buffers and does not need to confirm the physical write to the disk for the data to be safe.
The problem isn't lack of a physical disk under the guest. The problem is lack of software support in the hypervisors. I don't have an answer as to which versions of ESXi/vSphere/etc, if any, do or do not support write barriers, fdatasync, etc. I'm not finding it in their knowledgebase, though I've not put much effort into it yet. You'll need to do some research.
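A rough way to probe this from inside a guest (a sketch only, not a definitive test; the file path and block counts are arbitrary) is to compare buffered writes with writes that request a flush per block:

# buffered writes, no flush requested
dd if=/dev/zero of=/var/tmp/flushtest bs=4k count=2000

# same amount of data, but with fdatasync after every block
dd if=/dev/zero of=/var/tmp/flushtest bs=4k count=2000 oflag=dsync
rm /var/tmp/flushtest

If the dsync run is barely slower than the buffered one on spinning disks, the flushes are probably being absorbed somewhere (BBWC or the hypervisor) rather than reaching the platters.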
-- Stan
On 02/05/2013 15:17, Charles Marcus wrote:
man mount doesn't show delaylog, nobarrier or noquota as valid mount options... ?
Yes, they are available on RHEL 6.x.
"since 2.6.35, xfs had a new mount option '-o delaylog', which improved a lot metadata operations. From 2.6.39 this option is on by default"
Ciao
Alessio Cecchi is: @ ILS -> http://www.linux.it/~alessice/ on LinkedIn -> http://www.linkedin.com/in/alessice Assistenza Sistemi GNU/Linux -> http://www.cecchi.biz/ @ PLUG -> ex-Presidente, adesso senatore a vita, http://www.prato.linux.it
On 2013-05-02 9:51 AM, Alessio Cecchi alessio@skye.it wrote:
On 02/05/2013 15:17, Charles Marcus wrote:
man mount doesn't show delaylog, nobarrier or noquota as valid mount options... ?
Yes, they are available on RHEL 6.x.
"since 2.6.35, xfs had a new mount option '-o delaylog', which improved a lot metadata operations. From 2.6.39 this option is on by default"
Is this a redhat specific feature? Again, man mount says nothing about those options.
--
Best regards,
Charles
On 05/02/2013 10:21 AM, Charles Marcus wrote:
On 2013-05-02 9:51 AM, Alessio Cecchi alessio@skye.it wrote:
On 02/05/2013 15:17, Charles Marcus wrote:
man mount doesn't show delaylog, nobarrier or noquota as valid mount options... ?
Yes, they are available on RHEL 6.x.
"since 2.6.35, xfs had a new mount option '-o delaylog', which improved a lot metadata operations. From 2.6.39 this option is on by default"
Is this a redhat specific feature? Again, man mount says nothing about those options.
You're right that it doesn't seem to be properly listed in the list of options, but it's discussed here:
https://git.kernel.org/cgit/linux/kernel/git/stable/linux-stable.git/tree/Do...
Other XFS options: https://git.kernel.org/cgit/linux/kernel/git/stable/linux-stable.git/tree/Do...
Hi Marcus!
Should I go with EXT4? Or XFS with just the defaults? Or XFS with one or more tuned parameters? ... I'm not using quotas, and understand what inode64 does and am fine with that, but what I'm still unsure of for a VM environment is the delaylog and nobarrier options.
I've been using XFS for many years now and I strongly recommend it for anything besides /boot. Considering a virtual environment I would strongly suggest NOT using nobarrier (i.e. use barrier). You can run into big trouble should the system ever lose power. In fact, the only time I ever managed to damage an XFS filesystem in all those years was inside a VM with the nobarrier option on, when the UPS died (and yes, the server had a functioning BBU attached to the RAID). The delaylog option can be recommended hands down, since it speeds up metadata operations considerably (up to 10 times faster!). And for your last question, stay with the defaults when doing mkfs.xfs; optimizing for stripe width and stripe size and all those other options really only makes sense on a physical machine.
hth and good luck, Michael
On 2013-05-03 1:14 AM, Michael Weissenbacher mw@dermichi.com wrote:
I've been using XFS for many years now and I strongly recommend it for anything besides /boot. Considering a virtual environment I would strongly suggest NOT using nobarrier (i.e. use barrier). You can run into big trouble should the system ever lose power. In fact, the only time I ever managed to damage an XFS filesystem in all those years was inside a VM with the nobarrier option on, when the UPS died (and yes, the server had a functioning BBU attached to the RAID). The delaylog option can be recommended hands down, since it speeds up metadata operations considerably (up to 10 times faster!). And for your last question, stay with the defaults when doing mkfs.xfs; optimizing for stripe width and stripe size and all those other options really only makes sense on a physical machine.
hth and good luck, Michael
Thanks Michael. Yes, it helped to solidify my decision to stick with xfs. I posted my final fstab just a few minutes ago, which I'm now happy with.
--
Best regards,
Charles
On 5/3/2013 12:14 AM, Michael Weissenbacher wrote: ...
last question, stay with the defaults when doing mkfs.xfs; optimizing for stripe width and stripe size and all those other options really only makes sense on a physical machine.
The potential benefit of alignment is always workload dependent. If the hypervisor passes correct RAID geometry to the VM guest and the workload can benefit from alignment, then you want alignment in the VM filesystem just as much as on bare metal.
-- Stan
On 5/2/2013 8:17 AM, Charles Marcus wrote:
man mount doesn't show delaylog, nobarrier or noquota as valid mount options... ?
Many XFS mount options are kernel version specific. Show: ~$ uname -a
Delaylog doesn't exist in recent mount(8) because it's no longer a mount option, and the same goes for older mount(8): it existed as a mount option only briefly, WRT distro mount(8) updates. Since 2.6.39 delaylog is the default, and as of somewhat more recently in the 3.x tree, the old journal logging code was completely removed from the source. Thus there is no longer a "delaylog" mount option. The feature is now hard coded in XFS.
Barriers are crucial to XFS journal, and thus filesystem, reliability. "nobarrier" isn't listed in mount(8), though "barrier" is, which is the default mode. Making people "look for" the switch that disables barriers forces them to take a learning journey. Hopefully during this journey they become educated to the risks of disabling it, before doing so. "Better reliability through obscurity" you might say. Consider the horrible rap XFS would have today if everyone and his dog could easily learn how to disable barriers, then did so on hardware not appropriate for it. Yes, exactly, corrupted XFS filesystems littering the landscape and people screaming what a pile of dogsh|zt XFS is.
WRT noquota, it is the default. You'd never specify it. There are 10 quota options at the bottom of the XFS section of mount(8) that one might want to set.
It is quite irritating, yet surprisingly common, to see XFS users re-specifying the defaults in their /etc/fstab, because they didn't take the time to educate themselves properly, and simply copy/pasted from one of many online "XFS tuning guides". On the XFS list we call these "XFS mis-tuning guides", as nearly all of them contain mostly misinformation. Not intentional mind you, but because they just don't know what they're talking about, or they did but the guide is 5+ years old, and things have changed.
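Rather than copying options from a guide, it is usually safer to look at what is actually in effect on your own kernel (the mountpoint below is just an example):

# the options the kernel is really using, defaults included
grep ' /var ' /proc/mounts

# the on-disk geometry and log settings chosen at mkfs time
xfs_info /var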
-- Stan
On 02/05/2013 13:16, Charles Marcus wrote:
Hello,
I'm in the process of finalizing the spec for my new dovecot VM, and this is the last question I need to address...
I've read until I'm just about decided on XFS, but I have no experience with it (been using reiserfs on my old box (@ 8 yrs old now), and never had a problem (knock on wood), but considering its current situation (little to no development support for reasons everyone is aware of), I've decided now is the time to switch. It came down to XFS or EXT4, and I like what I've read about XFS, but am unsure how to tune it (or even if I should).
I've decided to use mdbox for storage (been using maildir), and will enable SIS for attachments.
So, anyone (Stan?) have any suggestions? Should I go with EXT4? Or XFS with just the defaults? Or XFS with one or more tuned parameters?
Appreciate any suggestions (including links to docs dealing with tuning XFS for my mail storage conditions that are written more at the layman level) or comments from anyone experienced using both...
Thanks,
Hi,
I'm using XFS for mail storage (Maildir type) and it works fine and better than ext4 (especially if your storage is very large).
My mount options are:
"rw,noatime,attr2,delaylog,nobarrier,inode64,noquota"
and I'm running it on RHEL 6.4
For more information you can read the RHEL documentation:
https://access.redhat.com/site/documentation/en-US/Red_Hat_Enterprise_Linux/...
https://access.redhat.com/site/documentation/en-US/Red_Hat_Enterprise_Linux/...
Ciao
Alessio Cecchi is: @ ILS -> http://www.linux.it/~alessice/ on LinkedIn -> http://www.linkedin.com/in/alessice Assistenza Sistemi GNU/Linux -> http://www.cecchi.biz/ @ PLUG -> ex-Presidente, adesso senatore a vita, http://www.prato.linux.it
On 5/2/2013 7:04 AM, Alessio Cecchi wrote:
"rw,noatime,attr2,delaylog,nobarrier,inode64,noquota" ... and I'm running it on RHEL 6.4
I assume this is from /proc/mounts? All of those but for noatime, nobarrier, and inode64 are defaults. You've apparently specified these in /etc/fstab. noatime is useless as relatime is the default. Google "XFS relatime vs noatime".
I assume you have a RAID controller or SAN head with [F|B]BWC and have disabled individual drive write caches of array disks, given you've disabled journal write barriers. If drive caches are in fact enabled, and/or you don't have [F|B]BWC, then journal write barriers need to be enabled. If not you're skydiving without a reserve chute.
-- Stan
On 03/05/2013 10:48, Stan Hoeppner wrote:
On 5/2/2013 7:04 AM, Alessio Cecchi wrote:
"rw,noatime,attr2,delaylog,nobarrier,inode64,noquota" ... and I'm running it on RHEL 6.4
I assume this is from /proc/mounts? All of those but for noatime, nobarrier, and inode64 are defaults. You've apparently specified these in /etc/fstab. noatime is useless as relatime is the default. Google "XFS relatime vs noatime".
I assume you have a RAID controller or SAN head with [F|B]BWC and have disabled individual drive write caches of array disks, given you've disabled journal write barriers. If drive caches are in fact enabled, and/or you don't have [F|B]BWC, then journal write barriers need to be enabled. If not you're skydiving without a reserve chute.
Thanks Stan, yes the output is from /proc/mounts.
We are running XFS on a RAID controller but we haven't disabled the individual drive write caches. So what options do you suggest in fstab for XFS with a non-high-end RAID/SAN?
Thanks
Alessio Cecchi is: @ ILS -> http://www.linux.it/~alessice/ on LinkedIn -> http://www.linkedin.com/in/alessice Assistenza Sistemi GNU/Linux -> http://www.cecchi.biz/ @ PLUG -> ex-Presidente, adesso senatore a vita, http://www.prato.linux.it
On 5/3/2013 4:34 AM, Alessio Cecchi wrote:
On 03/05/2013 10:48, Stan Hoeppner wrote:
On 5/2/2013 7:04 AM, Alessio Cecchi wrote:
"rw,noatime,attr2,delaylog,nobarrier,inode64,noquota" ... and I'm running it on RHEL 6.4
I assume this is from /proc/mounts? All of those but for noatime, nobarrier, and inode64 are defaults. You've apparently specified these in /etc/fstab. noatime is useless as relatime is the default. Google "XFS relatime vs noatime".
I assume you have a RAID controller or SAN head with [F|B]BWC and have disabled individual drive write caches of array disks, given you've disabled journal write barriers. If drive caches are in fact enabled, and/or you don't have [F|B]BWC, then journal write barriers need to be enabled. If not you're skydiving without a reserve chute.
Thanks Stan, yes the output is from /proc/mounts.
We are running XFS on a RAID controller but we haven't disabled the individual
Which RAID controller? Does it have BBWC (battery backed write cache)? How much cache RAM?
drive write caches. So what options do you suggest in fstab for XFS with a non-high-end RAID/SAN?
Get rid of noatime and use the default, relatime. Only specify nobarrier if you have both:
- Working BBWC on your RAID card
- Individual disk drive caches are disabled (and preferably a good UPS)
RAID BBWC is worthless if drive caches are still enabled. This can corrupt your filesystem if power fails, or the kernel crashes, because writes to the journal may be lost.
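For directly attached disks the drive write cache can be checked and disabled with hdparm (a sketch only; behind a hardware RAID controller such as a PERC the member disks are usually hidden from the OS, so the controller's own BIOS or management tool has to be used instead):

# query the current write-cache setting of a drive
hdparm -W /dev/sda

# disable the drive's volatile write cache
hdparm -W0 /dev/sda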
-- Stan
Quoting Charles Marcus CMarcus@media-brokers.com:
Hello,
I'm in the process of finalizing the spec for my new dovecot VM, and this is the last question I need to address...
I've read until I'm just about decided on XFS, but I have no experience with it (been using reiserfs on my old box (@ 8 yrs old now), and never had a problem (knock on wood), but considering its current situation (little to no development support for reasons everyone is aware of), I've decided now is the time to switch. It came down to XFS or EXT4, and I like what I've read about XFS, but am unsure how to tune it (or even if I should).
I've decided to use mdbox for storage (been using maildir), and will enable SIS for attachments.
So, anyone (Stan?) have any suggestions? Should I go with EXT4? Or XFS with just the defaults? Or XFS with one or more tuned parameters?
Appreciate any suggestions (including links to docs dealing with tuning XFS for my mail storage conditions that are written more at the layman level) or comments from anyone experienced using both...
IMHO if you say "VM" then the filesystem inside the guest doesn't matter that much. The differences between ext4 and xfs are mostly about the knowledge needed and the adjustability for special (high-end) hardware and the like. With a hypervisor providing some standard I/O channel and hiding/handling the hardware details itself, most of the differences are gone. With this in mind your question should maybe be more "which filesystem is more hypervisor friendly". For this I would suspect the simpler the better, so I would choose ext4.
Regards
Andreas
On 2013-05-02 9:12 AM, lst_hoe02@kwsoft.de lst_hoe02@kwsoft.de wrote:
IMHO if you say "VM" then the filesystem inside the guest doesn't matter that much.
Well... my understanding is that things can break rather badly if you use reiserfs for the host, and then use reiserfs for one of the guests...
So, if doing that can break things badly, I imagine you may not be totally correct that it 'doesn't matter'...
The differences between ext4 and xfs are mostly about the knowledge needed and the adjustability for special (high-end) hardware and the like. With a hypervisor providing some standard I/O channel and hiding/handling the hardware details itself, most of the differences are gone. With this in mind your question should maybe be more "which filesystem is more hypervisor friendly". For this I would suspect the simpler the better, so I would choose ext4.
Possibly a valid argument overall... would like to see what Stan has to say about it though before I make a final decision...
Thanks to all for the replies so far.
--
Best regards,
Charles
Quoting Charles Marcus CMarcus@media-brokers.com:
On 2013-05-02 9:12 AM, lst_hoe02@kwsoft.de lst_hoe02@kwsoft.de wrote:
IMHO if you say "VM" then the filesystem inside the guest doesn't matter that much.
Well... my understanding is that things can break rather badly if you use reiserfs for the host, and then use reiserfs for one of the guests...
From my understanding this was because of the "repair" capabilities of the reiserfs fsck, which was able to mix up your host and guest filesystems. This was also only the case for VMware Player and the old VMware Server, i.e. the Linux add-on hypervisors.
So, if doing that can break things badly, I imagine you may not be
totally correct that it 'doesn't matter'...
For ESXi with its own filesystem (vmfs) it still shouldn't matter that much. As I said, the basic task of the hypervisor is to abstract the hardware used, so there is no chance for the guest OS to really optimize for the hardware. Maybe it's time for a generic hypervisor guest fs...
Regards
Andreas
On 2013-05-02 4:18 PM, lst_hoe02@kwsoft.de lst_hoe02@kwsoft.de wrote:
For ESXi with its own filesystem (vmfs) it still shouldn't matter that much. As said the basic task of the Hypervisor is to abstract the hardware used, so no chance for the guest OS to really optimize for the hardware used. Maybe its time for a generic Hypervisor guest fs...
Interesting idea, but way over my head as far as whether or not it is accurate... ;)
One thing I'm still unsure of is the whole issue of things being sync'd to disk, and which options for xfs (or ext4) are 'safest' for the virtualized environment...
--
Best regards,
Charles
On 5/2/2013 8:12 AM, lst_hoe02@kwsoft.de wrote:
IMHO if you say "VM" then the filesystem inside the guest doesn't matter that much.
Malarky.
The difference of ext4/xfs are mostly the knowledge and adjustability for special (high-end) hardware and the like. With a
XFS doesn't require "high end" hardware to demonstrate its advantages over EXT4. In his LCA 2012 presentation on XFS development, Dave Chinner showed data from IIRC a 12 disk RAID0 array, which is hardly high end. Watch the presentation and note the massive lead XFS has over EXT4 (and BTRFS) in most areas. The performance gap is quite staggering. You'll see the same performance, and differences, in a VM or on bare hardware.
Hypervisor providing some standard I/O channel and hiding/handling the hardware details itself, most of the differences are gone. With this in
Again, malarky. The parallel performance in XFS resides in multiple threads and memory structures, b+ trees, and how these are executed and manipulated, and via the on disk layout of AGs and how they're written to in parallel. Virtualization doesn't change nor limit any of this. The block device driver, not the filesystem, talks through the hypervisor to the hardware. No hypervisor imposes limits on XFS parallelism or performance, nor block device drivers. Some may be configured to prioritize IO amongst guests, but that's a different issue entirely.
Worthy of note here is that nearly all XFS testing performed by the developers today is done within virtual machines on filesystems that reside within sparse files atop another XFS filesystem--not directly on hardware. According to you, this double layer of virtualization, OS and filesystem, would further eliminate all meaningful performance differences between XFS and EXT4. Yet this is not the case at all because EXT4 doesn't yet handle sparse files very well, so the XFS lead increases.
mind your question should maybe more of "what filesystem is more Hypervisor friendly". For this i would suspect the simpler the better, so i would choose ext4.
Again, malarky. The hypervisor imposes no limits on filesystem performance, other than the CPU cycles, scheduling, and RAM overhead of the hypervisor itself. I.e. the same things imposed on all aspects of guest operation.
-- Stan
Quoting Stan Hoeppner stan@hardwarefreak.com:
On 5/2/2013 8:12 AM, lst_hoe02@kwsoft.de wrote:
IMHO if you say "VM" then the filesystem inside the guest doesn't matter that much.
Malarky.
If you are going to insult, you should maybe at least write it so that non-native speakers can look it up (malarkey).
The difference of ext4/xfs are mostly the knowledge and adjustability for special (high-end) hardware and the like. With a
XFS doesn't require "high end" hardware to demonstrate its advantages over EXT4. In his LCA 2012 presentation on XFS development, Dave Chinner showed data from IIRC a 12 disk RAID0 array, which is hardly high end. Watch the presentation and note the massive lead XFS has over EXT4 (and BTRFS) in most areas. The performance gap is quite staggering. You'll see the same performance, and differences, in a VM or on bare hardware.
It is not stunning that a developer of XFS comes up with a setup where XFS is the fastest of all.
Hypervisor providing some standard I/O channel and hiding/handling the hardware details itself, most of the differences are gone. With this in
Again, malarky. The parallel performance in XFS resides in multiple threads and memory structures, b+ trees, and how these are executed and manipulated, and via the on disk layout of AGs and how they're written to in parallel. Virtualization doesn't change nor limit any of this. The block device driver, not the filesystem, talks through the hypervisor to the hardware. No hypervisor imposes limits on XFS parallelism or performance, nor block device drivers. Some may be configured to prioritize IO amongst guests, but that's a different issue entirely.
While it might be true that XFS's threading and non-blocking/parallel design will gain some benefit, that is no longer true for the points regarding "disk" layout or estimating I/O channels and disk spindles.
Worthy of note here is that nearly all XFS testing performed by the developers today is done within virtual machines on filesystems that reside within sparse files atop another XFS filesystem--not directly on hardware. According to you, this double layer of virtualization, OS and filesystem, would further eliminate all meaningful performance differences between XFS and EXT4. Yet this is not the case at all because EXT4 doesn't yet handle sparse files very well, so the XFS lead increases.
So you have confirmed my suspicion that XFS developers will find a case where it matters in favour of XFS ;-)
In real world VM deployments most of the time there are vmfs volumes (VMware) underneath, or NTFS (Hyper-V), and in many cases these are even taken from some form of SAN device doing its own mapping of fs blocks to physical blocks. With this, a carefully chosen disk layout inside the guest doesn't matter at all if the hypervisor does not or cannot map it usefully to the hardware.
mind your question should maybe more of "what filesystem is more Hypervisor friendly". For this i would suspect the simpler the better, so i would choose ext4.
Again, malarky. The hypervisor imposes no limits on filesystem performance, other than the CPU cycles, scheduling, and RAM overhead of the hypervisor itself. I.e. the same things imposed on all aspects of guest operation.
You have forgotten that the hypervisor also provides only a standard device "API" for the I/O channel, which limits the possibility to do any hardware estimation/optimization inside the guest. So many traditional performance tweaks don't work as expected, like physical block layout or alignment. The more "far away" you are from the hardware in terms of layers, the more difficult it gets to optimize I/O speed with the traditional approaches. You can see this in the myriad of benchmarks flying around, which all have a different clear winner depending on who has done the benchmark.
I know your history of insisting you are right in all cases, so this is my last post on this subject. Every reader should try to understand the differences on his/her own anyway.
Regards
Andreas
On 5/3/2013 4:31 AM, lst_hoe02@kwsoft.de wrote:
If you are going to insult, you should maybe at least write it so that non-native speakers can look it up (malarkey).
Sorry Andreas. I didn't intend that as an insult, merely an expression of strong disagreement with statements not grounded in facts.
It is not stunning that a developer of XFS comes up with a setup where XFS is the fastest of all.
Dave is even handed with this stuff. Watch the video. The pre-delaylog slides show EXT4 metadata performance really trouncing old XFS by a *much* larger margin than that of XFS with delaylog over EXT4. When delaylog turns the tables the gap is much smaller. This says more about how horrible XFS metadata performance was prior to delaylog than how much better than EXT4 it is today, though it is substantially better with greater parallelism. ...
So you have confirmed my suspicion that XFS developers will find a case where it matters in favour of XFS ;-)
All developers use VMs today for the obvious reason: It saves so much time and allows much more work in a given time frame. Note that for validation testing of things like barriers they must still use bare metal since the hypervisors noop disk cache flushes. ...
I know your history of insisting you are right in all cases, so this is
Then you've obviously missed posts where I've acknowledged making mistakes.
my last post on this subject. Every reader should try to understand the differences on his/her own anyway.
It's never about "being right" but "getting it right". People require accurate technical information in order to make technical decisions. I provide that when I have the information. I also try to correct incomplete, missing, or inaccurate information where I believe it to be necessary. You stated that a VM environment eliminates most of the advantages of any given filesystem, and that's simply not correct.
-- Stan
On 5/2/2013 6:16 AM, Charles Marcus wrote: ...
I've decided to use mdbox for storage (been using maildir), and will enable SIS for attachments.
So, anyone (Stan?) have any suggestions? Should I go with EXT4? Or XFS with just the defaults? Or XFS with one or more tuned parameters?
Appreciate any suggestions (including links to docs dealing with tuning XFS for my mail storage conditions that are written more at the layman level) or comments from anyone experienced using both...
From a filesystem perspective mdbox is little different from maildir as they both exhibit lots of small random IOs. With either one aligning the filesystem to the RAID stripe is problematic as it can create spindle hotspots and increase free space fragmentation. If you're using a vmdk stripe alignment isn't possible anyway as VMware ignores hardware device geometry WRT vmdks.
Although the EXT developers have been working overtime the last few years trying to borrow/steal/duplicate the advanced performance features of XFS, they have a very long way to go. The parallel performance of EXT is far behind as well as file allocation/layout and free space management, to name a few.
My recommendation is to use XFS with the defaults, but add "inode64" to the mount options in /etc/fstab. This enables the modern allocator which clusters files around their parent directory within an allocation group. It's the default allocator in very recent upstream kernels but not in most currently shipping distro kernels. It decreases seek latency between metadata and file operations, and better manages on disk space. In short, XFS will yield superior mail performance to EXT4 in a multiuser environment.
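As a concrete sketch of that recommendation (device name and mountpoint are placeholders, not taken from this thread):

# take the mkfs defaults, no stripe or log tuning
mkfs.xfs /dev/vg/mail

# /etc/fstab: defaults plus the inode64 allocator
/dev/vg/mail   /var/vmail   xfs   defaults,inode64   0 2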
There are currently no mail workload tuning docs in the world of XFS that I'm aware of. I've been intending to write such a doc for the XFS.org FAQ for some time but it hasn't happened yet.
-- Stan
On 2013-05-03 1:30 AM, Stan Hoeppner stan@hardwarefreak.com wrote:
From a filesystem perspective mdbox is little different from maildir as they both exhibit lots of small random IOs.
Hi Stan. Thanks, was hoping you'd chime in here...
But, I'm confused as to why you'd say this. mdbox supposedly has many advantages over maildir, since it is *not* a single file for every email (like maildir or sdbox).
My recommendation is to use XFS with the defaults, but add "inode64" to the mount options in /etc/fstab. This enables the modern allocator which clusters files around their parent directory within an allocation group. It's the default allocator in very recent upstream kernels but not in most currently shipping distro kernels. It decreases seek latency between metadata and file operations, and better manages on disk space. In short, XFS will yield superior mail performance to EXT4 in a multiuser environment.
Thanks very much. I'd already come to a similar conclusion, but was starting to have doubts after some of the prior comments. But what you say backs up the majority of what I've been reading. It's just difficult to judge what you're reading when you aren't a software or hardware engineer, just a lowly self-taught sysadmin who still considers himself a noob even after doing this for a few years.
There are currently no mail workload tuning docs in the world of XFS that I'm aware of. I've been intending to write such a doc for the XFS.org FAQ for some time but it hasn't happened yet.
Hope you find the time to do it some day... :)
On 2013-05-03 5:54 AM, Stan Hoeppner stan@hardwarefreak.com wrote:
Many XFS mount options are kernel version specific. Show: ~$ uname -a
Linux myhost 3.7.10-gentoo-r1 #3 SMP Sat Apr 27 10:01:59 EDT 2013 x86_64 AMD Opteron(tm) Processor 4180 AuthenticAMD GNU/Linux
Delaylog doesn't exist in recent mount(8) because it's no longer a mount option, and the same goes for older mount(8): it existed as a mount option only briefly, WRT distro mount(8) updates. Since 2.6.39 delaylog is the default, and as of somewhat more recently in the 3.x tree, the old journal logging code was completely removed from the source. Thus there is no longer a "delaylog" mount option. The feature is now hard coded in XFS.
Got it, thanks.
Barriers are crucial to XFS journal, and thus filesystem, reliability. "nobarrier" isn't listed in mount(8), though "barrier" is, which is the default mode. Making people "look for" the switch that disables barriers forces them to take a learning journey. Hopefully during this journey they become educated to the risks of disabling it, before doing so. "Better reliability through obscurity" you might say. Consider the horrible rap XFS would have today if everyone and his dog could easily learn how to disable barriers, then did so on hardware not appropriate for it. Yes, exactly, corrupted XFS filesystems littering the landscape and people screaming what a pile of dogsh|zt XFS is.
Got it, thanks again.
WRT noquota, it is the default. You'd never specify it. There are 10 quota options at the bottom of the XFS section of mount(8) that one might want to set.
It is quite irritating, yet surprisingly common, to see XFS users re-specifying the defaults in their /etc/fstab, because they didn't take the time to educate themselves properly, and simply copy/pasted from one of many online "XFS tuning guides". On the XFS list we call these "XFS mis-tuning guides", as nearly all of them contain mostly misinformation. Not intentional mind you, but because they just don't know what they're talking about, or they did but the guide is 5+ years old, and things have changed.
Ok, so here's my final fstab... thanks again for all of the comments (especially yours Stan).
# <fs> <mountpoint> <type> <opts>
--
Best regards,
Charles
On 5/3/2013 6:30 AM, Charles Marcus wrote:
On 2013-05-03 1:30 AM, Stan Hoeppner stan@hardwarefreak.com wrote:
From a filesystem perspective mdbox is little different from maildir as they both exhibit lots of small random IOs.
Hi Stan. Thanks, was hoping you'd chime in here...
But, I'm confused as to why you'd say this. mdbox supposedly has many advantages over maildir, since it is *not* a single file for every email (like maildir or sdbox).
When I said "lots of small random IOs" I was leading into the explanation of why alignment isn't necessary, and actually detrimental to a mail workload. It's WRT filesystem alignment to the RAID stripe that maildir and mdbox are little different.
Thanks very much. I'd already come to a similar conclusion, but was starting to have doubts after some of the prior comments. But what you say backs up the majority of what I've been reading. It's just difficult to judge what you're reading when you aren't a software or hardware engineer, just a lowly self-taught sysadmin who still considers himself a noob even after doing this for a few years.
Digesting the inner workings of a filesystem, especially one as complex and tweakable as XFS, and how they relate to real world workloads, is not for the faint of heart. Ironically, today's XFS defaults work extremely well "out of the box" for many workloads, including mail.
There are currently no mail workload tuning docs in the world of XFS that I'm aware of. I've been intending to write such a doc for the XFS.org FAQ for some time but it hasn't happened yet.
Hope you find the time to do it some day... :)
I need to get this Dovecot doc thing finished first...
On 2013-05-03 5:54 AM, Stan Hoeppner stan@hardwarefreak.com wrote:
Many XFS mount options are kernel version specific. Show: ~$ uname -a
Linux myhost 3.7.10-gentoo-r1 #3 SMP Sat Apr 27 10:01:59 EDT 2013 x86_64 AMD Opteron(tm) Processor 4180 AuthenticAMD GNU/Linux
3.7, yeah, pretty sure delaylog is no longer an option with this recent a kernel. ...
Ok, so here's my final fstab... thanks again for all of the comments (especially yours Stan).
# <fs> <mountpoint> <type> <opts>
/dev/vg/var    /var     xfs   defaults,inode64   0 2
...
/dev/vg/snaps  /snaps   xfs   defaults,inode64   0 2
I assume /var will hold user mail dirs. Do /var/ and /snaps reside on the same RAID array, physical disks? How about the other filesystems I snipped? If you have a large number of filesystems atop the same RAID, some of them being XFS, this could create a head thrashing problem under high load increasing latency and thus response times.
Would you mind posting:
~$ xfs_info /dev/vg/var ~$ xfs_info /dev/vg/snaps
-- Stan
On 2013-05-03 8:34 AM, Stan Hoeppner stan@hardwarefreak.com wrote:
I assume /var will hold user mail dirs.
Yes, in /var/vmail
Do /var/ and /snaps reside on the same RAID array, physical disks?
Yes - vmware host is a Dell R515, with ESXi installed to mirrored internal SATA drives, with 8 drives in RAID 10 for all of the VMs. All storage is this local storage (no SAN/NAS).
How about the other filesystems I snipped? If you have a large number of filesystems atop the same RAID, some of them being XFS, this could create a head thrashing problem under high load increasing latency and thus response times.
Ouch...
This ESXi host also hosts 2 server 2008R2 vms...
Would you mind posting: ~$ xfs_info /dev/vg/var
# xfs_info /dev/vg/var
meta-data=/dev/mapper/vg-var   isize=256    agcount=4, agsize=45875200 blks
         =                     sectsz=512   attr=2
data     =                     bsize=4096   blocks=183500800, imaxpct=25
         =                     sunit=0      swidth=0 blks
naming   =version 2            bsize=4096   ascii-ci=0
log      =internal             bsize=4096   blocks=89600, version=2
         =                     sectsz=512   sunit=0 blks, lazy-count=1
realtime =none                 extsz=4096   blocks=0, rtextents=0
~$ xfs_info /dev/vg/snaps
# xfs_info /dev/vg/snaps
meta-data=/dev/mapper/vg-snaps isize=256    agcount=4, agsize=262144 blks
         =                     sectsz=512   attr=2
data     =                     bsize=4096   blocks=1048576, imaxpct=25
         =                     sunit=0      swidth=0 blks
naming   =version 2            bsize=4096   ascii-ci=0
log      =internal             bsize=4096   blocks=2560, version=2
         =                     sectsz=512   sunit=0 blks, lazy-count=1
realtime =none                 extsz=4096   blocks=0, rtextents=0
Thanks again Stan...
--
Best regards,
Charles
On 2013-05-03 10:21 AM, Charles Marcus CMarcus@Media-Brokers.com wrote:
How about the other filesystems I snipped? If you have a large number of filesystems atop the same RAID, some of them being XFS, this could create a head thrashing problem under high load increasing latency and thus response times.
Ouch...
This ESXi host also hosts 2 server 2008R2 vms...
Or did you mean just the other filesystems in this linux VM?
Yes, they are all on the same RAID. The only purpose of the other xfs volume - /snaps - is to hold snapshots of /var for email backup purposes - so, rsnapshot will initiate an LVM snapshot, take the backup, then remove the snapshot (a rough sketch of that cycle follows below). /snaps is not used for anything else, and it is the only other xfs filesystem.
The others are either ext4 (/ and /boot) or ext2 (/tmp, /var/tmp and /var/log)...
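For reference, the snapshot/backup cycle described above boils down to something like this (a hand-rolled sketch of what rsnapshot's cmd_preexec/cmd_postexec hooks would run; volume names, sizes and the temporary mountpoint are assumptions):

# create a temporary LVM snapshot of the mail volume
lvcreate --snapshot --size 5G --name varsnap /dev/vg/var

# mount it read-only; nouuid is needed because XFS refuses a duplicate UUID
mount -o ro,nouuid /dev/vg/varsnap /mnt/varsnap

# ... rsnapshot/rsync copies /mnt/varsnap/vmail into /snaps here ...

# tear it down again
umount /mnt/varsnap
lvremove -f /dev/vg/varsnap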
--
Best regards,
Charles
On 5/3/2013 9:21 AM, Charles Marcus wrote:
On 2013-05-03 8:34 AM, Stan Hoeppner stan@hardwarefreak.com wrote:
I assume /var will hold user mail dirs.
Yes, in /var/vmail
Do /var/ and /snaps reside on the same RAID array, physical disks?
Yes - vmware host is a Dell R515, with ESXi installed to mirrored internal SATA drives, with 8 drives in RAID 10 for all of the VMs. All storage is this local storage (no SAN/NAS).
Your RAID10 is on a PERC correct? You have four 7.2K SATA stripe spindles. Do you mind posting the RAID10 strip/chunk size? The RAID geometry can be critical, not just for mail, but your entire VM setup. Also, what's your mdbox max file size?
How about the other filesystems I snipped? If you have a large number of filesystems atop the same RAID, some of them being XFS, this could create a head thrashing problem under high load increasing latency and thus response times.
Ouch...
Don't fret yet.
This ESXi host also hosts 2 server 2008R2 vms...
So, what, 3 production VMs total? That shouldn't be a problem, unless... (read below)
Would you mind posting: ~$ xfs_info /dev/vg/var
meta-data=/dev/mapper/vg-var   isize=256  agcount=4, agsize=45875200
...
meta-data=/dev/mapper/vg-snaps isize=256  agcount=4, agsize=262144 blks
Ok, good, mkfs gave you 4 AGs per filesystem, 8 between the two. This shouldn't be a problem.
However, ISTR you mentioning that your users transfer multi-GB files, up to 50GB, on a somewhat regular basis, to/from the file server over GbE at ~80-100MB/s. If these big copies hit the same 4 RAID10 spindles it may tend to decrease IMAP response times due to seek contention. This has nothing to do with XFS. It's the nature of shared storage.
Thanks again Stan...
You bet.
-- Stan
On 2013-05-03 11:10 PM, Stan Hoeppner stan@hardwarefreak.com wrote:
On 5/3/2013 9:21 AM, Charles Marcus wrote:
On 2013-05-03 8:34 AM, Stan Hoeppner stan@hardwarefreak.com wrote:
I assume /var will hold user mail dirs. Yes, in /var/vmail
Do /var/ and /snaps reside on the same RAID array, physical disks? Yes - vmware host is a Dell R515, with ESXi installed to mirrored internal SATA drives, with 8 drives in RAID 10 for all of the VMs. All storage is this local storage (no SAN/NAS). Your RAID10 is on a PERC correct?
Correct... it is a PERC H700 (integrated)
You have four 7.2K SATA stripe spindles.
Actually, no, I have 6 15k 450G SAS6G hard drives (Seagate Cheetah ST3450857SS) in this RAID10 array...
:)
Do you mind posting the RAID10 strip/chunk size? The RAID geometry can be critical, not just for mail, but your entire VM setup.
I just used the defaults when I created it (crossing fingers hoping that wasn't a huge mistake). But - I'm not sure how to provide the answer to the question (is my ignorance showing yet?)...
Also, what's your mdbox max file size?
Haven't settled on that yet. I was thinking of using the defaults there too. I try to stay with defaults whenever possible, especially if I don't know enough to know why I would want to change something.
How about the other filesystems I snipped? If you have a large number of filesystems atop the same RAID, some of them being XFS, this could create a head thrashing problem under high load increasing latency and thus response times. Ouch... Don't fret yet.
This ESXi host also hosts 2 server 2008R2 vms... So, what, 3 production VMs total? That shouldn't be a problem, unless... (read below)
Would you mind posting: ~$ xfs_info /dev/vg/var meta-data=/dev/mapper/vg-var isize=256 agcount=4, agsize=45875200 ... meta-data=/dev/mapper/vg-snaps isize=256 agcount=4, agsize=262144 blks Ok, good, mkfs gave you 4 AGs per filesystem, 8 between the two. This shouldn't be a problem.
Cool...
However, ISTR you mentioning that your users transfer multi-GB files, up to 50GB, on a somewhat regular basis, to/from the file server over GbE at ~80-100MB/s. If these big copies hit the same 4 RAID10 spindles it may tend to decrease IMAP response times due to seek contention. This has nothing to do with XFS. It's the nature of shared storage.
I think you're confusing me/us with someone else. This is definitely not something our users do, not even close. We do deal with a lot of large email attachments though. I used to have a max size of 50MB, but reduced it to 25MB about 8 months ago (equivalent of google's max size)...
So, looks like I'm fine with what I have now...
Thanks again,
--
Best regards,
Charles
On 5/4/2013 9:52 AM, Charles Marcus wrote:
On 2013-05-03 11:10 PM, Stan Hoeppner stan@hardwarefreak.com wrote:
On 5/3/2013 9:21 AM, Charles Marcus wrote:
On 2013-05-03 8:34 AM, Stan Hoeppner stan@hardwarefreak.com wrote:
I assume /var will hold user mail dirs. Yes, in /var/vmail
Do /var/ and /snaps reside on the same RAID array, physical disks? Yes - vmware host is a Dell R515, with ESXi installed to mirrored internal SATA drives, with 8 drives in RAID 10 for all of the VMs. All storage is this local storage (no SAN/NAS). Your RAID10 is on a PERC correct?
Correct... it is a PERC H700 (integrated)
Good. 512MB BBWC, LSI based IIRC. Should yield good performance with some margin of safety, though you're still vulnerable to guest fsync being buffered/ignored. Just make sure you disable all the individual drive caches via the H700 BIOS, Dell Linux software management utility (if there is one), Lifecycle manager, etc. I don't use Dell gear so I'm unable to give instructions. If the Dell RAID HBAs are worth their salt they'll disable drive caches automatically when you enable the BBWC. Some HBAs do this, some don't. Just keep in mind the safety net of BBWC is defeated if drive caches are enabled.
You have four 7.2K SATA stripe spindles.
Actually, no, I have 6 15k 450G SAS6G hard drives (Seagate Cheetah ST3450857SS) in this RAID10 array...
Directly up above you said 8 drives in RAID10. So to make sure we're all on the same page, you have 6x 450GB 15K SAS drives in RAID10, 3 stripe spindles, ~1.35TB raw. That's going to yield a non power of 2 stripe width, which I always try to avoid, though it's not a show stopper.
Do you mind posting the RAID10 strip/chunk size? The RAID geometry can be critical, not just for mail, but your entire VM setup.
I just used the defaults when I created it (crossing fingers hoping that wasn't a huge mistake).
With VMware, your workloads and user head count, it may make no visible difference. As a general rule for small random IO workloads (which covers most of what you do), smaller strips are better, 32-64KB max. If it defaulted to a 512KB or 1MB strip that's bad. Large strip sizes are really only beneficial for streaming write workloads. When you use large strips with small IO workloads you generally end up sending a disproportionate amount of writes/reads to each drive in the array, thus creating hotspots and decreasing the performance advantage of striping. I.e. you can end up making one disk work harder while the others sit idle more of the time.
But - I'm not sure how to provide the answer to the question (is my ignorance showing yet?)...
Fire up whatever tool Dell provides to manage the H700. You should be able to view all the current parameters of the controller.
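If the LSI MegaCli utility works with the H700 (it usually does on LSI-based PERCs, but treat that as an assumption), the relevant settings can also be read from the command line:

# stripe size, write policy and disk cache policy of each logical drive
MegaCli64 -LDInfo -LAll -aAll

# battery backup unit status
MegaCli64 -AdpBbuCmd -aAll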
Also, what's your mdbox max file size?
Haven't settled on that yet. I was thinking of using the defaults there too. I try to stay with defaults whenever possible, especially if I don't know enough to know why I would want to change something.
IIRC the default is 2MB. The downside to a small value here is more metadata operations, more IOs for full text searches and longer search times, longer backup times, potentially greater filesystem fragmentation, etc. The two advantages I can think of are potentially fewer locking collisions, and a file corruption affects fewer emails. There may be others.
With large mdbox sizes the negatives/positives above flip. As you increase the size the advantages become ever greater, up to a point. You obviously don't want to specify 1GB mdboxes. And if your users regularly send emails with 5MB+ PDF or TIFF attachments then 2MB is probably too small.
Best advice? Take a poll of the list. You'll likely find people using between the 2MB default and 64MB. Some brave souls may be going larger. ...
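In Dovecot configuration terms, the knob being discussed is mdbox_rotate_size (a sketch of conf.d/10-mail.conf; the 16M value is only an illustrative middle ground, not a recommendation made in this thread):

mail_location = mdbox:~/mdbox
# start a new m.* storage file once the current one reaches this size
mdbox_rotate_size = 16M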
However, ISTR you mentioning that your users transfer multi-GB files, up to 50GB, on a somewhat regular basis, to/from the file server over GbE at ~80-100MB/s. If these big copies hit the same 4 RAID10 spindles it may tend to decrease IMAP response times due to seek contention. This has nothing to do with XFS. It's the nature of shared storage.
I think you're confusing me/us with someone else.
Highly possible, and I mean that sincerely. I help a lot of people across various lists. But ISTR when we were discussing your metro ethernet link the possibility of multi-GB file transfers causing contention problems with normal user traffic. Maybe that was your backup process I'm thinking of. That would make sense.
This is definitely not something our users do, not even close. We do deal with a lot of large email attachments though. I used to have a max size of 50MB, but reduced it to 25MB about 8 months ago (equivalent of google's max size)...
Get a good idea of what the current max email size is and size mdbox files accordingly.
So, looks like I'm fine with what I have now...
You only have 3x 15K effective spindles, which seems a little light generally, but you've got a decent RAID HBA with 512MB of BBWC, which will help write latency tremendously. And you only have ~70 users. Your current setup may be fine, as long as the drive caches are disabled. Again, ask for other opinions on max mdbox size.
-- Stan
On 5/2/2013 4:16 AM, Charles Marcus wrote:
Hello,
I'm in the process of finalizing the spec for my new dovecot VM, and this is the last question I need to address...
I've read until I'm just about decided on XFS, but I have no experience with it (been using reiserfs on my old box (@ 8 yrs old now), and never had a problem (knock on wood), but considering its current situation (little to no development support for reasons everyone is aware of), I've decided now is the time to switch. It came down to XFS or EXT4, and I like what I've read about XFS, but am unsure how to tune it (or even if I should).
I've decided to use mdbox for storage (been using maildir), and will enable SIS for attachments.
So, anyone (Stan?) have any suggestions? Should I go with EXT4? Or XFS with just the defaults? Or XFS with one or more tuned parameters?
Appreciate any suggestions (including links to docs dealing with tuning XFS for my mail storage conditions that are written more at the layman level) or comments from anyone experienced using both...
Thanks,
For what it's worth if you can afford it I'd use SSD drives. My server screams since I went to SSD.
On 04.05.2013 17:20, Marc Perkel wrote:
For what it's worth if you can afford it I'd use SSD drives. My server screams since I went to SSD
How long has it been running? Mail servers especially are write-intensive.
A short while ago a large SSD was advertised at heise that would have been dead after a year with the writes of my personal home/virtualization server, and the bad thing about SSDs is that they often die from one moment to the next, while with rotating media you usually notice that a drive is going bad before it is completely gone.
On 2013-05-04 11:20 AM, Marc Perkel marc@perkel.com wrote:
For what it's worth if you can afford it I'd use SSD drives. My server screams since I went to SSD.
Hi Marc,
You have no idea how much I would love to use SSDs for this. But the cost was simply not quite justified.
The price keeps coming down on them though - even now, 10 months after buying these servers, the cost would probably be low enough that we may have actually done so, but it was going to be about double the cost of the 15k drives at the time we priced them.
Next time, definitely... :)
--
Best regards,
Charles
On 5/4/2013 10:54 AM, Charles Marcus wrote:
On 2013-05-04 11:20 AM, Marc Perkel marc@perkel.com wrote:
For what it's worth if you can afford it I'd use SSD drives. My server screams since I went to SSD.
Hi Marc,
You have no idea how much I would love to use SSDs for this. But the cost was simply not quite justified.
The price keeps coming down on them though - even now, 10 months after buying these servers, the cost would probably be low enough that we may have actually done so, but it was going to be about double the cost of the 15k drives at the time we priced them.
Next time, definitely... :)
The verdict is still out on the use of "enterprise" SSDs. They've simply not been in use long enough en masse to know what the common failure modes are and what the real lifespan is. I personally wouldn't yet trust long term storage to them, though I have no problem using them for fast temporary storage for things like a busy mail queue.
-- Stan
Hi all,
I found a reference about the robustness of SSDs (and rotating rust) on c0t0d0s0.org (http://www.c0t0d0s0.org/archives/7578-Switching-off-SSDs-and-the-consequence...) pointing to this interesting paper:
http://www.cse.ohio-state.edu/~zhengm/papers/2013_FAST_PowerFaultSSD.pdf
Just in case you ever wondered what might happen to your SSDs if power fails.
Cheers Dirk
----- Original Message ----- From: "Stan Hoeppner" stan@hardwarefreak.com To: dovecot@dovecot.org Sent: Sunday, 5 May 2013 12:22:05 Subject: Re: [Dovecot] XFS vs EXT4 for mail storage
On 5/4/2013 10:54 AM, Charles Marcus wrote:
On 2013-05-04 11:20 AM, Marc Perkel marc@perkel.com wrote:
For what it's worth if you can afford it I'd use SSD drives. My server screams since I went to SSD.
Hi Marc,
You have no idea how much I would love to use SSDs for this. But the cost was simply not quite justified.
The price keeps coming down on them though - even now, 10 months after buying these servers, the cost would probably be low enough that we may have actually done so, but it was going to be about double the cost of the 15k drives at the time we priced them.
Next time, definitely... :)
The verdict is still out on the use of "enterprise" SSDs. They've simply not been in use long enough en masse to know what the common failure modes are and what the real lifespan is. I personally wouldn't yet trust long term storage to them, though I have no problem using them for fast temporary storage for things like a busy mail queue.
-- Stan
and this will hurt all the naive people who start buying large mid-range SSD storage and wake up from their dreams the hard way in the long run
it will take years until large SSD storage is reliable enough for critical data if you are not a Fortune company with endless money
rotating media will never die silently; a f**ing SD card in my last phone refused to write a single bit without giving any error message. I formatted it with several filesystems, completely overwrote it with dd (/dev/zero and /dev/urandom), and after pulling the crappy thing out of the card reader and inserting it again, the data was the same as 2 weeks before
the smartphone's card slot died, BTW, from overheating of the device, most likely due to this behaviour, and did I mention that dmesg and /var/log/messages did not contain a single line hinting at a problem while writing to the card for hours
from that day on my opinion is that only an idiot stores critical data on this new shiny crap - and yes, I know there are large differences between an SSD and an SD card, but that does not change the fact that such behaviour from rotating media is impossible
On 05.05.2013 17:29, dirk.jahnke-zumbusch@desy.de wrote:
I found a reference about the robustness of SSDs (and rotating rust) on c0t0d0s0.org (http://www.c0t0d0s0.org/archives/7578-Switching-off-SSDs-and-the-consequence...) pointing to this interesting paper:
http://www.cse.ohio-state.edu/~zhengm/papers/2013_FAST_PowerFaultSSD.pdf
Just in case you ever wondered what might happen to your SSDs if power fails.
----- Original Message ----- From: "Stan Hoeppner" stan@hardwarefreak.com To: dovecot@dovecot.org Sent: Sunday, 5 May 2013 12:22:05 Subject: Re: [Dovecot] XFS vs EXT4 for mail storage
On 5/4/2013 10:54 AM, Charles Marcus wrote:
On 2013-05-04 11:20 AM, Marc Perkel marc@perkel.com wrote:
For what it's worth if you can afford it I'd use SSD drives. My server screams since I went to SSD.
Hi Marc,
You have no idea how much I would love to use SSDs for this. But the cost was simply not quite justified.
The price keeps coming down on them though - even now, 10 months after buying these servers, the cost would probably be low enough that we may have actually done so, but it was going to be about double the cost of the 15k drives at the time we priced them.
Next time, definitely... :)
The verdict is still out on the use of "enterprise" SSDs. They've simply not been in use long enough en masse to know what the common failure modes are and what the real lifespan is. I personally wouldn't yet trust long term storage to them, though I have no problem using them for fast temporary storage for things like a busy mail queue.
Honestly guys,
most people really have no long-term experience with flash memory, be it SSD or other forms. I can tell you from continuously using simple CF cards as hard disks for about 5 years that _none_ ever got corrupted. Not a single one in 5 years. Taking into account that CF is really no big hit in technology, most people are really just afraid of the bogeyman when talking about flash disks of any kind. Please stop the FUD and simply buy from acceptable vendors. If you want to see real trouble then buy W* green IT 2 TB. I crashed 5 in a row within the first 3 months of usage.
-- Regards, Stephan
participants (11):
- Alessio Cecchi
- Charles Marcus
- dirk.jahnke-zumbusch@desy.de
- Gedalya
- lst_hoe02@kwsoft.de
- Luigi Rosa
- Marc Perkel
- Michael Weissenbacher
- Reindl Harald
- Stan Hoeppner
- Stephan von Krawczynski