[Dovecot] Indexes to MLC-SSD
Hi all,
Is there anyone on this list who dares/dared to store their index files on an MLC-SSD?
Regards, Patrick
On 10/26/2011 4:13 PM, Patrick Westenberg wrote:
Is there anyone on this list who dares/dared to store their index files on an MLC-SSD?
I have not. But I can tell you that a 32GB Corsair MLC SSD in my workstation died after 4 months of laughably light duty. It had nothing to do with cell life but with low product quality. This was my first foray into SSDs. The RMA replacement is still kicking after 2 months, thankfully. I'm holding my breath...
Scanning the reviews on Newegg shows early MLC SSD failures across most brands, early being a year or less. Some models/sizes are worse than others. OCZ has a good reputation overall, but reviews show some of their models to be grenades.
Thus, if you were to put indexes on SSD, you should strongly consider using a mirrored pair.
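For what it's worth, a mirrored pair for an index partition is just a small md RAID1; roughly along these lines (device names and mount point are made up for the example):

  # assumes two SSDs partitioned as /dev/sda1 and /dev/sdb1 (placeholder names)
  mdadm --create /dev/md0 --level=1 --raid-devices=2 /dev/sda1 /dev/sdb1
  mkfs.ext4 /dev/md0
  mount /dev/md0 /var/dovecot/indexes   # hypothetical index location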
-- Stan
On 27/10/2011 03:36, Stan Hoeppner wrote:
Thus, if you were to put indexes on SSD, you should strongly consider using a mirrored pair.
I don't think you are saying that the advice varies here compared with HDDs? I do agree that some SSDs are showing very early failures, but that's only a tweak to the probability parameter compared with any other storage medium. They ALL fail at some point, and generally well within the life of the rest of the server, so some kind of failure planning is necessary either way.
Setting aside the potentially higher failure rate vs HDDs, I don't see any reason why an SSD shouldn't work well (even more so if you are using maildir, where indexes can be regenerated).
More interestingly: for small sizes like 32GB, has anyone played with the "compressed RAM with backing store" feature in newer kernels (whose name I forget right now)? I think it's been marketed for swap, but assuming I've got the theory right it could be used as a RAM drive with slow writeback to permanent storage?
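(Possibly zram, formerly compcache? If so, a rough sketch of setting one up as a plain compressed RAM disk, with module/sysfs names from memory and, note, no writeback to permanent storage, so indexes would be lost on reboot and regenerated:)

  modprobe zram num_devices=1
  echo $((4 * 1024 * 1024 * 1024)) > /sys/block/zram0/disksize   # 4 GiB RAM-backed, compressed block device
  mkfs.ext4 /dev/zram0
  mount /dev/zram0 /var/dovecot/indexes   # hypothetical index location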
Good luck
Ed W
If I/O performance is a concern, you may be interested in ZFS and Flashcache.
Specifically, ZFS's ZIL (ZFS Intent Log) and its L2ARC (Level 2 Adaptive Replacement Cache). ZFS does run on Linux: http://zfs-fuse.net
Flashcache: https://github.com/facebook/flashcache/
Both of these techniques can use a pair of SSDs in RAID1 rather than a single SSD.
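As a rough illustration (pool and device names are placeholders), adding SSDs to an existing pool would look something like this; note that mirroring makes sense for the ZIL/SLOG, while the L2ARC is only a read cache and needs no redundancy:

  zpool add tank log mirror /dev/sdc /dev/sdd   # mirrored dedicated ZIL (SLOG)
  zpool add tank cache /dev/sde                 # L2ARC read cache; safe to lose, so no mirror needed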
Dovecot-GDH schrieb:
If I/O performance is a concern, you may be interested in ZFS and Flashcache.
I'm using NexentaStor (Solaris, ZFS) to export iSCSI LUNs and I was thinking about an SSD-based LUN for the indexes. As I'm using multiple servers, this LUN will use OCFS2.
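For context, carving such an SSD-backed LUN out of a ZFS pool and exposing it over iSCSI looks roughly like this on the Solaris/COMSTAR side (pool name, size and device paths are made up; NexentaStor wraps this in its own management layer):

  zfs create -V 64G tank/dovecot-indexes                 # zvol on the SSD-backed pool
  sbdadm create-lu /dev/zvol/rdsk/tank/dovecot-indexes   # register it as a SCSI logical unit
  stmfadm add-view <GUID-printed-by-create-lu>           # make the LU visible to initiators
  itadm create-target                                    # iSCSI target for the servers to log in to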
I can't imagine running any kind of performance-critical app on Linux using FUSE! There is a native ZFS port going on, but I don't know how stable it is yet.
I'm using native ZFS (http://zfsonlinux.org) in production here (15k+ users, over 2TB of mail data) with few issues. Mind you, dedup and compression are disabled.
Dedup especially is a major source of trouble; I wouldn't recommend it for production just yet.
Cheers, fbscarel
On Tue, Nov 1, 2011 at 19:40, Dan Swartzendruber dswartz@druber.com wrote:
I can't imagine running any kind of performance-critical app on Linux using FUSE! There is a native ZFS port going on, but I don't know how stable it is yet.
On 03/11/2011 11:32, Felipe Scarel wrote:
I'm using native ZFS (http://zfsonlinux.org) in production here (15k+ users, over 2TB of mail data) with few issues. Mind you, dedup and compression are disabled.
OT, but what were the rough criteria that led you to using ZFS over, say, LVM with ext4/XFS/btrfs? I can think of plenty of reasons for/against each; I'm just wondering what criteria affected *your* situation. I'm guessing some kind of manageability reason is at the core, but perhaps you can expand on how it's all worked out for you?
I have a fairly static server setup here, so I have been "satisfied" with LVM, software RAID and mainly ext4. The main thing I miss is simple-to-use snapshots.
Cheers
Ed W
The reasons for choosing ZFS were snapshots and, mainly, the dedup and compression capabilities. I know it's ironic, since I'm not able to use them now due to severe performance issues with them (mostly dedup) turned on.
I do like ZFS's emphasis on data integrity and, to an extent, its fast on-the-fly configurability, but I wouldn't highly recommend it for new users, especially for production. It works (in fact it's working right now), but it has its fair share of troubles.
We've started work to move our mail system to a more modular environment and we'll probably move away from ZFS. It was a nice experiment nonetheless; I learned quite a bit from it.
On 11/3/2011 1:24 PM, Felipe Scarel wrote:
We've started work to move our mail system to a more modular environment and we'll probably move away from ZFS.
I find this all very interesting...
"Please keep in mind the current 0.5.2 stable release does not yet support a mountable filesystem. This functionality is currently available only in the 0.6.0-rc6 release candidate."
https://github.com/downloads/zfsonlinux/zfs/zfs-0.6.0-rc6.tar.gz
"Uploaded October 14, 2011"
So in the past ~two weeks, you converted your 15K+ user production server to ZFS on Linux, as an experiment, and have now decided to change to another filesystem solution a mere two weeks later? Or am I misinterpreting the date on which 0.6.0-rc6 was released?
-- Stan
I'm using the Git version; that 0.5 release is quite a bit outdated. I was not all that worried about using ZFS for this experiment because we have the old mail storage on ext3 synchronized and ready to switch back, and I could disable dedup and compression on the fly if needed (which I eventually did).
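Turning them off on a live pool is just a property change, along these lines (dataset name is made up); note that disabling dedup only stops new writes from being deduplicated, it doesn't rewrite existing data:

  zfs set dedup=off tank/mail
  zfs set compression=off tank/mail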
Patrick Westenberg wrote:
I'm using NexentaStor (Solaris, ZFS) to export iSCSI LUNs and I was thinking about an SSD-based LUN for the indexes. As I'm using multiple servers, this LUN will use OCFS2.
Given that the SAN always adds network latency, might you be better off putting the SSDs in the frontend machines? Obviously this then needs some way to make users "sticky" to one machine (or some few machines) where their indexes are stored.
This seems theoretically likely to give you higher IOPS to the indexes than having them on the OCFS2 storage? (At the trade-off of more complexity in the load balancer front end...)
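For reference, pointing Dovecot at local indexes while leaving the mail on the shared LUN is a one-line mail_location change; a sketch with made-up paths:

  mail_location = maildir:/srv/mail/%d/%n:INDEX=/var/dovecot/indexes/%d/%n
  # maildirs stay on the shared OCFS2 storage; INDEX= moves only the index
  # files onto the local SSD of whichever backend the user is "stuck" to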
Ed W
Ed W schrieb:
Given that the SAN always adds network latency, might you be better off putting the SSDs in the frontend machines? Obviously this then needs some way to make users "sticky" to one machine (or some few machines) where their indexes are stored.
Storing the indexes on several machines? In this case I have to synchronize them.
Patrick Westenberg wrote:
Storing the indexes on several machines? In this case I have to synchronize them.
Maybe I am missing something. If a client has to fetch the index, the server has to read the index from disk and pass it back. The network latency is unavoidable, but I don't see why putting the fastest possible SSD in the server isn't a win. Possibly I am misunderstanding something?
On 03/11/2011 16:53, Patrick Westenberg wrote:
Storing the indexes on several machines? In this case I have to synchronize them.
See the "sticky" in my reply. You use one of several techniques to ensure that users always end up on the server with the indexes on. That way much of the IO is served from that local machine and you only access the SAN for the (in theory much less frequent) access to the mail files themselves.
Clearly if the machine holding the indexes dies, the load balancer needs to pick a new machine and there will be delay/IO/etc. while the indexes are regenerated. Various techniques could mitigate this...
I don't have such a large system, so please ignore all my advice... The basis for the suggestion is that I understand file access (locking in particular) is "expensive" on OCFS2/GFS. Indeed, I have read here on this list that others have found performance issues accessing maildir over OCFS2. It's also not hard to find benchmarks showing that OCFS2/GFS are "fast", but slower than accessing the same storage without a cluster filesystem, which makes sense. Hence it seems like a trade-off between the convenience of storing everything on a central store and "some" performance improvement from a more complex system...
I think if you search for benchmarks of DRBD vs OCFS2 and read here on the list about the "director" and "proxy" services, you can see the point? I'm just trying to help you see the effects you might want to measure! (I don't have a system large enough to know much about this stuff from experience...)
Good luck!
Ed W
Ed W schrieb:
See the "sticky" in my reply. You use one of several techniques to ensure that users always end up on the server with the indexes on. That way much of the IO is served from that local machine and you only access the SAN for the (in theory much less frequent) access to the mail files themselves.
I know you can arrange that (IMAP) users always end up on one particular server, but AFAIK this only works for incoming IMAP connections.
My mail exchangers use dovecot-lda and I think indexes will be written from these servers too, or am I wrong about this?
Patrick
On Mon, 2011-11-07 at 01:08 +0100, Patrick Westenberg wrote:
My mail exchangers use dovecot-lda and I think indexes will be written from these servers too, or am I wrong about this?
You can use LMTP and LMTP proxying.
Timo Sirainen schrieb:
You can use LMTP and LMTP proxying.
I already use lmtp:unix:private/dovecot-lmtp as transport but where is the link to the indexes?
On Wed, 2011-11-16 at 19:40 +0100, Patrick Westenberg wrote:
I already use lmtp:unix:private/dovecot-lmtp as transport but where is the link to the indexes?
You can switch to lmtp:tcp:1.2.3.4:24, where 1.2.3.4 would be a Dovecot LMTP proxy, which would forward the connection to the backend server that handles that user's IMAP/POP3/LMTP connections.
Timo Sirainen schrieb:
You can switch to lmtp:tcp:1.2.3.4:24, where 1.2.3.4 would be a Dovecot LMTP proxy, which would forward the connection to the backend server that handles that user's IMAP/POP3/LMTP connections.
I don't know if we're talking about the same thing :)
On Mon, Nov 21, 2011 at 10:45:49PM +0100, Patrick Westenberg wrote:
I don't know if we're talking about the same thing :)
I wondered that too. It looked to me like you tried to ask where the lmtp-service picks up the path to indexes, right? AFAIU it picks that up from the /var/run/dovecot/auth-master socket.
-jf
Jan-Frode Myklebust schrieb:
I wondered that too. It looked to me like you tried to ask where the lmtp-service picks up the path to indexes, right? AFAIU it picks that up from the /var/run/dovecot/auth-master socket.
No. I want to know whether Dovecot writes to the indexes when it receives a mail via LMTP.
Someone proposed storing the index files on a locally installed SSD in a frontend (IMAP) machine and making users stick to that machine, but if the LMTP service writes to the indexes (and I think it does), that machine needs access to the indexes too, which brings us back to shared storage.
On Tue, Nov 22, 2011 at 11:17:12AM +0100, Patrick Westenberg wrote:
No. I want to know whether Dovecot writes to the indexes when it receives a mail via LMTP.
Ah, then Timo's reply was right. He suggested you do the LMTP deliveries to the same server that you would send your IMAP user to. You can do this through Dovecot director and LMTP proxying.
So instead of:
lmtp:unix:private/dovecot-lmtp
you should use:
lmtp:tcp:1.2.3.4:24
where 1.2.3.4 would be the Dovecot LMTP proxy that proxies to the same machine as you would use for imap for this particular recipient.
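Assuming Postfix as the MTA (the unix: transport above suggests it), the Postfix side uses "inet:" for a TCP next-hop, and the proxy box runs Dovecot's director so that LMTP and IMAP for a given user land on the same backend. A rough sketch along the lines of the Dovecot 2.x director documentation, with all addresses made up:

  # Postfix main.cf on the MX hosts
  mailbox_transport = lmtp:inet:192.0.2.10:24

  # dovecot.conf on the proxy/director box (192.0.2.10)
  director_servers = 192.0.2.10
  director_mail_servers = 192.0.2.21 192.0.2.22
  service director {
    unix_listener login/director {
      mode = 0666
    }
    fifo_listener login/proxy-notify {
      mode = 0666
    }
    unix_listener director-userdb {
      mode = 0600
    }
    inet_listener {
      port = 9090
    }
  }
  service imap-login {
    executable = imap-login director   # proxy IMAP logins through the director
  }
  protocol lmtp {
    auth_socket_path = director-userdb # make LMTP use the director's user-to-backend mapping
  }
  service lmtp {
    inet_listener lmtp {
      port = 24                        # accept the MXes' LMTP connections over TCP
    }
  }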
-jf
On Tue, 22 Nov 2011 11:45:47 +0100, Jan-Frode Myklebust janfrode@tanso.net wrote:
Ah, then Timo's reply was right. He suggested you do the LMTP deliveries to the same server that you would send your IMAP user to. You can do this through Dovecot director and LMTP proxying.
I see. So as far as I understand it:
- I set up a new server as an LMTP proxy for my two MXes to connect to
- the proxy redirects to my backend IMAP servers, which will then store the mails on my shared storage and the index files on a local disk (so I additionally have to enable LMTP on these servers so they can store the mails)
- I set up a frontend IMAP server for my users to connect to, which will redirect them to the backend servers
Am I right so far?
Patrick
On 11/3/2011 10:21 AM, Ed W wrote:
Given that the SAN always adds network latency, might you be better off putting the SSDs in the frontend machines?
The latency of the GbE IP network, iSCSI HBAs, GbE switches, etc, is but a fraction of the overhead of the out of band OCFS metadata exchange between cluster members, and the general overhead of OCFS, or of cluster filesystems in general.
Obviously this then needs some way to make users "sticky" to one machine (or some few machines) where their indexes are stored.
This seems theoretically likely to give you higher IOPS to the indexes than having them on the OCFS2 storage? (At the trade-off of more complexity in the load balancer front end...)
Following this logic, simply using local mechanical disk would yield improvement without the cost of SSDs. Depending on the number of nodes, putting a couple of SSDs in the SAN controller may well be cheaper overall than adding mechanical disks to each node, let alone SSDs to each node. The random IO latency of SSD is so much lower than that of mechanical disk that, even with the OCFS and iSCSI SAN overhead, overall read/write latency will likely be lower than with local mechanical disk in the nodes. And you get to retain centralized storage of the indexes, eliminating the stickiness complexity.
Something else to consider is the read/write caching performance of NexentaStor (I've never used it, know nothing about it). If it's very good, and the NexentaStor host has gobs of RAM (think 64-128GB), then adding SSDs for indexes may not improve performance much, if any, depending on the concurrent user load.
I've read of cases where adding SLC SSDs to high-end FC SAN controllers with gobs of write-back cache RAM yielded little benefit for similar random IO workloads, simply because the cache was never taxed enough to force regular flushing. If your cache is large and fast enough to buffer most of your IOPS, then your current spindle speed is already irrelevant. In such a case adding SSD will yield little or no advantage.
-- Stan
Dovecot-GDH ghandidrivesahumvee@rocketfish.com writes:
Flashcache: https://github.com/facebook/flashcache/
That site has no information about what flashcache is.
https://github.com/facebook/flashcache/blob/master/doc/flashcache-doc.txt
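In short, it's a device-mapper based block cache that puts an SSD in front of a slower disk. Per the sa-guide, creating and using one looks roughly like this (device names are made up and options may differ by version):

  flashcache_create -p back cachedev /dev/sdc /dev/sdb   # write-back cache: SSD /dev/sdc in front of disk /dev/sdb
  mkfs.ext4 /dev/mapper/cachedev
  mount /dev/mapper/cachedev /var/dovecot/indexes        # hypothetical mount point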
participants (9)
- Dan Swartzendruber
- Dovecot-GDH
- Ed W
- Felipe Scarel
- Jan-Frode Myklebust
- Micah Anderson
- Patrick Westenberg
- Stan Hoeppner
- Timo Sirainen