[Dovecot] Question about "slow" storage but fast cpus, plenty of ram and dovecot
Hello
We are using Dovecot 1.2.x. In our setup we will have 1200 concurrent IMAP users (maildirs), and we have 2x RAID 5 SAS 15k disks mounted via iSCSI. The Dovecot server (RHEL 5 x64) is
a virtual machine in our VMware ESX cluster. We want to minimize disk I/O; what config options should we use? We can "exchange" CPU & RAM to minimize disk I/O.
Should we change to dovecot 2.0? Maybe mdbox can help us? Maybe ext4 instead of ext3? Any idea is welcome.
Regards
Javier
Quoting javierdemiguel@us.es:
in our vmware esx cluster. We want to minimize disk I/O, what config options should we use. We can "exchange" CPU & RAM to minimize disk i/o.
Depends on what you are doing -- pop3, imap, both, deliver or some other LDA? Do you care if the indexes are lost on reboot or not?
You might try putting the indexes in memory (either via Dovecot settings or a RAM disk) or on an SSD.
You could also try using ext3/4 with an external journal on an SSD.
"SSD" would preferably be an enterprise SSD, but it could be a lesser SSD, or even a USB memory stick (replaced periodically). Which is right depends on your needs and budget.
Failing that, you should probably put the indexes on the fastest disks possible (might be local, might be iscsi, you'd have to benchmark).
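For the index-placement idea, a dovecot.conf sketch; the INDEX path is an example, not from the thread, and the syntax works in both 1.2 and 2.x:

```
# Keep mail on the iSCSI LUN, but put the indexes on a faster local path:
mail_location = maildir:~/Maildir:INDEX=/var/indexes/%u

# Or keep the indexes in memory only (they are lost on restart):
#mail_location = maildir:~/Maildir:INDEX=MEMORY
```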
Should we change to dovecot 2.0?
For any new system, I'd start with the most recent dovecot 2.x version. How easy that is if you are upgrading depends on what version you run now.
-- Eric Rostetter The Department of Physics The University of Texas at Austin
Go Longhorns!
Eric Rostetter put forth on 12/10/2010 10:11 PM:
Quoting javierdemiguel@us.es:
in our vmware esx cluster. We want to minimize disk I/O, what config options should we use. We can "exchange" CPU & RAM to minimize disk i/o.
Depends on what you are doing -- pop3, imap, both, deliver or some other LDA? Do you care if the indexes are lost on reboot or not?
You might try putting the indexes in memory (either via Dovecot settings or a RAM disk) or on an SSD.
<snipped good bare metal recommendations>
Eric, you missed above that he's running Dovecot on an ESX cluster, so SSDs or any hardware dedicated to Dovecot aren't possible for the OP.
Javier, email is an I/O intensive application, whether an MTA spool, an IMAP server, or POP server. The more concurrent users you have the greater the file I/O. Thus, the only way to decrease packets across your iSCSI SAN is to increase memory so more disk blocks are cached. But keep in mind, at one point or another, everything has to be written to disk, or deleted from disk. So, while you can decrease disk *reads* by adding memory to the VM, you will never be able to decrease writes, you can only delay them with things like write cache, or in the case of XFS, the delaylog mount option. These comments refer to mail file I/O.
IMAP is a very file I/O intensive application. As Eric mentioned, you could put your user *index* files in a RAM disk or make them memory resident via Dovecot directive. This would definitely decrease disk reads and writes quite a bit. Also as Eric mentioned, if you reboot you lose the indexes, and along with them Dovecot's key performance enabler. User response times will be poor until the indexes get rebuilt.
If this is a POP server, then you really have no way around the disk I/O issue. Due to the nature of POP, there is very little opportunity to do effective disk block caching, unless the bulk of your users configure their clients to check mail every 60 seconds, 24 hours a day. In this scenario you have good opportunity for block caching. However, if the bulk of the users only pop the server every half hour or more, there is no opportunity for file caching. The files will be read from disk every time a user pops the server. However, in this scenario, you'd have relatively low disk I/O load to begin with, and you'd not be inquiring here. Thus, I can only assume your Dovecot server is configured for IMAP.
So, either:
1. Increase memory, and/or
2. Move indexes to memory
#1 will be less effective at decreasing I/O. #2 will be very effective, but at the cost of lost indexes upon reboot or crash.
-- Stan
Quoting Stan Hoeppner stan@hardwarefreak.com:
<snipped good bare metal recommendations>
Eric, you missed above that he's running Dovecot on an ESX cluster, so SSDs or any hardware dedicated to Dovecot aren't possible for the OP.
Well, it is true I know nothing about VMware/ESX. I know in my virtual machine setups, I _can_ give the virtual instances access to devices which are not used by other virtual instances. This is what I would do. Yes, it is still virtualized, but it is dedicated, and should still perform pretty well -- faster than shared storage, and in the case of an SSD faster than normal disk or iSCSI.
Javier, email is an I/O intensive application, whether an MTA spool, an IMAP server, or POP server. The more concurrent users you have the greater the file I/O. Thus, the only way to decrease packets across your iSCSI SAN is to increase memory so more disk blocks are cached.
He was already asking about throwing memory at the problem, and I think he implied he had a lot of memory. As such, the caching is there already. Your statement is true, but it is also a "zero config" option if he really does have lots of memory in the machine.
But keep in mind, at one point or another, everything has to be written to disk, or deleted from disk. So, while you can decrease disk *reads* by adding memory to the VM, you will never be able to decrease writes, you can only delay them with things like write cache, or in the case of XFS, the delaylog mount option. These comments refer to mail file I/O.
And in ext3, the flush rate. Good point that I forgot about. It is set to a small value by default (5 seconds), and can be increased without too much danger (to say 10-30 seconds).
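Concretely, that is ext3's journal `commit` mount option; an /etc/fstab sketch, with the device and mountpoint as examples:

```
/dev/sda2  /var/mail  ext3  defaults,noatime,commit=30  0 2
```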
IMAP is a very file I/O intensive application. As Eric mentioned, you could put your user *index* files in a RAM disk or make them memory resident via Dovecot directive. This would definitely decrease disk reads and writes quite a bit. Also as Eric mentioned, if you reboot you lose the indexes, and along with them Dovecot's key performance enabler. User response times will be poor until the indexes get rebuilt.
Assuming normal downtime stats, this would still be a huge win. Since the machine rarely goes down, it would rarely need to rebuild indexes, and hence would only run poorly a very small percentage of the time. Of course, it could run _very_ poorly right after a reboot for a while, but then will be back to normal soon enough.
One way to help mitigate this, if using a RAM disk, is to have your shutdown script flush the RAM disk to physical disk (after stopping Dovecot) and then reload it into the RAM disk at startup (before starting Dovecot). This isn't possible if you use the Dovecot in-memory index settings, though.
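A minimal sketch of that flush/reload idea, as shell functions an init script could call; the paths and function names are illustrative, not from the thread:

```shell
#!/bin/sh
# Persist a RAM-disk index area across reboots (Eric's suggestion).
# RAMDISK would be a tmpfs mount in real use; BACKUP lives on real disk.
: "${RAMDISK:=/var/run/dovecot-indexes}"
: "${BACKUP:=/var/lib/dovecot-indexes.save}"

flush_indexes() {    # at shutdown, after stopping dovecot
    # replace any stale backup with the current RAM-disk contents
    rm -rf "$BACKUP"
    cp -a "$RAMDISK" "$BACKUP"
}

restore_indexes() {  # at boot, before starting dovecot
    # repopulate the RAM disk from the last saved copy, if one exists
    mkdir -p "$RAMDISK"
    [ -d "$BACKUP" ] && cp -a "$BACKUP/." "$RAMDISK/"
}
```

An init script would call restore_indexes before starting Dovecot and flush_indexes after stopping it; losing one flush (e.g. on a crash) just means rebuilding indexes, which is the status quo.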
If this is a POP server, then you really have no way around the disk I/O issue.
I agree. POP is very inefficient...
So, either:
1. Increase memory, and/or
2. Move indexes to memory
#1 will be less effective at decreasing I/O. #2 will be very effective, but at the cost of lost indexes upon reboot or crash.
Still some room for filesystem tuning, of course, but the above two options are the ones that will make the largest performance improvement, IMHO.
-- Stan
-- Eric Rostetter The Department of Physics The University of Texas at Austin
Go Longhorns!
Guys, who is interested in obvious reasoning? More memory, bare metal, depends on your needs, bla-bla-bla. Let me restate the original, concrete question. I am also interested.
We can "exchange" CPU & RAM to minimize disk i/o. Should we change to dovecot 2.0? Maybe mdbox can help us? Maybe ext4 instead of ext3?
Does anybody know concrete answers? Let's consider IMAP and LDA and forget POP3.
- Is migrating to Dovecot 2.0 a good idea if I want to decrease I/O?
- Can mdbox help decrease IO?
- What is better for mdbox or maildir - ext3 or ext4?
Quoting a@test123.ru:
Guys. Who is interested in obvious reasoning?
The same people who are interested in vague questions?
Let me remind original concrete question. I am also interested.
We can "exchange" CPU & RAM to minimize disk i/o. Should we change to dovecot 2.0? Maybe mdbox can help us? Maybe ext4 instead of ext3?
Uhm, well, again, it depends on your needs. POP3? IMAP? Both? Number of accounts? I can't really help without more details. Maybe I can't help with more details either, but that is a risk you take on a mailing list.
- Is migrating to Dovecot 2.0 a good idea if I want to decrease I/O?
Depends on what version you run now really. But I would recommend it anyway just on principle.
- Can mdbox help decrease IO?
- What is better for mdbox or maildir - ext3 or ext4?
Don't know. But you can certainly tune the FS in either case (atime/dtime, flush rate, external journal, etc.). Some will say XFS is better, etc. Besides, you can hardly decide on the best FS until you know the mailbox format (mbox, maildir, mdbox, etc.).
If you want concrete answers, you need concrete questions...
-- Eric Rostetter The Department of Physics The University of Texas at Austin
Go Longhorns!
Hi,
I am running a fair amount of stored e-mail in maildirs (10 GB+) across 846 folders that get a fair amount of searching, with 20+ users accessing them, mostly via IMAP plus a few POP3 accounts. I am running these on a Linode Xen server and have yet to hit any hard limits of the "bare metal". User and virtual-domain databases are plain text files.
# 1.2.9: /etc/dovecot/dovecot.conf
# OS: Linux 2.6.32.16-linode28 i686 Ubuntu 10.04.1 LTS ext3
Postfix + Dovecot, with SSL for both, plus amavisd has been a breeze. No problems related to the infrastructure yet.
Still, I will wait to see how this system grows, as we are planning to add more users and domains in 2011.
So:
- I am very interested in these questions about performance
- My setup may show some people another way to do things, since I am not using MySQL, LDAP, etc., just plain old text files updated via scripts
- I am going to test this system as we scale out; we are bound to add LDAP for single-sign-on authentication at some point, and I will try to publish my benchmarks publicly, even if it is just for publicity's sake.
Regards, Kerem
-- Kerem Erciyes Sistem Danismani http://proje.keremerciyes.com
kerem.erciyes@gmail.com +90 532 737 05 83
On Sat, 2010-12-11 at 23:05 +0700, a@test123.ru wrote:
Does anybody know concrete answers? Let's consider IMAP and LDA and forget POP3.
- Is migrating to Dovecot 2.0 a good idea if I want to decrease I/O?
That alone makes no difference.
- Can mdbox help decrease IO?
Hopefully! No one has given me any real-world numbers yet, though..
Eric Rostetter put forth on 12/11/2010 9:48 AM:
Well, it is true I know nothing about vmware/ESX. I know in my virtual machine setups, I _can_ give the virtual instances access to devices which are not used by other virtual instances. This is what I would do. Yes, it is still virtualized, but it is dedicated, and should still perform pretty well -- faster than shared storage, and in the case of SSD faster than normal disk or iscsi.
He's running an ESX cluster, which assumes use of HA and Vmotion. For Vmotion to work, each node in the cluster must have direct hardware access to every storage device. Thus, to use an SSD, it would have to be installed in Javier's iSCSI SAN array. Many iSCSI arrays are relatively inexpensive and don't offer SSD support.
However, Javier didn't ask for ways to increase his I/O throughput. He asked for the opposite. I assume this is because they have a 1 GbE Ethernet SAN, with probably only 2 or 4 GbE ports on the SAN array controller. With only 200 to 400 MB/s of bidirectional bandwidth, and many busy guests in the ESX farm (probably running many applications besides Dovecot), Javier's organization is likely coming close to bumping up against the bandwidth limits of the 1 GbE links on the SAN array controller. Thus, adding an SSD to the mix would exacerbate the I/O problem.
Thus, putting the index files on a ramdisk or using the Dovecot memory-only index option are really the only two options I can think of that will help in the way he desires.
He was already asking about throwing memory at the problem, and I think he implied he had a lot of memory. As such, the caching is there already. Your statement is true, but it is also a "zero config" option if he really does have lots of memory in the machine.
He has physical memory available, but he isn't currently assigning it to the Dovecot guest. To do so would require changing the memory setting in ESX for this guest, then rebooting the guest (unless both ESX and his OS support hot plug memory--I don't know if ESX does). This is what Javier was referring to when stating "adding memory".
And in ext3, the flush rate. Good point, that I forgot about. It is set to a very small value by default (2-3 seconds maybe), and can be increased without too much danger (to say 10-30 seconds).
Just to be clear and accurate here, and it's probably a little OT to the thread, XFS delaylog isn't designed to decrease filesystem log I/O activity. It was designed to dramatically increase the rate of write operations to the journal log--metadata operations--and the I/O efficiency for metadata ops.
The major visible benefit of this is a massive increase in delete performance for many tens of thousands (or more) of files. It decreases journal log fragmentation, as more operations are batched per write thanks to in-memory organization before the physical write. This batching decreases physical disk I/O, as fewer, larger blocks are written per I/O. XFS with delaylog is an excellent match for maildir storage. It won't help much at all with mbox, and only very slightly more with mdbox.
XFS delaylog is a _perfect_ match for the POP3 workload. Each time a user pulls, then deletes all messages, delaylog will optimize and then burst the metadata journal write operations to disk, again, with far fewer physical I/Os due to the inode optimization.
XFS with delaylog is now much faster than any version of ReiserFS, whose claim to fame was lightning-fast mass file deletion. As of 2.6.36, XFS is now the fastest filesystem, and not just on Linux, for almost any workload. This assumes real storage hardware that can handle massive parallelization of reads and writes. EXT3 is still faster on a single-disk system. But EXT3 is the "everyman" filesystem, optimized more for the single-disk case. XFS was and still is designed for large parallel servers with big, fast storage.
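For illustration, delaylog is just a mount option on 2.6.35+ kernels (it later became the default); the device and mountpoint here are examples:

```
/dev/sdc1  /var/mail  xfs  noatime,delaylog,logbufs=8  0 2
```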
Assuming normal downtime stats, this would still be a huge win. Since the machine rarely goes down, it would rarely need to rebuild indexes, and hence would only run poorly a very small percentage of the time. Of course, it could run _very_ poorly right after a reboot for a while, but then will be back to normal soon enough.
I totally concur.
One way to help mitigate this if using a RAM disk is have your shutdown script flush the RAM disk to physical disk (after stoping dovecot) and the reload it to RAM disk at startup (before starting dovecot).
Excellent idea Eric. I'd never considered this. Truly, that's a fantastic, creative solution, and should be relatively straightforward to implement.
This isn't possible if you use the dovecot index memory settings though.
Yeah, I think the ramdisk is the way to go here. At least if/until a better solution can be found. I don't really see there is one, other than his org investing in a faster SAN architecture such as 4/8Gb FC or 10 Gbit iSCSI.
The former can be had relatively inexpensively. The latter is still really pricey: 10 GbE switches and HBAs cost a lot, and only a handful of iSCSI vendors offer 10 GbE SAN arrays. One is NetApp. Their 10 GbE NICs for their filers run in the multiple-thousand-dollar range per card. And their filers are the most expensive on the planet, last I checked, much of that due to their flexibility. A single NetApp can support all speeds of Ethernet for iSCSI and NFS/CIFS access, as well as 2/4/8 Gbit FC. I think they offer InfiniBand connectivity as well.
If this is a POP server, then you really have no way around the disk I/O issue.
I agree. POP is very inefficient...
XFS with delaylog can cut down substantially on the metadata operations associated with POP3 mass delete. Without this FS and delaylog, yes, POP3 I/O is very inefficient.
Still some room for filesystem tuning, of course, but the above two options are of course the ones that will make the largest performance improvement IMHO.
Since Javier is looking for ways to decrease I/O load on the SAN, not necessarily to increase Dovecot performance, I think putting the index files on a ramdisk is the best thing to try first. It may not be a silver bullet. If he's still got spare memory to add to this guest, doing both would be better. Using a ramdisk for the index files will instantly remove all index I/O from the SAN. More of Dovecot's IMAP I/O goes to the index files than to the mail files, doesn't it? So by moving the index files to a ramdisk you should pretty much instantly remove half your SAN I/O load. This assumes that Javier currently stores his index files on a SAN LUN.
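A sketch of the ramdisk approach; the mountpoint, size, and paths are examples, not from the thread:

```
# /etc/fstab -- a tmpfs RAM disk for the indexes:
tmpfs  /var/dovecot-indexes  tmpfs  size=512m,mode=0755  0 0

# dovecot.conf -- point the indexes at it:
mail_location = maildir:~/Maildir:INDEX=/var/dovecot-indexes/%u
```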
-- Stan
On 12/12/2010 00:49, Stan Hoeppner wrote:
Since Javier is looking for ways to decrease I/O load on the SAN, not necessarily to increase Dovecot performance, I think putting the index files on a ramdisk is the best thing to try first. It may not be a silver bullet. If he's still got spare memory to add to this guest, doing both would be better. Using a ramdisk for the index files will instantly remove all index I/O from the SAN. More of Dovecot's IMAP I/O goes to the index files than to the mail files, doesn't it? So by moving the index files to a ramdisk you should pretty much instantly remove half your SAN I/O load. This assumes that Javier currently stores his index files on a SAN LUN.
Speaking of ramdisk/SSD, has anyone tried a PCIe SSD for indexes?
~Seth
participants (7)
- a@test123.ru
- Eric Rostetter
- javierdemiguel@us.es
- Kerem Erciyes
- Seth Mattinen
- Stan Hoeppner
- Timo Sirainen