[Dovecot] Questions about converting maildir to mdbox.
First of all, I would like to thank Timo for the excellent job done on Dovecot.
We are using it on our production server with about 5 million emails and it is pretty good. We only have some performance problems, but I guess they are related to the ocfs2 we are using.
So let's begin:
We are using Dovecot 2.0.6 with maildir on an ocfs2 partition.
Access is pretty slow at peak times, but that is related to ocfs2 as I said before; I am trying to minimize the ocfs2 latency by using mdbox. Another problem we have is that a full backup takes about 24h and an incremental takes about 13h.
I am thinking of converting some mail for testing purposes, but I am wondering how Dovecot would convert it. If I convert all mail from maildir to mdbox, would it make many mdbox files based on these settings? Or would the first conversion make one big mdbox file and only after that rotate by day, or would it make many files based just on size?
mdbox_rotate_size = 5m
mdbox_rotate_interval = 1d
Another thing: can I make the rotate interval happen at a certain time of day? And where does it take this time from? Does it rotate at midnight?
Another one! If I lose the index files, what really happens? Sometimes on my server the index gets corrupted, and from what I am reading, mdbox index files hold information that is not stored anywhere else. So what should I do to prevent any possible loss of data even if I lose the indexes?
I was also reading about backup and restore, but I did not understand it very well. With maildir we simply restore the files and the index gets rebuilt! What should I do with mdbox? Restore the files and the index? But what about the other emails the account received after the backup? How should I merge the multiple files and, more importantly, how do I merge the index files?
Well, sorry about my English!
And thanks a lot, guys!
[]'sf.rique
Henrique Fernandes put forth on 4/10/2011 12:29 PM:
We are using Dovecot 2.0.6 with maildir on an ocfs2 partition.
Access is pretty slow at peak times, but that is related to ocfs2 as I said before; I am trying to minimize the ocfs2 latency by using mdbox. Another problem we have is that a full backup takes about 24h and an incremental takes about 13h.
Last time you posted here, you didn't have enough spindles in the array for the size of the workload, and you didn't have a dedicated quality GbE switch strictly for OCFS traffic between Dovecot hosts. Has any of this been rectified? If not, changing mailbox formats isn't going to fix your problems. If you really want to test this theory to the limit, converting to mbox (yes, good old mbox, without the d) will reduce metadata load more than mdbox, by a huge margin.
Given the constraints on your ability to reconfigure the disks in the EMC array (as someone else owns it and you're borrowing space) to meet your needs, and your apparent lack of funding to purchase your own array, the things you should really do to fix most of your problems are:
- Stick with maildir
- Configure a Dovecot director box to achieve stickiness
- Put a quality 200GB SSD drive in each Dovecot IMAP server (smaller if you can get away with it to save $$, larger if 200GB isn't enough)
- Change the index location on each server to the local SSD drive
Over time the user indexes will be gradually rebuilt onto the local fast SSD drive. This will eliminate the bulk of the IOPS load on OCFS, and should fix your performance issues pretty thoroughly. Well, except for the slow backup. The only way around that is more spindles in the EMC array LUN where mailboxes reside.
-- Stan
Thanks, but we did run tests on the disks, and the problem was finding files, not actually writing or reading them, so we guess it is an ocfs2 problem; that's why now we are thinking of mdbox, to reduce the "find files" problem. One thing that really helped performance was giving lots of RAM to the virtual machines where Dovecot is running; with lots of RAM, the kernel makes a cache of "locations" and finds the files much faster. As a matter of fact, we might have to give back some of the space provided to us. Buying hardware is not an option yet, so we keep thinking about how to improve performance by tuning everything.
I am also thinking of changing the I/O scheduler. I have not researched it enough to try it yet, but it is in the plans.
The backup problem is that on the machine that does the backups (it joins the ocfs2 cluster just to back up files) we are not able to cache anything, because however much RAM we give it, Bacula just eats it all up. I guess it is because of the Accurate option. We are still looking for a way to limit Bacula's RAM use so that the kernel is able to cache some inodes.
Thanks for your reply, and I am glad you remembered those old posts.
But I am still looking for some info about mdbox.
We thought of using mdbox before, but we did not want to be tied to Dovecot; I mean, we like the idea of being able to change the IMAP server, etc. But as someone said, what other choices do we have for an open source IMAP server? Even with Cyrus we would still face a big conversion, so it does not make much difference.
About the index files you mentioned: SSD disks are not a possibility, but I thought of using another partition as the place for index files. Will it make a lot of difference? As I said, we are using ldirectord with lbrlc, which keeps track of which IP addresses go to which servers; it does not ALWAYS send a client to the same server, but it tries to.
[]'sf.rique
Henrique Fernandes put forth on 4/10/2011 5:29 PM:
Thanks, but we did run tests on the disks, and the problem was finding files, not actually writing or reading them, so we guess it is an ocfs2 problem; that's why
Hi Henrique,
Finding a file is a filesystem metadata operation, which entails walking the directory tree(s). Thus, finding a file generates IO to/from the disk array where the filesystem metadata resides. The IO requests are small, but there are many of them, and typically generate more head seeks than normal file read/write operations. Currently you don't have enough disk spindles in your array to keep up with the IOPS demands of both metadata and file operations.
now we are thinking of mdbox, to reduce the "find files" problem. One thing
From a total IOPS perspective, there is not a huge difference between mdbox and maildir. And mdbox ties your indexes to the mail files, eliminating the flexibility of putting indexes on dedicated storage such as local SSD. If you really want to eliminate the massive IOPS load you currently have, switch to mbox. mbox storage generates almost no metadata IOPS at all.
that really helped performance was giving lots of RAM to the virtual machines where Dovecot is running; with lots of RAM, the kernel makes a cache
Yes, a filesystem with 5 million maildir files is going to have a large directory tree to walk. If OCFS has a tweakable setting to force the amount of metadata that is cached, setting this as high as you possibly can (without destroying other things that need RAM) is a smart move. If not configurable, giving as much RAM as possible to each VM is also worthwhile, as you've noticed, as more buffer space is allocated to filesystem metadata.
Worth noting: if you have multiple VM guest OSes on a single host, which all access the same OCFS filesystem, you will be unnecessarily caching OCFS data multiple times, wasting memory. If your hypervisor supports memory deduplication, as VMware ESX does, you should enable it, if you haven't already. If that is not an option, it's best to only run one OCFS guest per host. It may be possible to run OCFS in the hypervisor, and map a virtual filesystem to each guest on that host. Read your virtual platform documentation to find the right solution for your environment.
of "locations" and finds the files much faster. As a matter of fact, we might have to give back some of the space provided to us. Buying hardware is not an option yet, so we keep thinking about how to improve performance by tuning everything.
Losing some space is fine as long as you don't lose spindles. Losing spindles decreases IOPS. Be sure that OCFS supports filesystem shrinking before the EMC OPs decrease the size of your LUN. And make sure you've shut down all of your OCFS hosts/guests _before_ they resize your LUN. If you don't, you'll possibly sustain corruption that will cause the loss of your entire OCFS filesystem. After they resize the LUN, bring one OCFS machine back online but without mounting the OCFS filesystem. Use the appropriate tool to shrink the filesystem, again, if OCFS can even do it. If not, do NOT let them resize your LUN. Make other arrangements.
I am also thinking of changing the I/O scheduler. I have not researched it enough to try it yet, but it is in the plans.
Changing the Linux elevator has little effect in VM guests. Use the "noop" elevator in a VM environment, especially when using SAN storage. The Linux elevator cannot make head positioning decisions with SAN arrays--use noop.
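To see which elevator a guest is currently using before switching to noop, a small sketch along these lines may help. It assumes the usual Linux sysfs layout, where /sys/block/<device>/queue/scheduler lists the available elevators with the active one in square brackets. The function names and the device name "sda" are illustrative only.

```python
from pathlib import Path


def active_scheduler(scheduler_line: str) -> str:
    """Return the elevator shown in [brackets], e.g. 'noop deadline [cfq]' -> 'cfq'."""
    for token in scheduler_line.split():
        if token.startswith("[") and token.endswith("]"):
            return token[1:-1]
    # Some devices report a single value such as 'none' without brackets.
    return scheduler_line.strip()


def read_scheduler(device: str) -> str:
    """Read the active elevator for a block device name such as 'sda' (assumed name)."""
    path = Path("/sys/block") / device / "queue" / "scheduler"
    return active_scheduler(path.read_text())


if __name__ == "__main__":
    # Parsing demonstration only; no sysfs access is needed for this call.
    print(active_scheduler("noop deadline [cfq]"))
```

On kernels of this era the elevator is typically changed by writing its name into that same sysfs file as root; check your distribution's documentation for how to make the change persistent.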
The backup problem is that on the machine that does the backups (it joins the ocfs2 cluster just to back up files) we are not able to cache anything, because however much RAM we give it, Bacula just eats it all up. I guess it is because of the Accurate option. We are still looking for a way to limit Bacula's RAM use so that the kernel is able to cache some inodes.
Considering the nature of backup, I don't think the caching of metadata or files will decrease the IOPS load on the disk array. Backup is fairly sequential in nature, so caching anything would simply waste memory. I'm guessing what you really want is something like read ahead caching of metadata, so after finding one file, the next few directory table reads are from memory instead of disk. I don't know of a way to optimize for this.
Thanks for your reply, and I am glad you remembered those old posts.
It's an interesting problem/scenario, and a couple of my interests/specialties are SAN storage arrays and filesystems.
But I am still looking for some info about mdbox.
It's a hybrid between maildir and mbox, but with inbuilt indexes. maildir is one mail per file. mbox is one file with lots of mail. mdbox stores multiple mails per file across many files. Using mdbox will drop your IOPS load. Exactly how much I can't say, but it won't be anything near as large a drop as converting to mbox, which again, has almost zero metadata overhead.
We thought of using mdbox before, but we did not want to be tied to Dovecot; I mean, we like the idea of being able to change the IMAP server, etc. But as someone said, what other choices do we have for an open source IMAP server? Even with Cyrus we would still face a big conversion, so it does not make much difference.
My only concern would be integrated indexes. Once you convert to mdbox you no longer have the flexibility of moving index files to fast dedicated storage.
Right now your best option for a quick decisive solution to your SAN/OCFS bottleneck is to move the maildir indexes off the SAN to fast local disks in the cluster hosts, either an SSD or 15k SAS drive per host. It's cheap, it will work, and you can implement and test it quickly at zero risk. It will be even better if you can use kernel 2.6.36 or later and XFS w/delaylog mount option on the local disks. Delaylog can reduce metadata write disk IOPS by an order of magnitude.
About the index files you mentioned: SSD disks are not a possibility, but I thought of using another partition as the place for index files. Will it make a lot of difference? As I said, we are using ldirectord with lbrlc, which keeps track of which IP addresses go to which servers; it does not ALWAYS send a client to the same server, but it tries to.
It will likely make a big difference in OCFS load. But...
The whole point of moving the index files to a local disk is to *dedicate* the entire performance of the disk to index file IOPS and remove that load from the SAN. An average decent quality SATA3 SSD today can perform about 50k random IOPS--far more than you actually need, but the best bang for the buck IOPS wise. A single 15k SAS drive can perform about 300 random IOPS, a 10k SAS drive about 200 random IOPS, and a 7.2k SATA drive performs about 150 random IOPS.
If you have 3 physical Dovecot hosts in your cluster, and can dedicate a 15k SAS drive on each for index file use only, you can potentially decrease the IOPS load on the EMC by 900. That should solve your OCFS performance problems pretty handily. That's a big win for less than $500 USD outlay on 3 15k SAS drives.
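The arithmetic behind the 900 IOPS figure can be sketched as follows. The per-drive numbers are the rough estimates quoted above, not measurements, and the result is only an upper bound: it assumes one dedicated index drive per host and that index traffic could actually saturate each drive. The dictionary keys and function name are made up for illustration.

```python
# Approximate random IOPS per drive type, as quoted in the message above.
RANDOM_IOPS = {
    "sata3_ssd": 50_000,
    "sas_15k": 300,
    "sas_10k": 200,
    "sata_7200": 150,
}


def offloaded_iops(hosts: int, drive: str) -> int:
    """Upper bound on IOPS moved off the SAN with one dedicated drive per host."""
    return hosts * RANDOM_IOPS[drive]


if __name__ == "__main__":
    for drive in RANDOM_IOPS:
        print(f"{drive:10s} x 3 hosts: up to {offloaded_iops(3, drive)} IOPS")
```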
If you simply use leftover space on a currently installed local disk in each host, the amount of benefit you receive is completely dependent on the current seek load on that disk. I would do some investigation with iostat over a period of a few days to make sure said disk in each host has plenty of spare IOPS capacity. It's possible that you'll actually decrease overall Dovecot performance substantially if the disks you mention don't have enough spare performance.
Make sure you size the local disks to accommodate the current index files, with 50% headroom for load balancer stickiness overhead inefficiency and index growth. If you have 300GB total of index files currently on the EMC, a new 147GB 15k SAS drive in each of your 3 Dovecot hosts should suffice. 3.5" 15k 147GB Fujitsu Enterprise SAS drives can be obtained for as little as $155 USD. If your servers are HP/IBM/Dell and require their hot swap caged drives, you'll obviously pay quite a bit more.
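One possible reading of the sizing rule above, as a sketch: it assumes the index data spreads evenly across hosts (imperfect load balancer stickiness may make the real share larger) and interprets "50% headroom" as multiplying each host's share by 1.5. Both are assumptions, and the function name is hypothetical.

```python
import math


def per_host_index_gb(total_index_gb: float, hosts: int, headroom: float = 0.5) -> int:
    """Per-host capacity in GB: an even share of the indexes plus fractional headroom."""
    share = total_index_gb / hosts
    return math.ceil(share * (1 + headroom))


if __name__ == "__main__":
    # 300 GB of index files across 3 hosts, with 50% headroom.
    print(per_host_index_gb(300, 3))
```

With the figures from the message (300GB of indexes, 3 hosts) this gives 150 GB per host, which is close to the 147GB drive size mentioned above.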
-- Stan
On Sun, 2011-04-10 at 14:29 -0300, Henrique Fernandes wrote:
I am thinking of converting some mail for testing purposes, but I am wondering how Dovecot would convert it. If I convert all mail from maildir to mdbox, would it make many mdbox files based on these settings? Or would the first conversion make one big mdbox file and only after that rotate by day, or would it make many files based just on size?
mdbox_rotate_size = 5m
This setting affects all saves, so dsync doesn't create files larger than 5 MB (although if you have over 5 MB mail then it goes fully into the file).
mdbox_rotate_interval = 1d
All mails are saved within same day so this makes no difference.
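To picture what the size limit does during a bulk conversion, here is a simplified model. It is an assumption for illustration, not Dovecot's actual code: a new file is started whenever the next mail would push the current file past mdbox_rotate_size, and a single mail larger than the limit still goes whole into one file. The time-based rotation is left out because, as noted above, every converted mail is saved on the same day.

```python
ROTATE_SIZE = 5 * 1024 * 1024  # mdbox_rotate_size = 5m


def pack_messages(sizes, rotate_size=ROTATE_SIZE):
    """Group message sizes (bytes) into files under an assumed, simplified rule:
    start a new file when the next message would exceed rotate_size."""
    files = []
    current = []
    current_size = 0
    for size in sizes:
        if current and current_size + size > rotate_size:
            files.append(current)
            current, current_size = [], 0
        # An oversized message still goes fully into a fresh (or empty) file.
        current.append(size)
        current_size += size
    if current:
        files.append(current)
    return files


if __name__ == "__main__":
    MB = 1024 * 1024
    for i, f in enumerate(pack_messages([2 * MB, 2 * MB, 2 * MB, 6 * MB, 1 * MB])):
        print(f"file {i}: {len(f)} mail(s), {sum(f) / MB:.0f} MB")
```

Under this model a mailbox of mostly small mails ends up as many files of just under 5 MB each, rather than one big file.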
Another thing: can I make the rotate interval happen at a certain time of day? And where does it take this time from? Does it rotate at midnight?
It's at midnight currently, yes. Maybe there should be a setting for this.
Another one! If I lose the index files, what really happens? Sometimes on my server the index gets corrupted, and from what I am reading, mdbox index files hold information that is not stored anywhere else. So what should I do to prevent any possible loss of data even if I lose the indexes?
Don't lose indexes :) mdbox anyway tries really hard to fix corrupted indexes. It also keeps a backup index file around which it can use. So a "corrupted index" typically doesn't lose (almost) any data, but if the entire index file (and its backup) gets lost, then there's nothing Dovecot can do. Keep backups.
I was also reading about backup and restore, but I did not understand it very well. With maildir we simply restore the files and the index gets rebuilt! What should I do with mdbox? Restore the files and the index? But what about the other emails the account received after the backup? How should I merge the multiple files and, more importantly, how do I merge the index files?
I guess you're talking about "doveadm import" command. It copies messages from a given message store to destination. You shouldn't ever touch filesystem directly with mdbox, everything should be done via doveadm command.
If you've lost one user's all mails, you can just "doveadm import" everything back there. The mails may be out of order with some clients, but most support sorting by (received) date so it shouldn't matter that much..
I guess you could also 1) restore the mdbox to /tmp from backup, 2) stop mail deliveries to the user, 3) doveadm import the new mails to /tmp, 4) rm -rf new mails and mv all the mails from /tmp.
Something like that..
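The simpler of the two approaches above, importing everything from a restored backup back into the user's mail store, could be scripted roughly like this. The sketch only builds and prints the command; it does not run anything. It assumes a Dovecot version that ships "doveadm import" and the argument order given in the doveadm-import man page (source location, destination parent mailbox, search query), so check it against your installed version. The user name and restore path are hypothetical.

```python
import shlex


def doveadm_import_argv(user, source_location, dest_parent="", search_query=("all",)):
    """Build the argv for: doveadm import -u USER SOURCE DEST_PARENT QUERY..."""
    return ["doveadm", "import", "-u", user, source_location, dest_parent, *search_query]


if __name__ == "__main__":
    # Hypothetical user and restore path; the command is printed, not executed.
    argv = doveadm_import_argv("jane", "mdbox:/tmp/restore/jane/mdbox")
    print(" ".join(shlex.quote(arg) for arg in argv))
```

An empty destination parent imports into the user's existing mailbox hierarchy; passing a name such as "restored" instead would keep the imported mails under a separate parent mailbox.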
participants (3)
- Henrique Fernandes
- Stan Hoeppner
- Timo Sirainen