[Dovecot] what to expect from changing index location
Hello everybody,
I have one thousand virtual users with the mdbox mailbox format and a 10 GByte quota. I have noticed some I/O-related performance problems (the mailbox disk is a 6 TB RAID 1+0 array on iSCSI), so I want to put the index files on a different disk. My current mail_location is:
mail_location = mdbox:/var/vmail/%-1.1u/%u/mdbox
and I want to switch to
mail_location = mdbox:/var/vmail/%-1.1u/%u/mdbox:INDEX=/var/indexes/%-1.1u/%u/
But I cannot figure out a few things:
- does the switch trigger a rebuild of the index files?
- can I get rid of all the old index files?
- how much can the index files (no fts squat) grow?
thanks in advance bye davide
Dott. Davide Vaghetti
Centro Servizi Informatici
Facolta' di Ingegneria
Universita' di Pisa
PGP: http://keys.keysigning.org:11371/pks/lookup?op=get&search=0x7A1B3BA18C4E0A4D
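As a reading aid for the two mail_location values in this message, here is a minimal Python sketch of how the path templates might expand for a single user. It assumes that %u is the full username and that %-1.1u means "offset -1, width 1", i.e. the last character of the username. The function names and the example username are hypothetical, and this is not Dovecot's own expansion code.

```python
from typing import Optional


def substr_var(value: str, offset: int, width: int) -> str:
    """Return `width` characters of `value` starting at `offset`.

    Assumption: a negative offset counts from the end of the string.
    """
    start = offset if offset >= 0 else max(len(value) + offset, 0)
    return value[start:start + width]


def expand_location(username: str, index_root: Optional[str] = None) -> str:
    """Expand the mail_location templates from this thread for one user."""
    bucket = substr_var(username, -1, 1)  # stands in for %-1.1u
    location = f"mdbox:/var/vmail/{bucket}/{username}/mdbox"
    if index_root is not None:
        # Second form from the thread: indexes on a separate disk.
        location += f":INDEX={index_root}/{bucket}/{username}/"
    return location


if __name__ == "__main__":
    print(expand_location("davide"))
    print(expand_location("davide", "/var/indexes"))
```

Under these assumptions the one-character directory level spreads users across buckets keyed by the last character of the username.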
On 2011-06-28 12:13 PM, Davide Vaghetti wrote:
> mail_location = mdbox:/var/vmail/%-1.1u/%u/mdbox
> and I want to switch to
> mail_location = mdbox:/var/vmail/%-1.1u/%u/mdbox:INDEX=/var/indexes/%-1.1u/%u/
> But I cannot figure out a few things:
> - does the switch trigger a rebuild of the index files?
> - can I get rid of all the old index files?
I'm by no means an expert, but with that many users I think if you did this in one shot (all indexes being rebuilt simultaneously as users logged in) your system would slow to a crawl...
I would first rsync the existing indexes over live, then stop dovecot, do another quick rsync of the indexes, then make the change and restart dovecot...
That will minimize the impact (rebuilding of indexes)...
--
Best regards,
Charles
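The two-pass copy suggested in this message can be sketched in Python as a plan of steps. This only builds the rsync command line and does not run anything. The per-user source and destination directories are taken from the mail_location values in the thread; the assumption that the INDEX directory should mirror the layout found under the mdbox directory is unverified and should be tested on a single account first. Stopping and starting Dovecot is left as a manual step because it depends on the init system. The files are copied, not moved, so the originals stay in place.

```python
def rsync_index_cmd(user_src: str, user_dst: str) -> list:
    """Build an rsync command that copies only Dovecot index files.

    The filter keeps every directory, the dovecot.index* and
    dovecot.map.index* files, and nothing else; empty directories
    are pruned on the receiving side.
    """
    return [
        "rsync", "-a", "--prune-empty-dirs",
        "--include=*/",
        "--include=dovecot.index*",
        "--include=dovecot.map.index*",
        "--exclude=*",
        user_src.rstrip("/") + "/",
        user_dst.rstrip("/") + "/",
    ]


def migration_plan(user_src: str, user_dst: str) -> list:
    """Return (description, command or None) pairs for one user."""
    copy = rsync_index_cmd(user_src, user_dst)
    return [
        ("first copy while Dovecot is running", copy),
        ("stop Dovecot (init-system specific, not shown)", None),
        ("second, quick copy to pick up changes", copy),
        ("change mail_location, then start Dovecot", None),
    ]


if __name__ == "__main__":
    # Hypothetical user directory pair, following the thread's layout.
    plan = migration_plan("/var/vmail/e/davide/mdbox", "/var/indexes/e/davide")
    for description, command in plan:
        print(description, "->", " ".join(command) if command else "(manual)")
```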
On 06/28/2011 07:29 PM, Charles Marcus wrote:
> On 2011-06-28 12:13 PM, Davide Vaghetti wrote:
>> mail_location = mdbox:/var/vmail/%-1.1u/%u/mdbox
>> and I want to switch to
>> mail_location = mdbox:/var/vmail/%-1.1u/%u/mdbox:INDEX=/var/indexes/%-1.1u/%u/
>> But I cannot figure out a few things:
>> - does the switch trigger a rebuild of the index files?
>> - can I get rid of all the old index files?
> I'm by no means an expert, but with that many users I think if you did this in one shot (all indexes being rebuilt simultaneously as users logged in) your system would slow to a crawl...
> I would first rsync the existing indexes over live, then stop dovecot, do another quick rsync of the indexes, then make the change and restart dovecot...
> That will minimize the impact (rebuilding of indexes)...
Good hint! Thank you.
What about the index growth factor? Do any of you folks have an idea about that (no fts squat)?
bye davide
On 28/06/2011 17:13, Davide Vaghetti wrote:
> I have one thousand virtual users with the mdbox mailbox format and a 10 GByte quota. I have noticed some I/O-related performance problems (the mailbox disk is a 6 TB RAID 1+0 array on iSCSI), so I want to put the index files on a different disk. My current mail_location is:
> mail_location = mdbox:/var/vmail/%-1.1u/%u/mdbox
> and I want to switch to
> mail_location = mdbox:/var/vmail/%-1.1u/%u/mdbox:INDEX=/var/indexes/%-1.1u/%u/
> But I cannot figure out a few things:
> - does the switch trigger a rebuild of the index files?
!!!!! DANGER, DANGER !!!!!!
Index files cannot be re-generated under mdbox
Go away and read http://wiki2.dovecot.org/MailboxFormat/dbox
"... with dbox the Index files actually contain significant data which is held nowhere else. Index files for both *single-dbox* and *multi-dbox* contain message flags and keywords. For *multi-dbox*, the index file also contains the map_uids which link (via the "map index") to the actual message data. This data cannot be automatically recreated, so it is important that Index files are treated with the same care as message data files."
If you don't already know this, then you probably shouldn't even be using mdbox.
> - can I get rid of all the old index files?
NO!
> - how much can the index files (no fts squat) grow?
First solve your understanding problem with mdbox, then worry about details such as this.
Bill
In fact, under sdbox and mdbox, calling these files "index files" is misleading because it implies that they can be re-created, leading to situations like this.
Such situations could result in catastrophic data loss. Whilst we could say it is "user error", users could argue that it is "common knowledge" that files referred to as "index files" can be re-created from the "data files".
In reality, these so-called "index files" are actually database files containing critical data.
They happen to use the same format as Dovecot uses for index files in connection with mbox and maildir, but they contain data which is held nowhere else and cannot be recreated.
Perhaps the per-mailbox index files for sdbox and mdbox should be re-named to "message metadata databases", and the "map index" should be renamed to "message store database".
Specifically we should avoid the word "index". By including the word "database", we make it clearer that these files contain data.
Timo, what do you reckon?
Regards,
Bill
On 29/06/2011 18:00, William Blunn wrote:
> Perhaps the per-mailbox index files for sdbox and mdbox should be re-named to "message metadata databases", and the "map index" should be renamed to "message store database".
Also it might be an idea to change the filenames of the files to avoid the word "index".
Perhaps use something like "ddb" instead (means "Dovecot database").
So,
${location}/mailboxes/INBOX/dbox-Mails/dovecot.index
${location}/mailboxes/INBOX/dbox-Mails/dovecot.index.cache
${location}/mailboxes/INBOX/dbox-Mails/dovecot.index.log
${location}/storage/dovecot.map.index
becomes
${location}/mailboxes/INBOX/dbox-Mails/dovecot.ddb
${location}/mailboxes/INBOX/dbox-Mails/dovecot.ddb.cache
${location}/mailboxes/INBOX/dbox-Mails/dovecot.ddb.log
${location}/storage/dovecot.map.ddb
To allow for migration of existing installations, it might be an idea to make Dovecot look for both "ddb" and "index" when opening, but use "ddb" when creating new files.
Regards,
Bill
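The renaming proposed in this message can be illustrated with a small Python sketch that maps an existing file name to its proposed "ddb" name. The helper name is hypothetical and the scheme itself is only a proposal from this thread, not something Dovecot implements. The sketch swaps the "index" component of a dovecot.* file name and leaves every other path untouched.

```python
import posixpath


def ddb_name(path: str) -> str:
    """Map a dovecot.*index* file path to the proposed *ddb* name."""
    head, name = posixpath.split(path)
    if not name.startswith("dovecot."):
        return path  # not one of the files covered by the proposal
    # Replace only the exact "index" component between dots.
    parts = ["ddb" if part == "index" else part for part in name.split(".")]
    return posixpath.join(head, ".".join(parts))


if __name__ == "__main__":
    for old in (
        "mailboxes/INBOX/dbox-Mails/dovecot.index",
        "mailboxes/INBOX/dbox-Mails/dovecot.index.cache",
        "mailboxes/INBOX/dbox-Mails/dovecot.index.log",
        "storage/dovecot.map.index",
    ):
        print(old, "->", ddb_name(old))
```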
On Wed, 2011-06-29 at 18:09 +0100, William Blunn wrote:
> On 29/06/2011 18:00, William Blunn wrote:
>> Perhaps the per-mailbox index files for sdbox and mdbox should be re-named to "message metadata databases", and the "map index" should be renamed to "message store database".
> Also it might be an idea to change the filenames of the files to avoid the word "index".
> Perhaps use something like "ddb" instead (means "Dovecot database").
Or simply "db" :)
> ${location}/mailboxes/INBOX/dbox-Mails/dovecot.ddb
> ${location}/mailboxes/INBOX/dbox-Mails/dovecot.ddb.cache
> ${location}/mailboxes/INBOX/dbox-Mails/dovecot.ddb.log
> ${location}/storage/dovecot.map.ddb
Yes, this would be nice, but..
> To allow for migration of existing installations, it might be an idea to make Dovecot look for both "ddb" and "index" when opening, but use "ddb" when creating new files.
This makes it annoying. It wastes disk I/O..
BTW. Cyrus also has "cyrus.index" file, which is the only storage for message flags. So Dovecot isn't alone with this.
I concede that this is most likely a WIBNI (Wouldn't It Be Nice If...) and most likely will end up on the list of WIBNIs, never to be implemented.
But I would like to take the brainstorm forward another step, just to see.
On 30/06/2011 05:35, Timo Sirainen wrote:
>> To allow for migration of existing installations, it might be an idea to make Dovecot look for both "ddb" and "index" when opening, but use "ddb" when creating new files.
> This makes it annoying. It wastes disk I/O..
OK fair enough.
(Though not actually *disk* I/O /per se/. It is not like we would create any further sync-to-disk requirement (i.e. requiring to wait for another revolution), but rather that it would require more system calls.)
Presumably it's important that it works correctly for existing users with minimal risk of problems if people take the path of least resistance (and people don't read the release notes). I imagine many people will not be bothered about some extra failed "open" calls. But we should still have a way to tune for optimal I/O usage so that systems which are "up against it" for performance can be tuned.
OK, how about this:
A configuration directive like this:
filename_word_ddb = ddb index
This specifies a list of words which will be tried in the place where we mean to say "ddb" in a filename.
If the directive is not present, then the default value would be as per the example above. This should allow existing installations to work correctly using old configuration files.
If a new file needs to be created, then it will use the first entry in the list.
So new installs will use "ddb" for all such files, and will be optimal where the file exists already, but mildly sub-optimal where the file doesn't exist (because Dovecot would have to try opening each possible variation before being able to know that the file was not openable). In order to tune for I/O, the administrator can reconfigure the list to be just "ddb".
Old installs will have existing files with "index" with new files being created with "ddb". This will work correctly, but with some degree of sub-optimality. In order to tune for I/O, the administrator would need to:
- Configure filename_word_ddb to "ddb index ddb" (to mitigate the race condition where a file is renamed after "ddb" is tried but before "index" is tried)
- Re-name existing files (from "...index..." to "...ddb...")
- Check that no files with old names exist
- Change the list to "ddb"
This means that things should work correctly by default, and only get messed-up when people actively go and try to optimise things without paying attention to what they're doing.
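The lookup behaviour described above can be sketched in Python as follows. Note that filename_word_ddb is only the directive proposed in this message, not an existing Dovecot setting, and the function names and the "{word}" placeholder are hypothetical. The sketch tries each word in order when opening an existing file and always uses the first word when naming a new one.

```python
from typing import BinaryIO, List, Optional, Tuple

# Proposed default: prefer "ddb", fall back to the legacy "index" name.
DEFAULT_WORDS = ["ddb", "index"]


def candidates(template: str, words: List[str]) -> List[str]:
    """Expand a template such as 'dovecot.{word}.log' for each word."""
    return [template.replace("{word}", word) for word in words]


def open_first(template: str,
               words: List[str] = DEFAULT_WORDS) -> Optional[Tuple[str, BinaryIO]]:
    """Open the first candidate that exists; None if none can be opened.

    Each miss costs one failed open call, which is the overhead
    discussed in the thread.
    """
    for path in candidates(template, words):
        try:
            return path, open(path, "rb")
        except FileNotFoundError:
            continue
    return None


def path_for_new(template: str, words: List[str] = DEFAULT_WORDS) -> str:
    """New files always take the first word in the list."""
    return template.replace("{word}", words[0])
```

With the list set to "ddb index ddb" during a migration, a file renamed from "index" to "ddb" between the first two attempts is still found on the third, which is the race the first migration step is meant to cover.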
> BTW. Cyrus also has "cyrus.index" file, which is the only storage for message flags. So Dovecot isn't alone with this.
Though two is still a small sample compared to the weight of existing terminology usage.
Besides, Cyrus is somewhat "in-bred", and we would expect it to be quirky :-)
Bill
On 06/29/2011 06:36 PM, William Blunn wrote:
> On 28/06/2011 17:13, Davide Vaghetti wrote:
>> I have one thousand virtual users with the mdbox mailbox format and a 10 GByte quota. I have noticed some I/O-related performance problems (the mailbox disk is a 6 TB RAID 1+0 array on iSCSI), so I want to put the index files on a different disk. My current mail_location is:
>> mail_location = mdbox:/var/vmail/%-1.1u/%u/mdbox
>> and I want to switch to
>> mail_location = mdbox:/var/vmail/%-1.1u/%u/mdbox:INDEX=/var/indexes/%-1.1u/%u/
>> But I cannot figure out a few things:
>> - does the switch trigger a rebuild of the index files?
> !!!!! DANGER, DANGER !!!!!!
> Index files cannot be re-generated under mdbox
> Go away and read http://wiki2.dovecot.org/MailboxFormat/dbox
> "... with dbox the Index files actually contain significant data which is held nowhere else. Index files for both *single-dbox* and *multi-dbox* contain message flags and keywords. For *multi-dbox*, the index file also contains the map_uids which link (via the "map index") to the actual message data. This data cannot be automatically recreated, so it is important that Index files are treated with the same care as message data files."
> If you don't already know this, then you probably shouldn't even be using mdbox.
>> - can I get rid of all the old index files?
> NO!
>> - how much can the index files (no fts squat) grow?
> First solve your understanding problem with mdbox, then worry about details such as this.
Bill, thanks for all the __important__ info. You almost saved my ass ;-) (BTW, that is why I was asking)
I'll check the documentation again to better understand indexes in the mdbox context.
Nonetheless, I still have to care about the growth of the index files, so if you, or anyone else, can point me to the right documentation, or have a rule of thumb for estimating it, please share it.
Regards davide
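The thread does not give a rule of thumb for index growth. One way to get a local estimate is to measure how much space the index files of existing accounts use today, for example with the Python sketch below. It relies only on the file names mentioned earlier in the thread (dovecot.index, dovecot.index.cache, dovecot.index.log, dovecot.map.index); the function name is hypothetical, and the result describes current usage rather than predicting future growth.

```python
import os
from collections import defaultdict


def index_usage(root: str) -> dict:
    """Sum the sizes in bytes of Dovecot index files under `root`.

    Returns a dict keyed by file name, e.g. 'dovecot.index.cache',
    so the share of each file type is visible.
    """
    totals = defaultdict(int)
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            if name.startswith(("dovecot.index", "dovecot.map.index")):
                totals[name] += os.path.getsize(os.path.join(dirpath, name))
    return dict(totals)


if __name__ == "__main__":
    # Path taken from the thread's mail_location; adjust as needed.
    usage = index_usage("/var/vmail")
    for name, size in sorted(usage.items()):
        print(f"{name}: {size} bytes")
    print("total:", sum(usage.values()), "bytes")
```

Comparing the total against the size of the mail data for a sample of accounts gives a ratio that can be used to size the new disk.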
participants (4)
- Charles Marcus
- Davide Vaghetti
- Timo Sirainen
- William Blunn