[Dovecot] Questions about dbox
Hello
I have read carefully about dbox
(http://wiki2.dovecot.org/MailboxFormat/dbox) and I have some questions:
- One of the main advantages (speed-wise) of dbox over maildir is that index files are the only storage for message flags and keywords. What happens when we want to recover some messages from backup? With maildir we can rebuild message indexes, but I am not sure about dbox. Should we also restore "old indexes" and merge with the "new indexes" in order to restore the deleted messages?
- The previous question applies to sdbox and mdbox. In the case of mdbox, we can configure rotation of files using mdbox_rotate_size. We would like to rotate daily, not based on size (our users ask us for yesterday's backup). How can we accomplish this?
We now have 17.000.000 messages in our maildir, almost 1.5 TB (zlib compression enabled). Our backup time with bacula is rather bad: 24 hours for a full backup, and most of the time the backup is busy fstat'ing all those little messages. We think that mdbox can help us with this. Does anybody have good experiences migrating from maildir->mdbox in "large" environments? What about mdbox performance & reliability?
Thank you for your support
Javier
On 24.1.2011, at 15.52, Javier de Miguel Rodríguez wrote:
- One of the main advantages (speed wise) of dbox over maildir is that index files are the only storage for message flags and keywords. What happens when we want to recover some messages from backup? With maildir we can rebuild message indexes, but I am not sure about dbox. Should we also restore "old indexes" and merge with the "new indexes" in order to restore the deleted messages?
The intended way to restore stuff is to either restore the entire dbox to a temp directory, or at least all the important parts of it (indexes + the files that contain the wanted mails) and then use something like:
doveadm import sdbox:/tmp/restoredbox "" savedsince 2011-01-01
- The previous question applies to sdbox and mdbox. In the case of mdbox, we can configure rotation of files using mdbox_rotate_size. We would like to rotate daily, not based on size (our users ask us for yesterday's backup). How can we accomplish this?
mdbox_rotate_interval = 1d
But note that that doesn't guarantee that there will be only one file. Even if you set mdbox_rotate_size to 10 GB or something (or I think 0 makes it unlimited, not sure), it's possible that two files will be created if mails are being saved at the same time. mdbox never waits for locks when writing to a file, instead it'll just use another file or create a new one.
Anyway, if it's not a big deal restoring the user's entire mailbox temporarily you can restore only yesterday's mails by giving proper search query parameter to doveadm import.
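A minimal sketch of such a restore, assuming the backup has been restored to /tmp/restoredbox as in the example above and that the search query combines the savedsince and savedbefore keys. The helper name import_saved_between and the dates are only illustrative, and the script prints the command instead of running it unless DOVEADM is set to the real binary:

```shell
#!/bin/sh
# Dry run by default: print the doveadm command instead of executing it.
# Set DOVEADM=doveadm to run it for real (as the mailbox owner).
DOVEADM="${DOVEADM:-echo doveadm}"

# import_saved_between SOURCE SINCE BEFORE  (hypothetical helper)
# Imports mails saved since SINCE and before BEFORE from the restored
# copy at SOURCE into the root ("") of the live mailbox.
import_saved_between() {
    src=$1 since=$2 before=$3
    $DOVEADM import "$src" "" savedsince "$since" savedbefore "$before"
}

# Example: only the mails saved on 2011-01-23.
import_saved_between mdbox:/tmp/restoredbox 2011-01-23 2011-01-24
```

Passing the dates as arguments keeps the helper independent of any particular date(1) implementation; a wrapper could compute "yesterday" in whatever way the local system supports.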
We now have 17.000.000 messages in our maildir, almost 1.5 TB (zlib compression enabled). Our backup time with bacula is rather bad: 24 hours for a full backup, and most of the time the backup is busy fstat'ing all those little messages.
In case of Maildir there's no point in fstating any mail files. I'd guess it should be possible to patch bacula to not do that.
We think that mdbox can help us with this. Does anybody have good experiences migrating from maildir->mdbox in "large" environments? What about mdbox performance & reliability?
I haven't recently heard of corruption complaints about mdbox. Previously, when there were some, I didn't hear complaints about losing mails or anything, so that's good :)
The intended way to restore stuff is to either restore the entire dbox to a temp directory, or at least all the important parts of it (indexes + the files that contain the wanted mails) and then use something like:
doveadm import sdbox:/tmp/restoredbox "" savedsince 2011-01-01
Thank you for your response, Timo. That was the answer I was looking for. The above example is for sdbox, mdbox should be exactly the same, am I right?
- The previous question applies to sdbox and mdbox. In the case of mdbox, we can configure rotation of files using mdbox_rotate_size. We would like to rotate daily, not based on size (our users ask us for yesterday's backup). How can we accomplish this?
mdbox_rotate_interval = 1d
Any known issues with mdbox and the zlib plugin in lda & imap? I have read that mbox is "read-only" with the zlib plugin. What about mdbox with a high rotate interval (almost an mbox)? How does this work? Is the entire mdbox file loaded into RAM and decompressed, or is a temp file in the filesystem used?
Another question: any hint about the "hot spot" of size for mdbox_rotate_interval?
We now have 17.000.000 messages in our maildir, almost 1.5 TB (zlib compression enabled). Our backup time with bacula is rather bad: 24 hours for a full backup, and most of the time the backup is busy fstat'ing all those little messages.
In case of Maildir there's no point in fstating any mail files. I'd guess it should be possible to patch bacula to not do that.
Good idea. I will write to bacula folks about that.
We think that mdbox can help us with this. Does anybody have good experiences migrating from maildir->mdbox in "large" environments? What about mdbox performance & reliability?
I haven't recently heard of corruption complaints about mdbox. Previously, when there were some, I didn't hear complaints about losing mails or anything, so that's good :)
Any additional comments about this? We are seriously thinking about migrating to mdbox, but it is always scary "to be the first one".
Thank you for your support
Regards
Javier
On 24.1.2011, at 22.45, Javier de Miguel Rodríguez wrote:
The intended way to restore stuff is to either restore the entire dbox to a temp directory, or at least all the important parts of it (indexes + the files that contain the wanted mails) and then use something like:
doveadm import sdbox:/tmp/restoredbox "" savedsince 2011-01-01
Thank you for your response, Timo. That was the answer I was looking for. The above example is for sdbox, mdbox should be exactly the same, am I right?
Yep.
- The previous question applies to sdbox and mdbox. In the case of mdbox, we can configure rotation of files using mdbox_rotate_size. We would like to rotate daily, not based on size (our users ask us for yesterday's backup). How can we accomplish this?
mdbox_rotate_interval = 1d
Any known issues with mdbox and zlib plugin in lda & imap?
I don't think so.
I have read that mbox is "read-only" with the zlib plugin. What about mdbox with a high rotate interval (almost an mbox)? How does this work? Is the entire mdbox file loaded into RAM and decompressed, or is a temp file in the filesystem used?
No, individual messages are compressed inside the file. No temp files or anything. No read only.
Another question: any hint about the "hot spot" of size for mdbox_rotate_interval?
The main point of that setting is to enable daily incremental backups. If you don't want that, I'd say keep it disabled.
We think that mdbox can help us with this. Does anybody have good experiences migrating from maildir->mdbox in "large" environments? What about mdbox performance & reliability?
I haven't recently heard of corruption complaints about mdbox. Previously, when there were some, I didn't hear complaints about losing mails or anything, so that's good :)
Any additional comments about this? We are seriously thinking about migrating to mdbox, but it is always scary "to be the first one".
I know there is at least one company using mdbox for, I think, a few hundred people, and they've been reporting bugs to me. A lot of that was actually caused by glusterfs, I think, so the problems probably weren't as bad as I thought.
I've also been using mdbox for my own mails for about a year with zero problems.
You can of course start a partial conversion. First convert a couple of people to mdbox. After a few weeks convert a few hundred and see how it goes over a few months. If you encounter problems, convert back to maildir. dsync should make all of this pretty easy.
On 24.1.2011, at 22.56, Timo Sirainen wrote:
You can of course start a partial conversion. First convert a couple of people to mdbox. After a few weeks convert a few hundred and see how it goes over a few months. If you encounter problems, convert back to maildir. dsync should make all of this pretty easy.
Oh, and I'm very interested in knowing details about how much disk I/O mdbox saves, please show some before/after graphs :)
Javier de Miguel Rodríguez <javierdemiguel@us.es> wrote:
We think that mdbox can help us with this. Does anybody have good experiences migrating from maildir->mdbox in "large" environments? What about mdbox performance & reliability?
I haven't recently heard of corruption complaints about mdbox. Previously, when there were some, I didn't hear complaints about losing mails or anything, so that's good :)
Any additional comments about this? We are seriously thinking about migrating to mdbox, but it is always scary "to be the first one".
I take this thread and jump in, since we (TH Mittelhessen, Germany) are also investigating the move to Dovecot and we also have the same situation as Javier: Courier with Maildir and Bacula as backup solution, we even have about the same amount of mails in our system.
And I was also wondering which storage format to use: stay at Maildir (no need to worry about indexes, just restore straight to the users $HOME/Maildir and be done with it), use sdbox or use mdbox.
mdbox looks nice, but the following from the wiki got me thinking:
"Expunging a message only decreases the message's refcount. The space is later freed in "purge" step. This is typically done in a nightly cronjob when there's less disk I/O activity. The purging first finds all files that have refcount=0 mails. Then it goes through each file and copies the refcount>0 mails to other mdbox files (to the same files as where newly saved messages would also go), updates the map index and finally deletes the original file."
For example, we have m.1, m.2 and m.3, and all files have deleted mails in them. During expunge, all undeleted mails would go to m.4 and m.5, for example.
Now Bacula backs up the mail storage and has 2 new files to back up and 3 old ones to "delete/forget" (using the accurate backup option).
Wouldn't this massively increase the size of the backup, because I end up backing up many mails multiple times?
With Maildir, a mail only gets backed up twice if it is read in between two backup runs (or if it is moved to a different folder), since the filename changes. But with mdbox, a mail may get backed up every day (in the worst case) if it is moved to a new data file every day during the expunge run.
Solution?
I thought of limiting the amount of mails inside the mdbox to one, thus of course defeating the benefit of having multiple mails inside one file, but gaining a stable file name over the whole lifetime of a mail which will never change, even if the file is moved to a different folder or its state changes.
Am I making any sense here?
Problem: I may end up with hundreds of thousands of m.* files inside a user's storage area (Don't ask, we really have this kind of user. And no, they are uneducable about this.), even if the user neatly sorted them into different IMAP folders.
So, right now I have got the feeling, I am missing something important here.
Comments?
Grüße, Sven.
-- Sig lost. Core dumped.
On 24.1.2011, at 23.17, Sven Hartge wrote:
I take this thread and jump in, since we (TH Mittelhessen, Germany) are also investigating the move to Dovecot and we also have the same situation as Javier: Courier with Maildir and Bacula as backup solution, we even have about the same amount of mails in our system.
And I was also wondering which storage format to use: stay at Maildir (no need to worry about indexes, just restore straight to the users $HOME/Maildir and be done with it), use sdbox or use mdbox.
Probably a good idea to switch to Dovecot+Maildir first, and then when everything seems to be working fine switch to mdbox or sdbox.
"Expunging a message only decreases the message's refcount. The space is later freed in "purge" step. This is typically done in a nightly cronjob when there's less disk I/O activity. The purging first finds all files that have refcount=0 mails. Then it goes through each file and copies the refcount>0 mails to other mdbox files (to the same files as where newly saved messages would also go), updates the map index and finally deletes the original file."
For example, we have m.1, m.2 and m.3, and all files have deleted mails in them. During expunge, all undeleted mails would go to m.4 and m.5, for example.
Typically only new messages are deleted, so typically it would be only m.3 file that had deleted mails.
Now Bacula backs up the mail storage and has 2 new files to back up and 3 old ones to "delete/forget" (using the accurate backup option).
Wouldn't this massively increase the size of the backup, because I end up backing up many mails multiple times?
Yes, but if you use mdbox_rotate_interval=1d and run the purging before backups, I think there's a good chance that most of the backed up mails will be new files that bacula hasn't seen before.
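The ordering (purge first, then back up) could be scripted roughly as below. This is only a sketch: it assumes purging is done for all users with doveadm purge -A, the backup command /usr/local/bin/start-backup-job is a made-up placeholder for whatever triggers the site's backup, and by default the script only prints the commands:

```shell
#!/bin/sh
# Nightly job sketch: purge expunged mails from mdbox files, then back up.
# Dry run by default; set RUN= (empty) to execute the commands for real.
RUN="${RUN-echo}"

purge_all_users() {
    # Rewrites mdbox files that contain expunged (refcount=0) mails.
    $RUN doveadm purge -A
}

run_backup() {
    # Placeholder path (assumption): replace with the real backup trigger.
    $RUN /usr/local/bin/start-backup-job
}

nightly() {
    # Only start the backup if the purge succeeded.
    purge_all_users && run_backup
}

nightly
```

Chaining with && means a failed purge stops the job before the backup starts, so a half-purged storage directory is not backed up silently.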
I thought of limiting the amount of mails inside the mdbox to one, thus of course defeating the benefit of having multiple mails inside one file, but gaining a stable file name over the whole lifetime of a mail which will never change, even if the file is moved to a different folder or its state changes.
Then you'd want to use sdbox, but that won't decrease the backup time compared to maildir, since there's the same number of files.
Problem: I may end up with hundreds of thousands of m.* files inside a user's storage area (Don't ask, we really have this kind of user. And no, they are uneducable about this.), even if the user neatly sorted them into different IMAP folders.
I don't really understand what you're trying to say with this. m.* files anyway aren't folder-specific, all of the user's mails are in the same m.* files. And users can't really affect how m.* files are created, other than deleting messages all around the mailbox.
Timo Sirainen <tss@iki.fi> wrote:
On 24.1.2011, at 23.17, Sven Hartge wrote:
I take this thread and jump in, since we (TH Mittelhessen, Germany) are also investigating the move to Dovecot and we also have the same situation as Javier: Courier with Maildir and Bacula as backup solution, we even have about the same amount of mails in our system.
And I was also wondering which storage format to use: stay at Maildir (no need to worry about indexes, just restore straight to the users $HOME/Maildir and be done with it), use sdbox or use mdbox.
Probably a good idea to switch to Dovecot+Maildir first, and then when everything seems to be working fine switch to mdbox or sdbox.
Of course. Being able to convert just a few mailboxes (probably the ones from the admins, eating our own dog food, etc.) over to a different storage method really helps here.
"Expunging a message only decreases the message's refcount. The space is later freed in "purge" step. This is typically done in a nightly cronjob when there's less disk I/O activity. The purging first finds all files that have refcount=0 mails. Then it goes through each file and copies the refcount>0 mails to other mdbox files (to the same files as where newly saved messages would also go), updates the map index and finally deletes the original file."
For example, we have m.1, m.2 and m.3, and all files have deleted mails in them. During expunge, all undeleted mails would go to m.4 and m.5, for example.
Typically only new messages are deleted, so typically it would be only m.3 file that had deleted mails.
Probably, yes. But I am trying to prevent a sudden and unpredictable surge in the needed backup space for a day. I guess, I will have to experiment with this.
Now Bacula backs up the mail storage and has 2 new files to back up and 3 old ones to "delete/forget" (using the accurate backup option).
Wouldn't this massively increase the size of the backup, because I end up backing up many mails multiple times?
Yes, but if you use mdbox_rotate_interval=1d and run the purging before backups, I think there's a good chance that most of the backed up mails will be new files that bacula hasn't seen before.
Do you mean "new mails" instead of "new files"?
Again, I think I will have to experiment with this. Using a new mdbox based on timing and not on the amount or size of mails is an option I have not yet thought of.
I thought of limiting the amount of mails inside the mdbox to one, thus of course defeating the benefit of having multiple mails inside one file, but gaining a stable file name over the whole lifetime of a mail which will never change, even if the file is moved to a different folder or its state changes.
Then you'd want to use sdbox, but that won't decrease the backup time compared to maildir, since there's the same number of files.
Correct. This is why I am very interested in using a bundled format such as mdbox. Right now, I am not able to do real full backups, as this would take about 30 hours. I am limited to VirtualFull backups using the accurate option from Bacula, which cuts the daily incremental backup time to about 2 hours.
Problem: I may end up with hundreds of thousands of m.* files inside a user's storage area (Don't ask, we really have this kind of user. And no, they are uneducable about this.), even if the user neatly sorted them into different IMAP folders.
I don't really understand what you're trying to say with this. m.* files anyway aren't folder-specific, all of the user's mails are in the same m.* files. And users can't really affect how m.* files are created, other than deleting messages all around the mailbox.
Yes, exactly.
Imagine a user with 100 folders with 1000 mails per folder: with one mail per mdbox file, I'd have 100.000 m.* files in the storage area, if I kind of abuse mdbox by just allowing one mail per file. Not optimal.
But this is just a case of having one's cake and eating it too. (Hopefully got that proverb right.)
Just thinking: can the storage directory for mdbox be hashed? So you for example get
<mail location root>/storage/X/Y/m.*
instead of
<mail location root>/storage/m.*
This way any performance degradation caused by too many files per directory could be prevented.
Grüße, Sven.
-- Sig lost. Core dumped.
On 25.1.2011, at 0.11, Sven Hartge wrote:
Yes, but if you use mdbox_rotate_interval=1d and run the purging before backups, I think there's a good chance that most of the backed up mails will be new files that bacula hasn't seen before.
Do you mean "new mails" instead of "new files"?
Same thing. Because of daily mdbox file rotation all new mails are in new files (well, most).
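One way to check this on a real system is to list the storage files that changed since the previous backup. The sketch below assumes a timestamp file is touched at the end of each backup run; the function name changed_storage_files and the demo paths are illustrative, not part of Dovecot or Bacula:

```shell
#!/bin/sh
# changed_storage_files STORAGE_DIR STAMP_FILE  (hypothetical helper)
# Lists m.* files modified more recently than STAMP_FILE.
changed_storage_files() {
    find "$1" -type f -name 'm.*' -newer "$2" | sort
}

# Self-contained demo with fake files in a temp directory.
demo=$(mktemp -d)
mkdir "$demo/storage"
touch -t 201101230100 "$demo/storage/m.1"   # written before the last backup
touch -t 201101240100 "$demo/stamp"         # last backup finished here
touch -t 201101250100 "$demo/storage/m.2"   # rotated in after the backup
changed_storage_files "$demo/storage" "$demo/stamp"
rm -rf "$demo"
```

If the daily rotation and the purge behave as described, the list should mostly contain files created since the last run rather than old files that were rewritten.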
Just thinking: can the storage directory for mdbox be hashed? So you for example get
<mail location root>/storage/X/Y/m.*
instead of
<mail location root>/storage/m.*
This way any performance degradation caused by too many files per directory could be prevented.
It would be simple to add such code, but no one's really asked for it yet. Also I wonder what kind of a hashing method would be good here, especially one that would create more dirs when the number of files increases, but without becoming something stupid after file deletion like having only one or two files in each directory..
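To make the X/Y idea concrete, here is one possible fixed two-level layout derived from the file number. It is purely hypothetical: Dovecot has no such option, and the fan-out of 16 and the name hashed_path are arbitrary choices. A fixed scheme like this also does nothing about the concern above of adapting to a growing or shrinking number of files:

```shell
#!/bin/sh
# hashed_path N  (hypothetical) -> storage/X/Y/m.N
# X and Y are single hex digits taken from the file number N.
hashed_path() {
    n=$1
    x=$(( n % 16 ))          # first level: low 4 bits of N
    y=$(( (n / 16) % 16 ))   # second level: next 4 bits of N
    printf 'storage/%x/%x/m.%s\n' "$x" "$y" "$n"
}

hashed_path 1
hashed_path 4660
```

Because mdbox file numbers are handed out sequentially, taking the low bits spreads consecutive files evenly across the 256 leaf directories.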
Timo Sirainen <tss@iki.fi> wrote:
On 25.1.2011, at 0.11, Sven Hartge wrote:
Yes, but if you use mdbox_rotate_interval=1d and run the purging before backups, I think there's a good chance that most of the backed up mails will be new files that bacula hasn't seen before.
Do you mean "new mails" instead of "new files"?
Same thing. Because of daily mdbox file rotation all new mails are in new files (well, most).
Ok, yes, right.
You definitely got me something very interesting to think/sleep about. (Got my best ideas in the shower in the morning after having my subconscious work on the problem during the night.)
Grüße, Sven.
-- Sig lost. Core dumped.
On 1/24/2011 12:45 PM, Javier de Miguel Rodríguez wrote:
Any known issues with mdbox and the zlib plugin in lda & imap? I have read that mbox is "read-only" with the zlib plugin. What about mdbox with a high rotate interval (almost an mbox)? How does this work? Is the entire mdbox file loaded into RAM and decompressed, or is a temp file in the filesystem used?
I've been having errors with the combination of single-instance storage, mdbox, and zlib. I don't THINK I'm losing mail - but a lot of errors in the logs.
Daniel
participants (5)
- Daniel L. Miller
- Javier de Miguel Rodríguez
- Javier de Miguel Rodríguez
- Sven Hartge
- Timo Sirainen