Timo Sirainen <tss@iki.fi> wrote:
On 24.1.2011, at 23.17, Sven Hartge wrote:
I'll jump into this thread, since we (TH Mittelhessen, Germany) are also investigating the move to Dovecot and we have the same situation as Javier: Courier with Maildir and Bacula as the backup solution. We even have about the same amount of mail in our system.
And I was also wondering which storage format to use: stay with Maildir (no need to worry about indexes, just restore straight to the user's $HOME/Maildir and be done with it), use sdbox or use mdbox.
Probably a good idea to switch to Dovecot+Maildir first, and then when everything seems to be working fine switch to mdbox or sdbox.
Of course. Being able to convert just a few mailboxes (probably the ones from the admins, eating our own dog food, etc.) over to a different storage method really helps here.
"Expunging a message only decreases the message's refcount. The space is later freed in "purge" step. This is typically done in a nightly cronjob when there's less disk I/O activity. The purging first finds all files that have refcount=0 mails. Then it goes through each file and copies the refcount>0 mails to other mdbox files (to the same files as where newly saved messages would also go), updates the map index and finally deletes the original file."
For example, we have m.1, m.2 and m.3, and all of these files have deleted mails in them. During the purge, all undeleted mails would go to m.4 and m.5, for example.
Typically only new messages are deleted, so typically it would be only m.3 file that had deleted mails.
Probably, yes. But I am trying to prevent a sudden and unpredictable surge in the needed backup space for a day. I guess I will have to experiment with this.
Now Bacula backs up the mail storage and has 2 new files to back up and 3 old ones to "delete/forget" (using the accurate backup option).
Wouldn't this massively increase the size of the backup, because I end up backing up many mails multiple times?
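As a starting point for that kind of experiment, here is a toy Python model of the purge step described in the quoted text. It is only a sketch, not Dovecot's code: the `Mail` class, the `purge` function and the `rotate_size` parameter are made up for illustration, and it assumes surviving mails always go into freshly numbered files rather than being appended to an already open one.

```python
from dataclasses import dataclass


@dataclass
class Mail:
    uid: int
    size: int          # bytes
    refcount: int = 1  # 0 means the mail has been expunged everywhere


def purge(files, rotate_size):
    """Toy purge. `files` maps the N of an m.N file to its list of Mail objects.

    Returns (files_after, bytes_rewritten). bytes_rewritten is the amount of
    old mail data that shows up again in new files, i.e. what a file-based
    backup would have to store a second time.
    """
    # Files containing at least one refcount=0 mail get rewritten.
    dirty = [n for n, mails in files.items()
             if any(m.refcount == 0 for m in mails)]
    result = {n: mails for n, mails in files.items() if n not in dirty}
    next_n = max(files, default=0) + 1
    current, current_size, rewritten = [], 0, 0
    for n in sorted(dirty):
        for mail in files[n]:
            if mail.refcount == 0:
                continue  # not copied, so its space is freed
            if current and current_size + mail.size > rotate_size:
                result[next_n] = current  # start a new m.N file
                next_n += 1
                current, current_size = [], 0
            current.append(mail)
            current_size += mail.size
            rewritten += mail.size
    if current:
        result[next_n] = current
    return result, rewritten


def make_files():
    # m.1, m.2, m.3 with four 100-byte mails each
    return {n: [Mail(uid=n * 10 + i, size=100) for i in range(4)]
            for n in (1, 2, 3)}


# Case 1: one expunged mail in every file.
all_dirty = make_files()
for n in all_dirty:
    all_dirty[n][0].refcount = 0
print(purge(all_dirty, rotate_size=500)[1], "bytes rewritten")

# Case 2: only the newest file has an expunged mail.
newest_dirty = make_files()
newest_dirty[3][0].refcount = 0
print(purge(newest_dirty, rotate_size=500)[1], "bytes rewritten")
```

In this model, one expunged mail in each of m.1 to m.3 causes all nine surviving mails (900 bytes) to be rewritten into m.4 and m.5, while deletions confined to m.3 only rewrite that file's three survivors (300 bytes) into m.4.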
Yes, but if you use mdbox_rotate_interval=1d and run the purging before backups, I think there's a good chance that most of the backed up mails will be new files that bacula hasn't seen before.
Do you mean "new mails" instead of "new files"?
Again, I think I will have to experiment with this. Rotating to a new mdbox file based on time rather than on the number or size of mails is an option I had not yet thought of.
I thought of limiting the number of mails inside an mdbox file to one, thus of course defeating the benefit of having multiple mails inside one file, but gaining a stable file name that never changes over the whole lifetime of a mail, even if the mail is moved to a different folder or its state changes.
Then you'd want to use sdbox, but that won't decrease the backup time compared to maildir, since there's the same number of files.
Correct. This is why I am very interested in using a bundled format such as mdbox. Right now, I am not able to do real full backups, as this would take about 30 hours. I am limited to VirtualFull backups using the accurate option from Bacula, which cuts the daily incremental backup time to about 2 hours.
Problem: I may end up with hundreds of thousands of m.* files inside a user's storage area (Don't ask, we really have this kind of user. And no, they are uneducable about this.), even if the user neatly sorted them into different IMAP folders.
I don't really understand what you're trying to say with this. m.* files anyway aren't folder-specific, all of the user's mails are in the same m.* files. And users can't really affect how m.* files are created, other than deleting messages all around the mailbox.
Yes, exactly.
Imagine a user with 100 folders with 1000 mails per folder: with one mail per mdbox file, I'd have 100,000 m.* files in the storage area if I kind of abuse mdbox by just allowing one mail per file. Not optimal.
But this is just a case of having one's cake and eating it too. (Hopefully got that proverb right.)
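To put numbers on that trade-off, a few lines of Python are enough. This is only back-of-the-envelope arithmetic: `mails_per_file` is a made-up knob for illustration (the thread only mentions rotation by size or by interval), and the value 500 is an arbitrary example.

```python
import math


def mdbox_file_count(total_mails, mails_per_file):
    """Number of m.* files needed if every file holds up to mails_per_file mails."""
    return math.ceil(total_mails / mails_per_file)


total_mails = 100 * 1000  # 100 folders with 1000 mails each

print(mdbox_file_count(total_mails, 1))    # one mail per file
print(mdbox_file_count(total_mails, 500))  # hypothetical bundling of 500 mails
```

One mail per file means as many files as mails, which is the same file count as Maildir or sdbox would produce.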
Just thinking: can the storage directory for mdbox be hashed? So you for example get
<mail location root>/storage/X/Y/m.*
instead of
<mail location root>/storage/m.*
This way any performance degradation caused by too many files per directory could be prevented.
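To make the idea concrete, here is a small Python sketch of how such a hashed layout could map a file name to X/Y. It is purely hypothetical: the function name, the use of SHA-1 and the two single-character levels are my own choices for illustration, not anything Dovecot is known to implement.

```python
import hashlib
from pathlib import PurePosixPath


def hashed_storage_path(root, file_name, levels=2):
    """Return <root>/storage/X/Y/<file_name> for a hypothetical hashed layout.

    X and Y are the first hex digits of the SHA-1 of the file name, so with
    two levels the files are spread over 16 * 16 = 256 directories.
    """
    digest = hashlib.sha1(file_name.encode("utf-8")).hexdigest()
    buckets = [digest[i] for i in range(levels)]
    return PurePosixPath(root, "storage", *buckets, file_name)


# The root path is a placeholder for <mail location root>.
print(hashed_storage_path("/srv/mail/someuser", "m.1"))
print(hashed_storage_path("/srv/mail/someuser", "m.2"))
```

Because the bucket depends only on the file name, a given m.* file always lands in the same directory, so lookups need no extra index.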
Regards, Sven.
-- Sig lost. Core dumped.