[Dovecot] Questions about dbox

Sven Hartge sven at svenhartge.de
Tue Jan 25 00:11:12 EET 2011


Timo Sirainen <tss at iki.fi> wrote:
> On 24.1.2011, at 23.17, Sven Hartge wrote:

>> I take this thread and jump in, since we (TH Mittelhessen, Germany) are
>> also investigating the move to Dovecot and we also have the same
>> situation as Javier: Courier with Maildir and Bacula as backup
>> solution, we even have about the same amount of mails in our system.
>> 
>> And I was also wondering which storage format to use: stay with Maildir
>> (no need to worry about indexes, just restore straight to the user's
>> $HOME/Maildir and be done with it), use sdbox or use mdbox.

> Probably a good idea to switch to Dovecot+Maildir first, and then when
> everything seems to be working fine switch to mdbox or sdbox.

Of course. Being able to convert just a few mailboxes (probably the ones
from the admins, eating our own dog food, etc.) over to a different
storage method really helps here.

>> "Expunging a message only decreases the message's refcount. The space
>> is later freed in "purge" step. This is typically done in a nightly
>> cronjob when there's less disk I/O activity. The purging first finds
>> all files that have refcount=0 mails. Then it goes through each file
>> and copies the refcount>0 mails to other mdbox files (to the same
>> files as where newly saved messages would also go), updates the map
>> index and finally deletes the original file."
>> 
>> For example, we have m.1, m.2 and m.3 and all files have deleted mails
>> in them. During purge, all undeleted mails would go to m.4 and m.5
>> for example.

> Typically only new messages are deleted, so typically it would be only
> m.3 file that had deleted mails.

Probably, yes. But I am trying to prevent a sudden and unpredictable
surge in the needed backup space for a day. I guess I will have to
experiment with this.
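
To get a feeling for the effect before experimenting on real data, the
purge step quoted above can be modelled roughly as follows. This is only
a sketch of the described behaviour, not Dovecot's code: the Mail and
MdboxFile classes, the purge() helper and all sizes are made up for
illustration, and survivors always go into fresh files here, whereas the
quoted text says they go to the same files as newly saved messages.

```python
from dataclasses import dataclass, field


@dataclass
class Mail:
    uid: int
    size: int          # bytes
    refcount: int = 1  # 0 means expunged and waiting for purge


@dataclass
class MdboxFile:
    name: str
    mails: list = field(default_factory=list)


def purge(files, next_id, max_file_size):
    """Rewrite every file that holds at least one refcount=0 mail.

    Returns (files after purge, names of deleted files, names of new files).
    """
    keep, deleted, survivors = [], [], []
    for f in files:
        if any(m.refcount == 0 for m in f.mails):
            # The whole file goes away; its live mails must be copied out.
            deleted.append(f.name)
            survivors.extend(m for m in f.mails if m.refcount > 0)
        else:
            keep.append(f)

    new_files, current = [], None
    for mail in survivors:
        used = sum(m.size for m in current.mails) if current else 0
        if current is None or used + mail.size > max_file_size:
            current = MdboxFile(f"m.{next_id}")
            next_id += 1
            new_files.append(current)
        current.mails.append(mail)

    return keep + new_files, deleted, [f.name for f in new_files]


# m.1 and m.3 each contain one expunged mail, m.2 is untouched.
files = [
    MdboxFile("m.1", [Mail(1, 400), Mail(2, 400, refcount=0)]),
    MdboxFile("m.2", [Mail(3, 400), Mail(4, 400)]),
    MdboxFile("m.3", [Mail(5, 400, refcount=0), Mail(6, 400)]),
]
after, deleted, created = purge(files, next_id=4, max_file_size=1000)
rewritten_bytes = sum(m.size for f in after if f.name in created for m in f.mails)
print(deleted, created, rewritten_bytes)  # ['m.1', 'm.3'] ['m.4'] 800
```

The 800 bytes in m.4 are mails the backup has already stored once under
m.1 and m.3, which is exactly the extra backup volume I am worried about.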

>> Now Bacula backs up the mail storage and has 2 new files to back up and
>> 3 old ones to "delete/forget" (using the accurate backup option).
>> 
>> Wouldn't this massively increase the size of the backup because I end
>> up backing up many mails multiple times?

> Yes, but if you use mdbox_rotate_interval=1d and run the purging
> before backups, I think there's a good chance that most of the backed
> up mails will be new files that bacula hasn't seen before.

Do you mean "new mails" instead of "new files"?

Again, I think I will have to experiment with this. Rotating to a new
mdbox file based on time rather than on the number or size of mails is
an option I have not yet thought of.

>> I thought of limiting the amount of mails inside the mdbox to one, thus
>> of course defeating the benefit of having multiple mails inside one
>> file, but gaining a stable file name over the whole lifetime of a mail
>> which will never change, even if the file is moved to a different folder
>> or its state changes.

> Then you'd want to use sdbox, but that won't decrease the backup time
> compared to maildir, since there's the same number of files.

Correct. This is why I am very interested in using a bundled format such
as mdbox. Right now, I am not able to do real full backups, as this
would take about 30 hours. I am limited to VirtualFull backups using the
accurate option from Bacula, which cuts the daily incremental backup time
to about 2 hours.

>> Problem: I may end up with hundreds of thousands of m.* files inside a
>> user's storage area (Don't ask, we really have this kind of user. And
>> no, they are uneducable about this.), even if the user neatly sorted
>> them into different IMAP folders.

> I don't really understand what you're trying to say with this. m.*
> files anyway aren't folder-specific, all of the user's mails are in
> the same m.* files. And users can't really affect how m.* files are
> created, other than deleting messages all around the mailbox.

Yes, exactly.

Imagine a user with 100 folders with 1000 mails per folder: With one mail
per mdbox file, I'd have 100,000 m.* files in the storage area, if I kind
of abuse mdbox by just allowing one mail per file. Not optimal.
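
The arithmetic behind this, as a quick sketch. The average mail size and
the rotate size are assumptions picked only to have something to compare
against, not measured values from our system:

```python
import math

folders = 100
mails_per_folder = 1000
total_mails = folders * mails_per_folder

# One mail per file (sdbox, or mdbox limited to a single mail per file).
files_one_per_file = total_mails

# Bundled mdbox, using purely illustrative numbers.
avg_mail_size = 50 * 1024        # assumption: 50 KiB per mail
rotate_size = 2 * 1024 * 1024    # assumption: start a new file at 2 MiB
mails_per_file = rotate_size // avg_mail_size
files_bundled = math.ceil(total_mails / mails_per_file)

print(total_mails, files_one_per_file, files_bundled)  # 100000 100000 2500
```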

But this is just a case of having one's cake and eating it too.
(Hopefully got that proverb right.)

Just thinking: can the storage directory for mdbox be hashed? So that
you get, for example,

  <mail location root>/storage/X/Y/m.*

instead of

  <mail location root>/storage/m.*

This way any performance degradation caused by too many files per
directory could be prevented.
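
To make the idea concrete, here is a sketch of how such a layout could
be computed. This is purely hypothetical and does not describe anything
Dovecot does: the hashed_storage_path() helper, the choice of SHA-256
and the example root path are all my own assumptions.

```python
import hashlib
from pathlib import PurePosixPath


def hashed_storage_path(root, filename, levels=2):
    """Return root/storage/X/Y/filename, with X and Y taken from a hash of the name."""
    digest = hashlib.sha256(filename.encode("ascii")).hexdigest()
    # One hex character per level: 16 directories per level, 256 buckets for two levels.
    parts = [digest[i] for i in range(levels)]
    return PurePosixPath(root, "storage", *parts, filename)


# Hypothetical mail location root and file name.
path = hashed_storage_path("/srv/mail/sven", "m.4711")
print(path)
```

With two levels of one hex character each, 100,000 files would spread
over 256 directories, so roughly 400 files per directory on average.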

Grüße,
Sven.

-- 
Sig lost. Core dumped.


