Kyle Wheeler wrote:
On Friday, October 12 at 11:06 AM, quoth Daniel Watts:
What actually ARE the advantages of a 'one file per folder' format??
It depends on the environment. It's exceedingly efficient at storage: on a filesystem with 4k blocks, three 1k messages in a single file take up 1 block (4k), whereas in a one-file-per-message format they take up 3 blocks (12k). Some filesystems have mechanisms for coping with files that occupy only a partial block, but those mechanisms tend to be expensive and are often employed only when the filesystem is strapped for space.

The one-file-per-folder arrangement also helps with sequential reads (searches, loading the folder into memory, processing it with a filter, or whatever else): when the OS spools the file from disk, it loads it a block at a time, and in a one-file-per-folder format each block holds several messages, whereas in a one-file-per-message format it only ever holds a single message.
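The block arithmetic above can be checked with a few lines of Python. This is a simplified model, not a description of any real filesystem: it assumes fixed-size blocks with no tail-packing, and it ignores mbox "From " separator lines, inodes and directory entries.

```python
import math


def blocks_used(sizes, block_size=4096):
    """Return (blocks if all messages share one file, blocks if one file each).

    Simplified model: no tail-packing, no mbox separator lines,
    no inode or directory-entry overhead.
    """
    one_file = math.ceil(sum(sizes) / block_size)
    per_file = sum(math.ceil(s / block_size) for s in sizes)
    return one_file, per_file


one, separate = blocks_used([1024, 1024, 1024])
print(one * 4096, separate * 4096)  # 4096 12288
```

Under the same assumptions, the block counts also approximate how many block reads a full sequential scan of the folder needs in each layout.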
I've often contemplated setting up a separate mbox-based namespace in my Dovecot setup (e.g. everything in the Archive folder is saved as an mbox), just for the space savings.
Thanks for the insights. Is it also true that to read a single message in an 800MB mbox, you need to load 800MB of data into memory, which is then searched for that message? That would suggest that mbox only scales to a relatively small inbox size.
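A naive reader does not have to hold the whole file in memory, but without some kind of index it does have to read through everything that precedes the target message. The sketch below is a hypothetical illustration, not how Dovecot works: `read_nth_message` streams the file line by line, keeps only the wanted message, and reports how many bytes it consumed along the way. It assumes a simplified mbox in which every line starting with `From ` is a separator.

```python
def read_nth_message(path, n):
    """Return (raw_message, bytes_consumed) for the n-th (0-based) message.

    Streams the file line by line, so memory use stays around one message,
    but every byte before the target message still has to be read.
    Simplification: any line starting with b"From " is treated as a
    separator; real mbox handling also deals with ">From " quoting.
    """
    index = -1
    bytes_consumed = 0
    chunks = []
    with open(path, "rb") as f:
        for line in f:
            bytes_consumed += len(line)
            if line.startswith(b"From "):
                index += 1
                if index > n:
                    break  # reached the next message's separator; stop
                continue  # the separator line is not part of the message
            if index == n:
                chunks.append(line)
    if index < n:
        raise IndexError(f"mbox has only {index + 1} messages")
    return b"".join(chunks), bytes_consumed
```

For the last message in a large mbox, `bytes_consumed` is close to the full file size, which is the cost being asked about; memory use, however, stays bounded by the size of one message.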
There are other tactics that could be considered as well.
E.g. splitting by message size: if a message is much smaller than the block size, append it to a shared single-file store; if it is larger, write it out to its own file. Every folder would then have two storage mechanisms, and Dovecot could look at each message as it comes in to decide how to store it.
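The size-based split could look roughly like this. `SplitFolder` is a hypothetical class, not a Dovecot feature, and it uses in-memory stand-ins: `shared` plays the single-file store and `own_files` plays one-file-per-message storage. The cut-off of half a block is an assumed reading of "much smaller than the block size".

```python
class SplitFolder:
    """Hypothetical folder that picks a storage back end per incoming message.

    In-memory stand-ins: `shared` plays the single-file (mbox-like) store,
    `own_files` plays one-file-per-message storage.
    """

    def __init__(self, block_size=4096, threshold=None):
        self.block_size = block_size
        # Assumed cut-off: "much smaller than a block" taken as under half a block.
        self.threshold = block_size // 2 if threshold is None else threshold
        self.shared = bytearray()
        self.own_files = {}
        self._next_id = 0

    def add(self, raw: bytes) -> str:
        """Store one message and report which back end received it."""
        self._next_id += 1
        if len(raw) < self.threshold:
            self.shared += raw  # small: append to the shared file
            return "shared"
        self.own_files[self._next_id] = raw  # large: a file of its own
        return "own-file"


folder = SplitFolder()
folder.add(b"x" * 1000)    # 'shared'
folder.add(b"y" * 50_000)  # 'own-file'
```

The decision is made once, on arrival, so later reads only need to know which back end holds a given message.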
Messages are normally quite small, but attachments are not. One could have a separate attachments directory that stores each attachment as an individual file. This would keep the mbox small, and Dovecot would fetch attachments only as needed and never load them into memory otherwise.
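As a rough illustration of the separate-attachments idea, Python's standard `email` package can pull attachment payloads out of a message before it is appended to the mbox. `externalize_attachments` and the placeholder text are hypothetical choices for this sketch, not anything Dovecot provides. It assumes attachments are stored under their SHA-256 hash, handles only top-level attachments, and leaves nested parts such as forwarded messages in place.

```python
import hashlib
from email import message_from_bytes
from email.policy import default
from pathlib import Path


def externalize_attachments(raw: bytes, attach_dir: Path) -> tuple[bytes, list[Path]]:
    """Move top-level attachment payloads out of a message into separate files.

    Each attachment is written to attach_dir under its SHA-256 hash and
    replaced in the message by a short text/plain placeholder. Returns the
    slimmed-down message and the list of files written.
    """
    msg = message_from_bytes(raw, policy=default)
    attach_dir.mkdir(parents=True, exist_ok=True)
    stored = []
    for part in list(msg.iter_attachments()):
        if part.is_multipart():
            continue  # e.g. a forwarded message/rfc822; left in place
        data = part.get_payload(decode=True)  # undo base64/quoted-printable
        path = attach_dir / hashlib.sha256(data).hexdigest()
        path.write_bytes(data)
        stored.append(path)
        name = part.get_filename() or "unnamed"
        # set_content clears the old Content-* headers and makes the part text/plain.
        part.set_content(f"[attachment {name!r}]\n[stored as {path.name}]\n")
    return msg.as_bytes(), stored
```

A reader would then open the file named in the placeholder only when the attachment is actually requested.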
However, the mbox will inevitably still grow large, and the original problem of "reading a large file to find a single small message" returns, so I remain unconvinced about the scalability of mbox.