[Dovecot] Splitting Folders for Efficiency

Sat Oct 13 11:25:41 EEST 2007

Kyle Wheeler wrote:
> On Friday, October 12 at 11:06 AM, quoth Daniel Watts:
>> What actually ARE the advantages of a 'one file per folder' format?? 
> 
> It depends on the environment. It's exceedingly efficient at storage: on 
> a filesystem with 4k blocks, three 1k messages take up 1 block (4k), 
> where in a one-file-per-message format they take up 3 blocks (12k). Some 
> filesystems have mechanisms of coping with files that only occupy a 
> partial block, but those mechanisms tend to be expensive, and are often 
> only employed when strapped for space. The one-file-per-folder 
> arrangement also helps when doing sequential reads (i.e. searches, or 
> loading it into memory, or processing it with a filter, or whatever 
> else): when the OS spools the file from disk, it loads it up a block at 
> a time, which in a one-file-per-folder format is several messages, but 
> in a one-file-per-message format is only ever a single message.
> 
> I've often contemplated setting up a separate mbox-based namespace in my 
> Dovecot setup (e.g. everything in the Archive folder is saved as an 
> mbox), just for the space savings.
> 
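
The block arithmetic quoted above can be checked with a quick sketch. 
The function name and the 4096-byte default are only illustrative, and 
it ignores inode overhead and the mbox "From " separator lines:

```python
import math

def blocks_used(sizes, block_size=4096, per_message=True):
    """Return the number of filesystem blocks taken by messages of the given sizes."""
    if per_message:
        # One file per message: each file is rounded up to whole blocks.
        return sum(math.ceil(size / block_size) for size in sizes)
    # One file per folder: only the combined size is rounded up.
    return math.ceil(sum(sizes) / block_size)

sizes = [1024, 1024, 1024]                     # three 1k messages
print(blocks_used(sizes, per_message=False))   # 1 block  (4k on disk)
print(blocks_used(sizes, per_message=True))    # 3 blocks (12k on disk)
```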

Thanks for the insights. Is it also true that to read a single message 
in an 800MB mbox, you need to load 800MB of data into memory, which is 
then searched for that message? That would suggest that mbox is only 
scalable to a relatively small inbox size.
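
For what it's worth, a full load is not the only way a reader could 
work. Here is a minimal sketch, assuming the server keeps a list of byte 
offsets for each message (I believe Dovecot's index files serve a 
similar purpose, but treat that as my assumption). The function names 
are hypothetical. The mbox is scanned once, line by line, and after that 
a single message can be fetched with a seek:

```python
def build_offset_index(path):
    """Scan the mbox once and record the byte offset of every 'From ' separator."""
    offsets = []
    pos = 0
    prev_blank = True  # the first line of the file may start a message
    with open(path, "rb") as f:
        for line in f:  # buffered line reads; the whole file is never held in memory
            if prev_blank and line.startswith(b"From "):
                offsets.append(pos)
            prev_blank = line in (b"\n", b"\r\n")
            pos += len(line)
    offsets.append(pos)  # sentinel: end of file
    return offsets

def read_message(path, offsets, n):
    """Read only message n (0-based) by seeking straight to its offset."""
    with open(path, "rb") as f:
        f.seek(offsets[n])
        return f.read(offsets[n + 1] - offsets[n])
```

The index would still have to be rebuilt or updated whenever the file 
changes, so this only addresses the read side of the question.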

There are other tactics that could be considered as well.

For example, splitting by message size. If a message is much smaller 
than the block size, append it to a shared single-file store, and if it 
is larger, write it out to its own file. Every folder would have two 
storage mechanisms, and Dovecot could just look at each message as it 
comes in to decide how to store it.
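
The size-based split described above might look roughly like this. It is 
only a sketch of the idea, not anything Dovecot does: the file names 
(small.mbox, large/), the 4096-byte threshold and the function name are 
all made up for illustration.

```python
import os

BLOCK_SIZE = 4096  # assumed filesystem block size

def store_message(folder, msg_id, raw, threshold=BLOCK_SIZE):
    """Append small messages to a shared file; give large ones their own file."""
    os.makedirs(folder, exist_ok=True)
    if len(raw) < threshold:
        # Small message: append to the per-folder shared file.
        path = os.path.join(folder, "small.mbox")
        with open(path, "ab") as f:
            f.write(raw if raw.endswith(b"\n") else raw + b"\n")
            f.write(b"\n")  # blank line between messages
        return path
    # Large message: one file of its own.
    large_dir = os.path.join(folder, "large")
    os.makedirs(large_dir, exist_ok=True)
    path = os.path.join(large_dir, f"{msg_id}.eml")
    with open(path, "wb") as f:
        f.write(raw)
    return path
```

A real implementation would also need locking and some record of which 
store each message ended up in.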

Messages are normally quite small but attachments are not. One could 
have a separate attachments directory that stores files individually. 
This would keep the mbox small and Dovecot would fetch attachments as 
needed and never load them into memory otherwise.

However, the mbox will inevitably still grow large, and the original 
(proposed) problem of "reading a large file to find a single small 
message" returns, so I remain unconvinced about the scalability of 
mbox.