[Dovecot] Splitting Folders for Efficiency

Kyle Wheeler kyle-dovecot at memoryhole.net
Thu Nov 1 20:23:25 EET 2007


On Saturday, October 13 at 09:25 AM, quoth Daniel W:
> Thanks for the insights. Is it also true that to read a single 
> message in an 800MB mbox, you need to load 800MB of data into memory 
> which is then searched for that message?

Not at all. If you don't know which message you're looking for, then 
yes, sort of (though you could just mmap the mbox file rather than 
read it all in, which reduces your latency before beginning the 
search). But Maildir has an even worse problem: if you don't know 
which message you're looking for, you have to open and close every 
single message file, and open()/close() typically has quite a bit more 
overhead than lseek(). More to the point, when searching for a message 
in an mbox, the OS has a very good idea of what you're going to look 
at next (linear search is predictable that way), so it can do a much 
better job of prefetching and I/O scheduling for a search through an 
mbox than it can for a Maildir search. Again, mbox wins.
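
As a rough illustration of the two search patterns (a Python sketch, 
not anything Dovecot itself does; the helper names search_mbox and 
search_maildir and the sample messages are made up for the example):

```python
import mmap
import os
import tempfile


def search_mbox(path, needle):
    """Return the byte offset of the first match in the mbox, or -1."""
    with open(path, "rb") as f:
        # One open(), then a single forward scan over the mapped file.
        # Pages are faulted in as the search advances, so nothing has
        # to be read up front.
        with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as m:
            return m.find(needle)


def search_maildir(cur_dir, needle):
    """Return the first filename whose contents match, or None."""
    for name in sorted(os.listdir(cur_dir)):
        # One open()/close() pair per message file.
        with open(os.path.join(cur_dir, name), "rb") as f:
            if needle in f.read():
                return name
    return None


msg1 = b"From a@example.com Thu Nov  1 20:23:25 2007\nSubject: one\n\nhello\n\n"
msg2 = b"From b@example.com Thu Nov  1 20:24:00 2007\nSubject: two\n\nworld\n\n"

with tempfile.TemporaryDirectory() as tmp:
    # The same two messages, once as an mbox and once as a Maildir.
    mbox_path = os.path.join(tmp, "inbox")
    with open(mbox_path, "wb") as f:
        f.write(msg1 + msg2)

    cur = os.path.join(tmp, "cur")
    os.mkdir(cur)
    for name, msg in (("1193941405.1.host", msg1), ("1193941440.2.host", msg2)):
        with open(os.path.join(cur, name), "wb") as f:
            # Maildir files do not carry the mbox "From " separator line.
            f.write(msg.split(b"\n", 1)[1])

    mbox_hit = search_mbox(mbox_path, b"Subject: two")
    maildir_hit = search_maildir(cur, b"Subject: two")
```

With two messages the difference is invisible, of course; the point 
is only the shape of the I/O: one file descriptor and a sequential 
scan versus one open()/close() per message.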

On the other hand, if you know exactly what message you're looking 
for, the necessary I/O is only slightly different. In an mbox, 
"knowing" which message you're looking for is best expressed as an 
offset within the file. Similarly, in a Maildir, "knowing" which 
message you're looking for is best expressed as a filename, or (better 
still, in some cases) an inode number. In an mbox, then, you have to 
open() the file and lseek() to the correct offset (which, in an 
exceedingly large mbox, may require log(sizeoffile) disk accesses to 
begin the first read). In a Maildir, you merely have to open() the 
file; however, rather than dealing with the filesystem's method of 
storing a file, you have to deal with the filesystem's method of 
storing filenames. In fancy filesystems (e.g. ReiserFS, or ext3 with 
hashed directories, i.e. dir_index, turned on), this can be pretty 
fast---on the order of log(numberofmessages)---but in boring 
filesystems (e.g. ext2, ext3 without dir_index, vfat, etc.), where 
a lookup is a linear scan of the directory, this can take a lot of 
time. Between the two, on average, the I/O load is about the same for 
both actions, though the filesystem particulars are what really make 
one or the other a better fit for a given situation.
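
To make the two "known message" paths concrete, here is a minimal 
Python sketch. It assumes some index has already recorded an offset 
and length for the mbox case and a filename for the Maildir case; the 
function names are hypothetical and this is not Dovecot's code.

```python
import os
import tempfile


def _read_exact(fd, n):
    """Read up to n bytes from fd, looping over short reads."""
    chunks = []
    while n > 0:
        chunk = os.read(fd, n)
        if not chunk:
            break
        chunks.append(chunk)
        n -= len(chunk)
    return b"".join(chunks)


def read_from_mbox(path, offset, length):
    """mbox: open() the one big file, lseek() to the message, read it."""
    fd = os.open(path, os.O_RDONLY)
    try:
        os.lseek(fd, offset, os.SEEK_SET)
        return _read_exact(fd, length)
    finally:
        os.close(fd)


def read_from_maildir(cur_dir, filename):
    """Maildir: open() by name (a directory lookup), read the whole file."""
    fd = os.open(os.path.join(cur_dir, filename), os.O_RDONLY)
    try:
        return _read_exact(fd, os.fstat(fd).st_size)
    finally:
        os.close(fd)


msg1 = b"From a@example.com Thu Nov  1 20:23:25 2007\nSubject: one\n\nhello\n\n"
msg2 = b"From b@example.com Thu Nov  1 20:24:00 2007\nSubject: two\n\nworld\n\n"

with tempfile.TemporaryDirectory() as tmp:
    mbox_path = os.path.join(tmp, "inbox")
    with open(mbox_path, "wb") as f:
        f.write(msg1 + msg2)

    cur = os.path.join(tmp, "cur")
    os.mkdir(cur)
    with open(os.path.join(cur, "1193941440.2.host"), "wb") as f:
        f.write(msg2)

    # "Knowing" the message: (offset, length) for mbox, a name for Maildir.
    from_mbox = read_from_mbox(mbox_path, len(msg1), len(msg2))
    from_maildir = read_from_maildir(cur, "1193941440.2.host")
```

Both paths cost one open() and one read; what differs is whether the 
filesystem spends its effort on finding a block within a file or on 
finding a name within a directory.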

The really irritating thing about Maildir is that the filenames can 
change (e.g. when a message's flags change), meaning that "knowing" 
which message you want (i.e. you have a filename) may still mean you 
have to scan through the list of available filenames and see which 
ones are similar to the name you wanted (see why having an inode 
number can be more useful?), which takes MUCH longer than lseek().
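
A small Python sketch of that fallback, assuming the usual Maildir 
naming convention of a unique part followed by ":2," and flag letters, 
and a POSIX filesystem that allows ":" in names. The helpers base_name 
and find_message are hypothetical:

```python
import os
import tempfile


def base_name(filename):
    """Return the part of a Maildir filename that survives flag changes."""
    return filename.split(":2,", 1)[0]


def find_message(cur_dir, remembered):
    """Return the current filename for a remembered one, or None."""
    # Fast path: the name has not changed since we last saw it.
    if os.path.exists(os.path.join(cur_dir, remembered)):
        return remembered
    # Slow path: scan the whole directory for the same unique part.
    wanted = base_name(remembered)
    for name in os.listdir(cur_dir):
        if base_name(name) == wanted:
            return name
    return None


with tempfile.TemporaryDirectory() as tmp:
    cur = os.path.join(tmp, "cur")
    os.mkdir(cur)
    old = "1193941405.M1P2.host:2,"
    new = "1193941405.M1P2.host:2,S"
    with open(os.path.join(cur, old), "wb") as f:
        f.write(b"Subject: one\n\nhello\n")

    unchanged = find_message(cur, old)   # fast path
    # Another client marks the message as seen, which renames the file.
    os.rename(os.path.join(cur, old), os.path.join(cur, new))
    after_rename = find_message(cur, old)  # falls back to the scan
    missing = find_message(cur, "1193999999.M9P9.host:2,")
```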

> That would suggest that mbox is only scalable to a relatively small 
> inbox size.

Not really.

> e.g. Splitting by message size. If a message is much smaller than the 
> block size, use a single file format and if larger, write out to 
> its own file. Every folder would have two mechanisms and Dovecot 
> could just look at each message as it comes in to decide how to 
> store it.

Yes, but then you get to the question of: what does that buy you? And, 
better still: how do you find any given message? Filename+offset? 
You'd be compounding the worst details of both designs. Not only do 
you have to lseek() to find your small message, but you have to pay 
the filename lookup penalty as well---even if you know exactly where 
your message is. On the other hand, you've reduced the cost of both by 
relying on the other: your lseek overhead is lower because you are 
dealing with a smaller file than you'd ordinarily have to, and your 
filename lookup overhead is lower because you've got fewer files. So, 
whether this is a good idea probably, once again, depends very much on 
where the performance curves bend (e.g. if your filesystem gets much 
slower for more than 10,000 files in one directory, or if it gets much 
slower if your file is over 1G, or something like that). If your 
filesystem scales linearly, though, it's not a net gain.
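
For what it's worth, the hybrid scheme might look something like the 
following Python sketch. Everything here is hypothetical (the 4096 
byte threshold, the "small.pack" file name, the store and fetch 
helpers); it only shows that every message ends up needing a 
filename+offset locator, even the ones that live in their own file.

```python
import os
import tempfile

BLOCK_SIZE = 4096  # assumed filesystem block size, used as the threshold


def store(folder, unique, data, block_size=BLOCK_SIZE):
    """Store one message; return a (filename, offset, length) locator."""
    if len(data) < block_size:
        # Small message: append to one shared file, mbox-style.
        name = "small.pack"
        with open(os.path.join(folder, name), "ab") as f:
            offset = f.seek(0, os.SEEK_END)
            f.write(data)
    else:
        # Large message: its own file, Maildir-style.
        name = unique
        offset = 0
        with open(os.path.join(folder, name), "wb") as f:
            f.write(data)
    return (name, offset, len(data))


def fetch(folder, locator):
    """Pay both costs: a filename lookup, then a seek within the file."""
    name, offset, length = locator
    with open(os.path.join(folder, name), "rb") as f:
        f.seek(offset)
        return f.read(length)


small_a = b"a" * 10
small_b = b"b" * 20
big_c = b"c" * 5000

with tempfile.TemporaryDirectory() as tmp:
    loc_a = store(tmp, "msg-a", small_a)
    loc_b = store(tmp, "msg-b", small_b)
    loc_c = store(tmp, "msg-c", big_c)
    round_trip = [fetch(tmp, loc) for loc in (loc_a, loc_b, loc_c)]
    files_on_disk = sorted(os.listdir(tmp))
```

Three messages end up in two files, so the directory is smaller and 
the shared file is smaller than a full mbox would be, but no lookup 
gets to skip either step.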

~Kyle
-- 
Come to me, son of Jor-El. Kneel before Zod. Snootchie-bootchies.
                                                                 -- Jay