On Saturday, October 13 at 09:25 AM, quoth Daniel W:
Thanks for the insights. Is it also true that to read a single message in an 800MB mbox, you need to load 800MB of data into memory which is then searched for that message?
Not at all. If you don't know which message you're looking for, then yes (kinda: you could just mmap the mbox file, which reduces the latency before the search begins), but Maildir has an even worse problem: if you don't know which message you're looking for, you have to open and close every single message file. And open()/close() typically has quite a bit more overhead than lseek(). More to the point, when searching for a message in an mbox, the OS has a very good idea of what you're going to look at next (linear search is predictable that way), so it can do a much better job of prefetching and I/O scheduling for a search through an mbox than it can for a Maildir search. Again, mbox wins.
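To make the mmap point concrete, here is a minimal sketch of a linear search through an mbox without reading the whole file into memory first. The function name find_message_offset is made up for illustration, and the code assumes a plain mboxo-style file in which every message starts with a "From " line; it is not how any particular mail client does it.

```python
import mmap

def find_message_offset(mbox_path, needle):
    """Return the byte offset of the first message containing `needle`
    (a bytes object, e.g. a Message-ID), or -1 if nothing matches.

    Assumes the mbox is non-empty; mmap refuses zero-length files.
    """
    with open(mbox_path, "rb") as f:
        # The kernel pages the file in on demand as the search advances,
        # so nothing is copied into a big buffer up front.
        with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as mm:
            hit = mm.find(needle)
            if hit == -1:
                return -1
            # Walk back to the "From " separator that opens this message.
            sep = mm.rfind(b"\nFrom ", 0, hit)
            return 0 if sep == -1 else sep + 1
```

The access pattern is a single sequential pass over one file, which is the case the OS can prefetch well.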
On the other hand, if you know exactly which message you're looking for, the necessary I/O is only slightly different. In an mbox, "knowing" which message you're looking for is best expressed as an offset within the file. Similarly, in a Maildir, "knowing" which message you're looking for is best expressed as a filename, or (better still, in some cases) an inode number. In an mbox, then, you have to open() the file and lseek() to the correct offset (which, in an exceedingly large mbox, may require on the order of log(size of file) disk accesses before the first read can begin). In a Maildir, you merely have to open() the file; however, rather than dealing with the filesystem's method of storing a file, you have to deal with the filesystem's method of storing filenames. In fancy filesystems (e.g. ReiserFS or ext3 with dir_hashing turned on), this can be pretty fast, on the order of log(number of messages), but in boring filesystems (e.g. ext2, ext3 without dir_hashing, vfat, etc.) it is a linear scan of the directory and can take a lot of time. Between the two, on average, the I/O load is about the same for both actions, though the filesystem particulars are what really make one or the other a better fit for a given situation.
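As a rough sketch, the two "I know exactly which message I want" cases look like this. The helper names are hypothetical, and the code assumes some index has already recorded an (offset, length) pair for the mbox case and a filename under cur/ for the Maildir case.

```python
import os

def read_from_mbox(mbox_path, offset, length):
    """open() the one big file, lseek() to the message, read it."""
    fd = os.open(mbox_path, os.O_RDONLY)
    try:
        os.lseek(fd, offset, os.SEEK_SET)
        # For a regular file this normally returns `length` bytes
        # unless the read runs into end-of-file.
        return os.read(fd, length)
    finally:
        os.close(fd)

def read_from_maildir(maildir_path, filename):
    """open() the message file; the cost is hidden in the name lookup
    the filesystem performs inside the cur/ directory."""
    with open(os.path.join(maildir_path, "cur", filename), "rb") as f:
        return f.read()
```

Both are one open() plus a read; the difference is whether the filesystem spends its time finding a block within a file or finding a name within a directory.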
The really irritating thing about Maildir is that the filenames can
change, meaning that "knowing" which message you want (i.e. you have a
filename) may still mean you have to scan through the list of
available filenames and see which ones are similar to the name you
wanted (see why having an inode number can be more useful?), which
takes MUCH longer than lseek().
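Here is a sketch of what that fallback scan tends to look like. It assumes the usual Maildir layout, where a delivered message starts out in new/ and later lives in cur/ as "unique:2,FLAGS", so the part before the colon stays put while the flags after it change. The function name locate_message is invented for this example.

```python
import os

def locate_message(maildir_path, remembered_name):
    """Return the current path of a message whose filename may have
    changed since we recorded it, or None if it is gone."""
    # Fast path: the name we remembered is still valid.
    for sub in ("cur", "new"):
        exact = os.path.join(maildir_path, sub, remembered_name)
        if os.path.exists(exact):
            return exact
    # Slow path: scan every entry and compare the unique part only.
    unique = remembered_name.split(":", 1)[0]
    for sub in ("cur", "new"):
        with os.scandir(os.path.join(maildir_path, sub)) as entries:
            for entry in entries:
                if entry.name.split(":", 1)[0] == unique:
                    return entry.path
    return None
```

The slow path touches every directory entry, which is exactly the cost an inode number (or an mbox offset) lets you skip.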
That would suggest that mbox is only scalable to a relatively small inbox size.
Not really.
e.g. Splitting by message size. If a message is much smaller than the block size, use a single file format and if larger, write out to its own file. Every folder would have two mechanisms and Dovecot could just look at each message as it comes in to decide how to store it.
Yes, but then you get to the question of: what does that buy you? And, better still: how do you find any given message? Filename+offset? You'd be compounding the worst details of both designs. Not only do you have to lseek() to find your small message, but you have to pay the filename lookup penalty as well, even if you know exactly where your message is. On the other hand, you've reduced the cost of both by relying on the other: your lseek() overhead is lower because you are dealing with a smaller file than you'd ordinarily have to, and your filename lookup overhead is lower because you've got fewer files. So, whether this is a good idea probably, once again, depends very much on where the performance curves bend (e.g. if your filesystem gets much slower for more than 10,000 files in one directory, or if it gets much slower if your file is over 1G, or something like that). If your filesystem scales linearly, though, it's not a net gain.
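For what it's worth, a toy version of the split-by-size idea might look like the sketch below. Everything in it is an assumption made up for illustration (the 4096-byte threshold, the "small.pack" file, the ".msg" suffix, the function names); it is not anything Dovecot actually does. It mainly shows that the locator ends up being filename+offset.

```python
import os

BLOCK_SIZE = 4096  # assumed threshold; a real one would depend on the filesystem

def store(folder, msg_id, data):
    """Store a message and return its (filename, offset) locator."""
    if len(data) < BLOCK_SIZE:
        # Small message: append to one shared file, mbox-style.
        with open(os.path.join(folder, "small.pack"), "ab") as f:
            offset = f.seek(0, os.SEEK_END)
            f.write(data)
        return ("small.pack", offset)
    # Large message: give it a file of its own, Maildir-style.
    name = msg_id + ".msg"
    with open(os.path.join(folder, name), "wb") as f:
        f.write(data)
    return (name, 0)

def load(folder, locator, length):
    """Fetch a message: pays the filename lookup AND the seek."""
    name, offset = locator
    with open(os.path.join(folder, name), "rb") as f:
        f.seek(offset)
        return f.read(length)
```

Note that load() always does both steps, which is the compounding cost described above; whether the smaller file and the shorter directory make up for it depends on the filesystem.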
~Kyle
Come to me, son of Jor-El. Kneel before Zod. Snootchie-bootchies. -- Jay