[Dovecot] Splitting Folders for Efficiency
Kyle Wheeler
kyle-dovecot at memoryhole.net
Thu Nov 1 20:23:25 EET 2007
On Saturday, October 13 at 09:25 AM, quoth Daniel W:
> Thanks for the insights. Is it also true that to read a single
> message in an 800MB mbox, you need to load 800MB of data into memory
> which is then searched for that message?
Not at all. If you don't know what message you're looking for, then
yes (kinda: you could just mmap the mbox file, which reduces your
latency before beginning the search), but Maildir has an even worse
problem: if you don't know what message you're looking for, you have
to open and close every single message-file. And open()/close()
typically has quite a bit more overhead than lseek(). More to the
point, when searching for a message in an mbox, the OS has a very good
idea of what you're going to be looking at next (linear search is
predictable that way), so it can do a much better job of prefetching
and I/O scheduling for a search through an mbox than it can for a
Maildir search. Again, mbox wins.
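To make the difference concrete, here is a minimal sketch of the two
searches in Python. The function names and the sample data are made up
for illustration and have nothing to do with Dovecot's actual code: one
function mmaps a single mbox and scans it linearly, the other has to
open and close every message file in a Maildir-style directory.

```python
import mmap
import os
import tempfile

def search_mbox(path, needle):
    """Scan one mbox linearly via mmap; return the byte offset of the
    first match, or -1. (mmap refuses empty files, so path must be non-empty.)"""
    with open(path, "rb") as f:
        with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as m:
            return m.find(needle)

def search_maildir(dirpath, needle):
    """Open and close every message file in turn; return the first
    filename whose contents include needle, or None."""
    for name in sorted(os.listdir(dirpath)):
        with open(os.path.join(dirpath, name), "rb") as f:
            if needle in f.read():
                return name
    return None

def demo_search():
    """Build a tiny throwaway mbox and Maildir-like directory, then search both."""
    messages = [b"Subject: one\n\nfirst body\n", b"Subject: two\n\nsecond body\n"]
    with tempfile.TemporaryDirectory() as root:
        mbox = os.path.join(root, "inbox.mbox")
        with open(mbox, "wb") as f:
            for msg in messages:
                f.write(b"From sender@example.invalid Thu Nov  1 20:23:25 2007\n")
                f.write(msg + b"\n")
        maildir = os.path.join(root, "cur")
        os.mkdir(maildir)
        for i, msg in enumerate(messages):
            with open(os.path.join(maildir, "msg%d" % i), "wb") as f:
                f.write(msg)
        return (search_mbox(mbox, b"second body"),
                search_maildir(maildir, b"second body"))
```

The mbox search costs one open() plus a sequential read that the OS can
prefetch; the directory search costs one open()/close() pair per message.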
On the other hand, if you know exactly what message you're looking
for, the necessary I/O is only slightly different. In an mbox,
"knowing" which message you're looking for is best expressed as an
offset within the file. Similarly, in a Maildir, "knowing" which
message you're looking for is best expressed as a filename, or (better
still, in some cases) an inode number. In an mbox, then, you have to
open() the file and lseek() to the correct offset (which, in an
exceedingly large mbox, may require log(sizeoffile) disk accesses to
begin the first read). In a Maildir, you merely have to open() the
file; however, rather than dealing with the filesystem's method of
storing a file, you have to deal with the filesystem's method of
storing filenames. In fancy filesystems (e.g. ReiserFS or ext3 with
dir_index turned on), this can be pretty fast, on the order of
log(numberofmessages), but in boring filesystems (e.g. ext2, ext3
without dir_index, vfat, etc.) this can take a lot of time. Between
the two, on average, the I/O load is about the same for both actions,
though the filesystem particulars are what really make one or the
other a better fit for a given situation.
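As a rough sketch of the "known message" case, assuming you have
already recorded an (offset, length) pair for the mbox and a filename
for the Maildir (the helper names and the tiny index here are
hypothetical, not anything Dovecot provides):

```python
import os
import tempfile

def read_from_mbox(path, offset, length):
    """One open(), one lseek() to the known offset, then read the message."""
    with open(path, "rb", buffering=0) as f:  # unbuffered, so seek() is a plain lseek()
        f.seek(offset)
        return f.read(length)

def read_from_maildir(dirpath, filename):
    """One open() by name; the cost hides in the directory lookup."""
    with open(os.path.join(dirpath, filename), "rb") as f:
        return f.read()

def demo_fetch():
    """Write two messages both ways and fetch the second one from each store."""
    messages = [b"Subject: one\n\nfirst body\n", b"Subject: two\n\nsecond body\n"]
    with tempfile.TemporaryDirectory() as root:
        mbox = os.path.join(root, "inbox.mbox")
        index = []  # (offset, length) of each message body within the mbox
        with open(mbox, "wb") as f:
            for msg in messages:
                f.write(b"From sender@example.invalid Thu Nov  1 20:23:25 2007\n")
                index.append((f.tell(), len(msg)))
                f.write(msg + b"\n")
        maildir = os.path.join(root, "cur")
        os.mkdir(maildir)
        with open(os.path.join(maildir, "msg1"), "wb") as f:
            f.write(messages[1])
        offset, length = index[1]
        return (read_from_mbox(mbox, offset, length),
                read_from_maildir(maildir, "msg1"))
```

Both paths end up doing a single open() and a single read; what differs
is whether the filesystem work goes into finding a block within a file
or finding a name within a directory.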
The really irritating thing about Maildir is that the filenames can
change, meaning that "knowing" which message you want (i.e. you have a
filename) may still mean you have to scan through the list of
available filenames and see which ones are similar to the name you
wanted (see why having an inode number can be more useful?), which
takes MUCH longer than lseek().
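A minimal sketch of that fallback, assuming the usual Maildir
convention that the part of the filename before the colon stays fixed
while the ":2,<flags>" suffix changes when flags change. The function
name and the sample filename are made up for illustration.

```python
import os
import tempfile

def find_message(dirpath, remembered_name):
    """Return the current filename for a message, or None if it is gone.

    Tries the remembered name first; if the file was renamed (e.g. a
    flag was added), falls back to scanning the whole directory for an
    entry with the same unique part before the colon.
    """
    if os.path.exists(os.path.join(dirpath, remembered_name)):
        return remembered_name
    unique = remembered_name.split(":", 1)[0]
    for name in os.listdir(dirpath):  # the expensive part
        if name.split(":", 1)[0] == unique:
            return name
    return None

def demo_rename():
    """Deliver a message, mark it Seen (which renames it), then look up the old name."""
    with tempfile.TemporaryDirectory() as root:
        old = "1193941405.M1P2.host:2,"
        new = "1193941405.M1P2.host:2,S"
        with open(os.path.join(root, old), "wb") as f:
            f.write(b"Subject: hello\n\nbody\n")
        os.rename(os.path.join(root, old), os.path.join(root, new))
        return find_message(root, old)
```

The exact-name check is a single lookup; the fallback touches every
entry in the directory, which is where the time goes.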
> That would suggest that mbox is only scalable to a relatively small
> inbox size.
Not really.
> eg. Splitting by message size. If a message is much smaller than the
> block size, use a single file format and if larger, write out to
> its own file. Every folder would have two mechanisms and Dovecot
> could just look at each message as it comes in to decide how to
> store it.
Yes, but then you get to the question of: what does that buy you? And,
better still: how do you find any given message? Filename+offset?
You'd be compounding the worst details of both designs. Not only do
you have to lseek() to find your small message, but you have to pay
the filename lookup penalty as well, even if you know exactly where
your message is. On the other hand, you've reduced the cost of both by
relying on the other: your lseek overhead is lower because you are
dealing with a smaller file than you'd ordinarily have to, and your
filename lookup overhead is lower because you've got fewer files. So,
whether this is a good idea probably, once again, depends very much on
where the performance curves bend (e.g. if your filesystem gets much
slower for more than 10,000 files in one directory, or if it gets much
slower if your file is over 1GB, or something like that). If your
filesystem scales linearly, though, it's not a net gain.
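For illustration only, here is a toy sketch of such a split store. The
class name, the "packed.mbox" file, the "large-N" naming and the 4096
byte threshold are all assumptions of mine, not anything Dovecot
implements. The point is the locator: every message, small or large,
now needs a filename and an offset to be found again.

```python
import os
import tempfile

BLOCK_SIZE = 4096  # assumed filesystem block size used as the split threshold

class HybridStore:
    """Toy store: small messages share one file, large ones get their own."""

    def __init__(self, root, threshold=BLOCK_SIZE):
        self.root = root
        self.threshold = threshold
        self.packed = "packed.mbox"  # hypothetical name for the shared file
        self.counter = 0

    def deliver(self, message):
        """Store a message and return its locator: (filename, offset, length)."""
        if len(message) < self.threshold:
            with open(os.path.join(self.root, self.packed), "ab") as f:
                offset = f.seek(0, os.SEEK_END)  # current end of the shared file
                f.write(message)
            return (self.packed, offset, len(message))
        self.counter += 1
        name = "large-%d" % self.counter
        with open(os.path.join(self.root, name), "wb") as f:
            f.write(message)
        return (name, 0, len(message))

    def fetch(self, locator):
        """Pay both costs: a filename lookup (open) and then a seek."""
        name, offset, length = locator
        with open(os.path.join(self.root, name), "rb") as f:
            f.seek(offset)
            return f.read(length)

def demo_hybrid():
    """Deliver two small messages and one large one, then fetch two back."""
    with tempfile.TemporaryDirectory() as root:
        store = HybridStore(root)
        small_a = store.deliver(b"a" * 100)
        small_b = store.deliver(b"b" * 200)
        large = store.deliver(b"c" * 10000)
        return small_a, small_b, large, store.fetch(small_b), store.fetch(large)
```

Whether the smaller shared file and the shorter directory make up for
paying both lookups is exactly the filesystem-dependent question above.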
~Kyle
--
Come to me, son of Jor-El. Kneel before Zod. Snootchie-bootchies.
-- Jay