[Dovecot] Splitting Folders for Efficiency

Daniel Watts

11 Oct 2007

11 a.m.

Dear Timo,

Would there be any sense in giving Dovecot the option to split folders into multiple subfolders when they reached a specified size (probably message count) limit?

Dovecot would monitor folders and when they reached, say, 10,000 messages, silently split the folder on the filesystem to ensure that access remains fast.

I know that Dovecot scales very well but this would give practically unlimited storage capability and also keep things fast. You could even have it so that the latest 100 messages are kept in their own folder for fast access.

.Folder.new .Folder.cur .Folder.tmp

could become:

.Folder__1.new .Folder__1.cur .Folder__1.tmp and .Folder__2.new .Folder__2.cur .Folder__2.tmp

with Dovecot merging them before display as just "Folder" within the mail client.
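A minimal Python sketch of the proposed mechanism, assuming a Maildir++-style layout in which each shard is a directory `.Folder__N` with its own `new/`, `cur/` and `tmp/`. Dovecot has no such option. `SHARD_LIMIT`, `shard_dirs`, `shard_for_new_message` and `merged_listing` are hypothetical names, and only `cur/` is counted because that is the directory that grows. New mail goes into the last shard until it holds `limit` files; the client-visible "Folder" is the concatenation of every shard's listing.

```python
import os

# Hypothetical: Dovecot has no such option. The limit and the "__N" naming
# follow the proposal above.
SHARD_LIMIT = 10_000


def shard_dirs(root: str, folder: str) -> list[str]:
    """Existing shard directories (.Folder__1, .Folder__2, ...) in order."""
    shards = []
    n = 1
    while os.path.isdir(os.path.join(root, f".{folder}__{n}")):
        shards.append(os.path.join(root, f".{folder}__{n}"))
        n += 1
    return shards


def shard_for_new_message(root: str, folder: str, limit: int = SHARD_LIMIT) -> str:
    """Shard a new message should go into; starts a new shard when the last is full."""
    shards = shard_dirs(root, folder)
    if shards and len(os.listdir(os.path.join(shards[-1], "cur"))) < limit:
        return shards[-1]
    new_shard = os.path.join(root, f".{folder}__{len(shards) + 1}")
    for sub in ("new", "cur", "tmp"):
        os.makedirs(os.path.join(new_shard, sub))
    return new_shard


def merged_listing(root: str, folder: str) -> list[str]:
    """What the client would see as one "Folder": every shard's cur/, in order."""
    names = []
    for shard in shard_dirs(root, folder):
        names.extend(sorted(os.listdir(os.path.join(shard, "cur"))))
    return names
```

Note that `merged_listing` has to walk every shard on each listing, which is the overhead Chris Laif raises later in the thread.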

This could be further extended so that Dovecot could be configured to store 'old' message folders in a separate location. We could then have slower+cheaper+larger storage mounted so that 'old mail' does not take up the expensive local SCSI disks on the machine. Mail from 2 years ago is much less likely to be accessed than mail from the last week.

This would provide very neat behind-the-scenes archiving functionality.

Looking forward to hearing your thoughts.

Best, Daniel

-- Squirrelmail Stable 1.4.8 (and developing on 1.5.2) PHP 5.x Hardened with Eaccelerator Apache 2.x Mysql 5.0.x Imapproxy over Dovecot 1.0.rc27 with Maildir all running on Gentoo Linux for ~5,000 users.


Curtis Maloney

12 Oct

2:48 a.m.

Daniel Watts wrote:

...

Dear Timo,

Would there be any sense in giving Dovecot the option to split folders into multiple subfolders when they reached a specified size (probably message count) limit?

My understanding is this is partially covered in Timo's "dbox" format, which tries to take the best features of mbox and Maildir.

...

.Folder.new .Folder.cur .Folder.tmp

could become:

.Folder__1.new .Folder__1.cur .Folder__1.tmp and .Folder__2.new .Folder__2.cur .Folder__2.tmp

You would only need to split "cur", unless you expect someone to have over 10,000 new messages waiting. "tmp" is only used _whilst_ messages are being delivered, so mail clients don't see a partially written message.
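The tmp-then-new handoff described here can be sketched as follows. This is a simplified illustration of the usual Maildir delivery sequence, not Dovecot's delivery code; the unique-name format built in `deliver` is an assumption, and real delivery agents add more entropy to avoid name collisions.

```python
import os
import socket
import time


def deliver(maildir: str, message: bytes) -> str:
    """Deliver one message Maildir-style and return its final path."""
    # Simplified unique name (time.pid.host); "/" and ":" in the hostname are
    # escaped as the Maildir convention requires.
    host = socket.gethostname().replace("/", "\\057").replace(":", "\\072")
    unique = f"{time.time_ns()}.P{os.getpid()}.{host}"
    tmp_path = os.path.join(maildir, "tmp", unique)
    new_path = os.path.join(maildir, "new", unique)
    # 1. Write and fsync the whole message while it lives in tmp/, where
    #    mail clients never look.
    with open(tmp_path, "wb") as f:
        f.write(message)
        f.flush()
        os.fsync(f.fileno())
    # 2. rename() within one filesystem is atomic, so a client scanning new/
    #    sees either nothing or the complete message, never a partial one.
    os.rename(tmp_path, new_path)
    return new_path
```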

...

This could be further extended so that Dovecot could be configured to store 'old' message folders in a separate location. We could then have slower+cheaper+larger storage mounted so that 'old mail' does not take up the expensive local SCSI disks on the machine. Mail from 2 years ago is much less likely to be accessed than mail from the last week.

Also, instead of __N, you could try a different path, so /foo/bar/User/ is for new mail, and /old/slow/disk/User is for older stuff.

...

This would provide very neat behind-the-scenes archiving functionality.

There are really two ideas here: one is the mechanism of multi-directory folders, the other is the policy of separating by age.

-- Curtis Maloney cmaloney@cardgate.net

Daniel Watts

12:06 p.m.

Curtis Maloney wrote:

...

Daniel Watts wrote:

...
Dear Timo,

Would there be any sense in giving Dovecot the option to split folders into multiple subfolders when they reached a specified size (probably message count) limit?

My understanding is this is partially covered in Timo's "dbox" format, which tries to take the best features of mbox and Maildir.

Is dbox production-ready? It looks interesting, but this page says it is not finished: http://wiki.dovecot.org/MailboxFormat/dbox

What actually ARE the advantages of a 'one file per folder' format?? We switched to Maildir because mbox was killing our server, and I wouldn't ever switch back. The only advantage I can see is perhaps faster search, since you don't have to open lots of files, but for that I reckon it would be best to keep a separate index of the content. Dreams of offering a Google-like IMAP search function, anyone? =) Are there any (preferably open source) products out there for this?
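A toy version of the "separate index of content" idea: an inverted index mapping each word to the IDs of the messages that contain it, so a search consults the index instead of opening every message file. `ContentIndex` is a hypothetical illustration only; it does no stemming, persistence, or MIME decoding, and is not any Dovecot component.

```python
import re
from collections import defaultdict


class ContentIndex:
    """Toy inverted index: word -> set of message IDs."""

    def __init__(self) -> None:
        self._postings: dict[str, set[str]] = defaultdict(set)

    @staticmethod
    def _words(text: str) -> set[str]:
        # Lowercased alphanumeric runs; a real index would tokenize far better.
        return set(re.findall(r"[a-z0-9]+", text.lower()))

    def add(self, msg_id: str, text: str) -> None:
        for word in self._words(text):
            self._postings[word].add(msg_id)

    def search(self, query: str) -> set[str]:
        """IDs of messages containing every word of the query."""
        words = self._words(query)
        if not words:
            return set()
        result = None
        for word in words:
            ids = self._postings.get(word, set())  # .get() does not grow the defaultdict
            result = ids.copy() if result is None else result & ids
        return result
```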

...

...
.Folder.new .Folder.cur .Folder.tmp

could become:

.Folder__1.new .Folder__1.cur .Folder__1.tmp and .Folder__2.new .Folder__2.cur .Folder__2.tmp

You would only need to split "cur", unless you expect someone to have over 10,000 new messages waiting. "tmp" is only used _whilst_ messages are being delivered, so mail clients don't see a partially written message.

Ah yes, this is true.

...

...
This could be further extended so that Dovecot could be configured to store 'old' message folders in a separate location. We could then have slower+cheaper+larger storage mounted so that 'old mail' does not take up the expensive local SCSI disks on the machine. Mail from 2 years ago is much less likely to be accessed than mail from the last week.

Also, instead of __N, you could try a different path, so /foo/bar/User/ is for new mail, and /old/slow/disk/User is for older stuff.

Ah yes, and if it is on the same disk it could just be $HOME/Maildir/cur and $HOME/Maildir/old/cur.

...

...
This would provide very neat behind-the-scenes archiving functionality.

There are really two ideas here: one is the mechanism of multi-directory folders, the other is the policy of separating by age.

Ideally there would be a few limits set by the system admin:

- Min age of mail
- Max age of mail
- Min number of messages
- Max number of messages

You can then split by either volume or age and control how many emails to keep in 'fast' storage as a minimum, e.g. always keep the most recent 50 emails in local storage, regardless of age.

Dan
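One possible reading of these four limits, sketched in Python. The precedence rules are assumptions: the newest `min_keep` messages and anything younger than `min_age_days` always stay on fast storage; any other message moves to slow storage if it is older than `max_age_days` or would push the fast-storage count past `max_keep`. `TierPolicy` and `messages_to_archive` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass
class TierPolicy:
    min_age_days: float  # never archive mail younger than this
    max_age_days: float  # archive unprotected mail older than this
    min_keep: int        # always keep this many newest messages on fast storage
    max_keep: int        # keep at most this many on fast storage, unless protected


def messages_to_archive(ages_days: dict[str, float], policy: TierPolicy) -> list[str]:
    """IDs of messages to move to slow storage, oldest first."""
    newest_first = sorted(ages_days, key=lambda msg_id: ages_days[msg_id])
    to_move = []
    for rank, msg_id in enumerate(newest_first):
        age = ages_days[msg_id]
        protected = rank < policy.min_keep or age < policy.min_age_days
        too_old = age > policy.max_age_days
        too_many = rank >= policy.max_keep
        if not protected and (too_old or too_many):
            to_move.append(msg_id)
    return list(reversed(to_move))


policy = TierPolicy(min_age_days=7, max_age_days=365, min_keep=2, max_keep=3)
ages = {"a": 1, "b": 10, "c": 400, "d": 800, "e": 900}
print(messages_to_archive(ages, policy))  # ['e', 'd', 'c']
```

Raising `min_keep` to 3 in this example would keep "c" local as well, which is the "always keep the most recent N" behaviour described above.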

Kyle Wheeler

13 Oct

6:53 a.m.

On Friday, October 12 at 11:06 AM, quoth Daniel Watts:

...

What actually ARE the advantages of a 'one file per folder' format??

It depends on the environment. It's exceedingly efficient at storage: on a filesystem with 4k blocks, three 1k messages take up 1 block (4k), where in a one-file-per-message format they take up 3 blocks (12k). Some filesystems have mechanisms of coping with files that only occupy a partial block, but those mechanisms tend to be expensive, and are often only employed when strapped for space. The one-file-per-folder arrangement also helps when doing sequential reads (i.e. searches, or loading it into memory, or processing it with a filter, or whatever else): when the OS spools the file from disk, it loads it up a block at a time, which in a one-file-per-folder format is several messages, but in a one-file-per-message format is only ever a single message.
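The block arithmetic above, as a quick calculation. `allocated_bytes` is a simplification: it rounds each file up to whole 4 KiB blocks and ignores inode/metadata overhead, mbox separator lines, and tail-packing features.

```python
def allocated_bytes(sizes: list[int], block: int = 4096) -> int:
    """Disk space used when each size is stored as a separate file.

    Each file is rounded up to whole blocks; metadata and tail packing are ignored.
    """
    return sum(-(-size // block) * block for size in sizes)  # integer ceiling


messages = [1024, 1024, 1024]            # three 1k messages
print(allocated_bytes(messages))         # one file per message: 12288 (3 blocks)
print(allocated_bytes([sum(messages)]))  # one file per folder: 4096 (1 block)
```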

I've often contemplated setting up a separate mbox-based namespace in my Dovecot setup (e.g. everything in the Archive folder is saved as an mbox), just for the space savings.

~Kyle

Only the fool hopes to repeat an experience; the wise man knows that every experience is to be viewed as a blessing. -- Henry Miller

Daniel W

10:25 a.m.

Kyle Wheeler wrote:

...

On Friday, October 12 at 11:06 AM, quoth Daniel Watts:

...
What actually ARE the advantages of a 'one file per folder' format??

It depends on the environment. It's exceedingly efficient at storage: on a filesystem with 4k blocks, three 1k messages take up 1 block (4k), where in a one-file-per-message format they take up 3 blocks (12k). Some filesystems have mechanisms of coping with files that only occupy a partial block, but those mechanisms tend to be expensive, and are often only employed when strapped for space. The one-file-per-folder arrangement also helps when doing sequential reads (i.e. searches, or loading it into memory, or processing it with a filter, or whatever else): when the OS spools the file from disk, it loads it up a block at a time, which in a one-file-per-folder format is several messages, but in a one-file-per-message format is only ever a single message.

I've often contemplated setting up a separate mbox-based namespace in my Dovecot setup (e.g. everything in the Archive folder is saved as an mbox), just for the space savings.

Thanks for the insights. Is it also true that to read a single message in an 800MB mbox, you need to load 800MB of data into memory, which is then searched for that message? That would suggest that mbox is only scalable to a relatively small inbox size.

There are other tactics that could be considered as well.

E.g. splitting by message size: if a message is much smaller than the block size, use a single-file format, and if it is larger, write it out to its own file. Every folder would have both mechanisms, and Dovecot could just look at each message as it comes in to decide how to store it.
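A sketch of the routing rule this idea implies, assuming the decision is based on how much of its allocation a message would waste if stored as its own file. The 50% threshold, the labels "shared-file" and "own-file", and both function names are illustrative assumptions, not anything Dovecot implements.

```python
def slack_fraction(size: int, block: int = 4096) -> float:
    """Fraction of its allocation a message would waste stored as its own file."""
    allocated = max(1, -(-size // block)) * block  # round up to whole blocks
    return (allocated - size) / allocated


def storage_for(size: int, block: int = 4096, max_slack: float = 0.5) -> str:
    """Hypothetical rule: share a file when a file of its own would be mostly slack."""
    return "shared-file" if slack_fraction(size, block) > max_slack else "own-file"


print(storage_for(1_024))    # shared-file (3/4 of the block would be empty)
print(storage_for(100_000))  # own-file (about 2% slack)
```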

Messages are normally quite small but attachments are not. One could have a separate attachments directory that stores files individually. This would keep the mbox small and Dovecot would fetch attachments as needed and never load them into memory otherwise.

However, the mbox will still inevitably grow large, and the original (proposed) problem of "reading a large file to find a single small message" returns, so I remain unconvinced about the scalability of mbox.

Richard Laager

14 Oct

1:13 a.m.

On Sat, 2007-10-13 at 09:25 +0100, Daniel W wrote:

...

Is it also true that to read a single message in an 800MB mbox, you need to load 800MB of data into memory, which is then searched for that message?

Of course not! That's what an index is for.

Richard
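The point about indexes, sketched under simplifying assumptions: scan the mbox once to record each message's byte offset and length, then fetch any message with one seek and one read. This assumes each message starts at a `From ` line preceded by a blank line (with body lines escaped as `>From `). Dovecot's real index files store far more than this, and `build_mbox_index` and `read_message` are hypothetical helpers.

```python
import os


def build_mbox_index(path: str) -> list[tuple[int, int]]:
    """Scan an mbox once and return (offset, length) for each message."""
    starts = []
    offset = 0
    prev_blank = True  # the file start counts as "after a blank line"
    with open(path, "rb") as f:
        for line in f:
            if prev_blank and line.startswith(b"From "):
                starts.append(offset)
            prev_blank = line in (b"\n", b"\r\n")
            offset += len(line)
    ends = starts[1:] + [offset]
    return [(start, end - start) for start, end in zip(starts, ends)]


def read_message(path: str, entry: tuple[int, int]) -> bytes:
    """Fetch one message with a single seek + read, using a saved index entry."""
    offset, length = entry
    with open(path, "rb") as f:
        f.seek(offset)
        return f.read(length)
```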

Kyle Wheeler

1 Nov

7:23 p.m.

On Saturday, October 13 at 09:25 AM, quoth Daniel W:

...

Thanks for the insights. Is it also true that to read a single message in an 800MB mbox, you need to load 800MB of data into memory, which is then searched for that message?

Not at all. If you don't know which message you're looking for, then yes (kinda: you could just mmap the mbox file, which reduces your latency before beginning the search), but Maildir has an even worse problem: if you don't know which message you're looking for, you have to open and close every single message file. And open()/close() typically has quite a bit more overhead than lseek(). More to the point, when searching through an mbox, the OS has a very good idea of what you're going to be looking at next (linear search is predictable that way), so it can do a much better job of prefetching and I/O scheduling for a search through an mbox than it can for a Maildir search. Again, mbox wins.
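The two search patterns compared here, as a Python sketch: one `open()` plus a linear scan of a memory-mapped mbox, versus one `open()`/`close()` pair per Maildir file. The function names are illustrative; mapping mbox match offsets back to individual messages would additionally need an offset index.

```python
import mmap
import os


def mbox_match_offsets(path: str, needle: bytes) -> list[int]:
    """Byte offsets of every occurrence of needle: one open(), one linear scan."""
    if not needle or os.path.getsize(path) == 0:
        return []  # nothing to find, and mmap cannot map an empty file
    offsets = []
    with open(path, "rb") as f, mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as m:
        pos = m.find(needle)
        while pos != -1:
            offsets.append(pos)
            pos = m.find(needle, pos + 1)
    return offsets


def maildir_matches(cur_dir: str, needle: bytes) -> list[str]:
    """Names of message files containing needle: one open()/close() per message."""
    hits = []
    with os.scandir(cur_dir) as entries:
        for entry in entries:
            if entry.is_file():
                with open(entry.path, "rb") as f:
                    if needle in f.read():
                        hits.append(entry.name)
    return sorted(hits)
```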

On the other hand, if you know exactly which message you're looking for, the necessary I/O is only slightly different. In an mbox, "knowing" which message you're looking for is best expressed as an offset within the file. Similarly, in a Maildir, "knowing" which message you're looking for is best expressed as a filename, or (better still, in some cases) an inode number. In an mbox, then, you have to open() the file and lseek() to the correct offset (which, in an exceedingly large mbox, may require log(file size) disk accesses to begin the first read). In a Maildir, you merely have to open() the file; however, rather than dealing with the filesystem's method of storing a file, you have to deal with the filesystem's method of storing filenames. In fancy filesystems (e.g. ReiserFS, or ext3 with hashed directory indexes (dir_index) turned on), this can be pretty fast, on the order of log(number of messages), but in boring filesystems (e.g. ext2, ext3 without directory indexing, vfat, etc.) this can take a lot of time. Between the two, on average, the I/O load is about the same for both actions, though the filesystem particulars are what really make one or the other a better fit for a given situation.

The really irritating thing about Maildir is that the filenames can change, meaning that "knowing" which message you want (i.e. you have a filename) may still mean you have to scan through the list of available filenames and see which ones are similar to the name you wanted (see why having an inode number can be more useful?), which takes MUCH longer than lseek().
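Why a known Maildir filename can still require a directory scan: the part after `:2,` holds the message flags and changes whenever they change (e.g. `:2,S` becomes `:2,RS` after a reply), so only the part before the colon is stable. A minimal lookup sketch, assuming the standard `:` separator (some platforms use a different character); `base_name` and `find_message` are hypothetical helpers.

```python
import os
from typing import Optional


def base_name(filename: str) -> str:
    """Stable part of a Maildir filename: everything before the ":2,FLAGS" info."""
    return filename.split(":", 1)[0]


def find_message(maildir: str, unique: str) -> Optional[str]:
    """Find a message by its stable base name, whatever its current flags are.

    The full filename cannot be trusted (flags are part of it), so this lists
    new/ and cur/ and compares base names: a directory scan, not a seek.
    """
    for sub in ("new", "cur"):
        directory = os.path.join(maildir, sub)
        for name in os.listdir(directory):
            if base_name(name) == unique:
                return os.path.join(directory, name)
    return None
```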

...

That would suggest that mbox is only scalable to a relatively small inbox size.

Not really.

...

E.g. splitting by message size: if a message is much smaller than the block size, use a single-file format, and if it is larger, write it out to its own file. Every folder would have both mechanisms, and Dovecot could just look at each message as it comes in to decide how to store it.

Yes, but then you get to the question of: what does that buy you? And, better still: how do you find any given message? Filename+offset? You'd be compounding the worst details of both designs. Not only do you have to lseek() to find your small message, but you have to pay the filename lookup penalty as well, even if you know exactly where your message is. On the other hand, you've reduced the cost of both by relying on the other: your lseek overhead is lower because you are dealing with a smaller file than you'd ordinarily have to, and your filename lookup overhead is lower because you've got fewer files. So, whether this is a good idea probably, once again, depends very much on where the performance curves bend (e.g. if your filesystem gets much slower for more than 10,000 files in one directory, or if it gets much slower if your file is over 1G, or something like that). If your filesystem scales linearly, though, it's not a net gain.

~Kyle

Come to me, son of Jor-El. Kneel before Zod. Snootchie-bootchies. -- Jay

Chris Laif

12 Oct

1:40 p.m.

On 10/11/07, Daniel Watts <d@nielwatts.com> wrote:

...

Dear Timo,

Would there be any sense in giving Dovecot the option to split folders into multiple subfolders when they reached a specified size (probably message count) limit?

Many modern file systems offer the possibility to use optimized directory indexes. Listing these directories scales very well. Splitting files into subdirectories would have a negative effect: You have to walk through every directory and merge all file names into one data table.

Chris

Daniel Watts

2:11 p.m.

Chris Laif wrote:

...

On 10/11/07, Daniel Watts <d@nielwatts.com> wrote:

...
Dear Timo,

Would there be any sense in giving Dovecot the option to split folders into multiple subfolders when they reached a specified size (probably message count) limit?

Many modern file systems offer the possibility to use optimized directory indexes. Listing these directories scales very well. Splitting files into subdirectories would have a negative effect: You have to walk through every directory and merge all file names into one data table.

Chris

That is true. But it still leaves the motivation of being able to store rarely accessed 'old' mail in a separate, perhaps remote, location, which I can see being valuable. Even though storage is pretty cheap, expensive disks... are not cheap =)

Timo Sirainen

20 Oct

11:18 p.m.

On Thu, 2007-10-11 at 10:00 +0100, Daniel Watts wrote:

...

.Folder__1.new .Folder__1.cur .Folder__1.tmp and .Folder__2.new .Folder__2.cur .Folder__2.tmp

with Dovecot merging them before display as just "Folder" within the mail client.

Virtual folders would enable this, if they're implemented one day.

...

This could be further extended so that Dovecot could be configured to store 'old' message folders in a separate location. We could then have slower+cheaper+larger storage mounted so that 'old mail' does not take up the expensive local SCSI disks on the machine. Mail from 2 years ago is much less likely to be accessed than mail from the last week.

The dbox format will support this soon: you can configure two (or more) directories for it, and Dovecot will then look up the mail files in each of them in order. It would also support automatically moving non-recently accessed mails to the slower dirs.

The current dbox implementation in v1.1 supports only one-message-per-file mode so it's quite similar to maildir. The main problem with implementing fast/slow storage for maildir is that the maildir filenames change all the time, so it would waste the slow storage's I/O all the time when trying to figure out if a file is there or not. dbox doesn't have this problem.
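The ordered lookup described here, sketched in Python under the assumption that a dbox mail file keeps one stable name for its whole life (the property contrasted with Maildir above). This is not Dovecot's code, and the file names used in the usage comment are hypothetical.

```python
import os
from typing import Optional, Sequence


def locate(mail_file: str, storage_dirs: Sequence[str]) -> Optional[str]:
    """Return the path of the first copy of mail_file found, fast storage first."""
    for root in storage_dirs:
        candidate = os.path.join(root, mail_file)
        # Because the name never changes, a miss costs one failed stat()
        # rather than a directory scan on the slow disk.
        if os.path.exists(candidate):
            return candidate
    return None


# Hypothetical usage: locate("u.42", [fast_dir, slow_dir])
```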

Daniel Watts

25 Jun 2009

12:17 p.m.

Timo Sirainen wrote:

...

On Thu, 2007-10-11 at 10:00 +0100, Daniel Watts wrote:

...
.Folder__1.new .Folder__1.cur .Folder__1.tmp and .Folder__2.new .Folder__2.cur .Folder__2.tmp

with Dovecot merging them before display as just "Folder" within the mail client.

Virtual folders would enable this, if they're implemented one day.

...
This could be further extended so that Dovecot could be configured to store 'old' message folders in a separate location. We could then have slower+cheaper+larger storage mounted so that 'old mail' does not take up the expensive local SCSI disks on the machine. Mail from 2 years ago is much less likely to be accessed than mail from the last week.

The dbox format will support this soon: you can configure two (or more) directories for it, and Dovecot will then look up the mail files in each of them in order. It would also support automatically moving non-recently accessed mails to the slower dirs.

The current dbox implementation in v1.1 supports only one-message-per-file mode so it's quite similar to maildir. The main problem with implementing fast/slow storage for maildir is that the maildir filenames change all the time, so it would waste the slow storage's I/O all the time when trying to figure out if a file is there or not. dbox doesn't have this problem.

Hi Timo! Digging up this thread from 2007. Just had another conversation in my company about how to spread old non-accessed files to cheaper slower storage.

Is this now feasible? I noticed dbox is now v2.0 but see no reference to virtual folders or auto-archiving etc.

Hope you're having a good time Stateside!

Best wishes, Dan

Timo Sirainen

27 Jun 2009

11:20 p.m.

On Thu, 2009-06-25 at 11:17 +0100, Daniel Watts wrote:

...

Digging up this thread from 2007. Just had another conversation in my company about how to spread old non-accessed files to cheaper slower storage.

Is this now feasible? I noticed dbox is now v2.0 but see no reference to virtual folders or auto-archiving etc.

Multi-dbox is in v2.0, but single-dbox is already in v1.1! You can configure it to use two directories, e.g.:

mail_location = dbox:~/dbox:ALTPATH=/cheapstorage/%h

v2.0 implements dbox somewhat better, but v1.1's version should work well enough too. v1.1 just creates a pretty useless dbox.index file and also writes (for backup) flag changes to dbox files once in a while.

Moving old messages can be done using the expire plugin (http://wiki.dovecot.org/Plugins/Expire#Alternative_dbox_directory_expiration) or manually with the mv command.

With the above configuration there's no need to use virtual mailboxes.
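A sketch of the manual alternative to the expire plugin: walk the primary dbox directory and move files not modified for a given number of days to the same relative path under the alternate path. Assumptions to check against the Dovecot documentation before using anything like this: that modification time is an acceptable proxy for "old", which files are mail files as opposed to index files (narrow `name_glob` accordingly), and whether the user's mailbox must be idle during the move. `move_old_files` is a hypothetical helper.

```python
import fnmatch
import os
import shutil
import time
from typing import Optional


def move_old_files(primary: str, alt: str, older_than_days: float,
                   name_glob: str = "*", now: Optional[float] = None) -> list[str]:
    """Move matching files older than the cutoff from primary to alt.

    Relative paths are preserved, so primary/INBOX/x ends up at alt/INBOX/x.
    Returns the moved files' relative paths, sorted.
    """
    now = time.time() if now is None else now
    cutoff = now - older_than_days * 86400  # seconds per day
    moved = []
    for dirpath, _dirnames, filenames in os.walk(primary):
        for name in filenames:
            if not fnmatch.fnmatch(name, name_glob):
                continue
            src = os.path.join(dirpath, name)
            if os.path.getmtime(src) < cutoff:
                rel = os.path.relpath(src, primary)
                dst = os.path.join(alt, rel)
                os.makedirs(os.path.dirname(dst), exist_ok=True)
                shutil.move(src, dst)  # like mv: rename, or copy+delete across filesystems
                moved.append(rel)
    return sorted(moved)
```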


participants (7)

Chris Laif
Curtis Maloney
Daniel W
Daniel Watts
Kyle Wheeler
Richard Laager
Timo Sirainen