[Dovecot] Mailbox Hashing
First off, the Dovecot website documentation is really good, but while reading it I was not able to find anything about inbox hashing for Maildirs. I saw plenty about hashing the directories that the user mailboxes live in, but nothing specifically about hashing an individual user's inbox directory itself.
Is there any method for hashing the inbox automatically after, say, 5,000 messages are stored? Example:
$Maildir/in/0/message0
$Maildir/in/0/message1
$Maildir/in/0/message2
.
$Maildir/in/0/message4999
$Maildir/in/1/message5000
$Maildir/in/1/message5001
etc
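To make the layout in the example concrete, here is a minimal Python sketch of the bucket arithmetic it implies. It is purely illustrative: the function name hashed_message_path and the constant BUCKET_SIZE are hypothetical, the "in" subdirectory and the 5,000-message threshold are taken from the example above, and standard Maildir does not define such a layout.

```python
from pathlib import PurePosixPath

# Assumption: 5,000 messages per subdirectory, as in the example above.
BUCKET_SIZE = 5000


def hashed_message_path(maildir: str, message_index: int,
                        bucket_size: int = BUCKET_SIZE) -> PurePosixPath:
    """Return the hypothetical bucketed path for the Nth message.

    The bucket number is the message index divided by the bucket size,
    rounded down, so messages 0..4999 land in "0", 5000..9999 in "1", etc.
    """
    if message_index < 0:
        raise ValueError("message_index must be non-negative")
    bucket = message_index // bucket_size
    return PurePosixPath(maildir) / "in" / str(bucket) / f"message{message_index}"


if __name__ == "__main__":
    # Print the paths on either side of the first bucket boundary.
    for index in (0, 4999, 5000, 5001):
        print(hashed_message_path("/home/user/Maildir", index))
```

The integer division is the whole scheme: a reader or delivery agent would need the same bucket size to find a given message again.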
I am not currently using Dovecot but am interested to know if this is available or does running with 20,000+ messages in a single inbox not affect the performance much? I have looked into other file system tuning techniques such as enabling ext3 dir_index or using ReiserFS (maybe not ReiserFS anymore). There will likely be 15,000 to 20,000 accounts spread out on one or more servers using a 6-drive RAID10 setup. Most accounts are not expected to have high message quantities but there will be lots of concurrent connections via pop and imap (and webmail imap).
Any suggestions or feedback would be appreciated.
On Thursday, November 13 at 05:20 PM, quoth Justin Krejci:
Is there any method for hashing the inbox automatically after say 5,000 messages are stored? Example
$Maildir/in/0/message0
$Maildir/in/0/message1
$Maildir/in/0/message2
Not in Maildir. The Maildir format does not allow that, so... It may be possible to do with something like dbox, since that's a Dovecot-specific format.
In general, though, that kind of hashing is usually a workaround for a lousy filesystem (such as ext2), rather than something you'd really *want* to do.
The one exception might be if you want to split someone's inbox over several filesystems, but even that could be accomplished using something like UnionFS. Of course, we're getting outside the realm of production-tested options here, and it would probably introduce all kinds of potential problems with locking and such.
I am not currently using Dovecot but am interested to know if this is available or does running with 20,000+ messages in a single inbox not affect the performance much?
It all depends on the filesystem and what operations you're doing. Dovecot does a *lot* of caching to avoid hitting the filesystem whenever it can. However, randomly accessing messages in your mailbox *will* cause a filesystem access, and the speed of that depends on having a halfway decent filesystem.
I have looked into other file system tuning techniques such as enabling ext3 dir_index or using ReiserFS (maybe not ReiserFS anymore). There will likely be 15,000 to 20,000 accounts spread out on one or more servers using a 6-drive RAID10 setup. Most accounts are not expected to have high message quantities but there will be lots of concurrent connections via pop and imap (and webmail imap).
You should be fine. I'd probably encourage something more stable like ext3 with dir_index (ReiserFS is often viewed as a purely experimental filesystem, and not reliable for production systems). The ext3 documentation suggests that 100k-1M+ files in a single directory should not pose a significant performance problem when using dir_index. I haven't tried it with directories that are *that* big, but I regularly use mailboxes with over 5k messages without problems.
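If you would rather measure than take anyone's word for it, a rough way to check how a given filesystem copes with large directories is to time random lookups in directories of different sizes. The following is a minimal Python sketch, not a rigorous benchmark: the function name time_random_stats and its parameters are hypothetical, it uses empty files rather than real messages, and the results will mostly reflect a warm cache unless the files are created on the filesystem under test and caches are dropped first.

```python
import os
import random
import tempfile
import time
from typing import Optional


def time_random_stats(num_files: int, num_lookups: int = 1000,
                      base_dir: Optional[str] = None, seed: int = 0) -> float:
    """Create num_files empty files in one directory, then time random stat calls.

    base_dir selects where the temporary directory is created, so it can be
    pointed at the filesystem being evaluated (default: the system temp dir).
    Returns the average number of seconds per lookup.
    """
    rng = random.Random(seed)
    with tempfile.TemporaryDirectory(dir=base_dir) as directory:
        # Populate a single flat directory, like one large Maildir folder.
        for i in range(num_files):
            with open(os.path.join(directory, f"message{i}"), "w"):
                pass
        # Pick the names up front so only the stat calls are timed.
        names = [f"message{rng.randrange(num_files)}" for _ in range(num_lookups)]
        start = time.perf_counter()
        for name in names:
            os.stat(os.path.join(directory, name))
        elapsed = time.perf_counter() - start
    return elapsed / num_lookups


if __name__ == "__main__":
    for count in (1_000, 5_000, 20_000):
        per_lookup = time_random_stats(count)
        print(f"{count:>6} files: {per_lookup * 1e6:.1f} microseconds per stat")
```

If the per-lookup time stays roughly flat as the file count grows, the directory size is unlikely to be the bottleneck for that filesystem.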
~Kyle
A woman is like a tea bag. It's only when she's in hot water that you realize how strong she is. -- either Eleanor Roosevelt or Carl Sandburg
On Friday, November 14 at 05:30 AM, quoth Charles Marcus:
On 11/13/2008, Kyle Wheeler (kyle-dovecot@memoryhole.net) wrote:
(ReiserFS is often viewed as a purely experimental filesystem, and not reliable for production systems)
Please stop spreading FUD.
<shrug> I'm not saying that's *true*, I'm just saying I've heard that a lot... It's entirely possible that ReiserFS is just as reliable as any other filesystem *now* but someone pushed it into the mainline Linux kernel before it was ready, thereby biting the early adopters with bugs that hadn't been worked out yet and creating the impression that it isn't very stable.
~Kyle
Unthinking respect for authority is the greatest enemy of truth. -- Albert Einstein
On 11/14/2008 10:02 AM, Kyle Wheeler wrote:
On Friday, November 14 at 05:30 AM, quoth Charles Marcus:
On 11/13/2008, Kyle Wheeler (kyle-dovecot@memoryhole.net) wrote:
(ReiserFS is often viewed as a purely experimental filesystem, and not reliable for production systems)
Please stop spreading FUD.
<shrug> I'm not saying that's *true*, I'm just saying I've heard that a lot...
That's called spreading FUD. If you don't know, you don't know, so why say it? I've heard plenty of horror stories about ext2/ext3, xfs, etc. ALL losing data...
The fact is, I've been using reiserfs on numerous boxes for many years with ZERO problems.
The biggest issue is unclean shutdowns, but that problem is not unique to reiserfs, and can be minimized/eliminated by being smart: using battery-backed RAID controllers (if you're using hardware RAID), using good UPSs, using UPS software to cleanly shut down a system before the UPS battery dies in the event of an extended power loss, etc...
Anyway, this is completely OT...
--
Best regards,
Charles
On Friday, November 14 at 11:51 AM, quoth Charles Marcus:
<shrug> I'm not saying that's *true*, I'm just saying I've heard that a lot...
That's called spreading FUD.
No, it's not. FUD would be "a strategic attempt to influence public opinion by disseminating negative (and vague) information."
I am not trying to influence public opinion, I'm reporting existing public opinion. The consensus opinion of the sysadmins I trust most highly is that ReiserFS is still relatively experimental and has not yet earned their trust---several of them have been bitten by ReiserFS bugs on their development machines (read: data loss due to unrecoverable filesystem corruption). That said, their problems were several years ago. Unfortunately, in the world of filesystem reliability, trust comes slowly once lost (check out how recently ReiserFS has been fixing quota-related problems, including ACL deadlocks).
In any case, I have no strategic purpose here. I have no interest or stake in any filesystem taking over the world. If ReiserFS is extremely stable and extremely reliable, then that's awesome, but it does have a bit of a reputation problem. Denying that it has a negative reputation, or claiming that anyone who describes its reputation is spreading FUD, is not only pointless but also counterproductive. If you want to say "well, that may be what you've heard, but I've used ReiserFS on several large, heavily-used, mission-critical systems for several years and have not had any problems", then that would be a useful and important statement. You'd even be helping ReiserFS's reputation. But by having such a knee-jerk reaction to the fact that it's got a negative reputation, you're making the filesystem seem like it's used largely by proselytizers and zealots---which is not a good way to build ReiserFS's reputation.
I've heard plenty of horror stories about ext2/ext3, xfs, etc ALL losing data...
Of course - any filesystem can lose data in bad situations (such as power loss, bad disks, etc.). Ext3 is certainly not perfect for all situations. For example, it's a bad idea on flash media because it keeps its journal file in a fixed spot on the drive, which can wear out that part of a flash drive quickly. The real question is: what are the reputations of those filesystems, and why? Ext2/3 have been around for a very long time, and are extremely well-tested by virtue of their popularity, and as such tend to be more trusted for mission-critical systems (unless there's a reason they shouldn't be used).
The fact is, I've been using reiserfs on numerous boxes for many years with ZERO problems.
Excellent! What kind of systems are we talking about? How heavily loaded? Did you use it with LVM? Did you ever have to use the recovery tools? How well did they work?
~Kyle
The next best thing to solving a problem is finding some humor in it. -- Frank A. Clark
On 11/14/2008, Kyle Wheeler (kyle-dovecot@memoryhole.net) wrote:
FUD would be "a strategic attempt to influence public opinion by disseminating negative (and vague) information."
Well, no need to be pedantic about it... ;)
The fact is, I've been using reiserfs on numerous boxes for many years with ZERO problems.
Excellent! What kind of systems are we talking about?
Nothing fancy... single and dual Opteron mostly, with 3ware RAID cards (don't use/recommend those any more - much prefer Areca now)
How heavily loaded?
Nothing special here either... so I'd say low to moderate loads...
Did you use it with LVM?
Yes, on most of them...
Did you ever have to use the recovery tools? How well did they work?
Never had to use them (knock on wood)... :)
--
Best regards,
Charles