[Dovecot] rule of thumb for indexing overhead
I realize it's hard to be precise about this, but does anyone have a feel or rule of thumb for a couple of aspects of indexing overhead?
Proportionally, how much space does it take for all 4 files? If I want to give my users a quota of 100 MB for messages, how much real space should I plan for so that I won't run out of space for indexing?
What's the overhead in rebuilding index files? Suppose I use an LDA other than dovecot, so at least the INBOX index is frequently getting out of date. Does it cost much (in CPU, memory, and disk IO) to rebuild the index files? (I'm using maildir.)
WJCarpenter spake the following on 8/22/2007 9:55 PM:
I realize it's hard to be precise about this, but does anyone have a feel or rule of thumb for a couple of aspects of indexing overhead?
Proportionally, how much space does it take for all 4 files? If I want to give my users a quota of 100 MB for messages, how much real space should I plan for so that I won't run out of space for indexing?
What's the overhead in rebuilding index files? Suppose I use an LDA other than dovecot, so at least the INBOX index is frequently getting out of date. Does it cost much (in CPU, memory, and disk IO) to rebuild the index files? (I'm using maildir.)
You can always put the indexes in non-quota space like /var. That way the indexes don't get counted against the users' files, and won't get corrupted if a user goes over quota.
wjc> I realize it's hard to be precise about this, but does anyone
wjc> have a feel or rule of thumb for a couple of aspects of indexing
wjc> overhead?
ss> You can always put the indexes in non-quota space like var. That
ss> way the indexes don't get counted against the users files, and
ss> won't get corrupted if a user goes over quota.
Thanks, but I think you misunderstood my question. I'm trying to figure out how much disk space I'll actually need for holding the index files. In other words, I'm looking for a planning factor to figure out how much disk to buy when I go to Fry's :-).
(I'll be using the Dovecot quota stuff, so index files are already not in the count.)
WJCarpenter wrote:
Thanks, but I think you misunderstood my question. I'm trying to figure out how much disk space I'll actually need for holding the index files. In other words, I'm looking for a planning factor to figure out how much disk to buy when I go to Fry's :-).
Using my own maildir as an example here:
root#fam [(/var/gopostal/maildirs/copyleft.no/vegarn)] find . | wc -l
54365
root#fam [(/var/gopostal/maildirs/copyleft.no/vegarn)] du -sh .
902M    .
root#fam [(/var/gopostal/maildirs/copyleft.no/vegarn)] find . -name "dovecot.*" | xargs du -sch | grep total
41M     total
Just under a 1:20 ratio in my case.
Cheers,
Vegar Nilsen
Copyleft Software AS
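The measurement above can be repeated on any maildir with a small script. This is only a sketch: the function names `index_percent` and `measure_maildir` and the placeholder path are made up for illustration, and, as in the original commands, the total from `du` includes the index files themselves.

```sh
#!/bin/sh
# index_percent INDEX_SIZE TOTAL_SIZE
# Prints the index size as a percentage of the total (same unit for both).
index_percent() {
    awk -v idx="$1" -v total="$2" 'BEGIN { printf "%.1f\n", 100 * idx / total }'
}

# measure_maildir DIR
# Sums the disk usage (in KiB) of all dovecot.* files under DIR and
# prints it as a percentage of the whole directory's disk usage.
measure_maildir() {
    dir=$1
    total_kb=$(du -sk "$dir" | awk '{ print $1 }')
    index_kb=$(find "$dir" -type f -name 'dovecot.*' -exec du -k {} + |
        awk '{ s += $1 } END { print s + 0 }')
    index_percent "$index_kb" "$total_kb"
}

# The figures quoted above: 41M of index files in a 902M maildir.
index_percent 41 902

# On a real system (placeholder path):
# measure_maildir /path/to/Maildir
```

With the quoted figures this prints 4.5, i.e. roughly 1:22, which matches "just under 1:20".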
On Wed, 2007-08-22 at 21:55 -0700, WJCarpenter wrote:
I realize it's hard to be precise about this, but does anyone have a feel or rule of thumb for a couple of aspects of indexing overhead?
- Proportionally, how much space does it take for all 4 files? If I want to give my users a quota of 100 MB for messages, how much real space should I plan for so that I won't run out of space for indexing?
It really depends on what IMAP client is being used. Something like 10-20% maybe.
- What's the overhead in rebuilding index files? Suppose I use an LDA other than dovecot, so at least the INBOX index is frequently getting out of date. Does it cost much (in CPU, memory, and disk IO) to rebuild the index files? (I'm using maildir.)
Indexes aren't normally "rebuilt", they're "updated". And the update overhead is practically nothing with maildir.
I just wrote this: http://wiki.dovecot.org/LDA/Indexing
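As a rough planning aid, the 10-20% estimate above can be turned into a quick calculation. This is only a sketch: the helper name `plan_disk_mb` and the figure of 1000 users are assumptions for illustration, and the percentage is the ballpark from this thread rather than a measured value.

```sh
#!/bin/sh
# plan_disk_mb QUOTA_MB USERS OVERHEAD_PCT
# Prints the space in MB needed if every user fills the quota and
# index files add OVERHEAD_PCT percent on top (integer arithmetic).
# Hypothetical helper, not part of Dovecot.
plan_disk_mb() {
    quota_mb=$1
    users=$2
    overhead_pct=$3
    echo $(( quota_mb * users * (100 + overhead_pct) / 100 ))
}

# 100 MB quota, 1000 users (assumed), low and high overhead estimates.
plan_disk_mb 100 1000 10
plan_disk_mb 100 1000 20
```

For these inputs the two lines print 110000 and 120000, so the index overhead would add somewhere between 10 and 20 GB on top of 100 GB of mail.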
Hi Timo...
So, if I understand this correctly, if I'm using maildir, I could use exim to do the local delivery instead of dovecot's LDA, and the index update wouldn't be such a big problem?
In my case, local delivery and the real imap servers are in different boxes, so by having exim doing the local delivery, I'd avoid a dovecot install in the local delivery boxes. And of course, exim doesn't have to spawn dovecot's delivery agent on every single piece of mail.
Also, are there any drawbacks of using exim to do the local delivery?
Thanks a bunch, g.
On Wed, 2007-08-29 at 11:44 -0700, Tom Bombadil wrote:
I just wrote this: http://wiki.dovecot.org/LDA/Indexing
Also, are there any drawbacks of using exim to do the local delivery?
Like the wiki page says, there shouldn't be really any performance problems with that. I can't say if there are any other problems.
Also, are there any drawbacks of using exim to do the local delivery?
I'm very interested in the answer to this question, too. So far I have found (through reading, not trying things yet) that Dovecot's quota handling is more flexible than Exim's (exim is pretty much limited to FS quotas, I think, which is no good for virtual users). Dovecot's Sieve implementation has more features than Exim's. Both of these happen to matter to me (and led me to Dovecot in the first place.)
An upside for Exim doing the delivery is that you theoretically can arrange for some additional rejections to happen at SMTP time. Pragmatically, those additional rejections are typically not done at SMTP time anyhow (e.g., an explicit fail from a user Exim filter).
On Wed, 29 Aug 2007 12:31:01 -0700 (PDT) "WJCarpenter" bill-dovecot@carpenter.ORG wrote:
Also, are there any drawbacks of using exim to do the local delivery?
I'm very interested in the answer to this question, too. So far I have found (through reading, not trying things yet) that Dovecot's quota handling is more flexible than Exim's (exim is pretty much limited to FS quotas, I think, which is no good for virtual users). Dovecot's Sieve implementation has more features than Exim's. Both of these happen to matter to me (and led me to Dovecot in the first place.)
We use exim all the way to local delivery. And it handles Maildir++ quotas just fine. Depending on your actual needs and userdb/authdb choices the dovecot LDA might be the better choice. But exim can do pretty much everything one needs (we don't use sieve) and the advantages of indexing during delivery are negligible for us in general and potentially negative in some scenarios (mass mail to many/all users).
An upside for Exim doing the delivery is that you theoretically can arrange for some additional rejections to happen at SMTP time. Pragmatically, those additional rejections are typically not done at SMTP time anyhow (e.g., an explicit fail from a user Exim filter).
I don't think there are many situations where a MTA can do better than a LDA when it comes to rejections during SMTP time. Since at least with exim local delivery happens in a separate stage AFTER SMTP has been successfully completed. Now, having consistent errors, logs and configurations for all mail delivery stages is something that might make you want to stick with your MTA from the edge to the mailbox.
Regards,
Christian
Christian Balzer Network/Systems Engineer NOC chibi@gol.com Global OnLine Japan/Fusion Network Services http://www.gol.com/
We use exim all the way to local delivery. And it handles Maildir++ quotas just fine.
Ah, right. I was misremembering. It's the DB-stored quotas in dovecot that I was thinking of using some time back. Sorry for my misstatement.
of indexing during delivery are negligible for us in general and potentially negative in some scenarios (mass mail to many/all users).
Is this negative hypothetical or have you actually seen load spikes in situations like this?
I don't think there are many situations where a MTA can do better than a LDA when it comes to rejections during SMTP time. Since at
I agree. The opportunities are more theoretical than practical. If you include processing user .forward files as part of the exim recipient verification or if you include a quota check after SMTP DATA ....
of indexing during delivery are negligible for us in general and potentially negative in some scenarios (mass mail to many/all users).
Is this negative hypothetical or have you actually seen load spikes in situations like this?
I actually did see that. In the case of exim, for each piece of mail going to the mailbox, one exim process is spawned, and this exim delivery process spawns dovecot's LDA. The load pretty much doubles.
But the number of exim delivery processes can be limited in exim's config file.
Cheers, g.
Hello,
On Wed, 29 Aug 2007 18:10:00 -0700 Tom Bombadil grlists@gmail.com wrote:
of indexing during delivery are negligible for us in general and potentially negative in some scenarios (mass mail to many/all users).
Is this negative hypothetical or have you actually seen load spikes in situations like this?
I actually did see that. In the case of exim, for each piece of mail going to the mailbox, one exim process is spawned, and this exim delivery process spawns dovecot's LDA. The load pretty much doubles.
There is that little detail of the additional processes spawned, but the overhead for that alone is not so much of an issue here (linux, plenty of CPU and memory to spare). But when you have 80000 emails rushing towards your mailstorage (and yes, of course we have sensible limits on number of processes, maximum load before defer, etc) your I/O will be a bottleneck and adding the additional strain of indexing on top of that is not a good idea. It is only at times of mass mails like that that I see our mailstorage boxes break into a light sweat, and avoiding any additional load then is just the sensible thing to do. The indexing for these mails will be spread out over hours to days (frequency of access by the clients) instead of being lumped on top of the existing peak and I/O saturation.
It all boils down to your hardware and usage patterns. Our systems are plenty fast and any delay by having to (re)build the indexes is hardly noticeable. But if you have a system with less reserves, users that access mail very infrequently but get plenty of mail (so indexing at access time will be an involved process) and no mass mail spikes, dovecot LDA starts to look a lot more promising. ^_^
Regards,
Christian
participants (8)
- bill-dovecot@carpenter.ORG
- Christian Balzer
- Scott Silva
- Timo Sirainen
- Tom Bombadil
- Vegar Nilsen
- WJCarpenter
- WJCarpenter