[Dovecot] Performance with 200k messages in Maildir
What should I expect performance-wise if I put 200000 messages in a Maildir store and use two or three MUAs (mutt and Thunderbird), on an Athlon dual core 2GHz with SATA drives in software RAID (Linux)?
Like; would it be useless/crawling, usable or pretty fast. I imagine MUA startup / first time folder read would be slow, but daily use more or less ok.
Thanks to Dovecot's indexing, I imagine I would be better off interfacing Mutt to Maildir via IMAP and Dovecot rather than accessing Maildir/ directly?
-- Vegard Svanberg vegard@svanberg.no [*Takapa@IRC (EFnet)]
To get benefits of Dovecot indexing you would need to interface Mutt to IMAP or else you will scan the Maildir directory each time Mutt is started.
You need to know more info than just 200,000 messages.
How many mailboxes, users, messages per day etc
- Ricardo Branco ricardo@wenn.com [2011-07-18 18:10]:
To get benefits of Dovecot indexing you would need to interface Mutt to IMAP or else you will scan the Maildir directory each time Mutt is started.
Yes, that's more or less what I wrote / meant to write :)
You need to know more info than just 200,000 messages.
How many mailboxes, users, messages per day etc
One mailbox, one user. Messages per day varies from 200 to 1500.
(I try to limit the mailbox to 200000.)
-- Vegard Svanberg vegard@svanberg.no [*Takapa@IRC (EFnet)]
that is hardly worth considering the load :)
Running spamassassin/clamav will use more load than dovecot would.
Probably using mutt directly against the mailstore might not be good,
but I have no issues with my own mailstore around that size.
- Patrick Domack patrickdk@patrickdk.com [2011-07-19 01:03]:
that is hardly worth considering the load :)
I'm actually not concerned about the load, but about Dovecot's (or the system's) ability to handle many thousand messages in one Maildir. In other words how I should expect the user experience to be.
-- Vegard Svanberg vegard@svanberg.no [*Takapa@IRC (EFnet)]
If you have 200k messages all within one folder, programs like TB will have issues loading it all and may hang when you try to do moves/deletes etc. I'm not sure if mutt stores a local cache of headers; that's the biggest worry. The biggest single folder I've seen at our office had 60k messages, and it loads slowly on a cold cache in TB. Our biggest mailbox has over 350k messages and mine is over 250k, though of course that's across several folders. The mailserver is on a VMware server (local drives); the datastore (with the Maildirs) is a separate NFS server on an 11x2TB SATA RAID 6 array (it has other SAS disks for other things). If you have it spread out in different folders then it won't be so bad.
On 7/19/2011 5:54 AM, Ricardo Branco wrote:
If you have 200k messages all within one folder, programs like TB will have issues loading it all and may hang when you try to do moves/deletes etc. I'm not sure if mutt stores a local cache of headers; that's the biggest worry. The biggest single folder I've seen at our office had 60k messages, and it loads slowly on a cold cache in TB. Our biggest mailbox has over 350k messages and mine is over 250k, though of course that's across several folders. The mailserver is on a VMware server (local drives); the datastore (with the Maildirs) is a separate NFS server on an 11x2TB SATA RAID 6 array (it has other SAS disks for other things). If you have it spread out in different folders then it won't be so bad.
60k in a single folder is about the upper limit for TBird (TBird v2 was actually better suited for this). But drag-n-drop breaks if you try to do more than 3-5k messages at a time. When a mailbox gets over 30-50k messages, I archive some of them off to a sub-folder in Thunderbird. One of my TBird mailboxes is about 880,000 messages, almost 6GB of email, spread across dozens of directories.
Assuming Maildir storage, the bigger issues will be (a) how well the filesystem handles tens of thousands of files in a single folder, (b) the physical disks (speed, number of spindles), (c) how busy the CPU is on the server, and maybe (d) the amount of server RAM that can be used as cache/buffer. Ext3 is probably fine as long as directory indexing is turned on, but ext4 might be better (or something else that deals well with lots of small files).
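For the ext3 point above, here is a minimal sketch of checking whether directory indexing is on. It assumes the feature appears as dir_index on the "Filesystem features:" line of `tune2fs -l` output; the helper name has_dir_index and the canned sample lines are made up for illustration, and /dev/sdXN is only a placeholder.

```shell
# Succeeds if the `tune2fs -l` output read from stdin lists dir_index
# on its "Filesystem features:" line. (Helper name is hypothetical.)
has_dir_index() {
    grep '^Filesystem features:' | grep -qw 'dir_index'
}

# Canned sample lines for demonstration only, not taken from a real system.
with_index='Filesystem features:      has_journal ext_attr resize_inode dir_index filetype'
without_index='Filesystem features:      has_journal ext_attr filetype'

printf '%s\n' "$with_index" | has_dir_index && echo "dir_index: on"
printf '%s\n' "$without_index" | has_dir_index || echo "dir_index: off"

# On a real server (as root; replace the placeholder device):
#   tune2fs -l /dev/sdXN | has_dir_index && echo "dir_index: on"
```

Parsing the text on stdin rather than calling tune2fs inside the function keeps the check usable without root for testing.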
The other side is how fast the disks are on the local client. An SSD drive or 10k RPM drive on the local desktop helps a lot when you get up into the larger mailboxes.
I agree with your points on TBird; moving large amounts of messages can cause it to hang with the CPU pegged at max for ages. TBird v2 was nice and nippy, v3 acceptable, v4/v5 are just awfully slow overall. TBird uses the mbox storage format, which probably stuffs it up on large deletes/moves etc.
Just did a count on our server: 350G of email (the largest single mailbox is 40G, that is 350k messages), total messages 3.6 million+. The biggest problem is backup; I've read that the latest rsync now starts transferring right away rather than waiting to finish scanning. I'm interested in the new mdbox format to reduce how many files we have. Try backing up small files fast enough to LTO5; tar it all up first before backup, I think. I'll move all our Maildirs to 10k SAS soon, hopefully, to lower the load on the SATA disks.
On 7/19/2011 11:35 AM, Ricardo Branco wrote:
I agree with your points on TBird; moving large amounts of messages can cause it to hang with the CPU pegged at max for ages. TBird v2 was nice and nippy, v3 acceptable, v4/v5 are just awfully slow overall. TBird uses the mbox storage format, which probably stuffs it up on large deletes/moves etc.
It's strictly a UI issue in TBird. They changed the code for drag-n-drop in the v3 betas; I reported a performance regression bug, and they never really fixed it. It's just bad code in the TBird UI, because the time required to drag-n-drop N messages grows much faster than O(N) or O(log N). So once you get past 2000-3000 messages, the time required is climbing into the stratosphere.
(Fortunately, there are other, less easy to use ways of moving messages via the right-click, move-to menu - or the "File" menu in the search window. None of them are as convenient as drag-n-drop would be.)
Dovecot itself has no issue with the bigger mailboxes, the problems are mostly either client-side or in running backups.
Just did a count on our server: 350G of email (the largest single mailbox is 40G, that is 350k messages), total messages 3.6 million+. The biggest problem is backup; I've read that the latest rsync now starts transferring right away rather than waiting to finish scanning. I'm interested in the new mdbox format to reduce how many files we have. Try backing up small files fast enough to LTO5; tar it all up first before backup, I think. I'll move all our Maildirs to 10k SAS soon, hopefully, to lower the load on the SATA disks.
We back up our Maildir users to another machine on the same network using rdiff-backup. Each user's folder gets processed individually, which keeps memory usage down; it goes faster on the little mailboxes and doesn't choke as hard on the big ones. Currently we keep 27 weeks of snapshots (rdiff-backup only stores deltas each week, so it's not that much space).
We randomize the order of processing so that in case it breaks halfway through then at least a different set of accounts will have been backed up this time.
It takes about 20 minutes to back up that 6GB / 800,000 message mailbox. Other mailboxes take a few minutes or only a few seconds; the total backup window is under 2 hours for about 50GB of mail.
Just make sure on the destination volume for an rdiff-backup that you allow lots of extra inodes. Which also holds true for the Maildir store.
(code snippet)
    # $FIND, $GREP, $SED, $BASE, $BKPHOST and $BKPBASE are assumed to be
    # set earlier in the full script (not shown here)

    # since RHEL5/CentOS5 don't have "sort -R" option to
    # randomize, use the following example
    # echo -e "2\n1\n3\n5\n4" |
    # perl -MList::Util -e 'print List::Util::shuffle <>'

    # yes, there's probably a better way to find MailDirs
    DIRS=`$FIND $BASE -maxdepth 3 -name subscriptions | \
        $GREP '/var/vmail' | \
        $SED 's:^/var/vmail/::' | $SED 's:subscriptions$::' | \
        perl -MList::Util -e 'print List::Util::shuffle <>'`

    for DIR in ${DIRS}
    do
        # back up one user's Maildir, then prune old increments
        rdiff-backup -v3 --print-statistics \
            --create-full-path /var/vmail/$DIR \
            ${BKPHOST}::${BKPBASE}${DIR}
        rdiff-backup -v3 --force --remove-older-than 27W \
            ${BKPHOST}::${BKPBASE}${DIR}
    done
On Tue, Jul 19, 2011 at 12:45:07PM -0400, Thomas Harold wrote:
Dovecot itself has no issue with the bigger mailboxes, the problems are
mostly either client-side or in running backups.
Some filesystems have issues with that many files in a single directory. I'm thinking of ext2, but I'm sure there are others. The "HTree" feature of ext3 corrects or at least betters that (by a factor of 50 to 100 in some cases). I hope nobody is installing ext2 today, but I'm sure there are old machines that have not been upgraded.
A little OT on sed, just for the fun:
DIRS=`$FIND $BASE -maxdepth 3 -name subscriptions |
$GREP '/var/vmail' |
$SED 's:^/var/vmail/::' | $SED 's:subscriptions$::' | \
That would probably give unexpected results when confronted with $BASE/tmp/var/vmail or directories (users?) named "subscriptions" :-)
I like sed's -n option. It lets one integrate a previous grep by only outputting the line if it matches:
DIRS=`$FIND $BASE -maxdepth 3 -name subscriptions |
$SED -n 's:^/var/vmail/::p' | $SED 's:/subscriptions$:/:' | \
or
DIRS=`$FIND $BASE -maxdepth 3 -name subscriptions |
$SED -n 's:^/var/vmail/\(.*/\)subscriptions$:\1:p'
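To see what the single anchored substitution does, here is a small self-contained sketch run against made-up paths (the example.com users and the /srv/tmp entry are hypothetical; /var/vmail is the prefix from the backup script earlier in the thread). Only lines that start with the prefix and end in "subscriptions" are printed, so the stray path is dropped and a user who happens to be named "subscriptions" survives intact.

```shell
# Hypothetical sample of what find might print; not real data.
paths='/var/vmail/example.com/alice/subscriptions
/var/vmail/example.com/bob/subscriptions
/srv/tmp/var/vmail/old/subscriptions
/var/vmail/example.com/subscriptions/subscriptions'

# -n suppresses default output; the trailing p prints only lines where
# the substitution matched, so sed filters and rewrites in one step.
result=$(printf '%s\n' "$paths" |
    sed -n 's:^/var/vmail/\(.*/\)subscriptions$:\1:p')

printf '%s\n' "$result"
# prints:
#   example.com/alice/
#   example.com/bob/
#   example.com/subscriptions/
```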
On 18/07/2011 11:27, Vegard Svanberg wrote:
What should I expect performance-wise if I put 200000 messages in a Maildir store and use two or three MUAs (mutt and Thunderbird), on an Athlon dual core 2GHz with SATA drives in software RAID (Linux)?
Like; would it be useless/crawling, usable or pretty fast. I imagine MUA startup / first time folder read would be slow, but daily use more or less ok.
Thanks to Dovecot's indexing, I imagine I would be better off interfacing Mutt to Maildir via IMAP and Dovecot rather than accessing Maildir/ directly?
200,000 messages under Maildir in a single folder.
You will tend to have a large number of inodes relative to the overall scale of the system.
This will tend to make backing-up a nuisance.
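To get a rough feel for the inode count, one can simply count the regular files under a Maildir, since each message is one file. The sketch below is illustrative only: the function name count_maildir_files and the throwaway demo directory with its three dummy messages are assumptions, not anything from a real mail store.

```shell
# Count regular files (roughly one inode per message) under a Maildir.
# Assumes file names contain no newlines, which holds for normal Maildirs.
count_maildir_files() {
    find "$1" -type f | wc -l | tr -d ' '
}

# Throwaway demo Maildir with three dummy messages.
demo=$(mktemp -d)
mkdir -p "$demo/cur" "$demo/new" "$demo/tmp"
touch "$demo/cur/msg1" "$demo/cur/msg2" "$demo/new/msg3"

count_maildir_files "$demo"    # prints 3
```

Run against a real Maildir path, the same function gives a quick idea of how many small files a backup has to walk.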
One thing you could consider is storing your mail under "mdbox". This should drastically reduce the number of inodes.
Just remember that under "mdbox" the so-called "indexes" are actually critical data files (i.e. they cannot be re-created like they could be with Maildir or mbox).
Bill
participants (6): Lorens Kockum, Patrick Domack, Ricardo Branco, Thomas Harold, Vegard Svanberg, William Blunn