[Dovecot] Performance with 200k messages in Maildir
Thomas Harold
thomas-lists at nybeta.com
Tue Jul 19 19:45:07 EEST 2011
On 7/19/2011 11:35 AM, Ricardo Branco wrote:
> I agree with your points on TBird, moving large amounts of messages can
> cause it to hang with CPU pegged at max for ages.
> TBird v2 was nice and nippy, v3 acceptable, v4/v5 are just awfully slow
> overall.
> TBird uses mbox storage format which probably stuffs it up on large
> deletes/moves etc.
>
It's strictly a UI issue in TBird. They changed the code for
drag-n-drop in the v3 betas; I reported a performance regression bug, but
they never really fixed it. It's just bad code in the TBird UI, because the
time required to drag-n-drop N messages grows much faster than O(N), let
alone O(log N). So once you get past 2000-3000 messages, the time required
climbs into the stratosphere.
(Fortunately, there are other, less convenient ways of moving messages:
the right-click "move-to" menu, or the "File" menu in the search
window. None of them are as handy as drag-n-drop would be.)
Dovecot itself has no issue with the bigger mailboxes; the problems are
mostly either client-side or in running backups.
> Just did a count on our server: 350G of email (largest single mailbox is
> 40G, that is 350k messages), total messages is 3.6mil+. Biggest problem
> is on backup; I've read that the latest rsync has fast start now rather
> than waiting to finish scanning.
> I'm interested in the latest mdbox format to reduce how many files we have.
> Try backing up small files fast enough to LTO5; tar it all up first
> before backup, I think.
> I'll move all our maildirs to 10k SAS soon, hopefully, to lower the load on
> the SATA disks.
We back up our Maildir users to another machine on the same network using
rdiff-backup. Each user's folder gets processed individually, which
keeps memory usage down; it goes faster on the little mailboxes and
doesn't choke as hard on the big ones. Currently we keep 27 weeks
of snapshots (rdiff-backup only stores deltas each week, so it doesn't
take that much space).
We randomize the order of processing so that if a run breaks halfway
through, at least a different set of accounts will have been backed
up each time.
It takes about 20 minutes to back up that 6GB / 800,000 message mailbox.
Other mailboxes take a few minutes or only a few seconds; the total backup
window is under 2 hours for about 50GB of mail.
Just make sure that the destination volume for an rdiff-backup has
lots of extra inodes. The same holds true for the Maildir store.
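To keep an eye on inode headroom, df -i reports inode usage per filesystem. The following is only a minimal sketch, assuming a GNU df that accepts -P and -i together; the function names are made up for illustration, and any path you pass in (for example a hypothetical /backup mount) is your own.

```shell
# Extract the IUse% column (without the % sign) from "df -Pi" output on stdin.
# Assumes the usual layout: Filesystem Inodes IUsed IFree IUse% Mounted on
parse_inode_pct() {
    awk 'NR == 2 { sub(/%$/, "", $5); print $5 }'
}

# Print the inode usage percentage for the filesystem holding path $1.
# Some filesystems report "-" here instead of a number.
inode_use_pct() {
    df -Pi "$1" | parse_inode_pct
}
```

Calling inode_use_pct /var/vmail before and after a backup run gives a rough idea of how quickly the inodes are being consumed.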
(code snippet)
# FIND, GREP, SED, BASE, BKPHOST and BKPBASE are assumed to be set
# earlier in the full script (not shown in this excerpt)

# since RHEL5/CentOS5 don't have "sort -R" option to
# randomize, use the following example
#   echo -e "2\n1\n3\n5\n4" | \
#     perl -MList::Util -e 'print List::Util::shuffle <>'

# yes, there's probably a better way to find MailDirs
DIRS=`$FIND $BASE -maxdepth 3 -name subscriptions | \
    $GREP '/var/vmail' | \
    $SED 's:^/var/vmail/::' | $SED 's:subscriptions$::' | \
    perl -MList::Util -e 'print List::Util::shuffle <>'`

for DIR in ${DIRS}
do
    # back up this user's Maildir to the backup host
    rdiff-backup -v3 --print-statistics \
        --create-full-path "/var/vmail/$DIR" \
        "${BKPHOST}::${BKPBASE}${DIR}"
    # then drop increments older than 27 weeks
    rdiff-backup -v3 --force --remove-older-than 27W \
        "${BKPHOST}::${BKPBASE}${DIR}"
done
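On systems with a newer GNU coreutils, the perl shuffle can be swapped for shuf. A rough sketch, assuming shuf may or may not be installed and falling back to the same perl one-liner when it is not; shuffle_lines is just a name I picked for the helper:

```shell
# Shuffle the lines read from stdin.
# Prefer shuf (GNU coreutils) when available, otherwise fall back to perl.
shuffle_lines() {
    if command -v shuf >/dev/null 2>&1; then
        shuf
    else
        perl -MList::Util -e 'print List::Util::shuffle <>'
    fi
}
```

The helper would then take the place of the final perl stage in the pipeline that builds DIRS, e.g. "... | shuffle_lines".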