[Dovecot] Performance with 200k messages in Maildir

Thomas Harold thomas-lists at nybeta.com
Tue Jul 19 19:45:07 EEST 2011


On 7/19/2011 11:35 AM, Ricardo Branco wrote:
> I agree with your points on TBird; moving large numbers of messages can
> cause it to hang with the CPU pegged at max for ages.
> TBird v2 was nice and nippy, v3 acceptable, v4/v5 are just awfully slow
> overall.
> TBird uses mbox storage format which probably stuffs it up on large
> deletes/moves etc.
>

It's strictly a UI issue in TBird.  They changed the drag-n-drop code 
in the v3 betas; I reported a performance regression bug, but they 
never really fixed it.  It's just bad code in the TBird UI: the time 
required to drag-n-drop N messages grows much faster than O(N).  So 
once you get past 2000-3000 messages, the time required climbs into 
the stratosphere.

(Fortunately, there are other, clunkier ways of moving messages: the 
right-click "Move To" menu, or the "File" menu in the search window.  
None of them is as convenient as drag-n-drop would be.)

Dovecot itself has no issue with the bigger mailboxes; the problems are 
mostly either client-side or in running backups.

> Just did a count on our server: 350G of email (largest single mailbox is
> 40G, that is 350k messages), 3.6mil+ messages in total. The biggest problem
> is backup; I've read that the latest rsync starts fast now rather than
> waiting to finish scanning.
> I'm interested in the latest mdbox format to reduce how many files we have.
> Try backing up small files fast enough to LTO5! Better to tar it all up
> first before backup, I think.
> I'll move all our maildirs to 10k SAS soon, hopefully lowering the load on
> the SATA disks.

We back up our Maildir users to another machine on the same network using 
rdiff-backup.  Each user's folder gets processed individually, which 
keeps memory usage down, goes faster on the little mailboxes, and 
doesn't choke as hard on the big ones.  Currently we keep 27 weeks of 
snapshots (rdiff-backup only stores the deltas each week, so it doesn't 
take that much space).

We randomize the order of processing so that if a run breaks halfway 
through, at least a different set of accounts will have been backed up 
this time.
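The shuffle itself is just a random permutation of the account list. A minimal sketch, assuming GNU coreutils: newer releases ship `shuf`, while on RHEL5/CentOS5 (which lack it, as well as `sort -R`) the Perl List::Util fallback from the snippet at the end of this message still works. The function name `shuffle_lines` is hypothetical.

```shell
# shuffle_lines: hypothetical helper that prints stdin lines in random order.
# Prefers coreutils' shuf when it is installed; otherwise falls back to
# Perl's List::Util::shuffle, which is what works on RHEL5/CentOS5.
shuffle_lines() {
    if command -v shuf >/dev/null 2>&1; then
        shuf
    else
        perl -MList::Util -e 'print List::Util::shuffle <>'
    fi
}
```

Usage would be something like `find /var/vmail -maxdepth 3 -name subscriptions | shuffle_lines`; every line comes out exactly once, only the order changes.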

It takes about 20 minutes to back up that 6GB / 800,000-message mailbox. 
Other mailboxes take a few minutes or only a few seconds; the total 
backup window is under 2 hours for about 50GB of mail.

Just make sure that the destination volume for rdiff-backup has plenty 
of spare inodes.  The same holds true for the Maildir store.
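Maildir keeps one file per message, so a copy of the store needs at least one inode per message, and on ext3 the inode count is fixed when the filesystem is created (mke2fs `-i` bytes-per-inode or `-N` number-of-inodes). A rough pre-flight check, sketched under the assumption of GNU find and a df that supports `-P -i`; the helper names and the 2x safety margin are hypothetical choices, not part of our setup. Filesystems that allocate inodes dynamically may report misleading free-inode numbers.

```shell
# count_files DIR: hypothetical helper; number of regular files under DIR,
# roughly the number of inodes a copy of it will need.
count_files() {
    find "$1" -type f | wc -l | tr -d ' '
}

# free_inodes PATH: hypothetical helper; free inodes on the filesystem that
# holds PATH, taken from the IFree column of POSIX-format "df -Pi" output.
free_inodes() {
    df -Pi "$1" | awk 'NR == 2 { print $4 }'
}

# inode_check SRC DEST: warn if DEST's filesystem has fewer free inodes
# than twice the number of files in SRC (the 2x margin is an assumption).
inode_check() {
    local need free
    need=$(( $(count_files "$1") * 2 ))
    free=$(free_inodes "$2")
    if [ "$free" -lt "$need" ]; then
        echo "WARNING: $2 has $free free inodes, want at least $need" >&2
        return 1
    fi
    echo "OK: $2 has $free free inodes (need $need)"
}
```

For example, `inode_check /var/vmail /srv/backup` before the first run (the backup path here is only an example).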

(code snippet)

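# Example settings (hypothetical values; adjust for your site)
FIND=/usr/bin/find
GREP=/bin/grep
SED=/bin/sed
BASE=/var/vmail                 # root of the Maildir store
BKPHOST=backup.example.com      # hypothetical rdiff-backup destination host
BKPBASE=/srv/backup/vmail/      # hypothetical destination path (keep trailing /)

# since RHEL5/CentOS5 don't have the "sort -R" option to
# randomize, use Perl's List::Util::shuffle instead, e.g.:
# echo -e "2\n1\n3\n5\n4" | \
#    perl -MList::Util -e 'print List::Util::shuffle <>'

# yes, there's probably a better way to find Maildirs; this takes every
# account with a "subscriptions" file and strips it down to "domain/user/"
DIRS=$($FIND "$BASE" -maxdepth 3 -name subscriptions | \
     $GREP '/var/vmail' | \
     $SED -e 's:^/var/vmail/::' -e 's:subscriptions$::' | \
     perl -MList::Util -e 'print List::Util::shuffle <>')

for DIR in ${DIRS}
do
     # back up one account, creating the full destination path if needed
     rdiff-backup -v3 --print-statistics \
         --create-full-path "/var/vmail/$DIR" \
         "${BKPHOST}::${BKPBASE}${DIR}"

     # then prune increments older than 27 weeks
     rdiff-backup -v3 --force --remove-older-than 27W \
         "${BKPHOST}::${BKPBASE}${DIR}"
done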
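As for the "better way to find Maildirs" the comment in the script alludes to: with GNU find, `-printf '%P\n'` prints each match relative to the starting directory, which makes the grep and the first sed unnecessary and works for any base path. A minimal sketch, assuming the same layout of one `subscriptions` file per account directory; `list_maildirs` and `backup_all` are hypothetical names, and BKPHOST/BKPBASE are assumed to be set as in the script. Reading the list with `while read` also keeps account paths containing spaces intact, which the unquoted `for DIR in ${DIRS}` loop would split.

```shell
# list_maildirs BASE: hypothetical helper; prints each account directory
# relative to BASE with a trailing slash (e.g. "example.com/alice/"),
# for every "subscriptions" file up to three levels below BASE.
list_maildirs() {
    find "$1" -maxdepth 3 -type f -name subscriptions -printf '%P\n' |
        sed 's:subscriptions$::'
}

# backup_all BASE: sketch of the same per-account loop, shuffled with Perl
# as in the script; BKPHOST and BKPBASE must already be set by the caller.
backup_all() {
    list_maildirs "$1" |
        perl -MList::Util -e 'print List::Util::shuffle <>' |
        while IFS= read -r DIR; do
            # </dev/null keeps rdiff-backup (and any ssh it starts) from
            # reading the remaining account names off the loop's stdin
            rdiff-backup -v3 --print-statistics \
                --create-full-path "$1/$DIR" \
                "${BKPHOST}::${BKPBASE}${DIR}" </dev/null
            rdiff-backup -v3 --force --remove-older-than 27W \
                "${BKPHOST}::${BKPBASE}${DIR}" </dev/null
        done
}
```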


