[Dovecot] dsync on large mailbox "fails"
I've been trying to convert an existing set of mbox mail to maildir using dsync, but it seems to fail when it gets to the large boxes.
$ dsync -Dv mirror maildir:~/Maildir
Everything seems to work reasonably well until the first of the large boxes hits:
dsync(jeff): Warning: Maildir /home/jeff/Maildir/.cron.2010-07-10: Synchronization took 1210 seconds (102289 new msgs, 0 flag change attempts, 0 expunge attempts) dsync(jeff): Warning: Transaction log file /home/jeff/Maildir/.cron.2010-07-10/dovecot.index.log was locked for 1210 seconds
After this, the doveadm process goes from ~75% of the CPU core utilization to 100% and stays there, even overnight.
I've also tried
$ dsync -Dv backup maildir:~/Maildir
with similar results.
While I did see some threads on dsync speed, they seemed to be minutes for a few GB of data, not hours.
There is plenty of space on the drive, and, from what I understand, there is not a file-count limit on FreeBSD's ufs (at least not one that I'm anywhere near)
Any suggestions?
Thanks!
Jeff
[jeff@mail ~]$ du -h -d 0 ~/mail 3.9G /home/jeff/mail
[jeff@mail ~]$ ls -lh /var/mail/jeff -rw------- 1 jeff jeff 554M Apr 23 11:27 /var/mail/jeff
[jeff@mail ~]$ df -h Filesystem Size Used Avail Capacity Mounted on /dev/ada0s2g 38G 18G 16G 53% /
(Note that the hostname and the location of the ssl certificate has been manually edited for this message in the following output)
From dmesg.boot:
Copyright (c) 1992-2012 The FreeBSD Project.
Copyright (c) 1979, 1980, 1983, 1986, 1988, 1989, 1991, 1992, 1993, 1994
The Regents of the University of California. All rights reserved.
FreeBSD is a registered trademark of The FreeBSD Foundation.
FreeBSD 9.0-STABLE #0: Sat Apr 21 17:13:07 PDT 2012
root@my.host.name:/usr/obj/usr/src/sys/GENERIC i386
CPU: Intel(R) Atom(TM) CPU 330 @ 1.60GHz (1600.03-MHz 686-class CPU)
Origin = "GenuineIntel" Id = 0x106c2 Family = 6 Model = 1c
Stepping = 2
Features=0xbfe9fbff
[jeff@mail ~]$ doveconf -n # 2.1.3: /usr/local/etc/dovecot/dovecot.conf # OS: FreeBSD 9.0-STABLE i386 auth_mechanisms = plain login disable_plaintext_auth = no first_valid_gid = 1000 first_valid_uid = 1000 listen = * mail_location = mbox:~/mail/:INBOX=/var/mail/%u mail_privileged_group = mail namespace { inbox = yes location = prefix = separator = / } passdb { args = session=yes dovecot driver = pam } protocols = imap service auth { unix_listener /var/spool/postfix/private/auth { group = postfix mode = 0660 user = postfix } unix_listener auth-master { mode = 0600 } user = root } service imap-login { inet_listener imap { port = 0 } process_limit = 200 } ssl = required ssl_cert =
On 4/23/12 12:56 PM, Jeff Kletsky wrote:
I've been trying to convert an existing set of mbox mail to maildir using dsync, but it seems to fail when it gets to the large boxes.
$ dsync -Dv mirror maildir:~/Maildir
Everything seems to work reasonably well until the first of the large boxes hits:
dsync(jeff): Warning: Maildir /home/jeff/Maildir/.cron.2010-07-10: Synchronization took 1210 seconds (102289 new msgs, 0 flag change attempts, 0 expunge attempts) dsync(jeff): Warning: Transaction log file /home/jeff/Maildir/.cron.2010-07-10/dovecot.index.log was locked for 1210 seconds
After this, the doveadm process goes from ~75% of the CPU core utilization to 100% and stays there, even overnight.
I believe that some of problem may be that there were a handful of boxes named with "." in them. These were not "flagged" in the conversion process in any way.
http://wiki2.dovecot.org/Migration/MailFormat indicates that you can't have a "." in the name of a Maildir box
(I dutifully stopped reading that page at "If you can successfully use dsync, you can skip the rest of this page.")
In my case, it may be a contributing factor that there were boxes that ended up "stacking" inside each other:
- cron
- cron.2010-07-10
- cron.2010-07-10.remaining
Still not blisteringly fast, but
dsync(jeff): Warning: Maildir /home/jeff/Maildir/.cron_2010-07-10: Synchronization took 686 seconds (102289 new msgs, 0 flag change attempts, 0 expunge attempts) dsync(jeff): Warning: Transaction log file /home/jeff/Maildir/.cron_2010-07-10/dovecot.index.log was locked for 686 seconds
is certainly better.
I'm still seeing 100% CPU utilization, with the tail of ls -Altr in ~/Maildir showing
drwx------ 5 jeff jeff 512 Apr 23 18:19 .XDA drwx------ 5 jeff jeff 512 Apr 23 18:19 .Spam drwx------ 5 jeff jeff 512 Apr 23 18:42 .Health drwx------ 5 jeff jeff 512 Apr 23 18:42 .cron_2010-07-10_remaining drwx------ 5 jeff jeff 512 Apr 23 18:58 .cron_2010-07-10
It appears from looking in the .cron_2010-07-10 directory that indexing is the action underway
[jeff@mail ~/Maildir]$ ls -Altr .cron_2010-07-10 total 27 drwx------ 2 jeff jeff 512 Apr 23 18:15 new -rw------- 1 jeff jeff 0 Apr 23 18:15 maildirfolder drwx------ 2 jeff jeff 5819392 Apr 23 18:42 tmp drwx------ 2 jeff jeff 5237248 Apr 23 18:42 cur -rw------- 1 jeff jeff 4122971 Apr 23 18:42 dovecot-uidlist -rw------- 1 jeff jeff 36 Apr 23 18:42 dovecot-keywords -rw------- 1 jeff jeff 1650764 Apr 23 18:42 dovecot.index.log.2 -rw------- 1 jeff jeff 1637008 Apr 23 18:42 dovecot.index -rw------- 1 jeff jeff 56124 Apr 23 19:04 dovecot.index.log -rw------- 1 jeff jeff 9442304 Apr 23 19:04 dovecot.index.cache -rw------- 1 jeff jeff 24 Apr 23 19:04 dovecot-uidlist.lock
(time passes)
It is now 20:35 and it has not advanced. I now no longer believe the problem is the mailbox naming, but the indexing.
PID USERNAME THR PRI NICE SIZE RES STATE C TIME WCPU COMMAND 25934 jeff 1 103 0 135M 102M CPU2 2 128:47 100.00% doveadm
[jeff@mail ~/Maildir]$ ls -Altr .cron_2010-07-10 total 28 drwx------ 2 jeff jeff 512 Apr 23 18:15 new -rw------- 1 jeff jeff 0 Apr 23 18:15 maildirfolder drwx------ 2 jeff jeff 5819392 Apr 23 18:42 tmp drwx------ 2 jeff jeff 5237248 Apr 23 18:42 cur -rw------- 1 jeff jeff 4122971 Apr 23 18:42 dovecot-uidlist -rw------- 1 jeff jeff 36 Apr 23 18:42 dovecot-keywords -rw------- 1 jeff jeff 1650764 Apr 23 18:42 dovecot.index.log.2 -rw------- 1 jeff jeff 1637008 Apr 23 20:26 dovecot.index -rw------- 1 jeff jeff 289120 Apr 23 20:36 dovecot.index.log -rw------- 1 jeff jeff 9442304 Apr 23 20:36 dovecot.index.cache -rw------- 1 jeff jeff 24 Apr 23 20:36 dovecot-uidlist.lock
Any suggestions as to how to proceed?
Jeff
On 24.4.2012, at 6.38, Jeff Kletsky wrote:
25934 jeff 1 103 0 135M 102M CPU2 2 128:47 100.00% doveadm
Looks like it could be in an infinite loop. To find out where:
- strace for a few seconds:
strace -tt -p 25934 2>log
- get gdb backtrace a couple of times (2-5 should be enough, preferably slightly different ones):
gdb /usr/local/bin/doveadm 25934
bt full
On 4/24/12 3:26 AM, Timo Sirainen wrote:
On 24.4.2012, at 6.38, Jeff Kletsky wrote:
25934 jeff 1 103 0 135M 102M CPU2 2 128:47 100.00% doveadm Looks like it could be in an infinite loop. To find out where:
- strace for a few seconds:
strace -tt -p 25934 2>log
- get gdb backtrace a couple of times (2-5 should be enough, preferably slightly different ones):
gdb /usr/local/bin/doveadm 25934 bt full
cont <ctrl-c> bt full cont <ctrl-c> ..etc.. Rather than clutter everyone's inbox, even if we know they have a great IMAP server...
http://wildside.wagsky.com/dovecot/2012-04-24/54097/doveadm.gdb.log.bz2 http://wildside.wagsky.com/dovecot/2012-04-24/54097/doveadm.truss.log.bz2
Thanks for looking into this!
Jeff
On 24.4.2012, at 20.05, Jeff Kletsky wrote:
On 4/24/12 3:26 AM, Timo Sirainen wrote:
On 24.4.2012, at 6.38, Jeff Kletsky wrote:
25934 jeff 1 103 0 135M 102M CPU2 2 128:47 100.00% doveadm Looks like it could be in an infinite loop. To find out where: http://wildside.wagsky.com/dovecot/2012-04-24/54097/doveadm.truss.log.bz2
That's a lot of getdirentries() calls in there, repeating all the time. So I don't think it's an infinite loop, just a lot of slowness repeating the same work over and over again. See if it helps if you add:
maildir_very_dirty_syncs = yes
On 04/24/2012 10:08 AM, Timo Sirainen wrote:
25934 jeff 1 103 0 135M 102M CPU2 2 128:47 100.00% doveadm Looks like it could be in an infinite loop. To find out where:
That's a lot of getdirentries() calls in there, repeating all the time. So I don't think it's an infinite loop, just a lot of slowness repeating the same work over and over again. See if it helps if you add:
maildir_very_dirty_syncs = yes
I hope I understood correctly when I tried
$ dsync -Dv -o maildir_very_dirty_syncs=yes backup maildir:~/Maildir
into fresh directories (moving the previous Maildir aside).
I didn't see any significant change (and it did not complete in several hours) on the usual UFS directory, and have also tried ZFS to see if it might be a filesystem issue. That has not completed either after over nine hours.
From what I can see, I am guessing that it is "hanging" in index creation. Below, "mail" is the source directory, and "Maildir" is the destination of the sync. This is about nine hours of run time.
[jeff@mail ~]$ ls -Al mail/.imap/cron_2010-07-10 total 9 -rw------- 1 jeff jeff 4910320 Feb 22 2011 dovecot.index -rw------- 1 jeff jeff 3600384 Apr 25 01:18 dovecot.index.cache -rw------- 1 jeff jeff 56 Feb 22 2011 dovecot.index.log [jeff@mail ~]$ ls -Al Maildir/.cron_2010-07-10 total 26 drwx------ 2 jeff jeff 102291 Apr 25 01:17 cur -rw------- 1 jeff jeff 36 Apr 25 01:18 dovecot-keywords -rw------- 1 jeff jeff 4122971 Apr 25 01:17 dovecot-uidlist -rw------- 1 jeff jeff 24 Apr 25 07:19 dovecot-uidlist.lock -rw------- 1 jeff jeff 1637008 Apr 25 07:03 dovecot.index -rw------- 1 jeff jeff 9442304 Apr 25 07:19 dovecot.index.cache -rw------- 1 jeff jeff 87984 Apr 25 07:19 dovecot.index.log -rw------- 1 jeff jeff 1048600 Apr 25 07:03 dovecot.index.log.2 -rw------- 1 jeff jeff 0 Apr 24 22:12 maildirfolder drwx------ 2 jeff jeff 2 Apr 24 22:12 new drwx------ 2 jeff jeff 2 Apr 25 01:17 tmp
participants (2)
-
Jeff Kletsky
-
Timo Sirainen