[Dovecot] some mail accesses in dovecot 1.1 stall for a bit (in progress)
I noticed this in an earlier beta and took a note, and I still see it sometimes in 1.1b4, but it is only occasional so I have not yet tracked down what dovecot is doing. Just an FYI for the list in case others see it, and to say that it's an issue I intend to track down that I did not see in 1.0; it's been low on my list until now.
Sometimes when I login to dovecot and read a message from an imap maildir folder, the message contents do not load for at least 10 seconds, sometimes closer to a minute, but after that first message loads, the rest seem fine in that mailbox. The time seems to vary some, usually longer rather than shorter.
On Mon, 2007-10-29 at 17:15 -0400, Adam McDougall wrote:
> Sometimes when I login to dovecot and read a message from an imap maildir folder, the message contents do not load for at least 10 seconds, sometimes closer to a minute, but after that first message loads, the rest seem fine in that mailbox. The time seems to vary some, usually longer rather than shorter.
Getting a strace of what the imap process is doing at the time would be useful. It sounds a bit like a stale dotlock, although Dovecot waits for two minutes for that. http://hg.dovecot.org/dovecot/rev/6dd5df1c1ec9 fixes a stale dotlock overriding issue.
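For context, a dotlock is a lock file created next to the mailbox with open(O_CREAT|O_EXCL); a stale one is a lock left behind by a process that died while holding it. A minimal sketch of the idea, using the two-minute timeout mentioned above (a generic illustration, not Dovecot's actual locking code):

#include <fcntl.h>
#include <stdio.h>
#include <sys/stat.h>
#include <time.h>
#include <unistd.h>

#define STALE_SECS (2 * 60)  /* assumed staleness timeout, per above */

/* Acquire "<path>.lock"; returns an open fd that holds the lock. */
int dotlock_acquire(const char *path)
{
    char lock[1024];
    struct stat st;
    int fd;

    snprintf(lock, sizeof(lock), "%s.lock", path);
    for (;;) {
        fd = open(lock, O_WRONLY | O_CREAT | O_EXCL, 0600);
        if (fd >= 0)
            return fd;              /* created the file: lock is ours */
        if (stat(lock, &st) == 0 &&
            time(NULL) - st.st_mtime > STALE_SECS)
            (void)unlink(lock);     /* stale: override it and retry   */
        else
            sleep(1);               /* held by a live process: wait   */
    }
}

A client stalled on a stale dotlock should show up in a trace as a long run of stat() and sleep (nanosleep) calls against the .lock file, which makes it easy to tell apart from other causes.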
On Fri, Nov 02, 2007 at 06:04:12PM +0200, Timo Sirainen wrote:
> On Mon, 2007-10-29 at 17:15 -0400, Adam McDougall wrote:
> > Sometimes when I login to dovecot and read a message from an imap maildir folder, the message contents do not load for at least 10 seconds, sometimes closer to a minute, but after that first message loads, the rest seem fine in that mailbox. The time seems to vary some, usually longer rather than shorter.
> Getting a strace of what the imap process is doing at the time would be useful. It sounds a bit like a stale dotlock, although Dovecot waits for two minutes for that. http://hg.dovecot.org/dovecot/rev/6dd5df1c1ec9 fixes a stale dotlock overriding issue.
Agreed, I've wanted to get a trace since the first time I saw it, but I've been using dovecot 1.1 all week and have hardly seen it. It's possible a patch improved it or there was some other cause, so it's just something I'll watch out for and try to tackle as soon as it becomes reproducible. After some further preparation this weekend I plan to expand my testing of 1.1 beyond just myself and one server, since I've been very happy with the current state of 1.1 this week versus 1.0.
On Fri, Nov 02, 2007 at 12:12:05PM -0400, Adam McDougall wrote:
> On Fri, Nov 02, 2007 at 06:04:12PM +0200, Timo Sirainen wrote:
> > On Mon, 2007-10-29 at 17:15 -0400, Adam McDougall wrote:
> > > Sometimes when I login to dovecot and read a message from an imap maildir folder,
> > > the message contents do not load for at least 10 seconds, sometimes closer to a
> > > minute, but after that first message loads, the rest seem fine in that mailbox.
> > > The time seems to vary some, usually longer rather than shorter.
> > Getting a strace of what the imap process is doing at the time would
> > be useful. It sounds a bit like a stale dotlock, although Dovecot waits
> > for two minutes for that. http://hg.dovecot.org/dovecot/rev/6dd5df1c1ec9
> > fixes a stale dotlock overriding issue.
> Agreed, I've wanted to get a trace since the first time I saw it, but I've been using dovecot 1.1 all week and have hardly seen it. It's possible a patch improved it or there was some other cause, so it's just something I'll watch out for and try to tackle as soon as it becomes reproducible. After some further preparation this weekend I plan to expand my testing of 1.1 beyond just myself and one server, since I've been very happy with the current state of 1.1 this week versus 1.0.
I noticed some sluggish message body fetches today that I don't think I've seen before. Each time I fetch a message from this particular mailbox (network/2007/01-Jan for my own record) it takes a few seconds, and mutt makes it look like the fetch was "slow". So did thunderbird. I didn't waste any time and tried to grab a truss and kdump of the action asap (attached). Apparently strace is only available on i386 for FreeBSD, but truss and kdump should be similar; it depends on what details you are looking for.
It looks like the message fetch actually reads in from disk fairly promptly (you can see the whole spam message in the kdump output) but then it churns away for a while with something like the below:
fstat(13,{mode=-rw------- ,inode=5656735,size=817368,blksize=4096}) = 0 (0x0)
fchown(0xd,0x5321,0xffffffff) = 0 (0x0)
madvise(0x10649000,0x82000,0x4) = 0 (0x0)
fstat(13,{mode=-rw------- ,inode=5656735,size=817368,blksize=4096}) = 0 (0x0)
fchown(0xd,0x5321,0xffffffff) = 0 (0x0)
madvise(0x10649000,0x82000,0x4) = 0 (0x0)
...
For now I plan to leave mutt with the current folder selected, in case something I do makes the behavior go away.
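Reading the repeated triple above (a guess from the trace, not a claim about Dovecot's source): fd 13 (0xd) is re-fstat()ed, fchown() re-applies the owning uid while 0xffffffff, i.e. (gid_t)-1, leaves the group unchanged, and madvise() with advice 0x4, which is MADV_DONTNEED on FreeBSD, drops the pages of the mapped region. Per iteration that is roughly:

#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

/* Reconstruction of one iteration of the traced loop, for illustration
 * only; the real code is Dovecot's, and the fix Timo posts below is
 * rev de67ceff3199.  Run once per record touched instead of once per
 * file, this turns a single FETCH into thousands of syscalls. */
static void per_record_churn(int fd, void *map, size_t map_len, uid_t uid)
{
    struct stat st;

    (void)fstat(fd, &st);                       /* re-check the cache file   */
    (void)fchown(fd, uid, (gid_t)-1);           /* uid only; group unchanged */
    (void)madvise(map, map_len, MADV_DONTNEED); /* 0x4: drop mapped pages    */
}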
On Sun, Nov 04, 2007 at 11:28:36AM -0500, Adam McDougall wrote:
> I noticed some sluggish message body fetches today that I don't think I've seen before. [snip]
Oops, forgot to mention this was with dovecot 1.1b6, no local patches.
On Sun, 2007-11-04 at 11:28 -0500, Adam McDougall wrote:
> fstat(13,{mode=-rw------- ,inode=5656735,size=817368,blksize=4096}) = 0 (0x0)
> fchown(0xd,0x5321,0xffffffff) = 0 (0x0)
> madvise(0x10649000,0x82000,0x4) = 0 (0x0)
> fstat(13,{mode=-rw------- ,inode=5656735,size=817368,blksize=4096}) = 0 (0x0)
> fchown(0xd,0x5321,0xffffffff) = 0 (0x0)
> madvise(0x10649000,0x82000,0x4) = 0 (0x0)
> ..
This should fix it: http://hg.dovecot.org/dovecot/rev/de67ceff3199
On Thu, Nov 08, 2007 at 09:55:48PM +0200, Timo Sirainen wrote:
> On Sun, 2007-11-04 at 11:28 -0500, Adam McDougall wrote:
> > fstat(13,{mode=-rw------- ,inode=5656735,size=817368,blksize=4096}) = 0 (0x0)
> > fchown(0xd,0x5321,0xffffffff) = 0 (0x0)
> > madvise(0x10649000,0x82000,0x4) = 0 (0x0)
> > ..
> This should fix it: http://hg.dovecot.org/dovecot/rev/de67ceff3199
I applied the patch to beta7. The IMAP client "feels" the same as before the patch (several-second stalls on each message), but kdump and truss reveal different behavior from the process: lots of the following. Going back to a previously selected message in the same folder is fast, but not if I select another folder and then come back to the first. No errors in the log except for a few "dovecot: IMAP(mcdouga9): fchown(/egr/mail/shared/decs/temp.boomhauer.49823.6b66a5e5f1b24d07) failed: Operation not permitted", which I doubt is part of the same issue. Not every mail folder is slow; it seems like perhaps just some of my archive folders with around 1000-3000 emails. I haven't established a strong pattern yet. Let me know if you want me to try to. Thanks.
49823 imap     RET   pwrite 4
49823 imap     CALL  pwrite(0xd,0x7fffffffcb44,0x4,0,0xaf648)
49823 imap     GIO   fd 13 wrote 4 bytes
      0x0000 e89c 0800  |....|
49823 imap     RET   pwrite 4
49823 imap     CALL  pwrite(0xd,0x7fffffffcb44,0x4,0,0xaf680)
49823 imap     GIO   fd 13 wrote 4 bytes
      0x0000 b89e 0800  |....|
49823 imap     RET   pwrite 4
49823 imap     CALL  pwrite(0xd,0x7fffffffcb44,0x4,0,0xaf690)
49823 imap     GIO   fd 13 wrote 4 bytes
      0x0000 fc9f 0800  |....|
49823 imap     RET   pwrite 4
49823 imap     CALL  pwrite(0xd,0x7fffffffcb44,0x4,0,0xaf6a0)
49823 imap     GIO   fd 13 wrote 4 bytes
      0x0000 5ca1 0800  |\...|
pwrite(0xd,0x7fffffffcb44,0x4,0x0,0xc6f3c) = 4 (0x4)
pwrite(0xd,0x7fffffffcb44,0x4,0x0,0xc6f74) = 4 (0x4)
pwrite(0xd,0x7fffffffcb44,0x4,0x0,0xc6fac) = 4 (0x4)
pwrite(0xd,0x7fffffffcb44,0x4,0x0,0xc6fe4) = 4 (0x4)
pwrite(0xd,0x7fffffffcb44,0x4,0x0,0xc701c) = 4 (0x4)
On Thu, Nov 08, 2007 at 04:17:57PM -0500, Adam McDougall wrote:
> On Thu, Nov 08, 2007 at 09:55:48PM +0200, Timo Sirainen wrote:
> > This should fix it: http://hg.dovecot.org/dovecot/rev/de67ceff3199
> I applied the patch to beta7. [snip]
> pwrite(0xd,0x7fffffffcb44,0x4,0x0,0xc6f3c) = 4 (0x4)
> pwrite(0xd,0x7fffffffcb44,0x4,0x0,0xc6f74) = 4 (0x4)
> [snip]
I did a little more checking to confirm that the filehandle being written to is dovecot.index.cache.
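One way to do that kind of check (a sketch of the general technique, not necessarily how it was done here) is to print the inode of the suspected file and match it against the inode=... field that truss shows in the fstat() lines, 5656735 above:

#include <stdint.h>
#include <stdio.h>
#include <sys/stat.h>

/* Print a file's inode number for comparison with the fstat() output
 * in a truss/kdump trace.  The path is whatever file you suspect,
 * e.g. the mailbox's dovecot.index.cache. */
int main(int argc, char **argv)
{
    struct stat st;

    if (argc != 2) {
        fprintf(stderr, "usage: %s file\n", argv[0]);
        return 1;
    }
    if (stat(argv[1], &st) != 0) {
        perror("stat");
        return 1;
    }
    printf("%s: inode %ju\n", argv[1], (uintmax_t)st.st_ino);
    return 0;
}

On FreeBSD, ls -i or fstat -p <pid> gets the same answer without writing any code.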
On Fri, Nov 09, 2007 at 10:06:00AM -0500, Adam McDougall wrote:
> [snip]
> pwrite(0xd,0x7fffffffcb44,0x4,0x0,0xc6f3c) = 4 (0x4)
> pwrite(0xd,0x7fffffffcb44,0x4,0x0,0xc6f74) = 4 (0x4)
> pwrite(0xd,0x7fffffffcb44,0x4,0x0,0xc6fac) = 4 (0x4)
> pwrite(0xd,0x7fffffffcb44,0x4,0x0,0xc6fe4) = 4 (0x4)
> pwrite(0xd,0x7fffffffcb44,0x4,0x0,0xc701c) = 4 (0x4)
> I did a little more checking to confirm that the filehandle being written to is dovecot.index.cache.
Just to update, right now I am not seeing this issue, but I've made a number of changes this weekend and something may have affected it. I'll bring it up again if I see it. Still, writing in chunks of 4 bytes seems suboptimal to me in any case, though I won't care if it doesn't happen :)
On 11.11.2007, at 19.28, Adam McDougall wrote:
> On Fri, Nov 09, 2007 at 10:06:00AM -0500, Adam McDougall wrote:
> > [snip]
> > pwrite(0xd,0x7fffffffcb44,0x4,0x0,0xc6f3c) = 4 (0x4)
> > pwrite(0xd,0x7fffffffcb44,0x4,0x0,0xc6f74) = 4 (0x4)
> > [snip]
> > I did a little more checking to confirm that the filehandle being written to is dovecot.index.cache.
> Just to update, right now I am not seeing this issue, but I've made a number of changes this weekend and something may have affected it. I'll bring it up again if I see it. Still, writing in chunks of 4 bytes seems suboptimal to me in any case, though I won't care if it doesn't happen :)
That kind of behavior happens when Dovecot is linking records in the
cache file. For example, if you start with an empty cache:

1 fetch 1:* internaldate

writes the first records to the cache file and writes indexes to them
in dovecot.index. The next command:

2 fetch 1:* rfc822.size

does the same, except it also needs to link the old and new cache
records together by writing those 4 bytes for each message.

But .. yes, I suppose it should be possible to change it to just write
links to the old records directly while writing the new records. :)
Although that doesn't always work well: the current code properly
handles the case where those two commands are run simultaneously by
different processes.
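Schematically, with an invented record layout rather than the real dovecot.index.cache format, the linking described above amounts to this: each message's cache data lives in a chain of records, and appending a record for a message means patching the 32-bit next-offset field of that message's previous record in place, which is exactly the flood of 4-byte pwrite()s in the truss output:

#include <stdint.h>
#include <sys/types.h>
#include <unistd.h>

/* Invented layout, for illustration only. */
struct cache_rec_hdr {
    uint32_t next_offset;  /* file offset of this message's next record */
    uint32_t size;         /* size of the record data that follows      */
};

/* Link a freshly appended record into a message's chain by overwriting
 * the previous record's next_offset field.  One 4-byte pwrite() per
 * message; with the 1000-3000 message folders reported above, that is
 * thousands of tiny writes for a FETCH that adds a new cached field. */
static int cache_link_record(int fd, off_t prev_rec_offset,
                             uint32_t new_rec_offset)
{
    ssize_t n = pwrite(fd, &new_rec_offset, sizeof(new_rec_offset),
                       prev_rec_offset /* next_offset sits at offset 0 */);
    return n == (ssize_t)sizeof(new_rec_offset) ? 0 : -1;
}

The alternative mentioned above, writing the links into the new records as they are appended instead of patching the old ones, would avoid the per-message pwrite()s entirely; the tradeoff is the concurrent case, where two processes run those fetch commands at the same time.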