[Dovecot] Hang in 1.0.alpha4
My IMAP client hangs intermittently. When it does, Dovecot has prematurely stopped sending a response and entered the poll loop. My client is waiting for more data and Dovecot is waiting for an IMAP command.
This problem seems to occur when I connect to the dovecot process over the network and when I run the imap binary directly via an SSH tunnel.
Below is truss output from the imap process for one of the hangs. It is responding to a FETCH, gets EAGAIN, and then seems to give up sending the rest of the message and enters the poll loop. Below that is the log from my IMAP client to show its side of the story.
Nothing shows up in the normal Dovecot log. I haven't caught it in the act recently when running with rawlog. But rawlog output from before I was running with truss shows that Dovecot does not think it has finished sending its response.
truss output: 12951: stat64("/home/happy/Mail/banner-ss", 0xFFBFF200) = 0 12951: fstat64(12, 0xFFBFF388) = 0 12951: stat64("/home/happy/Mail/.imap/banner-ss/dovecot.index", 0xFFBFF2F0) = 0 12951: fstat64(18, 0xFFBFF388) = 0 12951: pread64(18, "\n F r o m b e v e r l".., 4096, 1592560) = 4096 12951: pread64(18, " S u n G a r d * S C".., 4094, 1596656) = 4094 12951: pread64(18, " M i c r o s o f t S a".., 4094, 1600750) = 4094 12951: pread64(18, " r g a r e t . d i l l a".., 4094, 1604844) = 3720 12951: pread64(18, 0x0025EEA2, 374, 1608564) = 0 12951: pread64(18, 0x0025EEA2, 374, 1608564) = 0 12951: pread64(18, " R e t u r n - P a t h :".., 4096, 1592616) = 4096 12951: write(1, " * 2 5 0 F E T C H ".., 4073) = 4073 12951: pread64(18, " a l v e r n , P A ".., 4095, 1596712) = 4095 12951: write(1, "\n W i l l i a m B . ".., 4060) = 4060 12951: pread64(18, " l ( i n c l u d i n g".., 4095, 1600807) = 4095 12951: write(1, "\n & # 8 2 2 6 ; < / b".., 4069) = 4069 12951: pread64(18, " m o z - t x t - l i n k".., 4095, 1604902) = 3662 12951: write(1, "\n ".., 4079) Err#11 EAGAIN 12951: write(1, "\n ".., 4096) Err#11 EAGAIN 12951: poll(0x000A0AD0, 3, 533) = 1 12951: write(1, "\n ".., 4096) = 4096 12951: poll(0x000A0AD0, 2, 532) = 0 12951: poll(0x000A0AD0, 2, 0) = 0 12951: poll(0x000A0AD0, 2, 999) (sleeping...) 12951: poll(0x000A0AD0, 2, 999) = 0
Here's the log from my IMAP client. I removed email addresses and most of the body of the message to protect the innocent. The "BYE" was caused by me running 'kill $PID' on the imap process.
3551 OK Fetch completed.! 3552 UID FETCH 252 BODY.PEEK[]
- 250 FETCH (UID 252 BODY[] {16341}!
Return-Path: abc@usg.edu!
Received: from usg.edu (abc.usg.edu [1.2.8.1])!
by abc.usg.edu (8.12.11/8.12.11) with ESMTP id j9KFqoga012884!
for abc@USG.EDU; Thu, 20 Oct 2005 11:52:50 -0400 (EDT)!
Message-ID: 4357BD52.1020506@usg.edu!
Date: Thu, 20 Oct 2005 11:52:50 -0400!
From: abc abc@usg.edu!
User-Agent: Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.4) Gecko/20030624 Netscape/7.1 (ax)!
X-Accept-Language: en-us, en!
MIME-Version: 1.0!
To: abc@usg.edu!
Subject: [Fwd: [Fwd: Fw: Java Library Problem]] - URGENT, I'm sorry...!
Content-Type: multipart/mixed;!
boundary="------------010604070704040405060605"!
X-USG-MailScanner: Found to be clean!
X-USG-MailScanner-SpamCheck: !
X-MailScanner-From: abc@usg.edu!
!
This is a multi-part message in MIME format.!
--------------010604070704040405060605!
Content-Type: multipart/alternative;!
boundary="------------040009050704020209010709"!
!
!
--------------040009050704020209010709!
Content-Type: text/plain; charset=us-ascii; format=flowed!
Content-Transfer-Encoding: 7bit!
!
Guys,!
Please create a ssh login on... from !
24 hours now) with testing. His account should look like Patrick !
I've requested a static IP from Dan.!
!
...!
</pre>!
</body>! </html>! ! --------------040009050704020209010709--! ! --------------010604070704040405060605! Content-Type: text/x-vcard;! Content-Transfer-Encoding: 7bit! Content-Disposition: attachment;! ! begin:vcard! ...! tel;work:(xxx) xxx-2000! tel;fax:(xxx) xxx-xxxx! url:http://www.u* BYE Server shutting down.! imap(happy): Warning: Killed with signal 15
The mail message in question actually ends like this:
tel;work:(xxx) xxx-2000 tel;fax:(xxx) 583xxx-xxxx url:http://www.usg.edu version:2.1 end:vcard
--------------010604070704040405060605--
On Mon, 2005-10-24 at 12:57 -0400, Mark Plaksin wrote:
My IMAP client hangs intermittently. When it does, Dovecot has prematurely stopped sending a response and entered the poll loop. My client is waiting for more data and Dovecot is waiting for an IMAP command.
If you can easily reproduce this, could you try it with this patch:
http://dovecot.org/patches/hang-debug.diff
And show what the log files says after it hangs?
Timo Sirainen tss@iki.fi writes:
On Mon, 2005-10-24 at 12:57 -0400, Mark Plaksin wrote:
My IMAP client hangs intermittently. When it does, Dovecot has prematurely stopped sending a response and entered the poll loop. My client is waiting for more data and Dovecot is waiting for an IMAP command.
If you can easily reproduce this, could you try it with this patch:
http://dovecot.org/patches/hang-debug.diff
And show what the log files says after it hangs?
The hang usually happens at least once a day. I've been checking mail much less frequently this week (I'm at a conference) and it hasn't happened yet. In the meantime, running with this patch (and the 'rawlog imap' fix--thanks!) I get these messages a lot (~1300 times a day):
imap(happy): Warning: cork changed, set=1, io=0 imap(happy): Warning: cork changed, set=0, io=0 imap(happy): Warning: buffer_flush=1, flush_pending=0
Does this tell us anything? Or is this debug info only useful when there's a hang?
Thanks!
On Fri, 2005-10-28 at 17:37 -0400, Mark Plaksin wrote:
imap(happy): Warning: cork changed, set=1, io=0 imap(happy): Warning: cork changed, set=0, io=0 imap(happy): Warning: buffer_flush=1, flush_pending=0
Does this tell us anything? Or is this debug info only useful when there's a hang?
When it hangs, seeing last of those messages could help in figuring out where the problem is.
Timo Sirainen tss@iki.fi writes:
On Mon, 2005-10-24 at 12:57 -0400, Mark Plaksin wrote:
My IMAP client hangs intermittently. When it does, Dovecot has prematurely stopped sending a response and entered the poll loop. My client is waiting for more data and Dovecot is waiting for an IMAP command.
If you can easily reproduce this, could you try it with this patch:
http://dovecot.org/patches/hang-debug.diff
And show what the log files says after it hangs?
Running with CVS from 10.27.05 15:09 EDT and this patch, I get folder corruption. I can't easily reproduce it but it happens frequently (4 times in 3 hours this morning).
The patch looks innocuous so I suspect CVS. alpha4 doesn't have the problem. Is there anything in CVS which could have affected this? I'll update to the latest CVS and try some more.
I'm running 'dovecot --exec-mail imap' on Solaris with lock_method=fcntl; the rest of the locking is default.
Of the four problems, the body of the message went missing twice. And twice the body of a message were overwritten by the headers of the next message. There's nothing surprising in the rawlogs or any other logs.
Mark Plaksin happy@usg.edu writes:
Timo Sirainen tss@iki.fi writes:
On Mon, 2005-10-24 at 12:57 -0400, Mark Plaksin wrote:
My IMAP client hangs intermittently. When it does, Dovecot has prematurely stopped sending a response and entered the poll loop. My client is waiting for more data and Dovecot is waiting for an IMAP command.
If you can easily reproduce this, could you try it with this patch:
http://dovecot.org/patches/hang-debug.diff> And show what the log files says after it hangs?
Running with CVS from 10.27.05 15:09 EDT and this patch, I get folder corruption. I can't easily reproduce it but it happens frequently (4 times in 3 hours this morning).
The patch looks innocuous so I suspect CVS. alpha4 doesn't have the problem. Is there anything in CVS which could have affected this? I'll update to the latest CVS and try some more.
After 24 hours, CVS from 10.31.05 11:09 EST has not corrupted any folders. Also, the original hang problem I reported has not recurred since I started running the CVS version on 10.27.05.
Perhaps all my troubles are over!
Mark Plaksin happy@usg.edu writes:
Mark Plaksin happy@usg.edu writes:
Timo Sirainen tss@iki.fi writes:
On Mon, 2005-10-24 at 12:57 -0400, Mark Plaksin wrote:
My IMAP client hangs intermittently. When it does, Dovecot has prematurely stopped sending a response and entered the poll loop. My client is waiting for more data and Dovecot is waiting for an IMAP command.
If you can easily reproduce this, could you try it with this patch:
http://dovecot.org/patches/hang-debug.diff>>> And show what the log files says after it hangs?
Running with CVS from 10.27.05 15:09 EDT and this patch, I get folder corruption. I can't easily reproduce it but it happens frequently (4 times in 3 hours this morning).
The patch looks innocuous so I suspect CVS. alpha4 doesn't have the problem. Is there anything in CVS which could have affected this? I'll update to the latest CVS and try some more.
After 24 hours, CVS from 10.31.05 11:09 EST has not corrupted any folders. Also, the original hang problem I reported has not recurred since I started running the CVS version on 10.27.05.
Perhaps all my troubles are over!
No such luck. It just hang. Everything (truss, rawlog, client log) looks the same as before. Here's the end of the debug output that the hang-debug patch added:
imap(happy): Warning: buffer_flush=1, flush_pending=0 imap(happy): Warning: cork changed, set=1, io=0 imap(happy): Warning: cork changed, set=0, io=0 imap(happy): Warning: buffer_flush=1, flush_pending=0 imap(happy): Warning: cork changed, set=1, io=0 imap(happy): Warning: cork changed, set=0, io=0 imap(happy): Warning: buffer_flush=1, flush_pending=0 imap(happy): Warning: cork changed, set=1, io=0 imap(happy): Warning: cork changed, set=0, io=0 imap(happy): Warning: buffer_flush=1, flush_pending=0 imap(happy): Warning: cork changed, set=1, io=0 imap(happy): Warning: func -> command_pending=1 imap(happy): Warning: _flush_pending -> 1 imap(happy): Warning: cork changed, set=0, io=0 imap(happy): Warning: buffer_flush=0, flush_pending=1 imap(happy): Warning: - adding2
Mark Plaksin happy@usg.edu writes:
No such luck. It just hang. Everything (truss, rawlog, client log) looks the same as before. Here's the end of the debug output that the hang-debug patch added:
...
imap(happy): Warning: buffer_flush=1, flush_pending=0 imap(happy): Warning: cork changed, set=1, io=0 imap(happy): Warning: func -> command_pending=1 imap(happy): Warning: _flush_pending -> 1 imap(happy): Warning: cork changed, set=0, io=0 imap(happy): Warning: buffer_flush=0, flush_pending=1 imap(happy): Warning: - adding2
I can now reproduce a hang at will by simulating an IMAP client running the commands from the rawlog.in corresponding to a hang. The hang-debug and truss output is slightly different from before. The debug output is below. Let me know if you need the truss output too (I can't tell how much of it would be useful; it's too big to post the whole thing).
imap(happy): Warning: cork changed, set=0, io=0 imap(happy): Warning: buffer_flush=1, flush_pending=0 imap(happy): Warning: cork changed, set=1, io=0 imap(happy): Warning: cork changed, set=0, io=0 imap(happy): Warning: buffer_flush=1, flush_pending=0 imap(happy): Warning: func -> command_pending=1 imap(happy): Warning: _flush_pending -> 1 imap(happy): Warning: - adding
Mark Plaksin happy@usg.edu writes:
Mark Plaksin happy@usg.edu writes:
No such luck. It just hang. Everything (truss, rawlog, client log) looks the same as before. Here's the end of the debug output that the hang-debug patch added:
...
imap(happy): Warning: buffer_flush=1, flush_pending=0 imap(happy): Warning: cork changed, set=1, io=0 imap(happy): Warning: func -> command_pending=1 imap(happy): Warning: _flush_pending -> 1 imap(happy): Warning: cork changed, set=0, io=0 imap(happy): Warning: buffer_flush=0, flush_pending=1 imap(happy): Warning: - adding2
I can now reproduce a hang at will by simulating an IMAP client running the commands from the rawlog.in corresponding to a hang. The hang-debug and truss output is slightly different from before. The debug output is below. Let me know if you need the truss output too (I can't tell how much of it would be useful; it's too big to post the whole thing).
imap(happy): Warning: cork changed, set=0, io=0 imap(happy): Warning: buffer_flush=1, flush_pending=0 imap(happy): Warning: cork changed, set=1, io=0 imap(happy): Warning: cork changed, set=0, io=0 imap(happy): Warning: buffer_flush=1, flush_pending=0 imap(happy): Warning: func -> command_pending=1 imap(happy): Warning: _flush_pending -> 1 imap(happy): Warning: - adding
And, FWIW, the hang happens with both mbox and maildir. I was using mbox until last Friday when I switched to maildir. This morning I had a hang with maildir which produced this debug output:
imap(happy): Warning: buffer_flush=1, flush_pending=0 imap(happy): Warning: cork changed, set=1, io=0 imap(happy): Warning: cork changed, set=0, io=0 imap(happy): Warning: buffer_flush=1, flush_pending=0 imap(happy): Warning: cork changed, set=1, io=0 imap(happy): Warning: func -> command_pending=1 imap(happy): Warning: _flush_pending -> 1 imap(happy): Warning: cork changed, set=0, io=0 imap(happy): Warning: buffer_flush=0, flush_pending=1 imap(happy): Warning: - adding2
On Wed, 2005-11-02 at 13:32 -0500, Mark Plaksin wrote:
I can now reproduce a hang at will by simulating an IMAP client running the commands from the rawlog.in corresponding to a hang. The hang-debug and truss output is slightly different from before. The debug output is below. Let me know if you need the truss output too (I can't tell how much of it would be useful; it's too big to post the whole thing).
imap(happy): Warning: cork changed, set=0, io=0 imap(happy): Warning: buffer_flush=1, flush_pending=0 imap(happy): Warning: cork changed, set=1, io=0 imap(happy): Warning: cork changed, set=0, io=0 imap(happy): Warning: buffer_flush=1, flush_pending=0 imap(happy): Warning: func -> command_pending=1 imap(happy): Warning: _flush_pending -> 1 imap(happy): Warning: - adding
Did imap/client.c get also patched? None of these show _client_output being called. And these were the last lines before it hanged? It shows that it should be calling _client_output next..
I updated http://dovecot.org/patches/hang-debug.diff , could you try again with it?
Timo Sirainen wrote:
Did imap/client.c get also patched? None of these show _client_output being called. And these were the last lines before it hanged? It shows that it should be calling _client_output next..
I updated http://dovecot.org/patches/hang-debug.diff , could you try again with it?
I have been having the same problems post-0.99: periodic hangs, usually when selecting a large mailbox or fetching a large mail. With 1.0a5 and the above patch, here are the last few lines of logs:
imap(zakj): Warning: buffer_flush=1, flush_pending=0 imap(zakj): Warning: cork changed, set=1, io=0x0 imap(zakj): Warning: cork changed, set=0, io=0x0 imap(zakj): Warning: buffer_flush=1, flush_pending=0 imap(zakj): Warning: cork changed, set=1, io=0x0 imap(zakj): Warning: cork changed, set=0, io=0x0 imap(zakj): Warning: buffer_flush=1, flush_pending=0 imap(zakj): Warning: cork changed, set=1, io=0x0 imap(zakj): Warning: func -> command_pending=1 imap(zakj): Warning: _flush_pending -> 1 imap(zakj): Warning: cork changed, set=0, io=0x0 imap(zakj): Warning: buffer_flush=0, flush_pending=1 imap(zakj): Warning: - adding2 imap(zakj): Warning: output handler started, callback=0x0 imap(zakj): Warning: output handler returned 1, flush_pending=0
The command immediately preceding the hang in this case was:
a4964 UID FETCH 2012 BODY.PEEK[]
-Zak
On Tue, 2006-01-24 at 09:44 -0800, Zak Johnson wrote:
Timo Sirainen wrote:
Did imap/client.c get also patched? None of these show _client_output being called. And these were the last lines before it hanged? It shows that it should be calling _client_output next..
I updated http://dovecot.org/patches/hang-debug.diff , could you try again with it?
I have been having the same problems post-0.99: periodic hangs, usually when selecting a large mailbox or fetching a large mail. With 1.0a5 and the above patch, here are the last few lines of logs:
Thanks, I think I got it. It happens only after APPEND command after there have been some changes to mailbox while it was being done. Attached patch should fix it (also in CVS). At least I can't think of any other possible way it could have happened..
Hello, it looks like that one user seems to have the same problem. After some time the IMAP-process enters a poll loop and the client hangs. It is the same with 1.0stable and 1.0alpha4 (on Solaris 9).
Is this already fixed in CVS or should I wait for 1.0alpha5 ?
Martin
Mark Plaksin wrote:
Timo Sirainen tss@iki.fi writes:
On Mon, 2005-10-24 at 12:57 -0400, Mark Plaksin wrote:
My IMAP client hangs intermittently. When it does, Dovecot has prematurely stopped sending a response and entered the poll loop. My client is waiting for more data and Dovecot is waiting for an IMAP command.
If you can easily reproduce this, could you try it with this patch:
http://dovecot.org/patches/hang-debug.diff
And show what the log files says after it hangs?
Running with CVS from 10.27.05 15:09 EDT and this patch, I get folder corruption. I can't easily reproduce it but it happens frequently (4 times in 3 hours this morning).
The patch looks innocuous so I suspect CVS. alpha4 doesn't have the problem. Is there anything in CVS which could have affected this? I'll update to the latest CVS and try some more.
I'm running 'dovecot --exec-mail imap' on Solaris with lock_method=fcntl; the rest of the locking is default.
Of the four problems, the body of the message went missing twice. And twice the body of a message were overwritten by the headers of the next message. There's nothing surprising in the rawlogs or any other logs.
Martin Preen, Universitaet Freiburg, Institut fuer Informatik Georges-Koehler-Allee 52, Raum 00-006, 79110 Freiburg, Germany
phone: ++49 761 203-8250 preen@informatik.uni-freiburg.de fax: ++49 761 203-8242 www.informatik.uni-freiburg.de
participants (4)
-
Mark Plaksin
-
Martin Preen
-
Timo Sirainen
-
Zak Johnson