[Dovecot] Couldn't init INBOX: Can't sync mailbox: Messages keep getting expunged
Upon upgrading from 1.0.2 to 1.1rc4 I see this error for many of our users:
Getting size of message UID=1 failed Couldn't init INBOX: Can't sync mailbox: Messages keep getting expunged
Logging in with IMAP I would see a bunch of messages with no subject or time and blank bodies (usine horde/imp).
Removing dovecot.index.log, dovecot.index.log.2, or dovecot-uidlist fixes the problem for the account (I chose dovecot.index.log). I imagine that forces a rebuild of the indexes. Removing other files in the directory didn't seem to make a difference, including dovecot.index or some arbitrary non-dovecot file.
Is there a more conventional way around this problem?
On Fri, 2008-05-02 at 18:13 -0700, David Jonas wrote:
Upon upgrading from 1.0.2 to 1.1rc4 I see this error for many of our users:
Getting size of message UID=1 failed Couldn't init INBOX: Can't sync mailbox: Messages keep getting expunged
Logging in with IMAP I would see a bunch of messages with no subject or time and blank bodies (usine horde/imp).
Removing dovecot.index.log, dovecot.index.log.2, or dovecot-uidlist fixes the problem for the account (I chose dovecot.index.log). I imagine that forces a rebuild of the indexes. Removing other files in the directory didn't seem to make a difference, including dovecot.index or some arbitrary non-dovecot file.
Is there a more conventional way around this problem?
If you can show me a way to reproduce the problem I can fix it. I've no idea why it could happen constantly.
The main problem anyway is that getting a message's virtual size fails because the message appears to have been expunged. So Dovecot re-synchronizes the mailbox, which should get rid of the expunged messages. But again it fails the same way, so Dovecot gives up and fails with the error message.
Timo Sirainen wrote:
On Fri, 2008-05-02 at 18:13 -0700, David Jonas wrote:
Upon upgrading from 1.0.2 to 1.1rc4 I see this error for many of our users:
Getting size of message UID=1 failed Couldn't init INBOX: Can't sync mailbox: Messages keep getting expunged
Logging in with IMAP I would see a bunch of messages with no subject or time and blank bodies (usine horde/imp).
Removing dovecot.index.log, dovecot.index.log.2, or dovecot-uidlist fixes the problem for the account (I chose dovecot.index.log). I imagine that forces a rebuild of the indexes. Removing other files in the directory didn't seem to make a difference, including dovecot.index or some arbitrary non-dovecot file.
Is there a more conventional way around this problem?
If you can show me a way to reproduce the problem I can fix it. I've no idea why it could happen constantly.
The main problem anyway is that getting a message's virtual size fails because the message appears to have been expunged. So Dovecot re-synchronizes the mailbox, which should get rid of the expunged messages. But again it fails the same way, so Dovecot gives up and fails with the error message.
This is just a me-to, but I just upgraded from 1.0.8 (ish) to latest 1.1 and seeing the same problem with some users. The problem seems to persist continuously, but if the customer logs in via imap (webmail in our case) it apparently clears?
(Well all I can see from here is a bunch of errors in the log files, then some imap logins, then no more errors - I haven't experienced this from the users point of view, nor do I know what the user sees in the webmail view)
: Getting size of message UID=1154 failed : Couldn't init INBOX: Can't sync mailbox: Messages keep getting expunged : Mailbox init failed bytes=0/38, top=0/0, retr=0/0, del=0/0, size=0
Customer only has one email in their inbox right now and likely only had a couple before
I have also had one report from a customer who thought he had 600 messages to download (based on the feedback from OE), but his inbox would only have had a couple of messages in it - not sure that this is related though...
Grateful for any thoughts?
Ed W
Ed W wrote:
I have also had one report from a customer who thought he had 600 messages to download (based on the feedback from OE), but his inbox would only have had a couple of messages in it - not sure that this is related though...
Update: I have now had a fair number of users reporting that they have hundreds of messages in their inbox when accessing the mailbox via POP (and in fact they have hadly any)
I have since dumped all the index files to see if this helps. However, this has occurred after an upgrade from 1.0.8 to 1.1-rc4
Ed W
On Mon, 2008-05-05 at 22:28 +0100, Ed W wrote:
Ed W wrote:
I have also had one report from a customer who thought he had 600 messages to download (based on the feedback from OE), but his inbox would only have had a couple of messages in it - not sure that this is related though...
Update: I have now had a fair number of users reporting that they have hundreds of messages in their inbox when accessing the mailbox via POP (and in fact they have hadly any)
I have since dumped all the index files to see if this helps. However, this has occurred after an upgrade from 1.0.8 to 1.1-rc4
Do these users use also IMAP? I assume this is reproducible for those users? Could you send me their dovecot.index, dovecot.index.log and dovecot-uidlist files and output of "ls cur/ new/"? Also post your dovecot -n output?
I've done several optimizations to make POP3 use less disk I/O, but I can't really think of what could have caused this breakage.
I finally figured out what's happening:
v1.0 writes expunges as plain "expunged" records to dovecot.index.log file. v1.1 treats such records as "expunge requests", while the actual expunging is done by writing another "expunged (ext)" record.
With v1.1 dovecot.index file isn't always required, because its contents can be produced by reading dovecot.index.log file, assuming that is the first generated log file (prev log file sequence=0 in header).
So what happens is that for some reason v1.1 doesn't read dovecot.index file, but reads dovecot.index.log file and ignores all expunge records. Maildir new/ and cur/ mtimes are all the same as in index files, so Dovecot doesn't bother reading the maildir.
I was able to reproduce this by deleting dovecot.index file, but not without deleting it. So if it exists and you still have this problem, I don't really know why it wouldn't have read it..
Anyway, this fixes the problem: http://hg.dovecot.org/dovecot-1.1/rev/b776f2b8d827
Timo Sirainen wrote:
I finally figured out what's happening:
v1.0 writes expunges as plain "expunged" records to dovecot.index.log file. v1.1 treats such records as "expunge requests", while the actual expunging is done by writing another "expunged (ext)" record.
With v1.1 dovecot.index file isn't always required, because its contents can be produced by reading dovecot.index.log file, assuming that is the first generated log file (prev log file sequence=0 in header).
So what happens is that for some reason v1.1 doesn't read dovecot.index file, but reads dovecot.index.log file and ignores all expunge records. Maildir new/ and cur/ mtimes are all the same as in index files, so Dovecot doesn't bother reading the maildir.
I was able to reproduce this by deleting dovecot.index file, but not without deleting it. So if it exists and you still have this problem, I don't really know why it wouldn't have read it..
Anyway, this fixes the problem: http://hg.dovecot.org/dovecot-1.1/rev/b776f2b8d827
Ha! I spent all afternoon making you a beautiful reproduction of the error. And I was nearly finished, too.
Applied the patch to rc5 and everything looks good.
Thanks!
Timo Sirainen wrote:
I finally figured out what's happening:
v1.0 writes expunges as plain "expunged" records to dovecot.index.log file. v1.1 treats such records as "expunge requests", while the actual expunging is done by writing another "expunged (ext)" record.
With v1.1 dovecot.index file isn't always required, because its contents can be produced by reading dovecot.index.log file, assuming that is the first generated log file (prev log file sequence=0 in header).
So what happens is that for some reason v1.1 doesn't read dovecot.index file, but reads dovecot.index.log file and ignores all expunge records. Maildir new/ and cur/ mtimes are all the same as in index files, so Dovecot doesn't bother reading the maildir.
I was able to reproduce this by deleting dovecot.index file, but not without deleting it. So if it exists and you still have this problem, I don't really know why it wouldn't have read it..
Anyway, this fixes the problem: http://hg.dovecot.org/dovecot-1.1/rev/b776f2b8d827
Well, it looked good but pop3 processes started accumulating and pretty soon the load was over 50. No errors were getting logged. strace showed some processes waiting for disk and others being served their mail. IO-wait was bouncing off 100%.
Applied the patch to a different box (exact same config) that had been running rc4 for about three days and it seems okay.
On Mon, 2008-05-05 at 17:46 -0700, David Jonas wrote:
Anyway, this fixes the problem: http://hg.dovecot.org/dovecot-1.1/rev/b776f2b8d827
Well, it looked good but pop3 processes started accumulating and pretty soon the load was over 50. No errors were getting logged. strace showed some processes waiting for disk and others being served their mail. IO-wait was bouncing off 100%.
I don't think that patch has the ability to cause such a large performance problem.
Sounds more like Dovecot is recalculating virtual message sizes for existing files. It should have looked up the values from dovecot.index.cache files though. Or did you delete them?
Timo Sirainen wrote:
On Mon, 2008-05-05 at 17:46 -0700, David Jonas wrote:
Anyway, this fixes the problem: http://hg.dovecot.org/dovecot-1.1/rev/b776f2b8d827 Well, it looked good but pop3 processes started accumulating and pretty soon the load was over 50. No errors were getting logged. strace showed some processes waiting for disk and others being served their mail. IO-wait was bouncing off 100%.
I don't think that patch has the ability to cause such a large performance problem.
Sounds more like Dovecot is recalculating virtual message sizes for existing files. It should have looked up the values from dovecot.index.cache files though. Or did you delete them?
I didn't delete any files. Switching back to 1.0.2 (after some major cujoling) it's now running at a fine clip under load 1.
I'll try it again. Perhaps it was a freak coincidence.
participants (3)
-
David Jonas
-
Ed W
-
Timo Sirainen