[Dovecot] Wrong message information
We have a few users that every so often get empty data in their message overview. From and subject are empty, and the date is 1-Jan-1970. When opening the message in full view, all things are fine. Also, the data in the actual maildir file is fine.
It looks a lot like the issue described here:
http://dovecot.org/list/dovecot/2008-March/029282.html
The server is Linux, ext3, no NFS, and Dovecot 1.1.rc4. The few clients that have had this problem so far have been Outlook Express.
What can be done to debug this?
Thanks, Anders.
On May 8, 2008, at 6:15 PM, Anders wrote:
We have a few users that every so often get empty data in their
message overview. From and subject are empty, and the date is 1-Jan-1970. When opening the message in full view, all things are fine. Also, the
data in the actual maildir file is fine.It looks a lot like the issue described here:
That was fixed in v1.1.
The server is Linux, ext3, no NFS, and Dovecot 1.1.rc4.
Do you use quota?
The few clients that have had this problem so far have been Outlook Express.
I wouldn't be surprised if it was OE's fault.
What can be done to debug this?
Enable rawlogs for the users experiencing it. Then when they see the
problem for a message, check if there's anything in rawlogs that shows
wrong replies. http://wiki.dovecot.org/Debugging/Rawlog
Timo Sirainen wrote:
On May 8, 2008, at 6:15 PM, Anders wrote:
We have a few users that every so often get empty data in their message overview. From and subject are empty, and the date is 1-Jan-1970. When opening the message in full view, all things are fine. Also, the data in the actual maildir file is fine.
It looks a lot like the issue described here:
That was fixed in v1.1.
Seems not ;-)
The server is Linux, ext3, no NFS, and Dovecot 1.1.rc4.
Do you use quota?
We have quota plugin enabled, but these users have no quota set.
The few clients that have had this problem so far have been Outlook Express.
I wouldn't be surprised if it was OE's fault.
We upgraded from 0.99 recently, and the problem has just appeared now. I guess Dovecot is involved somehow, or at least able to work around OE.
No quota plugin before, though.
What can be done to debug this?
Enable rawlogs for the users experiencing it. Then when they see the problem for a message, check if there's anything in rawlogs that shows wrong replies. http://wiki.dovecot.org/Debugging/Rawlog
I will enable that, and probably post the logs when it happens again. Thanks!
Cheers, Anders.
On Fri, 9 May 2008, Anders wrote:
Timo Sirainen wrote:
On May 8, 2008, at 6:15 PM, Anders wrote:
We have a few users that every so often get empty data in their message overview. From and subject are empty, and the date is 1-Jan-1970. When opening the message in full view, all things are fine. Also, the data in the actual maildir file is fine.
It looks a lot like the issue described here:
That was fixed in v1.1.
Seems not ;-)
I confirm it not being fixed.
The server is Linux, ext3, no NFS, and Dovecot 1.1.rc4.
Do you use quota?
I run Linux on amd64, no NFS, ext3, Dovecot built from the nightly .deb repository posted earlier.
We have quota plugin enabled, but these users have no quota set.
I don't have it enabled.
The few clients that have had this problem so far have been Outlook Express.
I wouldn't be surprised if it was OE's fault.
I'm using alpine and getting the same issue. I would, I guess, suggest adding an assert in the Dovecot that the date isn't epoch start.
What can be done to debug this?
Enable rawlogs for the users experiencing it. Then when they see the problem for a message, check if there's anything in rawlogs that shows wrong replies. http://wiki.dovecot.org/Debugging/Rawlog
I guess I'll do that myself, too.
I will enable that, and probably post the logs when it happens again. Thanks!
Same here.
-- Asheesh.
-- "Give me enough medals, and I'll win any war." -- Napoleon
On May 9, 2008, at 10:42 AM, Asheesh Laroia wrote:
That was fixed in v1.1.
Seems not ;-)
I confirm it not being fixed.
Can you reproduce it using the information in http://dovecot.org/list/dovecot/2008-March/029402.html ? Could you also post your dovecot -n output?
Timo Sirainen wrote:
On May 9, 2008, at 10:42 AM, Asheesh Laroia wrote:
That was fixed in v1.1.
Seems not ;-)
I confirm it not being fixed.
Can you reproduce it using the information in http://dovecot.org/list/dovecot/2008-March/029402.html ? Could you also post your dovecot -n output?
Yes, this reproduces it. I had to change the delay down from 1s, though:
[...] 2008-05-09 18:11:01.101262 No messages in INBOX. Waiting... 2008-05-09 18:11:01.111258 No messages in INBOX. Waiting... 2008-05-09 18:11:01.121290 No messages in INBOX. Waiting... 2008-05-09 18:11:01.131291 No messages in INBOX. Waiting... 2008-05-09 18:11:01.141324 **** ERROR: Got invalid RFC822.SIZE 0: 1 (UID 1726 FLAGS () INTERNALDATE "01-Jan-1970 00:00:00 +0000" RFC822.SIZE 0) 2008-05-09 18:11:02.141339 **** ERROR: Got invalid RFC822.SIZE 0: 1 (UID 1726 FLAGS () INTERNALDATE "01-Jan-1970 00:00:00 +0000" RFC822.SIZE 0) 2008-05-09 18:11:03.141281 Got valid RFC822.SIZE 3132: 1 (UID 1726 FLAGS () INTERNALDATE "01-Jan-1970 00:00:00 +0000" RFC822.SIZE 3132)
The dovecot -n is in here: http://dovecot.org/list/dovecot/2008-May/030470.html
Thanks, Anders.
On Fri, 2008-05-09 at 18:12 +0200, Anders wrote:
Timo Sirainen wrote:
On May 9, 2008, at 10:42 AM, Asheesh Laroia wrote:
That was fixed in v1.1.
Seems not ;-)
I confirm it not being fixed.
Can you reproduce it using the information in http://dovecot.org/list/dovecot/2008-March/029402.html ? Could you also post your dovecot -n output?
Yes, this reproduces it. I had to change the delay down from 1s, though:
I can't reproduce it myself. Does it still break if you disable all plugins? And I guess you're using Dovecot's deliver to save mails?
Timo Sirainen wrote:
On Fri, 2008-05-09 at 18:12 +0200, Anders wrote:
Timo Sirainen wrote:
On May 9, 2008, at 10:42 AM, Asheesh Laroia wrote:
> http://dovecot.org/list/dovecot/2008-March/029282.html That was fixed in v1.1. Seems not ;-) I confirm it not being fixed. Can you reproduce it using the information in http://dovecot.org/list/dovecot/2008-March/029402.html ? Could you also post your dovecot -n output?
Yes, this reproduces it. I had to change the delay down from 1s, though:
I can't reproduce it myself. Does it still break if you disable all plugins? And I guess you're using Dovecot's deliver to save mails?
We have a combination of Dovecot's deliver and some old procmail recipies. I now see that the problem is only present with procmail delivery, *not* when using Dovecot's deliver.
Disabling plugins does not seem to change things. I reloaded config with SIGHUP, is that sufficient? I would rather avoid restarting the server completely, but can do it if it is needed.
Thanks, Anders.
On Sat, 2008-05-10 at 14:24 +0200, Anders wrote:
Timo Sirainen wrote:
On Fri, 2008-05-09 at 18:12 +0200, Anders wrote:
Timo Sirainen wrote:
On May 9, 2008, at 10:42 AM, Asheesh Laroia wrote:
>> http://dovecot.org/list/dovecot/2008-March/029282.html > That was fixed in v1.1. Seems not ;-) I confirm it not being fixed. Can you reproduce it using the information in http://dovecot.org/list/dovecot/2008-March/029402.html ? Could you also post your dovecot -n output?
Yes, this reproduces it. I had to change the delay down from 1s, though:
I can't reproduce it myself. Does it still break if you disable all plugins? And I guess you're using Dovecot's deliver to save mails?
We have a combination of Dovecot's deliver and some old procmail recipies. I now see that the problem is only present with procmail delivery, *not* when using Dovecot's deliver.
Does procmail deliver the mails correctly, i.e. first writes to tmp/ directory and then links (or renames) to new/ directory? (You're really using maildir format, not MH format where filenames begin with "msg."?)
Disabling plugins does not seem to change things. I reloaded config with SIGHUP, is that sufficient?
Yes.
Timo Sirainen wrote:
Does procmail deliver the mails correctly, i.e. first writes to tmp/ directory and then links (or renames) to new/ directory? (You're really using maildir format, not MH format where filenames begin with "msg."?)
I have confirmed from inotifywait output that the file is correctly created in tmp/ and linked into new/. No MH format.
Something that may be helpful has shown up: the problem is only visible when I already have the mailbox open in Thunderbird while running the test script. Probably, you are using a test account that you do not have open?
Cheers, Anders.
On Sun, 2008-05-11 at 01:57 +0200, Anders wrote:
Timo Sirainen wrote:
Does procmail deliver the mails correctly, i.e. first writes to tmp/ directory and then links (or renames) to new/ directory? (You're really using maildir format, not MH format where filenames begin with "msg."?)
I have confirmed from inotifywait output that the file is correctly created in tmp/ and linked into new/. No MH format.
Something that may be helpful has shown up: the problem is only visible when I already have the mailbox open in Thunderbird while running the test script. Probably, you are using a test account that you do not have open?
Right, that helped me to reproduce it. Fixed: http://hg.dovecot.org/dovecot-1.1/rev/cb1c6c942768
Timo Sirainen wrote:
Right, that helped me to reproduce it. Fixed: http://hg.dovecot.org/dovecot-1.1/rev/cb1c6c942768
Hi again.
That fixed my testcase as well, but now I just got another report of a mail with an epoch date.
It turns out that I can still get in trouble if running the send_fetch_test script concurrently, with eventual output like this:
2008-05-23 10:54:27.123649 Got valid RFC822.SIZE 1104: 1 (UID 1255 FLAGS () INTERNALDATE "01-Jan-1970 00:00:00 +0000" RFC822.SIZE 1104)
It does not fail every time, but quite often. I guess it depends on the exact timing of me starting the second instance.
It still only fails with procmail delivery, not Dovecot's deliver.
Regards, Anders.
On Fri, 2008-05-23 at 11:12 +0200, Anders wrote:
Timo Sirainen wrote:
Right, that helped me to reproduce it. Fixed: http://hg.dovecot.org/dovecot-1.1/rev/cb1c6c942768
Hi again.
That fixed my testcase as well, but now I just got another report of a mail with an epoch date.
It turns out that I can still get in trouble if running the send_fetch_test script concurrently, with eventual output like this:
2008-05-23 10:54:27.123649 Got valid RFC822.SIZE 1104: 1 (UID 1255 FLAGS () INTERNALDATE "01-Jan-1970 00:00:00 +0000" RFC822.SIZE 1104)
Right.. This should help: http://hg.dovecot.org/dovecot-1.1/rev/d920ec986c34
Timo Sirainen wrote:
On Fri, 2008-05-23 at 11:12 +0200, Anders wrote:
Timo Sirainen wrote:
Right, that helped me to reproduce it. Fixed: http://hg.dovecot.org/dovecot-1.1/rev/cb1c6c942768 Hi again.
That fixed my testcase as well, but now I just got another report of a mail with an epoch date.
It turns out that I can still get in trouble if running the send_fetch_test script concurrently, with eventual output like this:
2008-05-23 10:54:27.123649 Got valid RFC822.SIZE 1104: 1 (UID 1255 FLAGS () INTERNALDATE "01-Jan-1970 00:00:00 +0000" RFC822.SIZE 1104)
Right.. This should help: http://hg.dovecot.org/dovecot-1.1/rev/d920ec986c34
That does not seem to change anything. I have been cherry picking patches since 1.1.rc4, can you think of other recent changes that I need? I am waiting for rc6 for a complete upgrade.
Note that the size is actually correct now, most of the time, so it is not exactly the same issue as initially.
Also, I set up the test account wrongly, and then actually had this issue with Dovecot "deliver" as well. It triggers most easily by removing the Maildir completely, and then running "send_fetch_test.py" twice at the same time.
Regards, Anders.
On Sun, 2008-05-25 at 13:02 +0200, Anders wrote:
Timo Sirainen wrote:
On Fri, 2008-05-23 at 11:12 +0200, Anders wrote:
Timo Sirainen wrote:
Right, that helped me to reproduce it. Fixed: http://hg.dovecot.org/dovecot-1.1/rev/cb1c6c942768 Hi again.
That fixed my testcase as well, but now I just got another report of a mail with an epoch date.
It turns out that I can still get in trouble if running the send_fetch_test script concurrently, with eventual output like this:
2008-05-23 10:54:27.123649 Got valid RFC822.SIZE 1104: 1 (UID 1255 FLAGS () INTERNALDATE "01-Jan-1970 00:00:00 +0000" RFC822.SIZE 1104)
Right.. This should help: http://hg.dovecot.org/dovecot-1.1/rev/d920ec986c34
That does not seem to change anything. I have been cherry picking patches since 1.1.rc4, can you think of other recent changes that I need? I am waiting for rc6 for a complete upgrade.
Nothing else should be needed.
Note that the size is actually correct now, most of the time, so it is not exactly the same issue as initially.
I could yesterday also reproduce the broken INTERNALDATE pretty easily, but I can't anymore after that patch..
Timo Sirainen tss@iki.fi writes:
I could yesterday also reproduce the broken INTERNALDATE pretty easily, but I can't anymore after that patch..
I can now reproduce one problem quite easily, with:
$ A=6 ; ./send_fetch_test.py user@domain.invalid pw file& sleep $A; ./send_fetch_test.py user@domain.invalid pw file
The A sleep value is the runtime of the script. If I have the mailbox open and in IDLE, 2 seems to work out. So the trick is to have the second instance expunge the mailbox just as the first instance is about to receive its message.
That gives me output like this (with the two processes mixed):
['./send_fetch_test.py', 'user@domain.invalid', 'pw', 'file'] 2008-05-26 00:06:13.190925 Sent 30 bytes to user@domain.invalid 2008-05-26 00:06:13.191258 No messages in INBOX. Waiting... 2008-05-26 00:06:14.191755 No messages in INBOX. Waiting... 2008-05-26 00:06:15.192545 No messages in INBOX. Waiting... 2008-05-26 00:06:16.193528 No messages in INBOX. Waiting... 2008-05-26 00:06:17.194486 No messages in INBOX. Waiting... 2008-05-26 00:06:18.198344 No messages in INBOX. Waiting... ['./send_fetch_test.py', 'user@domain.invalid', 'pw', 'file'] 2008-05-26 00:06:19.188813 Sent 30 bytes to user@domain.invalid 2008-05-26 00:06:19.189098 No messages in INBOX. Waiting... 2008-05-26 00:06:19.198841 **** ERROR: Got invalid RFC822.SIZE 0: 1 (UID 1368 FLAGS (\Recent) INTERNALDATE "01-Jan-1970 00:00:00 +0000" RFC822.SIZE 0) 2008-05-26 00:06:20.189559 No messages in INBOX. Waiting... 2008-05-26 00:06:20.199554 **** ERROR: Got invalid RFC822.SIZE 0: 1 (UID 1368 FLAGS (\Deleted \Recent) INTERNALDATE "01-Jan-1970 00:00:00 +0000" RFC822.SIZE 0) 2008-05-26 00:06:21.190381 No messages in INBOX. Waiting... 2008-05-26 00:06:21.200448 **** ERROR: Got invalid RFC822.SIZE 0: 1 (UID 1368 FLAGS (\Deleted \Recent) INTERNALDATE "01-Jan-1970 00:00:00 +0000" RFC822.SIZE 0) 2008-05-26 00:06:22.191418 No messages in INBOX. Waiting... 2008-05-26 00:06:22.201470 **** ERROR: Got invalid RFC822.SIZE 0: 1 (UID 1368 FLAGS (\Deleted \Recent) INTERNALDATE "01-Jan-1970 00:00:00 +0000" RFC822.SIZE 0) 2008-05-26 00:06:23.192353 No messages in INBOX. Waiting... 2008-05-26 00:06:23.207362 **** ERROR: Got invalid RFC822.SIZE 0: 1 (UID 1368 FLAGS (\Deleted \Recent) INTERNALDATE "01-Jan-1970 00:00:00 +0000" RFC822.SIZE 0) 2008-05-26 00:06:24.193537 No messages in INBOX. Waiting... 2008-05-26 00:06:24.208543 Got valid RFC822.SIZE 1104: 2 (UID 1369 FLAGS (\Recent) INTERNALDATE "26-May-2008 00:06:19 +0200" RFC822.SIZE 1104) 2008-05-26 00:06:25.194555 Got valid RFC822.SIZE 1104: 1 (UID 1369 FLAGS () INTERNALDATE "26-May-2008 00:06:19 +0200" RFC822.SIZE 1104)
I can only get this with Procmail delivery, not Dovecot "deliver".
Anders.
On Mon, 2008-05-26 at 00:18 +0200, Anders wrote:
Timo Sirainen tss@iki.fi writes:
I could yesterday also reproduce the broken INTERNALDATE pretty easily, but I can't anymore after that patch..
I can now reproduce one problem quite easily, with:
$ A=6 ; ./send_fetch_test.py user@domain.invalid pw file& sleep $A; ./send_fetch_test.py user@domain.invalid pw file
The A sleep value is the runtime of the script. If I have the mailbox open and in IDLE, 2 seems to work out. So the trick is to have the second instance expunge the mailbox just as the first instance is about to receive its message.
Oh, that's expected behavior then. If you're trying to FETCH a message that has already been expunged and Dovecot doesn't have the necessary information already in its cache file, there are only a few bad choices Dovecot can do:
a) lie about the message's contents by using "unknown" values (this is what you're seeing now)
b) Disconnect the client
c) Don't return any information and reply NO to the FETCH command. This can cause problems with some clients.
If you're using Dovecot's deliver and it already knows that INTERNALDATE and RFC822.SIZE are wanted to be cached (i.e. you didn't delete any dovecot* files between the send_fetch_test.py), deliver should have written those to the cache file and IMAP *usually* would be able to look up the values from there even after the message is expunged.
Timo Sirainen wrote:
On Mon, 2008-05-26 at 00:18 +0200, Anders wrote:
Timo Sirainen tss@iki.fi writes:
I could yesterday also reproduce the broken INTERNALDATE pretty easily, but I can't anymore after that patch..
I can now reproduce one problem quite easily, with:
[...]
Oh, that's expected behavior then. If you're trying to FETCH a message that has already been expunged and Dovecot doesn't have the necessary information already in its cache file, there are only a few bad choices Dovecot can do:
Thanks for the explanation. That also covers why "deliver" would fail if I cleared out the Maildir. I had a hunch that this was expected, so I carefully called it a problem rather than an error :-).
Hopefully it is all fixed, then.
Cheers, Anders.
Anders wrote:
Well, I got another report about a 1970 timestamp :-(.
Now I have enabled rawlog for all users that are still on Procmail delivery. Will this be useful? Is there something else that I can do?
Hi. I now caught this issue once with rawlog. I hope Squirrelmail does not mangle it too badly:
============ In
mkl5 IDLE DONE g6uu UID FETCH 18660:* (UID FLAGS RFC822.SIZE BODY.PEEK[HEADER] INTERNALDATE) ppj9 IDLE DONE
============ Out
- idling
- 15 EXISTS mkl5 OK Idle completed.
- 15 FETCH (UID 18660 FLAGS () RFC822.SIZE 0 INTERNALDATE "01-Jan-1970 00:00:00 +0000" BODY[HEADER] NIL) g6uu OK Fetch completed.
- idling
- OK Still here ppj9 OK Idle completed.
============
Doing the same FETCH by hand now gives the right information.
It seems that whenever this problem shows up, two messages are delivered at roughly the same time, and it is the later one that has its information zeroed out. Here is stat(1) output from the message above, and the one that arrived right before it:
File: `1212044868.8445_0.mail:2,' Size: 6331 Blocks: 16 IO Block: 4096 regular file Device: fd00h/64768d Inode: 124977206 Links: 1 Access: (0600/-rw-------) Uid: ( 5000/ vmail) Gid: ( 5000/ vmail) Access: 2008-05-29 09:42:59.000000000 +0200 Modify: 2008-05-29 09:07:48.000000000 +0200 Change: 2008-05-29 09:22:30.000000000 +0200
File: `1212044877.8482_0.mail:2,S' Size: 6242 Blocks: 16 IO Block: 4096 regular file Device: fd00h/64768d Inode: 124977209 Links: 1 Access: (0600/-rw-------) Uid: ( 5000/ vmail) Gid: ( 5000/ vmail) Access: 2008-05-29 09:42:59.000000000 +0200 Modify: 2008-05-29 09:07:57.000000000 +0200 Change: 2008-05-29 09:23:53.000000000 +0200
Nine seconds is well above the time spent delivering a message on this server.
Cheers, Anders.
Anders wrote:
Anders wrote:
Well, I got another report about a 1970 timestamp :-(.
Now I have enabled rawlog for all users that are still on Procmail delivery. Will this be useful? Is there something else that I can do?
[...]
It seems that whenever this problem shows up, two messages are delivered at roughly the same time, and it is the later one that has its information zeroed out. Here is stat(1) output from the message above, and the one that arrived right before it:
I am sorry for the horse that has to take this continuous beating. It is unfortunately not dead yet.
Now I got it one time where there was no concurrent delivery, the previous one was ten minutes before. The user was still logged in, though, and apparently active with the mailbox.
I have not been able to reproduce this at will, but do see 1-Jan-1970 in rawlog about two times every day now (with some 80 accounts and probably thousands of deliveries).
Cheers, Anders.
On Tue, 2008-06-03 at 17:28 +0200, Anders wrote:
I have not been able to reproduce this at will, but do see 1-Jan-1970 in rawlog about two times every day now (with some 80 accounts and probably thousands of deliveries).
Could you try with the attached debug patch? What does it log when it happens?
Timo Sirainen wrote:
On Tue, 2008-06-03 at 17:28 +0200, Anders wrote:
I have not been able to reproduce this at will, but do see 1-Jan-1970 in rawlog about two times every day now (with some 80 accounts and probably thousands of deliveries).
Could you try with the attached debug patch? What does it log when it happens?
Hi Timo, thanks for pursuing this issue.
The logfile is quite busy, let me know if I missed something when cutting out the relevant entries (attached).
As far as I can tell, there was no other activity on the mailbox when this happend.
BTW, the procmail recipe is delivering each message to two folders. I think that this should not make a difference, as each Maildir is separate.
Thanks again, Anders.
On Fri, 2008-06-06 at 09:45 +0200, Anders wrote:
Timo Sirainen wrote:
On Tue, 2008-06-03 at 17:28 +0200, Anders wrote:
I have not been able to reproduce this at will, but do see 1-Jan-1970 in rawlog about two times every day now (with some 80 accounts and probably thousands of deliveries).
Could you try with the attached debug patch? What does it log when it happens?
Hi Timo, thanks for pursuing this issue.
The logfile is quite busy, let me know if I missed something when cutting out the relevant entries (attached).
As far as I can tell, there was no other activity on the mailbox when this happend.
BTW, the procmail recipe is delivering each message to two folders. I think that this should not make a difference, as each Maildir is separate.
I think this will finally fix it: http://hg.dovecot.org/dovecot-1.1/rev/5f481022db04
Wonder why it never happened in my stress tests.
Timo Sirainen wrote:
I think this will finally fix it: http://hg.dovecot.org/dovecot-1.1/rev/5f481022db04
This has finally gotten rid of the problem: no more 1970 in rawlog after three days of logging. Thanks!
Anders.
participants (3)
-
Anders
-
Asheesh Laroia
-
Timo Sirainen