[Dovecot] Still weird UID ordering issues with maildir
Nightly 20050829 (includes everything currently in nightly/ChangeLog):
Sep 1 09:25:02 server dovecot: imap-login: Login: user=<tv>, method=PLAIN, rip=192.168.1.1, lip=192.168.1.3, TLS Sep 1 09:25:02 server dovecot: IMAP(tv): Maildir /home/tv/.maildir sync: UID < next_uid (1187 < 1188, file = 1125581033.7788_2.server.duh.org:2,)
repeated every time I try to login. I had to nuke the dovecot indexes and let it reindex (which throws off the ordering of some messages). It had been working without incident since updating to 20050829, until now.
-- -- Todd Vierling <tv@duh.org> <tv@pobox.com> <todd@vierling.name>
On Thu, 1 Sep 2005, Todd Vierling wrote:
Nightly 20050829 (includes everything currently in nightly/ChangeLog):
Sep 1 09:25:02 server dovecot: imap-login: Login: user=<tv>, method=PLAIN, rip=192.168.1.1, lip=192.168.1.3, TLS Sep 1 09:25:02 server dovecot: IMAP(tv): Maildir /home/tv/.maildir sync: UID < next_uid (1187 < 1188, file = 1125581033.7788_2.server.duh.org:2,)
repeated every time I try to login. I had to nuke the dovecot indexes and let it reindex (which throws off the ordering of some messages). It had been working without incident since updating to 20050829, until now.
I spoke too soon. Right after reindexing:
Sep 1 09:30:44 server dovecot: imap-login: Login: user=<tv>, method=PLAIN, rip=192.168.1.1, lip=192.168.1.3, TLS Sep 1 09:31:01 server dovecot: IMAP(tv): Maildir /home/tv/.maildir sync: UID < next_uid (654 < 655, file = 1125581406.14218_2.server.duh.org:2,)
And I had to nuke the indexes again.
-- -- Todd Vierling <tv@duh.org> <tv@pobox.com> <todd@vierling.name>
On Thu, 2005-09-01 at 09:30 -0400, Todd Vierling wrote:
Nightly 20050829 (includes everything currently in nightly/ChangeLog):
Sep 1 09:25:02 server dovecot: imap-login: Login: user=<tv>, method=PLAIN, rip=192.168.1.1, lip=192.168.1.3, TLS Sep 1 09:25:02 server dovecot: IMAP(tv): Maildir /home/tv/.maildir sync: UID < next_uid (1187 < 1188, file = 1125581033.7788_2.server.duh.org:2,)
repeated every time I try to login. I had to nuke the dovecot indexes and let it reindex (which throws off the ordering of some messages). It had been working without incident since updating to 20050829, until now.
Actually deleting indexes shouldn't matter at all, since Dovecot does that by itself after writing that error message.
But you're still using it with NFS? Probably somehow related to that.. The error message means that Dovecot saw a message 1187 that wasn't in index, but index said it already had seen message 1188. Probably means that the file was temporarily lost in some sync, but later seen again in another sync.. Dovecot isn't very forgiving if readdir() skips files.
Are there other maildir clients than Dovecot updating the maildir? That can cause those problems.
Hi,
On 11/09/2005 11:12 p.m., Timo Sirainen wrote:
On Thu, 2005-09-01 at 09:30 -0400, Todd Vierling wrote:
Nightly 20050829 (includes everything currently in nightly/ChangeLog):
Sep 1 09:25:02 server dovecot: imap-login: Login: user=<tv>, method=PLAIN, rip=192.168.1.1, lip=192.168.1.3, TLS Sep 1 09:25:02 server dovecot: IMAP(tv): Maildir /home/tv/.maildir sync: UID < next_uid (1187 < 1188, file = 1125581033.7788_2.server.duh.org:2,)
repeated every time I try to login. I had to nuke the dovecot indexes and let it reindex (which throws off the ordering of some messages). It had been working without incident since updating to 20050829, until now.
Actually deleting indexes shouldn't matter at all, since Dovecot does that by itself after writing that error message.
But you're still using it with NFS? Probably somehow related to that.. The error message means that Dovecot saw a message 1187 that wasn't in index, but index said it already had seen message 1188. Probably means that the file was temporarily lost in some sync, but later seen again in another sync.. Dovecot isn't very forgiving if readdir() skips files.
Are there other maildir clients than Dovecot updating the maildir? That can cause those problems.
I'm seeing this problem occasionally too with -CVS, even on Maildirs where there is only one client reading the mail. Although in this case it was Outlook Express, so it may have had more than one concurrent connection to the server:
dovecot: Sep 10 19:01:35 Error: imap(archive): Maildir /home/archive/Maildir sync: UID inserted in the middle of mailbox (3598 > 3597, file = 1125940298.P27273Q0M937794.tornado.reub.net:2,)
The UID numbers always seem to be 1 out, like the one above. I'm not using NFS, but am using the CVS version of dovecot-lda for local delivery.
As I had to rebuild my server a week ago, I don't have logs going back any further. However, in the week since I've only had two instances of the error above. Before then, the last instance was with Thunderbird, and from memory I also had to wipe the index to get things going.
reuben
On Sun, 11 Sep 2005, Timo Sirainen wrote:
Sep 1 09:25:02 server dovecot: imap-login: Login: user=<tv>, method=PLAIN, rip=192.168.1.1, lip=192.168.1.3, TLS Sep 1 09:25:02 server dovecot: IMAP(tv): Maildir /home/tv/.maildir sync: UID < next_uid (1187 < 1188, file = 1125581033.7788_2.server.duh.org:2,)
repeated every time I try to login. I had to nuke the dovecot indexes and let it reindex (which throws off the ordering of some messages). It had been working without incident since updating to 20050829, until now.
Actually deleting indexes shouldn't matter at all, since Dovecot does that by itself after writing that error message.
No it doesn't, at least not in nightly 20050829. The imap process exits and I have to delete them manually; they're still sitting around. An artifact of this is the fact that if I don't delete the indexes manually before reopening the mailbox, the error happens again with exactly the same UIDs:
Sep 6 15:49:01 server dovecot: IMAP(tv): Maildir /home/tv/.maildir/.groups.netbsd sync: UID < next_uid (556 < 557, file = 1126036138.8618_1.server.duh.org:2,) Sep 6 15:49:09 server dovecot: IMAP(tv): Maildir /home/tv/.maildir/.groups.netbsd sync: UID < next_uid (556 < 557, file = 1126036138.8618_1.server.duh.org:2,) Sep 6 16:43:32 server dovecot: IMAP(tv): Maildir /home/tv/.maildir/.groups.netbsd sync: UID < next_uid (556 < 557, file = 1126036138.8618_1.server.duh.org:2,)
But you're still using it with NFS?
No. It's ffs on NetBSD.
The error message means that Dovecot saw a message 1187 that wasn't in index, but index said it already had seen message 1188. Probably means that the file was temporarily lost in some sync, but later seen again in another sync.. Dovecot isn't very forgiving if readdir() skips files.
I don't see how skipping files like that should happen -- HOWEVER, you do realize that readdir(3) has no guarantee of ordering, right? (Meaning that two successive calls to readdir(3) can, per POSIX, return the same list of files in a completely different order.)
If you are relying on readdir(3) ordering, then this could also explain an oddness I have seen and dismissed as just a quirk of dovecot: After deleting indexes, messages sorted by "arrival" order seem to be partially reordered in the mailbox. That could be an artifact of ffs "soft updates" functionality, which can reorder inode pointers in a directory file in order to provide its metadata consistency guarantees.
If you want to guarantee ordering, the index must be responsible for that, or else you need to readdir(3) the whole thing, stat the files, and qsort them against st_ctime/st_ctimespec. Any other use of readdir(3) *must not* rely on its ordering.
Are there other maildir clients than Dovecot updating the maildir? That can cause those problems.
The only other mailbox touching process is procmail, which is putting mail into the folders. However, this happens even when procmail is not *actively* putting mail into the folder (it could be that mail had been added between the last time the mailbox was closed, and when it was now reopened).
But isn't collision-free access supposed to be a feature of using maildir -- even if I were using nfs? :)
I've had a couple days now with no problems, and then it happened again this morning:
Sep 9 11:25:16 server dovecot: IMAP(tv): Maildir /home/tv/.maildir/.groups.netbsd sync: UID < next_uid (633 < 634, file = 1126279502.23902_1.server.duh.org:2,) Sep 11 10:46:22 server dovecot: IMAP(tv): Maildir /home/tv/.maildir sync: UID < next_uid (1579 < 1580, file = 1126449940.5309_2.server.duh.org:2,)
-- -- Todd Vierling <tv@duh.org> <tv@pobox.com> <todd@vierling.name>
On Sun, 11 Sep 2005, Todd Vierling wrote:
I don't see how skipping files like that should happen -- HOWEVER, you do realize that readdir(3) has no guarantee of ordering, right? (Meaning that two successive calls to readdir(3) can, per POSIX, return the same list of files in a completely different order.)
Actually that comment should have read:
Meaning that two successive passes over the same directory with readdir(3) can, per POSIX, return the same list of files in a completely different order.
This means that once you do a rewinddir(3) or closedir(3), all bets are off as to what kind of ordering you will get on the next pass through the directory. Oftentimes rewinddir(3) followed by readdir(3) will still be consistent, but on OSs that do so, you may not see new files either; the dir handle may be a snapshot of data at the time opendir(3) happened instead.
-- -- Todd Vierling <tv@duh.org> <tv@pobox.com> <todd@vierling.name>
On 11.9.2005, at 18:22, Todd Vierling wrote:
repeated every time I try to login. I had to nuke the dovecot indexes and let it reindex (which throws off the ordering of some messages). It had been working without incident since updating to 20050829, until now.
Actually deleting indexes shouldn't matter at all, since Dovecot does that by itself after writing that error message.
No it doesn't, at least not in nightly 20050829. The imap process exits and I have to delete them manually; they're still sitting around. An artifact of this is the fact that if I don't delete the indexes manually before reopening the mailbox, the error happens again with exactly the same UIDs:
Hmm. Well, it doesn't exactly delete them but it marks them corrupted, which should make it rebuild them when opening the next time. Guess I'll have to look at that too.
The error message means that Dovecot saw a message 1187 that wasn't in index, but index said it already had seen message 1188. Probably means that the file was temporarily lost in some sync, but later seen again in another sync.. Dovecot isn't very forgiving if readdir() skips files.
I don't see how skipping files like that should happen -- HOWEVER, you do realize that readdir(3) has no guarantee of ordering, right? (Meaning that two successive calls to readdir(3) can, per POSIX, return the same list of files in a completely different order.)
Yes, the order isn't relied on. The filenames are all read into a hash table which is used.
Are there other maildir clients than Dovecot updating the maildir? That can cause those problems.
The only other mailbox touching process is procmail, which is putting mail into the folders. However, this happens even when procmail is not *actively* putting mail into the folder (it could be that mail had been added between the last time the mailbox was closed, and when it was now reopened).
OK. That shouldn't matter since it's touching only new/ dir.
But isn't collision-free access supposed to be a feature of using maildir -- even if I were using nfs? :)
That's the theory, but in practise if files are being deleted or renamed (I'm not sure about adding) inside a directory while another process is reading its contents, some files might get skipped. That's why Dovecot keeps dovecot-uidlist file locked while it's reading maildir, so the skipping shouldn't happen if only Dovecot is accessing the maildir.
But if there aren't other maildir clients accessing the maildir, do you then use some IMAP client that opens multiple connections to same mailbox? Just wondering if there's a way I could reproduce this problem..
On Sun, 11 Sep 2005, Timo Sirainen wrote:
The error message means that Dovecot saw a message 1187 that wasn't in index, but index said it already had seen message 1188. Probably means that the file was temporarily lost in some sync, but later seen again in another sync.. Dovecot isn't very forgiving if readdir() skips files.
Reading your statement above again, something sounds odd here. If UID 1187 is not in index, how does dovecot even know that the message file in question is UID 1187? (This sounds very much like an ordering problem of some sort.)
I don't see how skipping files like that should happen -- HOWEVER, you do realize that readdir(3) has no guarantee of ordering, right? (Meaning that two successive calls to readdir(3) can, per POSIX, return the same list of files in a completely different order.)
Yes, the order isn't relied on. The filenames are all read into a hash table which is used.
Then by "skips files" you mean that a file is disappearing and reappearing? I don't know how that could be, as it would make many other programs on the system go insane....
The only other mailbox touching process is procmail,
OK. That shouldn't matter since it's touching only new/ dir.
Which is what I thought, yes. No other apps on my system know about the maildir format at all.
But if there aren't other maildir clients accessing the maildir, do you then use some IMAP client that opens multiple connections to same mailbox?
I use Pine and PC-Pine, which only opens one connection per mailbox at a time. Unfortunately I can't figure out the conditions that cause this problem, so it's not reliably reproduceable.
-- -- Todd Vierling <tv@duh.org> <tv@pobox.com> <todd@vierling.name>
On Mon, 2005-09-12 at 09:52 -0400, Todd Vierling wrote:
On Sun, 11 Sep 2005, Timo Sirainen wrote:
The error message means that Dovecot saw a message 1187 that wasn't in index, but index said it already had seen message 1188. Probably means that the file was temporarily lost in some sync, but later seen again in another sync.. Dovecot isn't very forgiving if readdir() skips files.
Reading your statement above again, something sounds odd here. If UID 1187 is not in index, how does dovecot even know that the message file in question is UID 1187? (This sounds very much like an ordering problem of some sort.)
There's also dovecot-uidlist file which keeps the filenames there for a bit longer.
Yes, the order isn't relied on. The filenames are all read into a hash table which is used.
Then by "skips files" you mean that a file is disappearing and reappearing? I don't know how that could be, as it would make many other programs on the system go insane....
If you're getting a list of files in a large directory using readdir() but at the same time there are rename()s and/or unlink()s happening in the same directory, the readdir() can skip over some files.
It just hit me again, and now I realize that there is a common behavior pattern happening. In every case that this has happened, I have been deleting large numbers (>10) of messages from a mailbox. IIRC, in Maildir flags are expressed as filename additions, so a rename(2) is happening.
Thanks, that helped me enough to reproduce it. Fixed in CVS now.
Problem was that sometimes Dovecot updated new next_uid to index file even while it hadn't actually added the new mails to index file (because it was doing a "partial" syncing, which is done when maildir file is lost and we're trying to find it again).
On Sun, 11 Sep 2005, Timo Sirainen wrote:
Just wondering if there's a way I could reproduce this problem..
It just hit me again, and now I realize that there is a common behavior pattern happening. In every case that this has happened, I have been deleting large numbers (>10) of messages from a mailbox. IIRC, in Maildir flags are expressed as filename additions, so a rename(2) is happening.
Perhaps dovecot is doing the rename, but not updating something in the index to match? Or, this could be a race condition related to the rename(2)....
-- -- Todd Vierling <tv@duh.org> <tv@pobox.com> <todd@vierling.name>
Hi, While I am using mbox instead of maildir, I've noticed similar weirdness. I'm using 1.0alpha2 on Solaris 9 (since last night). I connect to imap with Pine 4.63. My mailbox is NFS mounted to the IMAP server from our mail server (Solaris 9 for both boxes).
If I delete large numbers of messages and then my pine message list screen refreshes (via eXpunge) then sometimes my message numbering is off by one. After the expunge, if I move to message 67 and open it, dovecot will show me message 68 instead. The only way to fix this is to close pine (and the imap connection) and restart. This off-by-one error most often happens while I am deleting messages and a new message arrives.
An annoyance, but I never had this problem with UW imap.
Jeff Earickson Colby College
On Mon, 12 Sep 2005, Todd Vierling wrote:
Date: Mon, 12 Sep 2005 09:58:41 -0400 (EDT) From: Todd Vierling <tv@duh.org> To: Timo Sirainen <tss@iki.fi> Cc: dovecot@dovecot.org Subject: Re: [Dovecot] Still weird UID ordering issues with maildir
On Sun, 11 Sep 2005, Timo Sirainen wrote:
Just wondering if there's a way I could reproduce this problem..
It just hit me again, and now I realize that there is a common behavior pattern happening. In every case that this has happened, I have been deleting large numbers (>10) of messages from a mailbox. IIRC, in Maildir flags are expressed as filename additions, so a rename(2) is happening.
Perhaps dovecot is doing the rename, but not updating something in the index to match? Or, this could be a race condition related to the rename(2)....
-- -- Todd Vierling <tv@duh.org> <tv@pobox.com> <todd@vierling.name>
On Mon, 2005-09-12 at 10:34 -0400, Jeff A. Earickson wrote:
If I delete large numbers of messages and then my pine message list screen refreshes (via eXpunge) then sometimes my message numbering is off by one. After the expunge, if I move to message 67 and open it, dovecot will show me message 68 instead. The only way to fix this is to close pine (and the imap connection) and restart. This off-by-one error most often happens while I am deleting messages and a new message arrives.
I just tried it a couple of times but couldn't reproduce it. Sounds pretty strange and a dangerous problem anyway.. If you can reproduce it, could you do it while saving the session to rawlog? That way I could see if Dovecot sends some reply wrong or if it just gets internally confused.
Timo, Sorry to be dense here, but how does one create a rawlog? I read the docs in INSTALL and on the dovecot docs website. It says that rawlog capability is just there in 1.0alpha, no special compile flags needed. I created a dovecot.rawlog directory in my homedir, then stopped and restarted my imap client (pine). Nothing appears in this directory. What am I doing wrong? Thanks.
Jeff Earickson Colby College
On Sun, 18 Sep 2005, Timo Sirainen wrote:
Date: Sun, 18 Sep 2005 18:48:16 +0300 From: Timo Sirainen <tss@iki.fi> To: Jeff A. Earickson <jaearick@colby.edu> Cc: dovecot@dovecot.org Subject: Re: [Dovecot] Still weird UID ordering issues with maildir
On Mon, 2005-09-12 at 10:34 -0400, Jeff A. Earickson wrote:
If I delete large numbers of messages and then my pine message list screen refreshes (via eXpunge) then sometimes my message numbering is off by one. After the expunge, if I move to message 67 and open it, dovecot will show me message 68 instead. The only way to fix this is to close pine (and the imap connection) and restart. This off-by-one error most often happens while I am deleting messages and a new message arrives.
I just tried it a couple of times but couldn't reproduce it. Sounds pretty strange and a dangerous problem anyway.. If you can reproduce it, could you do it while saving the session to rawlog? That way I could see if Dovecot sends some reply wrong or if it just gets internally confused.
On Sun, 2005-09-18 at 18:14 -0400, Jeff A. Earickson wrote:
Sorry to be dense here, but how does one create a rawlog? I read
the docs in INSTALL and on the dovecot docs website. It says that rawlog capability is just there in 1.0alpha, no special compile flags needed. I created a dovecot.rawlog directory in my homedir, then stopped and restarted my imap client (pine). Nothing appears in this directory. What am I doing wrong? Thanks.
You'll have to put the imap connection through rawlog binary, something like:
mail_executable = /usr/libexec/dovecot/rawlog /usr/libexec/dovecot/imap
I've got rawlog running in my home directory, I'll see what I can capture. It may take a while...
Jeff Earickson Colby College
On Sun, 18 Sep 2005, Timo Sirainen wrote:
Date: Sun, 18 Sep 2005 18:48:16 +0300 From: Timo Sirainen <tss@iki.fi> To: Jeff A. Earickson <jaearick@colby.edu> Cc: dovecot@dovecot.org Subject: Re: [Dovecot] Still weird UID ordering issues with maildir
On Mon, 2005-09-12 at 10:34 -0400, Jeff A. Earickson wrote:
If I delete large numbers of messages and then my pine message list screen refreshes (via eXpunge) then sometimes my message numbering is off by one. After the expunge, if I move to message 67 and open it, dovecot will show me message 68 instead. The only way to fix this is to close pine (and the imap connection) and restart. This off-by-one error most often happens while I am deleting messages and a new message arrives.
I just tried it a couple of times but couldn't reproduce it. Sounds pretty strange and a dangerous problem anyway.. If you can reproduce it, could you do it while saving the session to rawlog? That way I could see if Dovecot sends some reply wrong or if it just gets internally confused.
On Mon, 2005-09-12 at 10:34 -0400, Jeff A. Earickson wrote:
If I delete large numbers of messages and then my pine message list screen refreshes (via eXpunge) then sometimes my message numbering is off by one. After the expunge, if I move to message 67 and open it, dovecot will show me message 68 instead. The only way to fix this is to close pine (and the imap connection) and restart. This off-by-one error most often happens while I am deleting messages and a new message arrives.
Timo writes:
I just tried it a couple of times but couldn't reproduce it. Sounds pretty strange and a dangerous problem anyway.. If you can reproduce it, could you do it while saving the session to rawlog? That way I could see if Dovecot sends some reply wrong or if it just gets internally confused.
I'm using dovecot 1.0.61 on Sparc Gentoo Linux. I have about 200 mail accounts which are mostly read by the Mozilla family on both Linux and Windows, often with two or more browsers running against the same account at once, home and work. Every week or two I get a situation where the table of contents pane in Mozilla or Thunderbird doesn't match what message body is displayed. I also had the same problem with a 0.99 version, most probably 0.99.13. This problem has happened often enough that they were probably not all after large deletes. The workaround I use is to stop that particular dovecot and delete all the dovecot index files. Brian
On Mon, 2005-09-19 at 11:00 -0400, Brian Bartholomew wrote:
Timo writes:
I just tried it a couple of times but couldn't reproduce it. Sounds pretty strange and a dangerous problem anyway.. If you can reproduce it, could you do it while saving the session to rawlog? That way I could see if Dovecot sends some reply wrong or if it just gets internally confused.
I'm using dovecot 1.0.61 on Sparc Gentoo Linux. I have about 200 mail accounts which are mostly read by the Mozilla family on both Linux and Windows, often with two or more browsers running against the same account at once, home and work. Every week or two I get a situation where the table of contents pane in Mozilla or Thunderbird doesn't match what message body is displayed. I also had the same problem with a 0.99 version, most probably 0.99.13. This problem has happened often enough that they were probably not all after large deletes. The workaround I use is to stop that particular dovecot and delete all the dovecot index files.
A bit late reply, but I think this was fixed by 1.0.alpha4?
Every week or two I get a situation where the table of contents pane in Mozilla or Thunderbird doesn't match what message body is displayed.
| A bit late reply, but I think this was fixed by 1.0.alpha4?
Spiffy, thanks! I haven't been tracking the version changes, but upgrading the mail server is on my list for the first quarter next year, and I'll let the list know if I see the symptom again after that.
Brian
Timo Sirainen wrote:
On Thu, 2005-09-01 at 09:30 -0400, Todd Vierling wrote:
Nightly 20050829 (includes everything currently in nightly/ChangeLog):
Sep 1 09:25:02 server dovecot: imap-login: Login: user=<tv>, method=PLAIN, rip=192.168.1.1, lip=192.168.1.3, TLS Sep 1 09:25:02 server dovecot: IMAP(tv): Maildir /home/tv/.maildir sync: UID < next_uid (1187 < 1188, file = 1125581033.7788_2.server.duh.org:2,)
I saw this with 1.0a1 but never see it with 1.0t51. Server on Solaris 8, Maildir on NFS from a NetAppliance, Indexs on local UFS. Client is Thunderbird on OS/X - but I saw this ever after removing all the indexes and setting Thunderbird to use only 1 IMAP connection.
participants (6)
-
Brian Bartholomew
-
Gregory Bond
-
Jeff A. Earickson
-
Reuben Farrelly
-
Timo Sirainen
-
Todd Vierling