I am using dovecot 1.0.rc15 (a similar problem occurred in rc10) on Solaris 9 (sparc). When working with a user whose home dir is on a local disk, everything seems fine. But when that home is on an NFS-mounted disk, things go very badly awry.
Both the indices and the subscriptions file are being destroyed, and what is left behind are files with names of the form .nfs72C034 etc. These files are described in the shell script nfsfind. Every Solaris system ships with a root cron job that runs nfsfind and deletes stale files:
# These files are created by NFS clients when an open file
# is removed. To preserve some semblance of Unix semantics
# the client renames the file to a unique name so that the
# file appears to have been removed from the directory, but
# is still usable by the process that has the file open.
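The behaviour that comment describes can be reproduced outside dovecot. The following is a minimal sketch in C, not dovecot code: the function name `use_file_after_unlink` and the test file name are made up for illustration. It removes a file while its descriptor is still open and then keeps using the descriptor. On a local filesystem the name simply disappears; on an NFS client the file would be expected to show up as a .nfsXXXX entry until the descriptor is closed.

```c
#include <assert.h>
#include <fcntl.h>
#include <string.h>
#include <unistd.h>

/*
 * Create `path`, unlink it while the descriptor is still open, then write
 * and read back through the descriptor. Returns the number of bytes read,
 * or -1 on error. (Hypothetical helper, for illustration only.)
 */
ssize_t use_file_after_unlink(const char *path, char *buf, size_t buflen)
{
    static const char msg[] = "still usable";
    size_t len = strlen(msg);
    ssize_t n;
    int fd;

    assert(path != NULL && buf != NULL);

    fd = open(path, O_RDWR | O_CREAT | O_TRUNC, 0600);
    if (fd < 0)
        return -1;

    /* The name goes away here, but the open descriptor stays valid. */
    if (unlink(path) < 0) {
        close(fd);
        return -1;
    }

    if (write(fd, msg, len) != (ssize_t)len ||
        lseek(fd, 0, SEEK_SET) < 0) {
        close(fd);
        return -1;
    }

    n = read(fd, buf, buflen);
    close(fd);  /* on NFS, this is when the client drops the renamed file */
    return n;
}
```

Calling it on a path inside the NFS-mounted home and listing the directory from another shell before the call returns should show whether the client creates a .nfsXXXX entry.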
I don't see why dovecot would be unlinking its active files while running, but I actually managed to see dovecot.index.log appear briefly and then change to a .nfsXXXX file during the test user's login.
dovecot.index always seems to exist correctly, but from local tests I know there should also be dovecot-uidlist, dovecot.index.cache and dovecot.index.log, and these all appear as .nfsXXXX files.
I can't find anything relating to this specific problem in the archives, although these files have been mentioned a few times. And yes, I do have mmap_disable = yes (but that is only relevant to the indexes, and the same thing is happening to the subscriptions file, i.e. some common library code for dealing with files has an issue).
I should also note that these NFS mounts have been in existence for years and lots of things are running through them, so there shouldn't be anything wrong there.
Suggestions, please?
Thanks,
John Harper
Senior Systems Administrator
Information and Instructional Technology Services
University of Toronto Scarborough
harper@utsc.utoronto.ca
On Fri, 15 Dec 2006 13:08:08 -0500 John Harper <harper@utsc.utoronto.ca> wrote:
I am using dovecot 1.0.rc15 (a similar problem occurred in rc10) on Solaris 9 (sparc). When working with a user whose home dir is on a local disk, everything seems fine. But when that home is on an NFS-mounted disk, things go very badly awry.
Any particular reason not to keep the indexes on the dovecot server?
I have been running with the home directories on Solaris 9 (sparc) NFS and the indexes on the IMAP server, which runs Solaris 10 (sparc).
default_mail_env = maildir:%h/Maildir:INDEX=/opt/csw/var/dovecot/indexes/%n
I have not noticed any problems, and I have moved hundreds of messages around to different subfolders looking for one. I have all of the files that you listed. Some are in the Maildir and some are in the index folders.
Alex
On Fri, 2006-12-15 at 13:08 -0500, John Harper wrote:
I am using dovecot 1.0.rc15 (a similar problem occurred in rc10) on Solaris 9 (sparc). When working with a user whose home dir is on a local disk, everything seems fine. But when that home is on an NFS-mounted disk, things go very badly awry.
Both the indices and the subscriptions file are being destroyed and what is left behind are files with names of the form .nfs72C034 etc.
I've heard of this before. I think there's a Solaris kernel patch to fix this, but I'm not sure. If you find it, please add a note about it to http://wiki.dovecot.org/NFS

Anyway if I remember correctly, the problem went like this:

1. Dovecot creates a temp.1234 file, link()s it into subscriptions.lock file and unlink()s temp.1234.
2. The subscriptions are written to the lock file, and then Dovecot does rename(subscriptions.lock, subscriptions)
3. The file is close()d

By closing the file before renaming, the problem went away. I think most of these problems could be fixed with a simple patch, but I didn't want to do that change because it probably still breaks with other less obvious things, so it's better to be fully broken.

You could anyway try if this mostly-fixes it:

Index: src/lib/file-dotlock.c
===================================================================
RCS file: /var/lib/cvs/dovecot/src/lib/file-dotlock.c,v
retrieving revision 1.35.2.3
diff -u -r1.35.2.3 file-dotlock.c
--- src/lib/file-dotlock.c	8 Jun 2006 16:13:46 -0000	1.35.2.3
+++ src/lib/file-dotlock.c	16 Dec 2006 01:30:06 -0000
@@ -664,6 +664,12 @@
 		}
 	}
 
+	if (dotlock->fd != -1) {
+		if (close(dotlock->fd) < 0)
+			i_error("close(%s) failed: %m", dotlock->path);
+		dotlock->fd = -1;
+	}
+
 	if (rename(lock_path, dotlock->path) < 0) {
 		i_error("rename(%s, %s) failed: %m", lock_path, dotlock->path);
 		file_dotlock_free(dotlock);
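For readers who want to try the sequence outside dovecot, here is a standalone sketch in C of the three steps Timo lists, with close() moved ahead of rename() as in the patch. It is not dovecot's file-dotlock.c: the function name `replace_file_close_first`, the `<path>.temp.<pid>` and `<path>.lock` naming, and the error handling are assumptions made for illustration.

```c
#include <assert.h>
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

/*
 * Replace `path` with `data`: temp file -> link() to <path>.lock ->
 * unlink() temp -> write -> close() -> rename().
 * Returns 0 on success, -1 on error. (Hypothetical helper.)
 */
int replace_file_close_first(const char *path, const char *data)
{
    char tmp_path[1024], lock_path[1024];
    size_t len;
    int fd, n;

    assert(path != NULL && data != NULL);
    len = strlen(data);

    n = snprintf(tmp_path, sizeof(tmp_path), "%s.temp.%ld", path, (long)getpid());
    if (n < 0 || (size_t)n >= sizeof(tmp_path))
        return -1;
    n = snprintf(lock_path, sizeof(lock_path), "%s.lock", path);
    if (n < 0 || (size_t)n >= sizeof(lock_path))
        return -1;

    fd = open(tmp_path, O_RDWR | O_CREAT | O_EXCL, 0600);
    if (fd < 0)
        return -1;

    /* Step 1: link() fails if the lock file already exists. */
    if (link(tmp_path, lock_path) < 0) {
        close(fd);
        unlink(tmp_path);
        return -1;
    }
    if (unlink(tmp_path) < 0) {
        close(fd);
        unlink(lock_path);
        return -1;
    }

    /* Step 2: write the new contents through the still-open descriptor. */
    if (write(fd, data, len) != (ssize_t)len) {
        close(fd);
        unlink(lock_path);
        return -1;
    }

    /* The change from the patch: close before rename, not after. */
    if (close(fd) < 0) {
        unlink(lock_path);
        return -1;
    }
    if (rename(lock_path, path) < 0) {
        unlink(lock_path);
        return -1;
    }
    return 0;
}
```

Swapping the close() and rename() calls gives the original ordering, which makes it possible to compare the two variants on the affected NFS mount.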
Timo -

For some reason, this patch didn't want to apply for me, but I put the code in manually and it seems to have fixed all the obvious problems.

Thanks!
Mario

--
I don't need a name; my number's just fine. | Mario.Nigrovic@freescale.com
It's nobody else's -- just mine, all mine.  | 480-413-3578
Internal Use Only
This patch fixed the subscriptions file problem, but there is still a similar issue for some of the index files, e.g. dovecot.index.cache.

I suppose I could put the indices on local storage (I really wanted to keep everything about a user together in their account), but I have to wonder if there are other places where the problem might arise again. Are draft messages manipulated in a similar manner?

Down the road I'd be moving to a newer server (my test on Solaris 10 x86 has hit an assert snag), but right now that's not an option, nor is an O/S upgrade (there's a reason these servers have been up rather more than 2 years..).

Apparently all of this is not an issue on Solaris 9 (sparc) Generic_118558-34 (as reported by Alex Moore), whereas my earlier version is Generic_117171-08, so the kernel fix lies somewhere between.

Thanks,
John Harper
-------------------------------------------------
Senior Systems Administrator
Information and Instructional Technology Services
University of Toronto Scarborough
harper@utsc.utoronto.ca

On Sat, Dec 16, 2006 at 03:31:25AM +0200, Timo Sirainen wrote:
By closing the file before renaming, the problem went away.
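To see whether leftovers like the ones John reports are still appearing after a change, one option is to count the .nfs-prefixed entries in a mail or index directory before and after a test login. The sketch below is an illustration in C only; the function name `count_nfs_leftovers` is hypothetical, and it looks at a single directory without recursing.

```c
#include <assert.h>
#include <dirent.h>
#include <string.h>

/*
 * Count entries in `dir_path` whose names start with ".nfs".
 * Returns the count, or -1 if the directory cannot be opened.
 * (Hypothetical helper, for illustration only.)
 */
int count_nfs_leftovers(const char *dir_path)
{
    DIR *dir;
    struct dirent *ent;
    int count = 0;

    assert(dir_path != NULL);

    dir = opendir(dir_path);
    if (dir == NULL)
        return -1;

    while ((ent = readdir(dir)) != NULL) {
        if (strncmp(ent->d_name, ".nfs", 4) == 0)
            count++;
    }

    closedir(dir);
    return count;
}
```

A count that grows across logins would suggest that some code path still removes or replaces a file while it is open.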