[Dovecot] Unable to (un)subscribe mbox with AIX, NFS and netapp filer
Hi Timo,

dovecot 1.2.0 is great! Faster and more stable on mboxes than 1.1.x by far... good job :-)

Today I stumbled over a strange problem: subscribing to an existing mailbox (mbox) doesn't work.

Thunderbird IMAP log
--------------------
9932[4124558]: 40f3498:imap.fh-trier.de:A:SendData: 11 subscribe "Mail/Archiv/1998"
9932[4124558]: ReadNextLine [stream=403f280 nb=108 needmore=0]
9932[4124558]: 40f3498:imap.fh-trier.de:A:CreateNewLineFromSocket: 11 NO [SERVERBUG] Internal error occurred. Refer to server log for more information. [2009-07-06 08:14:32]

Server log
----------
Jul 6 08:14:32 trevi mail:info dovecot: IMAP(beckerr): Namespace Mail/: Using permissions from /u/f0/rzuser/beckerr/Mail: mode=0700 gid=-1
Jul 6 08:14:32 trevi mail:err|error dovecot: IMAP(beckerr): fchown(/u/f0/rzuser/beckerr/Mail/.subscriptions.lock, -1, -1) failed: Invalid argument
Jul 6 08:14:32 trevi mail:err|error dovecot: IMAP(beckerr): file_dotlock_open() failed with subscription file /u/f0/rzuser/beckerr/Mail/.subscriptions: Invalid argument

The error only appears on NFS-mounted shares, and I'm not sure whether AIX or the NetApp filer is the cause. So determining the real problem is not easy, but fixing it is: when uid and gid are both -1, the call can be skipped, because nothing would be changed anyway:

--- ./lib/file-dotlock.c.org 2009-07-06 09:25:14.000000000 +0200
+++ ./lib/file-dotlock.c 2009-07-06 09:24:48.000000000 +0200
@@ -780,7 +780,7 @@
     fd = file_dotlock_open(set, path, flags, &dotlock);
     umask(old_mask);

-    if (fd != -1) {
+    if (fd != -1 && (uid != -1 || gid != -1)) {
         if (fchown(fd, uid, gid) < 0) {
             if (errno == EPERM && uid == (uid_t)-1) {
                 i_error("%s", eperm_error_get_chgrp("fchown",

Ralf

--
______________________________________________________________________
Dipl.-Inform. (FH) Ralf Becker        Rechenzentrum (r/ft) der FH Trier
(Network|Mail|Web|Firewall)           University of applied sciences
Administrator                         Schneidershof, D-54293 Trier
Mail: beckerr@fh-trier.de             Fon: +49 651 8103 499
Web: http://www.fh-trier.de/~beckerr  Fax: +49 651 8103 214
PubKey: http://www.fh-trier.de/~beckerr  Crypto: GnuPG, S/MIME
______________________________________________________________________
If God had wanted e-mail to be written in HTML, prayers would traditionally end with </amen>. (Tom Listen)
On 6 Jul 2009, at 09:47, Ralf Becker wrote:
> [...]
> Jul 6 08:14:32 trevi mail:err|error dovecot: IMAP(beckerr): fchown(/u/f0/rzuser/beckerr/Mail/.subscriptions.lock, -1, -1) failed: Invalid argument
> [...]
> The error just appears on NFS mounted shared and I'm not sure if AIX or netapp is the cause.

According to the POSIX specification, fchown() may return EINVAL when the owner or group ID is not a value supported by the implementation, or when the fildes argument refers to a pipe or socket or an fattach()-ed STREAM and the implementation disallows execution of fchown() on a pipe.

Wouldn't it be worth checking what kind of entity gets created in your environment? I ask because, without further investigation, I wouldn't rule out other side effects on files elsewhere in the code.

> So to determine the real problem is not easy, but to fix it is:
> While uid and gid are both -1 the call could be suppressed, because nothing is really changed:
>
> --- ./lib/file-dotlock.c.org 2009-07-06 09:25:14.000000000 +0200
> +++ ./lib/file-dotlock.c 2009-07-06 09:24:48.000000000 +0200
> @@ -780,7 +780,7 @@
>      fd = file_dotlock_open(set, path, flags, &dotlock);
>      umask(old_mask);
>
> -    if (fd != -1) {
> +    if (fd != -1 && (uid != -1 || gid != -1)) {
>          if (fchown(fd, uid, gid) < 0) {
>              if (errno == EPERM && uid == (uid_t)-1) {
>                  i_error("%s", eperm_error_get_chgrp("fchown",

Alternatively, could you perhaps write a small C program to test an fchown(fd, -1, -1) operation on a "regular" file and see whether it fails or not?

HTH,
Axel
Which Ontap version do you run on the filer?
Frank Bonnet wrote:
> Which Ontap version do you run on the filer ?

NetApp Release 7.2.4: Fri Nov 16 00:34:57 PST 2007
Hello Axel,
attached is a small tool to test fchown on a freshly created file.
These are the tests I've made using this tool:

compile

    xlc /tmp/t.c -o /tmp/t

environment

    mount | grep -E "(filer0|hd4)"
             /dev/hd4   /              jfs   Jun 11 23:25  rw,log=/dev/hd8
    filer0   /vol/mail  /net/var_mail  nfs3  Jun 11 23:26  rw,proto=tcp,port=2049,wsize=65534,rsize=65534
    filer0   /vol/home  /u/f0          nfs4  Jul 06 08:09  rw,proto=tcp,port=2049,vers=4,wsize=65534,rsize=65534

local filesystem => SUCCESS

    /tmp/t /tmp/test.txt
    fchown returns 0

NFS3 filesystem => SUCCESS

    /tmp/t /net/var_mail/spool/test.txt
    fchown returns 0

NFS4 filesystem => ERROR

    /tmp/t /u/f0/test.txt
    fchown returns -1
    errno=22 (Invalid argument)

So I should change the subject to "... with AIX, NFS4 and netapp ..." :-)
Back to your question:

> Wouldn't it be worth to check what kind of entity gets created under your environment?

Yes, and it's just a plain file.

And, as suspected, the conclusion is: EINVAL is returned because the owner or group ID is not a value supported by the implementation (of NFS4 on netapp filers?)

Ralf
On 6 Jul 2009, at 14:00, Ralf Becker wrote:
> Hello Axel,
> attached is a small tool to test fchown on a freshly created file:

Damn... didn't go through... ;-)

> These are the tests I've made using this tool:
> [...]
> So I should alter the subject to ... with AIX, NFS4 and netapp ... :-)
> Back to your question:
> > Wouldn't it be worth to check what kind of entity gets created under your environment?
> yes and it's just a raw file
> and as suspected conclusion
> EINVAL is returned because the owner or group ID is not a value supported by the implementation (of NFS4 on netapp filers?)

I just encountered this one:

http://www.dovecot.org/list/dovecot/2007-July/024059.html

which seems to indicate that fchown(fd, -1, -1) fails on some NFS/OS combinations.

As a result, I'm not sure whether entirely hiding the log message is a good idea; perhaps just changing the logging level would be better, so that one keeps the ability to track possibly problematic file systems...

Axel
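> > Hello Axel,
> > attached is a small tool to test fchown on a freshly created file:
> Damn... didn't go through... ;-)

OK... maybe I missed it... let's do it inline:

    #include <errno.h>
    #include <fcntl.h>
    #include <stdio.h>
    #include <string.h>
    #include <unistd.h>

    int main(int argc, char **argv)
    {
        int f, ret, err;

        if (argc < 2)
            return 1;
        /* create (or truncate) the test file named on the command line */
        f = open(argv[1], O_WRONLY|O_CREAT|O_TRUNC, 0600);
        /* ask fchown() to leave both owner and group unchanged */
        ret = fchown(f, (uid_t)-1, (gid_t)-1);
        err = errno; /* save errno before printf() can change it */
        printf("fchown returns %i\n", ret);
        if (ret < 0)
            printf("errno=%i (%s)\n", err, strerror(err));
        close(f);
        unlink(argv[1]);
        return 0;
    }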
> As a result, I'm not sure whether entirely hiding the log message is a good idea; perhaps just change the logging level would be better, so that one keeps the ability to track possibly problematic file systems...

It's not just the log message. If you look at the entire function, you'll see that the function itself fails if fchown() fails:
    static int file_dotlock_open_mode_full(<...>, uid_t uid, gid_t gid, <...>)
    {
        <...>
        if (fd != -1) {
            if (fchown(fd, uid, gid) < 0) {
                if (errno == EPERM && uid == (uid_t)-1) {
                    i_error("%s", eperm_error_get_chgrp("fchown",
                            file_dotlock_get_lock_path(dotlock),
                            gid, gid_origin));
                } else {
                    i_error("fchown(%s, %ld, %ld) failed: %m",
                            file_dotlock_get_lock_path(dotlock),
                            (long)uid, (long)gid);
                }
                file_dotlock_delete(&dotlock);
                return -1;
            }
        }
        *dotlock_r = dotlock;
        return fd;
    }
Since this function seems to create all dotlock files (not just the one for the .subscriptions file), this means that on some NFS(4) file systems dotlocking does not actually work.
The Linux man page of chown(3) (in place of fchown(3)) says:

-----------------------------8<--------------------------------
If owner or group is specified as (uid_t)-1 or (gid_t)-1, respectively, the corresponding ID of the file shall not be changed. If both owner and group are -1, the times need not be updated.

Upon successful completion, chown() shall mark for update the st_ctime field of the file.
------------------->8------------------------------------------

Is my understanding of these sentences correct? "If owner and group are both -1, nothing is done?"

In that case it should be safe to skip the call, shouldn't it?
Ralf
On 6 Jul 2009, at 23:07, Ralf Becker wrote:
> Hello Axel,
> [...]
> It's not just the log message. If you have a look on the entire function, you'll see that it fails if fchown fails:
> [...]

Yes, you're right; I was too elliptical... :-(

I was worried about a "solution" that could potentially hide deeper problems entirely, hence the suggestion to at least leave something in the logs.

> While this function seems to create all dotlock files (not just for the .subscriptions file) this means that on some NFS(4) file systems dotlocking is actually not working.

After a quick look, it seems that dotlocking for mbox mailboxes goes through another path and skips that fchown() operation.
But please don't ask why. ;-)

> The linux man page of chown(3) (in place of fchown(3)) says:
> -----------------------------8<--------------------------------
> If owner or group is specified as (uid_t)-1 or (gid_t)-1, respectively, the corresponding ID of the file shall not be changed. If both owner and group are -1, the times need not be updated.
> Upon successful completion, chown() shall mark for update the st_ctime field of the file.
> ------------------->8------------------------------------------
> Is my understanding of these sentences correct? "If owner and group are -1, nothing is done?"
> In this case it should be safe to skip the call, shouldn't it?

Yes, I guess so.
Unless the rationale for that call is to ensure correct cache flushing for NFS clients, while being some kind of (costly) no-op otherwise?

Ahem... Timo?

Axel
On Tue, 2009-07-07 at 17:58 +0200, Axel Luttgens wrote:
> > Is my understanding of these sentences correct? "If owner and group are -1, nothing is done?"
> > In this case it should be safe to skip the call, shouldn't it?
> Yes, I guess so.

Yes. Committed: http://hg.dovecot.org/dovecot-1.2/rev/d6337be8ae30

> Unless the rationale for that call is to ensure a correct cache flushing for NFS clients, while being some kind of (costly) no-op otherwise?

In that case I would have used those nfs_flush_*() functions.
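The idea agreed on here (do not call fchown() at all when both IDs are -1) can be sketched as a small wrapper. This is only an illustration under stated assumptions: fchown_unless_noop() is a hypothetical name, and the code is not the committed Dovecot change, which the thread only links to.

```c
#include <assert.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical wrapper, not Dovecot code: skip fchown() entirely when
 * neither the owner nor the group is to be changed.
 * Returns 0 if the call was skipped or succeeded, and -1 with errno set
 * if fchown() itself failed. */
static int fchown_unless_noop(int fd, uid_t uid, gid_t gid)
{
    assert(fd >= 0);
    if (uid == (uid_t)-1 && gid == (gid_t)-1) {
        /* Both IDs mean "leave unchanged", so there is nothing to do
         * and a file system that rejects -1 is never asked. */
        return 0;
    }
    return fchown(fd, uid, gid);
}
```

With such a guard, a file system that returns EINVAL for fchown(fd, -1, -1) no longer makes lock creation fail, while a real ownership change is still attempted and any error from it is still reported to the caller.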
Hi Timo,

today I found this in the logs again:

Jul 29 10:38:27 trevi mail:err|error dovecot: IMAP(beckerr): fchown(/u/f0/rzuser/beckerr/Mail/.subscriptions.lock, -1, -1) failed: Invalid argument
Jul 29 10:38:27 trevi mail:err|error dovecot: IMAP(beckerr): file_dotlock_open() failed with subscription file /u/f0/rzuser/beckerr/Mail/.subscriptions: Invalid argument

I located the bug in src/lib/file-dotlock.c ... a patch is attached.

Ralf

--- dovecot-1.2.2/src/lib/file-dotlock.c.org 2009-07-29 10:44:21.000000000 +0200
+++ dovecot-1.2.2/src/lib/file-dotlock.c 2009-07-29 10:44:42.000000000 +0200
@@ -780,7 +780,7 @@
     fd = file_dotlock_open(set, path, flags, &dotlock);
     umask(old_mask);

-    if (fd != -1 && (uid != (uid)-1 || gid != (gid_t)-1)) {
+    if (fd != -1 && (uid != (uid_t)-1 || gid != (gid_t)-1)) {
         if (fchown(fd, uid, gid) < 0) {
             if (errno == EPERM && uid == (uid_t)-1) {
                 i_error("%s", eperm_error_get_chgrp("fchown",
On Wed, 2009-07-29 at 11:04 +0200, Ralf Becker wrote:
> Jul 29 10:38:27 trevi mail:err|error dovecot: IMAP(beckerr): fchown(/u/f0/rzuser/beckerr/Mail/.subscriptions.lock, -1, -1) failed: Invalid argument
> ..
> I located the bug in src/lib/file-dotlock.c ... a patch is attached.
>
> - if (fd != -1 && (uid != (uid)-1 || gid != (gid_t)-1)) {
> + if (fd != -1 && (uid != (uid_t)-1 || gid != (gid_t)-1)) {

Whoops, thanks, committed. :)
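Why the one-character typo compiled and still let the error through: "(uid)-1" is not a cast but the arithmetic expression "uid - 1", so "uid != (uid)-1" is true for every uid and the guard never skips the call. A minimal sketch of the two conditions follows; the function names are made up for illustration, and a compiler may warn about the always-true comparison.

```c
#include <assert.h>
#include <sys/types.h>

/* Guard with the typo: "(uid)-1" parses as "uid - 1", so the first
 * comparison is always true and the result is always 1. */
static int guard_without_cast(uid_t uid, gid_t gid)
{
    return uid != (uid)-1 || gid != (gid_t)-1;
}

/* Guard with the intended cast: 0 only when both IDs are -1. */
static int guard_with_cast(uid_t uid, gid_t gid)
{
    return uid != (uid_t)-1 || gid != (gid_t)-1;
}

/* Usage example: with both IDs set to -1, only the corrected guard
 * evaluates to 0, i.e. only it would skip the fchown() call. */
static void guard_demo(void)
{
    assert(guard_without_cast((uid_t)-1, (gid_t)-1) == 1);
    assert(guard_with_cast((uid_t)-1, (gid_t)-1) == 0);
}
```

This matches the symptom reported above: with the typo in place, fchown(fd, -1, -1) was still being called and still failed with EINVAL on the NFS4 mount.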
participants (4)
- Axel Luttgens
- Frank Bonnet
- Ralf Becker
- Timo Sirainen