[Dovecot] hanging imap... and users getting other users' emails!
Hi All,
We recently upgraded from UW (boy, starting to sound like a broken
record, eh?)...
Two problems:
1. Every two or three days users are unable to access their email
via IMAP. It seems that it's because dovecot-auth has hung. Has anyone seen anything like this?
2. (the really serious one)... one of our users, "angie" has twice
had her Outlook download other users' email! Once as another user, "helene", and just now as "mkarlin". The first time she got 800 of helene's email, and just now, about 30 of mkarlin's.... from our log, I noticed:
Mar 2 17:27:10 core imap-login: Login: mkarlin [::ffff:128.195.131.2]
Mar 2 17:27:10 core imap-login: Login: mkarlin [::ffff:128.195.131.2]
Mar 2 17:27:11 core pop3-login: Login: angie [::ffff:68.231.223.128]
Mar 2 17:27:11 core imap-login: Login: mkarlin [::ffff:128.195.131.2]
Mar 2 17:27:11 core imap-login: Login: mkarlin [::ffff:128.195.131.2]
Our server setup:
Fedora Core 3. dovecot-0.99.13. OpenLDAP, and dovecot auth'ing off PAM (mostly default setup, using Mailbox, and mail/ as a subdir).
Any ideas? Any more debug info I can supply?
.josh
Josh Burley said:
Hi All,
We recently upgraded from UW (boy, starting to sound like a broken
record, eh?)...
Two problems: 1. Every two or three days users are unable to access their email
via IMAP. It seems that it's because dovecot-auth has hung. Has anyone seen anything like this?
I don't believe I've seen this, and I've had dovecot running for months on FreeBSD with PAM authentication. Perhaps that indicates it could be a Fedora thing?
2. (the really serious one)... one of our users, "angie" has twice
had her Outlook download other users' email! Once as another user, "helene", and just now as "mkarlin". The first time she got 800 of helene's email, and just now, about 30 of mkarlin's.... from our log, I noticed:
Never seen this either. But I use maildir, so that could be related too, I guess.
Mar 2 17:27:10 core imap-login: Login: mkarlin [::ffff:128.195.131.2]
Mar 2 17:27:10 core imap-login: Login: mkarlin [::ffff:128.195.131.2]
Mar 2 17:27:11 core pop3-login: Login: angie [::ffff:68.231.223.128]
Mar 2 17:27:11 core imap-login: Login: mkarlin [::ffff:128.195.131.2]
Mar 2 17:27:11 core imap-login: Login: mkarlin [::ffff:128.195.131.2]
Our server setup:
Fedora Core 3. dovecot-0.99.13. OpenLDAP, and dovecot auth'ing off PAM (mostly default setup, using Mailbox, and mail/ as a subdir).
Thought about using a different version (newer, or even older) of dovecot and seeing if that makes the problem go away.
Any ideas? Any more debug info I can supply?
.josh
-- Dominic
- Every two or three days users are unable to access their email via IMAP. It seems that it's because dovecot-auth has hung. Has anyone seen anything like this?
I don't believe I've seen this, and I've had dovecot running for months on FreeBSD with PAM authentication. Perhaps that indicates it could be a Fedora thing?
Yep, a strange one, eh? I wonder what kind of debugging I can do to figure it out. I've switched from PAM to the direct LDAP authentication, in an attempt to stabilize mail access.
- (the really serious one)... one of our users, "angie" has twice had her Outlook download other users' email! Once as another user, "helene", and just now as "mkarlin". The first time she got 800 of helene's email, and just now, about 30 of mkarlin's.... from our log, I noticed:
Never seen this either. But I use maildir, so that could be related too, I guess.
Well, it really seems like perhaps the UIDs are getting confused. The format of the box (in my mind) shouldn't matter as much... it'd be one thing if Angie was just receiving her mail in a corrupt manner, but she's actually getting OTHER people's mail. Notice that she and mkarlin logged in at the exact same time, via two different protocols (POP3 and IMAP). Sounds like a race condition between the two in the code? Or perhaps part of the PAM layer issue (I read in the docs about PAM not returning UIDs?).
Mar 2 17:27:10 core imap-login: Login: mkarlin [::ffff:128.195.131.2]
Mar 2 17:27:10 core imap-login: Login: mkarlin [::ffff:128.195.131.2]
Mar 2 17:27:11 core pop3-login: Login: angie [::ffff:68.231.223.128]
Mar 2 17:27:11 core imap-login: Login: mkarlin [::ffff:128.195.131.2]
Mar 2 17:27:11 core imap-login: Login: mkarlin [::ffff:128.195.131.2]
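For anyone wanting to scan a longer log for the same pattern, here is a minimal Python sketch. It is not part of Dovecot: the regular expression is an assumption based only on the login lines quoted in this thread, and `overlapping_logins` is a hypothetical helper name. It groups logins by timestamp and reports every second in which more than one distinct user logged in.

```python
import re
from collections import defaultdict

# Assumed line format, taken from the log excerpt quoted in this thread:
# "Mar 2 17:27:11 core pop3-login: Login: angie [::ffff:68.231.223.128]"
LOGIN_RE = re.compile(
    r"^(?P<stamp>\w{3}\s+\d+\s+\d\d:\d\d:\d\d)\s+\S+\s+"
    r"(?P<service>imap|pop3)-login: Login: (?P<user>\S+)\s+\[(?P<addr>[^\]]+)\]"
)


def overlapping_logins(lines):
    """Map each timestamp to its users, keeping only seconds with 2+ distinct users."""
    users_by_second = defaultdict(set)
    for line in lines:
        match = LOGIN_RE.match(line.strip())
        if match:
            users_by_second[match.group("stamp")].add(match.group("user"))
    return {
        stamp: sorted(users)
        for stamp, users in users_by_second.items()
        if len(users) > 1
    }


# The five log lines from the original report.
SAMPLE = [
    "Mar 2 17:27:10 core imap-login: Login: mkarlin [::ffff:128.195.131.2]",
    "Mar 2 17:27:10 core imap-login: Login: mkarlin [::ffff:128.195.131.2]",
    "Mar 2 17:27:11 core pop3-login: Login: angie [::ffff:68.231.223.128]",
    "Mar 2 17:27:11 core imap-login: Login: mkarlin [::ffff:128.195.131.2]",
    "Mar 2 17:27:11 core imap-login: Login: mkarlin [::ffff:128.195.131.2]",
]

if __name__ == "__main__":
    for stamp, users in overlapping_logins(SAMPLE).items():
        print(stamp, users)
```

A hit only shows that two users logged in during the same second. That is normal on a busy server, so it marks a window worth comparing against user reports rather than proving a mix-up.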
Thought about using a different version (newer, or even older) of dovecot and seeing if that makes the problem go away.
As a matter of fact, I attempted to upgrade the server to the latest stable version last night. I had it all installed and reconfigured. When I fired it up I tested IMAP access with four different accounts. Three of the accounts were fine, but then the fourth account only showed a single email in the INBOX, the oldest one. It seemed that for that Mailbox, for some reason, it was unable to find the delimiters between emails (or something like that ;). As soon as I downgraded back, all mailboxes worked fine.
.josh
On Wed, 2005-03-02 at 18:04 -0800, Josh Burley wrote:
2. (the really serious one)... one of our users, "angie" has twice
had her Outlook download other users' email! Once as another user, "helene", and just now as "mkarlin". The first time she got 800 of helene's email, and just now, about 30 of mkarlin's.... from our log, I noticed:
What userdb are you using? ldap? Is it returning mail-setting for users? This is breaking only randomly?
I've just switched to ldap.
But when this problem was occurring (I don't know if it still is), it was set up with userdb = passwd and passdb = pam. Our passwd, though, has hardly any of our users in it (I don't really understand the use of this in the PAM context).
What mail-settings do you mean?
It's breaking randomly, yes.
login_process_per_connection is commented out. What is the default? I'll change this to "yes" now.
.josh
Timo Sirainen wrote:
On Wed, 2005-03-02 at 18:04 -0800, Josh Burley wrote:
- (the really serious one)... one of our users, "angie" has twice had her Outlook download other users' email! Once as another user, "helene", and just now as "mkarlin". The first time she got 800 of helene's email, and just now, about 30 of mkarlin's.... from our log, I noticed:
What userdb are you using? ldap? Is it returning mail-setting for users? This is breaking only randomly?
On Thu, 2005-03-03 at 09:25 -0800, Josh Burley wrote:
I've just switched to ldap.
But when this problem was occurring (I don't know if it still is), it was set up with userdb = passwd and passdb = pam. Our passwd, though, has hardly any of our users in it (I don't really understand the use of this in the PAM context).
Um. How was it working at all then, if your users weren't in passwd? Because that's where Dovecot looked up their UID, GID and home directory. PAM only checks that the user's password is correct.
What mail-settings do you mean?
With passwd there's no such thing. With LDAP it's possible to return a "mail" field which overrides default_mail_env.
login_process_per_connection is commented out. What is the default? I'll change this to "yes" now.
Default is shown in the commented line, so it's yes.
Good question, eh? I thought the same when I went back and read the comments about the UID.
But, it *was* working. For a few weeks (we did the upgrade about three weeks ago). Only about 30 users in /etc/passwd, but over 300 in LDAP.
We have default_mail_env set, and no mail field in our LDAP schema.
Timo Sirainen wrote:
On Thu, 2005-03-03 at 09:25 -0800, Josh Burley wrote:
I've just switched to ldap.
But when this problem was occurring (I don't know if it still is), it was set up with userdb = passwd and passdb = pam. Our passwd, though, has hardly any of our users in it (I don't really understand the use of this in the PAM context).
Um. How was it working at all then, if your users weren't in passwd? Because that's where Dovecot looked up their UID, GID and home directory. PAM only checks that the user's password is correct.
What mail-settings do you mean?
With passwd there's no such thing. With LDAP it's possible to return a "mail" field which overrides default_mail_env.
login_process_per_connection is commented out. What is the default? I'll change this to "yes" now.
Default is shown in the commented line, so it's yes.
On Thu, 2005-03-03 at 10:05 -0800, Josh Burley wrote:
Good question, eh? I thought the same when I went back and read the comments about the UID.
But, it *was* working. For a few weeks (we did the upgrade about three weeks ago). Only about 30 users in /etc/passwd, but over 300 in LDAP.
"passwd" actually doesn't mean /etc/passwd; it means using the getpwnam() function. That in turn uses /etc/nsswitch.conf and whatever else is configured to figure out where to look up the users, which may end up using LDAP as well. Sounds like your system is configured this way, and sounds like the bug is in the LDAP NSS module.
What OS/distribution is this?
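To see what getpwnam() actually returns for a user on a given machine, a minimal Python sketch follows. Python's standard `pwd` module wraps the same C library call, so the lookup goes through /etc/nsswitch.conf just as described. The `lookup_user` helper and its name-consistency check are hypothetical illustrations, not something Dovecot is known to do.

```python
import pwd


def lookup_user(username):
    """Resolve a user through getpwnam(), i.e. through whatever NSS is configured to use.

    Returns a dict with uid, gid and home, or None if no NSS source knows the user.
    """
    try:
        entry = pwd.getpwnam(username)
    except KeyError:
        # No configured source (files, ldap, ...) has this user.
        return None

    # Hypothetical safety check: if a buggy NSS module handed back another
    # user's entry, the returned name would not match the one requested.
    if entry.pw_name != username:
        raise RuntimeError(
            "asked for %r but NSS returned %r" % (username, entry.pw_name)
        )

    return {"uid": entry.pw_uid, "gid": entry.pw_gid, "home": entry.pw_dir}


if __name__ == "__main__":
    # "angie" and "mkarlin" are the account names mentioned in this thread.
    for name in ("root", "angie", "mkarlin"):
        print(name, lookup_user(name))
```

Running this for accounts that exist only in LDAP shows whether the "passwd" userdb can resolve them at all. Running it repeatedly under load and comparing the results is one way to look for the kind of mismatch suspected here.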
Ah, I misunderstood the comment, then.
Yes, nsswitch is set up to use ldap. Do you think that switching from pam to ldap might help with this problem?
It's a Fedora Core 3 machine.
Timo Sirainen wrote:
On Thu, 2005-03-03 at 10:05 -0800, Josh Burley wrote:
Good question, eh? I thought the same when I went back and read the comments about the UID.
But, it *was* working. For a few weeks (we did the upgrade about three weeks ago). Only about 30 users in /etc/passwd, but over 300 in LDAP.
"passwd" actually doesn't mean /etc/passwd; it means using the getpwnam() function. That in turn uses /etc/nsswitch.conf and whatever else is configured to figure out where to look up the users, which may end up using LDAP as well. Sounds like your system is configured this way, and sounds like the bug is in the LDAP NSS module.
What OS/distribution is this?
PAM is working fine, as long as it's checking users' passwords correctly. Your problem was only because of the userdb, and changing that to ldap should fix it.
I just looked at libnss-ldap code. It seems to be using threads, which makes it even more likely to be the source of your problems. Perhaps someone should tell them (or RH bugzilla) about this bug and see what they say..
On Thu, 2005-03-03 at 10:14 -0800, Josh Burley wrote:
Ah, I misunderstood the comment, then.
Yes, nsswitch is set up to use ldap. Do you think that switching from pam to ldap might help with this problem?
It's a Fedora Core 3 machine.
Timo Sirainen wrote:
On Thu, 2005-03-03 at 10:05 -0800, Josh Burley wrote:
Good question, eh? I thought the same when I went back and read the comments about the UID.
But, it *was* working. For a few weeks (we did the upgrade about three weeks ago). Only about 30 users in /etc/passwd, but over 300 in LDAP.
"passwd" actually doesn't mean /etc/passwd; it means using the getpwnam() function. That in turn uses /etc/nsswitch.conf and whatever else is configured to figure out where to look up the users, which may end up using LDAP as well. Sounds like your system is configured this way, and sounds like the bug is in the LDAP NSS module.
What OS/distribution is this?
On Mar 3, 2005, at 19:26, Timo Sirainen wrote:
I just looked at libnss-ldap code. It seems to be using threads, which makes it even more likely to be the source of your problems. Perhaps someone should tell them (or RH bugzilla) about this bug and see what they say..
nss_ldap does not use threads as such.
Since it can be loaded by any program using the C library, it can also be loaded by a threaded program, and must then attempt to do things in a thread safe way.
Regards, Frode
participants (4): Dominic Marks, Frode Nordahl, Josh Burley, Timo Sirainen