[Dovecot] you got chocolate in my peanut butter?!
Hey everyone,
Ran into something positively perplexing today. A user came to me and said that this morning when they checked their mail, they got about 120 strange new messages. Upon further inspection, the "new" messages are all addressed to a different user and are all 3 or 4 months old. Looking in that other user's /var/mail/(username) mailbox, I see the same messages that arrived mistakenly, and they're not in the /var/mail mailbox of the user whose desktop computer they ended up on. So it would appear that, possibly, when this user connected to the server, they got someone else's messages! Messages that, in fact, came from an account they don't even have the password to!
I'm really curious if anyone has seen something like this before. We're using dovecot-1.0.beta9, and have been since mid-June. I've never seen anything like this happen before. I'm perfectly willing to upgrade to the latest release candidate, but it's hard for me to "upgrade and see if that fixes it", because the problem happens so rarely that it won't be easy to know empirically. So what I'm really hoping for is confirmation that this is/was a known problem, if in fact it is.
Of course, I don't know that this is a dovecot bug, but I could imagine that it might be (maybe a daemon forgets to switch users after one session is closed and another is opened?), so I thought I'd ask. I did see what looked like TLS fixes and login fixes in the changelogs, so it doesn't seem out of the question that such a bug could've existed.
Some more information:
I checked the user's settings on their desktop computer, where the unexpected messages appear, and sure enough, there is only one POP server account configured there, and it has the correct username. What's more, I asked them what time this happened, and they said probably at 7:00am or maybe a little earlier. Looking at my dovecot logs, I see this (where 'theuser' is the user who received the messages):
Sep 28 06:59:49 myhostname dovecot: pop3-login: Disconnected: user=<theuser>, method=PLAIN, rip=192.168.1.245, lip=192.168.1.20, TLS
Sep 28 06:59:55 myhostname dovecot: pop3-login: Login: user=<theuser>, method=PLAIN, rip=192.168.1.245, lip=192.168.1.20, TLS
Sep 28 06:59:55 myhostname dovecot: POP3(theuser): Disconnected: Logged out top=0/0, retr=0/0, del=0/9, size=130585
So, it would seem that the user did log in at the time they claimed, and it was at that time (or close to it) that the weird messages appeared. Also, I checked the logs for logins from the person whose messages accidentally got downloaded, and they don't show that person logging in until several hours later. Oh, and there are no log entries for either of the two users in question before that, at least not for over 12 hours before that.
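The log check described above (which account each login line belongs to) can be sketched in C. This is only an illustration, assuming the "user=<...>" field format seen in the log lines quoted above; extract_login_user is a hypothetical helper name, not anything from Dovecot.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper (not part of Dovecot): copies the text between
 * "user=<" and the next '>' into out.
 * Returns 0 on success, -1 if the field is missing or out is too small. */
int extract_login_user(const char *line, char *out, size_t outlen)
{
    static const char marker[] = "user=<";
    const char *start = strstr(line, marker);
    const char *end;
    size_t len;

    if (start == NULL)
        return -1;
    start += sizeof(marker) - 1;      /* skip past "user=<" */
    end = strchr(start, '>');
    if (end == NULL)
        return -1;
    len = (size_t)(end - start);
    if (len + 1 > outlen)             /* need room for the terminating NUL */
        return -1;
    memcpy(out, start, len);
    out[len] = '\0';
    return 0;
}
```

Lines without a "user=<...>" field, such as the "POP3(theuser): Disconnected" line, are simply reported as not matching.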
The user is running Outlook 2003, with POP3 + TLS access to the mailbox.
My dovecot.conf has nothing fancy in it:
base_dir = /var/run/dovecot/
ssl_cert_file = /etc/mail/certs/server.crt
ssl_key_file = /etc/mail/certs/server.key
protocols = imap imaps pop3 pop3s
disable_plaintext_auth = no
login_dir = /var/run/dovecot/login
syslog_facility = local0
first_valid_uid = 100
protocol imap { }
protocol pop3 {
  pop3_lock_session = yes
  pop3_uidl_format = %08Xv%08Xu
}
auth default {
  mechanisms = plain
  user = root
  passdb shadow { }
  userdb passwd { }
}
The accounts are all coming out of LDAP via nsswitch (and this is all happening on Slackware 10.2), but I'm fairly sure that's irrelevant since "getent passwd", etc. all show the right stuff.
Thanks for any help anyone can give...
- Logan
On Thursday 28 September 2006 16:25, Logan Shaw wrote:
[...] they ended up on. So it would appear that, possibly, when this user connected to the server, they got someone else's messages! Messages that, in fact, came from an account they don't even have the password to!
[snip]
[...] know empirically. So what I'm really hoping for is confirmation that this is/was a known problem, if in fact it is.
I think it is known, although I'm not sure how many of the details are known.
Some more information:
[snip]
The accounts are all coming out of LDAP via nsswitch (and this [...]
Search the list archives (and the Red Hat bugzilla) for nss_ldap. There's your culprit.
Offlist mail to this address is discarded unless
"/dev/rob0" or "not-spam" is in Subject: header
On Thu, 28 Sep 2006, /dev/rob0 wrote:
On Thursday 28 September 2006 16:25, Logan Shaw wrote:
they ended up on. So it would appear that, possibly, when this user connected to the server, they got someone else's messages! Messages that, in fact, came from an account they don't even have the password to!
The accounts are all coming out of LDAP via nsswitch (and this
Search the list archives (and the Red Hat bugzilla) for nss_ldap. There's your culprit.
Aha, that would appear to be it. Thanks for the pointer.
To summarize what I've found since then, this is listed in Red Hat's bugzilla:
https://bugzilla.redhat.com/bugzilla/show_bug.cgi?id=154314
https://bugzilla.redhat.com/bugzilla/show_bug.cgi?id=154315
And it has been discussed before on this list:
http://www.dovecot.org/list/dovecot/2005-March/006345.html
http://www.dovecot.org/list/dovecot/2005-April/006859.html
But, I cannot find a bug related to this at the PADL nss_ldap bugzilla:
http://bugzilla.padl.com/
So, it would appear that it hasn't been reported to the PADL people, which would be helpful if it were ever to be fixed. Unfortunately, I'm not sure I know enough about the problem to create a useful bug report.
Still, I have to confess I'm a little confused about why dovecot behaves the way it does. When authenticating against the results of a getpwnam() call, why take the name that it returns when you already had the username in the first place? Maybe I am missing something, but I can't see any advantage in doing that.
Along the same lines, now suddenly I'm questioning whether an incorrect username returned from getpwnam() (and nss_ldap) really is the source of my problems. In the version I'm using (1.0.beta9), src/auth/userdb-passwd.c seems to already have the workaround that checks for bogus NSS data:
pw = getpwnam(auth_request->user);
if (pw == NULL) { /* ... snip ... */ }
if (strcasecmp(pw->pw_name, auth_request->user) != 0) {
    /* try to catch broken NSS implementations (nss_ldap) */
    i_fatal("BROKEN NSS IMPLEMENTATION: "
            "getpwnam() lookup returned different user than was "
            "requested (%s != %s).",
            pw->pw_name, auth_request->user);
}
And yet, I haven't seen that message in the logs, and since it is a fatal error, it should prevent the session from proceeding (and the wrong mail messages from downloading) anyway, no? So how can I be running this code and still experience the effects of an nss_ldap bug, even if there is one?
- Logan
On Fri, 29 Sep 2006, Peter Fern wrote:
Logan Shaw wrote:
So how can I be running this code and still experience the effects of an nss_ldap bug, even if there is one?
Is there a reason you're not just hitting LDAP directly?
The reason I started with dovecot going through getpwnam() and getspnam() is philosophical. Abstracting that so the configuration (and the implementation) is all in one place is the clean way to do things, and up until now there was no negative side to it.
But, I'm not so much of an idealist that I won't switch to LDAP if it's necessary to work around some bug that's causing mail to go to the wrong place. :-)
Still, I want to understand what's really going on. As I said, this problem happens rarely enough that it's hard for me to know empirically whether it's fixed, so the only route to knowing it's fixed is analytical. Now that I'm reading the exact code that dovecot uses to check the userdb, it's not clear to me that switching to LDAP will fix my problem. userdb-passwd.c explicitly checks for a wrong username back from getpwnam(), and the check doesn't appear to have fired[1], so how can it be that getpwnam() is returning wrong data?
- Logan
[1] Specifically, if getpwnam() returns a username that doesn't match what it was called with, dovecot calls i_fatal() whose output I assume will go to the log file. Since (a) the wrong messages got downloaded, and (b) I didn't see any "BROKEN NSS IMPLEMENTATION" message in the log file, I assume the check isn't firing.
On Fri, 2006-09-29 at 11:11 -0500, Logan Shaw wrote:
[1] Specifically, if getpwnam() returns a username that doesn't match what it was called with, dovecot calls i_fatal() whose output I assume will go to the log file. Since (a) the wrong messages got downloaded, and (b) I didn't see any "BROKEN NSS IMPLEMENTATION" message in the log file, I assume the check isn't firing.
I haven't before heard that this check wouldn't have caught the problem, but since I don't know what exactly the bug in nss_ldap is, I guess it's possible that sometimes the username is correct but the rest of the data (uid and home dir especially) isn't..
In any case, the only case where I've ever heard of a user accidentally having access to another user's mailbox is with nss_ldap, so I'm pretty sure that's the problem even if my check isn't working.
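One way to probe the possibility raised above (the username is correct but the uid or home directory is not) is to compare every field of repeated getpwnam() results, not just pw_name. The following is a minimal sketch under stated assumptions: pw_snapshot, pw_snapshot_take, pw_snapshot_matches and count_inconsistent_lookups are hypothetical names, none of this is Dovecot code, and whether a simple serial loop like this would ever provoke the nss_ldap problem is unknown.

```c
#include <pwd.h>
#include <stdio.h>
#include <string.h>
#include <sys/types.h>

/* Hypothetical helper type (not part of Dovecot): a private copy of the
 * fields of interest, since getpwnam() may reuse its static buffer. */
struct pw_snapshot {
    char name[128];
    char dir[512];
    uid_t uid;
    gid_t gid;
};

/* Copy the interesting fields out of a passwd entry. */
void pw_snapshot_take(struct pw_snapshot *snap, const struct passwd *pw)
{
    snprintf(snap->name, sizeof(snap->name), "%s", pw->pw_name);
    snprintf(snap->dir, sizeof(snap->dir), "%s", pw->pw_dir);
    snap->uid = pw->pw_uid;
    snap->gid = pw->pw_gid;
}

/* Returns 1 if every copied field matches the entry, 0 otherwise. */
int pw_snapshot_matches(const struct pw_snapshot *snap, const struct passwd *pw)
{
    return strcmp(snap->name, pw->pw_name) == 0 &&
           strcmp(snap->dir, pw->pw_dir) == 0 &&
           snap->uid == pw->pw_uid &&
           snap->gid == pw->pw_gid;
}

/* Looks up `user` `rounds` times and returns how many later lookups
 * differed from the first one, or -1 if any lookup returned NULL. */
int count_inconsistent_lookups(const char *user, int rounds)
{
    struct pw_snapshot first;
    struct passwd *pw = getpwnam(user);
    int i, bad = 0;

    if (pw == NULL)
        return -1;
    pw_snapshot_take(&first, pw);
    for (i = 1; i < rounds; i++) {
        pw = getpwnam(user);
        if (pw == NULL)
            return -1;
        if (!pw_snapshot_matches(&first, pw))
            bad++;
    }
    return bad;
}
```

A nonzero return from count_inconsistent_lookups would indicate a name service returning the right pw_name with different uid, gid or home directory data, which the pw_name comparison in userdb-passwd.c would not notice. A return of zero proves nothing, given how rarely the problem appears.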
On Sun, 8 Oct 2006, Timo Sirainen wrote:
On Fri, 2006-09-29 at 11:11 -0500, Logan Shaw wrote:
[1] Specifically, if getpwnam() returns a username that doesn't match what it was called with, dovecot calls i_fatal() whose output I assume will go to the log file. Since (a) the wrong messages got downloaded, and (b) I didn't see any "BROKEN NSS IMPLEMENTATION" message in the log file, I assume the check isn't firing.
I haven't before heard that this check wouldn't have caught the problem, but since I don't know what exactly the bug in nss_ldap is, I guess it's possible that sometimes the username is correct but the rest of the data (uid and home dir especially) isn't..
Yeah, that's possible, but can it cause this problem? I don't think a wrong home directory can cause this problem in my case, because I have the inboxes stored under /var/mail and I've verified that the wrong messages were present in a mailbox in /var/mail. So the home directory isn't involved.
So the other possibility is that it has something to do with the numeric uid. That would mean that dovecot is taking the username the user logs in with over the POP session, calling getpwnam() to get the uid, then calling getpwuid() (or something) later on to get the username back again. But I don't see getpwuid() anywhere in the code except in src/deliver/deliver.c, and I don't think deliver.c is related.
So, I'm still confused. :-) Does dovecot translate to numeric uid and then back again?
In any case, the only case where I've ever heard of a user accidentally having access to another user's mailbox is with nss_ldap, so I'm pretty sure that's the problem even if my check isn't working.
Yeah, there are definitely a lot of things pointing that direction. For one thing, it would be really impressive if dovecot could just come up with another valid username out of thin air. So it seems like it has to be getting it from the name service. But how?
- Logan
On Fri, 2006-10-13 at 10:20 -0500, Logan Shaw wrote:
On Sun, 8 Oct 2006, Timo Sirainen wrote:
On Fri, 2006-09-29 at 11:11 -0500, Logan Shaw wrote:
[1] Specifically, if getpwnam() returns a username that doesn't match what it was called with, dovecot calls i_fatal() whose output I assume will go to the log file. Since (a) the wrong messages got downloaded, and (b) I didn't see any "BROKEN NSS IMPLEMENTATION" message in the log file, I assume the check isn't firing.
I haven't before heard that this check wouldn't have caught the problem, but since I don't know what exactly the bug in nss_ldap is, I guess it's possible that sometimes the username is correct but the rest of the data (uid and home dir especially) isn't..
Yeah, that's possible, but can it cause this problem? I don't think a wrong home directory can cause this problem in my case, because I have the inboxes stored under /var/mail and I've verified that the wrong messages were present in a mailbox in /var/mail. So the home directory isn't involved.
Umm. I didn't before notice that you said that the messages actually were in the wrong user's INBOX.
Are you using Dovecot's deliver? Doesn't look like that from the dovecot.conf you posted initially. So it looks like the problem is with whatever put those messages in the user's INBOX.. Perhaps still nss_ldap bug. :)
So, I'm still confused. :-) Does dovecot translate to numeric uid and then back again?
No.
On Fri, 13 Oct 2006, Timo Sirainen wrote:
On Fri, 2006-10-13 at 10:20 -0500, Logan Shaw wrote:
On Sun, 8 Oct 2006, Timo Sirainen wrote:
On Fri, 2006-09-29 at 11:11 -0500, Logan Shaw wrote:
[1] Specifically, if getpwnam() returns a username that doesn't match what it was called with, dovecot calls i_fatal() whose output I assume will go to the log file. Since (a) the wrong messages got downloaded, and (b) I didn't see any "BROKEN NSS IMPLEMENTATION" message in the log file, I assume the check isn't firing.
I haven't before heard that this check wouldn't have caught the problem, but since I don't know what exactly the bug in nss_ldap is, I guess it's possible that sometimes the username is correct but the rest of the data (uid and home dir especially) isn't..
Yeah, that's possible, but can it cause this problem? I don't think a wrong home directory can cause this problem in my case, because I have the inboxes stored under /var/mail and I've verified that the wrong messages were present in a mailbox in /var/mail. So the home directory isn't involved.
Umm. I didn't before notice that you said that the messages actually were in the wrong user's INBOX.
Sorry, that wasn't very clear. When I said "wrong messages", I meant "messages in question". The messages were in the correct mailbox under /var/mail.
What I was getting at is that I verified the messages were under /var/mail rather than under somebody's home directory. That's relevant because if the messages are in /var/mail then pw_dir can have the wrong value and it won't affect what mailbox is opened.
So, I'm still confused. :-) Does dovecot translate to numeric uid and then back again?
No.
Hmm, then I'm at a loss to understand how wrong data from nss_ldap (or from getpwnam(), that is) can cause what happened. dovecot doesn't care what values are in pw_uid or pw_dir, and dovecot checks that pw_name matches getpwnam()'s argument.
- Logan
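The argument above about pw_dir can be made concrete with a small sketch. It assumes, as the message describes, an INBOX stored under /var/mail and named after the login name; the "mail/inbox" location under the home directory is purely a hypothetical contrast, and both function names are invented for illustration. It does not model how Dovecot actually locates mailboxes, and it says nothing about pw_uid.

```c
#include <stddef.h>
#include <stdio.h>

/* Hypothetical helper: spool-style INBOX path, built from the login
 * name only. Returns 0 on success, -1 if out is too small. */
int spool_inbox_path(const char *login_name, char *out, size_t outlen)
{
    int n = snprintf(out, outlen, "/var/mail/%s", login_name);
    return (n < 0 || (size_t)n >= outlen) ? -1 : 0;
}

/* Hypothetical helper: home-style INBOX path, built from pw_dir.
 * The "mail/inbox" suffix is an assumption made for contrast. */
int home_inbox_path(const char *pw_dir, char *out, size_t outlen)
{
    int n = snprintf(out, outlen, "%s/mail/inbox", pw_dir);
    return (n < 0 || (size_t)n >= outlen) ? -1 : 0;
}
```

With the spool-style layout, a wrong pw_dir cannot change which file is opened, because pw_dir never enters the path. With a home-style layout it would.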
If all of your users reside in LDAP, it would be safest to bypass nss_ldap and pam_ldap altogether and have dovecot talk directly to the LDAP database.
We've had nothing but success since we made the change on our end. Not only did it eliminate the problems that we were seeing, it also makes the authentication path a bit more efficient (why have a 'monkey in the middle' when you can talk right to LDAP?).
IMHO, I've always seen pam_ldap/nss_ldap as a band-aid type of hack.. :)
-Rich
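For reference, the change suggested above might look roughly like this in a 1.0-style dovecot.conf, replacing the shadow passdb and passwd userdb from the original post. This is a sketch only: the path /etc/dovecot-ldap.conf is an assumption, the LDAP settings file itself (server, search base, filters, attribute mapping) is not shown, and the exact syntax should be checked against the documentation for the installed version.

```
auth default {
  mechanisms = plain
  passdb ldap {
    # assumed path to the LDAP settings file
    args = /etc/dovecot-ldap.conf
  }
  userdb ldap {
    # assumed path to the LDAP settings file
    args = /etc/dovecot-ldap.conf
  }
}
```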
participants (5)
- /dev/rob0
- Logan Shaw
- Peter Fern
- Rich West
- Timo Sirainen