[Dovecot] Dovecot-2.1.14 - pop3 processes always hangs forever - another follow-up
Hello,
With the same basic configs as in my previous messages, I tried with the maildir format instead of the mbox format.
That is, in both cases ("virtual" uids/gids vs a single "real" uid/gid), doveconf -n shows: mail_location = maildir:~/maildir instead of: mail_location = mbox:~/mboxes:INBOX=~/mboxes/inbox
Here too, with "virtual" uids/gids, connections thru pop hang forever, while imap connections behave flawlessly.
And with a single "real" uid/gid, connections thru both pop and imap behave as expected.
As a result, it could be inferred that the problems I am encountering with pop connections do not depend on the mailbox format in use, and that they are entirely related to "virtual" uids/gids.
To recap: with this very basic config
# 2.1.14: /_ETC/dovecot/dovecot.conf
# OS: Darwin 12.2.1 x86_64
auth_verbose = yes
disable_plaintext_auth = no
mail_debug = yes
mail_location = mbox:~/mboxes:INBOX=~/mboxes/inbox
passdb {
args = /_ETC/dovecot/db.conf
driver = sql
}
ssl = no
userdb {
driver = prefetch
}
a pop connection hangs forever after the pass command
$ telnet 127.0.0.1 110
Trying 127.0.0.1...
Connected to localhost.
Escape character is '^]'.
+OK Dovecot ready.
user user2
+OK
pass pass2
and the log shows (here, when the user connects for the first time)
dovecot[2997]: pop3-login: Login: user=<user2>, method=PLAIN, rip=127.0.0.1, lip=127.0.0.1, mpid=3113, secured, session=<GTOlMjfV3gB/AAAB>
dovecot[2997]: pop3(user2): Debug: Effective uid=100003, gid=100003, home=/_Data/Mailstores/100003
dovecot[2997]: pop3(user2): Debug: Home dir not found: /_Data/Mailstores/100003
dovecot[2997]: pop3(user2): Debug: fs: root=/_Data/Mailstores/100003/mboxes, index=, control=, inbox=/_Data/Mailstores/100003/mboxes/inbox, alt=
dovecot[2997]: pop3(user2): Debug: Namespace : /_Data/Mailstores/100003/mboxes doesn't exist yet, using default permissions
dovecot[2997]: pop3(user2): Debug: Namespace : Using permissions from /_Data/Mailstores/100003/mboxes: mode=0700 gid=-1
dovecot[2997]: pop3(user2): Debug: Namespace : /_Data/Mailstores/100003/mboxes/inbox doesn't exist yet, using default permissions
dovecot[2997]: pop3(user2): Debug: Namespace : Using permissions from /_Data/Mailstores/100003/mboxes: mode=0700 gid=-1
dovecot[2997]: pop3(user2): Debug: Namespace : /_Data/Mailstores/100003/mboxes/inbox doesn't exist yet, using default permissions
dovecot[2997]: pop3(user2): Debug: Namespace : Using permissions from /_Data/Mailstores/100003/mboxes: mode=0700 gid=-1
With exactly the same config, imap connections are fully functional.
Does someone have an explanation?
TIA, Axel
On Fri, 2013-02-15 at 17:32 +0100, Axel Luttgens wrote:
a pop connection hangs forever after the pass command
$ telnet 127.0.0.1 110 Trying 127.0.0.1... Connected to localhost. Escape character is '^]'. +OK Dovecot ready. user user2 +OK pass pass2
and the log shows (here, when the user connects for the first time)
dovecot[2997]: pop3-login: Login: user=<user2>, method=PLAIN, rip=127.0.0.1, lip=127.0.0.1, mpid=3113, secured, session=<GTOlMjfV3gB/AAAB> dovecot[2997]: pop3(user2): Debug: Effective uid=100003, gid=100003, home=/_Data/Mailstores/100003 dovecot[2997]: pop3(user2): Debug: Home dir not found: /_Data/Mailstores/100003 dovecot[2997]: pop3(user2): Debug: fs: root=/_Data/Mailstores/100003/mboxes, index=, control=, inbox=/_Data/Mailstores/100003/mboxes/inbox, alt= dovecot[2997]: pop3(user2): Debug: Namespace : /_Data/Mailstores/100003/mboxes doesn't exist yet, using default permissions dovecot[2997]: pop3(user2): Debug: Namespace : Using permissions from /_Data/Mailstores/100003/mboxes: mode=0700 gid=-1 dovecot[2997]: pop3(user2): Debug: Namespace : /_Data/Mailstores/100003/mboxes/inbox doesn't exist yet, using default permissions dovecot[2997]: pop3(user2): Debug: Namespace : Using permissions from /_Data/Mailstores/100003/mboxes: mode=0700 gid=-1 dovecot[2997]: pop3(user2): Debug: Namespace : /_Data/Mailstores/100003/mboxes/inbox doesn't exist yet, using default permissions dovecot[2997]: pop3(user2): Debug: Namespace : Using permissions from /_Data/Mailstores/100003/mboxes: mode=0700 gid=-1
With exactly the same config, imap connections are fully functional.
Does someone have an explanation?
a) dtruss the pop3 process to see which syscall hangs. Maybe this works:
service pop3 { executable = /usr/bin/dtruss /usr/local/libexec/dovecot/pop3 }
It logs to Dovecot's error log.
b) get gdb backtrace:
gdb -p <pid of pop3 process> bt full
Le 16 févr. 2013 à 17:47, Timo Sirainen a écrit :
[...] a) dtruss the pop3 process to see which syscall hangs. Maybe this works:
service pop3 { executable = /usr/bin/dtruss /usr/local/libexec/dovecot/pop3 }
It logs to Dovecot's error log.
Hello Timo,
As usual, once again, many thanks for your reply. :-)
I already tried the dtruss way on hanging pop3 processes, but without any valuable output: they were just sleeping, and dtruss came without any output.
Anyway, yes, above proposal proved functional. Interestingly. ;-)
I tried five consecutive pop logins (RUN 1 to 5) with the same user ("user1", 100002/100002), yet without being able to discern some conclusive pattern.
Just to be sure, I turned flocking off for two other subsequent logins (RUN 6 and 7). Again without any "Aha" effect on my side.
In all cases, I had to send a KILL signal to the hanging pop3 process, hence the last empty line in the log ("pop3: Error: ").
Outputs are in Pop3Hangs.txt.zip.
b) get gdb backtrace:
gdb -p <pid of pop3 process> bt full
I thus tried an eigth run, still with flocking off.
Output is in Pop3HangsGdb.txt.zip.
Wishing you all the best, Axel
On 17.2.2013, at 0.43, Axel Luttgens <AxelLuttgens@swing.be> wrote:
b) get gdb backtrace:
gdb -p <pid of pop3 process> bt full
I thus tried an eigth run, still with flocking off.
Output is in Pop3HangsGdb.txt.zip.
#21 0x00007fff8f57b086 in gethostbyname () #22 0x000000010b75571c in my_hostdomain () at hostpid.c:39
Where the code is:
hent = gethostbyname(my_hostname);
The gethostbyname() call is used to figure out the current system's domain. Why it would be hanging there, I don't really know. That call anyway isn't fully required, you could just replace it with "hent = NULL;"
Le 17 févr. 2013 à 05:47, Timo Sirainen a écrit :
[...] The gethostbyname() call is used to figure out the current system's domain. Why it would be hanging there, I don't really know. That call anyway isn't fully required, you could just replace it with "hent = NULL;"
Yes, looks like those gethostbyname() require some investigations here... I'll have a look and provide my "findings" here.
In the meantime, I don't really like the idea of messing with a general purpose function, my_hostdomain(), just to avoid a local problem... Wouldn't you have a better idea? ;-)
And, BTW, imap processes may hang as well, for the same underlying causes. It is just a matter of selecting INBOX (up to now, I just performed a login/logout sequence).
TIA, Axel
On Sun, 2013-02-17 at 12:24 +0100, Axel Luttgens wrote:
Le 17 févr. 2013 à 05:47, Timo Sirainen a écrit :
[...] The gethostbyname() call is used to figure out the current system's domain. Why it would be hanging there, I don't really know. That call anyway isn't fully required, you could just replace it with "hent = NULL;"
Yes, looks like those gethostbyname() require some investigations here... I'll have a look and provide my "findings" here.
In the meantime, I don't really like the idea of messing with a general purpose function, my_hostdomain(), just to avoid a local problem... Wouldn't you have a better idea? ;-)
Solution for v2.2: http://hg.dovecot.org/dovecot-2.2/rev/a6b40687c0a4
Le 18 févr. 2013 à 06:21, Timo Sirainen <tss@iki.fi> a écrit :
On Sun, 2013-02-17 at 12:24 +0100, Axel Luttgens wrote:
Le 17 févr. 2013 à 05:47, Timo Sirainen a écrit :
[...] The gethostbyname() call is used to figure out the current system's domain. Why it would be hanging there, I don't really know. That call anyway isn't fully required, you could just replace it with "hent = NULL;"
Yes, looks like those gethostbyname() require some investigations here... I'll have a look and provide my "findings" here.
Very modest findings; anyway, here they are, should they be of some help for nailing the causes down.
First, having the db query returning only a virtual uid (eg, 100002), thus making use of a system group set with mail_gid (e.g., 999), those hangs do not occur anymore.
On the other hand, this quick one:
#include <syslog.h>
#include <netdb.h>
#define NULL ((void *)0)
int main( int argc, const char * argv[])
{
struct hostent *hent;
hent = gethostbyname("ALMba.local");
if ( hent != NULL )
{
syslog(LOG_NOTICE|LOG_MAIL, "name returned by gethostbyname(): %s\n", hent->h_name);
}
return(0);
}
compiled as ghbn and with perms set to:
-rwsr-sr-x 1 100002 100002 8816 18 fév 09:08 ghbn
doesn't hang and yields:
[ASLMessageID 385162] [Time 1361175967] [TimeNanoSec 451292000] [Level 5] [PID 5616] [UID 100002] [GID 100002] [ReadGID 80] [Host ALMba.local] [Sender ghbn] [Facility mail] [Message name returned by gethostbyname(): almba.local]
It could thus be inferred that the problems encountered here are related to virtual gids only when setting up child processes from within Dovecot. Don't know whether this impacts those calls to gethostbyname() only, or if other side-effects may be expected as well.
Now, would all this be a Mac OS X thing only? Or are such problems liable to occur on other platforms as well?
In the meantime, I don't really like the idea of messing with a general purpose function, my_hostdomain(), just to avoid a local problem... Wouldn't you have a better idea? ;-)
Solution for v2.2: http://hg.dovecot.org/dovecot-2.2/rev/a6b40687c0a4
Wow, that's a good idea. :-) This should indeed circumvent those hangs, but also avoid many rather costly system calls.
It seems it could be easily back ported to 2.1.x. I'll try and report back here.
Many thanks, Axel
Le 18 févr. 2013 à 10:28, Axel Luttgens a écrit :
[...]
It seems it could be easily back ported to 2.1.x. I'll try and report back here.
Hello Timo, Please find hereafter a proposal based on 2.1.4. Tested here, no hangs anymore with virtual uids/gids. I'm just wondering about the call to hostpid_init() at line 364 of master/main.c, the one related to a config reload; since provisions seem to have been made for taking a possible hostname change into account, wouldn't it be needed to unset the environment variables there? Axel --- src/lib/hostpid.c.original 2013-02-18 12:46:25.000000000 +0100 +++ src/lib/hostpid.c 2013-02-18 17:18:32.000000000 +0100 @@ -3,9 +3,12 @@ #include "lib.h" #include "hostpid.h" +#include <stdlib.h> #include <unistd.h> #include <netdb.h> +#define HOSTNAME_DISALLOWED_CHARS "/\r\n\t" + const char *my_hostname = NULL; const char *my_pid = NULL; @@ -15,13 +18,19 @@ { static char hostname[256], pid[MAX_INT_STRLEN]; - if (gethostname(hostname, sizeof(hostname)-1) == -1) - i_strocpy(hostname, "unknown", sizeof(hostname)); - hostname[sizeof(hostname)-1] = '\0'; - my_hostname = hostname; + /* Children should find MY_HOSTNAME_ENV set by master process. */ + /* Only master process should have to call gethostname() once. */ + my_hostname = getenv(MY_HOSTNAME_ENV); + if (my_hostname == NULL) { + if (gethostname(hostname, sizeof(hostname)-1) < 0) + i_fatal("gethostname() failed: %m"); + hostname[sizeof(hostname)-1] = '\0'; + my_hostname = hostname; + } - if (strchr(hostname, '/') != NULL) - i_fatal("Invalid system hostname: %s", hostname); + if (my_hostname[0] == '\0' || + strcspn(my_hostname, HOSTNAME_DISALLOWED_CHARS) != strlen(my_hostname)) + i_error("Invalid system hostname: '%s'", my_hostname); /* allow calling hostpid_init() multiple times to reset hostname */ i_free_and_null(my_domain); @@ -35,12 +44,17 @@ struct hostent *hent; const char *name; + /* Children should find MY_HOSTDOMAIN_ENV set by master process. */ + /* Only master process should have to call gethostbyname() once. */ if (my_domain == NULL) { - hent = gethostbyname(my_hostname); - name = hent != NULL ? hent->h_name : NULL; + name = getenv(MY_HOSTDOMAIN_ENV); if (name == NULL) { - /* failed, use just the hostname */ - name = my_hostname; + hent = gethostbyname(my_hostname); + name = hent != NULL ? hent->h_name : NULL; + if (name == NULL) { + /* failed, use just the hostname */ + name = my_hostname; + } } my_domain = i_strdup(name); } --- src/lib/hostpid.h.original 2013-02-18 13:13:44.000000000 +0100 +++ src/lib/hostpid.h 2013-02-18 13:21:24.000000000 +0100 @@ -11,5 +11,10 @@ hostname. */ const char *my_hostdomain(void); +/* When set, these environments override above my_hostname and static my_domain. */ +/* Master process normally sets these to child processes. */ +#define MY_HOSTNAME_ENV "DOVECOT_HOSTNAME" +#define MY_HOSTDOMAIN_ENV "DOVECOT_HOSTDOMAIN" + #endif --- src/master/service-process.c.original 2013-02-18 13:24:23.000000000 +0100 +++ src/master/service-process.c 2013-02-18 13:39:15.000000000 +0100 @@ -234,6 +234,8 @@ service->set->service_count)); } env_put(t_strdup_printf(MASTER_UID_ENV"=%u", uid)); + env_put(t_strdup_printf(MY_HOSTNAME_ENV"=%s", my_hostname)); + env_put(t_strdup_printf(MY_HOSTDOMAIN_ENV"=%s", my_hostdomain())); if (!service->set->master_set->version_ignore) env_put(MASTER_DOVECOT_VERSION_ENV"="PACKAGE_VERSION);
Le 18 févr. 2013 à 17:41, Axel Luttgens a écrit :
Le 18 févr. 2013 à 10:28, Axel Luttgens a écrit :
[...]
It seems it could be easily back ported to 2.1.x. I'll try and report back here.
Hello Timo,
Please find hereafter a proposal based on 2.1.14. [...]
Oops... big mistake wrt getenv(); sorry. I also noticed that I inadvertently backported a new behavior (i_error in 2.2 vs i_fatal in 2.1.14). Axel --- src/lib/hostpid.c.original 2013-02-18 12:46:25.000000000 +0100 +++ src/lib/hostpid.c 2013-02-19 12:40:47.000000000 +0100 @@ -3,9 +3,12 @@ #include "lib.h" #include "hostpid.h" +#include <stdlib.h> #include <unistd.h> #include <netdb.h> +#define HOSTNAME_DISALLOWED_CHARS "/\r\n\t" + const char *my_hostname = NULL; const char *my_pid = NULL; @@ -14,14 +17,26 @@ void hostpid_init(void) { static char hostname[256], pid[MAX_INT_STRLEN]; + + const char * name; - if (gethostname(hostname, sizeof(hostname)-1) == -1) - i_strocpy(hostname, "unknown", sizeof(hostname)); - hostname[sizeof(hostname)-1] = '\0'; - my_hostname = hostname; + /* Children should find MY_HOSTNAME_ENV set by master process. */ + /* Only master process should have to call gethostname() once. */ + name = getenv(MY_HOSTNAME_ENV); + if (name == NULL) { + if (gethostname(hostname, sizeof(hostname)-1) < 0) + i_fatal("gethostname() failed: %m"); + hostname[sizeof(hostname)-1] = '\0'; + my_hostname = hostname; + } + else{ + i_strocpy(hostname, name, sizeof(hostname)); + my_hostname = hostname; + } - if (strchr(hostname, '/') != NULL) - i_fatal("Invalid system hostname: %s", hostname); + if (my_hostname[0] == '\0' || + strcspn(my_hostname, HOSTNAME_DISALLOWED_CHARS) != strlen(my_hostname)) + i_fatal("Invalid system hostname: '%s'", my_hostname); /* allow calling hostpid_init() multiple times to reset hostname */ i_free_and_null(my_domain); @@ -35,12 +50,17 @@ struct hostent *hent; const char *name; + /* Children should find MY_HOSTDOMAIN_ENV set by master process. */ + /* Only master process should have to call gethostbyname() once. */ if (my_domain == NULL) { - hent = gethostbyname(my_hostname); - name = hent != NULL ? hent->h_name : NULL; + name = getenv(MY_HOSTDOMAIN_ENV); if (name == NULL) { - /* failed, use just the hostname */ - name = my_hostname; + hent = gethostbyname(my_hostname); + name = hent != NULL ? hent->h_name : NULL; + if (name == NULL) { + /* failed, use just the hostname */ + name = my_hostname; + } } my_domain = i_strdup(name); } --- src/lib/hostpid.h.original 2013-02-18 13:13:44.000000000 +0100 +++ src/lib/hostpid.h 2013-02-18 13:21:24.000000000 +0100 @@ -11,5 +11,10 @@ hostname. */ const char *my_hostdomain(void); +/* When set, these environments override above my_hostname and static my_domain. */ +/* Master process normally sets these to child processes. */ +#define MY_HOSTNAME_ENV "DOVECOT_HOSTNAME" +#define MY_HOSTDOMAIN_ENV "DOVECOT_HOSTDOMAIN" + #endif --- src/master/service-process.c.original 2013-02-18 13:24:23.000000000 +0100 +++ src/master/service-process.c 2013-02-18 13:39:15.000000000 +0100 @@ -234,6 +234,8 @@ service->set->service_count)); } env_put(t_strdup_printf(MASTER_UID_ENV"=%u", uid)); + env_put(t_strdup_printf(MY_HOSTNAME_ENV"=%s", my_hostname)); + env_put(t_strdup_printf(MY_HOSTDOMAIN_ENV"=%s", my_hostdomain())); if (!service->set->master_set->version_ignore) env_put(MASTER_DOVECOT_VERSION_ENV"="PACKAGE_VERSION);
On 19.2.2013, at 15.24, Axel Luttgens <AxelLuttgens@swing.be> wrote:
Please find hereafter a proposal based on 2.1.14. [...]
Oops... big mistake wrt getenv(); sorry. I also noticed that I inadvertently backported a new behavior (i_error in 2.2 vs i_fatal in 2.1.14).
Oops, thanks for noticing :) The i_error() change wasn't intentional.
Anyway, your changes need also the patch that prevents it from breaking: http://hg.dovecot.org/dovecot-2.2/rev/19e4448dee4f
Le 19 févr. 2013 à 14:54, Timo Sirainen a écrit :
[...]
Oops, thanks for noticing :) The i_error() change wasn't intentional.
Well, glad to have been of some (very, very little) help. ;-)
Anyway, your changes need also the patch that prevents it from breaking: http://hg.dovecot.org/dovecot-2.2/rev/19e4448dee4f
Trying to touch as little as possible the 2.1.x code, I kept the idea of a static storage for local array hostname in hostpid_init(), and thus just copied there the value fetched from the environment, if any. Isn't such a handling sufficient? Or worse, am I erring?
Axel
On 19.2.2013, at 17.45, Axel Luttgens <AxelLuttgens@swing.be> wrote:
Anyway, your changes need also the patch that prevents it from breaking: http://hg.dovecot.org/dovecot-2.2/rev/19e4448dee4f
Trying to touch as little as possible the 2.1.x code, I kept the idea of a static storage for local array hostname in hostpid_init(), and thus just copied there the value fetched from the environment, if any. Isn't such a handling sufficient? Or worse, am I erring?
Ah, yes, looks like your version works too.
participants (2)
-
Axel Luttgens
-
Timo Sirainen