[Dovecot] Hanging IMAP sessions on Mac OS X with dovecot 2.1.10 - worked fine with 2.0.15
Ok, here's a toughie:
On a whim (and because of the bad weather) I decided today to upgrade the fully working dovecot 2.0.15 installation on my Mac OS X 10.6.7 system to 2.1.10. NB: It's not a Mac OS X "Server" as sold by Apple; I have been compiling dovecot myself for quite a while.
Anyhow: At first everything appeared to work fine after the upgrade, up until I created a new user and, while testing, SELECTed his INBOX. The SELECT was "stuck". Though the process seemed to be alive, I could only get rid of it with a "kill -9 <pid>".
The same happened when selecting a newly created mailbox on any of the other (otherwise functioning) accounts, so the problem had nothing to do with the new account, but rather with the fact that its INBOX was empty.
Using dtruss and gdb I found out that the dovecot process was trying to obtain a GUID and in the course of doing so invoked gethostbyname(), which in turn caused a lot of "mach message" handling, and somewhere deep down there the process was stuck.
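One way to check whether the gethostbyname() hang depends on the launch context rather than on dovecot itself is to run a small probe under each context. The following is only a minimal sketch, assuming Python 3.7 or later is available; probe_resolver is a hypothetical helper name and the 5 second timeout is arbitrary. It performs the lookup in a child process so that a stuck call can be killed instead of hanging the caller.

```python
import subprocess
import sys

# One-liner executed in the child: resolve argv[1] and print the address.
_CHILD_CODE = "import socket, sys; print(socket.gethostbyname(sys.argv[1]))"


def probe_resolver(name, timeout=5.0):
    """Resolve `name` in a child process.

    Returns ("ok", address), ("error", message) or ("hung", None).
    """
    try:
        result = subprocess.run(
            [sys.executable, "-c", _CHILD_CODE, name],
            capture_output=True,
            text=True,
            timeout=timeout,  # subprocess.run kills the child on timeout
        )
    except subprocess.TimeoutExpired:
        return ("hung", None)
    if result.returncode != 0:
        # Keep only the last traceback line, e.g. "socket.gaierror: ...".
        lines = result.stderr.strip().splitlines()
        return ("error", lines[-1] if lines else "")
    return ("ok", result.stdout.strip())


if __name__ == "__main__":
    import socket

    # Probe the local hostname, the kind of lookup described above.
    print(probe_resolver(socket.gethostname()))
```

Running this once from a login shell and once from the same startup script that launches dovecot should show whether the lookup itself gets stuck in one of the two contexts, since the child process should inherit the context of whatever started it.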
Well, that led me to believe that there was something wrong with the so-called "mach bootstrap context". I usually start dovecot from within a (home-brewed) startup script, which invokes it (practically) like so:
sudo /usr/libexec/StartupItemContext dovecot
(again: all this was working fine under 2.0.15)
Now with 2.1.10, when I manually invoke dovecot with just
sudo dovecot
everything appears to work fine - at least the sessions don't get stuck any more. But as soon as I log out (with dovecot still running in the background) it loses its "mach bootstrap context" and finds itself unable to perform even the simplest tasks, like mapping a username to a uid, etc. -- so starting it in the background without "/usr/libexec/StartupItemContext" is out of the question.
Now -with all that said- here's my question:
What has changed with regard to "processual context" between 2.0.15 and 2.1.10 when the "imap" process is spawned/exec'd? Any environmental cleanups, closing of unknown fds, deletion/modification of environment variables, process-group-handlers, etc.?
It appears that the imap process no longer "inherits" the StartupItemContext from the main process, so some change between 2.0.15 and 2.1.10 must have broken it ...
Any help is highly appreciated -
Clemens
PS: I google'd around a lot and searched the mailing-lists, of course. I only found a post of someone who ran into the same/similar problem dating back to Feb 21st 2012 under the subject "dovecot freezes when trying to get mail from maildir with mail", but it was quickly dismissed without ever getting resolved and that was that.
PS2: I intentionally didn't include any configs with this mail as they seem to be irrelevant, but of course I can generate the necessary output if needed.
On 22.9.2012, at 19.11, Clemens Schrimpe wrote:
> Well, that led me to believe that there was something wrong with the so-called "mach bootstrap context". I usually start dovecot from within a (home-brewed) startup script, which invokes it (practically) like so:
> sudo /usr/libexec/StartupItemContext dovecot
> (again: all this was working fine under 2.0.15)
> Now with 2.1.10, when I manually invoke dovecot with just
> sudo dovecot
> everything appears to work fine - at least the sessions don't get stuck any more. But as soon as I log out (with dovecot still running in the background) it loses its "mach bootstrap context" and finds itself unable to perform even the simplest tasks, like mapping a username to a uid, etc. -- so starting it in the background without "/usr/libexec/StartupItemContext" is out of the question.
I don't really know about mach contexts or how they're supposed to work..
> What has changed with regard to "processual context" between 2.0.15 and 2.1.10 when the "imap" process is spawned/exec'd? Any environmental cleanups, closing of unknown fds, deletion/modification of environment variables, process-group-handlers, etc.?
Not much I think. I just looked at the diff between 2.0 and 2.1 and don't really see anything I could blame. If you have time you could try bisecting with mercurial (basically try different versions from hg) to isolate the change that broke it. I don't really have time to debug this..
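For the bisecting suggestion, "hg bisect --command" can drive the search automatically: it treats exit status 0 as good, 125 as "skip this revision", and other non-zero statuses as bad (127 aborts the bisection). A rough sketch of a check script under those rules follows, assuming Python 3; the classify name, the build and test commands, and the 60 second timeout are all placeholders, not anything taken from the dovecot tree.

```python
import subprocess

# Exit statuses as interpreted by "hg bisect --command".
GOOD, BAD, SKIP = 0, 1, 125


def classify(build_cmd, test_cmd, timeout=60.0):
    """Build the checked-out revision, then run the reproduction test."""
    # A revision that does not build can be neither good nor bad: skip it.
    if subprocess.run(build_cmd).returncode != 0:
        return SKIP
    try:
        result = subprocess.run(test_cmd, timeout=timeout)
    except subprocess.TimeoutExpired:
        # A hanging session is the bug being hunted, so a timeout is "bad".
        return BAD
    return GOOD if result.returncode == 0 else BAD


# Hypothetical use as the bisect command (both commands are placeholders):
#   import sys
#   sys.exit(classify(["make"], ["./select-empty-inbox.sh"]))
```

The test command would have to start the freshly built dovecot the same way the startup script does and SELECT an empty mailbox, so that the hang shows up as a timeout.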
participants (2)
- Clemens Schrimpe
- Timo Sirainen