On Wed, 27 Jun 2007 23:15:32 +0300 Timo Sirainen <tss@iki.fi> wrote:
On Thu, 2007-06-21 at 16:49 +0900, Christian Balzer wrote:
You could try http://dovecot.org/patches/debug/mempool-accounting.diff and send USR1 signal to dovecot-auth after a while. It logs how much memory is used by all existing memory pools. Each auth request has its own pool, so if it's really leaking them it's probably logging a lot of lines. If not, then the leak is elsewhere.
I grabbed the Debian package source on a test machine (not gonna chance anything on the production servers), applied the patch, added --enable-debug to the debian/rules file (and got the #define DEBUG in config.h), built the binary packages, installed, configured and started them, tested a few logins and... nothing gets logged in mail.* if I send a USR1 to dovecot-auth. Anything I'm missing?
Bug, fixed: http://hg.dovecot.org/dovecot-1.0/rev/a098e94cd318
Thanks, that fixed the silence of the auth-sheep.
This is the output after start-up:
Jul 2 13:59:54 engtest03 dovecot: auth(default): pool auth request handler: 104 / 4080 bytes
Jul 2 13:59:54 engtest03 last message repeated 19 times
Jul 2 13:59:54 engtest03 dovecot: auth(default): pool passwd_file: 56 / 10224 bytes
Jul 2 13:59:54 engtest03 dovecot: auth(default): pool Environment: 224 / 2032 bytes
Jul 2 13:59:54 engtest03 dovecot: auth(default): pool ldap_connection: 576 / 1008 bytes
Jul 2 13:59:54 engtest03 dovecot: auth(default): pool auth: 1520 / 2032 bytes
Used memory of dovecot-auth after 1 login was 3148KB(RSS).
This is after a good thrashing with rabid (from the postal package), with just 2 users though, using POP3 logins:
Jul 2 14:12:30 engtest03 dovecot: auth(default): pool auth request handler: 104 / 4080 bytes
Jul 2 14:12:30 engtest03 last message repeated 128 times
Jul 2 14:12:30 engtest03 dovecot: auth(default): pool passwd_file: 56 / 10224 bytes
Jul 2 14:12:30 engtest03 dovecot: auth(default): pool Environment: 224 / 2032 bytes
Jul 2 14:12:30 engtest03 dovecot: auth(default): pool ldap_connection: 576 / 1008 bytes
Jul 2 14:12:30 engtest03 dovecot: auth(default): pool auth: 1520 / 2032 bytes
Note that the number of auth request handler pools has grown to 128. After another short round of rabid the handler pools grew to 137 and the size of dovecot-auth to 5100KB. The number of handler pools never fell, nor did the memory footprint, obviously. :-p
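To compare USR1 dumps like the two above without counting lines by hand, the pool lines can be tallied with a short script. This is only a sketch: the regular expressions are inferred from the log excerpts in this message, and tally_pools is a hypothetical helper, not part of Dovecot or the mempool-accounting patch. It assumes that a syslog "last message repeated N times" line refers to the pool line immediately before it.

```python
import re

# Patterns inferred from the log excerpts in this thread (an assumption,
# not an official Dovecot log format specification).
POOL_RE = re.compile(r"pool (?P<name>.+?): (?P<used>\d+) / (?P<alloc>\d+) bytes")
REPEAT_RE = re.compile(r"last message repeated (?P<n>\d+) times")


def tally_pools(lines):
    """Return {pool name: [pool count, used bytes, allocated bytes]}."""
    totals = {}
    last = None  # (name, used, alloc) of the immediately preceding pool line
    for line in lines:
        pool = POOL_RE.search(line)
        repeat = REPEAT_RE.search(line)
        if pool:
            last = (pool["name"], int(pool["used"]), int(pool["alloc"]))
            count = 1
        elif repeat and last is not None:
            # syslog collapsed identical lines; count each repeat as one pool
            count = int(repeat["n"])
        else:
            last = None  # unrelated line: a later "repeated" is not ours
            continue
        name, used, alloc = last
        entry = totals.setdefault(name, [0, 0, 0])
        entry[0] += count
        entry[1] += used * count
        entry[2] += alloc * count
    return totals


if __name__ == "__main__":
    import sys
    for name, (count, used, alloc) in sorted(tally_pools(sys.stdin).items()):
        print(f"{name}: {count} pools, {used} / {alloc} bytes")
```

Feeding it the mail.* lines from two dumps taken some time apart shows directly which pool type keeps growing. Note that it counts the first pool line in addition to the repeats, so its totals are one higher than the "repeated N times" figure.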
At about 800k logins/day/node here it's obvious now why dovecot-auth explodes after less than a week with max size of 512MB.
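As a back-of-the-envelope check on those figures, the average leak per login implied by hitting the 512MB limit can be estimated as below. This is a rough sketch only: it assumes the leak is spread evenly over logins, ignores the process's baseline footprint, and uses a full 7 days, so the result is a lower bound for a process that dies in less than a week. The function name is hypothetical.

```python
def implied_leak_per_login(limit_bytes, logins_per_day, days):
    """Average bytes leaked per login if the limit is reached after `days`."""
    return limit_bytes / (logins_per_day * days)


limit = 512 * 1024 * 1024  # the 512MB maximum process size mentioned above
print(implied_leak_per_login(limit, 800_000, 7))
```

With these inputs the average comes out just under 100 bytes per login, far less than one 4080-byte handler pool per login, which would suggest that only a fraction of logins leave a pool behind.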
But no matter, it is clearly leaking just as badly as 0.99, and I venture that this is the largest installation with LDAP as authentication backend. I wonder if this leak would be avoided by having LDAP lookups performed by worker processes as with SQL.
Then you'd only have multiple leaking worker processes.
Yes, I realize that. But I presume since these are designed to die off and be recreated on the fly the repercussions would be much better. ;) Of course now it looks like this is not LDAP related after all.
The same as 0.99. You could also kill -HUP dovecot when dovecot-auth is nearing the limit. That makes it a bit nicer, although not perfectly safe either (should fix this some day..).
If that leak can't be found I would very much appreciate a solution that at least avoids failed and/or delayed logins.
That would require that login processes don't fail logins if connection to dovecot-auth drops, but instead wait until they can connect back to it and try again. And maybe another alternative would be to just disconnect the client instead of giving login failure.
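The "wait and try again" idea can be illustrated with a generic reconnect loop. This is a minimal sketch of the general technique, not Dovecot's login process code (which is written in C): connect_with_retry and its parameters are hypothetical, and the caller supplies whatever function actually opens the connection to the auth server.

```python
import time


def connect_with_retry(connect, attempts=5, base_delay=0.1, sleep=time.sleep):
    """Call connect() until it succeeds, backing off between failures.

    connect: hypothetical callable that returns a connection or raises OSError
             (for example "Resource temporarily unavailable").
    Re-raises the last error once `attempts` tries have failed, at which
    point the caller could disconnect the client instead of reporting a
    login failure.
    """
    delay = base_delay
    for attempt in range(1, attempts + 1):
        try:
            return connect()
        except OSError:
            if attempt == attempts:
                raise
            sleep(delay)   # give the auth process time to be restarted
            delay *= 2     # exponential backoff between attempts
```

The login attempt is held during the retries rather than failed, so a brief restart of the auth process shows up to the user as a delay instead of a rejected password.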
Anything that fixes this one way or the other would be nice. ^_^
Oh, and HUP'ing the master is not an option here. I guess the system load triggers a race condition in dovecot, because several times when doing so I got this:
Jun 22 15:08:58 mb11 dovecot: listen(143) failed: Interrupted system call
Which results in a killed off dovecot, including all active sessions.
The self-terminating dovecot-auth is not nice, but it is at least more predictable and does recover by itself:
Jun 30 19:03:27 mb12 dovecot: auth(default): pool_system_malloc(): Out of memory
Jun 30 19:03:27 mb12 dovecot: child 11110 (auth) returned error 83 (Out of memory)
Jun 30 19:03:28 mb12 dovecot: pop3-login: Can't connect to auth server at default: Resource temporarily unavailable
Jun 30 19:03:28 mb12 last message repeated 11 times
Of course the 12 users that tried to log in at this time are probably not amused or at least confused.
Regards,
Christian
Christian Balzer        Network/Systems Engineer        NOC
chibi@gol.com           Global OnLine Japan/Fusion Network Services
http://www.gol.com/