[Dovecot] 0.99.10.x auth memory leak?
Hello,
I'm running 0.99.10.6 (but have seen this before that version, too). I'm also (quite obviously) running 5 auth processes, since this is a pretty busy box and I didn't want to introduce any artificial bottlenecks. Alas, with them eating up half of the free memory (which would be put to much better use as FS cache), I'm getting sort of concerned. If it's not a leak, it's caching something rather needlessly and inefficiently: the LDAP DB memory footprint for ALL users is smaller than this, and the box below serves only half of those users. This box sees about 0.5 million POP3/IMAP logins/day.
  PID USER  PR NI VIRT  RES  SHR  S %CPU %MEM    TIME+  COMMAND
31235 root  16  0 220m 216m 5560 S  0.0 10.7  25:01.54 dovecot-auth
31234 root  16  0 205m 202m 5560 S  0.0 10.0  24:08.84 dovecot-auth
31231 root  16  0 200m 196m 5560 S  0.7  9.7  23:25.37 dovecot-auth
31232 root  16  0 196m 192m 5560 S  0.0  9.5  23:10.44 dovecot-auth
31233 root  15  0 179m 175m 5560 S  0.3  8.6  22:13.07 dovecot-auth
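A quick tally of this snapshot backs up the "half of the memory" remark. This is a sketch that only parses the five process lines shown; it assumes top's `m` suffix means MiB and estimates total RAM as total RES divided by total %MEM, which is an approximation.

```python
"""Tally the dovecot-auth lines from the top(1) snapshot."""

# The five process lines, copied from the snapshot (header omitted).
TOP_LINES = """\
31235 root 16 0 220m 216m 5560 S 0.0 10.7 25:01.54 dovecot-auth
31234 root 16 0 205m 202m 5560 S 0.0 10.0 24:08.84 dovecot-auth
31231 root 16 0 200m 196m 5560 S 0.7 9.7 23:25.37 dovecot-auth
31232 root 16 0 196m 192m 5560 S 0.0 9.5 23:10.44 dovecot-auth
31233 root 15 0 179m 175m 5560 S 0.3 8.6 22:13.07 dovecot-auth
"""

# Column order: PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
RES_COL, MEM_COL = 5, 9


def parse_mib(field: str) -> float:
    """Convert a top size field like '216m' to MiB (only the 'm' suffix is handled)."""
    if not field.endswith("m"):
        raise ValueError(f"unexpected size field: {field!r}")
    return float(field[:-1])


def summarize(text: str) -> tuple[float, float, float]:
    """Return (total RES in MiB, total %MEM, rough physical RAM estimate in MiB)."""
    res_total = 0.0
    mem_pct_total = 0.0
    for line in text.splitlines():
        cols = line.split()
        res_total += parse_mib(cols[RES_COL])
        mem_pct_total += float(cols[MEM_COL])
    # If RES is mem_pct percent of RAM, then RAM is roughly RES / (mem_pct / 100).
    ram_estimate = res_total / (mem_pct_total / 100)
    return res_total, mem_pct_total, ram_estimate


if __name__ == "__main__":
    res, pct, ram = summarize(TOP_LINES)
    print(f"RES {res:.0f} MiB, {pct:.1f}% of RAM, RAM roughly {ram:.0f} MiB")
```

For this snapshot the five processes hold 981 MiB resident in total, about 48.5% of RAM, which puts the machine at roughly 2 GiB. The 5560 KiB of shared pages is counted once per process in RES, but that is small next to the totals.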
So, I guess my questions to Timo are:
Think it's leaky and any idea where?
Given the load, would a single auth process be a bad idea? (It's quite a fast dual Opteron box.)
Regards,
Christian Balzer
Christian Balzer Network/Systems Engineer NOC chibi@gol.com Global OnLine Japan/Fusion Network Services http://www.gol.com/
On 22.7.2004, at 06:01, Christian Balzer wrote:
So, I guess my questions to Timo are:
Think it's leaky and any idea where?
I didn't see any obvious leaks in the code. 1.0-test's dovecot-auth can easily be run standalone, so it's easier to check for leaks with it. I'll try to set up an LDAP server and see if I can find any.
How soon does the memory go that high up? Do you restart the processes manually? Do they stay at around 200MB by themselves, or only because the maximum auth process size is 256MB (by default) and they restart themselves when they reach it (the log should have "out of memory" errors)?
Given the load, would a single auth process be a bad idea? (It's quite a fast dual Opteron box.)
In that process list they were taking less than 1% CPU, so reducing their number shouldn't make it slower. But I'd still leave two, just in case one of them gets stuck for some reason.
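The "less than 1%" reading can also be checked against the cumulative TIME+ column of the same snapshot. This is a rough sketch; it assumes the roughly two weeks of uptime Christian mentions later in the thread and two CPUs for the dual Opteron, and both figures are approximate.

```python
"""Estimate the average CPU share of the five dovecot-auth processes."""

# TIME+ values from the top snapshot, in top's MM:SS.hh format.
TIME_PLUS = ["25:01.54", "24:08.84", "23:25.37", "23:10.44", "22:13.07"]

UPTIME_SECONDS = 14 * 24 * 3600  # assumption: about 2 weeks of non-stop operation
CPUS = 2                         # assumption: dual Opteron, one core per socket


def to_seconds(time_plus: str) -> float:
    """Convert 'MM:SS.hh' (minutes may exceed 59) to seconds."""
    minutes, seconds = time_plus.split(":")
    return int(minutes) * 60 + float(seconds)


def average_share(times: list[str], wall_seconds: float, cpus: int) -> tuple[float, float]:
    """Return (total CPU seconds, fraction of the machine's total CPU capacity used)."""
    total = sum(to_seconds(t) for t in times)
    return total, total / (wall_seconds * cpus)


if __name__ == "__main__":
    total, share = average_share(TIME_PLUS, UPTIME_SECONDS, CPUS)
    print(f"{total / 60:.0f} CPU minutes in total, {share:.2%} of the box on average")
```

Under these assumptions all five processes together used about 118 CPU minutes, roughly 0.3% of the machine's capacity on average. An average like this says nothing about short peaks, which is one reason to keep a second process around.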
Timo wrote:
On 22.7.2004, at 06:01, Christian Balzer wrote:
So, I guess my questions to Timo are:
Think it's leaky and any idea where?
I didn't see any obvious leaks in the code. 1.0-test's dovecot-auth can easily be run standalone, so it's easier to check for leaks with it. I'll try to set up an LDAP server and see if I can find any.
Indeed, this is of course running against an LDAP server.
How soon does the memory go that high up?
It seems to grow gradually: visible, but not dramatic, growth over time. This was after about 2 weeks of non-stop operation.
Do you restart the processes manually?
I did today; the time before that was for the update to .6. After about 10 hours they have now grown by roughly 6MB each...
Do they stay at around 200MB by themselves, or only because the maximum auth process size is 256MB (by default) and they restart themselves when they reach it (the log should have "out of memory" errors)?
I did not run them long enough to hit that limit, but they were still growing, so that most likely would have happened eventually. And I'm "afraid" (read: eagerly awaiting its arrival :) that the .7 Debian package will arrive in Woody before they have grown to that size again. ;)
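Christian's two observations can be combined into a rough linear extrapolation of when a process would reach the 256MB default limit. This is only a sketch: it assumes constant growth, takes roughly 6MB per 10 hours from the measurement above, and assumes a starting size of about 8MB, a figure that only appears later in the thread.

```python
"""Linear extrapolation of dovecot-auth growth (rough sketch)."""

RATE_MIB_PER_HOUR = 6 / 10  # roughly 6MB per process in about 10 hours
START_MIB = 8               # assumption: starting size of about 8MB
DEFAULT_LIMIT_MIB = 256     # default maximum auth process size


def size_after(hours: float, start: float = START_MIB, rate: float = RATE_MIB_PER_HOUR) -> float:
    """Process size in MiB after `hours`, assuming constant growth."""
    return start + rate * hours


def hours_until(limit: float, start: float = START_MIB, rate: float = RATE_MIB_PER_HOUR) -> float:
    """Hours until the process reaches `limit` MiB at a constant growth rate."""
    return (limit - start) / rate


if __name__ == "__main__":
    two_weeks = 14 * 24
    print(f"after 2 weeks: about {size_after(two_weeks):.0f} MiB")
    hours = hours_until(DEFAULT_LIMIT_MIB)
    print(f"256MB limit reached after about {hours:.0f} h ({hours / 24:.1f} days)")
```

That gives about 210MB after two weeks, in line with the 175 to 216MB in the snapshot, and about 17 days to the 256MB default. The later report that growing from about 8MB to 64MB took roughly a week implies a slower average rate, so treat these numbers as order-of-magnitude only.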
Given the load, would a single auth process be a bad idea? (It's quite a fast dual Opteron box.)
In that process list they were taking less than 1% CPU, so reducing their number shouldn't make it slower. But I'd still leave two, just in case one of them gets stuck for some reason.
Well, this would just be a work-around anyway. So for the time being I'll leave things as they are, in the hope that a real solution has surfaced by the time they grow that large again. :)
Regards,
Christian Balzer
Timo Sirainen <tss@iki.fi> writes:
I didn't see any obvious leaks in the code. 1.0-test's dovecot-auth can easily be run standalone, so it's easier to check for leaks with it. I'll try to set up an LDAP server and see if I can find any.
If you have a reasonably fast i386/Linux machine at hand, try running the process under "valgrind" supervision with leak checks enabled on the command line, see http://valgrind.kde.org/ (don't worry, it's a console application).
-- Matthias Andree
Encrypted mail welcome: my GnuPG key ID is 0x052E7D95 (PGP/MIME preferred)
Hello,
just as a follow-up to the original report: it's still leaking happily, so I have set the memory limit for the auth processes to 64MB.
Just now two of them hit that limit, got killed and were re-spawned, so that solves it for the time being:
Aug 25 14:41:45 mb01 out of memory [1171Øùÿ¿out ]
Aug 25 14:47:16 mb01 dovecot-auth: block_alloc(): Out of memory
Aug 25 14:47:16 mb01 dovecot: child 1171 (auth) killed with signal 6
Aug 25 14:50:53 mb01 out of memory [1170Øúÿ¿out ]
Aug 25 14:50:53 mb01 dovecot: child 1170 (auth) killed with signal 6
Of course a real fix would still be very much appreciated.
Regards,
Christian Balzer
And one more update.
The auth processes (5 of them) reached their 64MB limit (starting from about 8MB) after 6:20 hours of CPU time and roughly a week of real time. Using these values and the fact that there are about 510000 logins per day per server, we wind up pretty precisely at 3 lost bytes per login. Maybe that's a sufficient hint to find what's leaking. ;)
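The per-login estimate can be redone explicitly. This sketch uses only the numbers in this message and assumes MiB for "MB", exactly 7 days for "roughly a week", and that each login is handled by exactly one of the five auth processes. Because the result depends heavily on how logins are attributed to processes, it reports two attributions.

```python
"""Back-of-the-envelope leak size per login (rough sketch)."""

MIB = 1024 * 1024

START_MIB, LIMIT_MIB = 8, 64  # each process grew from about 8MB to the 64MB limit
PROCESSES = 5
DAYS = 7                      # assumption: "roughly a week"
LOGINS_PER_DAY = 510_000


def bytes_per_login(growth_mib_per_process: float, processes: int, logins: int) -> dict[str, float]:
    """Leaked bytes per login under two ways of attributing logins to processes."""
    growth = growth_mib_per_process * MIB
    return {
        # Assumes each login is served by one auth process: combined growth / all logins.
        "all processes / all logins": processes * growth / logins,
        # Growth of a single process divided by all logins on the server.
        "one process / all logins": growth / logins,
    }


if __name__ == "__main__":
    logins = DAYS * LOGINS_PER_DAY
    for label, value in bytes_per_login(LIMIT_MIB - START_MIB, PROCESSES, logins).items():
        print(f"{label}: {value:.1f} bytes")
```

With these inputs the sketch prints about 82 bytes per login for the combined attribution and about 16 for the single-process one. Either way the loss comes out at tens of bytes per login rather than 3, though all the inputs are rough.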
Regards,
Christian Balzer
Participants (3): Christian Balzer, Matthias Andree, Timo Sirainen