[Dovecot] Doveadm director flush/remove
I've got a couple more issues with the doveadm director interface:
- If I use "doveadm director remove" to disable a host with active users, the director seems to lose track of users mapped to that host. I guess I would expect it to tear down any active sessions by killing the login proxies, like I'd done 'doveadm director add HOSTNAME 0 && doveadm director flush HOSTNAME' before removing it? Here's what I see with an active open connection:
[root@cc-popmap7 ~]# doveadm director status brandond
Current: 10.142.0.179 (expires 2010-07-14 01:26:14)
Hashed: 10.142.0.179
Initial config: 10.142.0.161
[root@cc-popmap7 ~]# doveadm director remove 10.142.0.179
[root@cc-popmap7 ~]# doveadm director status brandond
Current: not assigned
Hashed: 10.142.0.174
Initial config: 10.142.0.161
- "doveadm director flush" returns the wrong usage:
[root@cc-popmap7 ~]# doveadm director flush
doveadm director remove [-a <director socket path>] <host>
- "doveadm director flush all" breaks the ring:
[root@cc-popmap7 ~]# doveadm director flush all
Jul 14 01:26:33 cc-popmap7 dovecot: director: Error: Director 10.142.0.180:1234/right disconnected
Jul 14 01:26:33 cc-popmap7 dovecot: director: Error: Director 10.142.0.180:1234/left disconnected
Jul 14 01:26:33 oh-popmap7 dovecot: director: Error: director(10.142.0.162:1234/left): Invalid HOST-FLUSH args
Jul 14 01:26:33 oh-popmap7 dovecot: director: Error: director(10.142.0.162:1234/right): Invalid HOST-FLUSH args
For some reason, flushing a host address only disconnects one side:
[root@cc-popmap7 ~]# doveadm director flush 10.142.0.160
Jul 14 01:28:23 cc-popmap7 dovecot: director: Error: Director 10.142.0.180:1234/right disconnected
Jul 14 01:28:23 oh-popmap7 dovecot: director: Error: director(10.142.0.162:1234/left): Invalid HOST-FLUSH args
-Brad
On Wed, 2010-07-14 at 01:39 -0700, Brandon Davidson wrote:
I've got a couple more issues with the doveadm director interface:
- If I use "doveadm director remove" to disable a host with active users, the director seems to lose track of users mapped to that host.
Yes, that's what it was intended to do.
I guess I would expect it to tear down any active sessions by killing the login proxies,
Hmh.. I guess that would be nice, but also a bit annoying to do. It would require each login process to have a connection to director process, and currently there's no such connection (except for the notify fifo, but that's wrong way).
like I'd done 'doveadm director add HOSTNAME 0 && doveadm director flush HOSTNAME' before removing it?
But that does almost the same thing as remove.
- "doveadm director flush" returns the wrong usage:
Fixed.
- "doveadm director flush all" breaks the ring:
Fixed.
Timo,
-----Original Message----- From: Timo Sirainen [mailto:tss@iki.fi]
Yes, that's what it was intended to do.
OK. I guess I had figured that removing it from the director would also kill any active proxy sessions, but that's obviously not the case.. it just removes the host from the list and any mappings from the hash.
Hmh.. I guess that would be nice, but also a bit annoying to do. It would require each login process to have a connection to director process, and currently there's no such connection (except for the notify fifo, but that's wrong way).
Maybe something as simple as killing any login proxies that are talking to the selected backend, or are proxying for users that are mapped to the selected backends? Or maybe the Directors don't know enough to do that? I'm thinking like 'doveadm kick' for proxy connections, since who/kick doesn't work on the Director, just backends.
While I'm making a wishlist... 'doveadm director status <ipaddr>' to show list of users mapped to a host? Or maybe just 'doveadm director status -v' to show list of users instead of just user count.
like I'd done 'doveadm director add HOSTNAME 0 && doveadm director flush HOSTNAME' before removing it?
But that does almost the same thing as remove.
You're right. I see that FLUSH just does the 'remove any mappings' bit that REMOVE does, and ADD with a 0 count is effectively the same as removing it from the list. For some reason I was thinking of this as 'flush out (kill) any active proxy sessions'.
-Brad
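The difference between the three operations, as described in this message, can be sketched with a toy model. This is only an illustration of the semantics stated in the thread, not Dovecot code: the class name `ToyDirector`, its attributes, and the `assign` helper are all hypothetical. Note that none of the operations touches proxy connections that are already open, which matches the behaviour observed above.

```python
class ToyDirector:
    """Toy model of the director state discussed in the thread (not Dovecot code)."""

    def __init__(self):
        self.hosts = {}     # backend host -> count given to "director add"
        self.mappings = {}  # username hash -> backend host

    def add(self, host, count):
        # With a count of 0 the host stays listed but receives no new users.
        self.hosts[host] = count

    def assign(self, user_hash, host):
        # Hypothetical helper: map a user to a host that still accepts users.
        if self.hosts.get(host, 0) <= 0:
            raise ValueError(f"{host} accepts no new users")
        self.mappings[user_hash] = host

    def flush(self, host):
        # Drop only the user -> host mappings; the host stays in the list.
        self.mappings = {u: h for u, h in self.mappings.items() if h != host}

    def remove(self, host):
        # Drop the mappings and the host entry itself.
        self.flush(host)
        self.hosts.pop(host, None)
```

In this model, `add(host, 0)` followed by `flush(host)` leaves the same mappings as `remove(host)`; the only difference is that the host is still listed with a count of 0.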
On 14.7.2010, at 23.00, Brad Davidson wrote:
Hmh.. I guess that would be nice, but also a bit annoying to do. It would require each login process to have a connection to director process, and currently there's no such connection (except for the notify fifo, but that's wrong way).
Maybe something as simple as killing any login proxies that are talking to the selected backend, or are proxying for users that are mapped to the selected backends? Or maybe the Directors don't know enough to do that?
They don't know enough, and probably won't even have permissions. In any case probably most login processes have connections to all servers (assuming you don't have *that* many of them), so it would be pretty much equivalent to "killall imap-login".
I'm thinking like 'doveadm kick' for proxy connections, since who/kick doesn't work on the Director, just backends.
doveadm kick is also a bit evil way to do it when using multiple connections per imap process.
While I'm making a wishlist... 'doveadm director status <ipaddr>' to show list of users mapped to a host? Or maybe just 'doveadm director status -v' to show list of users instead of just user count.
Not really possible, because director doesn't keep track of usernames, only the hashes of usernames. And a hash list probably wouldn't be very useful.
I suppose it could get a list of all users and then list all users whose hash matches what director has.. Hmm. I guess that would be usable too, yes. :)
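The idea in the last paragraph (hash every known username and keep the ones whose hash the director knows about) can be sketched as follows. This is a minimal sketch under assumptions: the thread does not say which hash function the director uses, so `username_hash` below is an arbitrary stand-in, and `users_by_host` and the `hash_to_host` table are hypothetical names.

```python
import hashlib
from collections import defaultdict


def username_hash(username):
    # ASSUMPTION: stand-in hash (first 4 bytes of SHA-256 as an integer).
    # The director's real hash function is not described in the thread.
    digest = hashlib.sha256(username.encode("utf-8")).digest()
    return int.from_bytes(digest[:4], "big")


def users_by_host(usernames, hash_to_host):
    """Group usernames by backend host, given the director's hash -> host table."""
    result = defaultdict(list)
    for name in usernames:
        host = hash_to_host.get(username_hash(name))
        if host is not None:  # users without a current mapping are skipped
            result[host].append(name)
    return dict(result)
```

Because only hashes are stored, two usernames that happen to share a hash would both be reported for the same host; the listing is a best guess rather than an exact session list.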
On Wed, 2010-07-14 at 23:36 +0100, Timo Sirainen wrote:
While I'm making a wishlist... 'doveadm director status <ipaddr>' to show list of users mapped to a host? Or maybe just 'doveadm director status -v' to show list of users instead of just user count.
I suppose it could get a list of all users and then list all users whose hash matches what director has.. Hmm. I guess that would be usable too, yes. :)
See if this works: http://hg.dovecot.org/dovecot-2.0/rev/4138737f41e6
Timo,
-----Original Message----- From: Timo Sirainen [mailto:tss@iki.fi]
I suppose it could get a list of all users and then list all users whose hash matches what director has.. Hmm. I guess that would be usable too, yes. :)
See if this works: http://hg.dovecot.org/dovecot-2.0/rev/4138737f41e6
I get:
[root@cc-popmap7 ~]# doveadm director map
doveadm(root): Error: User listing returned failure
doveadm(root): Error: user listing failed
Our environment might be a little weird. We're using LDAP accounts via pam_ldap on the backend servers, so they are essentially local accounts (not virtual). I'm using passthrough auth with NOPASSWORD in the director proxy query. The accounts are also available on the directors, but since there's about 45k of them it would take quite a while to iterate and test hashes for all of them, if that's what it's trying to do.
Sounds like this might just not work in our environment due to the size of our account base. I do appreciate very much that you added the feature though!
-Brad
On Thu, 2010-07-15 at 13:15 -0700, Brad Davidson wrote:
[root@cc-popmap7 ~]# doveadm director map
doveadm(root): Error: User listing returned failure
See what it says in logs.
Our environment might be a little weird. We're using LDAP accounts via pam_ldap on the backend servers, so they are essentially local accounts (not virtual). I'm using passthrough auth with NOPASSWORD in the director proxy query. The accounts are also available on the directors, but since there's about 45k of them it would take quite a while to iterate and test hashes for all of them, if that's what it's trying to do.
Are you using userdb passwd or userdb ldap? With userdb ldap you need to configure iterate_attrs and iterate_filter in your LDAP config. With passwd I think it should work directly..
On Thu, 2010-07-15 at 21:37 +0100, Timo Sirainen wrote:
On Thu, 2010-07-15 at 13:15 -0700, Brad Davidson wrote:
[root@cc-popmap7 ~]# doveadm director map
doveadm(root): Error: User listing returned failure
See what it says in logs.
Oh, probably that you don't have a userdb at all.
Our environment might be a little weird. We're using LDAP accounts via pam_ldap on the backend servers, so they are essentially local accounts (not virtual). I'm using passthrough auth with NOPASSWORD in the director proxy query. The accounts are also available on the directors, but since there's about 45k of them it would take quite a while to iterate and test hashes for all of them, if that's what it's trying to do.
I doubt it's that slow. I think the speed only depends on how fast it can iterate through all the users in userdb. 45k accounts probably won't take that long.. I'm more concerned about setups with 100 times that :)
Timo,
-----Original Message----- From: Timo Sirainen [mailto:tss@iki.fi]
See what it says in logs.
It times out after a minute:
[root@cc-popmap7 ~]# time doveadm director map
doveadm(root): Error: User listing returned failure
doveadm(root): Error: user listing failed
real 1m0.028s
user 0m0.088s
sys 0m0.072s
Jul 15 13:46:24 cc-popmap7 dovecot: auth: Error: auth worker: Aborted request: Lookup timed out
Jul 15 13:53:25 cc-popmap7 dovecot: auth: Error: getpwent() failed: No such file or directory
Are you using userdb passwd or userdb ldap? With userdb ldap you need to configure iterate_attrs and iterate_filter in your LDAP config. With passwd I think it should work directly..
userdb passwd. Our LDAP directory might not be optimally configured. The group that administers it only really cares about binds, iteration can be rather slow:
[root@cc-popmap7 ~]# time getent passwd | wc -l
51552
real 8m0.120s
user 0m2.507s
sys 0m1.093s
That comes out to just over 100 entries a second.
-Brad
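The rate quoted above, and what it implies for the one-minute timeout, can be checked with a few lines of arithmetic. This is a rough sketch that assumes enumeration proceeds at a uniform rate and that Dovecot's iteration runs at about the same speed as `getent passwd`; `parse_time` is a hypothetical helper for the `time` output format.

```python
def parse_time(text):
    """Convert a `time`-style duration such as '8m0.120s' to seconds."""
    minutes, seconds = text.rstrip("s").split("m")
    return int(minutes) * 60 + float(seconds)


entries = 51552                          # lines reported by `getent passwd | wc -l`
elapsed = parse_time("8m0.120s")         # 480.12 seconds of wall-clock time
rate = entries / elapsed                 # roughly 107 entries per second

timeout = parse_time("1m0.028s")         # wall-clock time of the failed map run
listed_before_timeout = rate * timeout   # roughly 6,400 entries
```

At that rate only about an eighth of the accounts can be listed before a one-minute limit is hit, which is consistent with the lookup timing out.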
On 15.7.2010, at 22.00, Brad Davidson wrote:
It times out after a minute:
[root@cc-popmap7 ~]# time doveadm director map
doveadm(root): Error: User listing returned failure
doveadm(root): Error: user listing failed
real 1m0.028s
user 0m0.088s
sys 0m0.072s
Jul 15 13:46:24 cc-popmap7 dovecot: auth: Error: auth worker: Aborted request: Lookup timed out
Jul 15 13:53:25 cc-popmap7 dovecot: auth: Error: getpwent() failed: No such file or directory
I suppose it shouldn't abort if the query keeps sending data.. I'll see if I can get this fixed.
userdb passwd. Our LDAP directory might not be optimally configured. The group that administers it only really cares about binds, iteration can be rather slow:
[root@cc-popmap7 ~]# time getent passwd | wc -l
51552
real 8m0.120s
user 0m2.507s
sys 0m1.093s
That comes out to just over 100 entries a second.
Maybe there could be a parameter to get the user list from a file (one username per line) instead of userdb.
On 15.7.2010, at 23.58, Timo Sirainen wrote:
userdb passwd. Our LDAP directory might not be optimally configured. The group that administers it only really cares about binds, iteration can be rather slow:
[root@cc-popmap7 ~]# time getent passwd | wc -l
51552
real 8m0.120s
user 0m2.507s
sys 0m1.093s
That comes out to just over 100 entries a second.
Maybe there could be a parameter to get the user list from a file (one username per line) instead of userdb.
Added -f parameter for this.
Timo,
On 7/15/10 4:12 PM, "Timo Sirainen" <tss@iki.fi> wrote:
Maybe there could be a parameter to get the user list from a file (one username per line) instead of userdb.
Added -f parameter for this.
Awesome! I dumped a userlist (one username per line) which it seems to read through quite quickly, unfortunately I get...
[root@cc-popmap7 ~]# doveadm director map -f userlist.txt
Segmentation fault
(lots of pread/mmap snipped)
pread(9, "user0\nuser1\nuser2\nuser3\nuser4"..., 8189, 393042) = 8189
mmap(NULL, 2101248, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x2acbc3aae000
pread(9, "user5\nuser6\nuser7\nuser8\nuser9"..., 8188, 401231) = 36
pread(9, "", 8152, 401267) = 0
--- SIGSEGV (Segmentation fault) @ 0 (0) ---
#1 0x00002b55977ec1d0 in auth_connection_close () from /usr/lib64/dovecot/libdovecot.so.0
#2 0x00002b55977ec258 in auth_master_deinit () from /usr/lib64/dovecot/libdovecot.so.0
#3 0x000000000040a059 in user_file_get_user_list ()
#4 0x000000000040a22f in cmd_director_map ()
#5 0x000000000040897d in doveadm_try_run_multi_word ()
#6 0x0000000000408aab in doveadm_try_run ()
#7 0x0000000000408e0f in main ()
-Brad
On 15.7.2010, at 23.58, Timo Sirainen wrote:
On 15.7.2010, at 22.00, Brad Davidson wrote:
It times out after a minute:
[root@cc-popmap7 ~]# time doveadm director map
doveadm(root): Error: User listing returned failure
doveadm(root): Error: user listing failed
real 1m0.028s
user 0m0.088s
sys 0m0.072s
Jul 15 13:46:24 cc-popmap7 dovecot: auth: Error: auth worker: Aborted request: Lookup timed out
Jul 15 13:53:25 cc-popmap7 dovecot: auth: Error: getpwent() failed: No such file or directory
I suppose it shouldn't abort if the query keeps sending data.. I'll see if I can get this fixed.
Also see if http://hg.dovecot.org/dovecot-2.0/rev/d13c1043096e fixes this or if there are other timeouts?
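The behaviour being asked for here (do not abort a lookup while it is still producing data) amounts to an idle timeout rather than an absolute one. The sketch below illustrates that idea only; it is not how Dovecot's auth worker is implemented, and `IdleTimeout` and `collect` are hypothetical names.

```python
import time


class IdleTimeout:
    """Expires only after `limit_secs` seconds without any activity."""

    def __init__(self, limit_secs, clock=time.monotonic):
        self.limit_secs = limit_secs
        self.clock = clock
        self.last_activity = clock()

    def touch(self):
        # Called whenever data arrives; pushes the deadline forward.
        self.last_activity = self.clock()

    def expired(self):
        return self.clock() - self.last_activity > self.limit_secs


def collect(rows, timeout):
    """Gather rows, failing only if the gap before a row exceeds the limit."""
    out = []
    for row in rows:
        if timeout.expired():
            raise TimeoutError("lookup timed out")
        timeout.touch()
        out.append(row)
    return out
```

With a 60-second limit, a listing that takes eight minutes in total still completes as long as no single gap between rows exceeds 60 seconds.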
Dear All,
Can anyone help me by giving some clarification on the LIST command (IMAP4rev1)? Actually I want to use a single INBOX in server and supporting CREATE command i.e. not a single user can make any hierarchy, I mean the inbox is not editable, it will be fixed. So in this case on LIST command I think the server will reply only with the hierarchy delimiter and the inbox name. Am I right in this case?
Regards, darshan
On 16.7.2010, at 8.32, Darshan Prajapati wrote:
Can anyone help me by giving some clarification on the LIST command (IMAP4rev1)? Actually I want to use a single INBOX in server and supporting CREATE command i.e. not a single user can make any hierarchy, I mean the inbox is not editable, it will be fixed. So in this case on LIST command I think the server will reply only with the hierarchy delimiter and the inbox name. Am I right in this case?
Well, that was a confusing explanation. I've no idea what you want. Try again with some examples or something..
BTW. Don't post new messages by replying to an existing message. In-Reply-To: header is preserved and it messes up threading.
Timo,
On 7/15/10 4:18 PM, "Timo Sirainen" <tss@iki.fi> wrote:
Jul 15 13:46:24 cc-popmap7 dovecot: auth: Error: auth worker: Aborted request: Lookup timed out
Jul 15 13:53:25 cc-popmap7 dovecot: auth: Error: getpwent() failed: No such file or directory
Also see if http://hg.dovecot.org/dovecot-2.0/rev/d13c1043096e fixes this or if there are other timeouts?
Now I get:
Jul 16 01:50:44 cc-popmap7 dovecot: auth: Error: auth worker: Aborted request: Lookup timed out
Jul 16 01:50:44 cc-popmap7 dovecot: master: Error: service(auth): child 1607 killed with signal 11 (core dumps disabled)
Should I try to grab a core, or do you have a good idea where this is coming from? Seems suspiciously similar to the crash with '-f userlist'.
-Brad
On 16.7.2010, at 9.58, Brandon Davidson wrote:
Also see if http://hg.dovecot.org/dovecot-2.0/rev/d13c1043096e fixes this or if there are other timeouts?
Now I get:
Jul 16 01:50:44 cc-popmap7 dovecot: auth: Error: auth worker: Aborted request: Lookup timed out
Jul 16 01:50:44 cc-popmap7 dovecot: master: Error: service(auth): child 1607 killed with signal 11 (core dumps disabled)
Should I try to grab a core, or do you have a good idea where this is coming from?
I don't think that above change should have caused any crashes, so backtrace would be nice.
Timo,
On 7/16/10 4:23 AM, "Timo Sirainen" <tss@iki.fi> wrote:
Jul 16 01:50:44 cc-popmap7 dovecot: auth: Error: auth worker: Aborted request: Lookup timed out
Jul 16 01:50:44 cc-popmap7 dovecot: master: Error: service(auth): child 1607 killed with signal 11 (core dumps disabled)
I don't think that above change should have caused any crashes, so backtrace would be nice.
Here's a stack trace. Standard null function pointer. No locals, I think I'd have to recompile to get additional information.
#0 0x0000000000000000 in ?? ()
#1 0x0000000000415a71 in auth_worker_destroy ()
#2 0x0000000000415416 in auth_worker_call_timeout ()
#3 0x00000038b3e5273d in io_loop_handle_timeouts_real () from /usr/lib64/dovecot/libdovecot.so.0
#4 0x00000038b3e52797 in io_loop_handle_timeouts () from /usr/lib64/dovecot/libdovecot.so.0
#5 0x00000038b3e53958 in io_loop_handler_run () from /usr/lib64/dovecot/libdovecot.so.0
#6 0x00000038b3e527dd in io_loop_run () from /usr/lib64/dovecot/libdovecot.so.0
#7 0x00000038b3e3b926 in master_service_run () from /usr/lib64/dovecot/libdovecot.so.0
#8 0x00000000004184b1 in main ()
-Brad
On 17.7.2010, at 9.05, Brandon Davidson wrote:
Here's a stack trace. Standard null function pointer. No locals, I think I'd have to recompile to get additional information.
#0 0x0000000000000000 in ?? ()
#1 0x0000000000415a71 in auth_worker_destroy ()
#2 0x0000000000415416 in auth_worker_call_timeout ()
Maybe this fixes it: http://hg.dovecot.org/dovecot-2.0/rev/cfd15170dff7
Timo,
On 7/17/10 11:06 AM, "Timo Sirainen" <tss@iki.fi> wrote:
Here's a stack trace. Standard null function pointer. No locals, I think I'd have to recompile to get additional information.
#0 0x0000000000000000 in ?? ()
#1 0x0000000000415a71 in auth_worker_destroy ()
#2 0x0000000000415416 in auth_worker_call_timeout ()
Maybe this fixes it: http://hg.dovecot.org/dovecot-2.0/rev/cfd15170dff7
Nope, still crashes with the same stack. I'll rebuild with -g and report back.
-Brad
Timo,
Maybe this fixes it: http://hg.dovecot.org/dovecot-2.0/rev/cfd15170dff7
Nope, still crashes with the same stack. I'll rebuild with -g and report back.
Here we go. Attached, hopefully Entourage won't mangle the line wrap.
-Brad
On Sat, 2010-07-17 at 18:33 -0700, Brandon Davidson wrote:
Timo,
Maybe this fixes it: http://hg.dovecot.org/dovecot-2.0/rev/cfd15170dff7
Nope, still crashes with the same stack. I'll rebuild with -g and report back.
Here we go. Attached, hopefully Entourage won't mangle the line wrap.
http://hg.dovecot.org/dovecot-2.0/rev/f178792fb820 fixes it?
Timo,
On 7/19/10 9:38 AM, "Timo Sirainen" <tss@iki.fi> wrote:
http://hg.dovecot.org/dovecot-2.0/rev/f178792fb820 fixes it?
It makes it further before crashing. Trace attached.
I still wonder why it's timing out in the first place. Didn't you change it to reset the timeout as long as it's still getting data from the userdb?
-Brad
On Tue, 2010-07-20 at 06:56 -0700, Brandon Davidson wrote:
Timo,
On 7/19/10 9:38 AM, "Timo Sirainen" <tss@iki.fi> wrote:
http://hg.dovecot.org/dovecot-2.0/rev/f178792fb820 fixes it?
It makes it further before crashing. Trace attached.
I still wonder why it's timing out in the first place. Didn't you change it to reset the timeout as long as it's still getting data from the userdb?
Did several fixes related to this in different parts of code. Now it should work? :)
Timo,
-----Original Message----- From: Timo Sirainen [mailto:tss@iki.fi]
Did several fixes related to this in different parts of code. Now it should work? :)
No more crashes! But it still does fail eventually:
[root@cc-popmap7 ~]# doveadm director map
doveadm(root): Error: User listing returned failure
doveadm(root): Error: user listing failed
Jul 20 11:15:04 cc-popmap7 dovecot: auth: Error: getpwent() failed: No such file or directory
This might just be an artifact of our environment, I'm not sure. Dumping users to a file and then feeding that back in works great.
-Brad
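The workaround that ended up working (dump the usernames to a file once, one per line, then pass it with -f) could be scripted roughly as follows. This is a sketch under assumptions: the thread does not show how the list was actually produced, `system_usernames` and `write_userlist` are hypothetical helpers, and `pwd.getpwall()` walks the same passwd database as `getent passwd`, so it is expected to be just as slow against a slow LDAP directory.

```python
def system_usernames():
    """Return every username visible through the system passwd database."""
    import pwd  # Unix-only module, imported lazily for that reason

    return [entry.pw_name for entry in pwd.getpwall()]


def write_userlist(usernames, path):
    """Write unique usernames to `path`, one per line; return how many were written."""
    unique = sorted(set(usernames))
    with open(path, "w", encoding="utf-8") as handle:
        for name in unique:
            handle.write(name + "\n")
    return len(unique)
```

Running `write_userlist(system_usernames(), "userlist.txt")` once, for example from a nightly job, would let `doveadm director map -f userlist.txt` skip the slow userdb iteration entirely.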
participants (4)
- Brad Davidson
- Brandon Davidson
- Darshan Prajapati
- Timo Sirainen