On 26.9.2012, at 20.34, Kelsey Cummings wrote:
The following errors on the directors that started after this went unnoticed until this AM.
director: User bb host lookup failed: Timeout - queued for 30 secs (Ring synced for 36 secs)
director: User cc host lookup failed: Timeout - queued for 48 secs (Ring synced for 66 secs, user refreshed 12 secs ago)
director: User dd host lookup failed: Timeout - queued for 124 secs (Ring synced for 119 secs, weak user, user refreshed 155 secs ago)
director: User ee host lookup failed: Timeout - queued for 79 secs (Ring synced for 119 secs, weak user, user refreshed 113 secs ago)
...
User ff host lookup failed: Timeout - queued for 30 secs (Ring synced for 7427 secs, weak user, user refreshed 620 secs ago)
This continued, combined with occasional login timeouts (as reported by some internal IMAP clients). The login delays/timeouts got bad enough that our load balancers dropped both the servers while I was investigating. They seem to be okay after being restarted.
After the first few minutes, did all the rest of the error messages contain the "weak user" string? Did this happen to a lot of different users (few/some/most)? Is the director_user_expire setting at its default of 15 minutes?
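
If it helps to answer the first two questions from the logs, something like the following rough sketch could count how many of the timeout errors mention "weak user" and how many distinct users were affected. The regular expression is only an assumption based on the lines quoted above, and the `summarize` helper and `SAMPLE` list are made up for illustration; they are not part of Dovecot.

```python
import re

# Assumed layout, taken from the quoted log lines:
#   User <name> host lookup failed: Timeout - queued for N secs (<details>)
LINE_RE = re.compile(
    r"User (?P<user>\S+) host lookup failed: Timeout - "
    r"queued for (?P<queued>\d+) secs \((?P<detail>[^)]*)\)"
)


def summarize(lines):
    """Count timeout errors, 'weak user' errors, and distinct users."""
    total = 0
    weak = 0
    users = set()
    for line in lines:
        match = LINE_RE.search(line)
        if match is None:
            continue  # not a host lookup timeout line
        total += 1
        users.add(match.group("user"))
        if "weak user" in match.group("detail"):
            weak += 1
    return {"total": total, "weak": weak, "distinct_users": len(users)}


# The five lines quoted earlier in this thread, used as sample input.
SAMPLE = [
    "director: User bb host lookup failed: Timeout - queued for 30 secs "
    "(Ring synced for 36 secs)",
    "director: User cc host lookup failed: Timeout - queued for 48 secs "
    "(Ring synced for 66 secs, user refreshed 12 secs ago)",
    "director: User dd host lookup failed: Timeout - queued for 124 secs "
    "(Ring synced for 119 secs, weak user, user refreshed 155 secs ago)",
    "director: User ee host lookup failed: Timeout - queued for 79 secs "
    "(Ring synced for 119 secs, weak user, user refreshed 113 secs ago)",
    "User ff host lookup failed: Timeout - queued for 30 secs "
    "(Ring synced for 7427 secs, weak user, user refreshed 620 secs ago)",
]

if __name__ == "__main__":
    print(summarize(SAMPLE))
```

To run it against a real log, pass an open file object to `summarize` instead of `SAMPLE`; the log path depends on your syslog setup.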