Internal quota calculation error (redis), double mail delivery
Sometimes my redis connection times out, both for pop3, imap and lmtp. I have no idea why, when I do a "doveadm quota recalc -u" it works just fine? I'm thinking the connection is lost somehow and not retried?
Apr 06 15:42:00 lmtp(xxx@xxx.com): Error: redis: Lookup timed out in 30.000 secs Apr 06 15:42:00 lmtp(xxx@xxx.com): Error: Internal quota calculation error Apr 06 15:42:00 lmtp(xxx@xxx.com): Error: Internal quota calculation error Apr 06 15:42:00 lmtp(xxx@xxx.com): Info: xxxx: msgid=xxxx@xxxx.com: saved mail to INBOX Apr 06 15:49:23 lmtp(xxx@xxx.com): Info: xxxx: msgid=xxxx@xxxx.com: saved mail to INBOX <-- Same mail
This happens a lot, and it's really bad, because the mail is actually saved correctly, but the error is reported back to the Director, causing the delivery to be retried by the MTA - so the mail actually lands *twice* in the user's inbox.
Thanks
-- Tom
On 06 Apr 2016, at 16:54, Tom Sommer mail@tomsommer.dk wrote:
Sometimes my redis connection times out, both for pop3, imap and lmtp. I have no idea why, when I do a "doveadm quota recalc -u" it works just fine? I'm thinking the connection is lost somehow and not retried?
Apr 06 15:42:00 lmtp(xxx@xxx.com): Error: redis: Lookup timed out in 30.000 secs Apr 06 15:42:00 lmtp(xxx@xxx.com): Error: Internal quota calculation error Apr 06 15:42:00 lmtp(xxx@xxx.com): Error: Internal quota calculation error Apr 06 15:42:00 lmtp(xxx@xxx.com): Info: xxxx: msgid=xxxx@xxxx.com: saved mail to INBOX Apr 06 15:49:23 lmtp(xxx@xxx.com): Info: xxxx: msgid=xxxx@xxxx.com: saved mail to INBOX <-- Same mail
This happens a lot, and it's really bad, because the mail is actually saved correctly, but the error is reported back to the Director, causing the delivery to be retried by the MTA - so the mail actually lands *twice* in the user's inbox.
Shrink the timeout in Redis with timeout_msecs parameter. http://wiki2.dovecot.org/Dictionary
Actually it looks like the timeout handling code doesn't work at all. It simply keeps on logging timeouts until it succeeds or the connection gets disconnected. Do you see such repeated timeout logs, or does the connection also happen to get closed exactly at 30 seconds anyway? This fixes: https://github.com/dovecot/core/commit/2fb829dc7cc8c61dee925a15e51768a4b82e4...
Is there some firewall between Dovecot and Redis that closes idling connections and causes them to hang? Dovecot doesn't close idle Redis connections, although I suppose it would be useful too. The attached patch probably helps here by adding a new idle_timeout_msecs setting. I didn't finish testing it.
On 2016-04-06 20:27, Timo Sirainen wrote:
On 06 Apr 2016, at 16:54, Tom Sommer mail@tomsommer.dk wrote:
Sometimes my redis connection times out, both for pop3, imap and lmtp. I have no idea why, when I do a "doveadm quota recalc -u" it works just fine? I'm thinking the connection is lost somehow and not retried?
Apr 06 15:42:00 lmtp(xxx@xxx.com): Error: redis: Lookup timed out in 30.000 secs Apr 06 15:42:00 lmtp(xxx@xxx.com): Error: Internal quota calculation error Apr 06 15:42:00 lmtp(xxx@xxx.com): Error: Internal quota calculation error Apr 06 15:42:00 lmtp(xxx@xxx.com): Info: xxxx: msgid=xxxx@xxxx.com: saved mail to INBOX Apr 06 15:49:23 lmtp(xxx@xxx.com): Info: xxxx: msgid=xxxx@xxxx.com: saved mail to INBOX <-- Same mail
This happens a lot, and it's really bad, because the mail is actually saved correctly, but the error is reported back to the Director, causing the delivery to be retried by the MTA - so the mail actually lands *twice* in the user's inbox.
- Shrink the timeout in Redis with timeout_msecs parameter. http://wiki2.dovecot.org/Dictionary
Yea, but why is it taking 30 seconds anyway? I'll try and lower it, but I suspect I'll just get 10.000 secs errors instead?
- Actually it looks like the timeout handling code doesn't work at all. It simply keeps on logging timeouts until it succeeds or the connection gets disconnected. Do you see such repeated timeout logs, or does the connection also happen to get closed exactly at 30 seconds anyway? This fixes: https://github.com/dovecot/core/commit/2fb829dc7cc8c61dee925a15e51768a4b82e4...
I see a lot of them, but none repeated after eachother. It's more of a random pattern.
- Is there some firewall between Dovecot and Redis that closes idling connections and causes them to hang? Dovecot doesn't close idle Redis connections, although I suppose it would be useful too. The attached patch probably helps here by adding a new idle_timeout_msecs setting. I didn't finish testing it.
No firewall at all. I'm unsure if redis closes idle connections, that could be the cause of it.
What about the issue of lmtp saving the mail, but sending an error back to Director?
Thanks :)
Tom
On 06 Apr 2016, at 22:14, Tom Sommer mail@tomsommer.dk wrote:
On 2016-04-06 20:27, Timo Sirainen wrote:
On 06 Apr 2016, at 16:54, Tom Sommer mail@tomsommer.dk wrote:
Sometimes my redis connection times out, both for pop3, imap and lmtp. I have no idea why, when I do a "doveadm quota recalc -u" it works just fine? I'm thinking the connection is lost somehow and not retried? Apr 06 15:42:00 lmtp(xxx@xxx.com): Error: redis: Lookup timed out in 30.000 secs Apr 06 15:42:00 lmtp(xxx@xxx.com): Error: Internal quota calculation error Apr 06 15:42:00 lmtp(xxx@xxx.com): Error: Internal quota calculation error Apr 06 15:42:00 lmtp(xxx@xxx.com): Info: xxxx: msgid=xxxx@xxxx.com: saved mail to INBOX Apr 06 15:49:23 lmtp(xxx@xxx.com): Info: xxxx: msgid=xxxx@xxxx.com: saved mail to INBOX <-- Same mail This happens a lot, and it's really bad, because the mail is actually saved correctly, but the error is reported back to the Director, causing the delivery to be retried by the MTA - so the mail actually lands *twice* in the user's inbox.
- Shrink the timeout in Redis with timeout_msecs parameter. http://wiki2.dovecot.org/Dictionary
Yea, but why is it taking 30 seconds anyway? I'll try and lower it, but I suspect I'll just get 10.000 secs errors instead?
I don't know. Possibly a hanging TCP connection. That's why I was asking about the firewall. If it's not that, I don't really know. And yes, just shrinking this without the below fix won't help anything.
- Actually it looks like the timeout handling code doesn't work at all. It simply keeps on logging timeouts until it succeeds or the connection gets disconnected. Do you see such repeated timeout logs, or does the connection also happen to get closed exactly at 30 seconds anyway? This fixes: https://github.com/dovecot/core/commit/2fb829dc7cc8c61dee925a15e51768a4b82e4...
I see a lot of them, but none repeated after eachother. It's more of a random pattern.
Maybe something triggers after 30 seconds elsewhere that also aborts the lookup..
- Is there some firewall between Dovecot and Redis that closes idling connections and causes them to hang? Dovecot doesn't close idle Redis connections, although I suppose it would be useful too. The attached patch probably helps here by adding a new idle_timeout_msecs setting. I didn't finish testing it.
No firewall at all. I'm unsure if redis closes idle connections, that could be the cause of it.
Closing a connection is fine. That causes Dovecot to just reconnect.
What about the issue of lmtp saving the mail, but sending an error back to Director?
I guess the LMTP proxy_timeout in director is 30 seconds or lower. If backend's timeouts are smaller than that, director won't have to disconnect LMTP due to timeout and there won't be duplicates. Or just increase the director's proxy_timeout to several minutes.
On 2016-04-06 21:22, Timo Sirainen wrote:
Yea, but why is it taking 30 seconds anyway? I'll try and lower it, but I suspect I'll just get 10.000 secs errors instead?
I don't know. Possibly a hanging TCP connection. That's why I was asking about the firewall. If it's not that, I don't really know. And yes, just shrinking this without the below fix won't help anything.
Indeed. I have antivirus on the same server doing redis to the same redis-server, with no errors.
What about the issue of lmtp saving the mail, but sending an error back to Director?
I guess the LMTP proxy_timeout in director is 30 seconds or lower. If backend's timeouts are smaller than that, director won't have to disconnect LMTP due to timeout and there won't be duplicates. Or just increase the director's proxy_timeout to several minutes.
proxy_timeout is not set, I'm unsure of the default. I've raised it to 120 now.
This is the log from director:
Apr 06 15:42:00 lmtp(46187): Error: xxxx: Failed to send message to xxxx@xxx.xxx at xxx.xxx.com:24: 451 4.4.0 Remote server not answering (timeout while waiting for reply to DATA reply) <5vWKIVUIBVdrtAAA1VGibg> (1/1 at 30067 ms)
So yea, guess the timeout was 30 there as well.
participants (2)
-
Timo Sirainen
-
Tom Sommer