[Dovecot] Proxy login failures

Urban Loesch bind at enas.net
Tue Jan 10 14:22:27 EET 2012



On 09.01.2012 23:39, Timo Sirainen wrote:
> On 9.1.2012, at 22.23, Urban Loesch wrote:
>
>>>> I'm using two dovecot pop3/imap proxies in front of our dovecot servers.
>>>> Since some days I see many of the following errors in the logs of the two proxy-servers:
>>>>
>>>> dovecot: pop3-login: Error: proxy: Remote "IPV6-IP":110 disconnected: Connection closed: Connection reset by peer (state=0): user=<myuser>, method=PLAIN, rip=remote-ip, lip=localip
>>>>
>>>> When this happens the Client gets the following error from the proxy:
>>>> -ERR [IN-USE] Account is temporarily unavailable.
>>> The connection to remote server dies before authentication finishes. The reason for why that happens should be logged by the backend server. Sounds like it crashes. Check for ANY error messages in backend servers.
>>>
>>
>> I still did that, but I found nothing in the logs.
>
> It's difficult to guess then. At the very least there should be an "Info" message about a new connection at the time when this failure happened. If there's not even that, then maybe the problem is network related.

No, there is nothing.

>
>> The only thing I could think about is that all 7 backend servers are virtual servers (using technology from http://linux-vserver.org) and they all are running
>> on the same physical machine (DELL PER610 with 32GB RAM, RAID 10 SAS - load between 0.5 and 2.0, iowait about 1-5%). So they are sharing the same kernel.
>
> For testing, or what's the point in doing that? :) But the load is low enough that I doubt it has anything to do with it.

This because the hardware is fast enough for handling about 40.000 Mailaccounts (both IMAP and POP). That tells me that dovecot is a really good piece 
of software. Very performant in my eyes.

>
>> Also all servers are connected to a mysql server, running on a different machine in the same subnet. Could it be that either the kernel needs some tcp tuning ore perhaps the answers from the remote mysql server
>> could be to slow in some cases?
>
> MySQL server problem would show up with a different error message. TCP tuning is also unlikely to help, since the connection probably dies within a second. Actually it would be a good idea to log the duration. This patch adds it:
> http://hg.dovecot.org/dovecot-2.0/raw-rev/8438f66433a6
>

I installed the patch on my proxies and I got this:

...
Jan 10 09:30:45 imap2 dovecot: pop3-login: Error: proxy: Remote "IPV6-IP":110 disconnected: Connection closed: Connection reset by peer (state=0, 
duration=0s): user=<myuser>, method=PLAIN, rip=remote-ip, lip=local-ip

Jan 10 09:45:21 imap2 dovecot: pop3-login: Error: proxy: Remote "IPV6-IP":110 disconnected: Connection closed: Connection reset by peer (state=0, 
duration=1s): user=<myuser>, method=PLAIN, rip=remote-ip, lip=local-ip
...

As you can see the duration is between 0 and 1 seconds.

During this errors there was a tcpdump running on proxy #2 (imap2 in the above logs).
In the time range of "09:30:45:00 - 09:30:46:00" I got an error that the backend server has resetted the connection (RST Flag set).
The fact that dovecot on the backend server writes nothing in the log I think that the connection will be resetted on a lower level.

Here is what whireshark tells me about that:

No.     Source                Time                       Destination           Protocol Info
  101235 IPv6-Proxy-Server 2012-01-10 09:29:38.015073 IPv6-Backend-Server TCP      35341 > pop3 [SYN] Seq=0 Win=14400 Len=0 MSS=1440 SACK_PERM=1 
TSV=1925901864 TSER=0 WS=7
  101236 IPv6-Backend-Server 2012-01-10 09:29:38.015157 IPv6-Proxy-Server TCP      pop3 > 35341 [SYN, ACK] Seq=0 Ack=1 Win=14280 Len=0 MSS=1440 
SACK_PERM=1 TSV=309225565 TSER=1925901864 WS=7
  101248 IPv6-Proxy-Server 2012-01-10 09:29:38.233046 IPv6-Backend-Server POP      [TCP ACKed lost segment] [TCP Previous segment lost] C: UIDL
  101249 IPv6-Backend-Server 2012-01-10 09:29:38.233312 IPv6-Proxy-Server POP      S: +OK
  101250 IPv6-Proxy-Server 2012-01-10 09:29:38.233328 IPv6-Backend-Server TCP      35341 > pop3 [ACK] Seq=57 Ack=50 Win=14464 Len=0 TSV=1925901886 
TSER=309225587
  101263 IPv6-Proxy-Server 2012-01-10 09:29:38.452210 IPv6-Backend-Server POP      C: LIST
  101264 IPv6-Backend-Server 2012-01-10 09:29:38.452403 IPv6-Proxy-Server POP      S: +OK 0 messages:
  101265 IPv6-Proxy-Server 2012-01-10 09:29:38.452426 IPv6-Backend-Server TCP      35341 > pop3 [ACK] Seq=63 Ack=70 Win=14464 Len=0 TSV=1925901908 
TSER=309225609
  101324 IPv6-Proxy-Server 2012-01-10 09:29:38.671209 IPv6-Backend-Server POP      C: QUIT
  101325 IPv6-Backend-Server 2012-01-10 09:29:38.671566 IPv6-Proxy-Server POP      S: +OK Logging out.
  101326 IPv6-Proxy-Server 2012-01-10 09:29:38.671678 IPv6-Backend-Server TCP      35341 > pop3 [FIN, ACK] Seq=69 Ack=89 Win=14464 Len=0 
TSV=1925901930 TSER=309225631
  101327 IPv6-Backend-Server 2012-01-10 09:29:38.671759 IPv6-Proxy-Server TCP      pop3 > 35341 [ACK] Seq=89 Ack=70 Win=14336 Len=0 TSV=309225631 
TSER=1925901930

  134205 IPv6-Proxy-Server 2012-01-10 09:30:45.477314 IPv6-Backend-Server TCP      [TCP Port numbers reused] 35341 > pop3 [SYN] Seq=0 Win=14400 Len=0 
MSS=1440 SACK_PERM=1 TSV=1925908610 TSER=0 WS=7
  134206 IPv6-Backend-Server 2012-01-10 09:30:45.477458 IPv6-Proxy-Server TCP      pop3 > 35341 [SYN, ACK] Seq=0 Ack=1 Win=14280 Len=0 MSS=1440 
SACK_PERM=1 TSV=309232311 TSER=1925908610 WS=7
  134207 IPv6-Proxy-Server 2012-01-10 09:30:45.477499 IPv6-Backend-Server TCP      35341 > pop3 [ACK] Seq=1 Ack=1 Win=14464 Len=0 TSV=1925908610 
TSER=309232311
  134208 IPv6-Backend-Server 2012-01-10 09:30:45.477589 IPv6-Proxy-Server TCP      pop3 > 35341 [RST] Seq=1 Win=0 Len=0
  136052 IPv6-Backend-Server 2012-01-10 09:30:49.477950 IPv6-Proxy-Server TCP      pop3 > 35341 [SYN, ACK] Seq=0 Ack=1 Win=14280 Len=0 MSS=1440 
SACK_PERM=1 TSV=309232712 TSER=1925908610 WS=7
  136053 IPv6-Proxy-Server 2012-01-10 09:30:49.477978 IPv6-Backend-Server TCP      35341 > pop3 [RST] Seq=1 Win=0 Len=0
  138363 IPv6-Backend-Server 2012-01-10 09:30:55.877899 IPv6-Proxy-Server TCP      pop3 > 35341 [SYN, ACK] Seq=0 Ack=1 Win=14280 Len=0 MSS=1440 
SACK_PERM=1 TSV=309233352 TSER=1925908610 WS=7
  138364 IPv6-Proxy-Server 2012-01-10 09:30:55.877925 IPv6-Backend-Server TCP      35341 > pop3 [RST] Seq=1 Win=0 Len=0
  143154 IPv6-Backend-Server 2012-01-10 09:31:08.678005 IPv6-Proxy-Server TCP      pop3 > 35341 [SYN, ACK] Seq=0 Ack=1 Win=14280 Len=0 MSS=1440 
SACK_PERM=1 TSV=309234632 TSER=1925908610 WS=7
  152353 IPv6-Backend-Server 2012-01-10 09:31:32.678103 IPv6-Proxy-Server TCP      pop3 > 35341 [SYN, ACK] Seq=0 Ack=1 Win=14280 Len=0 MSS=1440 
SACK_PERM=1 TSV=309237032 TSER=1925908610 WS=7
  165891 IPv6-Backend-Server 2012-01-10 09:32:20.688324 IPv6-Proxy-Server TCP      pop3 > 35341 [SYN, ACK] Seq=0 Ack=1 Win=14280 Len=0 MSS=1440 
SACK_PERM=1 TSV=309241833 TSER=1925908610 WS=7


 From Seq-No. 101235 - 101327 the session looks ok for me. But on Seq-No. 134205 whireshark tells me that the TCP port with source port number "35341" 
will be reused and on Seq-No. 34208 (after the TCP Session has been established correctly - see Seq-No. 134205 to 134207) the backend server sends a 
RST Packet for the session and the proxy logs the error message, that the connection has been resetted by the peer.

I have no idea if dovecot is sendig the TCP reset or the kernel by himself.

About 1,5 hours ago I changed the kernel flag "/proc/sys/net/ipv4/tcp_tw_recycle" to "1" on the physical backend machine.
Since that I got no more error messages on the proxies.

Changing the default values in "tcp_fin-timeout" or "tcp_tw_reuse" have had no effect. Only "tcp_tw_recycle" seems to help.

Thanks
Urban

> These are the only explanations that I can think of for the error:
>
>   * Remote Dovecot crashes / kills the connection (it would log an error message)
>   * Remote Dovecot server is full of handling existing connections (It would log a warning)
>   * Network trouble, something in the middle disconnecting the connection
>   * Source/destination OS trouble, disconnecting the connection
>   * Some hang that results in eventual disconnection. The duration patch would show if this is the case.
>
>



More information about the dovecot mailing list