At Wed, 10 Aug 2011 20:37:46 +0300, Timo Sirainen wrote:
On 2.8.2011, at 5.25, SATOH Fumiyasu wrote:
Dovecot ignores EINPROGRESS from connect(2) on a non-blocking fd. This is wrong: after that, read(2) from (or write(2) to) the fd fails with ENOTCONN if the connection has not completed yet.
The attached patch fixes this problem.
If you do that, then there's no point in making the socket non-blocking before connect().
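Timo's point can be made concrete. If the caller has to wait for the connection to finish anyway, a plain blocking connect() followed by switching the fd to O_NONBLOCK reaches the same end state. The sketch below assumes a POSIX system, and `connect_unix_then_nonblock()` is a hypothetical helper, not Dovecot's API.

```c
#include <errno.h>
#include <fcntl.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/un.h>
#include <unistd.h>

/* Close fd without clobbering the errno the caller should see. */
static int close_keep_errno(int fd)
{
    int saved = errno;
    close(fd);
    errno = saved;
    return -1;
}

/* Hypothetical helper (not Dovecot's API): connect a UNIX stream socket
 * in blocking mode, then switch the connected fd to non-blocking I/O. */
int connect_unix_then_nonblock(const char *path)
{
    struct sockaddr_un sa;

    if (strlen(path) >= sizeof(sa.sun_path)) {
        errno = ENAMETOOLONG;
        return -1;
    }
    memset(&sa, 0, sizeof(sa));
    sa.sun_family = AF_UNIX;
    strcpy(sa.sun_path, path); /* length checked above */

    int fd = socket(AF_UNIX, SOCK_STREAM, 0);
    if (fd < 0)
        return -1;

    /* Blocking connect(): returns only once connected or failed, so
     * there is no EINPROGRESS state to handle afterwards. */
    if (connect(fd, (struct sockaddr *)&sa, sizeof(sa)) < 0)
        return close_keep_errno(fd);

    /* Only now make subsequent reads and writes non-blocking. */
    int flags = fcntl(fd, F_GETFL, 0);
    if (flags < 0 || fcntl(fd, F_SETFL, flags | O_NONBLOCK) < 0)
        return close_keep_errno(fd);
    return fd;
}
```

The trade-off is that a blocking connect() can stall the whole process while the listener's backlog is full, which is presumably why the socket is made non-blocking before connect() in the first place.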
The Linux connect(2) manpage says:
EINPROGRESS
    The socket is nonblocking and the connection cannot be completed immediately. It is possible to select(2) or poll(2) for completion by selecting the socket for writing. After select(2) indicates writability, use getsockopt(2) to read the SO_ERROR option at level SOL_SOCKET to determine whether connect() completed successfully (SO_ERROR is zero) or unsuccessfully (SO_ERROR is one of the usual error codes listed here, explaining the reason for the failure).
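The procedure this manpage describes (wait until the socket is writable, then read SO_ERROR) can be sketched in C as follows. This is a minimal illustration assuming a POSIX system with poll(2); `finish_nonblocking_connect()` and its `timeout_ms` parameter are hypothetical names, not part of Dovecot.

```c
#include <errno.h>
#include <poll.h>
#include <sys/socket.h>

/* Wait for a non-blocking connect() that returned EINPROGRESS to finish.
 * Returns 0 on success; on failure returns -1 with errno set to the
 * connect error (from SO_ERROR), or ETIMEDOUT if timeout_ms elapsed. */
int finish_nonblocking_connect(int fd, int timeout_ms)
{
    struct pollfd pfd = { .fd = fd, .events = POLLOUT };
    int ret;

    /* Restart poll() if a signal interrupts it (the timeout restarts too). */
    do {
        ret = poll(&pfd, 1, timeout_ms);
    } while (ret < 0 && errno == EINTR);

    if (ret < 0)
        return -1;            /* poll() itself failed; errno is set */
    if (ret == 0) {
        errno = ETIMEDOUT;    /* connection attempt still pending */
        return -1;
    }

    /* Writability only means the attempt finished; SO_ERROR says how. */
    int err = 0;
    socklen_t len = sizeof(err);
    if (getsockopt(fd, SOL_SOCKET, SO_ERROR, &err, &len) < 0)
        return -1;
    if (err != 0) {
        errno = err;
        return -1;
    }
    return 0;
}
```

A caller would invoke it only after connect() has returned -1 with errno == EINPROGRESS, and would treat the fd as usable for read(2)/write(2) only if it returns 0.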
The Solaris 10 connect(3SOCKET) manpage says:
EINPROGRESS
    The socket is non-blocking, and the connection cannot be completed immediately. You can use select(3C) to complete the connection by selecting the socket for writing.
The Windows connect function documentation says (http://msdn.microsoft.com/en-us/library/ms737625%28v=vs.85%29.aspx):
With a nonblocking socket, the connection attempt cannot be completed immediately. In this case, connect will return SOCKET_ERROR, and WSAGetLastError will return WSAEWOULDBLOCK. In this case, there are three possible scenarios:
* Use the select function to determine the completion of the connection request by checking to see if the socket is writeable.
* If the application is using WSAAsyncSelect to indicate interest in connection events, then the application will receive an FD_CONNECT notification indicating that the connect operation is complete (successfully or not).
* If the application is using WSAEventSelect to indicate interest in connection events, then the associated event object will be signaled indicating that the connect operation is complete (successfully or not).
On a heavily loaded Solaris 10 box, dovecot-lda's queries (I/O) to the Dovecot dict socket fail with ENOTCONN. My patch fixes this problem.
I think Linux etc. return EAGAIN in such a situation. Maybe the right fix is just to add an EINPROGRESS check to net_connect_unix_with_retries()? (With some extra changes so that it actually sees the errno from net_connect_unix().)
I think you MUST wait for connect() on the fd to complete before reading from or writing to it in such a situation.
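The two behaviours discussed here can be combined: retry on EAGAIN (which the Linux connect(2) manpage documents for a non-blocking UNIX domain socket that cannot connect immediately), and wait for completion on EINPROGRESS as the Solaris and Linux manpages describe. The following is a sketch under those assumptions; `connect_unix_nonblock_retry()`, its retry count, and the 10 ms pause are hypothetical and do not reflect how net_connect_unix_with_retries() is actually written.

```c
#include <errno.h>
#include <fcntl.h>
#include <poll.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/un.h>
#include <unistd.h>

/* Close fd while preserving errno for the caller. */
static int fail_close(int fd)
{
    int saved = errno;
    close(fd);
    errno = saved;
    return -1;
}

/* Hypothetical helper: return a non-blocking fd only once the UNIX
 * socket connection is known to be established, never "in progress". */
int connect_unix_nonblock_retry(const char *path, int max_tries, int timeout_ms)
{
    struct sockaddr_un sa;

    if (strlen(path) >= sizeof(sa.sun_path)) {
        errno = ENAMETOOLONG;
        return -1;
    }
    memset(&sa, 0, sizeof(sa));
    sa.sun_family = AF_UNIX;
    strcpy(sa.sun_path, path);

    for (int attempt = 0; attempt < max_tries; attempt++) {
        int fd = socket(AF_UNIX, SOCK_STREAM, 0);
        if (fd < 0)
            return -1;
        int flags = fcntl(fd, F_GETFL, 0);
        if (flags < 0 || fcntl(fd, F_SETFL, flags | O_NONBLOCK) < 0)
            return fail_close(fd);

        if (connect(fd, (struct sockaddr *)&sa, sizeof(sa)) == 0)
            return fd;                  /* connected immediately */

        if (errno == EAGAIN) {
            /* Linux: nothing is pending on this fd, so drop it, pause
             * ~10 ms (poll with no fds), and retry with a fresh socket. */
            close(fd);
            poll(NULL, 0, 10);
            continue;
        }
        if (errno != EINPROGRESS)
            return fail_close(fd);      /* e.g. ENOENT, ECONNREFUSED */

        /* EINPROGRESS (e.g. Solaris): wait for writability, then read
         * SO_ERROR. EINTR is not retried here, for brevity. */
        struct pollfd pfd = { .fd = fd, .events = POLLOUT };
        int ret = poll(&pfd, 1, timeout_ms);
        if (ret <= 0) {
            if (ret == 0)
                errno = ETIMEDOUT;
            return fail_close(fd);
        }
        int err = 0;
        socklen_t len = sizeof(err);
        if (getsockopt(fd, SOL_SOCKET, SO_ERROR, &err, &len) < 0)
            return fail_close(fd);
        if (err != 0) {
            errno = err;
            return fail_close(fd);
        }
        return fd;
    }
    errno = EAGAIN;                     /* every attempt hit a full backlog */
    return -1;
}
```

Closing and recreating the socket after EAGAIN is a conservative choice; it avoids relying on platform-specific rules about re-calling connect() on the same fd, where a second call during EINPROGRESS would report EALREADY instead.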
-- 
-- Name: SATOH Fumiyasu (fumiyas @ osstech co jp)
-- Business Home: http://www.OSSTech.co.jp/
-- Personal Home: http://www.SFO.jp/blog/