BUG?: Dsync over piped process (DSYNC_RUN_TYPE_CMD): EPIPE if X (done) is received in another write after F (finish)
Hi,
It appears that dsync closes the file handle once it gets the F (finish) response even if it hasn’t received the X (done) response. If the X (done) response gets delivered in another write this will generate EPIPE in the connected process.
This is much easier to replicate over SSL, however the trace is provided without SSL since its easier to see whats going on.
Thanks J. Nick Koston
# dovecot --version 2.2.24 (a82c823) dovecot-2.2.24-1.x86_64
F\n is in packet 1 X\n is in packet 2
STRACE of an the unexpected EPIPE: [pid 34753] read(4, "F\n", 524288) = 2 [pid 34753] write(1, "F\n", 2) = 2 [pid 34752] <... epoll_wait resumed> {{EPOLLIN, {u32=42323904, u64=42323904}}}, 3, 3350) = 1 [pid 34753] select(8, [0 4], NULL, NULL, {21600, 0} <unfinished ...> [pid 34752] read(10, "F\n", 1361) = 2 [pid 34752] write(9, "X\n", 2 <unfinished ...> [pid 34753] <... select resumed> ) = 1 (in [0], left {21599, 999925}) [pid 34752] <... write resumed> ) = 2 [pid 34753] read(0, <unfinished ...> [pid 34752] epoll_ctl(6, EPOLL_CTL_DEL, 10, {0, {u32=42323904, u64=42323904}} <unfinished ...> [pid 34753] <... read resumed> "X\n", 524288) = 2 [pid 34752] <... epoll_ctl resumed> ) = 0 [pid 34753] write(4, "X\n", 2 <unfinished ...> [pid 34752] close(9) = 0 [pid 34753] <... write resumed> ) = 2 [pid 34752] close(10 <unfinished ...> [pid 34753] select(8, [0 4], NULL, NULL, {21600, 0} <unfinished ...> [pid 34752] <... close resumed> ) = 0 [pid 34753] <... select resumed> ) = 2 (in [0 4], left {21599, 999998}) [pid 34752] epoll_wait(6, <unfinished ...> [pid 34753] read(4, "X\n", 524288) = 2 [pid 34753] write(1, "X\n", 2) = -1 EPIPE (Broken pipe)
STRACE where its in one packet:
F\nX\n is in a single packet
[pid 34833] select(8, [0 4], NULL, NULL, {21600, 0}) = 1 (in [4], left {21599, 999966}) [pid 34833] read(4, "F\nX\n", 524288) = 4 [pid 34833] write(1, "F\nX\n", 4) = 4 [pid 34833] select(8, [0 4], NULL, NULL, {21600, 0} <unfinished ...> [pid 34832] <... epoll_wait resumed> {{EPOLLIN, {u32=38678464, u64=38678464}}}, 3, 3889) = 1 [pid 34833] <... select resumed> ) = 1 (in [4], left {21599, 999572}) [pid 34832] read(10, "F\nX\n", 1361) = 4
On 19 May 2016, at 03:54, J. Nick Koston <nick@cpanel.net> wrote:
Hi,
It appears that dsync closes the file handle once it gets the F (finish) response even if it hasn’t received the X (done) response. If the X (done) response gets delivered in another write this will generate EPIPE in the connected process.
This is much easier to replicate over SSL, however the trace is provided without SSL since its easier to see whats going on.
Do you see this as causing some problem? I think this is intentional. The F(inish) tells the final response whether everything was OK or not, so all the work is done after that. The 'X' is described in code as:
/* notify remote that we're closing. this is mainly to avoid
"read() failed: EOF" errors on failing dsyncs */
o_stream_nsend_str(ibc->output,
t_strdup_printf("%c\n", items[ITEM_DONE].chr));
and on the receiving side:
if (line[0] == items[ITEM_DONE].chr) {
/* remote cleanly closed the connection, possibly because of
some failure (which it should have logged). we don't want to
log any stream errors anyway after this. */
dsync_ibc_stream_stop(ibc);
return DSYNC_IBC_RECV_RET_TRYAGAIN;
}
So I think it only matters in error handling.
participants (2)
-
J. Nick Koston
-
Timo Sirainen