[Patch] Fix hang in safe_sendfile on SmartOS
# HG changeset patch # User Sebastian Wiedenroth <sebastian.wiedenroth@skylime.net> # Date 1437050484 -7200 # Thu Jul 16 14:41:24 2015 +0200 # Node ID 7ef3a533b097e8e6590e754dc56ad308ab29233b # Parent e3640ccaa76d77a9658126d1f8f306480dad8af7 Fix hang in safe_sendfile on SmartOS The call to sendfile on SmartOS can fail with EOPNOTSUPP. This is a valid error code and documented in the man page. This error code needs to be handled or else dovecot will retry the sendfile call endlessly and hang. diff -r e3640ccaa76d -r 7ef3a533b097 src/lib/sendfile-util.c --- a/src/lib/sendfile-util.c Sat Jan 10 04:32:42 2015 +0200 +++ b/src/lib/sendfile-util.c Thu Jul 16 14:41:24 2015 +0200 @@ -116,7 +116,7 @@ if (errno == EINVAL) { /* most likely trying to read past EOF */ ret = 0; - } else if (errno == EAFNOSUPPORT) { + } else if (errno == EAFNOSUPPORT || errno == EOPNOTSUPP) { /* not supported, return Linux-like EINVAL so caller sees only consistent errnos. */ errno = EINVAL;
ping? If more information or a system to test the issue on is required please let me know, I’m sure I can provide those. If this list is not the correct place for patches I’m also happy to learn where else to send it. Thanks. Best regards, Sebastian
Am 16.07.2015 um 17:03 schrieb Sebastian Wiedenroth <sebastian.wiedenroth@skylime.net>:
# HG changeset patch # User Sebastian Wiedenroth <sebastian.wiedenroth@skylime.net> # Date 1437050484 -7200 # Thu Jul 16 14:41:24 2015 +0200 # Node ID 7ef3a533b097e8e6590e754dc56ad308ab29233b # Parent e3640ccaa76d77a9658126d1f8f306480dad8af7 Fix hang in safe_sendfile on SmartOS
The call to sendfile on SmartOS can fail with EOPNOTSUPP. This is a valid error code and documented in the man page. This error code needs to be handled or else dovecot will retry the sendfile call endlessly and hang.
diff -r e3640ccaa76d -r 7ef3a533b097 src/lib/sendfile-util.c --- a/src/lib/sendfile-util.c Sat Jan 10 04:32:42 2015 +0200 +++ b/src/lib/sendfile-util.c Thu Jul 16 14:41:24 2015 +0200 @@ -116,7 +116,7 @@ if (errno == EINVAL) { /* most likely trying to read past EOF */ ret = 0; - } else if (errno == EAFNOSUPPORT) { + } else if (errno == EAFNOSUPPORT || errno == EOPNOTSUPP) { /* not supported, return Linux-like EINVAL so caller sees only consistent errnos. */ errno = EINVAL;
On 07/16/2015 06:03 PM, Sebastian Wiedenroth wrote:
Fix hang in safe_sendfile on SmartOS
The call to sendfile on SmartOS can fail with EOPNOTSUPP. This is a valid error code and documented in the man page. This error code needs to be handled or else dovecot will retry the sendfile call endlessly and hang.
Committed .. However, I think a more important bug is that it hangs. It's definitely not supposed to hang. Which process was it that was hanging How can I reproduce that? I can only get it to disconnect the IMAP client.
Am 07.09.2015 um 21:34 schrieb Timo Sirainen <tss@iki.fi>:
On 07/16/2015 06:03 PM, Sebastian Wiedenroth wrote:
Fix hang in safe_sendfile on SmartOS
The call to sendfile on SmartOS can fail with EOPNOTSUPP. This is a valid error code and documented in the man page. This error code needs to be handled or else dovecot will retry the sendfile call endlessly and hang.
Committed .. However, I think a more important bug is that it hangs. It's definitely not supposed to hang. Which process was it that was hanging How can I reproduce that? I can only get it to disconnect the IMAP client.
Thanks!
It was the managesieve process that was hanging. To trigger it we used sieve-connect [1] like this: sieve-connect -u demo@example.com mailbox.example.com
To find the issue we used a dtrace script [2]. With an unpatched version it would show:
CPU ID FUNCTION:NAME
0 4623 safe_sendfile:return 7 4 fffffd7fffdff8b8 154 -> 77
0 4623 safe_sendfile:return 7 4 fffffd7fffdff8b8 9223372036854775807 -> 77
0 4623 safe_sendfile:return 7 4 fffffd7fffdff8b8 9223372036854775807 -> 77
and then just repeat the last line as it retried the call forever. This is where it hangs, spinning on the cpu.
After looking at the code I confirmed the issue with a dtrace one-liner that tracks the sendfilev syscall:
dtrace -n 'syscall::sendfilev:return {printf("%d %x\n", arg0, errno)}'
This showed that the call returned with EOPNOTSUPP (0x7a): 0 6155 sendfilev:return -1 7a
The man page lists this as a valid error code and handling it the same way as EAFNOSUPPORT fixed the issue for us. There are a few more error codes in the man page that currently are not handled by dovecot. This might be something to look into in the future.
I hope this answer provides the details you’re looking for.
Best regards, Sebastian
[1] https://github.com/philpennock/sieve-connect [2] https://gist.github.com/wiedi/4b4ebe5f92ac5b54951b
On 07 Sep 2015, at 23:19, Sebastian Wiedenroth <sebastian.wiedenroth@skylime.net> wrote:
Am 07.09.2015 um 21:34 schrieb Timo Sirainen <tss@iki.fi>:
On 07/16/2015 06:03 PM, Sebastian Wiedenroth wrote:
Fix hang in safe_sendfile on SmartOS
The call to sendfile on SmartOS can fail with EOPNOTSUPP. This is a valid error code and documented in the man page. This error code needs to be handled or else dovecot will retry the sendfile call endlessly and hang.
Committed .. However, I think a more important bug is that it hangs. It's definitely not supposed to hang. Which process was it that was hanging How can I reproduce that? I can only get it to disconnect the IMAP client.
Thanks!
It was the managesieve process that was hanging. To trigger it we used sieve-connect [1] like this: sieve-connect -u demo@example.com mailbox.example.com
Thanks. I did find a bug in Pigeonhole with this when issuing a GET command :)
Also I see now why it's looping, more or less. sendfile() is still indicating that it's sending some data (by updating s_offset) even though it's returning a failure. I wonder if reverting the earlier EOPNOTSUPP change and applying this patch causes it to assert-crash instead of going to infinite loop? http://hg.dovecot.org/dovecot-2.2/rev/f6dd24658fb1
participants (2)
-
Sebastian Wiedenroth
-
Timo Sirainen