On Saturday, November 2, 2019 10:40:45 PM EET Stephan Bosch stephan@rename-it.nl wrote:
I think we already know about this issue (tracked as DOP-942). It apparently happens because qmail masks the SIGCHLD signal while Dovecot doesn't unmask it before waiting for children.
You're right. Isn't this a simple fix - a call to sigprocmask ?
This should not be a problem in qmail-lspawn/qmail-local as their code has not been changed for years. So there should have been a breaking change in Dovecot/Pigeonhole.
What follows is some more debugging from me.
The chain of processes that lead to dovecot-lda and sendmail execution look like this:
PID 32457 qmail-lspawn 6923 qmail-local 6924 sh -c vlelivermail 6925 sh -c preline /usr/local/bin/vpop-lda 6926 vpop-lda(perl script) -> deliver -> doveconf -> deliver 6927 grep 6928 pwd 6929 sendmail -> qmail-inject 6930 qmail-queue
Just before executing sendmail, dovecot-lda is registering a signal handler for SIGCHLD:
22:55:15.294647 epoll_create(128) = 12 22:55:15.294668 fcntl(12, F_GETFD) = 0 22:55:15.294687 fcntl(12, F_SETFD, FD_CLOEXEC) = 0 22:55:15.294708 epoll_ctl(12, EPOLL_CTL_ADD, 5, {EPOLLIN|EPOLLPRI|EPOLLERR|EPOLLHUP, {u32=47455536, u64=94699781365040}}) = 0 22:55:15.294740 rt_sigaction(SIGCHLD, {sa_handler=0x7f15404b3640, sa_mask=[], sa_flags=SA_RESTORER|SA_RESTART|SA_SIGINFO, sa_restorer=0x7f1540229840}, NULL, 8) = 0 22:55:15.294763 pipe([13, 14]) = 0 22:55:15.294788 clone(child_stack=NULL, flags=CLONE_CHILD_CLEARTID|CLONE_CHILD_SETTID|SIGCHLD, child_tidptr=0x7f15401ec2d0) = 6929
Later dovecot-lda checks non-blockingly whether sendmail has exited and waits for epoll IO event:
22:55:15.295175 wait4(6929, 0x562102d4fee4, WNOHANG, NULL) = 0 22:55:15.295211 epoll_wait(12, [], 1, 30000) = 0 22:55:45.325580 kill(6929, SIGTERM) = 0 22:55:45.325769 epoll_wait(12, [], 1, 5000) = 0 22:55:50.331108 kill(6929, SIGKILL) = 0
I'm not sure why it is waiting for IO event on FD 5, which is a PIPE FD initialized much earlier, with no actual reads/writes on it:
22:55:15.286845 pipe([5, 6]) = 0 22:55:15.286870 fcntl(5, F_GETFL) = 0 (flags O_RDONLY) 22:55:15.286889 fcntl(5, F_SETFL, O_RDONLY|O_NONBLOCK) = 0 22:55:15.286909 fcntl(6, F_GETFL) = 0x1 (flags O_WRONLY) 22:55:15.286928 fcntl(6, F_SETFL, O_WRONLY|O_NONBLOCK) = 0
The pipe used for communication with the sendmail child is this:
22:55:15.294763 pipe([13, 14]) = 0 22:55:15.294788 clone(child_stack=NULL, flags=CLONE_CHILD_CLEARTID|CLONE_CHILD_SETTID|SIGCHLD, child_tidptr=0x7f15401ec2d0) = 6929 22:55:15.294928 close(13) = 0 22:55:15.294969 fcntl(14, F_GETFL) = 0x1 (flags O_WRONLY) 22:55:15.295006 fcntl(14, F_SETFL, O_WRONLY|O_NONBLOCK) = 0 22:55:15.295039 lseek(14, 0, SEEK_CUR) = -1 ESPIPE (Illegal seek) 22:55:15.295065 getsockname(14, 0x7ffd59156a40, [28]) = -1 ENOTSOCK (Socket operation on non-socket) 22:55:15.295109 write(14, "X-Sieve: Pigeonhole Sieve 0.5.4 ()\nX-Sieve-Redirected-From: ...", 1461) = 1461 22:55:15.295150 close(14) = 0 22:55:15.295175 wait4(6929, 0x562102d4fee4, WNOHANG, NULL) = 0 22:55:15.295211 epoll_wait(12, [], 1, 30000) = 0 22:55:45.325580 kill(6929, SIGTERM) = 0
This is the process creation/waiting and signal mask debugging:
qmail-lspawn (PID 32457)
22:55:15.227742 rt_sigprocmask(SIG_BLOCK, [CHLD], NULL, 8) = 0
...
22:55:15.227970 clone(child_stack=NULL, flags=CLONE_CHILD_CLEARTID|CLONE_CHILD_SETTID|SIGCHLD, child_tidptr=0x7f8a117de7d0) = 6923
...
22:55:15.228237 rt_sigprocmask(SIG_UNBLOCK, [CHLD], NULL, 8) = 0
22:55:15.228267 select(4, [0 3], NULL, NULL, NULL) = 1 (in [3])
22:55:50.334325 rt_sigprocmask(SIG_BLOCK, [CHLD], NULL, 8) = 0
22:55:50.334367 read(3, "Execution of Sieve filters was aborted due to temporary failure\n", 128) = 64
22:55:50.334406 rt_sigprocmask(SIG_UNBLOCK, [CHLD], NULL, 8) = 0
22:55:50.334440 select(4, [0 3], NULL, NULL, NULL) = ? ERESTARTNOHAND (To be restarted if no handler)
22:55:50.335625 --- SIGCHLD {si_signo=SIGCHLD, si_code=CLD_EXITED, si_pid=6923, si_uid=1002, si_status=111, si_utime=0, si_stime=0} ---
22:55:50.335649 wait4(-1, [{WIFEXITED(s) && WEXITSTATUS(s) == 111}], WNOHANG, NULL) = 6923
22:55:50.335683 close(4) = 0
22:55:50.335710 wait4(-1, 0x7fff05f68d8c, WNOHANG, NULL) = -1 ECHILD (No child processes)
22:55:50.335742 rt_sigreturn({mask=[]}) = -1 EINTR (Interrupted system call)
22:55:50.335773 rt_sigprocmask(SIG_BLOCK, [CHLD], NULL, 8) = 0
22:55:50.335801 rt_sigprocmask(SIG_UNBLOCK, [CHLD], NULL, 8) = 0
22:55:50.335827 select(4, [0 3], NULL, NULL, NULL) = 1 (in [3])
22:55:50.335874 rt_sigprocmask(SIG_BLOCK, [CHLD], NULL, 8) = 0
22:55:50.335899 read(3, "", 128) = 0
22:55:50.335925 write(1, "\0ZExecution of Sieve filters was aborted due to temporary failure\n\0", 67) = 67
22:55:50.335974 close(3) = 0
22:55:50.336006 rt_sigprocmask(SIG_UNBLOCK, [CHLD], NULL, 8) = 0
22:55:50.336034 select(1, [0], NULL, NULL, NULL
qmail-local (PID 6923) 22:55:15.229053 execve("bin/qmail-local", ["bin/qmail-local", "--", "krustev.net", "/home/krustev/mail/domains/krustev.net", "krustev.net-tester", "-", "tester", "krustev.net", "krustev@krustev.net", "./Maildir/"], 0x7fff05f69560 /* 1 var */) = 0 ... 22:55:15.230313 clone(child_stack=NULL, flags=CLONE_CHILD_CLEARTID|CLONE_CHILD_SETTID|SIGCHLD, child_tidptr=0x7fde0346a7d0) = 6924 22:55:15.230420 wait4(6924, [{WIFEXITED(s) && WEXITSTATUS(s) == 111}], 0, NULL) = 6924 22:55:50.335462 exit_group(111) = ? 22:55:50.335572 +++ exited with 111 +++
sh -> vlelivermail (PID 6924) 22:55:15.230511 execve("/bin/sh", ["/bin/sh", "-c", " /var/lib/vpopmail/bin/vdelivermail '' bounce-no-mailbox"], 0x564c0be3ac30 /* 19 vars */) = 0 .. 22:55:15.233866 rt_sigprocmask(SIG_BLOCK, NULL, [CHLD], 8) = 0 ... 22:55:15.233997 execve("/var/lib/vpopmail/bin/vdelivermail", ["/var/lib/vpopmail/bin/vdelivermail", "", "bounce-no-mailbox"], 0x55e1e61bcde0 /* 22 vars */) = 0 ... 22:55:15.237461 clone(child_stack=NULL, flags=CLONE_CHILD_CLEARTID|CLONE_CHILD_SETTID|SIGCHLD, child_tidptr=0x7f7bea310a10) = 6925 22:55:15.237581 wait4(6925, [{WIFEXITED(s) && WEXITSTATUS(s) == 75}], 0, NULL) = 6925 22:55:50.335255 lseek(0, -1337, SEEK_CUR) = 0 22:55:50.335283 exit_group(111) = ? 22:55:50.335385 +++ exited with 111 +++
sh -> preline (PID 6925) 22:55:15.237676 execve("/bin/sh", ["/bin/sh", "-c", "preline /usr/local/bin/vpop-lda"], 0x20a8b30 /* 23 vars */) = 0 ... 22:55:15.240248 rt_sigprocmask(SIG_BLOCK, NULL, [CHLD], 8) = 0 ... 22:55:15.241235 execve("/var/lib/qmail/bin/preline", ["preline", "/usr/local/bin/vpop-lda"], 0x564b8bbbbc50 /* 23 vars */) = 0 ... 22:55:15.242411 clone(child_stack=NULL, flags=CLONE_CHILD_CLEARTID|CLONE_CHILD_SETTID|SIGCHLD, child_tidptr=0x7f68e199c7d0) = 6926 ... 22:55:15.243005 wait4(6926, [{WIFEXITED(s) && WEXITSTATUS(s) == 75}], 0, NULL) = 6926 22:55:50.335131 exit_group(75) = ? 22:55:50.335203 +++ exited with 75 +++
vpop-lda->deliver->doveconf->deliver (PID 6926 22:55:15.242860 execve("/usr/local/bin/vpop-lda", ["/usr/local/bin/vpop-lda"], 0x7ffe55a89810 /* 23 vars */) = 0 ... 22:55:15.257985 rt_sigprocmask(SIG_BLOCK, [CHLD], [CHLD], 8) = 0 22:55:15.258010 clone(child_stack=NULL, flags=CLONE_CHILD_CLEARTID|CLONE_CHILD_SETTID|SIGCHLD, child_tidptr=0x7fd5e8098490) = 6927 ... 22:55:15.258305 wait4(6927, [{WIFEXITED(s) && WEXITSTATUS(s) == 1}], 0, NULL) = 6927 ... 22:55:15.260774 rt_sigprocmask(SIG_SETMASK, [CHLD], NULL, 8) = 0 ... 22:55:15.261905 clone(child_stack=NULL, flags=CLONE_CHILD_CLEARTID|CLONE_CHILD_SETTID|SIGCHLD, child_tidptr=0x7fd5e8098490) = 6928 ... 22:55:15.263580 wait4(6928, [{WIFEXITED(s) && WEXITSTATUS(s) == 0}], 0, NULL) = 6928 ... 22:55:15.263650 execve("/usr/lib/dovecot/deliver", ["/usr/lib/dovecot/deliver", "-e", "-a", "tester@krustev.net"], 0x563ae02171e0 /* 23 vars */) = 0 ... 22:55:15.267914 execve("/usr/bin/doveconf", ["/usr/bin/doveconf", "-f", "service=lda", "-c", "/etc/dovecot/dovecot.conf", "-m", "lda", "-e", "/usr/lib/dovecot/deliver", "-e", "-a", "tester@krustev.net"], 0x55ad578460b0 /* 25 vars */) = 0 ... 22:55:15.283482 execve("/usr/lib/dovecot/deliver", ["/usr/lib/dovecot/deliver", "-e", "-a", "tester@krustev.net"], 0x5620b4330c40 /* 61 vars */) = 0 ... 22:55:15.294788 clone(child_stack=NULL, flags=CLONE_CHILD_CLEARTID|CLONE_CHILD_SETTID|SIGCHLD, child_tidptr=0x7f15401ec2d0) = 6929 ... 22:55:15.295175 wait4(6929, 0x562102d4fee4, WNOHANG, NULL) = 0 22:55:15.295211 epoll_wait(12, [], 1, 30000) = 0 22:55:45.325580 kill(6929, SIGTERM) = 0 22:55:45.325769 epoll_wait(12, [], 1, 5000) = 0 22:55:50.331108 kill(6929, SIGKILL) = 0 22:55:50.332867 wait4(6929, [{WIFEXITED(s) && WEXITSTATUS(s) == 0}], 0, NULL) = 6929
grep (PID 6927)
22:55:15.258651 execve("/bin/grep", ["grep", "-ql", "Spam", ".dovecot.sieve"], 0x563ae02171e0 /* 23 vars */) = 0
...
22:55:15.260681 exit_group(1) = ?
22:55:15.260751 +++ exited with 1 +++
pwd (PID 6928) 22:55:15.262492 execve("/bin/pwd", ["pwd"], 0x563ae02171e0 /* 23 vars */) = 0 ... 22:55:15.263532 exit_group(0) = ? 22:55:15.263587 +++ exited with 0 +++
22:55:15.295268 execve("/usr/sbin/sendmail", ["/usr/sbin/sendmail", "-i", "-f", "tester@krustev.net", "--", "krustev@krustev.net"], NULL) = 0
...
22:55:15.296055 execve("bin/qmail-inject", ["bin/qmail-inject", "-a", "-f", "tester@krustev.net", "--", "krustev@krustev.net"], 0x55741b1890e0 /* 2 vars */) = 0
...
22:55:15.297264 vfork() = 6930
...
22:55:15.297930 wait4(6930, [{WIFEXITED(s) && WEXITSTATUS(s) == 0}], 0, NULL) = 6930
22:55:15.305391 exit_group(0) = ?
22:55:15.305472 +++ exited with 0 +++
Best regards
Delian