Hi Timo, sorry for late answer, but I'm trying to look deeper at problem. Stracing both director and real lmtp server, I've noticed that both of them seem to be stuck waiting for input on socket fd. Director's lmtp seem to be waiting some input from real server, but mail message is not yet sent completely and there no real reason to wait for it.
I hope this help.
Luca
Director lmtp's strace (fd 11 is socket from postfix server, fd 13 is socket to dovecot real server):
08:33:04.418921 read(13, "250-xxxxxxxi"..., 970) = 104
08:33:04.418978 write(13, "RCPT TO:xxxxxxx@unict.it\r\n", 28) = 28
08:33:04.419034 read(13, 0x815aa06, 866) = -1 EAGAIN (Resource
temporarily unavailable)
08:33:04.419076 gettimeofday({1285741984, 419092}, NULL) = 0
08:33:04.419120 epoll_wait(10, {{EPOLLIN, {u32=134706928,
u64=134706928}}}, 103, 964) = 1
08:33:04.447847 gettimeofday({1285741984, 447864}, NULL) = 0
08:33:04.447892 read(13, "250 2.1.5 OK\r\n", 866) = 14
08:33:04.447941 write(13, "DATA\r\n", 6) = 6
08:33:04.447991 read(13, 0x815aa14, 852) = -1 EAGAIN (Resource
temporarily unavailable)
08:33:04.448033 gettimeofday({1285741984, 448049}, NULL) = 0
08:33:04.448077 epoll_wait(10, {{EPOLLIN, {u32=134706928,
u64=134706928}}}, 103, 935) = 1
08:33:04.449844 gettimeofday({1285741984, 449860}, NULL) = 0
08:33:04.449888 read(13, "354 OK\r\n", 852) = 8
08:33:04.449952 setsockopt(13, SOL_TCP, TCP_CORK, [1], 4) = 0
08:33:04.449995 write(13, "Return-Path: 08:33:04.483268 fcntl64(13, F_GETFL) = 0x2 (flags O_RDWR)
08:33:04.483373 fcntl64(13, F_SETFL, O_RDWR|O_NONBLOCK) = 0
08:33:04.483478 write(5, "\350I\0\0}\320\r\0\347\3\0\0"..., 12) = 12
08:33:04.483610 fcntl64(13, F_GETFL) = 0x802 (flags O_RDWR|O_NONBLOCK)
08:33:04.483716 fcntl64(13, F_SETFL, O_RDWR|O_NONBLOCK) = 0
08:33:04.483822 fcntl64(13, F_GETFL) = 0x802 (flags O_RDWR|O_NONBLOCK)
08:33:04.483929 fcntl64(13, F_SETFL, O_RDWR|O_NONBLOCK) = 0
08:33:04.484050 getsockname(13, {sa_family=AF_INET, sin_port=htons(24),
sin_addr=inet_addr("10.194.1.31")}, [16]) = 0
08:33:04.484219 fstat64(13, {st_mode=S_IFSOCK|0777, st_size=0, ...}) = 0
08:33:04.484384 _llseek(13, 0, 0xbfa785b0, SEEK_CUR) = -1 ESPIPE
(Illegal seek)
08:33:04.484499 getsockname(13, {sa_family=AF_INET, sin_port=htons(24),
sin_addr=inet_addr("10.194.1.31")}, [16]) = 0
08:33:04.484670 epoll_ctl(12, EPOLL_CTL_ADD, 13,
{EPOLLIN|EPOLLPRI|EPOLLERR|EPOLLHUP, {u32=155558960, u64=155558960}}) = 0
08:33:04.484788 gettimeofday({1285741984, 484825}, NULL) = 0
08:33:04.485029 write(13, "220 snip.it"..., 54) = 54
08:33:04.485387 gettimeofday({1285741984, 485425}, NULL) = 0
08:33:04.486757 read(13, "LHLO mailgw-lb\r\nMAIL FROM: On 09/13/2010 02:06 PM, Timo Sirainen wrote: On Fri, 2010-09-10 at 09:18 +0200, Luca Palazzo wrote: we are getting a lot of this message from dovecot director. They seem to
be too many.
Backend servers do not log anything and postfix logs a lot of these errors.
It seems that dovecot strategy handgling data output timeout when
proxying LMTP is or too aggressive or may have some other problem. I guess you could try if simply increasing the timeout helps. You need
to recompile though: src/lmtp/commands.c line 36: #define LMTP_PROXY_DEFAULT_TIMEOUT_MSECS (1000*30) Another symptom, maybe related to duplicate uid patch, is that we are
receiving blank email. What patch do you still have? I think v2.0.2 fixed the duplicates?c"..., 4096)
= 4096
08:33:04.451232 writev(13, [{"amp/bg-box-title.gif); backgroun"...,
4436}, {"cellpadding=3D\"0\" border=3D\"0\" w"..., 1160}], 2) = -1
EAGAIN (Resource temporarily unavailable)
08:33:04.451302 gettimeofday({1285741984, 451319}, NULL) = 0
08:33:04.451347 read(11, "=\r\ni e bar - Annuario Qualit&agr"..., 4096)
= 4096
08:33:04.451429 writev(13, [{"amp/bg-box-title.gif); backgroun"...,
4436}, {"cellpadding=3D\"0\" border=3D\"0\" w"..., 3033}], 2) = -1
EAGAIN (Resource temporarily unavailable)
08:33:04.451498 gettimeofday({1285741984, 451515}, NULL) = 0
08:33:04.451543 writev(13, [{"amp/bg-box-title.gif); backgroun"...,
4436}, {"cellpadding=3D\"0\" border=3D\"0\" w"..., 3033}], 2) = -1
EAGAIN (Resource temporarily unavailable)
08:33:04.451612 gettimeofday({1285741984, 451629}, NULL) = 0
08:33:04.451656 read(11, "or:#666666;padding:1px 10px 1px "..., 4096) = 3242
08:33:04.451723 writev(13, [{"amp/bg-box-title.gif); backgroun"...,
4436}, {"cellpadding=3D\"0\" border=3D\"0\" w"..., 3033}], 2) = -1
EAGAIN (Resource temporarily unavailable)
08:33:04.451792 gettimeofday({1285741984, 451858}, NULL) = 0
08:33:04.451887 gettimeofday({1285741984, 451904}, NULL) = 0
08:33:04.451931 writev(13, [{"amp/bg-box-title.gif); backgroun"...,
4436}, {"cellpadding=3D\"0\" border=3D\"0\" w"..., 3033}], 2) = 7469
08:33:04.452016 setsockopt(13, SOL_TCP, TCP_CORK, [0], 4) = 0
08:33:04.452060 read(13, 0x815aa1c, 844) = -1 EAGAIN (Resource
temporarily unavailable)
... and after 60secs timeout...
08:34:04.460271 setsockopt(11, SOL_TCP, TCP_CORK, [1], 4) = 0
08:34:04.460327 write(11, "451 4.4.0 Remote server not answ"..., 61) = 61
08:34:04.460391 setsockopt(11, SOL_TCP, TCP_CORK, [0], 4) = 0
08:34:04.460443 epoll_ctl(10, EPOLL_CTL_DEL, 13, {0, {u32=134706928,
u64=134706928}}) = 0
08:34:04.460511 close(13) = 0
08:34:04.460662 read(11, "adding:1px 10px 1px 10px;font-we"..., 854) = 854
08:34:04.460722 gettimeofday({1285742044, 460739}, NULL) = 0
08:34:04.460766 read(11, "2px;\" align=3D\"right\">&nb"..., 4096) =
1789
08:34:04.461087 gettimeofday({1285742044, 461103}, NULL) = 0
08:34:04.461130 gettimeofday({1285742044, 461146}, NULL) = 0
08:34:04.461173 gettimeofday({1285742044, 461189}, NULL) = 0
08:34:04.461223 epoll_ctl(10, EPOLL_CTL_ADD, 11,
{EPOLLIN|EPOLLPRI|EPOLLERR|EPOLLHUP, {u32=134583344, u64=134583344}}) = 0
08:34:04.461271 gettimeofday({1285742044, 461288}, NULL) = 0
08:34:04.461333 write(2, "\1\0022997 Disconnect from 10.194.1."..., 48) = 48
08:34:04.461423 epoll_ctl(10, EPOLL_CTL_DEL, 11, {0, {u32=134583344,
u64=134583344}}) = 0
08:34:04.461514 setsockopt(11, SOL_TCP, TCP_CORK, [1], 4) = 0
08:34:04.461561 write(11, "221 2.0.0 Client quit\r\n", 23) = 23
08:34:04.461614 close(11) = 0
Server lmtp's strace:
http://www.viadeo.com/profile/00"..., 116) = 116
08:33:04.522152 read(13, "t-profile&utm_content=3DV1_100420"..., 4096) =
4096
08:33:04.522358 read(13, ";\" valign=3D\"top\">></td> =\r\nso"..., 4096)
= 2057
08:33:04.522852 read(13, 0x9536f31, 2039) = -1 EAGAIN (Resource
temporarily unavailable)
... and after 60secs timeout...
08:34:04.548338 read(13, ""..., 2039) = 0
08:34:04.548465 write(2, "\1\00218920 Disconnect from 10.194.1."..., 55)
= 55
08:34:04.548654 epoll_ctl(12, EPOLL_CTL_DEL, 13, {0, {u32=155558960,
u64=155558960}}) = 0
08:34:04.548784 close(13) = 0