[Dovecot] Problem with lmtp director proxy
Hi,
I have a timeout problem only when I have heavy load in my system. I
have two director servers directing to 4 backend servers.
The problem is when my smtp relays tries to deliver mail to my users
via lmtp (proxied with director).
In the smtp logs I have:
Jun 12 11:41:18 xenon13 postfix/lmtp[4248]: 4433E5D5A0: to=<myuser@um.es>, relay=pop.um.es[155.54.212.106]:24, delay=31, delays=0.41/0.06/0/30, dsn=4.4.0, status=deferred (host pop.um.es[155.54.212.106] said: 451 4.4.0 Remote server not answering (timeout while waiting for reply to DATA reply) (in reply to end of DATA command))
If I look for this connection in the director servers I have:
Jun 12 11:40:39 myotis41 dovecot: lmtp(6595): Connect from 155.54.212.167 Jun 12 11:40:39 myotis41 dovecot: lmtp(6595): Debug: auth input: user=user1 proxy host=155.54.211.163 proxy_refresh=450 Jun 12 11:40:39 myotis41 dovecot: lmtp(6595): Debug: auth input: user=user2 proxy host=155.54.211.163 proxy_refresh=450 .... (more users, a total of 34 recipients) Jun 12 11:40:39 myotis41 dovecot: lmtp(6595): Debug: auth input: user=myuser proxy host=155.54.211.164 proxy_refresh=450 ... Jun 12 11:41:09 myotis41 dovecot: lmtp(6595): Disconnect from 155.54.212.167: Client quit (in reset)
and in one of the final server (the one for the user in question):
Jun 12 11:40:38 myotis34 dovecot: lmtp(16824): Connect from 155.54.211.186 Jun 12 11:40:38 myotis34 dovecot: lmtp(16824, <user1>): wJ9BD7YM10 +4QQAAG5O5Qg: sieve: msgid=<182283367.48.1339494011054.JavaMail.tomcat@sakai-prod4>: stored mail into mailbox 'INBOX' ... Jun 12 11:41:10 myotis34 dovecot: lmtp(16824, <myuser>): wJ9BD7YM10+4QQA AG5O5Qg: sieve: msgid=<182283367.48.1339494011054.JavaMail.tomcat@sakai-prod4>: stored mail into mailbox 'INBOX' ... Jun 12 11:41:11 myotis34 dovecot: lmtp(16824): Disconnect from 155.54.211.186: Connection closed (in reset)
So the mail seems to be correctly delivered in about 30 seconds. All my
postfix timeouts are bigger than this time:
lmtp_connect_timeout = 0s lmtp_connection_cache_time_limit = 2s lmtp_connection_reuse_time_limit = 300s lmtp_data_done_timeout = 600s lmtp_data_init_timeout = 120s lmtp_data_xfer_timeout = 180s lmtp_lhlo_timeout = 300s lmtp_mail_timeout = 300s lmtp_pix_workaround_delay_time = 10s lmtp_pix_workaround_threshold_time = 500s lmtp_quit_timeout = 300s lmtp_rcpt_timeout = 300s lmtp_rset_timeout = 20s lmtp_sasl_auth_cache_time = 90d lmtp_starttls_timeout = 300s lmtp_tls_session_cache_timeout = 3600s lmtp_xforward_timeout = 300s
So... why do I have this error?
As a side effect this mail was delivered twice in the user's mailbox,
this is one and the other when postfix retries again.
Any help? Thank you
-- Angel L. Mateo Martínez Sección de Telemática Área de Tecnologías de la Información _o) y las Comunicaciones Aplicadas (ATICA) / \\ http://www.um.es/atica _(___V Tfo: 868887590 Fax: 868888337
On Tue, Jun 12, 2012 at 12:23:28PM +0200, Angel L. Mateo wrote:
I have two director servers directing to 4 backend servers.
Which dovecot version are you running on your directors and backends?
We're running 2.0.14 plus the below linked patches and have not since this problem since applying the last one.
http://hg.dovecot.org/dovecot-2.0/raw-rev/8de8752b2e94
http://hg.dovecot.org/dovecot-2.0/rev/71084b799a6c
-jf
El 12/06/12 12:38, Jan-Frode Myklebust escribió:
On Tue, Jun 12, 2012 at 12:23:28PM +0200, Angel L. Mateo wrote:
I have two director servers directing to 4 backend servers.
Which dovecot version are you running on your directors and backends?
2.1.5
We're running 2.0.14 plus the below linked patches and have not since this problem since applying the last one.
http://hg.dovecot.org/dovecot-2.0/raw-rev/8de8752b2e94 http://hg.dovecot.org/dovecot-2.0/rev/71084b799a6c
I have checked if those patchs are included:
- The first one seems not to apply, because it's for lmtp-proxy.c and this file seems completely different than the one in the patch
- The second is already applied
-- Angel L. Mateo Martínez Sección de Telemática Área de Tecnologías de la Información _o) y las Comunicaciones Aplicadas (ATICA) / \\ http://www.um.es/atica _(___V Tfo: 868887590 Fax: 868888337
Hi,
I have checked in almost every error I had that the error is produced
whenever happens a timeout of 30 seconds between opening the connection between the director and backend server and the final delivery of the message in the user's mailbox.
When I have mails with just a few of recipients, I have no problem
because this 30 seconds timeout is never reached. But when I have mails with more recipients and my storage has workload it is sometimes reached.
But I haven't found any configuration for this 30 seconds timeout. What
could it be this option? Because I have configured proxy_timeout=120 in proxy configuration:
pass_attrs = irisMailbox=userdb_mail,homeDirectory=userdb_home,uidNumber=userdb_uid,gidNumber=userdb_gid,=proxy=y,=proxy_timeout=120,irisMailHost=host
Looking for this timeout in the code, I have found these defines:
director/director.c:#define DIRECTOR_RECONNECT_TIMEOUT_MSECS (30*1000) director/director.c:#define DIRECTOR_USER_MOVE_TIMEOUT_MSECS (30*1000) director/director-connection.c:#define DIRECTOR_CONNECTION_SEND_USERS_TIMEOUT_MSECS (30*1000) director/director-connection.c:#define DIRECTOR_CONNECTION_DONE_TIMEOUT_MSECS (30*1000) director/director-request.c:#define DIRECTOR_REQUEST_TIMEOUT_SECS 30
lmtp/commands.c:#define LMTP_PROXY_DEFAULT_TIMEOUT_MSECS (1000*30)
Could it be one of these timeouts? In this case... is there any way to
configure it without changing code?
-- Angel L. Mateo Martínez Sección de Telemática Área de Tecnologías de la Información _o) y las Comunicaciones Aplicadas (ATICA) / \\ http://www.um.es/atica _(___V Tfo: 868887590 Fax: 868888337
On Wed, 2012-06-13 at 08:58 +0200, Angel L. Mateo wrote:
I have checked in almost every error I had that the error is produced whenever happens a timeout of 30 seconds between opening the connection between the director and backend server and the final delivery of the message in the user's mailbox.
When I have mails with just a few of recipients, I have no problem because this 30 seconds timeout is never reached. But when I have mails with more recipients and my storage has workload it is sometimes reached.
Ah, so it's not really a bug. I thought it might be because there had been such problems before.
But I haven't found any configuration for this 30 seconds timeout. What could it be this option? Because I have configured proxy_timeout=120 in proxy configuration:
pass_attrs = irisMailbox=userdb_mail,homeDirectory=userdb_home,uidNumber=userdb_uid,gidNumber=userdb_gid,=proxy=y,=proxy_timeout=120,irisMailHost=host
This should work..
lmtp/commands.c:#define LMTP_PROXY_DEFAULT_TIMEOUT_MSECS (1000*30)
This is the default, but proxy_timeout should override it.
What do you get in logs with auth_debug=yes?
El 13/06/12 14:59, Timo Sirainen escribió:
On Wed, 2012-06-13 at 08:58 +0200, Angel L. Mateo wrote:
I have checked in almost every error I had that the error is produced whenever happens a timeout of 30 seconds between opening the connection between the director and backend server and the final delivery of the message in the user's mailbox.
When I have mails with just a few of recipients, I have no problem because this 30 seconds timeout is never reached. But when I have mails with more recipients and my storage has workload it is sometimes reached.
Ah, so it's not really a bug. I thought it might be because there had been such problems before.
But I haven't found any configuration for this 30 seconds timeout. What could it be this option? Because I have configured proxy_timeout=120 in proxy configuration:
pass_attrs = irisMailbox=userdb_mail,homeDirectory=userdb_home,uidNumber=userdb_uid,gidNumber=userdb_gid,=proxy=y,=proxy_timeout=120,irisMailHost=host
This should work..
lmtp/commands.c:#define LMTP_PROXY_DEFAULT_TIMEOUT_MSECS (1000*30)
This is the default, but proxy_timeout should override it.
But then, why timeout is reached after only 30 seconds? Could it be
other define timeout like DIRECTOR_CONNECTION_DONE_TIMEOUT_MSECS?
What do you get in logs with auth_debug=yes?
I've got always auth_debug=yes. In the director server, logs are:
Jun 12 11:40:39 myotis41 dovecot: lmtp(6595): Connect from 155.54.212.167 Jun 12 11:40:39 myotis41 dovecot: lmtp(6595): Debug: auth input: user=user1 proxy host=155.54.211.163 proxy_refresh=450 Jun 12 11:40:39 myotis41 dovecot: lmtp(6595): Debug: auth input: user=user2 proxy host=155.54.211.163 proxy_refresh=450 .... (more users, a total of 34 recipients) Jun 12 11:40:39 myotis41 dovecot: lmtp(6595): Debug: auth input: user=myuser proxy host=155.54.211.164 proxy_refresh=450 ... Jun 12 11:41:09 myotis41 dovecot: lmtp(6595): Disconnect from 155.54.212.167: Client quit (in reset)
but I have checked with newer errors, all I see in logs are "Connect
from" and "Disconnect from" messages. The logs "lmtp...Debug:" are not produced any more (maybe because director has this information yet?)
At backend servers are:
Jun 12 11:40:38 myotis34 dovecot: lmtp(16824): Connect from 155.54.211.186 Jun 12 11:40:38 myotis34 dovecot: lmtp(16824, <user1>): wJ9BD7YM10 +4QQAAG5O5Qg: sieve: msgid=<182283367.48.1339494011054.JavaMail.tomcat@sakai-prod4>: stored mail into mailbox 'INBOX' ... Jun 12 11:41:10 myotis34 dovecot: lmtp(16824, <myuser>): wJ9BD7YM10+4QQA AG5O5Qg: sieve: msgid=<182283367.48.1339494011054.JavaMail.tomcat@sakai-prod4>: stored mail into mailbox 'INBOX'
On Wed, 2012-06-13 at 16:57 +0200, Angel L. Mateo wrote:
Jun 12 11:40:39 myotis41 dovecot: lmtp(6595): Connect from 155.54.212.167 Jun 12 11:40:39 myotis41 dovecot: lmtp(6595): Debug: auth input: user=user1 proxy host=155.54.211.163 proxy_refresh=450
That says proxy_refresh, not proxy_timeout.
but I have checked with newer errors, all I see in logs are "Connect from" and "Disconnect from" messages. The logs "lmtp...Debug:" are not produced any more (maybe because director has this information yet?)
Director shouldn't affect it. There should still be auth input lines logged. doveconf -n?
El 13/06/12 17:17, Timo Sirainen escribió:
On Wed, 2012-06-13 at 16:57 +0200, Angel L. Mateo wrote:
Jun 12 11:40:39 myotis41 dovecot: lmtp(6595): Connect from 155.54.212.167 Jun 12 11:40:39 myotis41 dovecot: lmtp(6595): Debug: auth input: user=user1 proxy host=155.54.211.163 proxy_refresh=450
That says proxy_refresh, not proxy_timeout.
but I have checked with newer errors, all I see in logs are "Connect from" and "Disconnect from" messages. The logs "lmtp...Debug:" are not produced any more (maybe because director has this information yet?)
Director shouldn't affect it. There should still be auth input lines logged. doveconf -n?
Ok, you were right. I was looking for logs at my log repository, which
doesn't receive debug log. Nevertheless, the only auth lines I have found at the ones above, with the proxy_refresh=450. I haven't found any line with a timeout log in the proxies neither the backends
So, at director servers the only logs I have are the one I have already
sent. At the backend server, I have more logs, that are:
Jun 12 11:40:38 myotis34 dovecot: lmtp(16824): Debug: none: root=, index=, control=, inbox=, alt= Jun 12 11:40:38 myotis34 dovecot: lmtp(16824): Connect from 155.54.211.186 Jun 12 11:40:38 myotis34 dovecot: lmtp(16824): Debug: auth input: user1 home=<user1 home> uid=261853 gid=1001 Jun 12 11:40:38 myotis34 dovecot: lmtp(16824): Debug: auth input: <user2> home=<user2's home> uid=262339 gid=1001
... (more recipients for the same message)
Jun 12 11:40:38 myotis34 dovecot: lmtp(16824): Debug: auth input: <myuser> home=<myuser's home> uid=255606 gid=1001 ... (more recipients for the same message)
Jun 12 11:41:08 myotis34 dovecot: lmtp(16824, <myuser>): Debug: Effective uid=255606, gid=1001, home=<myuser's home> Jun 12 11:41:08 myotis34 dovecot: lmtp(16824, <myuser>): Debug: maildir++: root=<myuser's home>/Maildir, index=/var/indexes/<myuser>, control=, inbox=<myuser's home>/Maildir, alt= Jun 12 11:41:08 myotis34 dovecot: lmtp(16824, <myuser>): Debug: sieve: include: sieve_global_dir is not set; it is currently not possible to include `:global' scripts. Jun 12 11:41:08 myotis34 dovecot: lmtp(16824, <myuser>): Debug: wJ9BD7YM10+4QQAAG5O5Qg: sieve: using sieve path for user's script: <myuser's home>/.dovecot.sieve Jun 12 11:41:08 myotis34 dovecot: lmtp(16824, <myuser>): Debug: wJ9BD7YM10+4QQAAG5O5Qg: sieve: opening script <myuser's home>/.dovecot.sieve Jun 12 11:41:08 myotis34 dovecot: lmtp(16824, <myuser>): Debug: wJ9BD7YM10+4QQAAG5O5Qg: sieve: script binary <myuser's home>/.dovecot.svbin successfully loaded Jun 12 11:41:08 myotis34 dovecot: lmtp(16824, <myuser>): Debug: wJ9BD7YM10+4QQAAG5O5Qg: sieve: binary save: not saving binary <myuser's home>/.dovecot.svbin, because it is already stored Jun 12 11:41:08 myotis34 dovecot: lmtp(16824, <myuser>): Debug: wJ9BD7YM10+4QQAAG5O5Qg: sieve: executing script from <myuser's home>/.dovecot.svbin Jun 12 11:41:10 myotis34 dovecot: lmtp(16824, <myuser>): wJ9BD7YM10+4QQAAG5O5Qg: sieve: msgid=<182283367.48.1339494011054.JavaMail.tomcat@sakai-prod4>: stored mail into mailbox 'INBOX' Jun 12 11:41:11 myotis34 dovecot: lmtp(16824): Disconnect from 155.54.211.186: Connection closed (in reset)
I have attached output of doveconf -n.
What I have observed is that problem ocurrs when I have mails with lot
of recipients, and happens to all recipients which data ends more than 30 seconds after the connection was established. Maybe this timeout has to be counted since the begining of the data command, not the establishment of the tcp connection, or, if this is another timer, the first should be greater.
On 13.6.2012, at 20.11, Angel L. Mateo wrote:
Ok, you were right. I was looking for logs at my log repository, which doesn't receive debug log. Nevertheless, the only auth lines I have found at the ones above, with the proxy_refresh=450. I haven't found any line with a timeout log in the proxies neither the backends
The backend logs don't matter. Director adds the proxy_refresh. You haven't shown in your logs what auth process logs as debug messages. This is what is supposed to happen:
Jun 16 02:19:11 auth: Debug: master out: PASS 1 user=director proxy proxy_timeout=1000
"master out" must return proxy_timeout=1000. If it doesn't, then the problem is with your auth settings.
Jun 16 02:19:11 lmtp(11845): Debug: auth input: user=director proxy proxy_timeout=1000 host=1.2.3.4 proxy_refresh=450
Director adds proxy_refresh, but preserves proxy_timeout.
El 16/06/12 01:22, Timo Sirainen escribió:
On 13.6.2012, at 20.11, Angel L. Mateo wrote:
Ok, you were right. I was looking for logs at my log repository, which doesn't receive debug log. Nevertheless, the only auth lines I have found at the ones above, with the proxy_refresh=450. I haven't found any line with a timeout log in the proxies neither the backends
The backend logs don't matter. Director adds the proxy_refresh. You haven't shown in your logs what auth process logs as debug messages. This is what is supposed to happen:
Jun 16 02:19:11 auth: Debug: master out: PASS 1 user=director proxy proxy_timeout=1000
I don't have any log like this.
"master out" must return proxy_timeout=1000. If it doesn't, then the problem is with your auth settings.
Jun 16 02:19:11 lmtp(11845): Debug: auth input: user=director proxy proxy_timeout=1000 host=1.2.3.4 proxy_refresh=450
Director adds proxy_refresh, but preserves proxy_timeout.
I can find these logs, but they don't include any proxy_timeout option,
all of them are like:
Jun 18 08:26:26 myotis41 dovecot: lmtp(640): Debug: auth input: user=<user> proxy host=155.54.211.164 proxy_refresh=450
But I have found, I think, the problem... I had configured 2 user backends:
!include auth-master.conf.ext !include auth-ldap.conf.ext
The first for master password, and the other, to get users from a ldap
directory. In my auth-ldap.conf.ext I changed the ldap driver for passdb to static (I can't check user password in the director for other reasons), so I had:
passdb { driver = static args = proxy=y nopassword=y }
userdb { driver = prefetch }
userdb { driver = ldap args = /etc/dovecot/dovecot-ldap.conf.ext }
so, although in the dovecot-ldap.conf.ext I have:
pass_attrs = irisMailbox=userdb_mail,homeDirectory=userdb_home,uidNumber=userdb_uid,gidNumber=userdb_gid,=proxy=y,=proxy_timeout=120,irisMailHost=host
it seems that it isn't used and proxy_timeout it's not defined.
So I have changed passdb definition to be:
passdb { driver = static args = proxy=y nopassword=y proxy_timeout=120 }
and now logs are like:
Jun 18 08:46:18 myotis40 dovecot: lmtp(11276): Debug: auth input: user=<user> proxy proxy_timeout=120 host=155.54.211.169 proxy_refresh=450
Is this correct?
On 18.6.2012, at 9.52, Angel L. Mateo wrote:
El 16/06/12 01:22, Timo Sirainen escribió:
On 13.6.2012, at 20.11, Angel L. Mateo wrote:
Ok, you were right. I was looking for logs at my log repository, which doesn't receive debug log. Nevertheless, the only auth lines I have found at the ones above, with the proxy_refresh=450. I haven't found any line with a timeout log in the proxies neither the backends
The backend logs don't matter. Director adds the proxy_refresh. You haven't shown in your logs what auth process logs as debug messages. This is what is supposed to happen:
Jun 16 02:19:11 auth: Debug: master out: PASS 1 user=director proxy proxy_timeout=1000
I don't have any log like this.
Then you don't have auth_debug=yes.
Jun 18 08:46:18 myotis40 dovecot: lmtp(11276): Debug: auth input: user=<user> proxy proxy_timeout=120 host=155.54.211.169 proxy_refresh=450
Is this correct?
Yeah.
El 18/06/12 15:50, Timo Sirainen escribió:
On 18.6.2012, at 9.52, Angel L. Mateo wrote:
El 16/06/12 01:22, Timo Sirainen escribió:
On 13.6.2012, at 20.11, Angel L. Mateo wrote:
Ok, you were right. I was looking for logs at my log repository, which doesn't receive debug log. Nevertheless, the only auth lines I have found at the ones above, with the proxy_refresh=450. I haven't found any line with a timeout log in the proxies neither the backends
The backend logs don't matter. Director adds the proxy_refresh. You haven't shown in your logs what auth process logs as debug messages. This is what is supposed to happen:
Jun 16 02:19:11 auth: Debug: master out: PASS 1 user=director proxy proxy_timeout=1000
I don't have any log like this.
Then you don't have auth_debug=yes.
I had this option. Relooking I have found these logs. I didn't see them
before because of the format and because they aren't related with lmtp. I have them in the form:
Jun 18 12:18:30 myotis41 dovecot: auth: Debug: master out: PASS#01160#011user=<myuser>#011proxy#011proxy_timeout=150
Jun 18 08:46:18 myotis40 dovecot: lmtp(11276): Debug: auth input: user=<user> proxy proxy_timeout=120 host=155.54.211.169 proxy_refresh=450
Is this correct?
Yeah.
Anyway, with the last change (defining the proxy_timeout at the static
passdb definition, default 30 seconds timeout hasn't been applied anymore.
participants (3)
-
Angel L. Mateo
-
Jan-Frode Myklebust
-
Timo Sirainen