Hi,

I can confirm that in the bad behavior, the 5 second delay occurs at each AUTH in our case. I think the configuration we have kill auth process at each end of AUTH (and fork a new one for next next AUTH). I think the "disable" flag is local to the process that is killed / respawned.

A collegue of mine, Laurent Guerby, has found a workaround for us and it's findings seems very valuable for this ticket.

We now know how to not having the troubles in our setup : by manual removing the auth-penalty socket with "rm" (or by rebooting the whole VM because the socket is in a tmpfs). The rm is needed only one time.

I think now a clean reproducer of the problem is :

(I go through a fairly similar one but not this exact one)

The socket open retries loop happens if the socket exists but noone "listen" on it.
The socket open retries loop don't happen if the socket does not exists at all.

(I think there is 1 try, the system returns non existing error, this bails out)

Regards,
Ludovic

On 06/01/2019 20:04, Stephan Bosch wrote:

Op 20/12/2018 om 18:09 schreef Ludovic Pouzenc:
Hi,

I hit a bizare problem with dovecot 2.2.7 on debian 9 with LMTP enabled and auth/penalty disabled as documented here :
  https://wiki.dovecot.org/Authentication/Penalty

Use case : I run a swaks command to send an email to an exim4 that tries to make a callout to dovecot-lmtp.
At RCPT TO: swaks hangs 5.0<something-small> seconds then process normally (exim is waiting for callout completion).

with strace, I see 5 second delay with many tries to socket connection.

The sources confirm this behavior, but only for the first connection attempt (i.e. the first 5 seconds of reconnections), which happens at auth startup (which then takes 5 seconds longer). After that, it will mark auth-penalty as disabled and it will not try connecting to it again. Is that what you're referring to?


Regards,

Stephan.





(PID 9652 was an auth process freshly forked)

[pid  9652] socket(AF_UNIX, SOCK_STREAM, 0) = 14
[pid  9652] fcntl(14, F_GETFL)          = 0x2 (flags O_RDWR)
[pid  9652] fcntl(14, F_SETFL, O_RDWR|O_NONBLOCK) = 0
[pid  9652] connect(14, {sa_family=AF_UNIX, sun_path="anvil-auth-penalty"}, 110) = -1 ECONNREFUSED (Connection refused)
[pid  9652] close(14)                   = 0
[pid  9652] nanosleep({tv_sec=0, tv_nsec=20000000}, NULL) = 0
[pid  9652] socket(AF_UNIX, SOCK_STREAM, 0) = 14
[pid  9652] fcntl(14, F_GETFL)          = 0x2 (flags O_RDWR)
[pid  9652] fcntl(14, F_SETFL, O_RDWR|O_NONBLOCK) = 0
[pid  9652] connect(14, {sa_family=AF_UNIX, sun_path="anvil-auth-penalty"}, 110) = -1 ECONNREFUSED (Connection refused)
[pid  9652] close(14)                   = 0
[pid  9652] nanosleep({tv_sec=0, tv_nsec=50000000}, NULL) = 0
[pid  9652] socket(AF_UNIX, SOCK_STREAM, 0) = 14
[pid  9652] fcntl(14, F_GETFL)          = 0x2 (flags O_RDWR)
[pid  9652] fcntl(14, F_SETFL, O_RDWR|O_NONBLOCK) = 0
[pid  9652] connect(14, {sa_family=AF_UNIX, sun_path="anvil-auth-penalty"}, 110) = -1 ECONNREFUSED (Connection refused)
[pid  9652] close(14)                   = 0
[pid  9652] nanosleep({tv_sec=0, tv_nsec=90000000}, NULL) = 0


with ddd I get to:
src/lib-master/anvil-client.c: int anvil_client_connect(struct anvil_client *client, bool retry) {
//[...]
fd = retry ? net_connect_unix_with_retries(client->path, 5000) : net_connect_unix(client->path);
//[...]

and retry is forced to TRUE by the caller and net_connect_unix_with_retries retries in case of ECONNREFUSED.


How I get into it :

ssh -X root@telegraphe-test

apt install dovecot-dbg
cd /dev/shm
apt source dovecot
cd dovecot-2.2.27
pidof dovecot
ddd /usr/sbin/dovecot
set follow-fork-mode child
attach 11833
cont

# Send an e-mail from swaks to exim that use transport=smtp in LMTP mode to dovecot-lmtp, press interrupt while exim is blocking after RCPT TO:


(gdb) bt
#0  0x00007f81528af270 in __nanosleep_nocancel () at ../sysdeps/unix/syscall-template.S:84
#1  0x00007f81528d8b84 in usleep (useconds=<optimized out>) at ../sysdeps/posix/usleep.c:32
#2  0x00007f815329368e in net_connect_unix_with_retries (path=0x562dc200ce10 <error: Cannot access memory at address 0x562dc200ce10>, msecs=msecs@entry=5000) at net.c:347
#3  0x00007f815320ce0d in anvil_client_connect (client=0x562dc200cda0, retry=retry@entry=true) at anvil-client.c:136
#4  0x0000562dc1429f5d in auth_penalty_init (path=0x562dc1451846 <error: Cannot access memory at address 0x562dc1451846>) at auth-penalty.c:37
#5  0x0000562dc1424166 in main_preinit () at main.c:202
#6  main (argc=<optimized out>, argv=<optimized out>) at main.c:396

(gdb) bt full
#1  0x00007f81528d8b84 in usleep (useconds=<optimized out>) at ../sysdeps/posix/usleep.c:32
         ts = {tv_sec = 0, tv_nsec = 80000000}
#2  0x00007f815329368e in net_connect_unix_with_retries (path=0x562dc200ce10 <error: Cannot access memory at address 0x562dc200ce10>, msecs=msecs@entry=5000) at net.c:347
         start = {tv_sec = 1545319486, tv_usec = 241864}
         now = {tv_sec = 1545319490, tv_usec = 198061}
         fd = -1
#3  0x00007f815320ce0d in anvil_client_connect (client=0x562dc200cda0, retry=retry@entry=true) at anvil-client.c:136
         fd = <optimized out>
         __FUNCTION__ = <error reading variable __FUNCTION__ (Cannot access memory at address 0x7f81532a8260)>
#4  0x0000562dc1429f5d in auth_penalty_init (path=0x562dc1451846 <error: Cannot access memory at address 0x562dc1451846>) at auth-penalty.c:37
No locals.
#5  0x0000562dc1424166 in main_preinit () at main.c:202
         mod_set = <error reading variable mod_set (Cannot access memory at address 0x7ffda885db80)>
#6  main (argc=<optimized out>, argv=<optimized out>) at main.c:396
         c = <optimized out>
Backtrace stopped: Cannot access memory at address 0x7ffda885dbf8
(gdb)

Code I've read :

src/auth/main.c: static void main_preinit(void)

         if (!worker)
                 auth_penalty = auth_penalty_init(AUTH_PENALTY_ANVIL_PATH);
         auth_request_stats_init();

src/auth/auth-penalty.c: struct auth_penalty *auth_penalty_init(const char *path)

         if (anvil_client_connect(penalty->client, TRUE) < 0)
                 penalty->disabled = TRUE;
         else {

src/lib-master/anvil-client.c: int anvil_client_connect(struct anvil_client *client, bool retry)

         fd = retry ? net_connect_unix_with_retries(client->path, 5000) :
                 net_connect_unix(client->path);

src/lib/net.c : int net_connect_unix_with_retries(const char *path, unsigned int msecs) {
         struct timeval start, now;
         int fd;
              if (gettimeofday(&start, NULL) < 0)
                 i_panic("gettimeofday() failed: %m");
              do {
                 fd = net_connect_unix(path);
                 if (fd != -1 || (errno != EAGAIN && errno != ECONNREFUSED))  // <------------------ HERE, I go into ECONNREFUSED case during 5 seconds
                         break;
                      /* busy. wait for a while. */
                 usleep(((rand() % 10) + 1) * 10000);
                 if (gettimeofday(&now, NULL) < 0)
                         i_panic("gettimeofday() failed: %m");
         } while (timeval_diff_msecs(&now, &start) < (int)msecs);
         return fd;
}

Additionnal environment infos (mailboxes are NFS-backed) :

lsb_release -a
No LSB modules are available.
Distributor ID:    Debian
Description:    Debian GNU/Linux 9.2 (stretch)
Release:    9.2
Codename:    stretch


mount | grep ^/
/dev/sda1 on / type ext4 (rw,relatime,errors=remount-ro,data=ordered)

mount | grep nfs
nfs1:/mail/maildir on /var/maildir type nfs (rw,relatime,vers=3,rsize=65536,wsize=65536,namlen=255,hard,proto=tcp,timeo=600,retrans=2,sec=sys,mountaddr=172.16.2.28,mountvers=3,mountport=635,mountproto=udp,local_lock=none,addr=172.16.2.28)
nfs1:/mail on /var/mail type nfs (rw,relatime,vers=3,rsize=65536,wsize=65536,namlen=255,hard,proto=tcp,timeo=600,retrans=2,sec=sys,mountaddr=172.16.2.28,mountvers=3,mountport=635,mountproto=udp,local_lock=none,addr=172.16.2.28)
cifs1:/homepermanents/info/lpouzenc on /home/lpouzenc type nfs (rw,nosuid,nodev,relatime,vers=3,rsize=65536,wsize=65536,namlen=255,hard,proto=tcp,timeo=600,retrans=2,sec=sys,mountaddr=172.16.2.29,mountvers=3,mountport=635,mountproto=udp,local_lock=none,addr=172.16.2.29)



telegraphe-test:~# apt-cache policy dovecot-core
dovecot-core:
   Installé : 1:2.2.27-3+deb9u2
   Candidat : 1:2.2.27-3+deb9u2
  Table de version :
  *** 1:2.2.27-3+deb9u2 500
         500http://miroir/debian  stretch/main amd64 Packages
         500http://security.debian.org/debian-security  stretch/updates/main amd64 Packages
         100 /var/lib/dpkg/status

telegraphe-test:~# dovecot -n
# 2.2.27 (c0f36b0): /etc/dovecot/dovecot.conf
# Pigeonhole version 0.4.16 (fed8554)
doveconf: Warning: service auth { client_limit=10000 } is lower than required under max. load (10240)
# OS: Linux 4.9.0-4-amd64 x86_64 Debian 9.2 nfs
auth_verbose = yes
default_client_limit = 10000
default_process_limit = 2048
lda_mailbox_autocreate = yes
lda_mailbox_autosubscribe = yes
mail_debug = yes
mail_fsync = always
mail_location = maildir:/var/mail/maildir/%u
managesieve_notify_capability = mailto
managesieve_sieve_capability = fileinto reject envelope encoded-character vacation subaddress comparator-i;ascii-numeric relational regex imap4flags copy include variables body enotify environment mailbox date index ihave duplicate mime foreverypart extracttext
mmap_disable = yes
namespace inbox {
   inbox = yes
   location =
   mailbox Drafts {
     special_use = \Drafts
   }
   mailbox Junk {
     special_use = \Junk
   }
   mailbox Sent {
     special_use = \Sent
   }
   mailbox "Sent Messages" {
     special_use = \Sent
   }
   mailbox Trash {
     special_use = \Trash
   }
   prefix =
   separator = /
}
passdb {
   driver = pam
}
plugin {
   sieve =file:/var/mail/mailconf/%1u/%u/sieve;active=/var/mail/mailconf/%1u/%u/.dovecot.sieve
   sieve_before = /var/mail/mailconf/global/sieve/before.d
   sieve_global = /var/mail/mailconf/global/sieve
   sieve_vacation_dont_check_recipient = yes
}
protocols = " imap lmtp sieve sieve"
service anvil {
   unix_listener anvil-auth-penalty {
     mode = 00
   }
}
service lmtp {
   inet_listener lmtp {
     port = 24
   }
}
service managesieve {
   process_limit = 16
}
ssl_cert = </etc/ssl/certs/telegraphe7_cert.pem
ssl_key =  # hidden, use -P to show it
syslog_facility = local0
userdb {
   driver = passwd
}
verbose_proctitle = yes
protocol lmtp {
   auth_username_format = %n
   mail_plugins = " sieve"
}
protocol imap {
   mail_max_userip_connections = 20
}



Could you see a config tuning that keeps auth-penalty out of the way (it seems to behaves badly here, because we have perdition IMAP proxy in front, and it keeps track only of his IP and adds 2/4/8/15s delays to "random" legit clients.

Could it be saved with a dedicated config entry to disable auth-penalty ?

Is there solution to have auth-penalty working while having perdition IMAP proxy ?

Regards,
Ludovic Pouzenc
-- 
Ludovic POUZENC - Ingénieur Système
IMT Mines Albi  (http://www.imt-mines-albi.fr)
81013 ALBI CT Cedex 09 - Tel : 05 63 49 33 56
-- 
Ludovic POUZENC - Ingénieur Système
IMT Mines Albi  (http://www.imt-mines-albi.fr)
81013 ALBI CT Cedex 09 - Tel : 05 63 49 33 56