[Bug Report] anvil panic: anvil-connection.c:727: unreached triggered by duplicate imap-login handshake with pid=-1
Environment:
FreeBSD 14.4-RELEASE
Dovecot 2.4.3 (e278b1e09b)
Pigeonhole 2.4.3 (0bf9ee48)
Postfix 3.11.1
rspamd 4.0.0
Description:
The Dovecot anvil process is crashing repeatedly with a panic at anvil-connection.c line 727, triggered by i_unreached() in anvil_connection_destroy(). There have been about 18 crashes in the last 24 hours, across multiple users. After each crash the anvil process restarts successfully and works as expected. I see no other symptoms that appear related.
Log messages:
dovecot[87542]: anvil: Panic: file anvil-connection.c: line 727: unreached
dovecot[87542]: anvil: Fatal: master: service(anvil): child 88408 killed with signal 6 (core not dumped)
dovecot[87542]: anvil: Error: connect limit: disconnection for unknown (pid=88756, user=<username>, service=imap, ip=X.X.X.X, ...)
dovecot[87542]: anvil: Warning: conn unix:anvil (uid=0): Handshake with duplicate service=imap-login pid=-1 - replacing the old connection
Analysis (from claude):
In anvil_connection_destroy(), when conn->added_to_hash is true, the code asserts that the connection must still be present in anvil_connections_hash. However, when a duplicate handshake is received for the same (service, pid) key, the old connection appears to be evicted from the hash table without clearing its added_to_hash flag. When the old connection is subsequently destroyed, hash_table_lookup_full() fails to find it, triggering i_unreached().
The pid=-1 in the duplicate handshake warning is also notable — this does not appear to be a valid process ID and may indicate that the handshake is being sent before the imap-login process has fully initialized.
Workaround:
Setting service_restart_request_count = unlimited and service_process_min_avail = 2 on the imap-login service seems to have reduced process turnover and appears to have reduced the frequency of the crash, but I’ve only got a couple of hours of data since that change, so it is too early to say for sure.
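For reference, the workaround would look roughly like this in dovecot.conf. This is a sketch only: it assumes the Dovecot 2.4 convention that the service_ prefix is dropped when a setting is written inside a service block, and it shows only the two settings named above.

```
service imap-login {
  # assumed in-block spelling of service_restart_request_count
  restart_request_count = unlimited
  # assumed in-block spelling of service_process_min_avail
  process_min_avail = 2
}
```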
Suggested fix:
When a duplicate handshake causes an existing connection to be evicted from anvil_connections_hash, the evicted connection's added_to_hash flag should be cleared to prevent the assertion failure in anvil_connection_destroy().
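To make the suspected sequence concrete, here is a small self-contained toy model in C. It is not Dovecot code: struct toy_conn, the linear table, and all toy_* functions are hypothetical stand-ins, and the event order (duplicate handshake, new connection destroyed, then old connection destroyed) is an assumption about how the lookup could come up empty. It only illustrates why clearing the flag on eviction would avoid the inconsistent state.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define TABLE_SIZE 8

/* Toy stand-in for a connection; not Dovecot's struct anvil_connection. */
struct toy_conn {
    const char *service;
    long pid;
    bool added_to_hash;
};

/* Toy stand-in for anvil_connections_hash, keyed by (service, pid). */
static struct toy_conn *table[TABLE_SIZE];

/* Return the slot holding the given key, or -1 if the key is absent. */
static int toy_lookup(const char *service, long pid)
{
    for (int i = 0; i < TABLE_SIZE; i++) {
        if (table[i] != NULL && table[i]->pid == pid &&
            strcmp(table[i]->service, service) == 0)
            return i;
    }
    return -1;
}

/* Register a connection; a duplicate key replaces the old entry. */
static void toy_handshake(struct toy_conn *conn, bool clear_evicted_flag)
{
    int i = toy_lookup(conn->service, conn->pid);
    if (i >= 0) {
        /* Duplicate key: the old connection is evicted from the table. */
        if (clear_evicted_flag)
            table[i]->added_to_hash = false; /* the suggested fix */
    } else {
        /* New key: take the first free slot. */
        for (i = 0; i < TABLE_SIZE && table[i] != NULL; i++)
            ;
        assert(i < TABLE_SIZE);
    }
    table[i] = conn;
    conn->added_to_hash = true;
}

/* Returns false at the point where the real code would hit i_unreached(). */
static bool toy_destroy(struct toy_conn *conn)
{
    if (!conn->added_to_hash)
        return true;
    int i = toy_lookup(conn->service, conn->pid);
    if (i < 0)
        return false; /* flag says "in table", but the key is gone */
    if (table[i] == conn)
        table[i] = NULL;
    conn->added_to_hash = false;
    return true;
}

/* Duplicate handshake, then destroy the new and the old connection. */
bool run_scenario(bool clear_evicted_flag)
{
    memset(table, 0, sizeof(table));
    struct toy_conn old_conn = { "imap-login", -1, false };
    struct toy_conn new_conn = { "imap-login", -1, false };
    toy_handshake(&old_conn, clear_evicted_flag);
    toy_handshake(&new_conn, clear_evicted_flag);
    bool new_ok = toy_destroy(&new_conn);
    bool old_ok = toy_destroy(&old_conn);
    return new_ok && old_ok;
}
```

In this model, run_scenario(false) reports the inconsistency when the old connection is destroyed, while run_scenario(true), which clears the flag on eviction, completes cleanly. Whether the real code path matches this order of events would need to be confirmed against the Dovecot source.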
Appreciate any insight, and happy to provide additional info that would be helpful.
—Jeff
Jeff Aitken