We have started seeing index corruption ever since we upgraded our IMAP servers from SL6 to CentOS 7 (we believe the upgrade is what triggered it). Mail and indexes are stored on NetApps mounted via NFS. We have 2 LVS servers running surealived in DR/WLC mode, 2 directors, and 6 backend IMAP/POP servers.
Most of the core dumps I've looked at, across different users, look like "Backtrace 2", with some variation in the folder path.
This latest crash (Backtrace 1) is different from the others I've seen. It is also leaving 0-byte files in the user's .Drafts/tmp folder:
# ls -s /var/spool/mail/15/00/user1/.Drafts/tmp | awk '{print $1}' | sort | uniq -c
   9692 0
      1 218600
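A simpler cross-check, if GNU find is available on the backend, is to count just the zero-byte files directly:

# find /var/spool/mail/15/00/user1/.Drafts/tmp -type f -size 0 | wc -l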
I believe the number of cores differs from the number of tmp files because we had already moved the user to our debug server by that point, so we could capture the core dumps.
# ls -la /home/u/user1/core.* | wc -l
8437
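For anyone who wants to pull full backtraces from all of these cores in one go, a loop roughly like the following (same binary and paths as above; adjust as needed) should work:

# for c in /home/u/user1/core.*; do gdb --batch -ex 'bt full' /usr/libexec/dovecot/imap "$c"; done > /tmp/user1-backtraces.txt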
Any help/insight would be greatly appreciated.
Thanks, William
OS Info:
CentOS Linux release 7.5.1804 (Core)
3.10.0-862.14.4.el7.x86_64
NFS:
# mount -t nfs | grep mail/15
172.16.255.14:/vol/vol1/mail/15 on /var/spool/mail/15 type nfs (rw,nosuid,nodev,relatime,vers=3,rsize=32768,wsize=32768,namlen=255,hard,nordirplus,proto=tcp,timeo=600,retrans=2,sec=sys,mountaddr=172.16.255.14,mountvers=3,mountport=4046,mountproto=udp,local_lock=none,addr=172.16.255.14)
Dovecot Info:
dovecot -n
# 2.1.17: /etc/dovecot/dovecot.conf
# OS: Linux 3.10.0-862.14.4.el7.x86_64 x86_64 CentOS Linux release 7.5.1804 (Core)
auth_failure_delay = 0
auth_master_user_separator = *
auth_username_format = %Ln
auth_verbose = yes
auth_verbose_passwords = sha1
auth_worker_max_count = 64
login_log_format_elements = user=<%u> session=%{session} method=%m rip=%r lip=%l mpid=%e %c
login_trusted_networks = 172.16.0/24
mail_debug = yes
mail_fsync = always
mail_log_prefix = "%s(%u): session=%{session} "
mail_plugins = zlib
maildir_very_dirty_syncs = yes
mmap_disable = yes
passdb {
  args = /etc/dovecot/master-users
  driver = passwd-file
  master = yes
}
passdb {
  args = imap
  driver = pam
}
plugin {
  lazy_expunge = DELETED_MESSAGES.
  mail_log_events = delete expunge flag_change
  mail_log_fields = uid box msgid from flags size
  quota = fs:User quota
  stats_refresh = 30 secs
  stats_track_cmds = yes
}
protocols = imap pop3
service anvil {
  client_limit = 10000
}
service auth {
  client_limit = 10000
  vsz_limit = 1 G
}
service doveadm {
  inet_listener {
    port = 1842
  }
  unix_listener doveadm-server {
    mode = 0666
  }
}
service imap-login {
  inet_listener imap {
    port = 143
  }
  inet_listener imaps {
    port = 993
    ssl = yes
  }
  process_limit = 7000
  process_min_avail = 32
  vsz_limit = 256 M
}
service imap-postlogin {
  executable = script-login -d /etc/dovecot/bin/foo-imap-postlogin
  user = $default_internal_user
}
service imap {
  executable = imap imap-postlogin
  process_limit = 7000
  vsz_limit = 1492 M
}
service pop3-login {
  inet_listener pop3 {
    port = 110
  }
  inet_listener pop3s {
    port = 995
    ssl = yes
  }
  process_limit = 2000
  process_min_avail = 32
  vsz_limit = 256 M
}
service pop3-postlogin {
  executable = script-login -d /etc/dovecot/bin/foo-pop3-postlogin
  user = $default_internal_user
}
service pop3 {
  executable = pop3 pop3-postlogin
  process_limit = 2000
}
shutdown_clients = no
ssl = required
ssl_ca =
Backtrace 1:
Reading symbols from /usr/libexec/dovecot/imap...Reading symbols from /usr/lib/debug/usr/libexec/dovecot/imap.debug...done.
Program terminated with signal 6, Aborted.
#0  0x00007fbee47e0277 in __GI_raise (sig=sig@entry=6) at ../nptl/sysdeps/unix/sysv/linux/raise.c:56
56          return INLINE_SYSCALL (tgkill, 3, pid, selftid, sig);

Thread 1 (Thread 0x7fbee532f840 (LWP 9449)):
#0  0x00007fbee47e0277 in __GI_raise (sig=sig@entry=6) at ../nptl/sysdeps/unix/sysv/linux/raise.c:56
        resultvar = 0
        pid = 9449
        selftid = 9449
#1  0x00007fbee47e1968 in __GI_abort () at abort.c:90
        save_stage = 2
        act = {__sigaction_handler = {sa_handler = 0x0, sa_sigaction = 0x0}, sa_mask = {__val = {0
Backtrace 2:
Reading symbols from /usr/libexec/dovecot/imap...Reading symbols from /usr/lib/debug/usr/libexec/dovecot/imap.debug...done.
Program terminated with signal 6, Aborted.
#0  0x00007f437288c277 in __GI_raise (sig=sig@entry=6) at ../nptl/sysdeps/unix/sysv/linux/raise.c:56
56          return INLINE_SYSCALL (tgkill, 3, pid, selftid, sig);

Thread 1 (Thread 0x7f43733d3840 (LWP 6725)):
#0  0x00007f437288c277 in __GI_raise (sig=sig@entry=6) at ../nptl/sysdeps/unix/sysv/linux/raise.c:56
        resultvar = 0
        pid = 6725
        selftid = 6725
#1  0x00007f437288d968 in __GI_abort () at abort.c:90
        save_stage = 2
        act = {__sigaction_handler = {sa_handler = 0x0, sa_sigaction = 0x0}, sa_mask = {__val = {0
On 09.10.2018 22:16, William Taylor wrote:
...
Dovecot Info: dovecot -n # 2.1.17: /etc/dovecot/dovecot.conf
Hi!
Thank you for your report; however, 2.1.17 is a VERY old version of Dovecot and this problem is very likely fixed in a more recent version.
Aki
On Wed, 2018-10-10 at 09:37 +0300, Aki Tuomi wrote:
On 09.10.2018 22:16, William Taylor wrote:
...
Dovecot Info: dovecot -n # 2.1.17: /etc/dovecot/dovecot.conf
Hi!
Thank you for your report; however, 2.1.17 is a VERY old version of Dovecot and this problem is very likely fixed in a more recent version.
Aki
Like RHEL 7, CentOS 7.5 ships Dovecot 2.2.10 -- which isn't exactly recent either. http://mirror.centos.org/centos/7/os/x86_64/Packages/
Martin
On Wed, Oct 10, 2018 at 09:37:46AM +0300, Aki Tuomi wrote:
On 09.10.2018 22:16, William Taylor wrote:
...
Dovecot Info: dovecot -n # 2.1.17: /etc/dovecot/dovecot.conf
Hi!
Thank you for your report; however, 2.1.17 is a VERY old version of Dovecot and this problem is very likely fixed in a more recent version.
Aki
I realize it is an older release.
Are you saying there is a known bug in this version that affects RHEL 7.5 but not RHEL 6, or is the advice simply to use the newest version and hope the problem goes away?
On 10/10/18 7:26 AM, Aki Tuomi wrote:
Are you saying there is a known bug in this version that affects RHEL 7.5 but not RHEL 6, or is the advice simply to use the newest version and hope the problem goes away?
We have very limited interest in figuring out problems with (very) old Dovecot versions. At a minimum, you need to reproduce this problem with 2.2.36 or 2.3.2.1.
One thing you should make sure of is that the same user is not being accessed by two different servers concurrently.
The directors appear to be working fine, so no, users aren't hitting multiple backend servers.
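We can spot-check that from a director with something like the following (assuming doveadm is run on the director itself; output formats may vary by version):

# doveadm director status user1
# doveadm director map | grep user1

Both should show a single, stable backend for the affected user.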
To be clear, we don't suspect Dovecot so much - our deployment had been stable for years - as behavior changes between the RHEL 6 and RHEL 7 environments, particularly with regard to NFSv3. But we have been at a loss to find a smoking gun.
For various reasons, achieving stability (again) on the current version is very important while we continue to plan Dovecot and storage backend upgrades. Corruption leading to crashes is very infrequent percentage-wise, but it's enough to hurt performance and affect users -- out of 5+ million sessions/day we're seeing ~5 instances, whereas on RHEL 6 it was one every few months.
Has anyone else experienced NFS/locking issues transitioning from RHEL 6 to 7 with NetApp storage? Grasping at straws - perhaps compiler and/or system library changes interacting with Dovecot?
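For reference, the NFS-related Dovecot settings we're aware of are below (we already run the first two, per the dovecot -n above; the mail_nfs_* ones are documented as mattering only when the same user can be served from more than one backend, so treat this as a sketch rather than our running config):

mmap_disable = yes
mail_fsync = always
mail_nfs_storage = yes
mail_nfs_index = yes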
-K
On 10.10.2018 19:12, William Taylor wrote:
...
Dovecot Info: dovecot -n # 2.1.17: /etc/dovecot/dovecot.conf
Hi!
Thank you for your report; however, 2.1.17 is a VERY old version of Dovecot and this problem is very likely fixed in a more recent version.
Aki
I realize it is an older release.
Are you saying there is a known bug in this version that affects RHEL 7.5 but not RHEL 6, or is the advice simply to use the newest version and hope the problem goes away?
I can see from my CentOS 7 installation that it comes with the 2.2.10-8.el7 package. Did you install 2.1.17 specifically from somewhere else?
I'm using Dovecot 2.3.3 on CentOS 7 myself, as packaged by the Dovecot developers.
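A quick way to confirm what's actually installed and running on an RPM-based box:

# rpm -q dovecot
# dovecot --version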
Good luck, Reio
participants (5)
- Aki Tuomi
- Kelsey Cummings
- Martin Johannes Dauser
- Reio Remma
- William Taylor