On 19.06.2017 20:36, antonello.cioffi@uniparthenope.it wrote:
Hi everybody
I'm finding a lot of errors like this on my server:
Jun 19 14:22:45 posta2 kernel: [885017.412902] BUG: soft lockup - CPU#2 stuck for 22s! [dovecot-lda:11955]
Jun 19 14:22:45 posta2 kernel: [885017.412906] Modules linked in: ocfs2(E) jbd2 quota_tree dm_service_time dm_multipath ocfs2_dlmfs(E) ocfs2_stack_o2cb(E) ocfs2_dlm(E) ocfs2_nodemanager(E) ocfs2_stackglue(E) configfs cpufreq_conservative cpufreq_userspace cpufreq_powersave acpi_cpufreq mperf fuse nls_iso8859_1 nls_cp437 vfat fat loop shpchp iTCO_wdt bnx2 pci_hotplug rtc_cmos ipv6 ipv6_lib cdc_ether usbnet ioatdma dca sg tpm_tis i2c_i801 serio_raw mii i7core_edac edac_core pcspkr iTCO_vendor_support tpm mptctl tpm_bios button ext3 jbd mbcache dm_mirror dm_region_hash dm_log linear ttm drm_kms_helper drm i2c_algo_bit sysimgblt sysfillrect i2c_core syscopyarea uhci_hcd ehci_hcd usbcore usb_common sd_mod crc_t10dif processor thermal_sys hwmon scsi_dh_emc scsi_dh_hp_sw scsi_dh_alua scsi_dh_rdac scsi_dh dm_snapshot dm_mod mptsas mptscsih mptbase scsi_transport_sas scsi_mod
Jun 19 14:22:45 posta2 kernel: [885017.412971] Supported: Yes
Jun 19 14:22:45 posta2 kernel: [885017.412973] CPU 2
Jun 19 14:22:45 posta2 kernel: [885017.412975] Modules linked in: ocfs2(E) jbd2 quota_tree dm_service_time dm_multipath ocfs2_dlmfs(E) ocfs2_stack_o2cb(E) ocfs2_dlm(E) ocfs2_nodemanager(E) ocfs2_stackglue(E) configfs cpufreq_conservative cpufreq_userspace cpufreq_powersave acpi_cpufreq mperf fuse nls_iso8859_1 nls_cp437 vfat fat loop shpchp iTCO_wdt bnx2 pci_hotplug rtc_cmos ipv6 ipv6_lib cdc_ether usbnet ioatdma dca sg tpm_tis i2c_i801 serio_raw mii i7core_edac edac_core pcspkr iTCO_vendor_support tpm mptctl tpm_bios button ext3 jbd mbcache dm_mirror dm_region_hash dm_log linear ttm drm_kms_helper drm i2c_algo_bit sysimgblt sysfillrect i2c_core syscopyarea uhci_hcd ehci_hcd usbcore usb_common sd_mod crc_t10dif processor thermal_sys hwmon scsi_dh_emc scsi_dh_hp_sw scsi_dh_alua scsi_dh_rdac scsi_dh dm_snapshot dm_mod mptsas mptscsih mptbase scsi_transport_sas scsi_mod
Jun 19 14:22:45 posta2 kernel: [885017.413027] Supported: Yes
Jun 19 14:22:45 posta2 kernel: [885017.413029]
Jun 19 14:22:45 posta2 kernel: [885017.413032] Pid: 11955, comm: dovecot-lda Tainted: G E 3.0.101-0.46-default #1 IBM BladeCenter HS22 -[7870H5G]-/68Y8138
Jun 19 14:22:45 posta2 kernel: [885017.413037] RIP: 0010:[<ffffffff81257b07>] [<ffffffff81257b07>] find_next_zero_bit+0x67/0xc0
Jun 19 14:22:45 posta2 kernel: [885017.413046] RSP: 0018:ffff88006cc855f0 EFLAGS: 00000287
Jun 19 14:22:45 posta2 kernel: [885017.413049] RAX: 0000000000006f30 RBX: ffffffff8118b348 RCX: 0000000000000010
Jun 19 14:22:45 posta2 kernel: [885017.413051] RDX: 0001000000000000 RSI: 0000000000006f30 RDI: 0000000000006f00
Jun 19 14:22:45 posta2 kernel: [885017.413054] RBP: 000000000000000f R08: 0000000000000f00 R09: ffff88006cc857e8
Jun 19 14:22:45 posta2 kernel: [885017.413057] R10: 000000000000000b R11: ffff88063c1fa678 R12: ffffffff8146d1ee
Jun 19 14:22:45 posta2 kernel: [885017.413059] R13: 0000000000000000 R14: ffff880655a918c0 R15: 0000000001f9f800
Jun 19 14:22:45 posta2 kernel: [885017.413063] FS: 00007f7387952700(0000) GS:ffff88067f240000(0000) knlGS:0000000000000000
Jun 19 14:22:45 posta2 kernel: [885017.413066] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Jun 19 14:22:45 posta2 kernel: [885017.413068] CR2: 00007f7386fad9b0 CR3: 000000011f6cc000 CR4: 00000000000007e0
Jun 19 14:22:45 posta2 kernel: [885017.413071] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
Jun 19 14:22:45 posta2 kernel: [885017.413074] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
Jun 19 14:22:45 posta2 kernel: [885017.413077] Process dovecot-lda (pid: 11955, threadinfo ffff88006cc84000, task ffff88003f3284c0)
Jun 19 14:22:45 posta2 kernel: [885017.413079] Stack:
Jun 19 14:22:45 posta2 kernel: [885017.413085] ffffffffa052f86e 0000000000000000 ffff88006cc857e8 00007e000000000b
Jun 19 14:22:45 posta2 kernel: [885017.413090] 6843000000006a00 ffff88063c1fa67a 0000000000000000 000000001f400000
Jun 19 14:22:45 posta2 kernel: [885017.413095] 0000000000007e00 0000000000007e00 ffff880653aaccb8 ffff880656f7f000
Jun 19 14:22:45 posta2 kernel: [885017.413100] Call Trace:
Jun 19 14:22:45 posta2 kernel: [885017.413143] [<ffffffffa052f86e>] ocfs2_block_group_find_clear_bits+0x6e/0x180 [ocfs2]
Jun 19 14:22:45 posta2 kernel: [885017.413224] [<ffffffffa052fa23>] ocfs2_cluster_group_search+0xa3/0x1f0 [ocfs2]
Jun 19 14:22:45 posta2 kernel: [885017.413297] [<ffffffffa0530369>] ocfs2_search_chain+0x139/0x730 [ocfs2]
Jun 19 14:22:45 posta2 kernel: [885017.413367] [<ffffffffa0531ce8>] ocfs2_claim_suballoc_bits+0x398/0x520 [ocfs2]
Jun 19 14:22:45 posta2 kernel: [885017.413447] [<ffffffffa0531f0f>] __ocfs2_claim_clusters+0x9f/0x340 [ocfs2]
Jun 19 14:22:45 posta2 kernel: [885017.413519] [<ffffffffa050f8ff>] ocfs2_local_alloc_new_window+0x1cf/0x320 [ocfs2]
Jun 19 14:22:45 posta2 kernel: [885017.413581] [<ffffffffa050fdd0>] ocfs2_local_alloc_slide_window+0x380/0x5e0 [ocfs2]
Jun 19 14:22:45 posta2 kernel: [885017.413642] [<ffffffffa05101d3>] ocfs2_reserve_local_alloc_bits+0x1a3/0x330 [ocfs2]
Jun 19 14:22:45 posta2 kernel: [885017.413706] [<ffffffffa053396b>] ocfs2_reserve_clusters_with_limit+0xab/0x330 [ocfs2]
Jun 19 14:22:45 posta2 kernel: [885017.413780] [<ffffffffa0534d15>] ocfs2_lock_allocators+0xc5/0x290 [ocfs2]
Jun 19 14:22:45 posta2 kernel: [885017.413847] [<ffffffffa04e30fd>] ocfs2_write_begin_nolock+0x89d/0x11d0 [ocfs2]
Jun 19 14:22:45 posta2 kernel: [885017.413883] [<ffffffffa04e3b47>] ocfs2_write_begin+0x117/0x240 [ocfs2]
Jun 19 14:22:45 posta2 kernel: [885017.413900] [<ffffffff810fa1a2>] generic_perform_write+0xc2/0x1c0
Jun 19 14:22:45 posta2 kernel: [885017.413907] [<ffffffff810fa301>] generic_file_buffered_write+0x61/0xa0
Jun 19 14:22:45 posta2 kernel: [885017.413936] [<ffffffffa050055f>] ocfs2_file_aio_write+0x92f/0x960 [ocfs2]
Jun 19 14:22:45 posta2 kernel: [885017.413966] [<ffffffff8115d478>] do_sync_write+0xc8/0x110
Jun 19 14:22:45 posta2 kernel: [885017.413972] [<ffffffff8115daae>] vfs_write+0xce/0x140
Jun 19 14:22:45 posta2 kernel: [885017.413977] [<ffffffff8115dc23>] sys_write+0x53/0xa0
Jun 19 14:22:45 posta2 kernel: [885017.413983] [<ffffffff8146c812>] system_call_fastpath+0x16/0x1b
Jun 19 14:22:45 posta2 kernel: [885017.413992] [<00007f7386c54300>] 0x7f7386c542ff
Jun 19 14:22:45 posta2 kernel: [885017.413994] Code: 3f 77 61 48 c7 c0 ff ff ff ff 44 89 c1 4a 8d 34 07 48 d3 e0 48 09 c2 48 83 fa ff 74 0b 48 f7 d2 48 0f bc c2 48 8d 34 38 48 89 f0 <c3> 0f 1f 84 00 00 00 00 00 48 8b 10 48 83 fa ff 75 e0 48 83 c0
Jun 19 14:22:45 posta2 kernel: [885017.414037] Call Trace:
Jun 19 14:22:45 posta2 kernel: [885017.414070] [<ffffffffa052f86e>] ocfs2_block_group_find_clear_bits+0x6e/0x180 [ocfs2]
Jun 19 14:22:45 posta2 kernel: [885017.414141] [<ffffffffa052fa23>] ocfs2_cluster_group_search+0xa3/0x1f0 [ocfs2]
Jun 19 14:22:45 posta2 kernel: [885017.414213] [<ffffffffa0530369>] ocfs2_search_chain+0x139/0x730 [ocfs2]
Jun 19 14:22:45 posta2 kernel: [885017.414286] [<ffffffffa0531ce8>] ocfs2_claim_suballoc_bits+0x398/0x520 [ocfs2]
Jun 19 14:22:45 posta2 kernel: [885017.414359] [<ffffffffa0531f0f>] __ocfs2_claim_clusters+0x9f/0x340 [ocfs2]
Jun 19 14:22:45 posta2 kernel: [885017.414430] [<ffffffffa050f8ff>] ocfs2_local_alloc_new_window+0x1cf/0x320 [ocfs2]
Jun 19 14:22:45 posta2 kernel: [885017.414492] [<ffffffffa050fdd0>] ocfs2_local_alloc_slide_window+0x380/0x5e0 [ocfs2]
Jun 19 14:22:45 posta2 kernel: [885017.414553] [<ffffffffa05101d3>] ocfs2_reserve_local_alloc_bits+0x1a3/0x330 [ocfs2]
Jun 19 14:22:45 posta2 kernel: [885017.414617] [<ffffffffa053396b>] ocfs2_reserve_clusters_with_limit+0xab/0x330 [ocfs2]
Jun 19 14:22:45 posta2 kernel: [885017.414691] [<ffffffffa0534d15>] ocfs2_lock_allocators+0xc5/0x290 [ocfs2]
Jun 19 14:22:45 posta2 kernel: [885017.414758] [<ffffffffa04e30fd>] ocfs2_write_begin_nolock+0x89d/0x11d0 [ocfs2]
Jun 19 14:22:45 posta2 kernel: [885017.414793] [<ffffffffa04e3b47>] ocfs2_write_begin+0x117/0x240 [ocfs2]
Jun 19 14:22:46 posta2 kernel: [885017.414809] [<ffffffff810fa1a2>] generic_perform_write+0xc2/0x1c0
Jun 19 14:22:46 posta2 kernel: [885017.414815] [<ffffffff810fa301>] generic_file_buffered_write+0x61/0xa0
Jun 19 14:22:46 posta2 kernel: [885017.414844] [<ffffffffa050055f>] ocfs2_file_aio_write+0x92f/0x960 [ocfs2]
Jun 19 14:22:46 posta2 kernel: [885017.414872] [<ffffffff8115d478>] do_sync_write+0xc8/0x110
Jun 19 14:22:46 posta2 kernel: [885017.414878] [<ffffffff8115daae>] vfs_write+0xce/0x140
Jun 19 14:22:46 posta2 kernel: [885017.414883] [<ffffffff8115dc23>] sys_write+0x53/0xa0
Jun 19 14:22:46 posta2 kernel: [885017.414888] [<ffffffff8146c812>] system_call_fastpath+0x16/0x1b
Jun 19 14:22:46 posta2 kernel: [885017.414895] [<00007f7386c54300>] 0x7f7386c542ff
The machine runs SUSE Linux Enterprise Server 11 SP3.
posta2:/var/core # dovecot -n
# 2.2.30.2 (c0c463e): /usr/local/etc/dovecot/dovecot.conf
# Pigeonhole version 0.4.18 (29cc74d)
# OS: Linux 3.0.101-0.46-default x86_64 SUSE Linux Enterprise Server 11 (x86_64)
auth_mechanisms = plain login
auth_username_format = %Ln
auth_verbose = yes
default_internal_user = vmail
default_login_user = nobody
disable_plaintext_auth = no
first_valid_uid = 100
hostname = mail.xxxx.it
lda_mailbox_autocreate = yes
lda_mailbox_autosubscribe = yes
mail_debug = yes
mail_gid = 100
mail_location = maildir:%h
mail_plugins = " quota"
mail_uid = 1002
managesieve_notify_capability = mailto
managesieve_sieve_capability = fileinto reject envelope encoded-character vacation subaddress comparator-i;ascii-numeric relational regex imap4flags copy include variables body enotify environment mailbox date index ihave duplicate mime foreverypart extracttext spamtest spamtestplus
passdb {
  args = /usr/local/etc/dovecot/dovecot-people-ldap.conf.ext
  driver = ldap
}
plugin {
  last_login_dict = redis:host=127.0.0.1:port=6379
  mail_log_events = delete undelete expunge copy mailbox_delete mailbox_rename
  mail_log_fields = uid box msgid size
  quota = maildir:User quota
  quota_warning = storage=95%% quota-warning 95 %u
  quota_warning2 = storage=80%% quota-warning 80 %u
  sieve = file:~/sieve;active=~/.dovecot.sieve
  sieve_before = /usr/local/etc/dovecot/sieve/
  sieve_dir = ~/.sieve
  sieve_extensions = +spamtest +spamtestplus +relational +comparator-i;ascii-numeric
}
postmaster_address = postmaster@uniparthenope.it
protocols = imap pop3 lmtp sieve
service auth {
  unix_listener /var/spool/postfix/private/auth {
    group = postfix
    mode = 0666
    user = postfix
  }
}
service imap-login {
  inet_listener imap {
    port = 143
  }
  inet_listener imaps {
    port = 993
    ssl = yes
  }
  service_count = 0
  vsz_limit = 256 M
}
service managesieve-login {
  inet_listener sieve {
    port = 4190
  }
  service_count = 1
  vsz_limit = 64 M
}
service pop3-login {
  inet_listener pop3 {
    port = 110
  }
  inet_listener pop3s {
    port = 995
    ssl = yes
  }
  service_count = 0
  vsz_limit = 256 M
}
service quota-status {
  client_limit = 1
  executable = quota-status -p postfix
  inet_listener {
    port = 12340
  }
}
service quota-warning {
  executable = script /usr/local/bin/quota-warning.sh
  unix_listener quota-warning {
    user = vmail
  }
  user = vmail
}
ssl_ca =
I've upgraded Dovecot from 2.2.18 to 2.2.30.2, but the errors are still present.
posta2:/var/core # dovecot --version
2.2.30.2 (c0c463e)
The errors appear only with the dovecot-lda process, so I tend to exclude a disk or OCFS2 failure.
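One way to check that the lockups really are limited to dovecot-lda is to count the watchdog lines in syslog by process name. The following is only a rough sketch: the function name is made up for illustration, and it assumes the log lines look like the "BUG: soft lockup" line quoted above, one entry per line.

```python
import re
from collections import Counter

# Matches the watchdog line from the log above, e.g.
# "BUG: soft lockup - CPU#2 stuck for 22s! [dovecot-lda:11955]"
# Groups: CPU number, seconds stuck, command name, PID.
LOCKUP_RE = re.compile(
    r"BUG: soft lockup - CPU#(\d+) stuck for (\d+)s! \[(.+):(\d+)\]"
)

def count_lockups_by_command(lines):
    """Return a Counter mapping command name to number of soft lockup reports.

    Hypothetical helper: `lines` is any iterable of syslog lines.
    """
    counts = Counter()
    for line in lines:
        match = LOCKUP_RE.search(line)
        if match:
            counts[match.group(3)] += 1
    return counts
```

It could be fed an open log file, for example `count_lockups_by_command(open("/var/log/messages", errors="replace"))`; that path is an assumption about where this system writes kernel messages. If other command names show up in the result, the problem is not specific to dovecot-lda.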
Is there someone who can help me?
Hello
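To us it looks like the lockup is happening inside your operating system's kernel: the call trace ends in the OCFS2 allocator (ocfs2_block_group_find_clear_bits), reached from a write() call made by dovecot-lda. The kernel and OCFS2 would therefore be the primary place to look for a resolution. There might not be anything that Dovecot can do about it.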
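To see at a glance where such a trace spends its time, the frames can be grouped by the module named in brackets at the end of each line. This is a minimal sketch with made-up helper names; it assumes the "[<address>] symbol+0xoff/0xsize [module]" frame format shown in the log above and labels frames without a module as "kernel".

```python
import re
from collections import Counter

# One call-trace frame, e.g.
# "[<ffffffffa052f86e>] ocfs2_block_group_find_clear_bits+0x6e/0x180 [ocfs2]"
# The trailing "[module]" is absent for code built into the kernel image.
FRAME_RE = re.compile(
    r"\[<[0-9a-f]+>\]\s+(\S+?)\+0x[0-9a-f]+/0x[0-9a-f]+(?:\s+\[(\w+)\])?"
)

def frames(lines):
    """Return (symbol, module) pairs for every call-trace frame in `lines`."""
    found = []
    for line in lines:
        if "RIP:" in line:
            # The RIP line uses the same notation but is not a stack frame.
            continue
        match = FRAME_RE.search(line)
        if match:
            found.append((match.group(1), match.group(2) or "kernel"))
    return found

def modules_in_trace(lines):
    """Count how many frames belong to each module."""
    return Counter(module for _symbol, module in frames(lines))
```

Run over the log lines from the report, `modules_in_trace` gives one count for frames inside the ocfs2 module and one for frames in the core kernel, and `frames(...)[0]` is the innermost function the CPU was stuck under.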
br, Teemu
Best regards
Thanks