[Dovecot] Bug report v2.0.13 - CentOS x86_64 - NFS - mbox
Greetings to all.
It's my first post to the list. We just completed a migration from qpopper to dovecot for our IMAP and POP3 services. We have a rather large mail environment (we are the biggest provider in Greece).
So, here are the details:
- Keep getting errors like these in our production environment
Jul 22 00:18:21 pop01 dovecot: master: Error: service(pop3): child 4078 killed with signal 11 (core dumps disabled) Jul 22 00:19:31 pop03 dovecot: master: Error: service(pop3): child 18849 killed with signal 11 (core dumps disabled)
dovecot -n output
/opt/dovecot/sbin/dovecot -n # 2.0.13: /opt/dovecot/etc/dovecot/dovecot.conf # OS: Linux 2.6.18-92.1.22.el5 x86_64 CentOS release 5.5 (Final) auth_cache_negative_ttl = 10 mins auth_cache_size = 5 M auth_cache_ttl = 10 mins auth_verbose = yes default_client_limit = 5000 default_process_limit = 500 disable_plaintext_auth = no first_valid_uid = 200 listen = * log_timestamp = "%Y-%m-%d %H:%M:%S " login_greeting = <COMPANY> ready mail_access_groups = mail otemail disk root mail_fsync = always mail_location = mbox:INDEX=/var/index/dovecot/%2.16Hn/%2.254Hn/%u mail_nfs_storage = yes mbox_lock_timeout = 2 mins mbox_min_index_size = 200 k mbox_read_locks = dotlock_try fcntl mbox_write_locks = dotlock_try fcntl passdb { args = /opt/dovecot/etc/dovecot/dovecot-ldap.conf.ext driver = ldap } protocols = imap pop3 service auth-worker { user = dovenull } service imap-login { inet_listener imap { port = 143 } inet_listener imaps { port = 993 ssl = yes } } service pop3-login { inet_listener pop3 { port = 110 } inet_listener pop3s { port = 995 ssl = yes } } ssl = no userdb { args = /opt/dovecot/etc/dovecot/dovecot-ldap.conf.ext driver = ldap } verbose_proctitle = yes protocol imap { imap_client_workarounds = delay-newmail tb-extra-mailbox-sep mail_max_userip_connections = 100 } protocol pop3 { mail_max_userip_connections = 100 pop3_client_workarounds = outlook-no-nuls oe-ns-eoh pop3_fast_size_lookups = yes pop3_lock_session = yes pop3_reuse_xuidl = yes pop3_uidl_format = %08Xu%08Xv }
I enabled core dumps in one of our backend servers and here is the relevant gdb trace:
[root@pop08 ~]# gdb /opt/dovecot/libexec/dovecot/pop3
All traces of the crashes are identical, that is #0 0x00002b52e1027e54 in istream_raw_mbox_get_start_offset () from /opt/dovecot/lib/dovecot/libdovecot-storage.so.0 #1 0x00002b52e102b759 in ?? () from /opt/dovecot/lib/dovecot/libdovecot-storage.so.0 #2 0x00002b52e100a2c0 in index_mail_expunge () from /opt/dovecot/lib/dovecot/libdovecot-storage.so.0 #3 0x0000000000405e9c in client_update_mails () #4 0x00000000004061c1 in client_command_execute () #5 0x00000000004045b9 in client_handle_input () #6 0x00002b52e12df698 in io_loop_call_io () from /opt/dovecot/lib/dovecot/libdovecot.so.0 #7 0x00002b52e12e09d5 in io_loop_handler_run () from /opt/dovecot/lib/dovecot/libdovecot.so.0 #8 0x00002b52e12df62d in io_loop_run () from /opt/dovecot/lib/dovecot/libdovecot.so.0 #9 0x00002b52e12cdf13 in master_service_run () from /opt/dovecot/lib/dovecot/libdovecot.so.0 #10 0x0000000000403994 in main ()
We have mboxes over NFS and we also have an ldap user backend. For now, I do not have a scenario that reproduces the problem. Any idea, or input are highly appreciated. Of course I can provide any information requested (without exposing restricted company or client data) to help trace the problem and lead to the solution.
Thanks and keep up the good work!
Regards,
Kostas Zorbadelos
On 07/22/2011 01:02 PM, Kostas Zorbadelos wrote:
Hello,
since I saw no action on this, here is a newer update we discovered today.
After setting pop3_lock_session = no the core dumps went away. We will leave it like that and watch it for the next few days. If we set pop3_lock_session = yes, the problem is reproduced.
If I can do anything else to help debug the problem, please let me know.
Regards,
Kostas
Greetings to all.
It's my first post to the list. We just completed a migration from qpopper to dovecot for our IMAP and POP3 services. We have a rather large mail environment (we are the biggest provider in Greece).
So, here are the details:
- Keep getting errors like these in our production environment
Jul 22 00:18:21 pop01 dovecot: master: Error: service(pop3): child 4078 killed with signal 11 (core dumps disabled) Jul 22 00:19:31 pop03 dovecot: master: Error: service(pop3): child 18849 killed with signal 11 (core dumps disabled)
dovecot -n output
/opt/dovecot/sbin/dovecot -n # 2.0.13: /opt/dovecot/etc/dovecot/dovecot.conf # OS: Linux 2.6.18-92.1.22.el5 x86_64 CentOS release 5.5 (Final) auth_cache_negative_ttl = 10 mins auth_cache_size = 5 M auth_cache_ttl = 10 mins auth_verbose = yes default_client_limit = 5000 default_process_limit = 500 disable_plaintext_auth = no first_valid_uid = 200 listen = * log_timestamp = "%Y-%m-%d %H:%M:%S " login_greeting =<COMPANY> ready mail_access_groups = mail otemail disk root mail_fsync = always mail_location = mbox:INDEX=/var/index/dovecot/%2.16Hn/%2.254Hn/%u mail_nfs_storage = yes mbox_lock_timeout = 2 mins mbox_min_index_size = 200 k mbox_read_locks = dotlock_try fcntl mbox_write_locks = dotlock_try fcntl passdb { args = /opt/dovecot/etc/dovecot/dovecot-ldap.conf.ext driver = ldap } protocols = imap pop3 service auth-worker { user = dovenull } service imap-login { inet_listener imap { port = 143 } inet_listener imaps { port = 993 ssl = yes } } service pop3-login { inet_listener pop3 { port = 110 } inet_listener pop3s { port = 995 ssl = yes } } ssl = no userdb { args = /opt/dovecot/etc/dovecot/dovecot-ldap.conf.ext driver = ldap } verbose_proctitle = yes protocol imap { imap_client_workarounds = delay-newmail tb-extra-mailbox-sep mail_max_userip_connections = 100 } protocol pop3 { mail_max_userip_connections = 100 pop3_client_workarounds = outlook-no-nuls oe-ns-eoh pop3_fast_size_lookups = yes pop3_lock_session = yes pop3_reuse_xuidl = yes pop3_uidl_format = %08Xu%08Xv }
I enabled core dumps in one of our backend servers and here is the relevant gdb trace:
[root@pop08 ~]# gdb /opt/dovecot/libexec/dovecot/pop3
/core.9273 GNU gdb (GDB) Red Hat Enterprise Linux (7.0.1-32.el5_6.2) Copyright (C) 2009 Free Software Foundation, Inc. License GPLv3+: GNU GPL version 3 or laterhttp://gnu.org/licenses/gpl.html This is free software: you are free to change and redistribute it. There is NO WARRANTY, to the extent permitted by law. Type "show copying" and "show warranty" for details. This GDB was configured as "x86_64-redhat-linux-gnu". For bug reporting instructions, please see: http://www.gnu.org/software/gdb/bugs/... Reading symbols from /opt/dovecot/libexec/dovecot/pop3...(no debugging symbols found)...done. Reading symbols from /opt/dovecot/lib/dovecot/libdovecot-storage.so.0...(no debugging symbols found)...done. Loaded symbols for /opt/dovecot/lib/dovecot/libdovecot-storage.so.0 Reading symbols from /opt/dovecot/lib/dovecot/libdovecot.so.0...(no debugging symbols found)...done. Loaded symbols for /opt/dovecot/lib/dovecot/libdovecot.so.0 Reading symbols from /lib64/libdl.so.2...(no debugging symbols found)...done. Loaded symbols for /lib64/libdl.so.2 Reading symbols from /lib64/librt.so.1...(no debugging symbols found)...done. Loaded symbols for /lib64/librt.so.1 Reading symbols from /lib64/libc.so.6...(no debugging symbols found)...done. Loaded symbols for /lib64/libc.so.6 Reading symbols from /lib64/ld-linux-x86-64.so.2...(no debugging symbols found)...done. Loaded symbols for /lib64/ld-linux-x86-64.so.2 Reading symbols from /lib64/libpthread.so.0...(no debugging symbols found)...done. Loaded symbols for /lib64/libpthread.so.0 Core was generated by `dovecot/pop3'. Program terminated with signal 11, Segmentation fault. #0 0x00002b52e1027e54 in istream_raw_mbox_get_start_offset () from /opt/dovecot/lib/dovecot/libdovecot-storage.so.0 (gdb) bt full #0 0x00002b52e1027e54 in istream_raw_mbox_get_start_offset () from /opt/dovecot/lib/dovecot/libdovecot-storage.so.0 No symbol table info available. #1 0x00002b52e102b759 in ?? () from /opt/dovecot/lib/dovecot/libdovecot-storage.so.0 No symbol table info available. #2 0x00002b52e100a2c0 in index_mail_expunge () from /opt/dovecot/lib/dovecot/libdovecot-storage.so.0 No symbol table info available. #3 0x0000000000405e9c in client_update_mails () No symbol table info available. #4 0x00000000004061c1 in client_command_execute () No symbol table info available. #5 0x00000000004045b9 in client_handle_input () No symbol table info available. #6 0x00002b52e12df698 in io_loop_call_io () from /opt/dovecot/lib/dovecot/libdovecot.so.0 No symbol table info available. #7 0x00002b52e12e09d5 in io_loop_handler_run () from /opt/dovecot/lib/dovecot/libdovecot.so.0 No symbol table info available. #8 0x00002b52e12df62d in io_loop_run () from /opt/dovecot/lib/dovecot/libdovecot.so.0 No symbol table info available. #9 0x00002b52e12cdf13 in master_service_run () from /opt/dovecot/lib/dovecot/libdovecot.so.0 No symbol table info available. #10 0x0000000000403994 in main () No symbol table info available. (gdb) All traces of the crashes are identical, that is #0 0x00002b52e1027e54 in istream_raw_mbox_get_start_offset () from /opt/dovecot/lib/dovecot/libdovecot-storage.so.0 #1 0x00002b52e102b759 in ?? () from /opt/dovecot/lib/dovecot/libdovecot-storage.so.0 #2 0x00002b52e100a2c0 in index_mail_expunge () from /opt/dovecot/lib/dovecot/libdovecot-storage.so.0 #3 0x0000000000405e9c in client_update_mails () #4 0x00000000004061c1 in client_command_execute () #5 0x00000000004045b9 in client_handle_input () #6 0x00002b52e12df698 in io_loop_call_io () from /opt/dovecot/lib/dovecot/libdovecot.so.0 #7 0x00002b52e12e09d5 in io_loop_handler_run () from /opt/dovecot/lib/dovecot/libdovecot.so.0 #8 0x00002b52e12df62d in io_loop_run () from /opt/dovecot/lib/dovecot/libdovecot.so.0 #9 0x00002b52e12cdf13 in master_service_run () from /opt/dovecot/lib/dovecot/libdovecot.so.0 #10 0x0000000000403994 in main ()
We have mboxes over NFS and we also have an ldap user backend. For now, I do not have a scenario that reproduces the problem. Any idea, or input are highly appreciated. Of course I can provide any information requested (without exposing restricted company or client data) to help trace the problem and lead to the solution.
Thanks and keep up the good work!
Regards,
Kostas Zorbadelos
With a quick test I can't reproduce pop3_lock_session=yes causing a crash. I guess it needs something else besides what I tested. It would be helpful if your Dovecot binaries weren't stripped of debug symbols. I could then ask for some more information from the core dumps with gdb.
On Wed, 2011-08-10 at 13:07 +0300, Kostas Zorbadelos wrote:
On 07/22/2011 01:02 PM, Kostas Zorbadelos wrote:
Hello,
since I saw no action on this, here is a newer update we discovered today.
After setting pop3_lock_session = no the core dumps went away. We will leave it like that and watch it for the next few days. If we set pop3_lock_session = yes, the problem is reproduced.
If I can do anything else to help debug the problem, please let me know.
Regards,
Kostas
Greetings to all.
It's my first post to the list. We just completed a migration from qpopper to dovecot for our IMAP and POP3 services. We have a rather large mail environment (we are the biggest provider in Greece).
So, here are the details:
- Keep getting errors like these in our production environment
Jul 22 00:18:21 pop01 dovecot: master: Error: service(pop3): child 4078 killed with signal 11 (core dumps disabled) Jul 22 00:19:31 pop03 dovecot: master: Error: service(pop3): child 18849 killed with signal 11 (core dumps disabled)
dovecot -n output
/opt/dovecot/sbin/dovecot -n # 2.0.13: /opt/dovecot/etc/dovecot/dovecot.conf # OS: Linux 2.6.18-92.1.22.el5 x86_64 CentOS release 5.5 (Final) auth_cache_negative_ttl = 10 mins auth_cache_size = 5 M auth_cache_ttl = 10 mins auth_verbose = yes default_client_limit = 5000 default_process_limit = 500 disable_plaintext_auth = no first_valid_uid = 200 listen = * log_timestamp = "%Y-%m-%d %H:%M:%S " login_greeting =<COMPANY> ready mail_access_groups = mail otemail disk root mail_fsync = always mail_location = mbox:INDEX=/var/index/dovecot/%2.16Hn/%2.254Hn/%u mail_nfs_storage = yes mbox_lock_timeout = 2 mins mbox_min_index_size = 200 k mbox_read_locks = dotlock_try fcntl mbox_write_locks = dotlock_try fcntl passdb { args = /opt/dovecot/etc/dovecot/dovecot-ldap.conf.ext driver = ldap } protocols = imap pop3 service auth-worker { user = dovenull } service imap-login { inet_listener imap { port = 143 } inet_listener imaps { port = 993 ssl = yes } } service pop3-login { inet_listener pop3 { port = 110 } inet_listener pop3s { port = 995 ssl = yes } } ssl = no userdb { args = /opt/dovecot/etc/dovecot/dovecot-ldap.conf.ext driver = ldap } verbose_proctitle = yes protocol imap { imap_client_workarounds = delay-newmail tb-extra-mailbox-sep mail_max_userip_connections = 100 } protocol pop3 { mail_max_userip_connections = 100 pop3_client_workarounds = outlook-no-nuls oe-ns-eoh pop3_fast_size_lookups = yes pop3_lock_session = yes pop3_reuse_xuidl = yes pop3_uidl_format = %08Xu%08Xv }
I enabled core dumps in one of our backend servers and here is the relevant gdb trace:
[root@pop08 ~]# gdb /opt/dovecot/libexec/dovecot/pop3
/core.9273 GNU gdb (GDB) Red Hat Enterprise Linux (7.0.1-32.el5_6.2) Copyright (C) 2009 Free Software Foundation, Inc. License GPLv3+: GNU GPL version 3 or laterhttp://gnu.org/licenses/gpl.html This is free software: you are free to change and redistribute it. There is NO WARRANTY, to the extent permitted by law. Type "show copying" and "show warranty" for details. This GDB was configured as "x86_64-redhat-linux-gnu". For bug reporting instructions, please see: http://www.gnu.org/software/gdb/bugs/... Reading symbols from /opt/dovecot/libexec/dovecot/pop3...(no debugging symbols found)...done. Reading symbols from /opt/dovecot/lib/dovecot/libdovecot-storage.so.0...(no debugging symbols found)...done. Loaded symbols for /opt/dovecot/lib/dovecot/libdovecot-storage.so.0 Reading symbols from /opt/dovecot/lib/dovecot/libdovecot.so.0...(no debugging symbols found)...done. Loaded symbols for /opt/dovecot/lib/dovecot/libdovecot.so.0 Reading symbols from /lib64/libdl.so.2...(no debugging symbols found)...done. Loaded symbols for /lib64/libdl.so.2 Reading symbols from /lib64/librt.so.1...(no debugging symbols found)...done. Loaded symbols for /lib64/librt.so.1 Reading symbols from /lib64/libc.so.6...(no debugging symbols found)...done. Loaded symbols for /lib64/libc.so.6 Reading symbols from /lib64/ld-linux-x86-64.so.2...(no debugging symbols found)...done. Loaded symbols for /lib64/ld-linux-x86-64.so.2 Reading symbols from /lib64/libpthread.so.0...(no debugging symbols found)...done. Loaded symbols for /lib64/libpthread.so.0 Core was generated by `dovecot/pop3'. Program terminated with signal 11, Segmentation fault. #0 0x00002b52e1027e54 in istream_raw_mbox_get_start_offset () from /opt/dovecot/lib/dovecot/libdovecot-storage.so.0 (gdb) bt full #0 0x00002b52e1027e54 in istream_raw_mbox_get_start_offset () from /opt/dovecot/lib/dovecot/libdovecot-storage.so.0 No symbol table info available. #1 0x00002b52e102b759 in ?? () from /opt/dovecot/lib/dovecot/libdovecot-storage.so.0 No symbol table info available. #2 0x00002b52e100a2c0 in index_mail_expunge () from /opt/dovecot/lib/dovecot/libdovecot-storage.so.0 No symbol table info available. #3 0x0000000000405e9c in client_update_mails () No symbol table info available. #4 0x00000000004061c1 in client_command_execute () No symbol table info available. #5 0x00000000004045b9 in client_handle_input () No symbol table info available. #6 0x00002b52e12df698 in io_loop_call_io () from /opt/dovecot/lib/dovecot/libdovecot.so.0 No symbol table info available. #7 0x00002b52e12e09d5 in io_loop_handler_run () from /opt/dovecot/lib/dovecot/libdovecot.so.0 No symbol table info available. #8 0x00002b52e12df62d in io_loop_run () from /opt/dovecot/lib/dovecot/libdovecot.so.0 No symbol table info available. #9 0x00002b52e12cdf13 in master_service_run () from /opt/dovecot/lib/dovecot/libdovecot.so.0 No symbol table info available. #10 0x0000000000403994 in main () No symbol table info available. (gdb) All traces of the crashes are identical, that is #0 0x00002b52e1027e54 in istream_raw_mbox_get_start_offset () from /opt/dovecot/lib/dovecot/libdovecot-storage.so.0 #1 0x00002b52e102b759 in ?? () from /opt/dovecot/lib/dovecot/libdovecot-storage.so.0 #2 0x00002b52e100a2c0 in index_mail_expunge () from /opt/dovecot/lib/dovecot/libdovecot-storage.so.0 #3 0x0000000000405e9c in client_update_mails () #4 0x00000000004061c1 in client_command_execute () #5 0x00000000004045b9 in client_handle_input () #6 0x00002b52e12df698 in io_loop_call_io () from /opt/dovecot/lib/dovecot/libdovecot.so.0 #7 0x00002b52e12e09d5 in io_loop_handler_run () from /opt/dovecot/lib/dovecot/libdovecot.so.0 #8 0x00002b52e12df62d in io_loop_run () from /opt/dovecot/lib/dovecot/libdovecot.so.0 #9 0x00002b52e12cdf13 in master_service_run () from /opt/dovecot/lib/dovecot/libdovecot.so.0 #10 0x0000000000403994 in main ()
We have mboxes over NFS and we also have an ldap user backend. For now, I do not have a scenario that reproduces the problem. Any idea, or input are highly appreciated. Of course I can provide any information requested (without exposing restricted company or client data) to help trace the problem and lead to the solution.
Thanks and keep up the good work!
Regards,
Kostas Zorbadelos
Quoting Timo Sirainen tss@iki.fi:
With a quick test I can't reproduce pop3_lock_session=yes causing a crash. I guess it needs something else besides what I tested. It would be helpful if your Dovecot binaries weren't stripped of debug symbols. I could then ask for some more information from the core dumps with gdb.
Hi Timo,
indeed it is a bug that I could not reproduce myself. Having debug symbols and producing the stack trace is the next logical step and I will work on this tomorrow. Since --enable-debug does not work in your configure script, can you direct me as to what is needed? Is there an option in configure or do I need to mess with the makefiles?
On the other hand, I have found two different bugs. Having pop3_lock_session=yes we have the situation described here and also of course delays in local deliveries in case a client has an active pop session. And I can tell you we have a lot of abusing clients that keep hitting our pop servers continuously, or keep connections open for a VERY long time.
To address that, we put pop3_lock_session=no. In this case, there is an fcntl lock leak somewhere. The good news is that we have reproduced that and I will send relevant information in a different mail. I also read the following thread, from a while back:
http://www.dovecot.org/list/dovecot/2009-February/037098.html
Regards,
Kostas
On Wed, 2011-08-10 at 13:07 +0300, Kostas Zorbadelos wrote:
On 07/22/2011 01:02 PM, Kostas Zorbadelos wrote:
Hello,
since I saw no action on this, here is a newer update we discovered today.
After setting pop3_lock_session = no the core dumps went away. We will leave it like that and watch it for the next few days. If we set pop3_lock_session = yes, the problem is reproduced.
If I can do anything else to help debug the problem, please let me know.
Regards,
Kostas
Greetings to all.
It's my first post to the list. We just completed a migration
from qpopper to dovecot for our IMAP and POP3 services. We have a rather large mail environment (we are the biggest provider in Greece).So, here are the details:
- Keep getting errors like these in our production environment
Jul 22 00:18:21 pop01 dovecot: master: Error: service(pop3):
child 4078 killed with signal 11 (core dumps disabled) Jul 22 00:19:31 pop03 dovecot: master: Error: service(pop3):
child 18849 killed with signal 11 (core dumps disabled)
dovecot -n output
/opt/dovecot/sbin/dovecot -n # 2.0.13: /opt/dovecot/etc/dovecot/dovecot.conf # OS: Linux 2.6.18-92.1.22.el5 x86_64 CentOS release 5.5 (Final) auth_cache_negative_ttl = 10 mins auth_cache_size = 5 M auth_cache_ttl = 10 mins auth_verbose = yes default_client_limit = 5000 default_process_limit = 500 disable_plaintext_auth = no first_valid_uid = 200 listen = * log_timestamp = "%Y-%m-%d %H:%M:%S " login_greeting =<COMPANY> ready mail_access_groups = mail otemail disk root mail_fsync = always mail_location = mbox:INDEX=/var/index/dovecot/%2.16Hn/%2.254Hn/%u mail_nfs_storage = yes mbox_lock_timeout = 2 mins mbox_min_index_size = 200 k mbox_read_locks = dotlock_try fcntl mbox_write_locks = dotlock_try fcntl passdb { args = /opt/dovecot/etc/dovecot/dovecot-ldap.conf.ext driver = ldap } protocols = imap pop3 service auth-worker { user = dovenull } service imap-login { inet_listener imap { port = 143 } inet_listener imaps { port = 993 ssl = yes } } service pop3-login { inet_listener pop3 { port = 110 } inet_listener pop3s { port = 995 ssl = yes } } ssl = no userdb { args = /opt/dovecot/etc/dovecot/dovecot-ldap.conf.ext driver = ldap } verbose_proctitle = yes protocol imap { imap_client_workarounds = delay-newmail tb-extra-mailbox-sep mail_max_userip_connections = 100 } protocol pop3 { mail_max_userip_connections = 100 pop3_client_workarounds = outlook-no-nuls oe-ns-eoh pop3_fast_size_lookups = yes pop3_lock_session = yes pop3_reuse_xuidl = yes pop3_uidl_format = %08Xu%08Xv }
I enabled core dumps in one of our backend servers and here is
the relevant gdb trace:[root@pop08 ~]# gdb
/opt/dovecot/libexec/dovecot/pop3/core.9273 GNU gdb (GDB) Red Hat Enterprise Linux (7.0.1-32.el5_6.2) Copyright (C) 2009 Free Software Foundation, Inc. License GPLv3+: GNU GPL version 3 or
laterhttp://gnu.org/licenses/gpl.html This is free software: you are free to change and redistribute it. There is NO WARRANTY, to the extent permitted by law. Type "show copying" and "show warranty" for details. This GDB was configured as "x86_64-redhat-linux-gnu". For bug reporting instructions, please see: http://www.gnu.org/software/gdb/bugs/... Reading symbols from /opt/dovecot/libexec/dovecot/pop3...(no
debugging symbols found)...done. Reading symbols from
/opt/dovecot/lib/dovecot/libdovecot-storage.so.0...(no debugging
symbols found)...done. Loaded symbols for /opt/dovecot/lib/dovecot/libdovecot-storage.so.0 Reading symbols from
/opt/dovecot/lib/dovecot/libdovecot.so.0...(no debugging symbols
found)...done. Loaded symbols for /opt/dovecot/lib/dovecot/libdovecot.so.0 Reading symbols from /lib64/libdl.so.2...(no debugging symbols
found)...done. Loaded symbols for /lib64/libdl.so.2 Reading symbols from /lib64/librt.so.1...(no debugging symbols
found)...done. Loaded symbols for /lib64/librt.so.1 Reading symbols from /lib64/libc.so.6...(no debugging symbols
found)...done. Loaded symbols for /lib64/libc.so.6 Reading symbols from /lib64/ld-linux-x86-64.so.2...(no debugging
symbols found)...done. Loaded symbols for /lib64/ld-linux-x86-64.so.2 Reading symbols from /lib64/libpthread.so.0...(no debugging
symbols found)...done. Loaded symbols for /lib64/libpthread.so.0 Core was generated by `dovecot/pop3'. Program terminated with signal 11, Segmentation fault. #0 0x00002b52e1027e54 in istream_raw_mbox_get_start_offset ()
from /opt/dovecot/lib/dovecot/libdovecot-storage.so.0 (gdb) bt full #0 0x00002b52e1027e54 in istream_raw_mbox_get_start_offset ()
from /opt/dovecot/lib/dovecot/libdovecot-storage.so.0 No symbol table info available. #1 0x00002b52e102b759 in ?? () from
/opt/dovecot/lib/dovecot/libdovecot-storage.so.0 No symbol table info available. #2 0x00002b52e100a2c0 in index_mail_expunge () from
/opt/dovecot/lib/dovecot/libdovecot-storage.so.0 No symbol table info available. #3 0x0000000000405e9c in client_update_mails () No symbol table info available. #4 0x00000000004061c1 in client_command_execute () No symbol table info available. #5 0x00000000004045b9 in client_handle_input () No symbol table info available. #6 0x00002b52e12df698 in io_loop_call_io () from
/opt/dovecot/lib/dovecot/libdovecot.so.0 No symbol table info available. #7 0x00002b52e12e09d5 in io_loop_handler_run () from
/opt/dovecot/lib/dovecot/libdovecot.so.0 No symbol table info available. #8 0x00002b52e12df62d in io_loop_run () from
/opt/dovecot/lib/dovecot/libdovecot.so.0 No symbol table info available. #9 0x00002b52e12cdf13 in master_service_run () from
/opt/dovecot/lib/dovecot/libdovecot.so.0 No symbol table info available. #10 0x0000000000403994 in main () No symbol table info available. (gdb)All traces of the crashes are identical, that is #0 0x00002b52e1027e54 in istream_raw_mbox_get_start_offset ()
from /opt/dovecot/lib/dovecot/libdovecot-storage.so.0 #1 0x00002b52e102b759 in ?? () from
/opt/dovecot/lib/dovecot/libdovecot-storage.so.0 #2 0x00002b52e100a2c0 in index_mail_expunge () from
/opt/dovecot/lib/dovecot/libdovecot-storage.so.0 #3 0x0000000000405e9c in client_update_mails () #4 0x00000000004061c1 in client_command_execute () #5 0x00000000004045b9 in client_handle_input () #6 0x00002b52e12df698 in io_loop_call_io () from
/opt/dovecot/lib/dovecot/libdovecot.so.0 #7 0x00002b52e12e09d5 in io_loop_handler_run () from
/opt/dovecot/lib/dovecot/libdovecot.so.0 #8 0x00002b52e12df62d in io_loop_run () from
/opt/dovecot/lib/dovecot/libdovecot.so.0 #9 0x00002b52e12cdf13 in master_service_run () from
/opt/dovecot/lib/dovecot/libdovecot.so.0 #10 0x0000000000403994 in main ()We have mboxes over NFS and we also have an ldap user backend.
For now, I do not have a scenario that reproduces the problem. Any idea, or input are highly
appreciated. Of course I can provide any information requested (without exposing restricted company or
client data) to help trace the problem and lead to the solution.Thanks and keep up the good work!
Regards,
Kostas Zorbadelos
On 08/15/2011 11:17 AM, kzorba@otenet.gr wrote:
Quoting Timo Sirainen tss@iki.fi:
With a quick test I can't reproduce pop3_lock_session=yes causing a crash. I guess it needs something else besides what I tested. It would be helpful if your Dovecot binaries weren't stripped of debug symbols. I could then ask for some more information from the core dumps with gdb.
Timo, all
here is a backtrace with debuging symbols in the executables:
[root@pop08 ]# gdb /opt/dovecot-debug/libexec/dovecot/pop3 core.2929 GNU gdb (GDB) Red Hat Enterprise Linux (7.0.1-32.el5_6.2) Copyright (C) 2009 Free Software Foundation, Inc. License GPLv3+: GNU GPL version 3 or later http://gnu.org/licenses/gpl.html This is free software: you are free to change and redistribute it. There is NO WARRANTY, to the extent permitted by law. Type "show copying" and "show warranty" for details. This GDB was configured as "x86_64-redhat-linux-gnu". For bug reporting instructions, please see: http://www.gnu.org/software/gdb/bugs/... Reading symbols from /opt/dovecot-debug/libexec/dovecot/pop3...done. BFD: Warning: /var/mail7/folders/P/K/U/amihal/core.2929 is truncated: expected core file size >= 569344, found: 565248. Reading symbols from /opt/dovecot-debug/lib/dovecot/libdovecot-storage.so.0...done. Loaded symbols for /opt/dovecot-debug/lib/dovecot/libdovecot-storage.so.0 Reading symbols from /opt/dovecot-debug/lib/dovecot/libdovecot.so.0...done. Loaded symbols for /opt/dovecot-debug/lib/dovecot/libdovecot.so.0 Reading symbols from /lib64/libdl.so.2...(no debugging symbols found)...done. Loaded symbols for /lib64/libdl.so.2 Reading symbols from /lib64/librt.so.1...(no debugging symbols found)...done. Loaded symbols for /lib64/librt.so.1 Reading symbols from /lib64/libc.so.6...(no debugging symbols found)...done. Loaded symbols for /lib64/libc.so.6 Reading symbols from /lib64/ld-linux-x86-64.so.2...(no debugging symbols found)...done. Loaded symbols for /lib64/ld-linux-x86-64.so.2 Reading symbols from /lib64/libpthread.so.0...(no debugging symbols found)...done. Loaded symbols for /lib64/libpthread.so.0 Core was generated by `dovecot/pop3'. Program terminated with signal 11, Segmentation fault. #0 istream_raw_mbox_get_start_offset (stream=0x0) at istream-raw-mbox.c:498 498 istream-raw-mbox.c: No such file or directory. in istream-raw-mbox.c (gdb) bt #0 istream_raw_mbox_get_start_offset (stream=0x0) at istream-raw-mbox.c:498 #1 0x00002b4114afc769 in mbox_mail_get_special (_mail=0x1bad8fa0, field=<value optimized out>, value_r=0x7fff9600fa88) at mbox-mail.c:198 #2 0x00002b4114adb2d0 in index_mail_expunge (mail=0x0) at index-mail.c:1503 #3 0x0000000000405eac in client_update_mails (client=0x1bac4a00) at pop3-commands.c:255 #4 0x00000000004061d1 in cmd_quit (client=0x1bac4a00, name=0x1baccbd4 "", args=0x406a0c "") at pop3-commands.c:274 #5 client_command_execute (client=0x1bac4a00, name=0x1baccbd4 "", args=0x406a0c "") at pop3-commands.c:773 #6 0x00000000004045c9 in client_handle_input (client=0x1bac4a00) at pop3-client.c:628 #7 0x00002b4114db0698 in io_loop_call_io (io=0x1bac01d0) at ioloop.c:384 #8 0x00002b4114db19d5 in io_loop_handler_run (ioloop=<value optimized out>) at ioloop-epoll.c:213 #9 0x00002b4114db062d in io_loop_run (ioloop=0x1baa8610) at ioloop.c:405 #10 0x00002b4114d9ef13 in master_service_run (service=0x1baa84e0, callback=0x1baccbe4) at master-service.c:478 #11 0x00000000004039a4 in main (argc=1, argv=0x1baa8370) at main.c:252 (gdb) bt full #0 istream_raw_mbox_get_start_offset (stream=0x0) at istream-raw-mbox.c:498 rstream = <value optimized out> __FUNCTION__ = "istream_raw_mbox_get_start_offset" #1 0x00002b4114afc769 in mbox_mail_get_special (_mail=0x1bad8fa0, field=<value optimized out>, value_r=0x7fff9600fa88) at mbox-mail.c:198 mail = <value optimized out> mbox = 0x1bac97d0 offset = 0 #2 0x00002b4114adb2d0 in index_mail_expunge (mail=0x0) at index-mail.c:1503 value = <value optimized out> guid_128 = "\240\267\252\033\000\000\000\000\001\000\000\000\000\000\000" #3 0x0000000000405eac in client_update_mails (client=0x1bac4a00) at pop3-commands.c:255 search_args = 0x0 ctx = 0x1bace150 mail = 0x1bad8fa0 msgnum = 0 bit = 464309220 ret = <value optimized out> #4 0x00000000004061d1 in cmd_quit (client=0x1bac4a00, name=0x1baccbd4 "", args=0x406a0c "") at pop3-commands.c:274 No locals. #5 client_command_execute (client=0x1bac4a00, name=0x1baccbd4 "", args=0x406a0c "") at pop3-commands.c:773 No locals. #6 0x00000000004045c9 in client_handle_input (client=0x1bac4a00) at pop3-client.c:628 _data_stack_cur_id = 3 line = 0x1baab872 "QUIT" args = 0x406a0c "" ret = <value optimized out> #7 0x00002b4114db0698 in io_loop_call_io (io=0x1bac01d0) at ioloop.c:384 ioloop = 0x1baa8610 t_id = 2 #8 0x00002b4114db19d5 in io_loop_handler_run (ioloop=<value optimized out>) at ioloop-epoll.c:213 ctx = <value optimized out> event = 0x1baa88b0 list = 0x1bac0220 io = 0x0 tv = {tv_sec = 9, tv_usec = 871818} events_count = <value optimized out> msecs = <value optimized out> ---Type <return> to continue, or q <return> to quit--- ret = 1 i = 0 call = false #9 0x00002b4114db062d in io_loop_run (ioloop=0x1baa8610) at ioloop.c:405 No locals. #10 0x00002b4114d9ef13 in master_service_run (service=0x1baa84e0, callback=0x1baccbe4) at master-service.c:478 No locals. #11 0x00000000004039a4 in main (argc=1, argv=0x1baa8370) at main.c:252 service_flags = <value optimized out> storage_service_flags = MAIL_STORAGE_SERVICE_FLAG_DISALLOW_ROOT postlogin_socket_path = 0x0 username = 0x0 c = <value optimized out> set_roots = {0x4072a0, 0x0} (gdb) quit
I have also kept the user's mailbox. I couldn't reproduce the problem by talking POP3 directly to the server by hand.
Any ideas?
Thanks,
Kostas
Hi Timo,
indeed it is a bug that I could not reproduce myself. Having debug symbols and producing the stack trace is the next logical step and I will work on this tomorrow. Since --enable-debug does not work in your configure script, can you direct me as to what is needed? Is there an option in configure or do I need to mess with the makefiles?
On the other hand, I have found two different bugs. Having pop3_lock_session=yes we have the situation described here and also of course delays in local deliveries in case a client has an active pop session. And I can tell you we have a lot of abusing clients that keep hitting our pop servers continuously, or keep connections open for a VERY long time.
To address that, we put pop3_lock_session=no. In this case, there is an fcntl lock leak somewhere. The good news is that we have reproduced that and I will send relevant information in a different mail. I also read the following thread, from a while back:
http://www.dovecot.org/list/dovecot/2009-February/037098.html
Regards,
Kostas
On 08/16/2011 04:42 PM, Kostas Zorbadelos wrote:
On 08/15/2011 11:17 AM, kzorba@otenet.gr wrote:
Quoting Timo Sirainentss@iki.fi:
OK,
we managed to reproduce the problem with the following scenario.
Using Thunderbird (3.1.11 if that matters) we set up a mail account using POP served by dovecot. In the account settings we have checked the boxes "Leave messages on server" and "Until I delete them" which is the default setting for new accounts.
We make the first POP session and get the mails. We then delete a message (moving it to trash) and try to Get Mails. Dovecot crashes with the aforementioned stack trace (included below). If you again try Get Mail in the client, you get the deleted mail again.
It seems to be related to the indexing code since if we change the setting
mbox_min_index_size
to something bigger than the mailbox (so no cache index files are created) the problem does not appear and the mails get deleted from the server normally. We didn't manage to reproduce the problem with Microsoft Outlook.
We 'll keep investigating until we have a fix.
Regards,
Kostas
With a quick test I can't reproduce pop3_lock_session=yes causing a crash. I guess it needs something else besides what I tested. It would be helpful if your Dovecot binaries weren't stripped of debug symbols. I could then ask for some more information from the core dumps with gdb.
Timo, all
here is a backtrace with debuging symbols in the executables:
[root@pop08 ]# gdb /opt/dovecot-debug/libexec/dovecot/pop3 core.2929 GNU gdb (GDB) Red Hat Enterprise Linux (7.0.1-32.el5_6.2) Copyright (C) 2009 Free Software Foundation, Inc. License GPLv3+: GNU GPL version 3 or laterhttp://gnu.org/licenses/gpl.html This is free software: you are free to change and redistribute it. There is NO WARRANTY, to the extent permitted by law. Type "show copying" and "show warranty" for details. This GDB was configured as "x86_64-redhat-linux-gnu". For bug reporting instructions, please see: http://www.gnu.org/software/gdb/bugs/... Reading symbols from /opt/dovecot-debug/libexec/dovecot/pop3...done. BFD: Warning: /var/mail7/folders/P/K/U/amihal/core.2929 is truncated: expected core file size>= 569344, found: 565248. Reading symbols from /opt/dovecot-debug/lib/dovecot/libdovecot-storage.so.0...done. Loaded symbols for /opt/dovecot-debug/lib/dovecot/libdovecot-storage.so.0 Reading symbols from /opt/dovecot-debug/lib/dovecot/libdovecot.so.0...done. Loaded symbols for /opt/dovecot-debug/lib/dovecot/libdovecot.so.0 Reading symbols from /lib64/libdl.so.2...(no debugging symbols found)...done. Loaded symbols for /lib64/libdl.so.2 Reading symbols from /lib64/librt.so.1...(no debugging symbols found)...done. Loaded symbols for /lib64/librt.so.1 Reading symbols from /lib64/libc.so.6...(no debugging symbols found)...done. Loaded symbols for /lib64/libc.so.6 Reading symbols from /lib64/ld-linux-x86-64.so.2...(no debugging symbols found)...done. Loaded symbols for /lib64/ld-linux-x86-64.so.2 Reading symbols from /lib64/libpthread.so.0...(no debugging symbols found)...done. Loaded symbols for /lib64/libpthread.so.0 Core was generated by `dovecot/pop3'. Program terminated with signal 11, Segmentation fault. #0 istream_raw_mbox_get_start_offset (stream=0x0) at istream-raw-mbox.c:498 498 istream-raw-mbox.c: No such file or directory. in istream-raw-mbox.c (gdb) bt #0 istream_raw_mbox_get_start_offset (stream=0x0) at istream-raw-mbox.c:498 #1 0x00002b4114afc769 in mbox_mail_get_special (_mail=0x1bad8fa0, field=<value optimized out>, value_r=0x7fff9600fa88) at mbox-mail.c:198 #2 0x00002b4114adb2d0 in index_mail_expunge (mail=0x0) at index-mail.c:1503 #3 0x0000000000405eac in client_update_mails (client=0x1bac4a00) at pop3-commands.c:255 #4 0x00000000004061d1 in cmd_quit (client=0x1bac4a00, name=0x1baccbd4 "", args=0x406a0c "") at pop3-commands.c:274 #5 client_command_execute (client=0x1bac4a00, name=0x1baccbd4 "", args=0x406a0c "") at pop3-commands.c:773 #6 0x00000000004045c9 in client_handle_input (client=0x1bac4a00) at pop3-client.c:628 #7 0x00002b4114db0698 in io_loop_call_io (io=0x1bac01d0) at ioloop.c:384 #8 0x00002b4114db19d5 in io_loop_handler_run (ioloop=<value optimized out>) at ioloop-epoll.c:213 #9 0x00002b4114db062d in io_loop_run (ioloop=0x1baa8610) at ioloop.c:405 #10 0x00002b4114d9ef13 in master_service_run (service=0x1baa84e0, callback=0x1baccbe4) at master-service.c:478 #11 0x00000000004039a4 in main (argc=1, argv=0x1baa8370) at main.c:252 (gdb) bt full #0 istream_raw_mbox_get_start_offset (stream=0x0) at istream-raw-mbox.c:498 rstream =<value optimized out> __FUNCTION__ = "istream_raw_mbox_get_start_offset" #1 0x00002b4114afc769 in mbox_mail_get_special (_mail=0x1bad8fa0, field=<value optimized out>, value_r=0x7fff9600fa88) at mbox-mail.c:198 mail =<value optimized out> mbox = 0x1bac97d0 offset = 0 #2 0x00002b4114adb2d0 in index_mail_expunge (mail=0x0) at index-mail.c:1503 value =<value optimized out> guid_128 = "\240\267\252\033\000\000\000\000\001\000\000\000\000\000\000" #3 0x0000000000405eac in client_update_mails (client=0x1bac4a00) at pop3-commands.c:255 search_args = 0x0 ctx = 0x1bace150 mail = 0x1bad8fa0 msgnum = 0 bit = 464309220 ret =<value optimized out> #4 0x00000000004061d1 in cmd_quit (client=0x1bac4a00, name=0x1baccbd4 "", args=0x406a0c "") at pop3-commands.c:274 No locals. #5 client_command_execute (client=0x1bac4a00, name=0x1baccbd4 "", args=0x406a0c "") at pop3-commands.c:773 No locals. #6 0x00000000004045c9 in client_handle_input (client=0x1bac4a00) at pop3-client.c:628 _data_stack_cur_id = 3 line = 0x1baab872 "QUIT" args = 0x406a0c "" ret =<value optimized out> #7 0x00002b4114db0698 in io_loop_call_io (io=0x1bac01d0) at ioloop.c:384 ioloop = 0x1baa8610 t_id = 2 #8 0x00002b4114db19d5 in io_loop_handler_run (ioloop=<value optimized out>) at ioloop-epoll.c:213 ctx =<value optimized out> event = 0x1baa88b0 list = 0x1bac0220 io = 0x0 tv = {tv_sec = 9, tv_usec = 871818} events_count =<value optimized out> msecs =<value optimized out> ---Type<return> to continue, or q<return> to quit--- ret = 1 i = 0 call = false #9 0x00002b4114db062d in io_loop_run (ioloop=0x1baa8610) at ioloop.c:405 No locals. #10 0x00002b4114d9ef13 in master_service_run (service=0x1baa84e0, callback=0x1baccbe4) at master-service.c:478 No locals. #11 0x00000000004039a4 in main (argc=1, argv=0x1baa8370) at main.c:252 service_flags =<value optimized out> storage_service_flags = MAIL_STORAGE_SERVICE_FLAG_DISALLOW_ROOT postlogin_socket_path = 0x0 username = 0x0 c =<value optimized out> set_roots = {0x4072a0, 0x0} (gdb) quit
I have also kept the user's mailbox. I couldn't reproduce the problem by talking POP3 directly to the server by hand.
Any ideas?
Thanks,
Kostas
Hi Timo,
indeed it is a bug that I could not reproduce myself. Having debug symbols and producing the stack trace is the next logical step and I will work on this tomorrow. Since --enable-debug does not work in your configure script, can you direct me as to what is needed? Is there an option in configure or do I need to mess with the makefiles?
On the other hand, I have found two different bugs. Having pop3_lock_session=yes we have the situation described here and also of course delays in local deliveries in case a client has an active pop session. And I can tell you we have a lot of abusing clients that keep hitting our pop servers continuously, or keep connections open for a VERY long time.
To address that, we put pop3_lock_session=no. In this case, there is an fcntl lock leak somewhere. The good news is that we have reproduced that and I will send relevant information in a different mail. I also read the following thread, from a while back:
http://www.dovecot.org/list/dovecot/2009-February/037098.html
Regards,
Kostas
The problem is more easily introduced than I imagined.
Included attached is an example mailbox containing just 3 mails. Use it to make a couple of POP sessions. Make sure indexes are generated for it by setting appropriately mbox_min_index_size e.g. mbox_min_index_size = 2k
Now, in the first session just generate the indexes and quit
kzorba@enigma(1)[05:23 PM]~->telnet dovecot-dev 110 Trying <IP>... Connected to dovecot-dev. Escape character is '^]'. +OK OTENET ready user kzorba1 +OK pass XXXXXXX +OK Logged in. LIST +OK 3 messages: 1 812 2 821 3 816 . quit +OK Logging out. Connection closed by foreign host.
In the second session we LIST and try to DELE a message
kzorba@enigma(1)[05:25 PM]~->telnet dovecot-dev 110 Trying <IP>... Connected to dovecot-dev. Escape character is '^]'. +OK OTENET ready user kzorba1 +OK pass XXXXXXX +OK Logged in. LIST +OK 3 messages: 1 812 2 821 3 816 . dele 2 +OK Marked to be deleted. quit Connection closed by foreign host.
In the server logs we get Aug 19 17:25:24 dovecot dovecot: master: Error: service(pop3): child 3489 killed with signal 11 (core dumped)
With the backtrace:
#0 0x00002b9ca3e93dce in istream_raw_mbox_get_start_offset (stream=0x0) at istream-raw-mbox.c:498
#1 0x00002b9ca3e97956 in mbox_mail_get_special (_mail=0x101cf7d0, field=MAIL_FETCH_GUID, value_r=0x7fff1fd69798) at mbox-mail.c:198
#2 0x00002b9ca3e47729 in mail_get_special (mail=0x101cf7d0, field=MAIL_FETCH_GUID, value_r=0x7fff1fd69798) at mail.c:188
#3 0x00002b9ca3e6d133 in index_mail_expunge (mail=0x101cf7d0) at index-mail.c:1503
#4 0x00002b9ca3e4784d in mail_expunge (mail=0x101cf7d0) at mail.c:233
#5 0x0000000000406272 in client_update_mails (client=0x101baa00) at pop3-commands.c:255
#6 0x000000000040635a in cmd_quit (client=0x101baa00, args=0x407f62 "") at pop3-commands.c:274
#7 0x000000000040774a in client_command_execute (client=0x101baa00, name=0x101960c0 "QUIT", args=0x407f62 "") at pop3-commands.c:773
#8 0x000000000040567c in client_handle_input (client=0x101baa00) at pop3-client.c:628
#9 0x0000000000405870 in client_input (client=0x101baa00) at pop3-client.c:681
#10 0x00002b9ca418af83 in io_loop_call_io (io=0x101b61d0) at ioloop.c:384
#11 0x00002b9ca418c685 in io_loop_handler_run (ioloop=0x1019e610) at ioloop-epoll.c:213
#12 0x00002b9ca418b024 in io_loop_run (ioloop=0x1019e610) at ioloop.c:405
#13 0x00002b9ca4171ca2 in master_service_run (service=0x1019e4e0, callback=0x403bab
I guess something is wrong in the indexing code because if I disable the index creation using an appropriate value for mbox_min_index_size everything works as expected and mails are deleted.
In our case, all clients that leave messages on server cannot actually delete messages from their inboxes. Clients with Thunderbird, keep getting the mails after they delete them, customers with Outlook don't see them again, but the deleted messages remain in the server. Needless to say that I will try to provide a patch but it will take me far more time than any developer familiar with the code already.
Regards,
Kostas
On Fri, 2011-08-19 at 17:52 +0300, Kostas Zorbadelos wrote:
The problem is more easily introduced than I imagined.
Well, I still couldn't reproduce it. But I can kind of see the problem. http://hg.dovecot.org/dovecot-2.0/rev/030394c74f54 should help.
On 08/23/2011 10:15 PM, Timo Sirainen wrote:
On Fri, 2011-08-19 at 17:52 +0300, Kostas Zorbadelos wrote:
The problem is more easily introduced than I imagined.
Well, I still couldn't reproduce it. But I can kind of see the problem. http://hg.dovecot.org/dovecot-2.0/rev/030394c74f54 should help.
Strange, I thought it would be reproduced easily. Anyway, your patch seems to work. I could also see with gdb that it had to do with an mbox having no open stream and I thought it was some kind of wrong initialization somewhere. I guess you can have an mbox opened, but with no stream created for it :)
Thank you so much about this. I guess the fix will be included in 2.0.14.
I think we have found another bug concerning fcntl lock leaks when
pop3_lock_session = no, but this (if this stands) is a subject for a different thread.
Regards,
Kostas
participants (3)
-
Kostas Zorbadelos
-
kzorba@otenet.gr
-
Timo Sirainen