Quoting Timo Sirainen <tss@iki.fi>:
With a quick test I can't reproduce pop3_lock_session=yes causing a crash. I guess it needs something else besides what I tested. It would be helpful if your Dovecot binaries weren't stripped of debug symbols. I could then ask for some more information from the core dumps with gdb.
Hi Timo,
indeed it is a bug that I could not reproduce myself. Having debug symbols and producing the stack trace is the next logical step and I will work on this tomorrow. Since --enable-debug does not work in your configure script, can you direct me as to what is needed? Is there an option in configure or do I need to mess with the makefiles?
On the other hand, I have found two different bugs. With pop3_lock_session=yes we have the situation described here and also, of course, delays in local deliveries whenever a client has an active POP session. And I can tell you we have a lot of abusive clients that keep hitting our POP servers continuously, or keep connections open for a VERY long time.
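To illustrate the delivery delays mentioned above, here is a minimal Python sketch of generic POSIX fcntl lock contention. It is not Dovecot code: the function names (try_lock_in_child, demo) and the temporary file standing in for an mbox are assumptions made for illustration, and the real setup also uses dotlock_try before fcntl. One process (the "POP session") holds an exclusive lock while a second process (the "delivery agent") tries a non-blocking lock on the same file.

```python
import fcntl
import os
import tempfile


def try_lock_in_child(path):
    """Fork a child that attempts a non-blocking exclusive fcntl lock.

    Returns True if the child got the lock, False if it was refused.
    A separate process is needed because fcntl locks are per-process.
    """
    pid = os.fork()
    if pid == 0:
        # Child: stands in for a delivery agent that wants to append mail.
        status = 1
        try:
            with open(path, "r+b") as f:
                fcntl.lockf(f, fcntl.LOCK_EX | fcntl.LOCK_NB)
                status = 0
        except OSError:
            status = 1  # lock is held by another process
        finally:
            os._exit(status)
    _, wait_status = os.waitpid(pid, 0)
    return os.WEXITSTATUS(wait_status) == 0


def demo():
    """Hold a lock like a long POP session, then release it."""
    fd, path = tempfile.mkstemp()
    os.close(fd)
    try:
        with open(path, "r+b") as session:
            # "POP session" takes the exclusive lock and keeps it.
            fcntl.lockf(session, fcntl.LOCK_EX)
            blocked = not try_lock_in_child(path)
            # Session ends: the lock is released, delivery can proceed.
            fcntl.lockf(session, fcntl.LOCK_UN)
            free_again = try_lock_in_child(path)
        return blocked, free_again
    finally:
        os.remove(path)


if __name__ == "__main__":
    print(demo())
```

In this sketch the second process is refused for as long as the first one holds the lock, which is the same shape of problem as a delivery waiting on a client that keeps its session open.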
To address that, we set pop3_lock_session=no. In this case, there is an fcntl lock leak somewhere. The good news is that we have reproduced that, and I will send the relevant information in a separate mail. I also read the following thread, from a while back:
http://www.dovecot.org/list/dovecot/2009-February/037098.html
Regards,
Kostas
On Wed, 2011-08-10 at 13:07 +0300, Kostas Zorbadelos wrote:
On 07/22/2011 01:02 PM, Kostas Zorbadelos wrote:
Hello,
since I saw no action on this, here is a newer update we discovered today.
After setting pop3_lock_session = no the core dumps went away. We will leave it like that and watch it for the next few days. If we set pop3_lock_session = yes, the problem is reproduced.
If I can do anything else to help debug the problem, please let me know.
Regards,
Kostas
Greetings to all.
It's my first post to the list. We just completed a migration from qpopper to dovecot for our IMAP and POP3 services. We have a rather large mail environment (we are the biggest provider in Greece). So, here are the details:
- Keep getting errors like these in our production environment:

Jul 22 00:18:21 pop01 dovecot: master: Error: service(pop3): child 4078 killed with signal 11 (core dumps disabled)
Jul 22 00:19:31 pop03 dovecot: master: Error: service(pop3): child 18849 killed with signal 11 (core dumps disabled)
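For anyone wanting to tally such crashes per host, a small Python sketch follows. It assumes the syslog line layout shown in the two log lines above; the regular expression and the count_crashes helper are hypothetical and not part of Dovecot.

```python
import re
from collections import Counter

# Matches lines like:
# Jul 22 00:18:21 pop01 dovecot: master: Error: service(pop3): child 4078 killed with signal 11 ...
CRASH_RE = re.compile(
    r"^(?P<ts>\w{3}\s+\d+\s+[\d:]{8})\s+(?P<host>\S+)\s+dovecot: master: Error: "
    r"service\((?P<service>[^)]+)\): child (?P<pid>\d+) "
    r"killed with signal (?P<signal>\d+)"
)


def count_crashes(lines):
    """Count crashes keyed by (host, service, signal number)."""
    counts = Counter()
    for line in lines:
        m = CRASH_RE.match(line)
        if m:
            key = (m.group("host"), m.group("service"), int(m.group("signal")))
            counts[key] += 1
    return counts


if __name__ == "__main__":
    sample = [
        "Jul 22 00:18:21 pop01 dovecot: master: Error: service(pop3): "
        "child 4078 killed with signal 11 (core dumps disabled)",
        "Jul 22 00:19:31 pop03 dovecot: master: Error: service(pop3): "
        "child 18849 killed with signal 11 (core dumps disabled)",
    ]
    for key, n in sorted(count_crashes(sample).items()):
        print(key, n)
```

Fed a full log file instead of the two sample lines, this would show whether the segfaults cluster on particular backends.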
dovecot -n output
/opt/dovecot/sbin/dovecot -n
# 2.0.13: /opt/dovecot/etc/dovecot/dovecot.conf
# OS: Linux 2.6.18-92.1.22.el5 x86_64 CentOS release 5.5 (Final)
auth_cache_negative_ttl = 10 mins
auth_cache_size = 5 M
auth_cache_ttl = 10 mins
auth_verbose = yes
default_client_limit = 5000
default_process_limit = 500
disable_plaintext_auth = no
first_valid_uid = 200
listen = *
log_timestamp = "%Y-%m-%d %H:%M:%S "
login_greeting = <COMPANY> ready
mail_access_groups = mail otemail disk root
mail_fsync = always
mail_location = mbox:INDEX=/var/index/dovecot/%2.16Hn/%2.254Hn/%u
mail_nfs_storage = yes
mbox_lock_timeout = 2 mins
mbox_min_index_size = 200 k
mbox_read_locks = dotlock_try fcntl
mbox_write_locks = dotlock_try fcntl
passdb {
  args = /opt/dovecot/etc/dovecot/dovecot-ldap.conf.ext
  driver = ldap
}
protocols = imap pop3
service auth-worker {
  user = dovenull
}
service imap-login {
  inet_listener imap {
    port = 143
  }
  inet_listener imaps {
    port = 993
    ssl = yes
  }
}
service pop3-login {
  inet_listener pop3 {
    port = 110
  }
  inet_listener pop3s {
    port = 995
    ssl = yes
  }
}
ssl = no
userdb {
  args = /opt/dovecot/etc/dovecot/dovecot-ldap.conf.ext
  driver = ldap
}
verbose_proctitle = yes
protocol imap {
  imap_client_workarounds = delay-newmail tb-extra-mailbox-sep
  mail_max_userip_connections = 100
}
protocol pop3 {
  mail_max_userip_connections = 100
  pop3_client_workarounds = outlook-no-nuls oe-ns-eoh
  pop3_fast_size_lookups = yes
  pop3_lock_session = yes
  pop3_reuse_xuidl = yes
  pop3_uidl_format = %08Xu%08Xv
}
I enabled core dumps in one of our backend servers and here is the relevant gdb trace:

[root@pop08 ~]# gdb /opt/dovecot/libexec/dovecot/pop3 <path_to_core_file>/core.9273
GNU gdb (GDB) Red Hat Enterprise Linux (7.0.1-32.el5_6.2)
Copyright (C) 2009 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law. Type "show copying"
and "show warranty" for details.
This GDB was configured as "x86_64-redhat-linux-gnu".
For bug reporting instructions, please see:
<http://www.gnu.org/software/gdb/bugs/>...
Reading symbols from /opt/dovecot/libexec/dovecot/pop3...(no debugging symbols found)...done.
Reading symbols from /opt/dovecot/lib/dovecot/libdovecot-storage.so.0...(no debugging symbols found)...done.
Loaded symbols for /opt/dovecot/lib/dovecot/libdovecot-storage.so.0
Reading symbols from /opt/dovecot/lib/dovecot/libdovecot.so.0...(no debugging symbols found)...done.
Loaded symbols for /opt/dovecot/lib/dovecot/libdovecot.so.0
Reading symbols from /lib64/libdl.so.2...(no debugging symbols found)...done.
Loaded symbols for /lib64/libdl.so.2
Reading symbols from /lib64/librt.so.1...(no debugging symbols found)...done.
Loaded symbols for /lib64/librt.so.1
Reading symbols from /lib64/libc.so.6...(no debugging symbols found)...done.
Loaded symbols for /lib64/libc.so.6
Reading symbols from /lib64/ld-linux-x86-64.so.2...(no debugging symbols found)...done.
Loaded symbols for /lib64/ld-linux-x86-64.so.2
Reading symbols from /lib64/libpthread.so.0...(no debugging symbols found)...done.
Loaded symbols for /lib64/libpthread.so.0
Core was generated by `dovecot/pop3'.
Program terminated with signal 11, Segmentation fault.
#0 0x00002b52e1027e54 in istream_raw_mbox_get_start_offset () from /opt/dovecot/lib/dovecot/libdovecot-storage.so.0
(gdb) bt full
#0 0x00002b52e1027e54 in istream_raw_mbox_get_start_offset () from /opt/dovecot/lib/dovecot/libdovecot-storage.so.0
No symbol table info available.
#1 0x00002b52e102b759 in ?? () from /opt/dovecot/lib/dovecot/libdovecot-storage.so.0
No symbol table info available.
#2 0x00002b52e100a2c0 in index_mail_expunge () from /opt/dovecot/lib/dovecot/libdovecot-storage.so.0
No symbol table info available.
#3 0x0000000000405e9c in client_update_mails ()
No symbol table info available.
#4 0x00000000004061c1 in client_command_execute ()
No symbol table info available.
#5 0x00000000004045b9 in client_handle_input ()
No symbol table info available.
#6 0x00002b52e12df698 in io_loop_call_io () from /opt/dovecot/lib/dovecot/libdovecot.so.0
No symbol table info available.
#7 0x00002b52e12e09d5 in io_loop_handler_run () from /opt/dovecot/lib/dovecot/libdovecot.so.0
No symbol table info available.
#8 0x00002b52e12df62d in io_loop_run () from /opt/dovecot/lib/dovecot/libdovecot.so.0
No symbol table info available.
#9 0x00002b52e12cdf13 in master_service_run () from /opt/dovecot/lib/dovecot/libdovecot.so.0
No symbol table info available.
#10 0x0000000000403994 in main ()
No symbol table info available.
(gdb)

All traces of the crashes are identical, that is:

#0 0x00002b52e1027e54 in istream_raw_mbox_get_start_offset () from /opt/dovecot/lib/dovecot/libdovecot-storage.so.0
#1 0x00002b52e102b759 in ?? () from /opt/dovecot/lib/dovecot/libdovecot-storage.so.0
#2 0x00002b52e100a2c0 in index_mail_expunge () from /opt/dovecot/lib/dovecot/libdovecot-storage.so.0
#3 0x0000000000405e9c in client_update_mails ()
#4 0x00000000004061c1 in client_command_execute ()
#5 0x00000000004045b9 in client_handle_input ()
#6 0x00002b52e12df698 in io_loop_call_io () from /opt/dovecot/lib/dovecot/libdovecot.so.0
#7 0x00002b52e12e09d5 in io_loop_handler_run () from /opt/dovecot/lib/dovecot/libdovecot.so.0
#8 0x00002b52e12df62d in io_loop_run () from /opt/dovecot/lib/dovecot/libdovecot.so.0
#9 0x00002b52e12cdf13 in master_service_run () from /opt/dovecot/lib/dovecot/libdovecot.so.0
#10 0x0000000000403994 in main ()

We have mboxes over NFS and we also have an ldap user backend.
For now, I do not have a scenario that reproduces the problem. Any ideas or input are highly appreciated. Of course, I can provide any information requested (without exposing restricted company or client data) to help trace the problem and lead to a solution.

Thanks and keep up the good work!
Regards,
Kostas Zorbadelos