[Dovecot] FreeBSD, Dovecot and ZFS
Mail Issues - FreeBSD
Hello, my apologies if this may be the wrong forum but hoping that maybe someone might be able to provide some insight.
Have a very sporadic and strange issue with our mail servers running Dovecot on FreeBSD. There are three servers hosting Dovecot with FreeBSD as the underlying operating system. All three connect to a NAS server, again running FreeBSD and ZFS.
When the specific issue occurs, clients connecting to check mail via POP3 or IMAP experience long delays and timeouts. To the point where POP3 Logins fail due to the timeouts. The issue is further compounded by clients increasing the number of attempts to check mail.
Part of the frustration in attempting to diagnose the issue is not knowing the root cause or the symptom that initiates the issue.
Wondering if anyone has experienced anything similar, or has suggestions on ways to help identify the root cause.
On 2013-02-12 4:25 PM, Jason Lock JLock@csolve.net wrote:
When the specific issue occurs, clients connecting to check mail via POP3 or IMAP experience long delays and timeouts. To the point where POP3 Logins fail due to the timeouts. The issue is further compounded by clients increasing the number of attempts to check mail.
WAG would be similar issues you can face when using NFS with multiple servers accessing it (file locking issues).
The solution would be to use Director to make sure users are always directed to the same server.
http://wiki2.dovecot.org/Director
If that isn't the problem, then much more info would be needed (ie, doveconf -n output, logs exhibiting the problem, etc)...
--
Best regards,
*/Charles /*
WAG would be similar issues you can face when using NFS with multiple servers accessing it (file locking issues). The solution would be to use Director to make sure users are always directed to the same server. http://wiki2.dovecot.org/Director If that isn't the problem, then much more info would be needed (ie, doveconf -n output, logs exhibiting the problem, etc)...
Best regards,
*/Charles /*
Thank you for your reply. To expand further: the problem does not happen with any regularity. We went over 30 days with no issue after two weeks of sporadic occurrences. When it does appear, it is usually some time after 2:00 PM (e.g. 14:30, 15:20, 16:10), and not every day (it has not happened on a weekend). The number of POP3 and IMAP processes increases dramatically when the issue occurs.
Here is a copy of the dovecot -n output:
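One way to put numbers on the process growth described above is to log process counts at a fixed interval, so the start of an incident can be located afterwards. The following is only a sketch: it assumes the process names match the basenames of the mail_executable and login_executable settings in the posted config (imap, pop3, imap-login, pop3-login), and the function names are made up for illustration.

```sh
#!/bin/sh
# Sketch: log how many Dovecot mail and login processes exist.
# Assumption: process names are imap, pop3, imap-login and pop3-login,
# as suggested by the executable paths in the posted configuration.

# Print the number of processes whose name is exactly $1.
count_procs() {
    # pgrep exits non-zero when nothing matches; wc -l then prints 0.
    pgrep -x "$1" 2>/dev/null | wc -l | tr -d ' '
}

# Print one line: "<epoch> imap=N pop3=N imap-login=N pop3-login=N".
log_counts() {
    printf '%s imap=%s pop3=%s imap-login=%s pop3-login=%s\n' \
        "$(date +%s)" \
        "$(count_procs imap)" "$(count_procs pop3)" \
        "$(count_procs imap-login)" "$(count_procs pop3-login)"
}
```

It could be run as 'while :; do log_counts; sleep 60; done >> /var/tmp/dovecot-counts.log' (the log path is just an example). A sudden jump in the counts gives a timestamp to compare against logs on the NAS.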
# 1.2.17: /usr/local/etc/dovecot.conf
# OS: FreeBSD 8.1-RELEASE-p5 i386 nfs
protocols: imap imaps pop3 pop3s
ssl_cert_file: /mail/shared/etc/ssl/certs/dovecot.pem
ssl_key_file: /mail/shared/etc/ssl/private/dovecot.pem
disable_plaintext_auth: no
login_dir: /var/run/dovecot/login
login_executable(default): /usr/local/libexec/dovecot/imap-login
login_executable(imap): /usr/local/libexec/dovecot/imap-login
login_executable(pop3): /usr/local/libexec/dovecot/pop3-login
login_greeting: Hello there, who might you be?
max_mail_processes: 4096
verbose_proctitle: yes
first_valid_uid: 26
first_valid_gid: 0
mail_privileged_group: mail
mail_location: maildir:/mail/store/%d/%n
mmap_disable: yes
dotlock_use_excl: no
mail_nfs_storage: yes
mail_nfs_index: yes
lock_method: dotlock
mail_executable(default): /usr/local/libexec/dovecot/imap
mail_executable(imap): /usr/local/libexec/dovecot/imap
mail_executable(pop3): /usr/local/libexec/dovecot/pop3
mail_plugin_dir(default): /usr/local/lib/dovecot/imap
mail_plugin_dir(imap): /usr/local/lib/dovecot/imap
mail_plugin_dir(pop3): /usr/local/lib/dovecot/pop3
imap_client_workarounds(default): delay-newmail outlook-idle netscape-eoh tb-extra-mailbox-sep
imap_client_workarounds(imap): delay-newmail outlook-idle netscape-eoh tb-extra-mailbox-sep
imap_client_workarounds(pop3):
pop3_client_workarounds(default):
pop3_client_workarounds(imap):
pop3_client_workarounds(pop3): outlook-no-nuls oe-ns-eoh
lda:
  postmaster_address: {REMOVED}
  sendmail_path: /usr/local/sbin/exim
auth default:
  default_realm: {REMOVED}
  username_format: %Lu
  passdb:
    driver: sql
    args: /mail/shared/etc/dovecot-sql.conf
  userdb:
    driver: passwd
  userdb:
    driver: static
    args: uid=26 gid=6 home=/mail/store/%d/%n
At 9PM +0000 on 12/02/13 you (Jason Lock) wrote:
Mail Issues - FreeBSD
Hello, my apologies if this may be the wrong forum but hoping that maybe someone might be able to provide some insight.
This may turn out to be something better addressed on freebsd-stable, but this is a perfectly good place to start.
Have a very sporadic and strange issue with our mail servers running Dovecot on FreeBSD. There are three servers hosting Dovecot with FreeBSD as the underlying operating system. All three connect to a NAS server, again running FreeBSD and ZFS.
Over NFS, I assume? What version, what mount options, and what type of authentication? What locking strategies is Dovecot using? Are there any suspicious messages in syslog on either machine?
When the specific issue occurs, clients connecting to check mail via POP3 or IMAP experience long delays and timeouts. To the point where POP3 Logins fail due to the timeouts. The issue is further compounded by clients increasing the number of attempts to check mail.
Are the delays happening before or after login?
If you can provoke this and get a 'procstat -k' for the relevant dovecot process this might be helpful. If 'long' delays means several minutes, running something along the lines of 'procstat -k $(pgrep -U dovecot -U doveauth)' every minute or so for a while might be one way to catch this, though this will collect a lot of data rather fast so you will need some way to locate the relevant entry.
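The periodic 'procstat -k' sampling suggested above could be sketched as below. The timestamp header is an addition so that the relevant entry can be found later with grep; the function name is made up, and the user list is passed in because the right accounts depend on the setup (dovecot and doveauth are the guesses from the command above, while the posted config runs mail processes as uid 26).

```sh
#!/bin/sh
# Sketch: take one timestamped snapshot of kernel stacks for all
# processes owned by the given users.
# $1 is a comma-separated user list, as accepted by pgrep -U.
snapshot_stacks() {
    users=$1
    pids=$(pgrep -U "$users" 2>/dev/null)
    echo "=== $(date '+%Y-%m-%d %H:%M:%S') users=$users ==="
    if [ -z "$pids" ]; then
        echo "no matching processes"
        return 0
    fi
    # procstat -k prints the kernel stack of each thread (FreeBSD only).
    # $pids is unquoted on purpose so each PID becomes its own argument.
    procstat -k $pids
}
```

For example: 'while :; do snapshot_stacks dovecot,doveauth; sleep 60; done >> /var/tmp/procstat-k.log' (the path is an example). After an incident, searching the log for the '===' headers around the time of the timeouts narrows down the snapshots worth reading.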
Ben
At 9PM +0000 on 12/02/13 you (Jason Lock) wrote:
Mail Issues - FreeBSD
Hello, my apologies if this may be the wrong forum but hoping that maybe someone might be able to provide some insight.
This may turn out to be something better addressed on freebsd-stable, but this is a perfectly good place to start.
Thank you for your reply.
Have a very sporadic and strange issue with our mail servers running Dovecot on FreeBSD. There are three servers hosting Dovecot with FreeBSD as the underlying operating system. All three connect to a NAS server, again running FreeBSD and ZFS.
Over NFS, I assume? What version, what mount options, and what type of authentication?
Yes, using NFSv3 to attach the share from the NAS to each of the POP3/IMAP servers. The only mount option set is rw. No authentication is in place; the NAS and the POP3/IMAP servers share a VLAN used just for the NAS connections.
What locking strategies is Dovecot using?
In dovecot using the following:
dotlock_use_excl: no
mail_nfs_storage: yes
mail_nfs_index: yes
lock_method: dotlock
Are there any suspicious messages in syslog on either machine?
Nothing specific.
When the specific issue occurs, clients connecting to check mail via POP3 or IMAP experience long delays and timeouts. To the point where POP3 Logins fail due to the timeouts. The issue is further compounded by clients increasing the number of attempts to check mail.
Are the delays happening before or after login?
Delays appear during login: the username gets passed, but the connection then times out after the password is sent.
If you can provoke this and get a 'procstat -k' for the relevant dovecot process this might be helpful. If 'long' delays means several minutes, running something along the lines of 'procstat -k $(pgrep -U dovecot -U doveauth)' every minute or so for a while might be one way to catch this, though this will collect a lot of data rather fast so you will need some way to locate the relevant entry.
Will look to capture that information if possible; so far we have not been able to re-create the situation in which the issue occurs.
Ben
On 2/12/13 1:25 PM, Jason Lock wrote:
Mail Issues - FreeBSD
Hello, my apologies if this may be the wrong forum but hoping that maybe someone might be able to provide some insight.
Have a very sporadic and strange issue with our mail servers running Dovecot on FreeBSD. There are three servers hosting Dovecot with FreeBSD as the underlying operating system. All three connect to a NAS server, again running FreeBSD and ZFS.
Speaking for NFS on FreeBSD: please note that the FreeBSD NFS client and server generally respect an application's sync() operations, which can cause delays if the backing pool is not configured with a fast ZIL device, ideally an SSD. In that situation, 'gstat -I 1s' on the NFS server will normally show busy disks with low throughput, and a ZIL device would help. Before buying an SSD, you may want to try disabling 'sync' on the dataset and see if that helps. Be careful, though: doing so increases the chance of data loss in the event of power loss, so make sure to revert the change with 'zfs inherit sync' on the datasets once you are done with the experiments. If disabling sync does help, then an SSD ZIL would be helpful; higher-end systems can use a Fusion IO or ZeusRAM for that purpose.
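The temporary 'sync' experiment described above might look like the sketch below, run on the NAS. It assumes the ZFS version on the NAS has the 'sync' dataset property (older pool versions do not), and the helper names are invented here for illustration; the dataset name is passed as an argument.

```sh
#!/bin/sh
# Sketch of the temporary sync=disabled experiment on the NFS server.
# WARNING: while sync is disabled, a crash or power loss can lose
# writes that clients were told had been committed.

# Show the current value of 'sync' for dataset $1 and where it comes from.
sync_status() {
    zfs get -H -o value,source sync "$1"
}

# Disable synchronous writes on dataset $1 for the experiment.
sync_disable() {
    zfs set sync=disabled "$1"
}

# Revert dataset $1 to the inherited setting once the experiment is over.
sync_restore() {
    zfs inherit sync "$1"
}
```

With a hypothetical dataset tank/mail, the sequence would be sync_status tank/mail, then sync_disable tank/mail, watch 'gstat -I 1s' and client behaviour for a while, and finally sync_restore tank/mail.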
Another possible cause is compression. At this time, FreeBSD's ZFS compression threads run at kernel priority, which causes the system to do ZFS compression before doing anything else. If you use compression=gzip, you will see stalls more often. For the short term, please consider using compression=lzjb or having Dovecot do the compression; for the mid term, please use my recent LZ4 port, which consumes far less CPU and has a better compression ratio than lzjb; for the long term, we are going to implement a scheduling policy similar to Solaris's System Duty Cycle, with compatibility shims to emulate it.
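Checking whether gzip is in use on the mail dataset, and switching to lzjb as suggested above, could be sketched like this. The helper names are invented for illustration and the dataset name is an argument.

```sh
#!/bin/sh
# Report the compression setting and achieved ratio for dataset $1.
compression_status() {
    zfs get -H -o property,value compression,compressratio "$1"
}

# Switch dataset $1 to lzjb. Only blocks written after the change use
# the new algorithm; existing data stays as it was written.
compression_use_lzjb() {
    zfs set compression=lzjb "$1"
}
```

For example, compression_status tank/mail (tank/mail being a hypothetical dataset name) shows whether gzip is configured before anything is changed.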
Lastly, try increasing your NFS iod numbers. The system default is very conservative and may become a bottleneck when you have a lot of users on one system.
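If "NFS iod numbers" is read as the client-side nfsiod limits on the POP3/IMAP servers (an assumption; the nfsd thread count on the NAS may equally be worth a look), they are exposed as the sysctls vfs.nfs.iodmin and vfs.nfs.iodmax. A sketch with made-up helper names, leaving the actual value to the operator:

```sh
#!/bin/sh
# Print the current NFS client I/O daemon limits.
show_nfsiod() {
    sysctl vfs.nfs.iodmin vfs.nfs.iodmax
}

# Raise the upper limit; $1 is the new maximum, chosen by the operator.
set_nfsiod_max() {
    case $1 in
        ''|*[!0-9]*) echo "usage: set_nfsiod_max N" >&2; return 1 ;;
    esac
    sysctl vfs.nfs.iodmax="$1"
}
```

A value set this way does not survive a reboot; an entry in /etc/sysctl.conf would be needed to keep it.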
Hope this helps... If you have further questions about FreeBSD, please consider posting to freebsd-stable@freebsd.org.
Cheers,
On 12.02.2013 22:25, Jason Lock wrote:
Mail Issues - FreeBSD
Hello, my apologies if this may be the wrong forum but hoping that maybe someone might be able to provide some insight.
Have a very sporadic and strange issue with our mail servers running Dovecot on FreeBSD. There are three servers hosting Dovecot with FreeBSD as the underlying operating system. All three connect to a NAS server, again running FreeBSD and ZFS.
When the specific issue occurs, clients connecting to check mail via POP3 or IMAP experience long delays and timeouts. To the point where POP3 Logins fail due to the timeouts. The issue is further compounded by clients increasing the number of attempts to check mail.
Part of the frustration in attempting to diagnose the issue is not knowing the root cause or the symptom that initiates the issue.
Wondering if anyone has experienced anything similar, or has suggestions on ways to help identify the root cause.
Please show your debug logs and Dovecot configuration. NFS setups are special: what did you set up to accommodate NFS?
Best Regards
Robert Schetterer
-- [*] sys4 AG
http://sys4.de, +49 (89) 30 90 46 64 Franziskanerstraße 15, 81669 München
Registered office: München, District Court München: HRB 199263. Executive board: Patrick Ben Koetter, Axel von der Ohe, Marc Schiffbauer. Chairman of the supervisory board: Joerg Heidrich
participants (5)
- Ben Morrow
- Charles Marcus
- Jason Lock
- Robert Schetterer
- Xin Li