sudden performance drop - i/o related
Hi, I have an exim/dovecot server that worked great for the last few years, and two weeks ago it got ill ;) First, users reported errors when saving mails to Sent (timeouts). Now the logs are infested with warnings about long waits:
May 10 10:18: Maildir /mail/xxx Synchronization took 193 seconds (1 new msgs, 0 flag change attempts, 0 expunge attempts)
May 10 10:18: Maildir /mail/xxx Synchronization took 125 seconds (1 new msgs, 0 flag change attempts, 0 expunge attempts)
May 10 10:18: Maildir /mail/xxx Synchronization took 211 seconds (1 new msgs, 0 flag change attempts, 0 expunge attempts)
May 10 10:18: Maildir /mail/xxx Synchronization took 107 seconds (8 new msgs, 0 flag change attempts, 0 expunge attempts)
May 10 10:18: Transaction log file /mail/xxx was locked for 36 seconds (Mailbox was synchronized)
May 10 10:18: Transaction log file /mail/xxx was locked for 160 seconds (Mailbox was synchronized)
May 10 10:18: Transaction log file /mail/xxx was locked for 72 seconds (Mailbox was synchronized)
May 10 10:18: Locking transaction log file /mail/xxx took 60 seconds (syncing)
May 10 10:18: Locking transaction log file /mail/xxx took 38 seconds (syncing)
May 10 10:18: Locking transaction log file /mail/xxx took 35 seconds (syncing)
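To see which mailboxes account for most of the waiting, warnings like the ones above can be tallied per path. The following is only a rough sketch: the regular expressions are written against the log excerpt quoted here (real Dovecot log lines may carry extra prefixes or a colon after the path), and the function name `summarize` and the kind labels are made up for illustration.

```python
import re
from collections import defaultdict

# Patterns modelled on the three warning types in the excerpt above.
# The optional ":" after the path is an assumption about unredacted logs.
PATTERNS = {
    "sync": re.compile(r"Maildir (\S+?):? Synchronization took (\d+) seconds"),
    "locked": re.compile(r"Transaction log file (\S+?):? was locked for (\d+) seconds"),
    "lock_wait": re.compile(r"Locking transaction log file (\S+?):? took (\d+) seconds"),
}


def summarize(lines):
    """Return {(kind, path): [count, total_seconds, max_seconds]}."""
    stats = defaultdict(lambda: [0, 0, 0])
    for line in lines:
        for kind, pattern in PATTERNS.items():
            # findall copes with several warnings glued onto one line.
            for path, seconds in pattern.findall(line):
                entry = stats[(kind, path)]
                entry[0] += 1
                entry[1] += int(seconds)
                entry[2] = max(entry[2], int(seconds))
    return dict(stats)
```

Passing an open log file to `summarize` and sorting the result by total seconds should show whether the delays cluster on a few mailboxes or are spread across all of them.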
It looks like I/O has risen from 150 writes/s to 500 writes/s (in peak hours), but there's no real change in the number of emails or the volume. The number of users is steady (~100 active users, ~250 IMAP sessions), and the number of emails (by count or by volume) is rising and falling within a 15% margin.
The box is FreeBSD 11.4, dovecot is 2.3.13. The filesystem is ZFS, disks are fine, free space is around 20% (~200GB). Layout is Maildir. CPU is not overloaded (2x6 cores), same with memory (48GB).
I didn't change anything in the configuration.
Tonight I did some fine-tuning like maildir_copy_with_hardlinks=yes or mail_fsync=never/optimized (I'm not happy with that, but I'm afraid it won't really help, and I'll be able to revert that). I'm also thinking about switching from Maildir to sdbox (I know it won't hurt).
I don't know where to look to find where the I/O goes. I don't have any metrics/stats enabled (I looked at the docs, but it looks like it's not really simple and needs some digging to get a valuable config). Maybe somebody has suggestions on what to look for?
For detailed per-process stats I need to rebuild the kernel with dtrace (another night, I guess)... Simple top (in I/O mode, similar to Linux's iotop) doesn't catch short-lived processes (like LDA deliveries).
best regards
Marcin Gryszkalis, PGP 0xA5DBEEC7 http://fork.pl/gpg.txt
On 11/05/2021 01:07 Marcin Gryszkalis <mg@fork.pl> wrote:
> [...]
One thing that does come to mind is that you are delivering outside dovecot. Without knowing your system better, I would suggest that one thing to try would be to use dovecot-lda to deliver mail.
Are your users directly accessing the maildir?
Aki
On 11.05.2021 07:30, Aki Tuomi wrote:
>> On 11/05/2021 01:07 Marcin Gryszkalis <mg@fork.pl> wrote:
>> It looks like i/o risen from 150writes/s to 500writes/s (in top hours) -
> One thing that does come to mind is that you are delivering outside dovecot. Without knowing your system better, I would suggest that one thing to try would be to use dovecot-lda to deliver mail.
exim delivers locally via /usr/local/libexec/dovecot/dovecot-lda, and it's the only way used for delivery (not counting occasional restoring of mail from backups).
> Are your users directly accessing the maildir?
Not sure what you mean; they use IMAP (plus a few dovecot/pop3 boxes for automated processing).
best regards
Marcin Gryszkalis, PGP 0xA5DBEEC7 http://fork.pl/gpg.txt
On 11/05/2021 10:34 Marcin Gryszkalis <mg@fork.pl> wrote:
> exim delivers locally via /usr/local/libexec/dovecot/dovecot-lda and it's the only way used for delivery (not counting occasional restoring mail from backups)
> [...]
Your logs indicate though that dovecot is finding new mails that were not indexed before. So something external must be placing them there.
Aki
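One way to follow up on this is to look at the Maildir folders on disk. The sketch below walks a mail root and, for every directory that has the standard Maildir `new/`, `cur/` and `tmp/` subdirectories, reports how many files sit in `new/` and `cur/` and how old the oldest file in `new/` is. It is an illustration only: `scan_maildirs` is a hypothetical helper, the root path passed in (such as the `/mail` prefix seen in the logs) is an assumption, and it says nothing by itself about who put the files there.

```python
import os
import time


def _mtime(entry):
    """Return the entry's mtime, or None if it vanished mid-scan."""
    try:
        return entry.stat().st_mtime
    except FileNotFoundError:  # e.g. moved from new/ to cur/ while scanning
        return None


def scan_maildirs(root):
    """Yield (folder, new_count, cur_count, oldest_new_age_seconds)."""
    now = time.time()
    for dirpath, dirnames, _ in os.walk(root):
        if not {"new", "cur", "tmp"} <= set(dirnames):
            continue
        # Do not descend into the message directories themselves.
        dirnames[:] = [d for d in dirnames if d not in ("new", "cur", "tmp")]
        new_files = [e for e in os.scandir(os.path.join(dirpath, "new")) if e.is_file()]
        cur_count = sum(1 for e in os.scandir(os.path.join(dirpath, "cur")) if e.is_file())
        ages = [now - m for m in map(_mtime, new_files) if m is not None]
        yield dirpath, len(new_files), cur_count, max(ages, default=0.0)
```

Running it a few times during peak hours and comparing the output may help: a folder whose `new/` keeps refilling between runs, or one with a very large `cur/`, is a reasonable candidate for a closer look.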
participants (2)
- Aki Tuomi
- Marcin Gryszkalis