sudden performance drop - i/o related

Marcin Gryszkalis mg at fork.pl
Tue May 11 01:07:39 EEST 2021


Hi,
I have an exim/dovecot server that worked great for the last few years, and 
two weeks ago it got ill ;)
First, users started reporting errors when saving mail to Sent (timeouts).
Now the logs are infested with warnings about long waits:

May 10 10:18: Maildir /mail/xxx Synchronization took 193 seconds (1 new 
msgs, 0 flag change attempts, 0 expunge attempts)
May 10 10:18: Maildir /mail/xxx Synchronization took 125 seconds (1 new 
msgs, 0 flag change attempts, 0 expunge attempts)
May 10 10:18: Maildir /mail/xxx Synchronization took 211 seconds (1 new 
msgs, 0 flag change attempts, 0 expunge attempts)
May 10 10:18: Maildir /mail/xxx Synchronization took 107 seconds (8 new 
msgs, 0 flag change attempts, 0 expunge attempts)
May 10 10:18: Transaction log file /mail/xxx was locked for 36 seconds 
(Mailbox was synchronized)
May 10 10:18: Transaction log file /mail/xxx was locked for 160 seconds 
(Mailbox was synchronized)
May 10 10:18: Transaction log file /mail/xxx was locked for 72 seconds 
(Mailbox was synchronized)
May 10 10:18: Locking transaction log file /mail/xxx took 60 seconds 
(syncing)
May 10 10:18: Locking transaction log file /mail/xxx took 38 seconds 
(syncing)
May 10 10:18: Locking transaction log file /mail/xxx took 35 seconds 
(syncing)
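
To see which mailboxes account for most of the waiting, warnings like the 
ones above can be tallied per path. This is only a rough sketch: it assumes 
each warning sits on a single line in the real log file (the wrapping above 
comes from the mail client), it ignores whatever prefix precedes the message, 
and the names PATTERNS and summarize are made up for illustration.

```python
import re
from collections import defaultdict

# Patterns follow the warning texts quoted above. An optional colon after
# the path is tolerated, in case the real log lines carry one.
PATTERNS = {
    "maildir_sync": re.compile(
        r"Maildir (\S+?):? Synchronization took (\d+) seconds"),
    "log_locked": re.compile(
        r"Transaction log file (\S+?):? was locked for (\d+) seconds"),
    "log_lock_wait": re.compile(
        r"Locking transaction log file (\S+?):? took (\d+) seconds"),
}


def summarize(lines):
    """Return {kind: {path: [count, total_seconds, max_seconds]}}."""
    stats = defaultdict(lambda: defaultdict(lambda: [0, 0, 0]))
    for line in lines:
        for kind, pattern in PATTERNS.items():
            match = pattern.search(line)
            if match:
                path, secs = match.group(1), int(match.group(2))
                entry = stats[kind][path]
                entry[0] += 1                   # number of warnings
                entry[1] += secs                # total seconds waited
                entry[2] = max(entry[2], secs)  # worst single wait
                break
    return stats
```

Feeding it an open log file (any iterable of lines works) and sorting each 
kind by total seconds should show whether the waits cluster on a few 
mailboxes or are spread across all of them.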

It looks like i/o has risen from 150 writes/s to 500 writes/s (at peak 
hours), but there's no real change in the number of emails or their volume. 
The number of users is steady (~100 active users, ~250 imap sessions), and 
the number of emails (by count or by volume) rises and falls within a 15% 
margin.

The box is FreeBSD 11.4, dovecot is 2.3.13.
The filesystem is ZFS, disks are fine, free space is around 20% (~200GB).
The layout is Maildir. CPU is not overloaded (2x 6-core), same with memory 
(48GB).

I didn't change anything in the configuration.

Tonight I did some fine-tuning, like maildir_copy_with_hardlinks=yes or 
mail_fsync=never/optimized (I'm not happy with that, but I'm afraid it 
won't really help anyway, so I'll be able to revert it). I'm also thinking 
about switching from Maildir to sdbox (I know it won't hurt).

I don't know where to look to find out where the i/o goes. I don't have any 
metrics/stats enabled (I looked at the docs, but it doesn't look really 
simple and needs some digging to get a useful config). Maybe somebody 
has suggestions on what to look for?

For detailed per-process stats I need to rebuild the kernel with dtrace 
(another night, I guess)... Plain top (in i/o mode, similar to Linux's 
iotop) doesn't catch short-lived processes (like LDA deliveries).

best regards
-- 
Marcin Gryszkalis, PGP 0xA5DBEEC7 http://fork.pl/gpg.txt

