sudden performance drop - i/o related
Marcin Gryszkalis
mg at fork.pl
Tue May 11 01:07:39 EEST 2021
Hi
I have an exim/dovecot server that worked great for the last few years,
and two weeks ago it got ill ;)
First, users started reporting errors when saving mail to Sent (timeouts).
Now the logs are infested with warnings about long waits:
May 10 10:18: Maildir /mail/xxx Synchronization took 193 seconds (1 new
msgs, 0 flag change attempts, 0 expunge attempts)
May 10 10:18: Maildir /mail/xxx Synchronization took 125 seconds (1 new
msgs, 0 flag change attempts, 0 expunge attempts)
May 10 10:18: Maildir /mail/xxx Synchronization took 211 seconds (1 new
msgs, 0 flag change attempts, 0 expunge attempts)
May 10 10:18: Maildir /mail/xxx Synchronization took 107 seconds (8 new
msgs, 0 flag change attempts, 0 expunge attempts)
May 10 10:18: Transaction log file /mail/xxx was locked for 36 seconds
(Mailbox was synchronized)
May 10 10:18: Transaction log file /mail/xxx was locked for 160 seconds
(Mailbox was synchronized)
May 10 10:18: Transaction log file /mail/xxx was locked for 72 seconds
(Mailbox was synchronized)
May 10 10:18: Locking transaction log file /mail/xxx took 60 seconds
(syncing)
May 10 10:18: Locking transaction log file /mail/xxx took 38 seconds
(syncing)
May 10 10:18: Locking transaction log file /mail/xxx took 35 seconds
(syncing)
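To get an overview of how bad the waits are, warnings like the ones
above could be tallied per type. The following is only a rough sketch:
the function and pattern names are made up for illustration, the regular
expressions are derived solely from the wording of the log lines quoted
here, and it assumes each warning sits on a single line in the real log
(they are wrapped only in this mail).

```python
import re
from collections import defaultdict

# Patterns taken from the wording of the warnings quoted above.
# The keys are hypothetical labels, not Dovecot terminology.
PATTERNS = {
    "maildir_sync": re.compile(r"Synchronization took (\d+) seconds"),
    "log_locked": re.compile(r"was locked for (\d+) seconds"),
    "log_lock_wait": re.compile(
        r"Locking transaction log file .* took (\d+) seconds"
    ),
}


def summarize_waits(lines):
    """Return {kind: (count, max_seconds, total_seconds)} for matching lines."""
    stats = defaultdict(lambda: [0, 0, 0])
    for line in lines:
        for kind, pattern in PATTERNS.items():
            match = pattern.search(line)
            if match is None:
                continue
            seconds = int(match.group(1))
            entry = stats[kind]
            entry[0] += 1                      # number of warnings
            entry[1] = max(entry[1], seconds)  # worst single wait
            entry[2] += seconds                # summed wait time
            break  # a line is counted under one kind only
    return {kind: tuple(values) for kind, values in stats.items()}
```

Passing an open log file (any iterable of lines) to summarize_waits()
gives the count, the worst wait and the summed wait for each warning
type, which makes it easier to compare one day against another.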
It looks like I/O has risen from 150 writes/s to 500 writes/s (at peak
hours), but there's no real change in the number of emails or their
volume. The number of users is steady (~100 active users, ~250 IMAP
sessions), and the number of emails (by count or by volume) rises and
falls within a 15% margin.
The box is FreeBSD 11.4, dovecot is 2.3.13.
Filesystem is ZFS, disks are fine, free space is around 20% (~200GB)
Layout is Maildir. CPU is not overloaded (2x6core), same with memory (48GB).
I didn't change anything in configuration.
Tonight I did some fine-tuning, like maildir_copy_with_hardlinks=yes and
mail_fsync=never/optimized (I'm not happy with that, and I'm afraid it
won't really help, but I'll be able to revert it). I'm also thinking
about switching from Maildir to sdbox (I know it won't hurt).
I don't know where to look to find where the I/O goes. I don't have any
metrics/stats enabled (I looked at the docs, but it looks like it's not
really simple and needs some digging to get a useful config). Maybe
somebody has suggestions on what to look for?
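One crude check that needs no stats configuration might be to count
which mailbox directories contain recently modified files. The sketch
below is an assumption-laden illustration, not a Dovecot tool: the
function name and its parameters are made up, it only sees the last
modification time of each file (so repeated rewrites of the same index
file count once), and walking a large Maildir tree generates a fair
amount of metadata reads itself, so it is best run off-peak.

```python
import os
import time
from collections import Counter


def recent_writes(root, window_seconds=600, depth=1):
    """Count files under root modified within the last window_seconds.

    Results are grouped by the first `depth` path components below root
    (for example one group per user directory). Returns two Counters:
    number of files and summed file sizes in bytes, per group.
    """
    cutoff = time.time() - window_seconds
    file_counts = Counter()
    byte_counts = Counter()
    for dirpath, _dirnames, filenames in os.walk(root):
        rel = os.path.relpath(dirpath, root)
        parts = [] if rel == "." else rel.split(os.sep)
        group = os.sep.join(parts[:depth]) or "."
        for name in filenames:
            try:
                st = os.lstat(os.path.join(dirpath, name))
            except OSError:
                continue  # file vanished while walking (moved or expunged)
            if st.st_mtime >= cutoff:
                file_counts[group] += 1
                byte_counts[group] += st.st_size
    return file_counts, byte_counts
```

Calling recent_writes("/mail", window_seconds=600) and printing
file_counts.most_common(10) would list the directories with the most
recently touched files; whether the hits are message files or index and
cache files could hint at where the extra writes come from.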
For detailed per-process stats I need to rebuild the kernel with DTrace
(another night, I guess)... Plain top (in I/O mode, similar to Linux's
iotop) doesn't catch short-lived processes (like LDA deliveries).
best regards
--
Marcin Gryszkalis, PGP 0xA5DBEEC7 http://fork.pl/gpg.txt