[Dovecot] Load spikes on NFS server, multiple index updaters.

Jorgen Lundman lundman at lundman.net
Wed Aug 26 05:06:16 EEST 2009


We are occasionally experiencing trouble where the NFS server's load 
will shoot past 60 (normally it is below 1.0).

I have been hunting this for a while, and I believe it comes down to 
"deliver".

System setup:

NFS servers: x4540 Solaris 10 x64 ZFS over NFS.
NFS clients: Solaris 10 x64 postfix-2.4.1 with dovecot-1.1.11 deliver.


What appears to happen, when I check nfsstat per process, is that I 
see 4 processes (in this case on vmx04) taking up the majority of NFS ops:

root@vmx04:/var/tmp# ./nfsclientstats.pl
process    read write readdir getattr setattr lookup access create remove rename mkdir rmdir
24303         0     0       1      19       0    190    171      0      0      0     0     0
24551         0     0       1      18       0    180    162      0      0      0     0     0
26099         0     0       1      18       0    180    162      0      0      0     0     0
295           0     0       1      18       0    180    162      0      0      0     0     0
6793          3     0       0       0       0      5      5      0      0      0     0     0
7234          0     1       0       2       0      9      9      0      1      0     0     0



Checking what these processes are doing, I find the following happening:

26099:  getdents64(8, 0xCE7A4000, 8192)                 = 8136
26099:  stat64("/export/censored/mail/cur/1223013930.V4700010I69f93eM483098.vmx02.unix:2,S", 0x08047930) = 0
26099:  stat64("/export/censored/mail/cur/1222066290.V4700007I67562bM241839.vmx04.unix:2,S", 0x08047930) = 0
26099:  stat64("/export/censored/mail/cur/1225325373.V4700008I94a1f9M286935.vmx04.unix:2,S", 0x08047930) = 0
26099:  stat64("/export/censored/mail/cur/1236170307.V4700002I67ca03M310418.vmx06.unix:2,", 0x08047930) = 0
26099:  stat64("/export/censored/mail/cur/1223581462.V4700011I69ffd6M720814.vmx02.unix:2,S", 0x08047930) = 0



Very well, so it is rebuilding the dovecot.index, or recalculating the 
user's quota usage.

Is the directory large?

root@vmx04# ls -l /export/censored/mail/cur/|wc -l
   199626

You bet! But what is annoying is that if I also check processes 24303, 
24551 and 295, they are scanning the SAME user's directory.

295:    stat64("/export/censored/mail/cur/1230544947.V4700004I11d1d7fM492433.vmx04.unix:2,S", 0x08047930) = 0
295:    stat64("/export/censored/mail/cur/1223003964.V4700007I68932dM763546.vmx04.unix:2,S", 0x08047930) = 0


So, on vmx04 we have 4 processes working in one user's giant directory, 
and on the other vmx clients, many more.

Could the semantics of re-computing dovecot.index be changed such that 
the first "deliver" process takes a lock to do the work, and subsequent 
deliver processes return a temporary failure until the work has 
finished?
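
To make the idea concrete, here is a minimal sketch in Python (not 
Dovecot's actual C code, and not how Dovecot does its own locking). The 
lock file name "rebuild.lock", the function try_rebuild() and the 
rebuild callback are all hypothetical. It assumes that an exclusive 
create (O_CREAT|O_EXCL) is atomic on the NFS mount, which depends on the 
NFS version and client, and it ignores stale locks left behind by a 
crashed process.

```python
import os

EX_TEMPFAIL = 75  # sysexits.h: temporary failure, the MTA should retry later


def try_rebuild(maildir, rebuild):
    """Run rebuild(maildir) only if no other process holds the lock.

    Returns 0 when this process did the work, or EX_TEMPFAIL when
    another process is already rebuilding.
    """
    # Hypothetical lock file name, for illustration only.
    lock_path = os.path.join(maildir, "rebuild.lock")
    try:
        # O_CREAT | O_EXCL fails if the file already exists, so only
        # one process can create it. Assumed atomic on this mount.
        fd = os.open(lock_path, os.O_CREAT | os.O_EXCL | os.O_WRONLY, 0o600)
    except FileExistsError:
        # Someone else is rebuilding: defer instead of scanning again.
        return EX_TEMPFAIL
    try:
        # Record the owner's PID to help with manual cleanup.
        os.write(fd, str(os.getpid()).encode())
        rebuild(maildir)
        return 0
    finally:
        os.close(fd)
        os.unlink(lock_path)
```

With something like this, a second deliver for the same user would exit 
with a temporary failure and the message would stay in the postfix queue 
for a later retry, instead of starting another full scan of the 
directory.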

Has this already been addressed in dovecot-1.1.18?

Advice please.

Lund

-- 
Jorgen Lundman       | <lundman at lundman.net>
Unix Administrator   | +81 (0)3 -5456-2687 ext 1017 (work)
Shibuya-ku, Tokyo    | +81 (0)90-5578-8500          (cell)
Japan                | +81 (0)3 -3375-1767          (home)

