On Fri, Sep 22, 2006 at 03:28:03PM +0100, David Lee wrote:
Performance has been sluggish: high load average, probably caused by NFS stat activity (itself because of "noac"?).
Although older Linuxes (e.g. Red Hat 9, 2.4.20-43.9) have been OK, more recent releases (e.g. FC5, 2.6.16-1) introduced some nasty deadlocking, requiring a machine reboot every day. (Unacceptable!)
We hope dovecot will improve matters.
It will (it should).
Any advice or comments or experiences?
Use maildir on NFS, not mbox. It pays off especially for random-access IMAP services, and it isn't shabby in POP3 slurpdown scenarios either. The performance difference is an order of magnitude, and converting mailboxes between these formats is trivially scriptable. The only practical downside to maildir I've experienced is that backup of lots-of-tiny-files (maildir) is more expensive than one-big-file (mbox).
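To illustrate the "trivially scriptable" conversion, here is a minimal sketch using Python's standard mailbox module. The function name and the paths are hypothetical, and this is only one way to do it, not the specific script used on these mailstores. It assumes nothing else is writing to the mbox while it runs.

```python
import mailbox


def mbox_to_maildir(mbox_path, maildir_path):
    """Copy every message from an mbox file into a Maildir; return the count."""
    # create=False: fail loudly if the source mbox does not exist.
    source = mailbox.mbox(mbox_path, create=False)
    # create=True: build the cur/, new/ and tmp/ subdirectories if missing.
    dest = mailbox.Maildir(maildir_path, create=True)
    copied = 0
    try:
        for message in source:
            # Wrapping the mbox message in MaildirMessage translates the
            # mbox Status/X-Status flags into their Maildir equivalents.
            dest.add(mailbox.MaildirMessage(message))
            copied += 1
    finally:
        dest.close()
        source.close()
    return copied
```

A call such as mbox_to_maildir("/var/mail/someuser", "/home/someuser/Maildir") (both paths are placeholders) would be run once per user, with delivery and IMAP/POP3 access to that mailbox stopped for the duration.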
Also, in such a set-up (multiple IMAP/Linux NFS-mounting from NetApp) where should the dovecot index files be? NFS from the NetApp? Or on each Linux machine (if so, on disk or ramdisk?)?
I put indices on the filer. Access serializes on dotlocking of dovecot-uidlist anyway.
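For readers unfamiliar with dotlocking, the general technique can be sketched as follows. This is not Dovecot's actual implementation; the function names, timeout and poll interval are assumptions for illustration. The idea is that creating "file.lock" via link() either succeeds or fails atomically on the server, so only one client at a time holds the lock and everyone else waits.

```python
import os
import socket
import time


def acquire_dotlock(target, timeout=10.0, poll=0.1):
    """Try to create target + '.lock'; return True on success, False on timeout."""
    lock_path = target + ".lock"
    # Temp name unique to this host and process, in the same directory.
    temp_path = "%s.%s.%d" % (lock_path, socket.gethostname(), os.getpid())
    deadline = time.monotonic() + timeout
    while True:
        fd = os.open(temp_path, os.O_WRONLY | os.O_CREAT, 0o644)
        os.close(fd)
        try:
            # Hard-linking fails if lock_path already exists, so at most
            # one contender can succeed.
            os.link(temp_path, lock_path)
            return True
        except FileExistsError:
            pass  # somebody else holds the lock; retry below
        finally:
            # The temp name is no longer needed either way.
            os.unlink(temp_path)
        if time.monotonic() >= deadline:
            return False
        time.sleep(poll)


def release_dotlock(target):
    """Drop the lock taken by acquire_dotlock()."""
    os.unlink(target + ".lock")
```

A production-grade version would also handle stale locks left by crashed clients and verify the link count after link(), since an NFS reply can be lost; both are omitted here for brevity.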
Note: without Trond M's NFS client patchset we see comedy VFS out-of-sync errors after a few days' uptime, resulting in mistaken deadlocking of dovecot indices (usually only one user, but always a high-volume user) that needs a node reboot to fix, and some intervention with the "lock status / break" command on the ONTAP command line. With the patches it's been rock solid.
Could you provide more details? (I wonder if these are related to deadlock problems we see with Washington/IMAP on 2.6.16?) Are these patches in the process of being pushed into the relevant source code so that they will ultimately be unnecessary?
They could well be related issues. I'm no expert on the Linux NFS client code itself and I wouldn't wish such a role on anyone :)
Unpatched, we see (after a day or two of uptime) the following kernel gripe: "do_vfs_lock: VFS is out of sync with lock manager!". This repeats a few times, and shortly afterwards the box locks up hard.
Patched, our mailstores are currently at more than two months of uptime, and the last reboot was for routine maintenance. The only problem with this is that nfsstat's counters (e.g. for getattr) have reached 2^31-1 and stopped turning :)
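Since the unpatched lockup is preceded by a recognisable kernel message, a small watcher can flag a node before it hangs. The sketch below only counts occurrences of the message quoted above in a stream of log lines; the function name is hypothetical, and where the lines come from (dmesg output, a syslog file) is an assumption that depends on the system.

```python
import re

# Message text as quoted from the kernel log above.
LOCK_SYNC_PATTERN = re.compile(
    r"do_vfs_lock: VFS is out of sync with lock manager"
)


def count_lock_sync_errors(lines):
    """Return how many of the given log lines contain the out-of-sync gripe."""
    return sum(1 for line in lines if LOCK_SYNC_PATTERN.search(line))
```

Any non-zero count on an unpatched node would be a cue to drain and reboot it during a quiet period rather than wait for the hard lockup.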
Trond's code is often incorporated into the kernel. I have always tracked his patchsets in production platforms that were reliant upon NFS, and have for years found his code to make a difference in stability.
Also note that he works for NetApp, so you can guess what his interoperability testing might include.
JG
-- 
Josh "Koshua" Goodall "as modern as tomorrow afternoon"
joshua@roughtrade.net - FW109