Ron Leach wrote:
Finally, and I do apologise for all the questions, we're wishing to move to NFS. (At the moment we have a 'one box' Dovecot solution, but this makes upgrade of OS, upgrade of Dovecot, or upgrade of storage always a problem. We have already exported the new XFS filestore over NFS - but Dovecot is not (yet) using it, that's the next step for us.) Does the fsync solution we've been discussing work just as well when the XFS filestore is exported over NFS?
I realise replying to self is not the best thing to do, but I do not want to waste people's time. NFS is handled with quite different system calls, and has a quite different 'sequential behaviour' (for want of a better word). Deep in the comments on Ted Tso's post, referred to previously by Timo, are these remarks:
Delayed allocation and the zero-length file problem | Thoughts by Ted
"In comment #49, Ted says:
First of all, that’s not NFS’s model; NFS v2/v3 requires that each write RPC call not return until the data has hit stable storage. So in fact it’s a stronger requirement than alloc on close.
This statement is misleading.
Firstly, it accurately describes the NFSv2 semantics, but no sane person deliberately uses NFSv2 anymore so the statement is of no help in the real world.
The NFSv3 semantics are more flexible. The v3 WRITE RPC adds a flag which allows the client to say whether it wants the data on stable storage before the RPC returns, i.e. whether to do the old slow thing that was the only way with NFSv2. This flag is hardly ever used by clients (O_SYNC or a “sync” mount enable it, as does O_DIRECT in most circumstances). Instead clients will typically send a bunch of WRITE calls with data, and then a COMMIT call which does the actual forcing of data to server-side stable storage. This is significantly faster than the NFSv2 model.
NFSv4 behaves like NFSv3 in this regard, but adds a further feature called file delegations which complicate the picture even further.
Now to actually answer Frank’s question. But first some background.
The NFS protocol doesn’t know about any block-level behaviour like allocation, it works entirely on files. When the allocation occurs is a server-side implementation detail entirely. Having the data on stable storage is indeed a stronger requirement than forcing allocation, but the server could choose to do the allocation at any time from the start of unstable WRITE RPC to the end of the COMMIT RPC, which can be a window of several seconds.
NFS however does keep data in the client which has been written by applications on the client but not yet sent to the server. If the lifetime of the application is short and the file is small, this could include the entire data of the file.
NFS clients practice a behaviour known as CTO or “Close-To-Open consistency”, which is a very weak form of inter-client file cache consistency. This means that when the application close()s the last fd the client will perform the equivalent of an fsync(), i.e. issue WRITEs to the server for any dirty data remaining in the client and a COMMIT to force that data to stable storage on the server. In other words, when close() returns to the app, the data is safe on the server. This is the behaviour on close() that Frank refers to above. Note that this is a much tighter constraint than POSIX requires."
http://thunk.org/tytso/blog/2009/03/12/delayed-allocation-and-the-zero-lengt... [Comment # 84]
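To make the write-then-commit sequence concrete, here is a minimal sketch in Python of how an application might store a message so that it does not depend on close() alone. It is not Dovecot's code; the function name deliver_durably and the ".tmp" suffix are made up for illustration. The assumption, taken from the comment quoted above, is that fsync() on an NFS file causes the client to send any outstanding WRITEs followed by a COMMIT.

```python
import os


def deliver_durably(directory, name, data):
    """Write data to directory/name and fsync it before it becomes visible.

    Hypothetical helper for illustration only, not taken from Dovecot.
    """
    tmp_path = os.path.join(directory, name + ".tmp")
    final_path = os.path.join(directory, name)

    # O_EXCL makes the open fail rather than overwrite an existing temp file.
    fd = os.open(tmp_path, os.O_WRONLY | os.O_CREAT | os.O_EXCL, 0o600)
    try:
        view = memoryview(data)
        while len(view) > 0:
            # os.write may write fewer bytes than requested, so loop.
            written = os.write(fd, view)
            view = view[written:]
        # Ask the OS to push the data to stable storage. On an NFS mount
        # the assumption is that this flushes dirty pages and commits them
        # on the server, as described in the quoted comment.
        os.fsync(fd)
    finally:
        os.close(fd)

    # Only expose the file under its final name once the data is committed.
    os.rename(tmp_path, final_path)
    return final_path
```

Calling deliver_durably("/srv/mail/tmp", "msg1", b"...") (the path is only an example) would return the final path once the data has been fsynced and renamed. The rename itself is not separately synced here, so this is a sketch of the ordering rather than a complete crash-safety recipe.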
Further, an NFS share can be mounted on the client with the 'sync' option, which forces each write to reach stable storage on the server before the call returns. Though this would be horrifically slow under any high load (network transmission times, disc I/O queues etc.), in our low-load situation we could consider using this option to minimise the potential for email loss due to a crash or power failure.
One further optimisation, not relevant to Dovecot or email but worth mentioning in the (unlikely) event that anyone is really this interested: if we were to split our XFS share into two shares, one for email and the other for general data storage, then we could apply 'sync' only to the email share (hence ensuring immediate writes) and not to the general storage share.
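For anyone wanting to verify which NFS mounts actually carry 'sync', the following is a rough sketch in Python that scans text in the format of /proc/mounts on Linux. The function name and the server and path names in the sample are invented for illustration, and it assumes a sync mount is listed with a literal "sync" entry in its options field.

```python
def nfs_mounts_without_sync(mounts_text):
    """Return mount points of NFS filesystems whose options lack 'sync'.

    mounts_text is expected in /proc/mounts format:
    device mount_point fs_type options dump pass
    """
    missing = []
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) < 4:
            continue  # skip blank or malformed lines
        _device, mount_point, fs_type, options = fields[:4]
        if fs_type in ("nfs", "nfs4") and "sync" not in options.split(","):
            missing.append(mount_point)
    return missing


# Hypothetical sample: one share for email (sync), one for general data.
SAMPLE = """\
fileserver:/export/mail /srv/mail nfs rw,sync,vers=3,hard 0 0
fileserver:/export/data /srv/data nfs rw,vers=3,hard 0 0
/dev/sda1 / ext4 rw,relatime 0 0
"""

print(nfs_mounts_without_sync(SAMPLE))  # prints ['/srv/data']
```

On a real client the same function could be fed the contents of /proc/mounts; the email share should then be absent from the returned list.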
Unless I'm wrong about something here, I think this closes the NFS-related concern about XFS and Dovecot and loss of email.
regards, Ron