On Fri, 23 Jul 2004 21:41:36 +0200 Matthias Andree matthias.andree@gmx.de wrote:
Tim Southerwood ts@doc.ic.ac.uk writes:
This would constitute a violation of responsibilities or layers. What you are suggesting appears as though you would want the software to work around a problem when the administrator teams for Solaris and Linux aren't talking to each other.
Well, yes, up to a point. We're OK *potentially*, because I'm one of two Linux admins and the Solaris guy sits behind me (but is on leave right now).
But I've worked in places before where a small department might run a gateway server (e.g. IMAP) but have to deal with a centrally provided filestore on some uber-SAN provided by a totally different department (I'm citing universities here) - and worse, getting the central IT people to touch "their" fileserver can be practically impossible in any sensible timeframe.
It's a grim but nonetheless true reality that some people who may want to run dovecot are stuck in such a situation. Service level agreements signed by upper tier managers usually don't include "fcntl/F_SETLKW must work". I speak from experience.
Dovecot uses this locking in two places that matter so it's not IMHO a terrible disaster to add a small workaround.
I've often been tempted to add some special case to software and ultimately given in, only to find out weeks, months or even years later that the special case handler wasn't working properly -- such seldom-used code is a maintenance nightmare. And these special cases need rather lengthy comments because a few months later the maintainer will see the code and throw it out because it looks extraneous.
Yes, I understand - I've seen the code to GNU/tar! (except for the "throwing out" bit - that program is *the* museum of cruft) I totally agree on the desire to keep stuff clean. Finding the balance is usually a matter of debate though.
Maybe such a workaround should be kept as a separate patch and not become part of the baseline code.
That would be a wise and perfectly helpful way to proceed if that is what you would prefer. Put a patch on the ftp site and note it in a FAQ along the lines: "so you've got a broken NFS server" or something.
Incidentally, talking to another colleague, it seems that we also had this problem with sunsite.org.uk (which we operate) - that was exhibiting the same problem with F_SETLKW with one ftpd program running over the NFS share between its four hosts (between Solaris client and server NFS). That problem was "fixed" in a hurry by using a different ftpd.
Unfortunately, we don't usually have enough time to get to the bottom of every odd fault we get, but this time I'm being more tenacious because this is irritating me (Solaris, not dovecot).
I'm on leave for 1.5 weeks - but I'll mail this through to our Solaris chap and let him have a look at it.
Anyway - I still stand by my point of view that not all NFS implementations are perfect, and doing something which helps people around broken systems (which they may not control) is helpful - but in the way that buggers up the dovecot codebase the least.
I suppose that if 99.5% of dovecot's userbase don't have this issue, perhaps you should leave the code alone. I will eventually do my own patch, but I'll try and do it as right as possible and I'll mail it in here if anyone else needs it.
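For illustration only, here is roughly the shape such a workaround could take. This is not the patch itself and not dovecot code (dovecot is C); it is a minimal Python sketch that assumes the trouble lies with the blocking wait (F_SETLKW) and that plain non-blocking requests (F_SETLK) still behave on the NFS server in question, which is unverified here. The names lock_with_retry and unlock, and the timeout and interval values, are made up for the example.

```python
import errno
import fcntl
import time


def lock_with_retry(fd, timeout=5.0, interval=0.1):
    """Poll for an exclusive fcntl-style lock instead of blocking in the kernel.

    Returns True once the lock is held, False if `timeout` seconds pass first.
    Hypothetical helper: the name and defaults are assumptions for this sketch.
    """
    deadline = time.monotonic() + timeout
    while True:
        try:
            # LOCK_NB makes lockf() issue a non-blocking request (F_SETLK)
            # rather than the blocking F_SETLKW.
            fcntl.lockf(fd, fcntl.LOCK_EX | fcntl.LOCK_NB)
            return True
        except OSError as e:
            # EACCES/EAGAIN mean another process holds the lock;
            # any other errno is a genuine failure and is re-raised.
            if e.errno not in (errno.EACCES, errno.EAGAIN):
                raise
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)


def unlock(fd):
    """Release the lock taken by lock_with_retry()."""
    fcntl.lockf(fd, fcntl.LOCK_UN)
```

The obvious cost is that polling is unfair and adds latency up to the retry interval, which is part of why it probably belongs in a separate patch rather than the baseline code.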
Best wishes,
Tim
-- Tim Southerwood