[Gllug] NFS problem

Nix nix at esperi.org.uk
Thu Oct 18 21:23:07 UTC 2007


On 18 Oct 2007, John Edwards spake thusly:
> On Thu, Oct 18, 2007 at 12:03:34PM +0100, Alain Williams wrote:
>> The cxx compiler occasionally blows up with:
>> 	cxx: Severe: Unable to compress file "./cxx_repository/tru64.g1.com.aokfBa":
>>         	  Permission denied
>> 	cxx: Info: 1 catastrophic error detected in the compilation of "../../../doc1gensrc/igxmlscanner.cpp".
>> 	cxx: Info: Compilation terminated.
>> leave it a few minutes and it may work properly. It does seem to fail on some files a
>> lot 'in preference' to others.
>> 
>> It is the randomness that confuses me.

This just screams `NFS attribute caching' to me.

I suspect that the Tru64 box is screwing up its attribute caching such
that an open (..., O_CREAT, 000) followed by an fchmod() to something
saner (or a close analogue) is leaving the attribute cache with an entry
with mode 000 in it rather than whatever mode it's being reset to.
A few seconds later the attribute cache entry expires and the correct
value is fetched from the server.
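
The suspected sequence can be sketched in a few lines of Python, to be run
on the NFS client against a directory on the affected mount. This is a guess
at what cxx does, not a trace of it: the function name, the scratch file
name and the final mode 0644 are all made up for illustration.

```python
import os
import stat


def create_then_chmod(directory, final_mode=0o644):
    """Create a file with mode 000, fchmod() it, and report the modes seen.

    Returns (mode_via_fd, mode_via_path). The scratch file name and the
    default final_mode are assumptions, not anything cxx is known to use.
    """
    path = os.path.join(directory, "attrcache-test.%d" % os.getpid())
    # Create with no permission bits at all, as the suspected sequence does.
    fd = os.open(path, os.O_CREAT | os.O_EXCL | os.O_WRONLY, 0o000)
    try:
        try:
            # Reset the mode to something saner through the open descriptor.
            os.fchmod(fd, final_mode)
            mode_via_fd = stat.S_IMODE(os.fstat(fd).st_mode)
        finally:
            os.close(fd)
        # Look the file up again by name, as a later open() would.
        mode_via_path = stat.S_IMODE(os.stat(path).st_mode)
    finally:
        os.unlink(path)
    return mode_via_fd, mode_via_path
```

On a sane client both values should equal final_mode. If either comes back
as 0 on the NFS mount but not on local disk, that points at the attribute
cache.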

If mounting with the `noac' option makes the problem go away, that's
your culprit. (Reducing the blocksize insanely far, as you have done,
might simply make things take so long that the attribute cache expires
naturally before the problem happens.)
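
To see whether a wrong mode really does correct itself after a few seconds,
something like the following could be pointed at a file that has just
failed. It's only a sketch: the function name and the timeout and interval
defaults are invented here, and each stat() may itself be answered from the
attribute cache, which is rather the point.

```python
import os
import stat
import time


def wait_for_mode(path, expected_mode, timeout=60.0, interval=0.5):
    """Poll stat() until path shows expected_mode.

    Returns the number of seconds waited, or None if the timeout passed
    first. The defaults are arbitrary illustrative values.
    """
    start = time.monotonic()
    while True:
        mode = stat.S_IMODE(os.stat(path).st_mode)
        elapsed = time.monotonic() - start
        if mode == expected_mode:
            return elapsed
        if elapsed >= timeout:
            return None
        time.sleep(interval)
```

A wait of several seconds on the default mount and next to nothing with
`noac' would fit the attribute-cache theory.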


I strongly recommend that you do a bit of wiresharking and get a
protocol capture on (probably) port 2049 to and from the NFS server
while the problem is happening. I'm just guessing without the
protocol stream at the point of failure to look at.

(An strace at the time of failure might be useful too, although I can't
remember what the strace analogue is called on Tru64, or indeed if I
ever found one. It might be useful to know what actual error is coming
back from fwrite()/fread()/whatever, and also what that `whatever' is.)

> I've never used TRU64, but have you tried to mount this without the
> "soft" option?
>
> In the past that has caused problems on some Linux machines (long time
> ago, can't remember the details).

hard/nointr is useful if you're getting unexpected -EIOs, but the error
here is `Permission denied': this is a problem with the *file mode*.

-- 
`Some people don't think performance issues are "real bugs", and I think 
such people shouldn't be allowed to program.' --- Linus Torvalds



