NFS, hung processes

Tue Aug 1 16:35:41 AEST 1989

In article <13134 at bloom-beacon.MIT.EDU>, jik at athena.mit.edu (Jonathan I. Kamens) writes:
> 
>   One solution, which is what we use, is not to hard mount anything
> but the most important NFS filesystems.  We mount all user filesystems
> soft with a five minute error timeout by default, so if a user's
> fileserver goes down, processes will only try to access it for five
> minutes.  Once the user gets his prompt back, he can carefully save
> whatever work he is doing to a local hard disk or mail it to himself
> to prevent it from being lost.

A problem with "soft" mounting is that a timed-out I/O operation
returns an error to the user program.  Unix programs are notorious
for not checking the return values of read(), write(), etc., and can
fail in mysterious ways.
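
For what it's worth, here is a minimal sketch of the kind of checking
that is usually missing.  The names write_all() and copy_fd() are made
up for illustration, and the exact errno a soft mount hands back
(typically something like ETIMEDOUT or EIO) depends on the system, so
the code treats any failure other than EINTR as fatal and reports it
rather than carrying on silently.

```c
#include <errno.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

/* Hypothetical helper: write all len bytes of buf to fd.
 * Returns 0 on success, -1 on error with errno set by write(). */
int write_all(int fd, const char *buf, size_t len)
{
    while (len > 0) {
        ssize_t n = write(fd, buf, len);
        if (n < 0) {
            if (errno == EINTR)
                continue;   /* interrupted before any data moved: retry */
            return -1;      /* e.g. a timed-out request on a soft mount */
        }
        buf += n;           /* a short write is not an error: keep going */
        len -= (size_t)n;
    }
    return 0;
}

/* Hypothetical helper: copy everything from in_fd to out_fd.
 * Returns 0 at end of file, -1 on any read or write error. */
int copy_fd(int in_fd, int out_fd)
{
    char buf[8192];

    for (;;) {
        ssize_t n = read(in_fd, buf, sizeof buf);
        if (n == 0)
            return 0;       /* end of file */
        if (n < 0) {
            if (errno == EINTR)
                continue;
            fprintf(stderr, "read: %s\n", strerror(errno));
            return -1;
        }
        if (write_all(out_fd, buf, (size_t)n) < 0) {
            fprintf(stderr, "write: %s\n", strerror(errno));
            return -1;
        }
    }
}
```

A caller would then bail out (or at least warn the user) when copy_fd()
returns -1, instead of assuming the data made it to the server.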

This can be particularly bad in the case of an executable that
is running from a dead server.  A pagein that gets an error from
a soft mount will crash the process and leave a core dump.  I prefer
to mount /usr and local executables ("/usr/local" around here) with
"hard" and set the "intr" option so that I can at least kill a hung
process with a SIGTERM if I get fed up waiting.  The "intr" option
should work OK, although it can take a while, since it has to wait
for the hung NFS operation to time out (which can take a minute or so).

Made in New Zealand -->  Brent Callaghan  @ Sun Microsystems
			 uucp: sun!bcallaghan
			 phone: (415) 336 1051