fsck
Bill Lasher
W0L at PSUVM.BITNET
Fri Nov 10 07:34:21 AEST 1989
Some of you may have been following the fsck question I posted last week.
Thanks to help from several of you, including some people at SGI, I finally
decided the REAL problem was our system administration. One of the people at
SGI thought the following might be of interest to others, and suggested I post
it.
The original note follows:
=========================================================================
Date: 9 November 1989, 14:16:04 EST
From: Bill Lasher (814) 898-6391 W0L at PSUVM
Subject: Re: fsck, init state 3
To: dunlap at sgi.sgi.com
In-Reply-To: dunlap%bigboote.csd AT sgi.com -- Thu, 9 Nov 89 11:06:55 PST
Our most recent problem (the RPC timeout) I think was caused by the way
we implemented the nightly reboot. We scheduled them 5 minutes apart,
figuring that would be enough time. I found out today that one machine
was still in the process of restarting when the YP server he was
communicating with started to reboot. This caused the system to hang.
Rebooting did in fact clear things up, but it took some time. Part of
the problem is that the time on each machine is not exactly the same (a
difference of a couple of minutes). We are going to set all machines to
the same time, and change the reboot interval to 10 minutes.
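As a rough illustration of the timing arithmetic above, here is a minimal Python sketch. The 5- and 10-minute intervals and the couple of minutes of clock skew come from the note; the 4-minute reboot duration and the function name are assumptions made up for the example, not measurements from our machines.

```python
def reboot_windows_overlap(stagger_min, reboot_duration_min, clock_skew_min):
    """Return True if the next machine could start rebooting before
    the previous one has finished.

    All arguments are in minutes. The worst case assumed here is that
    the next machine's clock runs ahead by the full skew, which shrinks
    the real gap between the two scheduled reboots.
    """
    effective_gap = stagger_min - clock_skew_min
    return effective_gap < reboot_duration_min


# Hypothetical numbers: a 4-minute reboot and 2 minutes of clock skew.
old_schedule_collides = reboot_windows_overlap(5, 4, 2)
new_schedule_collides = reboot_windows_overlap(10, 4, 2)
```

With these assumed numbers the 5-minute schedule leaves only a 3-minute real gap, so the reboots can collide, while the 10-minute schedule leaves 8 minutes of slack.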
I think we got thrown off the track because running fsck nightly changed
the total time it took for the systems to reboot, and things just
happened to work out O.K. Also, we probably weren't patient enough
earlier to let reboot do its thing; when reboot didn't work, we tried
fsck, which did work because it took longer to finish up, and by the
time it was done the network wasn't as busy (or something like that). I
think we were also in a hurry to get things fixed, and as a result got
sloppy (i.e., running fsck without unmounting, etc.).
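One way to guard against the "fsck without unmounting" mistake is to check the mount table first. The sketch below is only an illustration: it assumes a mount table in the common whitespace-separated layout with the device in the first field, and the device names in the sample are hypothetical.

```python
def is_mounted(device, mount_table_text):
    """Return True if `device` is the first field of any mount-table line."""
    for line in mount_table_text.splitlines():
        fields = line.split()
        if fields and fields[0] == device:
            return True
    return False


# Hypothetical mount-table contents, for illustration only.
SAMPLE_TABLE = """/dev/root / efs rw 0 0
/dev/usr /usr efs rw 0 0
"""

usr_is_mounted = is_mounted("/dev/usr", SAMPLE_TABLE)
scratch_is_mounted = is_mounted("/dev/scratch", SAMPLE_TABLE)
```

A wrapper script could refuse to run fsck on any device for which this check returns True.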
Some of our problems may come back, but we will handle each of them
separately as they occur, and try to be more careful. I suspect some of
the earlier problems (the full disks, hung spool queues) showed up
because we were letting the systems run for a week at a time without
rebooting, and things just got a little messy. We had planned from the
beginning to have them reboot every night, but we had too many other
things going on to get it implemented.
We'll just take it from here and see what happens.
Best regards,
Bill
========================================================================
END OF ORIGINAL NOTE
You may not follow all the details, but you probably get the general idea.
I think it's a good example of what can happen when an experienced computer
user gets his first UNIX/networked system.
Bill
"If I knew what I was doing, I wouldn't have had to ask the question!"