Mysterious Sun-4 bug
Hugh LaMaster
lamaster at pioneer.arc.nasa.gov
Thu Jun 27 08:33:11 AEST 1991
We are experiencing a peculiar bug which has appeared from time to time
on our Sun-4/490 server. This system is very heavily loaded, mainly
because it is running Sybase. It recently had an official clean 4.1.1
release installed, with DBE 1.1, and selected patches added. The
system has a Sun VME FDDI board, and FDDI 1.1 is installed. There is
heavy NFS traffic to another Sun server via FDDI (at the moment -
Ethernet has also been used).
The bug has appeared in 4.1, 4.1 + various patches (almost 4.1.1), and 4.1.1,
with and without DBE installed, and with and without FDDI (i.e., with NFS
traffic over Ethernet). The same symptom has appeared in all cases:
a process, usually one doing NFS I/O, will hang in "D" state. The
offending process cannot be killed, and eventually other processes
start hanging as well. During this period, Sybase activity
will have been very heavy. The Sybase dataserver process itself, however,
never hangs (note: Sybase is set up so that its I/O is local, *and*
Sybase is using its own raw partitions). Even though Sybase itself
never hangs, *if Sybase asynch. I/O is turned OFF,
the problem rarely if ever appears.*
So, to cause the hang, you seem to need:
    Sybase, with asynch I/O on.
    A heavy Sybase load.
    Another process doing NFS reads/writes...
Oh yes. It seems to take a while to get into this predicament. After
the inevitable reboot, the system is usually OK for a while.
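For anyone who wants to catch the hang as it starts, here is a minimal sketch of how the stuck processes could be picked out of a process listing. It is only an illustration under assumptions: the function name `find_uninterruptible` is hypothetical, the input is assumed to be whitespace-separated `ps`-style text whose header line names `PID` and `STAT` columns, and the sample listing is invented for the example, not captured from our server.

```python
def find_uninterruptible(ps_text):
    """Return (pid, command) pairs for processes whose state starts with 'D'.

    Assumes ps-style text: a header line containing PID and STAT,
    then one process per line, with the command as the last column.
    """
    lines = ps_text.strip().splitlines()
    if not lines:
        return []
    header = lines[0].split()
    pid_col = header.index("PID")
    stat_col = header.index("STAT")
    hung = []
    for line in lines[1:]:
        # Split into at most len(header) fields so the command keeps its spaces.
        fields = line.split(None, len(header) - 1)
        if len(fields) < len(header):
            continue  # skip short or malformed lines
        if fields[stat_col].startswith("D"):
            hung.append((int(fields[pid_col]), fields[-1]))
    return hung


# Invented sample listing, for illustration only.
SAMPLE = """\
  PID TT STAT  TIME COMMAND
  101 ?  S     0:03 dataserver
  245 p0 D     0:00 cp /nfs/src/file /tmp/file
  300 p1 R     0:00 ps -ax
"""

if __name__ == "__main__":
    for pid, command in find_uninterruptible(SAMPLE):
        print(pid, command)
```

Run periodically against real `ps` output, something like this might show which process enters "D" state first and how quickly the others follow, which could help narrow down whether NFS or asynch I/O is the trigger.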
Has anyone else experienced this problem?
It could be an NFS problem, an asynch I/O problem, a load-dependent
kernel problem, ...
Any help would be much appreciated.
--
Hugh LaMaster, M/S 233-9,     UUCP: ames!lamaster
NASA Ames Research Center     Internet: lamaster at ames.arc.nasa.gov
Moffett Field, CA 94035       With Good Mailer: lamaster at george.arc.nasa.gov
Phone: 415/604-1056           #include <std.disclaimer>
More information about the Comp.unix.wizards
mailing list