Sys V/386/3.2 UNIX system getting hung (?)
M.BAKER
mrb1 at homxc.ATT.COM
Fri Apr 7 01:43:27 AEST 1989
Hi ---
Since the net was so helpful on my last query, I'd like to
give it another try:
We have an AT&T 6386E system running UNIX SysV/3.2.
While running our application, it has been observed to
'hang'. Specifically, the application stops in the
middle of things. More importantly, all terminal I/O
stops, including the system console. You can't log in
on a free getty. Anything you type gets echoed back to
the screen, but nothing gets done with it. If you hit
"Ctrl-Alt-Del", the screen
displays a message saying "You must run shutdown before
using Ctrl-Alt-Del" or something very similar to that.
There is no "Fatal Error - Parity Check at ...." message
or anything abnormal on the console.
The only thing to do then (that seems to work for me) is
to hit RESET.
Well, rebooting kind of destroys all the clues. Since
the kernel apparently never did a panic(), there's no
dump available to look at with crash.
If the hang occurred in the middle of the night, and
time elapses before you reset the system, sar shows
nothing past the last recorded 'checkpoint' before the
system 'died'.
I will furnish more details of our hardware configuration and software
application upon request. For now, I think that these basic clues
should be able to get us aimed in the right direction.
My first suspicion:
The 3.1 & 3.2 software notes state that if you "run out of
free clists, all input/output activity from/to terminal ports and
the console will cease. No warning message is printed by the
system to show that it is out of clists". Sounded good at first,
so we raised the NCLIST tunable parameter from 120 to 170 (the
recommended value for a 4M machine) and then to 200 (the max. in mtune).
Still had the problem, though. Which leads to a couple of quick
questions:
1.) Can you check the number of free clists while the
system is running? sar doesn't seem to be any
help here, and I'm sure crash can reveal it but
I'm not sure how to get to it.
2.) Is there any circumstance in which clists can get slowly
used up (i.e., occasionally not returned to the
free pool)?
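To make question 2 concrete, here is a toy model of what I have in
mind. It is only a sketch, not the System V clist code: the function
name and the leak rate are made up for illustration, and I am assuming
a fixed-size pool where every so often one block is taken and never
handed back. If something like this is going on, raising NCLIST would
only postpone the hang rather than cure it, which would match what we
saw.

```python
# Toy model only: a fixed pool of free blocks with an occasional leak.
# This is NOT the System V clist code; all names here are hypothetical.

def operations_until_exhausted(pool_size, leak_every):
    """Count operations until the free pool is empty.

    Each operation takes one block from the pool and normally returns it.
    Every `leak_every`-th operation fails to return its block.
    """
    free = pool_size
    operations = 0
    while free > 0:
        operations += 1
        free -= 1                      # take a block from the pool
        if operations % leak_every != 0:
            free += 1                  # normal case: block is returned
    return operations


if __name__ == "__main__":
    # The three NCLIST values we tried; the leak rate is an assumption.
    for nclist in (120, 170, 200):
        print(nclist, operations_until_exhausted(nclist, leak_every=1000))
```

In this model the time to exhaustion grows only in proportion to the
pool size, so going from 120 to 200 buys less than twice the uptime.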
Also, could this problem be symptomatic of the time slicer
interrupt going away (not being generated, or not being recognized),
which would rob UNIX of knowing that time is passing? Or are we
just in some kind of major deadlock?
I think that the processor is still alive, since console characters
echo to the screen and it responds to the Ctrl-Alt-Del keyin. Plus
this is a protected mode machine, so it's a little tougher for an
application to clobber the OS by writing in the wrong area, or
whatever.
Any clues/suggestions/tips/criticisms/flames/whatever would be
really appreciated.
Thanks
M. Baker
homxc!mrb1 201-949-3455