Trouble killing processes
Guy Harris
guy at gorodish.Sun.COM
Wed May 18 05:51:13 AEST 1988
> The real fix would of course be in the kernel. I would suggest setting
> a timeout on each system call. This way, an lseek on a dead tape drive,
> say, would fail after n secs of cpu. Some sort of context might need
> to be saved before the syscall starts, so things can be restored. This
> could be expensive. Comments?
Probably not a good idea.
"lseek" is a bad example; in all current UNIX systems that I'm familiar with,
"lseek" only sets a "seek pointer" in memory - it never goes near the device.
This pointer is then used by the driver to position the tape before doing any
I/O operation.
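To illustrate the point that "lseek" merely records an offset, here is a minimal sketch in Python, whose "os.lseek" is a thin wrapper over the system call. It uses an ordinary temporary file rather than a tape device, which is an assumption made purely for convenience, and the helper name "seek_without_io" is hypothetical. Seeking a gigabyte past the end of an empty file returns at once and leaves the file size at zero, which suggests that nothing was read, written, or positioned on the underlying storage.

```python
import os
import tempfile


def seek_without_io(offset):
    """Seek to `offset` in an empty temp file.

    Returns (offset reported by lseek, file size after the seek).
    Hypothetical helper, for illustration only.
    """
    fd, path = tempfile.mkstemp()
    try:
        # lseek only updates the file offset kept by the kernel.
        reported = os.lseek(fd, offset, os.SEEK_SET)
        # The file itself is untouched: its size is still zero.
        size = os.fstat(fd).st_size
        return reported, size
    finally:
        os.close(fd)
        os.unlink(path)


reported, size = seek_without_io(1 << 30)
print(reported, size)  # prints "1073741824 0"
```

Any actual positioning would happen only when a later read or write uses the stored offset.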
A more germane example *might* be an I/O operation or a "position the tape"
"ioctl" operation on a dead tape drive. Even there, the *only* reasons a
timeout should be needed are that the tape driver is buggy and doesn't
immediately detect a dead drive, or that it lacks a timeout scheme *in the
driver* to detect one. Even such a driver-level timeout could be tricky; some
magtape operations can take a *very* long time to complete.
Basically, system calls should take as long as they need to; this could very
well be infinite ("pause()" or "sigpause()") or, worse, finite but
indeterminate. In either case, no timeout can be imposed.
A typical "wedged" process is either waiting for something that *must* complete
(in which case its unkillability is unfortunate but unavoidable) or is hung due
to a kernel bug (in which case the real fix is, of course, in the kernel - but
it's not to kludge in a timeout).
(P.S. the timer obviously doesn't want to be based on CPU time - a blocked
process tends to consume CPU time *extremely* slowly, if at all.)
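The P.S. can be checked with a small sketch, again in Python and again under stated assumptions: the helper name "measure_blocked_wait" is hypothetical, and a "select" on an empty pipe stands in for a process blocked on a device that never responds. The function compares elapsed wall-clock time with the CPU time charged to the process while it is blocked.

```python
import os
import select
import time


def measure_blocked_wait(seconds):
    """Block in select() on an empty pipe for `seconds`.

    Returns (wall-clock seconds elapsed, CPU seconds consumed,
    whether the pipe became readable). Hypothetical helper.
    """
    r, w = os.pipe()
    try:
        wall_start = time.monotonic()
        cpu_start = time.process_time()
        # Nothing is ever written to the pipe, so this blocks in the
        # kernel until the select timeout expires.
        ready, _, _ = select.select([r], [], [], seconds)
        cpu_used = time.process_time() - cpu_start
        wall_used = time.monotonic() - wall_start
        return wall_used, cpu_used, bool(ready)
    finally:
        os.close(r)
        os.close(w)


wall, cpu, ready = measure_blocked_wait(0.3)
print("wall=%.3fs cpu=%.3fs readable=%s" % (wall, cpu, ready))
```

On a typical system the wall-clock figure should be close to the requested wait while the CPU figure stays near zero, so a limit expressed in CPU seconds would essentially never expire for a blocked process.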