UNIX question

Chris Torek chris at umcp-cs.UUCP
Sat Dec 14 08:06:27 AEST 1985


In article <974 at ccice5.UUCP> ahb at ccice5.UUCP (Al Brumm) writes:

> A clean way to [ignore children] on Sys3 was to use the following
> system call in the parent process:
>	signal(SIGCLD, SIG_IGN);

Cute... maybe I will add this hack to our kernel.  One question:
Is SIGCLD always reset to SIG_DFL on exec?  If not, since ignored
signals normally remain ignored, it could break other programs
which expect to collect children; and programs that ignore SIGCLD
would have to carefully un-ignore it just after forks.

> Note that this would not allow you to examine the child's exit
> status.  However, you could examine the exit status by doing the 
> following:
>	int
>	sigcld()
>	{
>		int pid, status;
>		pid = wait(&status);
>		...
>	}
>	main()
>	{
>		int	(*sigcld)();
>
>		signal(SIGCLD, sigcld);
>	}

Well, the `int (*sigcld)()' declaration is wrong and (in this case)
unnecessary; it should be `int sigcld()' if anything.  But that is
not all that is amiss.  In V7, 3BSD, and 4BSD, and I suspect also
in Sys III and V (and Vr2 and Vr2V2), and probably in V8 as well,
signals are not queued, and without the `jobs library' of 4.1BSD,
or the signal facilities of 4.2, this code cannot be made to operate
reliably.  It *will fail*, someday, no doubt at the worst possible
moment.

The problem is that several children may exit in quick succession.
Only one SIGCLD signal will be delivered, since the parent process
will (just this once) not manage to run before all have exited.
The sigcld handler has no way of determining how many children are
to be processed.
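
The coalescing described above can be forced deterministically on a present-day system. What follows is only a sketch under stated assumptions, not anything from the systems discussed in this article: it uses the modern spelling SIGCHLD, and the POSIX calls sigaction(), sigprocmask(), and waitid() with WNOWAIT, none of which existed in V7 or 4.1BSD. The names count_sigchld and deliveries_for are hypothetical. The signal is held back until every child has exited, then released, and the handler counts how many times it runs.

```c
#ifndef _DEFAULT_SOURCE
#define _DEFAULT_SOURCE 1       /* asks glibc for the full set of declarations */
#endif
#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

#define MAXKIDS 16

static volatile sig_atomic_t deliveries;    /* times the handler has run */

/* Deliberately does NOT collect children; it only counts invocations. */
static void count_sigchld(int sig)
{
    (void)sig;
    deliveries++;
}

/* Fork nchildren (at most MAXKIDS) children that exit at once, and
 * return how many times the SIGCHLD handler ran for them. */
int deliveries_for(int nchildren)
{
    struct sigaction sa, old_sa;
    sigset_t chld, old_mask;
    pid_t pids[MAXKIDS];
    siginfo_t info;
    int i;

    if (nchildren > MAXKIDS)
        nchildren = MAXKIDS;
    deliveries = 0;

    sa.sa_handler = count_sigchld;
    sigemptyset(&sa.sa_mask);
    sa.sa_flags = 0;
    sigaction(SIGCHLD, &sa, &old_sa);

    /* Hold the signal back so the parent "does not manage to run". */
    sigemptyset(&chld);
    sigaddset(&chld, SIGCHLD);
    sigprocmask(SIG_BLOCK, &chld, &old_mask);

    for (i = 0; i < nchildren; i++) {
        pids[i] = fork();
        if (pids[i] == 0)
            _exit(0);
    }

    /* WNOWAIT: wait until each child has exited, but leave it uncollected. */
    for (i = 0; i < nchildren; i++)
        if (pids[i] > 0)
            waitid(P_PID, (id_t)pids[i], &info, WEXITED | WNOWAIT);

    /* Release the signal; whatever is pending is delivered here. */
    sigprocmask(SIG_UNBLOCK, &chld, NULL);

    /* Clean up: collect the children and restore the previous state. */
    for (i = 0; i < nchildren; i++)
        if (pids[i] > 0)
            waitpid(pids[i], NULL, 0);
    sigaction(SIGCHLD, &old_sa, NULL);
    sigprocmask(SIG_SETMASK, &old_mask, NULL);

    return deliveries;
}
```

On a system that keeps a single pending bit per ordinary signal, deliveries_for(3) should report one delivery for three exits, which is exactly the situation in which a handler that calls wait() once leaves two children uncollected.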

In 4.1BSD and later, the solution is a new `system call', wait3().
This call takes an options argument with two flags, WNOHANG and
WUNTRACED.  WNOHANG tells the kernel not to wait for existing
children to exit.  Instead, if children remain but none has exited,
wait3 returns 0, allowing the signal handler to finish up, having
now collected all exited children.  (WUNTRACED exists only for
C-shell style job control with stopped processes, and is irrelevant
here.)
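
A minimal sketch of such a handler follows. It assumes a system that still provides wait3() and spells the signal SIGCHLD, and it installs the handler with POSIX sigaction() rather than the signal facilities of the period. The names on_sigchld, install_reaper, and reaped are hypothetical, chosen only for illustration.

```c
#ifndef _DEFAULT_SOURCE
#define _DEFAULT_SOURCE 1       /* lets glibc declare wait3() */
#endif
#include <errno.h>
#include <signal.h>
#include <sys/types.h>
#include <sys/time.h>
#include <sys/resource.h>
#include <sys/wait.h>
#include <unistd.h>

volatile sig_atomic_t reaped;   /* children collected so far */

/* Collect every child that has already exited, without ever blocking. */
static void on_sigchld(int sig)
{
    int saved_errno = errno;    /* wait3 may overwrite errno */
    int status;

    (void)sig;
    /*
     * wait3 returns > 0 for each exited child it collects,
     * 0 when children remain but none has exited yet,
     * and -1 when there are no children left (or on error).
     * Either of the last two ends the loop.
     */
    while (wait3(&status, WNOHANG, (struct rusage *)0) > 0)
        reaped++;
    errno = saved_errno;
}

/* Install the handler; returns 0 on success, -1 on failure. */
int install_reaper(void)
{
    struct sigaction sa;

    sa.sa_handler = on_sigchld;
    sigemptyset(&sa.sa_mask);
    sa.sa_flags = SA_RESTART;   /* restart interrupted system calls */
    return sigaction(SIGCHLD, &sa, (struct sigaction *)0);
}
```

Because the loop drains every exited child on each invocation, it does not matter how many exits were folded into the one signal that was actually delivered.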

Unfortunately, this solution is still incomplete.  There are race
conditions unless the child exit signal is withheld (but not ignored)
for the duration of the child collection routine, and can be withheld
during process creation (in case the created process exits before
the parent finishes updating data structures).  Both kinds of
withholding are available under the 4.1BSD `jobs' library, and in
all 4.2 and 4.3 systems.
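
The withholding can be sketched as follows. This is an illustration under stated assumptions, not the `jobs' library or the 4.2BSD interface: POSIX sigprocmask() stands in for the 4.2BSD signal-blocking calls, sigaction() installs the handler (and, by default, keeps SIGCHLD withheld while the handler itself runs), and the process table with its names (table, spawn_child, exit_code_of, nrunning) is entirely hypothetical.

```c
#ifndef _DEFAULT_SOURCE
#define _DEFAULT_SOURCE 1       /* lets glibc declare wait3() */
#endif
#include <errno.h>
#include <signal.h>
#include <sys/types.h>
#include <sys/time.h>
#include <sys/resource.h>
#include <sys/wait.h>
#include <unistd.h>

#define MAXPROC 8
enum { SLOT_FREE, SLOT_RUNNING, SLOT_EXITED };

struct slot { pid_t pid; int state; int status; };
static volatile struct slot table[MAXPROC];  /* hypothetical process table */
volatile sig_atomic_t nrunning;              /* children not yet collected */

/* Runs with SIGCHLD withheld (sigaction blocks the signal being handled). */
static void on_sigchld(int sig)
{
    int saved_errno = errno, status, i;
    pid_t pid;

    (void)sig;
    while ((pid = wait3(&status, WNOHANG, (struct rusage *)0)) > 0) {
        for (i = 0; i < MAXPROC; i++) {
            if (table[i].state == SLOT_RUNNING && table[i].pid == pid) {
                table[i].status = status;
                table[i].state = SLOT_EXITED;
                nrunning--;
                break;
            }
        }
    }
    errno = saved_errno;
}

int install_handler(void)
{
    struct sigaction sa;

    sa.sa_handler = on_sigchld;
    sigemptyset(&sa.sa_mask);
    sa.sa_flags = SA_RESTART;
    return sigaction(SIGCHLD, &sa, (struct sigaction *)0);
}

/* Fork a child that exits with `code`; returns its table slot, or -1. */
int spawn_child(int code)
{
    sigset_t chld, old;
    pid_t pid;
    int i, slot = -1;

    /* Withhold (not ignore) SIGCHLD until the table entry is complete. */
    sigemptyset(&chld);
    sigaddset(&chld, SIGCHLD);
    sigprocmask(SIG_BLOCK, &chld, &old);

    for (i = 0; i < MAXPROC && slot < 0; i++)
        if (table[i].state == SLOT_FREE)
            slot = i;
    if (slot >= 0) {
        pid = fork();
        if (pid == 0)
            _exit(code);        /* a real child would restore the mask and exec */
        if (pid > 0) {
            table[slot].pid = pid;
            table[slot].state = SLOT_RUNNING;
            nrunning++;
        } else {
            slot = -1;          /* fork failed */
        }
    }

    /* If the child has already exited, the handler runs here and finds it. */
    sigprocmask(SIG_SETMASK, &old, (sigset_t *)0);
    return slot;
}

/* Exit code of a collected child, or -1 if it has not exited normally. */
int exit_code_of(int slot)
{
    int st = table[slot].status;

    if (table[slot].state != SLOT_EXITED || !WIFEXITED(st))
        return -1;
    return WEXITSTATUS(st);
}
```

Without the block around fork(), a child that exits immediately could be collected before its pid is recorded, and the handler would find no matching entry for it.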

Anyway, what it all boils down to is that process control is
unreliable in many versions of Unix, but can be made reliable in
4.1, 4.2, and 4.3BSD.  If there is any way to reliably handle
process exit and `job control' style processing in System III and
System V, I am not aware of it---though that should be unsurprising
since I have never used them.  If it is possible in the latest AT&T
Unixes, I would like to know how.
-- 
In-Real-Life: Chris Torek, Univ of MD Comp Sci Dept (+1 301 454 4251)
UUCP:	seismo!umcp-cs!chris
CSNet:	chris at umcp-cs		ARPA:	chris at mimsy.umd.edu


