Subtle bug in fflush and fix (really, signals and the C library)
gwyn at brl-smoke.UUCP
Sun Jul 13 04:04:04 AEST 1986
In article <603 at batcomputer.TN.CORNELL.EDU> garry%cadif-oak at cu-arpa.cs.cornell.edu writes:
>In a recent article gwyn at brl.arpa (Doug Gwyn (VLD/VMB) <gwyn>) wrote:
>>The problems are not as well known as they should be. Signals
>>can impact the design of much of the C library. AT&T reworked
>>the standard I/O routines in an attempt to permit the use of
>>printf() in a signal handler, but they didn't get it quite right
>>because it is NOT POSSIBLE to get it right, at least not without
>>severe performance impact...
>I can see it would be tough to do at user level. But I don't see why
>the kernel read() and write() routines can't hang a simple queue on
>their front ends. Elucidate, please?
I'm afraid it would be a long and boring story and may violate
our non-disclosure agreement to go into the details, but...
The fundamental source of difficulty is the fact that the
standard I/O library (as well as some other C library
routines) maintains static data structures that can be
left in an inconsistent state when an asynchronous interrupt
occurs. Using Berkeley-style "reliable signal" facilities,
which seem to have finally made it (in some form) into UNIX
System V, one could establish "conditional critical regions"
that hold signals during times that the data structures are
being rearranged. However, since that adds system calls
there is a performance penalty. Without reliable signals,
one would have to resort to semaphore mechanisms, which are
either expensive (System V sem*() facilities) or may require
special hardware support. (Although sometimes one can exploit
conventional facilities; I once had a semaphore package for
the PDP-11 that used ASRB and INC to obtain atomic interlock.)
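
To make the idea of an atomic interlock concrete, here is a minimal sketch. It is not the PDP-11 ASRB/INC sequence mentioned above; it is a modern analogue that assumes a C11 compiler with <stdatomic.h>, and the names `lock`, try_enter() and leave() are hypothetical.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical interlock guarding some library-private data structure. */
static atomic_flag lock = ATOMIC_FLAG_INIT;

/* Atomically test and set the flag.
 * Returns true if the caller obtained the interlock,
 * false if it was already held. */
bool try_enter(void)
{
    return !atomic_flag_test_and_set(&lock);
}

/* Release the interlock. */
void leave(void)
{
    atomic_flag_clear(&lock);
}
```

Note that a signal handler which finds the interlock held cannot simply spin on it: the holder is the very code the handler interrupted, so it would wait forever.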
The semaphore approach greatly complicates the library code,
since it has to complete the interrupted code's work if a
critical region is in progress when an interrupt occurs. If
a second interrupt comes along there can be a royal mess.
There are other approaches (e.g., coroutining) that I can
think of but they're even more elaborate than this.
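
The signal-holding approach described above can be sketched roughly as follows. This is an illustration only, not the library's actual code: it assumes the POSIX sigprocmask() interface (a later standardized counterpart of the Berkeley sigblock()/sigsetmask() calls), and the `stream` structure and guarded_putc() are hypothetical names made up for the example.

```c
#define _POSIX_C_SOURCE 200809L
#include <signal.h>
#include <stddef.h>

/* Hypothetical library-private state: buf and len must change together. */
static struct {
    char   buf[64];
    size_t len;
} stream;

/* Append one byte while all signals are held, so that a handler can
 * never observe buf and len out of step.
 * Returns the byte written, or -1 on error or if the buffer is full. */
int guarded_putc(int c)
{
    sigset_t all, saved;
    int rc = -1;

    sigfillset(&all);
    if (sigprocmask(SIG_BLOCK, &all, &saved) != 0)  /* system call #1 */
        return -1;

    /* Critical region: the structure is briefly inconsistent in here. */
    if (stream.len < sizeof stream.buf) {
        stream.buf[stream.len] = (char)c;
        stream.len++;
        rc = (unsigned char)c;
    }

    sigprocmask(SIG_SETMASK, &saved, NULL);         /* system call #2 */
    return rc;
}
```

Every call makes two trips into the kernel around a few instructions of real work, which is the performance penalty referred to above.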
This situation is one of the reasons that ANSI X3J11 had to
decide just how much could be guaranteed for a
signal-catching function. (We didn't promise much; you can
abort() or longjmp() or do some implementation-specified
atomic variable modify or something else I forget.)
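
The "atomic variable modify" case is the one most programs end up relying on. A minimal sketch, assuming the `volatile sig_atomic_t` type from <signal.h> in the C standard that later came out of this work; the names got_signal, on_signal(), install_handler() and consume_signal() are hypothetical:

```c
#include <signal.h>

/* The only object the handler touches. */
static volatile sig_atomic_t got_signal = 0;

/* Handler: record that the signal arrived and do nothing else.
 * No printf(), no stdio, no library data structures. */
static void on_signal(int signo)
{
    (void)signo;
    got_signal = 1;
}

/* Install the handler for SIGINT. Returns 0 on success, -1 on failure. */
int install_handler(void)
{
    return signal(SIGINT, on_signal) == SIG_ERR ? -1 : 0;
}

/* Called from ordinary (non-handler) code at a safe point.
 * Returns 1 and clears the flag if a signal was seen, else 0. */
int consume_signal(void)
{
    if (!got_signal)
        return 0;
    got_signal = 0;
    return 1;
}
```

The main program polls consume_signal() and does its printf() there, outside the handler, where the standard I/O structures are known to be consistent.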