pointer poison (was: effect of free())

T. William Wells bill at twwells.com
Tue Oct 3 02:00:47 AEST 1989


In article <184 at bbxsda.UUCP> scott at bbxsda.UUCP (Scott Amspoker) writes:
: >I'm not sure what you are talking about. The original posting in this
: >thread (mine) said that the C standard permits the compiler writer to
: >assume that you don't use a pointer after a free...
: >[...]
: >If I write the program so that it is not designed to reference
: >pointers after they are freed, assuming I get it right, it doesn't
: >matter whether the generated code trashes the pointer...
: >[...]
: >The conclusion is that I should avoid referencing the pointer after
: >it is freed, so that this perfectly legit optimization won't break my
: >code.
:
: I agree, the original posting had to do with testing a pointer after
: a free() call.  However, the discussion quickly expanded into pointer
: handling in general.  Many readers were concerned that their code
: could possibly be testing or moving a pointer that did not contain
: a valid address.  Even though their programs may have the appropriate
: logic in place that would prevent ultimate *dereferencing* of such a
: pointer, it was suggested that merely handling such a pointer is
: considered a bug: (ex:  p1 = p2  causes a trap if p2 contains an
: invalid address).  While I would agree that handling a pointer
: that you won't ultimately be using (because of some later
: condition) is questionable style - it's hardly an outright *bug*.

I'm going to explain this at length, so that we can stop arguing
about it. Your assertion amounts to this: using a freed pointer's value
doesn't break anything, so it is OK. And I'm saying that that is not
true.

A C program can operate in one of two modes: within the C model, and
outside it. Programs that operate within the C model may do different
things due to implementation differences, yet, until they stray
outside the model, will do predictable things. (Compiler bugs
permitting, anyway. :-) Programs that operate outside the C model
might do *anything*.

Obviously, you want to never write a C program that goes outside the
C model. Unfortunately, this is not always possible: for example,
when writing code that has to reference specific addresses. But such
programs are always nonportable, and so should never be written that
way unless their purposes are inherently nonportable. (And then, the
only parts of the program which go outside the model should be the
ones that must.) It is nice when a compiler has a wider model than
the C model, making it possible to write such programs, and making it
easier to debug your program when it goes outside the usual model. But
such a compiler can also be misused, especially by those who take its
model as "the" C model.

What exactly is "the C model"? This is a list of assertions associated
with each part of a program. Each assertion may say something about
source code of that part or the state of the "C machine" when
executing that code. For example, the C model includes a statement
that the right hand operand of the divide operator must be nonzero
when the code is executed. If you do divide by zero, *anything* can
happen. It is the case that most machines will either quietly ignore
this error condition, stop the program execution and return to the
OS, or trap to an error routine, but if your program played taps over
your machine's speaker and went into an infinite loop, you shouldn't
be too surprised. :-)
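
A minimal sketch of what honoring that assertion looks like in code.
The helper name checked_div is hypothetical, invented here for
illustration. It also refuses INT_MIN / -1, on the assumption of a
two's complement int, where that quotient is not representable and is
equally outside the model.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>

/*
 * Hypothetical helper: perform num / den only when the operation is
 * inside the C model. Returns 1 and stores the quotient in *out on
 * success; returns 0 and leaves *out untouched otherwise.
 */
int checked_div(int num, int den, int *out)
{
    assert(out != NULL);

    if (den == 0)
        return 0;               /* divide by zero: anything can happen */
    if (num == INT_MIN && den == -1)
        return 0;               /* quotient overflows int: also undefined */

    *out = num / den;
    return 1;
}
```

The caller, not the machine, then decides what a bad divisor means,
and the program never leaves the model to find out.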

The question arises as to which exact set of assertions should
comprise the C model. Obviously, the "dictatorship" view: *my*
compiler defines the C model, is right out. On the other hand, the
"liberal" (American sense) view, the view that this set is null or as
close as possible, is also right out.

Another view, the "democratic" view, says that the C model is the
intersection of the models of some set of popular compilers. This too
is out. Like any absolute democracy, it tramples on those who are not
in the majority, by declaring that their particular problems are of
little concern.

Yet another view, the "anarchic" view, says that the C model is the
intersection of every C model. This view, too, is out. Should we
cater to the quirks of, say, one of the brain-damaged "C compilers"
for the 8051? Or what about compiler bugs? What about compiler
"features" (like 8086 compilers' near and far keywords)?

Should we even try to define the C model in terms of what existing
compilers do? Never mind that this really is begging the question,
the answer is the same as in politics: you *can* define the C model
in terms of existing compilers, but this is going to result in
compromise and dissatisfaction all 'round. And eventual chaos.

There is, as in politics, one way that can work, the "constitutional"
method. In this method, there is a piece of paper which defines the
language, the standard. The standard serves as the touchstone by which
we determine the model: any assertion stated or implied by the
standard is part of the model; any other assertion is not.

Just as with constitutions, standards require interpretation, will
contain ambiguity and incompleteness and downright error, and will
generate endless debate. Such is the consequence of our being finite;
a standard represents the best we can do at the time, but we aren't
going to have a *perfect* standard (not, at least, till programming
becomes an engineering discipline instead of an art. No it isn't!)

In spite of this, many of the views mentioned above have some merit.
Obviously, a standard that is largely inconsistent with existing
practice is going to be worthless. And a standard that ignores the
needs of the minorities is going to alienate a large part of the
community. (We are all, after all, likely to become a part of that
minority at some time or another. :-)

So, having a standard doesn't really solve the problems. Instead,
however, it gives those problems to a small group of people who will
do their best to satisfy as many of the conflicting desires of the C
community as they can. It is guaranteed that some minorities will be
left out in the cold. And some parts of the standard will even offend
the majority (6 character monocase externals, faugh!).

But once it is done we have a *single* (well, within the parameters
of "implementation defined") C model which we can all look to and
which a programmer, who you can usually bet is not as conversant with
the problems of many different machines as the standard writers, can
follow and have, as a consequence, a justified belief that his
program will be portable.

(A similar reasoning applies to the de facto standards that exist in
the absence of a real standard. See "democracy" above and apply that
de facto standard instead of a real standard in the following
paragraphs.)

Now, to get out of the ether and back to the real world, we have a
practical question: should a programmer limit his portable
programs to the C model in the standard or should he use a wider
or even different model? The answer to the latter should be clear: a
programmer writing portable code that is inconsistent with the
standard C model is just fooling himself.

But that still leaves the question open: should we use a wider model?
The answer to that should still be "no". For the "yes" answer implies
that you know about all those zillions of systems out there and are
willing to gamble that none of them breaks your model. And also that
you know about all those zillions of systems *that do not exist yet*
and are willing to make that gamble about them as well.

So, to summarize my point so far: if you are writing portable
programs, you must write to the actual standard or to some kind of
"democratic" de facto standard. But best of all is to write your
programs so that they don't violate the de facto standard and are
easily modified, as the two converge, to meet the actual standard.

The freed pointer thing is, as has been argued, acceptable within the
de facto standard's model. No one has shown a real system where this
fails. Fine. But, once we are following the C standard, using a freed
pointer will not be within the C model. Since (unlike, e.g.,
prototypes) there is no contradiction involved (one can simply not
use freed pointers), one should never do it at all.
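
A minimal sketch of code that never examines a pointer's value after
free(), not even to copy or compare it. The struct node type and the
list_push and list_free helpers are hypothetical, made up for this
illustration; only malloc() and free() come from the library.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical singly linked list node, for illustration only. */
struct node {
    int value;
    struct node *next;
};

/* Push a value on the front; returns the new head, or the old head
 * unchanged if the allocation fails. */
struct node *list_push(struct node *head, int value)
{
    struct node *n = malloc(sizeof *n);

    if (n == NULL)
        return head;
    n->value = value;
    n->next = head;
    return n;
}

/* Free every node, then overwrite the caller's head pointer. */
void list_free(struct node **head)
{
    struct node *p;

    assert(head != NULL);
    p = *head;
    while (p != NULL) {
        struct node *next = p->next;  /* read the link BEFORE freeing */

        free(p);
        p = next;   /* p is assigned a fresh value; the freed one is
                       never tested, copied, or dereferenced */
    }
    *head = NULL;   /* storing into *head is fine; reading it first
                       would not be */
}
```

The common loop "for (p = head; p != NULL; p = p->next) free(p);" is
the thing to avoid: it reads p after the free. Saving the link first
costs one local variable and keeps the program inside the model.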

---
Bill                    { uunet | novavax | ankh | sunvice } !twwells!bill
bill at twwells.com


