ANSI C and the C Pre-Processor

Sat Sep 29 20:54:00 AEST 1984

From Henry Spencer:
|
| > ............ However, this idea is being extended to include strings and
| > character constants as tokens that don't get scanned for replacement text.
| 
| K+R, section 12.1:  "Text inside a string or a character constant is
| not subject to replacement."  In other words, this is not something new:
| the language has always been specified to behave that way.

I think it is instructive to consider the wording of the proposed
(draft) standard.  [This is from the July version; I doubt that it's
changed in the Sept one.]

Sect 9.2: ..... Character constants and strings in the token sequence or
in the rest of the program are not scanned for defined identifiers or
formal parameters. ....

Now consider the wording in the April version (it was sect 9.1 then)

Sect 9.1: ..... Character strings in the token sequence or in the
rest of the program are not scanned for defined identifiers. ....

Note the difference: the later wording adds character constants
and, more to the point, formal parameters.  K&R was never clear
on this point - its wording here (and elsewhere) was ambiguous.
That is, a perfectly viable interpretation, taken by Reiser, was
that strings in the token sequence could be scanned for parameters.

There are (as has been pointed out many times) many reasons
for allowing this.  The ONLY one for denying it, that I can
see, is that some people get confused (don't understand what's
happening).  The right way to solve that problem is to clearly
document what happens - no-one will have any problems with
it if it's made clear what will happen.
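
To make the two readings concrete, here is a minimal sketch.  The
names FORMAT, format_count and string_left_alone are made up for
this illustration; they come from no real program.  The formal
parameter v also occurs inside the string in the replacement text.
The comments assume a compiler that follows the draft wording; what
the Reiser reading would give is described, not demonstrated.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical macro: the formal parameter v also occurs inside the string. */
#define FORMAT(buf, size, v) snprintf((buf), (size), "v = %d", (v))

/* Formats a local variable named count and returns the buffer. */
static const char *format_count(char *out, size_t size)
{
    int count = 3;

    assert(out != NULL && size > 0);
    /*
     * Draft wording: the string is one token and is not scanned,
     *   so out receives "v = 3".
     * Reiser reading (not what a draft-conforming compiler does):
     *   the v inside the quotes becomes count, giving "count = 3".
     */
    FORMAT(out, size, count);
    return out;
}

/* Returns 1 when the compiler left the string alone (the draft rule). */
static int string_left_alone(void)
{
    char out[32];

    return strcmp(format_count(out, sizeof out), "v = 3") == 0;
}
```

Under the Reiser reading, the cure for an unwanted replacement here
would be to rename the formal parameter to something that does not
occur in the string.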

Henry continues:
| 
| > The questions are: Should this change be endorsed?
| 
| Of course it should be endorsed, since it's not really a change at all.
| The standard is the documentation, not Reiser's code.

The problem is that K&R is *not* a standard.  If it were, we wouldn't
need X3J11.  In the absence of a standard, and in the presence
of ambiguous documentation, the only place to look is in the
implementations.  Henry also stated (quote omitted) that most
non-Unix C compilers adopted the restrictive approach.  So,
now we have a conflict - no immediate practical reason (in terms
of broken code) for jumping one way or the other.  In short,
nearly the ideal situation for adopting the best solution.

If C were a language for amateur programmers, beginners, etc,
I would tend to favour the restricted approach.  But that's not
what C is.  It's a dangerous language, filled with dangerous
features.  It's for professionals.  We should adopt the most
useful approach - the one that gives the greatest power to
the programmer - and that is clearly the liberal approach.

Pragmatically too, it will be much easier to convert programs
broken by this strategy (those in which macro replacement text
contains strings containing "accidental" references to parameters)
than those broken by the current draft proposed standard
(those that use replacement inside strings to good effect).
In the former case, all that needs to be done is to rename
the formal parameter.  In the latter, some whole new mechanism
needs to be devised - possibly requiring changes in the source.
I also suspect that fewer programs would be broken by the former.

Henry again:
| 
| As for what should be done to bring back the lost functionality...  the
| ANSI C folks have basically said "if you want a general-purpose macro
| processor, use m4".  The programs that this "change" will break are
| broken already, and should be fixed to do it right.

No-one is asking for a full-blown macro processor, just that subset
that is really useful for C programs.  If the committee were to
take the "use m4" attitude, they would logically have to standardize
m4 as a (possibly optional) part of the C compiler.  Otherwise
all those programs that go to the trouble of adopting their
recommendation, and use m4, will stop being portable, which can
hardly be the aim.

Joe Mueller replied:
|
| As Henry stated, the X3J11 committee (ANSI C), felt that the preprocessor
| was not intended to be a general purpose macro processor, BUT, we did
| acknowledge that there was a large body of code that used these types
| of "features". The committee is currently considering proposals for
| 
| a) token concatenation operations within the preprocessor. It will
|    definitely NOT be startoftoken/**/argument. Currently it looks like
|    the # will be used like this: startoftoken#argument. I don't believe
|    we have definitely decided the syntax for the operation. I think that
|    the committee did decide that the functionality was needed.

I agree that this is needed - while I regret the need to alter
some of my source (I am a xxx/**/yyy user), I admit that this
is a revolting way of forming tokens; something better, anything
better, would be welcome.  [No, please don't tell me about your
favourite revolting way of avoiding xxx/**/yyy; I've seen most
of them, and none of the existing ones is clearly better.]
The '#' operator proposal looks reasonable to me.  When you're
considering this, please also remember to do something about
the problem of blanks in the actual parameter strings - are
they significant, or not?  That is, blanks between the preceding
comma or '(' and the start of the parameter text, and blanks after
the text before the ')' or next comma.  I would prefer that the
standard make it clear that these should not be included as
part of the replacement text.
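
A sketch of the behaviour being asked for.  It is written with the
'##' spelling that later ISO C settled on, which is an assumption
relative to the text above: the proposal under discussion uses a
single '#', and the syntax was not yet decided.  PASTE, xxxyyy and
read_pasted are names invented for the illustration.

```c
#include <assert.h>

/* Hypothetical pasting macro, using the later ISO C spelling ##. */
#define PASTE(prefix, suffix) prefix ## suffix

static int xxxyyy = 42;   /* the name that xxx and yyy paste together into */

/* Reads xxxyyy twice: once with tight arguments, once with blanks around them. */
static int read_pasted(void)
{
    int tight  = PASTE(xxx,yyy);
    int spaced = PASTE( xxx , yyy );

    /* Both expand to the single token xxxyyy: the blanks around
       the actual parameters are not part of them. */
    assert(tight == spaced);
    return spaced;
}
```

If the blanks were significant, the second call could not form the
single identifier xxxyyy at all.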

Joe:
|
| b) "stringizing" (I didn't make up this term, someone else did) arguments
|    is also under consideration. One proposal is to do the substitution
|    if the argument name is the only thing within the quotes. i.e.
|    #define foo(bar) printf("bar")
|    will expand bar within the quotes where
|    #define foo(bar) printf("the argument was bar")
|    will not expand bar.

Ugh!  How could you justify that?  I appreciate that, combined with
constant string concatenation, it would give all the functionality
that is needed - the second example could be rephrased as:
	#define foo(bar) printf("the argument was ""bar")
but that's going to be a nasty distinction to try to explain
to anyone.  And that would break ALL existing implementations.
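
For illustration only, the combination of string concatenation with
a stringized argument can be sketched as below.  This assumes a
compiler that joins adjacent string constants and that accepts the
'#bar' spelling of later ISO C; the proposal quoted above would
write the name inside quotes instead.  ARG_TEXT and describe are
hypothetical names.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical macro: #bar (later ISO C spelling) turns the actual
   parameter into a string constant, and the two adjacent string
   constants are then joined by the compiler. */
#define ARG_TEXT(bar) "the argument was " #bar

/* Returns the joined string for the actual parameter x + 1. */
static const char *describe(void)
{
    /* Adjacent string constants written by hand concatenate the same way. */
    const char *joined = "the argument was " "x + 1";

    assert(strcmp(joined, ARG_TEXT(x + 1)) == 0);
    return ARG_TEXT(x + 1);
}
```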

Seems to me that in this case, adopting the Reiser interpretation
is the better thing to do.  Document it clearly, so people aren't
trapped, and that should end the problems.

Robert Elz				decvax!mulga!kre