C not LALR(1) & compiler bugs
richw at ada-uts.UUCP
Sat Jan 18 07:20:00 AEST 1986
C's grammar is CONTEXT SENSITIVE !? Can it be ?!
The following is quoted from page 121 of "C: A Reference Manual" by
Harbison & Steele (which, by the way, beats the pants off of
Kernighan & Ritchie as a reference manual). After the quote,
I've included a small program which just may reveal a minor bug
in your C compiler (it did for mine).
Allowing ordinary identifiers, as opposed to reserved words only,
as type specifiers makes the C grammar context sensitive, and
hence not LALR(1). To see this, consider this program line
A ( *B );
If A has been defined as a typedef name, then the line is a
declaration of a variable B to be of type "pointer to A."
(The parentheses surrounding "*B" are ignored.) If A is not
a type name, then this line is a call of the function A with
the single parameter *B. This ambiguity cannot be resolved
grammatically.
C compilers based on UNIX' YACC parser-generator -- such
as the Portable C Compiler -- handle this problem by feeding
information acquired during semantic analysis back to the
lexer. In fact, most C compilers do some typedef analysis
during lexical analysis.
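To make the ambiguity concrete, here is a small sketch in standard C. The names A and B come from the quote; the helper functions call_reading and decl_reading, and the choice of double as the typedef'd type, are illustrative assumptions. The identical token sequence A ( *B ); is a function call in one scope and a declaration in another, where a block-scope typedef hides the function A. The compiler can only tell the two apart by knowing what A currently names, which is exactly the information the lexer feedback supplies.

```c
#include <assert.h>
#include <stddef.h>

/* At file scope A is an ordinary identifier naming a function. */
static int A(int x)
{
    return x * 2;
}

/* Reading 1: A is not a type name, so "A ( *B );" is a call of the
   function A with the single argument *B. */
static int call_reading(int *B)
{
    assert(B != NULL);
    A ( *B );              /* call; the result is discarded */
    return A ( *B );       /* same tokens, result returned to the caller */
}

/* Reading 2: a block-scope typedef hides the function, so the very same
   tokens declare B as "pointer to A"; the parentheses are ignored. */
static size_t decl_reading(void)
{
    typedef double A;      /* inside this block, A is a type name */
    A ( *B );              /* declaration: B has type double * */
    B = NULL;
    return sizeof *B;      /* sizeof does not evaluate *B, so no dereference */
}
```

With int n = 21;, call_reading(&n) returns 42, while decl_reading() returns sizeof(double), showing that B in that block is a pointer to double.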
All I have to say, concerning the design of C's syntax, is "Oops".
I also realized that this, together with that real spiffy feature
of C that identifiers are the same if their first 8 characters are
the same, could really confuse C compilers. I tried
the following program on the compiler I use:
typedef int long_type_name;
f(a)
int *a;
{
long_type_of_function_name (*a);
printf("Bye");
}
According to H&S, a correct C compiler should say that this is a
redeclaration of "a" (since "long_type_of_function_name" and
"long_type_name" are, uh, the same identifier). However, the
compiler I use simply eats it up, thinking that the line in
question is a call to some external function (which, since it
wasn't explicitly declared, C graciously assumes returns an
int -- isn't C just so helpful!). My guess is that when the
lexer checks whether the function name is really a typedef'd
name, it compares ALL of the characters in both names (e.g. strcmp)
instead of just the first 8 (e.g. strncmp).
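The guess above can be sketched as follows, assuming the lexer consults a table of typedef names that the parser feeds back to it. The table, the SIGNIFICANT constant and both lookup functions are hypothetical and not taken from any real compiler. With full-length comparison the function name is not found, so the line is parsed as a call; with 8 significant characters it matches "long_type_name", so the line becomes a declaration (and hence a redeclaration of "a").

```c
#include <assert.h>
#include <string.h>

/* Hypothetical: number of significant characters in an identifier,
   matching the 8-character rule discussed above. */
#define SIGNIFICANT 8

/* Hypothetical typedef table that the parser feeds back to the lexer;
   it holds just the typedef name from the test program. */
static const char *typedef_names[] = { "long_type_name" };
static const size_t n_typedefs = sizeof typedef_names / sizeof typedef_names[0];

/* Suspected buggy lookup: compares every character of both names. */
static int is_typedef_strcmp(const char *id)
{
    assert(id != NULL);
    for (size_t i = 0; i < n_typedefs; i++)
        if (strcmp(id, typedef_names[i]) == 0)
            return 1;   /* lexer would return a typedef-name token */
    return 0;           /* lexer would return an ordinary identifier */
}

/* Lookup honoring only the first SIGNIFICANT characters. */
static int is_typedef_strncmp(const char *id)
{
    assert(id != NULL);
    for (size_t i = 0; i < n_typedefs; i++)
        if (strncmp(id, typedef_names[i], SIGNIFICANT) == 0)
            return 1;
    return 0;
}
```

Both names share the prefix "long_typ", so is_typedef_strncmp("long_type_of_function_name") returns 1 while is_typedef_strcmp returns 0 for the same input.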
Of course, since the identifiers really ARE different, it SEEMS
as if the compiler IS correct in treating the line as a function call.
Technically, though, it's a buggy compiler.
Isn't it strange that it seems better for the compiler to be wrong?
Doesn't that make you wonder if something is SERIOUSLY wrong with C?
Personally, I think that the real fault for my "buggy" compiler
lies not with the compiler writer, but in the shoddy language design
that haunts the deep-dark corners of C. I mean, is there any excuse
for the grammar being context sensitive? Or, for that matter, for
identifiers having only 8 significant characters?
-- Rich "Picky-Picky-Picky" Wagner
P.S. Forgive me if this piece of C trivia has already been discussed
(or flamed, as in this case) in net.lang.c -- I just found out
about it and was amazed.