C source character set
Georg Wittig
wittig at gmdzi.UUCP
Tue Oct 3 01:31:58 AEST 1989
Maybe the following are RTFM questions, but I don't have the ANSI C papers;
Harbison & Steele II don't seem to cover it ...
My questions are about the legal characters in a C source programme:
[1] There exist editors that allow you to enter any ASCII character. Consider
the following program fragment:
/* in the following lines let @ be the character '\0' */
int x;
x = 1 + /* foo @ bar */
2 /* */
;
Is this program fragment equivalent to
[a] ``int x; x = 1 + 2;''
In this case C compilers cannot use ``fgets'' to read the source
lines.
or [b] ``int x; x = 1 + ;''
This will result in a syntax error message in later compiler
phases.
What about a '\0' outside a C comment? Does it terminate the current line
or must it be kept so that a syntax error message will be the result?
What about a '\0' in a string constant?
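To make the concern behind [a] concrete, here is a minimal sketch (not taken from any real compiler) of why a reader built on ``fgets'' cannot tell where a line containing '\0' really ends, while a byte-by-byte ``getc'' loop can. The helper names temp_file_with, fgets_visible_length and getc_line_length are hypothetical, and the sketch assumes that tmpfile() succeeds and that the line fits in the buffer.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: put `len` raw bytes into a temporary binary file
   and rewind it so the bytes can be read back. Returns NULL on failure. */
static FILE *temp_file_with(const char *bytes, size_t len)
{
    FILE *fp = tmpfile();

    if (fp == NULL)
        return NULL;
    if (fwrite(bytes, 1, len, fp) != len) {
        fclose(fp);
        return NULL;
    }
    rewind(fp);
    return fp;
}

/* Read the first line with fgets and report how many characters strlen()
   can see. fgets stores the whole line, but gives no length back, so an
   embedded '\0' looks like the end of the line. */
size_t fgets_visible_length(const char *bytes, size_t len)
{
    char buf[128];
    size_t seen = 0;
    FILE *fp;

    assert(bytes != NULL && len < sizeof buf);
    fp = temp_file_with(bytes, len);
    if (fp == NULL)
        return 0;
    if (fgets(buf, sizeof buf, fp) != NULL)
        seen = strlen(buf);
    fclose(fp);
    return seen;
}

/* Read the first line one byte at a time. Here '\0' is just another
   byte; the count includes the terminating '\n' if there is one. */
size_t getc_line_length(const char *bytes, size_t len)
{
    size_t count = 0;
    int c;
    FILE *fp;

    assert(bytes != NULL);
    fp = temp_file_with(bytes, len);
    if (fp == NULL)
        return 0;
    while ((c = getc(fp)) != EOF) {
        count++;
        if (c == '\n')
            break;
    }
    fclose(fp);
    return count;
}
```

Called on the 24 bytes of the comment line from the fragment above (with a real '\0' in place of @), the first function would report only the characters before the '\0', and the second would report the full line. Whether a compiler is obliged to treat the line the second way is exactly the question.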
[2] Furthermore, there are (non-UNIX) operating systems that encode the end of
a source line by the number of bytes of that line instead of inserting a
newline character (\x0a or \x0d in ASCII, \x15 in EBCDIC) at the end of
that line.
As an example, the line ``abc'' could be encoded as ``\3abc'', and not as
``abc\x0d''. In those environments ``[f]getc'' must generate an artificial
'\n' character at the end of the line. Or am I mistaken?
What if exactly this artificial '\n' is also a character of the line?
What is a ``line'' in this context?
Consider a (perverse looking) macro like the following:
/* in the following line let @ be the character '\n' */
#define X(a,b) foo@#define X(a,b) ((a)+(b))
i = X(27,38);
Is this required to pass the preprocessor phase without an error message,
and if so what is the output of that phase? I can think of at least 5
different ways to process such a crazy macro.
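The record format described in [2] can be sketched as follows. This is only an illustration under stated assumptions: each record is assumed to be one count byte followed by that many data bytes (as in ``\3abc''), and the function name records_to_stream is hypothetical. It shows the artificial '\n' being appended, and why a '\n' inside a record cannot be told apart from a record boundary afterwards.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Convert count-prefixed records into a '\n'-terminated character stream,
   the way a [f]getc layer on such a system might present them.
   Returns the number of bytes written to `out`, or (size_t)-1 if the
   input is truncated or `out` is too small. */
size_t records_to_stream(const unsigned char *rec, size_t rec_len,
                         char *out, size_t out_cap)
{
    size_t in = 0;   /* read position in rec  */
    size_t n = 0;    /* write position in out */

    assert(rec != NULL && out != NULL);
    while (in < rec_len) {
        size_t count = rec[in++];            /* length byte of this record */

        if (count > rec_len - in)            /* record runs past the input */
            return (size_t)-1;
        if (count + 1 > out_cap - n)         /* no room for data plus '\n' */
            return (size_t)-1;
        memcpy(out + n, rec + in, count);    /* data bytes, copied verbatim */
        in += count;
        n += count;
        out[n++] = '\n';                     /* the artificial newline */
    }
    return n;
}
```

With this sketch, the single record ``\3a@b'' (@ being '\n') and the two records ``\1a\1b'' both come out as the same four bytes, so anything reading the stream sees two lines in either case. That is one of the possible treatments of the macro above, not necessarily the required one.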
[3] Line continuation by `\': Does it only apply to #define contexts and string
constant contexts, or is it a general rule? Example:
int terrible_long_identifier;
terrible_lon\
g_identifier = 1;
Does the assignment statement alter the value of that terribly long
variable, or is it a syntax error (``terrible_lon'' and ``g_identifier''
undeclared)?
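For illustration, here is what the "general rule" reading would look like if every `\' immediately followed by a newline were deleted before the text is split into tokens. That ordering is an assumption of this sketch, not something the question above establishes, and the function name splice_lines is hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Copy src to dst, deleting every backslash that is immediately followed
   by a newline, together with that newline. dst must have room for at
   least as many bytes as src (including the terminating '\0').
   Returns the length of the spliced text. */
size_t splice_lines(const char *src, char *dst)
{
    size_t n = 0;

    assert(src != NULL && dst != NULL);
    while (*src != '\0') {
        if (src[0] == '\\' && src[1] == '\n')
            src += 2;              /* drop the pair, joining the two lines */
        else
            dst[n++] = *src++;     /* everything else passes through */
    }
    dst[n] = '\0';
    return n;
}
```

Under that reading the two physical lines of the example become the single line ``terrible_long_identifier = 1;'' and the assignment refers to the declared variable. If continuation applied only inside #define lines and string constants, the same input would instead be seen as two separate identifiers.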
Thanks in advance,
--
Georg Wittig GMD-Z1.BI P.O. Box 1240 D-5205 St. Augustin 1 (West Germany)
email: wittig at gmdzi.uucp phone: (+49 2241) 14-2294
-------------------------------------------------------------------------------
"Freedom's just another word for nothing left to lose" (Kris Kristofferson)