summary of C-standards workshop at Usenix

Henry Spencer henry at utzoo.UUCP
Sun Jul 1 10:59:02 AEST 1984


The following is an informal report on what was said at the C Standards
workshop at Usenix.  The workshop essentially consisted of a presentation
by Larry Rosler (of the ANSI C effort) plus question-and-answer afterwards.
I apologize to Larry for any errors in the following.  (Incidentally, he
deserves a vote of thanks from everyone who attended the session.  He flew
in from the East Coast, at considerable inconvenience, basically just to
give that talk.)

The ANSI C standards effort is X3J11.  It's split into three subcommittees:
environment, library, and language.  Rosler is chairman of the language
subcommittee.

The environment subcommittee is wrestling with a whole mess of very fuzzy
things about how C relates to its surroundings.  Alone of the three
subcommittees, this one has no existing document to work from, so they're sort
of feeling their way.  Among the things they're trying to cope with are
how a C program gets run (tentatively "main(argc, argv)", but the question
of environment variables is very difficult on non-Unix systems) and how to
resolve problems with European character sets.

The library subcommittee is working from chapters 2 and 3 of the Unix
manual.  Most of chapter 2 is gone because it's Unix-dependent, although
a few things like "signal" are still there.  Most of chapter 3 is still
present:  stdio, chars and strings, memory allocation, basic math functions
(nobody feels like standardizing the Bessel functions!).  They are looking
at things like error handling in the math library.

All the detail that follows is about the language subcommittee.

Their basic goals are:
	- portability
	- preservation of the "spirit of C", i.e. the ability to get
		right down into the bits if you want
	- minimizing the impact on existing valid programs
	- formalizing proven enhancements (emphasis on "proven")
	- producing precise but readable documents

The specific approach to that last item is to tidy up and tighten up
the existing C Reference Manual.  The idea of defining C by use of a
mathematical formal definition was discussed, but it was rejected on
the grounds that the audience for a definition written in English is
several orders of magnitude larger.

They've started from the System V.2 C Reference Manual.  There have been
three major areas of change in that since the "white book":

1. Long identifiers.  The problem with Berklix-style arbitrary-length
	names is that they break existing tools and file formats.  The
	breakage is much less severe if one simply cranks up the limit
	instead of making it infinite.  Internal names (including
	preprocessor names) are now significant to 31 characters.  External
	names are, alas, significant only to 6 characters and case is
	not significant in them; this cannot be improved without making
	the standard incompatible with most non-Unix object-module formats.

2. Void and enum.  "void" is the type returned by a function that doesn't
	return a value.  You can also cast things to "void" to throw away
	an unwanted value.  The keyword is also used in a couple of other
	places, discussed later, to avoid having to introduce too many
	new keywords (any of which has the potential to break existing
	programs).  Enums are as in V7; improvements to permit things
	like ordering comparisons (>=, etc.) on enums are still being
	thought about.

3. Structure/union improvements.  Structure assignment, passing, and
	returning are as in V7.  Structure comparison isn't there, at
	least not so far.  Member names are now local to the
	particular structure, instead of all being in a global name
	space; this means that you have to be more careful about getting
	the type of (e.g.) the left-hand-side of "->" correct, or the
	compiler will object.
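
Here is a minimal sketch of items 1 to 3. It is written in the C that compilers eventually accepted, so it may not match the draft's wording in every detail. All names (`enum color`, `struct point`, `struct extent`, `log_event`, `color_name`, `midpoint`, `copy_point`) are invented for illustration. Both structures use a member called `x`. That only works because member names are now local to each structure.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical enum; as in V7, the enumerators are int constants. */
enum color { RED, GREEN, BLUE };

/* Both structures reuse the member name "x"; this is legal because
   member names are local to each structure. */
struct point  { int x, y; };
struct extent { int x, width; };

static int events_logged = 0;

/* A function that returns no value is declared void. */
static void log_event(const char *what)
{
    events_logged++;
    (void) printf("event: %s\n", what);  /* cast throws away printf's result */
}

static const char *color_name(enum color c)
{
    switch (c) {
    case RED:   return "red";
    case GREEN: return "green";
    case BLUE:  return "blue";
    }
    return "unknown";
}

/* Structures passed and returned by value, as in V7. */
static struct point midpoint(struct point a, struct point b)
{
    struct point m;
    m.x = (a.x + b.x) / 2;
    m.y = (a.y + b.y) / 2;
    return m;
}

/* Structure assignment copies every member. */
static struct point copy_point(struct point p)
{
    struct point q;
    q = p;
    return q;
}
```

For example, calling `midpoint` on the points (0, 0) and (4, 6) returns the structure (2, 3) by value, with no pointers involved.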

The committee has introduced three major changes since the V.2 CRM:

A. Function-argument type declaration and checking.  Instead of just
	saying "extern int fread();", you can now say:

		extern int fread(char *, int, int, FILE *);

	so the compiler can do proper type checks.  In the event of
	a type mismatch, the same conversions as for the assignment
	operator apply.  (Hooray, no more casting NULL pointers!)
	Variable-argument functions like printf can be declared like:

		extern int printf(char *,);

	It is admitted that the comma is not all that conspicuous,
	and that this syntax makes it impossible to declare a function
	which has *only* variable arguments.  These things are, of
	necessity, compromises.  [Please note that neither Larry Rosler
	nor I necessarily *like* all the things I'm reporting.]  There
	is an ambiguity when it comes to declaring no-argument functions,
	since "extern int rand();" looks like an old-style declaration
	which doesn't say anything about the arguments.  The convention
	for this is:

		extern int rand(void);

	which means "no parameters".

B. "const".  A new keyword (sigh) which is used to mark things that are
	read-only, with run-time assignments forbidden.  These things
	might be put in ROM or in text space.  Some examples, with notes:

		const float pi = 3.14159;

	This is a real, live, named constant, which will show up in the
	symbol table (unlike #defines).

		const short yacctable[1000] = { ... };

	An obvious case.

		const char *p;		/* pointer to constant */
		char *const q;		/* constant pointer to char */

	Illustrating two different uses:  the first is a pointer that
	can be changed but can't be assigned through; the second is a
	pointer that can be assigned through but can't be changed.  It
	is agreed that the syntax is less than ideal.  Note that const
	is *not* a storage class, it is part of the type.

		extern char *strcpy(char *, const char *);

	Illustrating telling the compiler that strcpy doesn't change
	its second argument.

C. Single-precision arithmetic.  If all operands in an expression are
	float, the compiler is allowed (not required!) to evaluate it in
	float rather than double arithmetic.  The choice is explicitly
	implementation-dependent.  Casts can be used to force evaluation
	in double.  Floating constants, e.g. "1.0", are double, *not* float!
	This last isn't ideal, but trying to fix it invariably makes life
	much more complex.

	The original double-only rule was partly a concession to the
	pdp11, partly just plain simpler, but partly a way of avoiding
	multiple versions of all the library routines.  With declarations
	of function argument types, the last problem is pretty much fixed.
	All the library functions in the standard want "full width"
	types, so that if you don't declare them, you're still safe.
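
Items A to C can be sketched together. The sketch assumes a later ANSI-style compiler, so it differs from the draft in one visible spot: the variable-argument marker is spelled `...` rather than a trailing comma. The functions (`half`, `length_of`, `copy_string`, `sum_ints`, `answer`, `area_float`, `area_double`) are hypothetical. `copy_string` plays the role of the strcpy declaration above.

```c
#include <assert.h>
#include <stdarg.h>
#include <stddef.h>

/* Prototypes: the compiler now knows every parameter type. */
static double half(double x);
static size_t length_of(const char *s);
static char *copy_string(char *dst, const char *src);
static int sum_ints(int count, ...);   /* later standard C spells it "..." */
static int answer(void);               /* "(void)" means no parameters */

static double half(double x)
{
    return x / 2;
}

/* A null pointer argument needs no cast: the prototype converts it. */
static size_t length_of(const char *s)
{
    size_t n = 0;
    if (s == NULL)
        return 0;
    while (s[n] != '\0')
        n++;
    return n;
}

/* "const char *src" promises that the characters of src are not written. */
static char *copy_string(char *dst, const char *src)
{
    char *const start = dst;   /* constant pointer: start itself never changes */
    while ((*dst++ = *src++) != '\0')
        ;
    return start;
}

static int sum_ints(int count, ...)
{
    va_list ap;
    int total = 0;
    va_start(ap, count);
    while (count-- > 0)
        total += va_arg(ap, int);
    va_end(ap);
    return total;
}

static int answer(void)
{
    return 42;
}

/* Both operands are float, so the product has type float
   (an implementation may still compute it more precisely). */
static float area_float(float w, float h)
{
    return w * h;
}

/* The cast makes w a double, so h is converted too and the
   multiplication is done in double. */
static double area_double(float w, float h)
{
    return (double) w * h;
}
```

Because `half` has a prototype, a call such as `half(7)` converts the int 7 to double and yields 3.5. Without a prototype in scope, the int would be passed as-is, which is undefined behavior for a function expecting a double.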

Some lesser issues:

I. "Promiscuous" pointer assignments are illegal.  You must use casts
	when mixing pointer types or mixing ints with pointers.

II.  "void *" is a new kind of pointer, which cannot be dereferenced but
	can be assigned to any other type of pointer without a cast.  The
	idea here is that "char *" is no longer required to be the
	"universal" pointer type which can point to anything.  So for
	example, the declaration of fread earlier really should go:

		extern int fread(void *, int, int, FILE *);

	(People who have machines where all pointers have the same
	representation, don't complain.  You are lucky.  Others aren't.)

III.  "volatile" (the choice of name is tentative) acts like "const"
	in the syntax, but with different semantics.  It means that the
	data in question is "magic" in some way (e.g. device registers)
	and that compilers should not optimize references to such things.
	This resolves a long-standing problem with writing optimizing
	compilers for C.

IV.  "signal" is in the library.  This means that reentrancy is explicitly
	part of C.

V.  The preprocessor is part of the language.  The committee has opted
	for a simple and clean definition, which does not perpetuate some
	implementation accidents of some of the existing ones.  There are
	some minor improvements, like permitting space before the "#".
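
The following sketch covers items I to V, again in later standard C. `swap_objects`, `all_bytes_zero`, `on_signal`, `signal_round_trip`, and `SWAP_CHUNK` are invented names. The swap routine takes `void *` arguments, so it works for any object type without casts at the call site. Converting an `int *` to an `unsigned char *` still needs an explicit cast. The signal flag is declared volatile, using the `sig_atomic_t` type that ANSI C provided for this purpose, so that the compiler must reread it rather than assume its value is unchanged.

```c
#include <assert.h>
#include <signal.h>
#include <string.h>

#ifndef SWAP_CHUNK
    #define SWAP_CHUNK 64   /* white space before the "#" is allowed */
#endif

/* void * as the universal pointer: any object pointer converts to it,
   and it converts back to unsigned char * without a cast. */
static void swap_objects(void *a, void *b, size_t size)
{
    unsigned char tmp[SWAP_CHUNK];
    unsigned char *pa = a, *pb = b;

    while (size > 0) {
        size_t chunk = size < sizeof tmp ? size : sizeof tmp;
        memcpy(tmp, pa, chunk);
        memcpy(pa, pb, chunk);
        memcpy(pb, tmp, chunk);
        pa += chunk;
        pb += chunk;
        size -= chunk;
    }
}

/* Mixing two different pointer types needs an explicit cast. */
static int all_bytes_zero(const int *p)
{
    const unsigned char *bytes = (const unsigned char *) p;
    size_t i;
    for (i = 0; i < sizeof *p; i++)
        if (bytes[i] != 0)
            return 0;
    return 1;
}

/* volatile: the handler changes this behind the compiler's back. */
static volatile sig_atomic_t got_signal = 0;

static void on_signal(int sig)
{
    (void) sig;
    got_signal = 1;
}

/* Installs the handler, raises the signal, restores the default,
   and reports whether the handler ran (-1 on failure). */
static int signal_round_trip(void)
{
    got_signal = 0;
    if (signal(SIGINT, on_signal) == SIG_ERR)
        return -1;
    if (raise(SIGINT) != 0)
        return -1;
    (void) signal(SIGINT, SIG_DFL);
    return got_signal;
}
```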

Some trivial additions:

i.  Hexadecimal string escapes.  [Retch.]  "Here's an ESC \x1b ".

ii.  String constant concatenation.  Two string *constants* occurring
	adjacent to each other in the source are considered concatenated.
	Note that this is constants only.  Among other minor things, this
	makes string continuation across line boundaries less ugly.

iii. "unsigned char", "unsigned short", "unsigned long" are all part of
	the language.  Plain "char" is *not* required to be signed or
	unsigned (requiring either would make efficient implementations
	impossible on some machines).  The question of a "char-sized int"
	type, of whatever syntax, has not yet been resolved.

iv.  The unary + operator.  Same conversions and type restrictions as
	unary -.  Does nothing.  This is partly consistency with other
	languages, and partly consistency with things like "atof".  (At
	the moment, "+3.14" is valid when atoffed from a string but not
	when compiled into a program!)

v. Initialization of unions and automatic aggregates.  The latter is
	just removal of an existing restriction.  The former is tricky;
	there is *no* clean way to define it.  The committee has opted
	to do something not necessarily good, but simple:  the type of
	the initializer is that of the lexically-first member.

vi. The selection expression of a "switch" can be of any integer type.
	(E.g. it can be a "long".)

vii.  #elif.  An added bit of preprocessor syntax, to simplify using
	#if's like a "switch".
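
Here is a compact sketch of items i to vii, assuming a later standard C compiler. Every identifier (`banner`, `esc_sequence`, `union number`, `seed`, `byte_value`, `size_class`, `WORD_BITS`, `WORD_NAME`, `sum_of_local_array`) is invented for illustration.

```c
#include <assert.h>
#include <string.h>

/* Adjacent string constants are concatenated at compile time. */
static const char banner[] =
    "ANSI C "
    "draft standard";

/* Hexadecimal escape: \x1b is the ASCII ESC character. */
static const char esc_sequence[] = "\x1b[0m";

/* A union initializer applies to the first-declared member. */
union number {
    long as_long;
    double as_double;
};
static union number seed = { 12345L };

/* Plain char may be signed or unsigned, so unsigned char is used
   for byte values that must never come out negative. */
static int byte_value(unsigned char c)
{
    return +c;   /* unary +: same conversions as unary -, otherwise no effect */
}

/* The switch selector may be a long. */
static const char *size_class(long n)
{
    switch (n) {
    case 0L:
        return "empty";
    case 1L:
        return "single";
    default:
        return "many";
    }
}

/* #elif chains #if tests like the arms of a switch. */
#define WORD_BITS 32
#if WORD_BITS == 16
#define WORD_NAME "short word"
#elif WORD_BITS == 32
#define WORD_NAME "long word"
#else
#define WORD_NAME "unknown word"
#endif

/* Automatic aggregates may now be initialized. */
static int sum_of_local_array(void)
{
    int primes[4] = { 2, 3, 5, 7 };
    return primes[0] + primes[1] + primes[2] + primes[3];
}
```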

Some things are gone:

01. "entry", "asm", and "fortran" keywords.  (Although the last two
	will probably be mentioned in a "recognized extensions" appendix.)

02. "long float" is no longer a synonym for "double".  Nobody ever used
	it.  There was discussion of using "long float" and "long double"
	to cope with machines having more than two floating-point types,
	but conversions and such are an unknown swamp in such a case, and
	the committee decided not to try.

03. 8 and 9 are not octal digits.  (The K&R reference manual gave them
	the octal values 10 and 11, so "09" used to mean 9.)

04. Pointer-integer conversions now are strictly type-checked, as I
	mentioned earlier.

05. The following code fragment is illegal:

		foo(parm)
		int parm;
		{
			int parm;
			...

	Some compilers interpret such a situation as nested scopes, so
	the inner declaration hides the outer one.  In this particular
	case, this seems both useless and dangerous.  The scope of the
	arguments of a function is now identical to that of the local
	declarations, so this is a duplicate declaration and illegal.

06. Nothing is said about the alignment of bitfields, not even the
	K&R guarantee that they don't straddle word boundaries.

07. Some existing compilers permit taking the address of a variable
	declared "register" if the variable is not in fact placed in
	a register.  This is now outlawed; "register" and the unary
	"&" operator don't mix.

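For contrast with items 01 to 07, here is a sketch of the legal counterparts, again assuming a later standard compiler. `octal_fifteen` and `count_down` are invented names. `foo` is the function from item 05, rewritten so that the local variable no longer collides with the parameter.

```c
#include <assert.h>

/* 017 is octal, i.e. decimal 15; a constant such as 09 is now an error. */
static const int octal_fifteen = 017;

/* Item 05's fragment redeclared "parm" in the function's outermost
   block.  Giving the local its own name is the legal fix. */
static int foo(int parm)
{
    int scaled = parm * 2;
    return scaled + 1;
}

/* A register variable may not have its address taken, so anything
   that needs &i must use an ordinary automatic variable instead. */
static int count_down(int n)
{
    register int i;
    int steps = 0;
    for (i = n; i > 0; i--)
        steps++;
    return steps;
}
```
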
All in all, the current draft standard doesn't sound too bad to me.
I will be getting a copy of it shortly, and may have some more comments
at that time.  A number of things are still unsettled.  The committee's
(very tentative) notion of schedule is a final draft for public comment
by the end of the year, and a real standard by the end of next year.
[Sound of crossing of fingers.]

Comments on this should *not* be addressed to me; I'm just an interested
observer, not a participant.  Write to:

	Lawrence Rosler
	Supervisor, Language Systems Engineering Group
	AT&T Bell Laboratories
	Summit, NJ  USA

No, I don't have a network address for him.
-- 
				Henry Spencer @ U of Toronto Zoology
				{allegra,ihnp4,linus,decvax}!utzoo!henry


