Comments on proposed C standard
Jim Gardner
jagardner at watmath.UUCP
Fri Aug 22 04:14:22 AEST 1986
The following (huge) document comments on the latest proposal
for a C standard. It is paginated, but does not contain tabs.
COMMENTS ON
THE DRAFT PROPOSED STANDARD
(Dated July 21, 1986)
- prepared by -
The Software Development Group
University of Waterloo
Waterloo, Ontario
Our comments are based on the Draft Proposed American
National Standard for Information Systems -- Programming
Language C, Doc.No. X3J11/86-104, dated July 21, 1986. In
addition, we make a number of comments on the Rationale for
the standard, Doc.No. X3J11/86-099, dated July 7, 1986.
This document supersedes previous submissions from the
Software Development Group, which were submitted in comment
on previous drafts of the standard.
Generally, we will use the same order of presentation
as the standard itself. Our section headings correspond to
the appropriate sections in the standard. However, we will
begin with some general observations.
General Note 1: Reserved Words:
The standard defines many new symbols, particularly
#defined names in header files. These are effectively
reserved words, since programs that use the symbols for
other things will get into trouble sooner or later. We
count a total of 255 effectively reserved words:
32 language keywords
44 implementation limits
179 library-related names
In contrast, Cobol only has 227 reserved words!
To avoid a jungle of symbols that are effectively
reserved, we strongly urge that the committee follow one of
its own principles: symbols that begin with an underscore
are not for the programmer's use. This is a simple rule
that gets around most of the pitfalls. All newly introduced
symbols should begin with an underscore. This means that
the symbols in <limits.h> should be
_CHAR_BIT
_CHAR_MAX
_SCHAR_MAX
- 1 -
University of Waterloo August, 1986
/* etc. */
The same holds for other new symbols: "size_t" should become
"_size_t", "ptrdiff_t" should become "_ptrdiff_t", and so
on. Of course, the old stand-bys like NULL and "errno" will
stay as they are, even if we might wish differently.
Note that we would recommend against reserving the
prefixes to_, SIG, str, mem, and is. This sort of rule
would prevent implementations from supporting common opera-
tions that aren't in the standard. For example, it actively
rules out "isascii" and "isodigit" since these are not
recognized by the standard. This will break a great deal of
code. Besides, the fewer "reserved word" rules a programmer
has to remember, the better.
General Note 2: Portability:
In Section 1.2, the designers state the principle "Make
it fast, even if it is not guaranteed to be portable." We do
not argue with this principle in general, but we think it
should be counterbalanced by other considerations. When
there are only a few popular alternative behaviors, the
standard should provide both a fast operation (with
implementation-defined behavior) and one or more possibly
slower operations with well-defined behavior.
The whole point of a language standard is to allow
program portability. The standard should ensure that there
is some way to write a portable program. For example,
consider the ">>" operator. The standard does not indicate
whether ">>" shifts arithmetically (propagating the sign
bit) or logically (inserting zeros).
What this effectively states is that the operation is
only defined on a subset of the possible range of operands
(i.e. when the operand to be shifted is positive). It is
not rigorously defined outside the range, but implementa-
tions are expected to support the operation outside the
range. This sort of situation crops up in many places in
the standard.
In order to make it possible to write portable
programs, we suggest that the standard should provide new
extended-range operators corresponding to each limited-range
operation. The standard can allow ">>" to work in an
implementation-defined way outside of its defined range, but
it should define additional functions, macros, or operations
to handle the full range of operands.
For example, you might have _ARITH_RS(A,B) which works
like "A>>B" when A is positive, but which always performs an
arithmetic shift when A is negative. If the programmer
wants to be sure of a logical right shift, the operand to be
shifted can be cast to unsigned. In this way, the
programmer could always dictate whether an arithmetic or
logical shift was desired.
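The pair of operations described above can be sketched as
follows. This is only an illustration: the names "arith_rs"
and "logical_rs" are hypothetical, not part of any draft, and
the shift count is assumed to be smaller than the width of an
int. The arithmetic version only ever shifts non-negative
values, so it does not depend on how ">>" treats the sign bit.

```c
/* Hypothetical helpers; the names are illustrative only. */

/* Arithmetic right shift: the result rounds toward negative
   infinity whatever the sign of a.  Assumes b is less than the
   width of int.  Only non-negative values are ever shifted. */
int arith_rs(int a, unsigned b)
{
    if (a >= 0)
        return a >> b;
    /* -(a + 1) is non-negative and cannot overflow, even when a
       is the most negative int. */
    return -((-(a + 1)) >> b) - 1;
}

/* Logical right shift: zeros enter from the left, because the
   operand is converted to unsigned before shifting. */
unsigned logical_rs(int a, unsigned b)
{
    return (unsigned)a >> b;
}
```

With these, arith_rs(-8, 1) is -4 on any implementation, while
logical_rs(-1, 1) is the largest unsigned value shifted right
by one place.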
A second way to make portable programs possible is to
make appropriate definitions in <stddef.h> or some other
header. In most cases where behavior is implementation-
defined, there are a limited number of possibilities. For
example, an implementation could define a symbol
_ARITH_SHIFT to indicate that right shifts were done
arithmetically and _LOG_SHIFT to indicate that right shifts
were done logically. With appropriate #ifdef directives,
source code could be adapted to either possibility. Similar
symbols would tell how "A%B" works when B is negative, how
integer division worked in the same situation, and so on.
Note that this technique adds no extra expense at execution
time to determine how an implementation behaves.
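A rough sketch of how source code might adapt to such symbols
is given below. _ARITH_SHIFT is the symbol proposed above; no
existing implementation is assumed to define it, so the
fallback branch is what a compiler would normally select. The
names SIGNED_RS and "halve" are hypothetical.

```c
/* _ARITH_SHIFT is the proposed symbol; it is an assumption that
   an implementation would define it when ">>" propagates the
   sign bit. */
#if defined(_ARITH_SHIFT)
/* The native operator already shifts arithmetically. */
#define SIGNED_RS(a, b) ((a) >> (b))
#else
/* Logical or unknown native shift: build the arithmetic result
   from shifts of non-negative values only. */
#define SIGNED_RS(a, b) \
    ((a) < 0 ? -((-((a) + 1)) >> (b)) - 1 : (a) >> (b))
#endif

/* Divide by two, rounding toward negative infinity. */
int halve(int n)
{
    return SIGNED_RS(n, 1);
}
```

The choice is made entirely by the preprocessor, so the
adapted program pays nothing at execution time for asking the
question.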
General Note 3: The 90% Rule:
The portability of a program is influenced by two
factors: how it uses C code, and how it uses the library
functions. If a program is ported from system A to system
B, the implementation on B will usually report places where
code is used incorrectly. However, it usually will not
report situations where B's implementation of a library
function differs significantly from A's implementation.
Thus, compatibility of library functions is of major concern
in porting programs, and therefore in design of a standard
for writing portable programs.
Our philosophy is that a function on system A should
not have the same name as a function on system B, unless the
A function is at least 90% the same as the B function. If
the two functions are not almost identical in functionality,
pretending that they are the same by giving them the same
name is just asking for trouble.
In the context of the C standard, the 90% rule suggests
that the standard library functions should behave in a
manner that is almost identical on all systems. It is a
mistake, for example, to make the definition of a binary
file loose enough to encompass a widely divergent set of I/O
devices and file formats. We would rather see the defini-
tion restricted to allow operations that could reasonably be
regarded as portable, and nothing more. If a particular
system had special file formats that needed to be supported,
the implementation on that system could provide additional
I/O routines to deal with such formats.
If a program is written using special routines for
system-dependent I/O, porting the program is actually
simpler. When the program is taken to a new system, the C
implementation will issue diagnostic messages indicating the
special I/O routines that are not available on the new
system, and the programmer finds out what has to be changed.
When porting a program written only with "standard"
routines, the programmer must laboriously track down system
dependencies that were disguised by using the "standard"
routines and this is usually a great deal more work.
In general, then, we believe that the standard library
should not be designed to conceal the system dependencies
that exist on a particular machine. Instead, it should
provide support for features that are common to all
machines, leaving it up to the individual implementation to
support dependencies.
General Note 4: The Correct Answer:
At times, efficiency has been put ahead of correctness.
A good example of this occurs with mixed signed and unsigned
operations. Consider the following code.
short i;
unsigned short u;
...
if ( i < u ) ...
Since the committee allows this to result in either a signed
or an unsigned comparison (depending on whether
sizeof(short) is less than sizeof(int)), it is possible that
a negative value of "i" could be found greater than a
positive value of "u". Correctness has been sacrificed to
efficiency.
This situation should be avoided. If a signed integer
is negative, it should be less than all unsigned integers.
All other considerations of lengthening or converting argu-
ments are secondary. Note that efficiency doesn't enter
into this -- a compiler or interpreter has a much better
chance of generating efficient code than a user checking for
the situation explicitly by writing if statements or
conditional expressions.
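The explicit check that a user would otherwise have to write
looks roughly like the sketch below. It uses int and unsigned
rather than the short types above, because with those types
the conversion to unsigned happens on every implementation.
Both function names are hypothetical.

```c
/* Hypothetical helper: a value-correct "i < u" for operands of
   mixed signedness. */
int less_int_unsigned(int i, unsigned u)
{
    /* A negative i is below every unsigned value; otherwise both
       operands are non-negative and an unsigned comparison gives
       the right answer. */
    return i < 0 || (unsigned)i < u;
}

/* The built-in comparison: i is converted to unsigned first, so
   a negative i becomes a very large value. */
int less_builtin(int i, unsigned u)
{
    return i < u;
}
```

For i equal to -1 and u equal to 1, the first function reports
that i is smaller and the second reports that it is not.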
If a programmer really wants the possibility of a negative
integer being larger than an unsigned one, he or she can
always use explicit casts, as in
if ( (unsigned)i < (unsigned)u ) ...
Similarly, if a programmer knows that a particular form of
comparison is more efficient than the default, he or she can
always use explicit casting to ask for the more efficient
comparison method. When it is possible for the programmer
to control efficiency, why give a more naive programmer the
wrong answer?
General Note 5: Existing Implementations:
The Rationale states that existing code is important,
but existing implementations are not. We agree with the
principle, but must point out that many popular C programs
are intimately connected to a particular implementation.
Adopting practices contrary to the way popular implementa-
tions work (e.g. the UNIX C compilers) may indeed break
programs in subtle ways. For this reason, our comments
sometimes state, "Implementation X does this differently."
In such cases, we are not saying that the standard should be
changed to do things X's way; we simply want to point out
that some important compiler does not behave in the given
manner, and one should expect some code to break as a
result.
General Note 6: Meaningful Sloppy Code:
In a few instances, the designers have made it
necessary for conforming implementations to support "sloppy"
programming practices. For example, the following are
supposed to be equivalent.
extern int i;
int extern i;
When a sloppy practice is well-established, the designers
are justified, because existing programs should continue to
work. However, the above practice is virtually unknown (at
least to the people we have talked to and in the programs we
have examined), and requiring all implementations to support
it is surprising.
More to the point, making sloppy code meaningful has
undesirable side effects. A user who accidentally writes
such code receives no diagnostic message, because the code
is correct...even when the code is probably not what the
user intended to write. The program may behave in an
unexpected way because a typing mistake is accepted.
In addition, the implementation that is forced to
accept sloppy code has more difficulty generating error
messages. It has less chance of identifying precisely where
the code went wrong, because the programmer has so much more
leeway. Consequently, the diagnostic facilities of the
implementation are degraded for all users, in order to
support the sloppy few.
The standard should not require implementations to
support "loose" language constructs that are seldom used.
Obviously, there are instances where the designers must
decide whether a construct is or isn't used and different
people may have different opinions on the matter. Still,
the basic principle should be, "No sloppiness, unless
required by common practice."
General Note 7: Rationale:
The purpose of the Rationale document should be to
explain why particular decisions were made. All too often,
the Rationale is used to explain what the standard says.
Obviously then, the standard itself should be made more
clear, with more examples and illustrations.
Specific Section Comments:
The rest of this document talks about specific sections
of the standard and the Rationale.
1.6 Compliance:
A conforming freestanding implementation should provide
the standard header <stddef.h> in addition to <limits.h> and
<float.h>. The <stddef.h> should not include a declaration
for "errno". For more on "errno", see our comments on Sec-
tion 4.1.1.
2.1.1.2 Translation Phases:
Since the process of linking translated source files is
described in this section, one might believe that linking
must take place in the translation environment. The
standard should explicitly state that linking can take place
in either the translation environment or the execution
environment (or in some other environment, for that matter).
2.2.4.2 Numerical Limits:
We do not understand why so many #defined names are
missing their vowels. For example, why use SHRT instead of
SHORT? The difference in keystrokes is minimal, and the
standard guarantees that #defined names can be 31 characters
long. This criticism applies to many of the names chosen by
the designers.
Technically speaking, the definition of FLT_ROUNDS is
incorrect. The beginning of the section states that each
macro must be at least the given value. The given value for
FLT_ROUNDS is 0, but the value -1 is also said to be
meaningful. Also, the alternatives for this value are
"rounds", "chops", and "indeterminate". This overlooks the
fact that "chops" could mean "truncate towards zero" or
"truncate towards negative infinity". We conclude that
there should actually be four alternatives for FLT_ROUNDS,
not just three.
3.1.2.1 Scopes of Identifiers:
The Rationale says that the behavior is undefined if
you use an identifier outside its scope. The standard
itself says nothing about this possibility.
3.1.2.2 Linkages of Identifiers:
According to the rules for declarations that include
the keyword extern, it is not possible to declare
extern i;
...
static i;
inside a file and have the first declaration of "i" refer to
the static (internal linkage) "i". The Rationale says this
decision was made in order to allow one-pass compilers, but
in fact, one-pass compilers are possible without this
restriction. All that the one-pass compiler needs to make
this work is a loader with a bit of intelligence.
We believe that this is contrary to the principle
stated in Section 1.2 of the Rationale: Existing code is
important, existing implementations are not. Ruling out the
above construct will break a good many existing UNIX
programs, since the existing UNIX compilers allow extern
declarations to be resolved to static objects that have file
scope and internal linkage. Therefore, we believe the above
construct should be made legal.
We note also that if the extern definition occurs in a
function and the static outside a function, we have a
different situation. For example, consider
f()
{
extern int i;
...
}
static float i;
According to the second paragraph of the Semantics section
in 3.5.1, an extern definition inside a function refers to
an object that is defined somewhere with file scope. It
cannot refer to the static definition (because that comes
later), so it must refer to some definition with external
linkage. As soon as the static declaration is encountered
however, all subsequent references in the file refer to the
static variable. This is odd, to say the least.
3.1.2.3 Name Space of Identifiers:
The Rationale states that the intention is to permit as
many separate name spaces as possible. In fact, we believe
it requires as many separate name spaces as possible.
The standard says that all tags (structure, union, and
enum tags) should be folded together. We don't see why this
is necessary. Distinguishing the different types of tags
will not break any existing programs, but folding them
together may break programs that were written for an
implementation that did distinguish the different tags.
3.1.2.5 Types:
An unsigned and signed integer take up the same amount
of memory. The standard should also state they have the
same alignment requirements. This assumption is true of all
machines we know, and allows simpler coding of portable
programs.
There is also the implication at various points in the
standard that an integral zero consists of all 0-bits. For
example, Footnote 71 (to "calloc") implies that every zero
except pointers and floating point types consists entirely
of 0-bits. Furthermore, the range of values available to
the unsigned type overlaps the range of non-negative values
for the signed type.
This argues that the document should state that all
values which can be represented by both signed and unsigned
integers (i.e. the non-negative integers that can be
represented by signed int) have the same bit pattern. This
is true for all common representation schemes: one's comple-
ment, two's complement, and signed magnitude. We believe
that the signed-unsigned algorithm stated in 3.2.1.2 tacitly
assumes that this equivalence is true. The equivalence also
legitimizes many of the bit operations that take place
inside existing C programs.
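The equivalence can be checked on a given implementation with
a small sketch like the following. The function name is
hypothetical, and the check assumes that int and unsigned have
no padding bits whose contents might differ.

```c
#include <string.h>

/* Hypothetical check: returns 1 if a non-negative value has the
   same bytes when stored as int and as unsigned, 0 otherwise
   (including when value is negative). */
int same_representation(int value)
{
    int s = value;
    unsigned u = (unsigned)value;

    return value >= 0 && memcmp(&s, &u, sizeof s) == 0;
}
```

The comparison is byte by byte, so it tests the bit pattern
itself rather than the converted value.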
The explanation of pointer types should be considerably
expanded. Our reading of the standard shows several assump-
tions about various pointer types that are never stated
explicitly. We believe that users would understand the
language better if these assumptions were stated explicitly
in this section.
For example, footnote 36 to section 3.5.2.2 assumes
that the alignment and size of all pointers to struc-
ture/union types will be the same. We believe this assump-
tion is valid, but it should be stated explicitly in
3.1.2.5.
Similarly, if A is a pointer to type T, it should be
true that
(char *) (A + 1) == ((char *) A) + sizeof(T)
(If this were not true, "malloc" would be in serious
trouble.) This should be stated explicitly.
As another example, given that the alignment of signed
and unsigned integers is equal, and given that arrays are
made up of contiguous objects, a statement like the
following is true.
int *p;
(int *) ( (unsigned *)p + 10) == p+10
It would be helpful if the standard or the Rationale
actually pointed this out.
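Both pointer identities above can be written out as a sketch
that an implementation either satisfies or does not. The
function names are hypothetical; the second check relies on
the assumption, stated above, that int and unsigned share size
and alignment.

```c
/* Returns 1 if stepping an int pointer by one element advances
   it by sizeof(int) bytes. */
int stride_equals_sizeof(void)
{
    int a[2] = {0, 0};
    int *p = a;

    return (char *)(p + 1) == (char *)p + sizeof(int);
}

/* Returns 1 if indexing through an (unsigned *) view of an int
   array lands on the same element as indexing the int pointer. */
int unsigned_view_agrees(void)
{
    int a[11] = {0};
    int *p = a;

    return (int *)((unsigned *)p + 10) == p + 10;
}
```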
Also, we cannot find an explicit definition of the
phrase "pointer to object". We assume that it means a
pointer type which is not a pointer to a function or void,
but we could not find such a definition.
3.1.3.2 Integer Constants:
According to the standard, an unsuffixed octal or hex
integer constant can be interpreted as either signed or
unsigned. Certain constants will be interpreted as signed
on some machines and unsigned on others, because of
differences in machine word size. Due to the drastic effect
an unsigned operand may have (e.g. in a comparison opera-
tion), there must be some way to ensure that a number is
taken as signed. We suggest an "s" suffix.
Since the sign is not part of the definition of a
constant, the "number" -32768 will be treated as a long
integer, even though it fits into 16 bits. This will
surprise programmers who use it as the smallest possible
short integer.
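The same effect can be observed on a machine with 32-bit ints,
where the analogous constant is -2147483648. The sketch below
assumes such a machine and a compiler following a later
revision of the language in which an oversized decimal
constant takes a wider signed type; the function names are
hypothetical. It also shows the usual way around the problem,
which is to spell the minimum as an expression, as <limits.h>
does for INT_MIN.

```c
#include <limits.h>

/* Returns 1 if the literal spelling of the most negative int
   has a type wider than int, 0 if it does not, and -1 where the
   32-bit assumption does not hold. */
int literal_min_is_wider_than_int(void)
{
#if INT_MAX == 2147483647
    /* The constant 2147483648 does not fit in int, so it takes a
       wider type before the unary minus is applied. */
    return sizeof(-2147483648) > sizeof(int);
#else
    return -1;
#endif
}

/* INT_MIN is required to have type int, so it is typically
   written as an expression such as (-INT_MAX - 1). */
int limits_min_is_int(void)
{
    return sizeof(INT_MIN) == sizeof(int);
}
```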
3.2.2.1 Arrays, functions, and pointers:
The second paragraph of this section states
Except when used as an operand that may or shall
be a function locator, an identifier declared as
"function returning type" is converted to an
expression that has type "pointer to function
returning type".
The way we read this, it appears that we can say something
like
extern int f();
(*f)();
Since the "*" operator may not take a function locator, the
function locator is regarded as a pointer, and therefore the
"*" operator accepts it. By applying recursion, it then
seems legal to say
(**f)()
(***f)()
(****f)()
and so on.
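A compiler that reads the paragraph this way accepts the
following sketch, in which every call reaches the same
function. The names "answer" and "call_with_stars" are
hypothetical.

```c
static int answer(void)
{
    return 42;
}

int call_with_stars(void)
{
    /* Each "*" yields a function locator, which is converted
       straight back to a pointer, so any number of stars is
       accepted and all three calls invoke answer(). */
    return (*answer)() + (****answer)() - answer();
}
```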
The Rationale should point out that function pointers
cannot be cast into other pointer types, and that the only
thing that can be assigned to a function pointer is a
pointer of the same type or (void *) 0.
We believe the document has a built-in assumption that
any pointer cast to (void *) yields a unique value. (If
this assumption is not true, a function like "memcpy" could
not work.) We think this assumption should be stated
explicitly.
3.3.2.2 Function Calls:
The second paragraph of the Semantics section should be
changed to the following:
If the postfix expression preceding the
parentheses in a function call consists solely of
an identifier, and if no declaration is in scope
for this identifier, the identifier is implicitly
declared exactly as if, in the innermost block
containing the function call, the declaration
extern int identifier;
appeared.
This prevents implicit declaration when the function call
has a form like
(f)() or
(*f)()
3.3.3.2 Address and Indirection Operators:
Consider an array declared with
int A[10];
By 3.3.3.4, we have
sizeof(A) == sizeof(int) * 10
We also have
(char *)(A+1) == (char *)A + sizeof(int)
What is the value of
(char *)( (&A) + 1)
Is it
( (char *)A ) + sizeof(int)
or
( (char *)A ) + 10 * sizeof(int)
3.3.3.2 implies the second (i.e. that &A is a pointer to an
array of 10 ints), but does not state it precisely.
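Under the second reading, the two distances can be measured
with a sketch like this one. The function names are
hypothetical, and the sketch assumes a compiler that accepts
"&" applied to an array and gives the result the type "pointer
to array of 10 ints".

```c
#include <stddef.h>

/* Distance in bytes covered by adding 1 to &A. */
size_t bytes_skipped_by_address_of_array(void)
{
    int A[10];

    /* &A points to the whole array, so adding 1 moves past all
       ten elements. */
    return (size_t)((char *)(&A + 1) - (char *)A);
}

/* Distance in bytes covered by adding 1 to A. */
size_t bytes_skipped_by_element(void)
{
    int A[10];

    /* A is converted to a pointer to its first element, so
       adding 1 moves past a single int. */
    return (size_t)((char *)(A + 1) - (char *)A);
}
```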
3.3.4 Cast Operators:
This section says that a pointer to type char has the
least strict alignment. It should also make some comment
saying that a pointer to void is the most general pointer
type, and therefore shares the least strict alignment with
char.
3.3.6 Additive Operators:
A very close reading of this section indicates that
arithmetic with (void *) pointers is illegal. However, the
point is very subtle and could easily be missed. We suggest
that it be emphasized. The same point should be made in
3.3.8 (on relational operators).
3.3.15 Conditional Expression:
The standard states that you can have expressions of
the form
i ? p : v
where "p" is a pointer type and "v" is a (void *). The
result of this expression is said to be a (void *).
It seems to us that this is the wrong way around.
Instead, the result of the expression should have the type
of the pointer "p". For example, consider
char *cp;
int *ip;
...
ip = cp ? cp : malloc(10);
Since the result of "malloc" is (void *), the result of the
right hand side of the assignment will be (void *). This
will be quietly assigned to "ip", even if the actual value
of the expression is "cp". To avoid such quiet problems,
the result should be the pointer type that is not (void *).
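The type that a compiler gives to such an expression can be
inspected with the sketch below. It assumes a much later
revision of the language (one providing the _Generic
selection), and the function name is hypothetical.

```c
/* Returns 1 if the conditional expression has type void *,
   2 if it has type char *, and 0 otherwise. */
int conditional_result_kind(void)
{
    char buf[1] = {0};
    char *cp = buf;
    void *vp = buf;

    /* The controlling expression is not evaluated; only its type
       is examined. */
    return _Generic((cp ? cp : vp), void *: 1, char *: 2, default: 0);
}
```

Under the rule described above the function returns 1; under
the rule suggested here it would return 2.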
3.3.16.1 Simple Assignment:
The standard must be more clear on assignments of
"pointers to functions". Suppose A and B are both pointers
to functions returning int but the functions have different
prototypes (or one function has a prototype and the other
doesn't). Is A=B legal? Guidelines for compatibility
between function pointers should be established. We believe
the guidelines should follow the rules for type equivalence
given in 3.5.5.
The standard says that assigning overlapping objects to
one another is undefined (and therefore illegal). While we
recognize that there are many instances when assigning
overlapping objects to one another cannot be done safely
(e.g. when objects are referenced with pointers), there are
some instances where we believe it is a mistake to say the
operation is illegal. In particular, many of our own C
programs use the operation
union {
float f;
int i;
} u;
...
u.f = u.i;
According to the standard, this operation will become
illegal.
We might point out the odd effect that
u.f = (float) u.i;
would still seem to be legal, even if the assignment without
the cast is not. The cast operation presumably takes the
value of "u.i", converts it, and stores it in some temporary
storage, so assigning it to "u.f" causes no overlap. If
this really is intended, the standard or the Rationale
should comment on it.
The difference between char, unsigned char, and signed
char must be discussed. If a program declares
char *p;
unsigned char *u;
signed char *s;
is it possible to make assignments like
p = u;
u = p;
u = s;
s = u;
p = s;
s = p;
This question arises because char may be signed in some
implementations and unsigned in others. As a result, some
of the above assignments will be valid on some machines but
not on others.
We suggest that the assignment rules be changed to
allow the (uncast) assignment of char to unsigned char and
vice versa. The same should apply to pointers to these
types.
Note that people writing portable programs will never
use the char type; they will use signed char when they are
using the value arithmetically and unsigned char when they
are using the character as a character. Using the plain
char type will be non-portable. However, this runs into
other problems. In particular, suppose someone writes
unsigned char a[] = "string";
unsigned char *cp;
...
cp = "abc";
These operations will work on a system where char is
unsigned, but not if char is signed. To make such
operations possible, it must be possible to intermix plain char
and unsigned char types in the ways shown above.
3.3.16.2 Compound Assignment:
According to the standard, an operation like
int i;
i /= 3.5;
would be performed using floating point division. However,
the Berkeley C compiler uses integer division. For this
reason, this should be marked as a quiet change.
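The two behaviors give different answers for some values, as
the sketch below illustrates. The second function is only a
hand-written imitation of the integer division described
above, not a copy of what any particular compiler generates,
and both function names are hypothetical.

```c
/* "i /= 3.5" as the standard describes it: the division is done
   in floating point and the quotient is then converted to int. */
int divide_as_double(int i)
{
    i /= 3.5;
    return i;
}

/* An imitation of the other behavior: the right operand is
   converted to int first, and the division is done on integers. */
int divide_as_int(int i)
{
    return i / (int)3.5;
}
```

Starting from 10, the first function yields 2 (10 divided by
3.5, truncated) and the second yields 3 (10 divided by 3).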
3.3.17 Comma Operator:
The standard states that the comma operator is a
sequence point, but it is not clear what point of the comma
operation is the sequence point. For example, consider the
expression
A = ((B=1),B) + ((B=2),B) + ((B=3),B);
What should A equal? (We note that the Berkeley C compiler
assigns the value 9 to A in the expression above.) Does the
sequence point take place at the comma (i.e. when only the
left half of the expression has been evaluated) or does it
take place when both sides of the comma have been evaluated?
Are there actually two sequence points? The same sort of
problem obviously occurs with
func( (b=1,b) , (b=2,b) );
In fact, the question generalizes. Several operators (e.g.
"&&", "||") are said to be sequence points, when the opera-
tion actually has several "points" to it. The standard
should be more explicit, e.g.
There is a sequence point after the evaluation of
the left operand.
or
There is a sequence point after the evaluation of
the result of the operator.
The second paragraph of 3.3 makes some effort to address
this problem, but it is too nebulous to be much help.
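As an illustration, the sketch below is well defined if the
first of the two wordings above is adopted, that is, if the
sequence point follows the evaluation of the left operand.
The function name is hypothetical. The three-term expression
shown earlier is deliberately not used, since its result also
depends on the order in which the operands of "+" are
evaluated.

```c
/* With a sequence point after the left operand, the assignment
   to b is complete before b is read on the right. */
int left_then_right(void)
{
    int b = 0;

    return (b = 1, b + 1);
}
```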
3.4 Constant Expressions:
We point out that if the "offsetof" macro is
implemented as suggested in the rationale, it will not be a
constant expression according to the rules of this section.
Since we like the suggested implementation, we suggest the
definition of constant expressions be modified. In
particular, the standard should say that the
implementation's behavior is undefined if a constant expression does
not comply with the given rules. This gives an
implementation the freedom to support an expanded definition of
constant expressions if desired.
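For reference, one widely seen way of writing such a macro is
the null-pointer idiom sketched below. Whether this matches
the Rationale's suggestion word for word is an assumption, and
the names MEMBER_OFFSET, "pair", and the two functions are
hypothetical. The idiom takes the address of a member through
a null pointer, which is exactly the kind of expression that
the rules for constant expressions do not cover.

```c
#include <stddef.h>

/* Hypothetical macro name.  The expression works on many
   implementations but is not guaranteed by the language. */
#define MEMBER_OFFSET(type, member) ((size_t)&((type *)0)->member)

struct pair {
    char first;
    double second;
};

/* Offset of "second" computed with the idiom. */
size_t offset_by_idiom(void)
{
    return MEMBER_OFFSET(struct pair, second);
}

/* Offset of "second" as reported by the implementation's own
   offsetof macro. */
size_t offset_by_standard(void)
{
    return offsetof(struct pair, second);
}
```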
3.5 Declarations:
For the sake of readability, it should not be legal to
enclose an entire declarator in parentheses, as in
int (x);
A function prototype containing such a declaration is very
deceptive. For example,
int f(int (x));
means that "f" has an integer parameter named "x"...unless
"x" happens to be the name of a type as declared in a
typedef statement, in which case the argument of "f" is a
function that takes an argument of type "x" and returns an
integer. Confusion can be avoided if such extraneous
parentheses are not allowed.
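The deceptive case can be made concrete with the sketch below,
in which "x" has been declared as a type. The names "apply",
"twice", and "demo" are hypothetical. The prototype and the
later definition are accepted as declaring the same function,
which shows that the parenthesized "x" was read as a parameter
list and not as a parameter name.

```c
typedef int x;

/* Because x names a type here, the parameter is "function
   taking an x and returning int", which is adjusted to a
   pointer to such a function. */
int apply(int (x));

static int twice(int v)
{
    return 2 * v;
}

/* This definition is compatible with the prototype above. */
int apply(int (*fn)(int))
{
    return fn(3);
}

int demo(void)
{
    return apply(twice);
}
```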
We were surprised that the standard allows storage
class specifiers to be intermixed with type specifiers, as
in
const int extern long a;
We were even more surprised to discover that the Berkeley C
compiler already supports such constructs. We don't really
understand why it is necessary to support this sort of thing
-- we would be surprised if any existing programs make use
of it. An implementation that accepts this kind of code has
a good deal of trouble generating comprehensible error
messages, since it cannot be so rigid in its approach to
parsing. All programmers will receive poorer diagnostic
messages in the interests of catering to the very few
programmers who would want to ignore very well-established
code-writing conventions.
3.5.1 Storage-Class Specifiers:
The semantic description of the register storage class
should be reworded to the following:
A declaration with storage-class specifier
register is an auto declaration with a suggestion
that the object will be frequently accessed, and
thus that the compiler should attempt to speed up
access to the object. One restriction applies to
an object declared with storage-class specifier
register: the unary "&" (address-of) operator
must not be applied to it. Since the program
cannot legitimately generate a pointer to an
object with the storage-class specifier register,
a frequently-used optimization is to keep the
object in fast storage which cannot be accessed
through a pointer, e.g. a hardware register.
By rephrasing the definition this way, you give the register
storage-class more meaning. In particular, you open the
door to compilers that perform global optimizations using
the fact that register variables can never have their values
changed by indirection through a pointer. The compiler can
optimize the use of register variables because it can always
know when the register values are used and changed.
As currently defined in the standard, register is an
all-or-nothing optimization. We feel that machines which
can't give "all" (due to a shortage of registers) shouldn't
be forced to give "nothing".
The standard might also make some statement on what
implementations should do if there are several register
declarations and only some of these can be used for
optimizations. We propose that the standard say that the
declarations which come lexically first will be optimized
first. This gives a programmer some way of indicating
preference of optimization.
3.5.2.1 Structure and Union Specifiers:
We suggest that the definition for "struct-declaration"
be changed to
struct-declaration:
type-specifier-list struct-declarator-list;
struct-or-union-specifier;
The added possibility lets you define an unnamed element of
this type. The sub-elements will appear as first-level ele-
ments in the enclosing structure. Using this scheme, the
example in 3.3.2.3 could become
struct {
int type;
union {
int intnode;
double doublenode;
};
} u;
/* ... */
u.type = 1;
u.doublenode = 3.14;
/* ... */
if (u.type == 1)
/* ... */ sin(u.doublenode) /* ... */
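The proposed form can be tried on any compiler that accepts
unnamed union members (C11 later adopted anonymous unions, and
some earlier compilers offered them as an extension). The sketch
below is only an illustration; the tag "node" and the helper
"node_value" are hypothetical names, not part of the proposal.

```c
/* Hypothetical tagged node; member names mirror the example above. */
struct node {
    int type;               /* 0 = int payload, 1 = double payload */
    union {                 /* unnamed member: its fields are reached */
        int intnode;        /* directly through the enclosing struct  */
        double doublenode;
    };
};

/* Return the payload as a double, whichever member is active. */
double node_value(const struct node *n)
{
    return n->type == 1 ? n->doublenode : (double)n->intnode;
}
```

The caller writes "u.doublenode" rather than naming an
intermediate union member, which is the convenience the change
is meant to provide.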
3.5.2.2 Structure and Union Tags:
The form
struct y;
now has a special meaning. Suppose we define
typedef struct y z;
Does the code
z;
have the same effect as
struct y;
We note that you can use pointers to structures without
having to define the structure itself. Do you ever have to
define a structure's contents in a particular source file?
3.5.2.3 Enumeration Types:
If we have
enum E1 { e1 } var;
enum E2 { e2 };
is it legal to say
var = e2;
The answer is almost certainly yes...but we would be happy
if we were allowed to give a warning or an error message for
the operation, if there is no explicit cast. Similarly, we
would like to give a warning for things like
var = e2 + 1;
3.5.2.4 const and volatile:
According to our reading of the standard, the following
code is illegal.
f1() {
extern const x;
...
}
f2() {
extern x;
...
}
int x;
On the other hand, it would be very convenient if one
function could declare an object "const" while another did not.
This would let a function indicate when it did not intend to
change the value of an external object, and thereby allow
local optimizations. The actual definition of the object
would establish whether or not the object really was "const"
(and therefore suitable for allocation in read-only memory).
The same principle would hold for "volatile".
3.5.3.3 Function Declarators:
The last sentence of the Semantics section reads
If the list is empty in a function declaration
that is part of a function definition, the func-
tion has no parameters.
What does this say about a function definition like
int (*F(int a))() {...
Since the empty identifier list appears as part of a func-
tion definition, the function pointed to by F's return value
takes no arguments. This rules out returning a pointer to
an arbitrary integer function.
3.5.5 Type Definitions and Type Equivalence:
The standard should discuss structs that have the same
tag but different internal structures.
We also have some questions about the situation where a
"typedef" declares a named type with the same name as a
variable defined in an enclosing scope. Inside the scope of
the named type, is the variable completely invisible? Or
can the variable be visible in contexts where the compiler
can clearly determine that the named type is not valid?
As another questionable construction, consider the code
typedef int X;
typedef X *Y;
f(void)
{
typedef char X;
Y b;
The definition of X inside the function clearly supersedes
the external definition of X. However, it is not clear if
"b" is a pointer to an integer (using the definition of X at
the time Y was defined) or a pointer to a character (as X
was defined at the time "b" was declared).
An even more subtle situation is
struct X { /* definition 1 */ };
typedef struct X *Y;
...
f(void)
{
struct X { /* new definition */ };
Y Z;
...
Is Z a pointer to the old X structure or the new one? We
believe that most people would expect Z to be a pointer to
the old X structure. However, a strict reading of the
definition of "typedef" suggests otherwise. The standard says
that a typedef type is not a new type; it is a name for a
type that could be defined in another way. Since a pointer
to the old X structure could _not_ be defined in another way
after the declaration of the new X structure, the typedef
type Y would have to refer to the new structure. To avoid
such hair-splitting, the standard should state precisely
what happens in such a case.
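One way to see how a particular compiler resolves both of these
questions is to measure the size of the pointed-to object. The
following is a sketch only; the helper names, the tag "S" (used
in place of the second "X" to keep the two cases apart), and the
constant "OUTER_S_SIZE" are hypothetical. Under the reading that
a typedef name keeps the type it named when it was declared,
both helpers report the sizes of the outer definitions.

```c
#include <stddef.h>

typedef int X;
typedef X *Y;                       /* declared while X means int */

struct S { int a; };                /* definition 1 */
typedef struct S *SP;
enum { OUTER_S_SIZE = sizeof(struct S) };

/* Size of what a Y points to, measured after X is redefined. */
size_t y_pointee_size(void)
{
    typedef char X;                 /* hides the outer X */
    X c = 'a';
    Y b = 0;
    (void)c;
    return sizeof *b;
}

/* Size of what an SP points to, measured after the tag is reused. */
size_t sp_pointee_size(void)
{
    struct S { char pad[64]; };     /* new definition hides the tag */
    SP z = 0;
    return sizeof *z;
}
```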
3.5.6 Initialization:
Is the following initialization legal?
int f(int a)
{
const int b = a*2;
Consider
struct X {
int a,b;
};
f() {
struct X Z;
int junk = (Z.a=1,Z.b=2,7);
int more_declarations;
Is this allowed? Can we initialize an "auto" structure in
this way? Can we use the side effects in an initializer to
initialize another object? (We note that the designers of
the standard ruled out the use of non-constant expressions
to initialize auto aggregates precisely because of the
problem of side effects. The above example shows that side
effects are still possible.)
Can auto initializers make use of external variables
with the same name as the symbol being initialized? For
example, is the following valid?
int i = 1;
f() {
char i = i * 2;
...
This sort of construction is allowed and used in Berkeley C
code.
It appears that the standard says that the following is
legal.
int i = {{{{{10}}}}};
Is this really intended?
The paragraph beginning at line 542 (about initializa-
tion of subaggregates inside aggregates) is very confusing.
At the very least, it should be reworded to be more clear.
We also believe that it might not say what you mean it to
say, but it's too hard to construe for us to be sure. For
example, how is the following interpreted?
int a[4][5][6] =
{
{ 1, 2 },
{ 3, 4, 5 },
{ 6, 7, 8, 9 }
};
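For readers who want to check what a given compiler does with
this declaration, here is a small probe. It is a sketch, and the
helper "nonzero_in_plane" is a hypothetical name. Under the
brace-matching rules as C compilers apply them today, each inner
brace list initializes one a[i], and its unbraced values fill
a[i][0][0], a[i][0][1], and so on, with everything else zero.

```c
/* The declaration from the text, unchanged. */
int a[4][5][6] =
{
    { 1, 2 },
    { 3, 4, 5 },
    { 6, 7, 8, 9 }
};

/* Count the nonzero elements stored in plane a[i]. */
int nonzero_in_plane(int i)
{
    int j, k, n = 0;

    for (j = 0; j < 5; j++)
        for (k = 0; k < 6; k++)
            if (a[i][j][k] != 0)
                n++;
    return n;
}
```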
3.6.4.2 The switch Statement:
The Rationale says that ranges in case labels were
rejected because many current compilers would generate
excessive amounts of code. This does not seem to be a good
reason for rejecting something that could be quite useful.
Making a compiler generate the equivalent "if" code for a
switch range is trivial compared with (for example)
requiring both signed and unsigned characters. This is
indeed a minor extension.
It is our belief that a compiler is usually able to
produce better code for case ranges than a programmer trying
to do it by hand using "if" statements. Therefore efficiency
is actually improved by supporting case ranges.
If the committee does not want to sanctify case ranges as
part of the standard, it should still recognize that case
ranges are likely to be common extensions to the language
and should be listed in Section 5.6.4. More to the point,
the committee should develop some syntax for case labels so
that implementations that want to offer the extension can do
so in a consistent way.
The ".." notation mentioned in the Rationale is not
acceptable because of the tokenizing rules: 1..3 will be
interpreted as the two floating point numbers 1.0 and 0.3.
We would suggest using the tilde as the case range
separator, as in
case 1~10: ...
This does not introduce a new operator and yet it is easy to
parse because tilde has no binary meaning. It also looks
good (i.e. the visual appearance suggests its meaning).
To improve "switch" statements even more, we would
recommend provisions for "open-ended" case ranges as well,
of the form
case >n:
case >=n:
case <n:
case <=n:
These avoid the example
case 0..65535:
given in the Rationale, since the case just becomes
case >=0:
Moreover, _this_ form is completely portable, since it's
independent of the number of bits in the switch variable.
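Since the tilde and open-ended forms are only proposals, they
cannot be compiled as written. The sketch below shows the
hand-written "if" chain that a programmer must use in their
place, with the proposed labels noted in comments. The function
name "classify" and its return codes are hypothetical.

```c
/* Hand-written equivalent of a switch using the proposed labels. */
int classify(int n)
{
    if (n >= 1 && n <= 10)
        return 1;           /* proposed: case 1~10: */
    else if (n > 10)
        return 2;           /* proposed: case >10:  */
    else
        return 0;           /* proposed: case <=0:  */
}
```

Nothing in this chain depends on the width of "n", which is the
portability point made above for the open-ended forms.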
3.7.1 Function Definitions:
The standard seems to allow extraneous declarations in
a function heading, as in
f(a,b,c)
int a;
typedef struct X ...;
int b;
int c;
{ ...
Was this the intention? It strikes us as a poor idea.
Also note that the UNIX C compiler currently allows the
form
int (*f())(a,b,c) {...
in function definitions, but the standard will require
int (*f(a,b,c))() {...
We believe this is a quiet change.
3.8.1 Conditional Inclusion:
The directive #elif should be renamed to the more
mnemonic #elseif. As an alternative, the compiler might
recognize the following.
#else if
#else ifdef
#else ifndef
If an undefined identifier appears in an #if expression,
_and if its value was needed_, an error should be given.
Thus if A is not defined,
#if A
gives an error. However, if B is defined and non-zero
#if (B||A)
does not give an error, because the value of A is
irrelevant.
Note that the confusion of an undefined symbol meaning
"zero" does not arise from something simple like
#if UNDEF_SYMBOL
but from code like
#define X (5*y)
int y = 0;
...
printf("%d ",X);
#if X
printf("is non-zero");
#else
printf("is zero");
#endif
In the #if directive, the X is replaced with "(5*y)". The
preprocessor then checks to see if "y" is a #defined symbol.
It isn't, so it turns into zero and the final result of the
#if condition is zero, even though X itself was defined. If
the definition of X is changed to some different expression
(e.g. a simple constant), the #if condition suddenly becomes
true.
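The effect can be reproduced with a compiler that follows the
"undefined means zero" rule. This is a minimal sketch; the macro
"X_IN_IF" and the function "x_at_run_time" are hypothetical
names introduced to record what each stage sees.

```c
#define X (5*y)
static int y = 3;

/* Inside #if, X expands to (5*y); y is not a macro, so it is
 * replaced by 0 and the whole condition evaluates to 0. */
#if X
#define X_IN_IF 1
#else
#define X_IN_IF 0
#endif

/* At run time the same X uses the real variable y. */
int x_at_run_time(void)
{
    return X;
}
```

The preprocessor and the running program therefore disagree
about whether X is zero, with no diagnostic required.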
If the user really wants to assume "undefined" means
zero, he or she should write
#if defined(A)&&A
or
#ifndef A
#define A 0
#endif
3.8.3 Macro Replacement:
What happens if a macro with parameters is invoked with
the wrong number of parameters or no parameters? Is it an
error, or is the text preserved?
The fourth paragraph on page 80 (lines 7 through 12) is
very difficult to understand. An example would certainly
help clarify what it is trying to say.
Section 4: General Notes:
It should be stated that if a program mixes macro and
non-macro invocations of the same library functions, the
results are unpredictable. For example, characters written
with the "putchar" macro and the "putchar" function
intermixed may come out in the wrong order (or perhaps won't
come out at all).
Many functions return pointers to values that are
created by the system. For example, "strerror" returns a
pointer to a string that the system sets up. Are such
values placed in static storage areas or in memory that has
been dynamically allocated (by "malloc")? The answer to
this question must be stated exactly in each case to make
for uniformity across systems. The difference is important,
since static storage makes a function dangerous to use in
exception handlers. In addition, storage allocated through
"malloc" can be freed if it is no longer needed, while
static storage cannot be.
4.1.1 Terms and Common Definitions:
The file <stddef.h> should be required for stand-alone
operation. However, it should not mention the "errno"
value. All the other contents of <stddef.h> are
characteristics of the hardware and the implementation. The
"errno" value is related to the library and should have its
own header <errno.h>.
The standard implies that headers which need symbol
definitions that are "officially" in other headers will
redefine the symbols. For example, <stdlib.h> needs to use
"size_t" in function prototypes, so it will include its own
definition of "size_t". We feel that it makes more sense
for <stdlib.h> to explicitly #include the header that
defines "size_t" rather than giving its own definition.
Multiple definitions of the same symbol always mean trouble.
A similar problem is raised with the functions "strtod"
and "strtol". Their definition implies that including
<stdlib.h> is all you have to do to use the functions.
However, the user may also need to use the symbols HUGE_VAL,
ERANGE, LONG_MAX, LONG_MIN, and "errno". Should the
<stdlib.h> file make these available (by defining the values
directly or including the appropriate header files) or
should the user have to include the appropriate headers
explicitly? The standard should answer this question.
4.1.2 Headers:
The first paragraph contains the sentence
If the program redefines a reserved external
identifier, even with a semantically equivalent
form, the behavior is implementation-defined.
The term "implementation-defined" should be changed to
"undefined". By definition, "implementation-defined" refers
to behavior of a correct program construct. We believe it
is too broad-sweeping to say that redefinition of a reserved
external identifier should always be allowed; therefore,
"undefined" is the better term, giving implementations the
choice to accept or not accept the construct. Also,
"implementation-defined" implies that the implementation
must document how it behaves. The ramifications of
redefining a library symbol may be too unpredictable to
document.
4.3.1.9 The isspace Function:
The "isspace" function should also test for the line-
feed character if it is not identical with the new-line.
4.5.4.6 The modf Function:
We feel "modf" should behave in the same way as float
to integer conversions. This means that "*iptr" should have
the same value as
(double)(long) value
when this operation does not cause an overflow. This
definition is more consistent in the (-1,0) range than the
definition proposed in the standard. Even when the
"(double)(long)" conversion would cause an overflow, "modf"
should still behave as if it is performing this sort of
conversion, in the interests of consistency.
If the integer part of "value" is exactly equal to the
most negative long integer, a problem arises. The
"(double)(long)" approach is likely to give one lower than
the most negative integer. The "modf" code should recognize
this problem and issue an EDOM error in such cases.
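The behavior we have in mind can be sketched as follows. The
function "modf_via_long" is a hypothetical stand-in, not a
library routine, and it is only meaningful while "value" fits in
a long; the overflow and EDOM handling discussed above is
deliberately left out.

```c
/* Hypothetical stand-in for the proposed behavior; valid only
 * while value is within the range of long. */
double modf_via_long(double value, double *iptr)
{
    *iptr = (double)(long)value;   /* conversion discards the fraction */
    return value - *iptr;          /* fractional part keeps value's sign */
}
```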
4.5.6.5 The fmod Function:
"fmod" should follow the same principle as "modf". In
the expression
x == i*y + f
the sign of "f" should be such that
i == (long) (x/y)
Alternatively, you might declare that "f" is always posi-
tive. Either alternative is better than declaring that "f"
has the same sign as "x".
4.7 Signal Handling:
Does the SIGABRT signal catch other abnormal termina-
tions besides one raised by "raise" or "abort"? We believe
it should not.
4.7.2.1 The raise Function:
Must "raise" be able to generate _every_ valid signal, or
is the implementation allowed to restrict the sort of
signals that "raise" can send? Is it allowed to issue more
than the standard signals?
4.8.1 Variable Argument List Access Macros:
The standard does not explain why these routines should
be implemented as macros. We realize that the reason is
that the parameters aren't necessarily expressions, but the
standard should say this; otherwise, it just sounds like a
petty rule.
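A short example makes the reason visible. In the sketch below
(the function "sum_ints" is a hypothetical name), the second
argument of "va_arg" is a type name, which no ordinary function
could accept as a parameter.

```c
#include <stdarg.h>

/* Sum `count` int arguments. */
int sum_ints(int count, ...)
{
    va_list ap;
    int total = 0;

    va_start(ap, count);            /* names the last fixed parameter */
    while (count-- > 0)
        total += va_arg(ap, int);   /* second argument is a type name */
    va_end(ap);
    return total;
}
```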
4.9.1 I/O Introduction:
What happens if the BUFSIZ default value depends on the
type of device that is connected to the I/O stream? Making
this a fixed constant may be inadvisable.
4.9.2 Streams:
The sentence beginning at line 59 should read
Data read in from a text stream will not
necessarily compare equal to the data that were
earlier written out to that stream, unless the
data consist only of complete _non-null_ lines,
_with no trailing blanks_, and composed only of printable
characters and the control characters horizontal
tab, new-line, vertical tab, and form feed.
Also, we do not know why the backspace was excluded
from the set of characters that could be safely written and
read on a text stream.
The committee obviously believes that binary files will
map into some machine-dependent idea of what a binary file
is. This is not necessarily so. For example, it is not
obvious how to map the binary file concept into a record-
based file system. Such systems can have random access to
records, but if records do not have a fixed length, there is
no simple relationship between the UNIX concept of random
access and the file system's.
The committee says that the contents of a binary file
stream will be exactly what is written with an
implementation-defined number of NUL characters appended.
This is a curious change on existing UNIX file system
concepts. One of the most important principles of binary
file streams on UNIX is that you can write a file, then read
it and get back _exactly_ what was written. The addition of
extra NUL characters violates this principle.
Evidently, the designers allowed the extra NUL
characters in order to accommodate systems that might need
to pad files out to a certain length. However, it is not
clear that the freedom to add NUL characters is sufficient
to satisfy arbitrary file system requirements. The file
system may be just as upset at extra NUL characters as it
would be with data that was not padded to some appropriate
boundary. For this reason, we feel that the standard should
simply state that reading from a binary file stream gives
precisely what was written to the file stream, and leave it
up to the implementation to figure out how to provide such a
service.
It is not the business of a portable standard to
describe how to perform non-portable operations. In
particular, we believe it is a mistake to encourage the use
of binary streams when creating files in system-specific
formats. A program that builds formatted files in a byte-
by-byte manner will certainly not be portable to systems
that use different file formats. If someone does try to
port such a program, it is better for the program to fail in
a very obvious way than to write out a distorted version of
some other system's file format. If an implementation
believes users will need to create certain kinds of system-
specific files, the implementation should provide its own
routines to accomplish such tasks.
4.9.6.1 The fprintf Function:
The description of the "%f" specifier says that the
output should have six decimal places (if there is no preci-
sion field) and that the number should be widened to the
appropriate number of digits. Since the IEEE floating point
standards indicate that floating point numbers may be as
great as 10**308, the standard may result in widening a
floating point number to as many as 314 (308+6) digits. We
recommend that implementations be allowed to use scientific
notation ("%e" format) in cases where the other approach
would widen the value beyond the maximum possible number of
significant digits. This would probably require the defini-
tion of a macro in <float.h> to indicate the maximum number
of significant digits.
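The width in question is easy to measure on an implementation
with IEEE doubles. The sketch below assumes "snprintf", which is
not in the draft under discussion (it appeared in later library
versions), and the helper name "width_of_percent_f" is
hypothetical.

```c
#include <stdio.h>

/* Number of characters "%f" produces for value, or -1 if the
 * result does not fit in the local buffer. */
int width_of_percent_f(double value)
{
    char buf[512];
    int n = snprintf(buf, sizeof buf, "%f", value);

    return (n >= 0 && (size_t)n < sizeof buf) ? n : -1;
}
```

Calling it with a value near the top of the double range, such
as 1e308, shows output several hundred characters wide, while
1.5 produces the eight characters of "1.500000".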
The standard explicitly states that the "#" qualifier
has no effect on "%s". We see no reason why this is
necessary. In fact, we believe that a natural interpreta-
tion of "%#s" would be to print out a string using escape
sequences for non-printable characters. While this behavior
need not be required by the standard, we don't see why it
should be explicitly ruled out when it would clearly be a
useful facility. The same point applies to "%#c". All
things being considered, it would be easier to say that the
use of "#" in "%c", "%d", "%i", "%s", and "%u" is
implementation-defined.
The Environmental Limit section reads
The minimum value for the number of characters
produced by any single conversion shall be at
least 509.
Obviously, what you really mean is
Implementations may place a maximum on the number
of characters produced by any single conversion,
but this maximum cannot be less than 509.
It seems perverse that "long double" conversion
specifiers must use an upper case 'L' while "long" ones must
use lower case. It is more sensible to allow either upper
or lower case in both instances.
4.9.6.2 The fscanf Function:
The last sentence of the first paragraph seems
redundant. The excess arguments will obviously be evaluated
before they are passed to "fscanf". What you mean to say is
that no error occurs if too many arguments are specified,
but the excess arguments are ignored.
It seems odd that "fscanf" returns EOF if input items
cannot be read. EOF is conceptually a special character
value (though of course, it is an integer). Since "fscanf"
returns an integer in all other cases, it would make more
sense for "fscanf" to return -1.
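The three kinds of return value can be seen with "sscanf", which
follows the same rules as "fscanf" but reads from a string. This
is an illustrative sketch; the wrapper "scan_one_int" is a
hypothetical name.

```c
#include <stdio.h>

/* Try to read one int from text; returns whatever sscanf returns:
 * 1 on success, 0 on a matching failure, EOF when the input runs
 * out before any conversion. */
int scan_one_int(const char *text, int *out)
{
    return sscanf(text, "%d", out);
}
```

A caller must therefore compare the result against both a count
and the EOF macro, which is the mixture of concepts we object to.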
4.9.6.7-9 vfprintf, vprintf, vsprintf:
The Rationale states that a format for variable-length
argument lists was rejected because the functions
"vfprintf", etc. were "more controlled". This comment
confuses us, because we don't understand what "more
controlled" means. Very clearly, the "vfprintf" approach
offers less freedom and therefore is less useful.
We suggest that "printf" and friends obtain a new
specifier "%v", which accepts two arguments: a new format
string and a "va_list" of items to format. This is similar
to the existing "%r" construct on UNIX systems.
Given the "%v" specifier, writing functions to perform
the work of "vprintf" and friends is trivial. However, the
opposite is _not_ true -- "vprintf" and friends have
significant difficulty in simulating many of the results
that are possible with "%v".
The "%v" approach is simply faster, more readable, and
more versatile than using "vprintf" and friends. For
example, a call to "printf" could take several normal argu-
ments, followed by a "va_list" argument pointing to a
variable list, followed by more normal arguments. This
avoids the problem of having to make three calls, one for
the normal arguments, one for the variable list, and one for
the remaining normal arguments.
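The multiple-call pattern looks like this in practice. The
sketch is hypothetical throughout: "format_tagged" is an invented
helper, and it uses the bounded "snprintf" and "vsnprintf"
variants (not part of the draft under discussion) so that it can
be run safely. One call formats the ordinary argument, and a
second call is needed for the variable list.

```c
#include <stdarg.h>
#include <stdio.h>

/* Hypothetical helper: write "[tag] " followed by the caller's
 * formatted message into buf. Returns the length written, or -1
 * if the result does not fit. */
int format_tagged(char *buf, size_t size, const char *tag,
                  const char *fmt, ...)
{
    va_list ap;
    int used, n;

    used = snprintf(buf, size, "[%s] ", tag);    /* normal argument */
    if (used < 0 || (size_t)used >= size)
        return -1;

    va_start(ap, fmt);
    n = vsnprintf(buf + used, size - used, fmt, ap);  /* variable list */
    va_end(ap);
    if (n < 0 || (size_t)n >= size - used)
        return -1;
    return used + n;
}
```

With a "%v" specifier the two formatting calls would collapse
into a single call whose format string contains both parts.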
4.9.10.2 The feof Function:
The semantics of the EOF "indicator" are based on the
UNIX stream I/O implementation. Not all systems treat end-
of-file in this manner, so we suggest adopting the following
simple and consistent rule:
"feof" should return TRUE if and only if the next
"getchar" will return EOF and the most recent
"getchar" also returned EOF.
(The second part of the provision is needed to avoid
Pascal's problem of having to read ahead.)
Thus "fseek" should _not_ clear the EOF indicator;
instead, it should re-evaluate it. After a call like
ungetc(non_EOF_character);
"feof" should return FALSE.
If a program reaches end-of-file, then another program
grows the file, it should be possible to continue reading
without explicitly clearing the EOF indicator.
4.10.1.4 The strtod Function:
What do "strtod" and related functions assign to
"*endptr" if there is a range error?
4.10.3 Memory Management Functions:
The standard states that pointer values returned by
"malloc" et al may be assigned to a pointer to any type of
object, then used to access such an object in the space
allocated. We suggest that this be changed to read "may be
assigned to a pointer to any type of object _whose size is
less than the amount of memory requested_". This allows
greater efficiency of memory allocation, especially on
machines that have a high alignment requirement for some
data types. For example, some machines require 32-byte
alignment for their highest precision floating point, but it
is silly to hand out memory in 32 byte chunks when the user
only requests a few bytes.
It would also be useful to have a library func-
tion/macro similar to "malloc" that would take both a length
and an alignment as arguments. This would allow for finer
allocation of memory, to shorter alignment boundaries.
In order to make such a function/macro useful in
portable programs, an "alignof" operator would be very
convenient. This operator would behave in much the same way
as "sizeof": it would return an integral value indicating the
alignment of a type or object. For example, if a machine
has words containing four bytes and a particular type must
be aligned on a word boundary, the result of "alignof" would
be 4 (indicating four-byte alignment). The actual type of
the result of "alignof" would be implementation-defined like
"size_t".
Note that "alignof" would allow programs to write their
own efficient portable memory allocators. Memory could be
"nibbled" away in alignments suitable to whatever data
object needed the storage. It would not be necessary to get
the largest possible alignment for _every_ object.
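A minimal sketch of such a "nibbling" allocator follows. It
assumes a compiler with the C11 "_Alignof" and "_Alignas"
keywords, which stand in here for the proposed operator; the
arena size and the names "arena", "arena_used", and
"arena_alloc" are hypothetical.

```c
#include <stddef.h>

/* Hypothetical fixed arena, aligned for any object type. */
static _Alignas(max_align_t) unsigned char arena[256];
static size_t arena_used = 0;

/* Hand out `size` bytes aligned to `align` (a power of two),
 * or a null pointer when the arena is exhausted. */
void *arena_alloc(size_t size, size_t align)
{
    size_t start = (arena_used + align - 1) & ~(align - 1);

    if (start > sizeof arena || size > sizeof arena - start)
        return 0;
    arena_used = start + size;
    return arena + start;
}
```

A request such as arena_alloc(1, _Alignof(char)) consumes one
byte, and only a later request for a more strictly aligned type
pays for padding.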
4.10.4.3 The getenv Function:
The description of this function should read as
follows.
The "getenv" function searches an _environment
list_, provided by the host environment, for an
entry identified by the string pointed to by
"name". The set of environment names and the
method for altering the environment list are
implementation-defined.
The "getenv" function returns a pointer to a
string containing the value associated with the
given name.
Our point is that the
name=value
format is strictly a UNIX concept and need not be grafted
onto other techniques for handling environment variables.
The standard should decide whether the returned value
is stored in a static storage area or in storage obtained
through "malloc".
4.10.4.4 The onexit Function:
Why isn't "onexit" defined as
int onexit(void (*f)(void));
This simplifies the definition considerably.
4.10.4.5 The system Function:
The explanation of "system" should be expanded to make
it more clear that passing a null pointer is a query about
the existence of a command processor.
4.10.6.2 The div Function:
We certainly recognize the need to implement a well-
specified integer division and remainder operation, but we
do not believe the given "div" function suits the need.
First, "div" is an inappropriate name for a function
that performs both a division and a remainder operation. In
fact, we believe that the function should _not_ perform both
operations. Instead, you should have
int _div(int numer,int denom);
int _rem(int numer,int denom);
This approach has several advantages.
(a) You do not have the overhead of calculating the
remainder when you want the quotient, and vice versa.
While it is true that many machines generate a quotient
and remainder simultaneously, this practice is far from
universal. VAX machines, for example, can only perform
division. To calculate A%B, the machine must make the
calculation A-(B*(A/B)). It is expensive to calculate
this number when it may not even be needed.
(b) On some machines, the two functions could be
implemented as macros. With a single function
returning a structure, macros could never be used, even
if the hardware did the division and remainder opera-
tions in the prescribed manner.
We also note that the operation prescribed by the
standard's "div" function is the less useful of the two
alternatives. In our experience, the operation that you
usually want to perform is the one that always gives a posi-
tive remainder. For example, it is much more common to want
(-2)/3 to have a quotient of -1 and a remainder of +1 than
to have a quotient of 0 and a remainder of -1. You almost
always want to move negative quotients towards negative
infinity, not towards zero.
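The operation we have in mind can be sketched as a pair of
separate functions. The names "floor_div" and "floor_rem" are
hypothetical (chosen to avoid the leading underscore, which is
reserved for implementations), and the code makes no assumption
about which way the built-in "/" rounds for negative operands.

```c
/* Quotient rounded toward negative infinity. */
int floor_div(int numer, int denom)
{
    int q = numer / denom;
    int r = numer % denom;

    if (r != 0 && ((r < 0) != (denom < 0)))
        q--;                /* built-in result was rounded toward zero */
    return q;
}

/* Matching remainder; it takes the sign of denom. */
int floor_rem(int numer, int denom)
{
    int r = numer % denom;

    if (r != 0 && ((r < 0) != (denom < 0)))
        r += denom;
    return r;
}
```

For the case discussed above, floor_div(-2, 3) is -1 and
floor_rem(-2, 3) is +1.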
4.11.3.2 The strncat Function:
It seems odd that "strncat" always adds a trailing '\0'
but "strncpy" does not.
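In practice the difference forces callers of "strncpy" to add
the terminator themselves. A minimal sketch of the usual
workaround follows; the function name "copy_terminated" is
hypothetical.

```c
#include <string.h>

/* Bounded copy that always terminates, unlike a bare strncpy:
 * copies at most size-1 characters and then appends '\0'. */
char *copy_terminated(char *dst, const char *src, size_t size)
{
    if (size == 0)
        return dst;
    strncpy(dst, src, size - 1);   /* may leave dst unterminated */
    dst[size - 1] = '\0';          /* so terminate explicitly */
    return dst;
}
```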
4.11.4 Comparison Functions:
In the interests of portability, we believe that
character comparisons for "memcmp", "strcmp", and "strncmp"
should be made using "unsigned char" instead of the
implementation-defined approach specified in the standard.
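The comparison we are asking for can be sketched as below. The
function "compare_unsigned" is a hypothetical name; it gives the
same ordering whether plain "char" is signed or unsigned on the
host machine.

```c
/* Hypothetical strcmp-like comparison that always treats
 * characters as unsigned char. */
int compare_unsigned(const char *s1, const char *s2)
{
    const unsigned char *p = (const unsigned char *)s1;
    const unsigned char *q = (const unsigned char *)s2;

    while (*p != '\0' && *p == *q) {
        p++;
        q++;
    }
    return (int)*p - (int)*q;
}
```

With this rule a character with the high bit set always sorts
after the 7-bit characters, on every implementation.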
4.11.5.6 The strspn Function:
For greater uniformity, the name of this function
should be changed to "strpspn". This emphasizes the way it
parallels "strpbrk".
4.11.6.2 The strerror Function:
The standard should be more explicit about the connec-
tion between the "errnum" argument for "strerror" and the
possible values of "errno".
4.12.1 Components of Time:
Again, we wonder why vowels have fallen into disrepute.
CLK_TCK could easily be named _CLK_TICK or _CLOCK_TICK.
It should be explicitly stated that values of type
"time_t" may not represent time in meaningful units and may
not even give values that are uniformly distributed.
4.12.2.1 The clock Function:
"clock" is a poor name for a function that returns
processor time. A name like "processor_time" would be
better.
The description of "clock" says it returns processor
time used since some point in time related only to program
invocation. We believe that it should instead return
processor time accumulated since some previous point in
time, e.g. the time when the user logged on. To time a
particular program, the user would make two calls to
"clock": one at the beginning of execution and one at the
end (or whenever a time check is required).
The reason for our suggestion is that many non-UNIX
systems have no system call to get per-process timings.
Instead, many just keep track of total session time. If
implementations are forced to support "clock" as it is now
described, many implementations will have to put "time
check" code into the set-up routine for every C program.
This seems very inefficient, especially because "clock" is
not the sort of function that will be used frequently.
If a program calls another process using the "system"
function, it may be more efficient on some systems for the
processor time of the child process to be included in the
parent's time, while on other systems it is more efficient
not to include the child's CPU time. Thus, this behavior
should be implementation-defined.
4.12.2.4 The time Function:
The standard states that "time" returns
((time_t)-1)
if the current time is not available. However, -1 may well
be a valid time value on many systems.
If you are going to select a reserved value
arbitrarily, choosing 0 makes more sense, since it allows
tests of the form
if (time(p)) ...
A better solution would be to create a macro named
_TIME_UNAVAILABLE with
#define _TIME_UNAVAILABLE ( (time_t) X )
where X is some implementation-defined value. "time" would
return this value if the time was undefined.
4.12.3 Time Manipulation Functions:
It has always been a nuisance to get the current time
of day in string format because you must declare your own
variable of type "time_t". The library needs a function
that behaves like "ctime" but which is declared with
char *timefunc(time_t timer);
We could then use
timefunc( time( (time_t *) 0 ) )
to get the current time-of-day string.
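Such a function is a one-line wrapper around "ctime", as the
following sketch shows. "timefunc" is only the placeholder name
used above, and the result points at the same static buffer
that "ctime" uses.

```c
#include <time.h>

/* Sketch of the proposed function: like ctime, but it takes the
 * time value itself rather than a pointer to it. */
char *timefunc(time_t timer)
{
    return ctime(&timer);
}
```

Note that "time" takes a pointer argument, so a call that
discards the stored copy passes a null pointer of type
"time_t *".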
Summary:
In order to avoid a deluge of reserved words, all newly
introduced symbols should follow a simple rule, e.g.
beginning with an underscore. Ambiguities in the definitions
of structures, unions, and "typedef" constructs should
be clarified or eliminated.
If you have any questions or comments about any of the
material in this document, please contact Peter Fraser,
manager of the Software Development Group, at (519)
888-4546.