Comments on proposed C standard

Jim Gardner jagardner at watmath.UUCP
Fri Aug 22 04:14:22 AEST 1986


The following (huge) document comments on the latest proposal
for a C standard.  It is paginated, but does not contain tabs.





                        COMMENTS ON
                THE DRAFT PROPOSED STANDARD
                   (Dated July 21, 1986)


                      - prepared by -

               The Software Development Group
                   University of Waterloo
                     Waterloo, Ontario

     Our comments are based on the Draft Proposed American
National Standard for Information Systems -- Programming
Language C, Doc.No. X3J11/86-104, dated July 21, 1986.  In
addition, we make a number of comments on the Rationale for
the standard, Doc.No. X3J11/86-099, dated July 7, 1986.

     This  document supersedes previous submissions from the
Software Development Group, which were submitted in  comment
on previous drafts of the standard.

     Generally,  we  will use the same order of presentation
as the standard itself.  Our section headings correspond  to
the  appropriate sections in the standard.  However, we will
begin with some general observations.

General Note 1: Reserved Words:

     The standard defines  many  new  symbols,  particularly
#defined  names  in  header  files.   These  are effectively
reserved words, since programs  that  use  the  symbols  for
other  things  will  get  into  trouble sooner or later.  We
count a total of 255 effectively reserved words:

           32 language keywords
           44 implementation limits
          179 library-related names

In contrast, Cobol only has 227 reserved words!

     To avoid a  jungle  of  symbols  that  are  effectively
reserved,  we strongly urge that the committee follow one of
its own principles: symbols that begin  with  an  underscore
are  not  for  the  programmer's use.  This is a simple rule
that gets around most of the pitfalls.  *All newly introduced
symbols should begin with an underscore.*  This means that
the symbols in <limits.h> should be

          _CHAR_BIT
          _CHAR_MAX
          _SCHAR_MAX

University of Waterloo                          August, 1986


          /* etc. */

The same holds for other new symbols: "size_t" should become
"_size_t",  "ptrdiff_t"  should  become "_ptrdiff_t", and so
on.  Of course, the old stand-bys like NULL and "errno" will
stay as they are, even if we might wish differently.

     Note  that  we  would  recommend  against reserving the
prefixes to_, SIG, str, mem, and is.  This sort of rule
would  prevent implementations from supporting common opera-
tions that aren't in the standard.  For example, it actively
rules  out  "isascii"  and  "isodigit"  since  these are not
recognized by the standard.  This will break a great deal of
code.  Besides, the fewer "reserved word" rules a programmer
has to remember, the better.

General Note 2: Portability:

     In Section 1.2, the designers state the principle "Make
it fast, even if it is not guaranteed to be portable."  We do
not argue with this principle in general, but  we  think  it
should  be  counterbalanced  by  other considerations.  When
there are only a  few  popular  alternative  behaviors,  the
standard should provide both a *fast* operation (with
implementation-defined behavior) and one  or  more  possibly
slower operations with well-defined behavior.

     The  whole  point  of  a  language standard is to allow
program portability.  The standard should ensure that  there
is *some* way to write a portable program.  For example,
consider the ">>" operator.  The standard does not  indicate
whether  ">>"  shifts  arithmetically  (propagating the sign
bit) or logically (inserting zeros).

     What this effectively states is that the  operation  is
only  defined  on a subset of the possible range of operands
(i.e. when the operand to be shifted is  positive).   It  is
not  rigorously  defined  outside the range, but implementa-
tions are expected to  support  the  operation  outside  the
range.   This  sort  of situation crops up in many places in
the standard.

     In  order  to  make  it  possible  to  write   portable
programs,  we  suggest  that the standard should provide new
extended-range operators corresponding to each limited-range
operation.   The  standard  can  allow  ">>"  to  work in an
implementation-defined way outside of its defined range, but
it should define additional functions, macros, or operations
to handle the full range of operands.

     For example, you might have _ARITH_RS(A,B) which works
like "A>>B" when A is positive, but which always performs an
arithmetic shift when A is negative.  If the programmer
wants to be sure of a *logical* right shift, the operand to be
shifted can be cast to unsigned.  In this way, the
programmer could always dictate whether an arithmetic or
logical shift was desired.
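
     As a rough illustration only, the two operations could
be written as ordinary functions.  The names arith_rs and
logical_rs are hypothetical (they appear nowhere in the
draft), and the sketch assumes a two's complement machine
and a shift count smaller than the width of int.

```c
/* Hypothetical helpers; the names are illustrative, not from the draft. */

/* Arithmetic right shift: propagates the sign bit no matter what the
   native ">>" does with negative operands.  Assumes two's complement
   and 0 <= n < width of int. */
int arith_rs(int a, int n)
{
    if (a >= 0)
        return a >> n;      /* well defined for non-negative values */
    /* ~a is non-negative here, so the inner shift is well defined;
       the outer ~ turns the shifted-in zeros back into ones. */
    return ~(~a >> n);
}

/* Logical right shift: casting to unsigned guarantees zero fill. */
unsigned logical_rs(int a, int n)
{
    return (unsigned)a >> n;
}
```

With these, arith_rs(-8, 1) gives -4 on any such machine,
whichever way the native ">>" treats negative values.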

     A second way to make portable programs possible  is  to
make appropriate definitions in <stddef.h> or some other
header.  In most cases  where  behavior  is  implementation-
defined,  there  are a limited number of possibilities.  For
example,   an   implementation   could   define   a   symbol
_ARITH_SHIFT   to  indicate  that  right  shifts  were  done
arithmetically and _LOG_SHIFT to indicate that right  shifts
were done logically.  With appropriate #ifdef directives,
source code could be adapted to either possibility.  Similar
symbols  would  tell how "A%B" works when B is negative, how
integer division worked in the same situation,  and  so  on.
Note that this technique adds *no* extra expense at execution
time to determine how an implementation behaves.
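
     A minimal sketch of how source code might adapt to such
a symbol, using the "A%B" case.  HYP_MOD_FLOORS is a
hypothetical name standing in for whatever an implementation
header would supply; since no such header exists, the sketch
derives it from preprocessor arithmetic, on the assumption
that this matches the run-time behavior of "%".

```c
/* HYP_MOD_FLOORS is a hypothetical feature symbol.  A real header
   would define it; here it is derived as a stand-in, assuming the
   preprocessor evaluates "%" the same way the target does. */
#if (-7 % 2) == 1
#define HYP_MOD_FLOORS 1
#endif

/* Remainder whose sign always follows the divisor (b must be nonzero). */
int floor_mod(int a, int b)
{
#ifdef HYP_MOD_FLOORS
    return a % b;                 /* native "%" already floors */
#else
    int r = a % b;                /* native "%" truncates toward zero */
    if (r != 0 && (r < 0) != (b < 0))
        r += b;                   /* move the remainder to b's sign */
    return r;
#endif
}
```

The choice between the two bodies is made at translation
time, so nothing is tested when the program runs.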

General Note 3: The 90% Rule:

     The portability of  a  program  is  influenced  by  two
factors:  how  it  uses  C code, and how it uses the library
functions.  If a program is ported from system A  to  system
B,  the implementation on B will usually report places where
code is used incorrectly.   However,  it  usually  will  not
report  situations  where  B's  implementation  of a library
function  differs  significantly  from  A's  implementation.
Thus, compatibility of library functions is of major concern
in porting programs, and therefore in design of  a  standard
for writing portable programs.

     Our  philosophy  is  that a function on system A should
not have the same name as a function on system B, unless the
A  function  is at least 90% the same as the B function.  If
the two functions are not almost identical in functionality,
pretending that they *are* the same by giving them the same
name is just asking for trouble.

     In the context of the C standard, the 90% rule suggests
that  the  standard  library  functions  should  behave in a
manner that is almost identical on all  systems.   It  is  a
mistake,  for  example,  to  make the definition of a binary
file loose enough to encompass a widely divergent set of I/O
devices  and  file formats.  We would rather see the defini-
tion restricted to allow operations that could reasonably be
regarded  as  portable,  and  nothing more.  If a particular
system had special file formats that needed to be supported,



                           - 3 -


University of Waterloo                          August, 1986


the implementation on that system could  provide  additional
I/O routines to deal with such formats.

     If  a  program  is  written  using special routines for
system-dependent  I/O,  porting  the  program  is   actually
simpler.   When  the program is taken to a new system, the C
implementation will issue diagnostic messages indicating the
special  I/O  routines  that  are  not  available on the new
system, and the programmer finds out what has to be changed.
When   porting   a  program  written  only  with  "standard"
routines, the programmer must laboriously track down  system
dependencies  that  were  disguised  by using the "standard"
routines and this is usually a great deal more work.

     In general, then, we believe that the standard  library
should *not* be designed to conceal the system dependencies
that exist on a particular machine.  Instead, it should
provide support for features that are common to *all*
machines, leaving it up to the individual implementation  to
support dependencies.

General Note 4: The Correct Answer:

     At times, efficiency has been put ahead of correctness.
A good example of this occurs with mixed signed and unsigned
operations.  Consider the following code.

          short i;
          unsigned short u;
             ...
          if ( i < u ) ...

Since the committee allows this to result in either a signed
or an unsigned comparison (depending on whether
sizeof(short) is less than sizeof(int)), it is possible that
a negative value of "i" could be found greater than a
positive value of "u".  Correctness has been sacrificed to
efficiency.

     This  situation should be avoided.  If a signed integer
is negative, it should be less than all  unsigned  integers.
All  other considerations of lengthening or converting argu-
ments are secondary.  Note  that  efficiency  doesn't  enter
into  this  --  a  compiler or interpreter has a much better
chance of generating efficient code than a user checking for
the situation explicitly by writing if statements or
conditional expressions.

     If a programmer really wants the possibility of a
negative integer being larger than an unsigned one, he or she
can always use explicit casts, as in

          if ( (unsigned)i < (unsigned)u ) ...

Similarly, if a programmer knows that a particular  form  of
comparison is more efficient than the default, he or she can
always use explicit casting to ask for  the  more  efficient
comparison method.  When it *is* possible for the programmer
to control efficiency, why give a more naive programmer  the
wrong answer?
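
     The explicit check a programmer is otherwise forced to
write can be sketched as follows.  The function names are
hypothetical, and int and unsigned are used instead of the
short types so that the result does not depend on how
sizeof(short) compares with sizeof(int).

```c
/* Hypothetical helper: true when i is numerically less than u. */
int int_less_than_unsigned(int i, unsigned u)
{
    /* A negative value is below every unsigned value; otherwise the
       conversion to unsigned preserves the value of i. */
    return i < 0 || (unsigned)i < u;
}

/* The unsigned comparison, written with the explicit cast shown in
   the text: a negative i becomes a very large unsigned value. */
int unsigned_less_than(int i, unsigned u)
{
    return (unsigned)i < u;
}
```

For i equal to -1 and u equal to 1, the first function
reports "less than" and the second does not.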

General Note 5: Existing Implementations:

     The  Rationale  states that existing code is important,
but existing implementations are not.   We  agree  with  the
principle,  but  must point out that many popular C programs
are intimately connected  to  a  particular  implementation.
Adopting  practices  contrary to the way popular implementa-
tions work (e.g. the UNIX  C  compilers)  may  indeed  break
programs  in  subtle  ways.   For  this reason, our comments
sometimes state, "Implementation X does  this  differently."
In such cases, we are not saying that the standard should be
changed to do things X's way; we simply want  to  point  out
that  some  important  compiler does not behave in the given
manner, and one should  expect  some  code  to  break  as  a
result.

General Note 6: Meaningful Sloppy Code:

     In   a  few  instances,  the  designers  have  made  it
necessary for conforming implementations to support "sloppy"
programming  practices.   For  example,  the  following  are
supposed to be equivalent.

          extern int i;
          int extern i;

When  a  sloppy  practice is well-established, the designers
are justified, because existing programs should continue  to
work.   However, the above practice is virtually unknown (at
least to the people we have talked to and in the programs we
have examined), and requiring all implementations to support
it is surprising.
     More  to  the  point, making sloppy code meaningful has
undesirable side effects.  A user  who  accidentally  writes
such  code  receives no diagnostic message, because the code
is correct...even when the code is  probably  not  what  the
user  intended  to  write.   The  program  may  behave in an
unexpected way because a typing mistake is accepted.

     In addition,  the  implementation  that  is  forced  to
accept  sloppy  code  has  more  difficulty generating error
messages.  It has less chance of identifying precisely where
the code went wrong, because the programmer has so much more
leeway.  Consequently,  the  diagnostic  facilities  of  the
implementation are degraded for *all* users, in order to
support the sloppy few.
     The  standard  should  not  require  implementations to
support "loose" language constructs that  are  seldom  used.
Obviously,  there  are  instances  where  the designers must
decide whether a construct is or isn't  used  and  different
people  may  have  different opinions on the matter.  Still,
the  basic  principle  should  be,  "No  sloppiness,  unless
required by common practice."

General Note 7: Rationale:

     The  purpose  of  the  Rationale  document should be to
explain why particular decisions were made.  All too  often,
the  Rationale  is  used  to explain what the standard says.
Obviously then, the standard  itself  should  be  made  more
clear, with more examples and illustrations.

Specific Section Comments:

     The rest of this document talks about specific sections
of the standard and the Rationale.

1.6  Compliance:

     A conforming freestanding implementation should provide
the standard header <stddef.h> in addition to <limits.h> and
<float.h>.  The <stddef.h> should *not* include a declaration
for  "errno".  For more on "errno", see our comments on Sec-
tion 4.1.1.

2.1.1.2  Translation Phases:

     Since the process of linking translated source files is
described  in  this  section, one might believe that linking
must  take  place  in  the  translation  environment.    The
standard should explicitly state that linking can take place
in either  the  translation  environment  or  the  execution
environment (or in some other environment, for that matter).

2.2.4.2  Numerical Limits:

     We  do  not  understand  why so many #defined names are
missing their vowels.  For example, why use SHRT instead  of
SHORT?   The  difference  in  keystrokes is minimal, and the
standard guarantees that #defined names can be 31 characters
long.  This criticism applies to many of the names chosen by
the designers.

     Technically  speaking,  the definition of FLT_ROUNDS is
incorrect.  The beginning of the section  states  that  each
macro must be at least the given value.  The given value for
FLT_ROUNDS is 0, but  the  value  -1  is  also  said  to  be
meaningful.   Also,  the  alternatives  for  this  value are
"rounds", "chops", and "indeterminate".  This overlooks  the
fact  that  "chops"  could  mean  "truncate towards zero" or
"truncate towards  negative  infinity".   We  conclude  that
there  should  actually be four alternatives for FLT_ROUNDS,
not just three.

3.1.2.1  Scopes of Identifiers:

     The Rationale says that the behavior  is  undefined  if
you  use  an  identifier  outside  its  scope.  The standard
itself says nothing about this possibility.

3.1.2.2  Linkages of Identifiers:

     According to the rules for  declarations  that  include
the keyword extern, it is not possible to declare

          extern i;
            ...
          static i;

inside a file and have the first declaration of "i" refer to
the static (internal linkage) "i".  The Rationale says  this
decision  was made in order to allow one-pass compilers, but
in  fact,  one-pass  compilers  are  possible  without  this
restriction.   All  that the one-pass compiler needs to make
this work is a loader with a bit of intelligence.

     We believe that  this  is  contrary  to  the  principle
stated  in  Section  1.2  of the Rationale: Existing code is
important, existing implementations are not.  Ruling out the
above  construct  will  break  a  good  many  existing  UNIX
programs, since the existing UNIX compilers allow extern
declarations to be resolved to static objects that have file
scope and internal linkage.  Therefore, we believe the above
construct should be made legal.

     We note also that if the extern definition occurs in a
function and the static outside a function, we have a
different situation.  For example, consider

          f()
          {
              extern int i;
                   ...
          }
          static float i;

According  to  the second paragraph of the Semantics section
in 3.5.1, an extern definition inside a function refers to
an object that is defined somewhere with file scope.  It
cannot refer to the static definition (because that comes
later), so it must refer to some definition with external
linkage.  As soon as the static declaration is encountered,
however,  all subsequent references in the file refer to the
static variable.  This is odd, to say the least.

3.1.2.3  Name Space of Identifiers:

     The Rationale states that the intention is to *permit* as
many separate name spaces as possible.  In fact, we believe
it *requires* as many separate name spaces as possible.

     The standard says that all tags (structure, union,  and
enum tags) should be folded together.  We don't see why this
is necessary.  Distinguishing the different  types  of  tags
will  not  break  any  existing  programs,  but folding them
together  may  break  programs  that  were  written  for  an
implementation that *did* distinguish the different tags.

3.1.2.5  Types:

     An  unsigned and signed integer take up the same amount
of memory.  The standard should also  state  they  have  the
same alignment requirements.  This assumption is true of all
machines we know, and  allows  simpler  coding  of  portable
programs.

     There  is also the implication at various points in the
standard that an integral zero consists of all 0-bits.   For
example,  Footnote  71 (to "calloc") implies that every zero
except pointers and floating point types  consists  entirely
of  0-bits.   Furthermore,  the range of values available to
the unsigned type overlaps the range of non-negative  values
for the signed type.

     This  argues  that  the  document should state that all
values which can be represented by both signed and  unsigned
integers   (i.e.  the  non-negative  integers  that  can  be
represented by signed int) have the same bit pattern.  This
is true for all common representation schemes: one's
complement, two's complement, and signed magnitude.  We believe
that the signed-unsigned algorithm stated in 3.2.1.2 tacitly
assumes that this equivalence is true.  The equivalence also
legitimizes many of  the  bit  operations  that  take  place
inside existing C programs.

     The explanation of pointer types should be considerably
expanded.  Our reading of the standard shows several assump-
tions  about  various  pointer  types  that are never stated
explicitly.  We believe  that  users  would  understand  the
language  better if these assumptions were stated explicitly
in this section.

     For example, footnote 36  to  section  3.5.2.2  assumes
that  the  alignment  and  size  of  all  pointers to struc-
ture/union types will be the same.  We believe this  assump-
tion  is  valid,  but  it  should  be  stated  explicitly in
3.1.2.5.

     Similarly, if A is a pointer to type T,  it  should  be
true that

          (char *) (A + 1) == ((char *) A) + sizeof(T)

(If  this  were  not  true,  "malloc"  would  be  in serious
trouble.) This should be stated explicitly.

     As another example, given that the alignment of  signed
and  unsigned  integers  is equal, and given that arrays are
made  up  of  contiguous  objects,  a  statement  like   the
following is true.

          int *p;
          (int *) ( (unsigned *)p + 10) == p+10

It would  be  helpful  if  the  standard  or  the  Rationale
actually pointed this out.
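
     Both identities can be checked directly on a given
implementation.  The sketch below uses hypothetical function
names and only demonstrates the behavior on the machine
where it is run; it proves nothing about other machines.

```c
/* Returns 1 if stepping a pointer to T by one advances it by
   sizeof(T) bytes; T is double here. */
int step_matches_sizeof(void)
{
    double a[2];
    double *p = a;
    return (char *)(p + 1) == (char *)p + sizeof(double);
}

/* Returns 1 if stepping through the same storage as unsigned and
   as int lands on the same address. */
int signed_unsigned_step_agree(void)
{
    int buf[16];
    int *p = buf;
    return (int *)((unsigned *)p + 10) == p + 10;
}
```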

     Also,  we  cannot  find  an  explicit definition of the
phrase "pointer to object".   We  assume  that  it  means  a
pointer type which is not a pointer to a function or void,
but we could not find such a definition.

3.1.3.2  Integer Constants:

     According to the standard, an unsuffixed octal  or  hex
integer  constant  can  be  interpreted  as either signed or
unsigned.  Certain constants will be interpreted  as  signed
on   some  machines  and  unsigned  on  others,  because  of
differences in machine word size.  Due to the drastic effect
an  unsigned  operand  may have (e.g. in a comparison opera-
tion), there must be some way to ensure  that  a  number  is
taken as signed.  We suggest an "s" suffix.

     Since the sign is not  part  of  the  definition  of  a
constant,  the  "number"  -32768  will  be treated as a long
integer, even though  it  fits  into  16  bits.   This  will
surprise  programmers  who  use  it as the smallest possible
short integer.
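
     The same effect can be observed on a machine with a
32-bit int, where the constant 2147483648 plays the role
that 32768 plays with a 16-bit int.  The sketch assumes a
compiler following the later C99 rules for the types of
constants; MY_INT_MIN and the function names are
hypothetical.

```c
#include <limits.h>

/* The usual way around the problem: build the most negative value
   from a constant that does fit in int. */
#define MY_INT_MIN (-INT_MAX - 1)   /* hypothetical name */

/* Returns 1 when the expression MY_INT_MIN has the size of an int. */
int min_idiom_is_int_sized(void)
{
    return sizeof(MY_INT_MIN) == sizeof(int);
}

/* Assuming a 32-bit int: 2147483648 does not fit in int, so the
   constant, and therefore its negation, has a wider type. */
int negated_constant_is_wider(void)
{
#if INT_MAX == 2147483647
    return sizeof(-2147483648) > sizeof(int);
#else
    return 1;   /* assumption does not hold; nothing to demonstrate */
#endif
}
```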

3.2.2.1  Arrays, functions, and pointers:

     The second paragraph of this section states

     Except when used as an operand that may  or  shall
     be  a  function locator, an identifier declared as
     "function  returning  type"  is  converted  to  an
     expression  that  has  type  "pointer  to function
     returning type".

The way we read this, it appears that we can  say  something
like

          extern int f();
          (*f)();

Since the "*" operator may not take a function locator,  the
function locator is regarded as a pointer, and therefore the
"*" operator accepts it.  By  applying  recursion,  it  then
seems legal to say

          (**f)()
          (***f)()
          (****f)()

and so on.
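
     Compilers following the standard as later published
accept all of these forms.  A small sketch, with a
hypothetical function "answer" used only for illustration:

```c
/* Hypothetical function used only for illustration. */
int answer(void)
{
    return 42;
}

/* Each "*" yields a function designator that is immediately converted
   back to a pointer, so every form below calls the same function. */
int call_all_forms(void)
{
    int sum = 0;
    sum += answer();
    sum += (*answer)();
    sum += (**answer)();
    sum += (****answer)();
    return sum;
}
```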

     The  Rationale  should point out that function pointers
cannot be cast into other pointer types, and that  the  only
thing  that  can  be  assigned  to  a  function pointer is a
pointer of the same type or (void *) 0.

     We believe the document has a built-in assumption  that
any pointer cast to (void *) yields a unique value.  (If
this assumption is not true, a function like "memcpy"  could
not  work.)  We  think  this  assumption  should  be  stated
explicitly.

3.3.2.2  Function Calls:

     The second paragraph of the Semantics section should be
changed to the following:

     If    the   postfix   expression   preceding   the
     parentheses in a function call consists solely of
     an identifier, and if no declaration is  in  scope
     for  this identifier, the identifier is implicitly
     declared exactly as if,  in  the  innermost  block
     containing the function call, the declaration

          extern int identifier();

     appeared.

This  prevents  implicit  declaration when the function call
has a form like

          (f)()    or
          (*f)()


3.3.3.2  Address and Indirection Operators:

     Consider an array declared with

          int A[10];

By 3.3.3.4, we have

          sizeof(A) == sizeof(int) * 10

We also have

          (char *)(A+1) == (char *)A + sizeof(int)

What is the value of

          (char *)( (&A) + 1)

Is it

          ( (char *)A ) + sizeof(int)
                   or
          ( (char *)A ) + 10 * sizeof(int)

3.3.3.2 implies the second (i.e. that &A is a pointer to  an
array of 10 ints), but does not state it precisely.
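
     Under the second reading, which is the one compilers
following the later published standard implement, the two
distances can be measured as below.  The function names are
hypothetical.

```c
#include <stddef.h>

/* Byte distance from the start of A to (&A + 1): &A points to the
   whole array, so adding 1 steps past all ten elements. */
size_t bytes_past_whole_array(void)
{
    int A[10];
    return (size_t)((char *)(&A + 1) - (char *)A);
}

/* Byte distance from the start of A to (A + 1): A decays to a
   pointer to its first element, so adding 1 steps past one int. */
size_t bytes_past_first_element(void)
{
    int A[10];
    return (size_t)((char *)(A + 1) - (char *)A);
}
```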

3.3.4  Cast Operators:

     This section says that a pointer to type char has the
least strict alignment.  It should also make some comment
saying that a pointer to void is the most *general* pointer
type, and therefore shares the least strict alignment with
char.

3.3.6  Additive Operators:

     A very close reading of  this  section  indicates  that
arithmetic with (void *) pointers is illegal.  However, the
point is very subtle and could easily be missed.  We suggest
that  it  be  emphasized.   The same point should be made in
3.3.8 (on relational operators).

3.3.15  Conditional Expression:

     The standard states that you can  have  expressions  of
the form

          i ? p : v

where "p" is a pointer type and "v" is a (void *).  The
result of this expression is said to be a (void *).

     It seems to us that  this  is  the  wrong  way  around.
Instead,  the  result of the expression should have the type
of the pointer "p".  For example, consider

          char *cp;
          int *ip;
            ...
          ip = cp ? cp : malloc(10);

Since the result of "malloc" is (void *) the result of the
right hand side of the assignment will be (void *).  This
will be quietly assigned to "ip", even if the actual value
of the expression is "cp".  To avoid such quiet problems,
the result should be the pointer type that is not (void *).
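
     The rule as drafted can be observed with a compiler
that supports the much later C11 _Generic construct, which
is used here purely as a way of asking for the type of an
expression.  The function names are hypothetical.

```c
#include <stdlib.h>

/* Returns 1 if "cp ? cp : vp" has type void *, 2 if it has type
   char *.  _Generic postdates the draft; it is only a probe here. */
int conditional_result_kind(void)
{
    char c = 'x';
    char *cp = &c;
    void *vp = 0;
    return _Generic((cp ? cp : vp), void *: 1, char *: 2, default: 0);
}

/* Because the result is void *, this is accepted without a cast even
   though a char * value may end up stored in an int *. */
int *quiet_mismatch(char *cp)
{
    int *ip = cp ? cp : malloc(10);
    return ip;
}
```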

3.3.16.1  Simple Assignment:

     The  standard  must  be  more  clear  on assignments of
"pointers to functions".  Suppose A and B are both  pointers
to functions returning int but the functions have different
prototypes (or one function has a prototype  and  the  other
doesn't).   Is  A=B  legal?   Guidelines  for  compatibility
between function pointers should be established.  We believe
the  guidelines should follow the rules for type equivalence
given in 3.5.5.

     The standard says that assigning overlapping objects to
one  another is undefined (and therefore illegal).  While we
recognize that  there  are  many  instances  when  assigning
overlapping  objects  to  one  another cannot be done safely
(e.g. when objects are referenced with pointers), there  are
some  instances  where we believe it is a mistake to say the
operation is illegal.  In particular,  many  of  our  own  C
programs use the operation

          union {
              float f;
              int i;
          } u;
            ...
          u.f = u.i;

According  to  the  standard,  this  operation  will  become
illegal.

     We might point out the odd effect that

          u.f = (float) u.i;

would still seem to be legal, even if the assignment without
the  cast  is  not.  The cast operation presumably takes the
value of "u.i", converts it, and stores it in some temporary
storage,  so  assigning  it  to "u.f" causes no overlap.  If
this really is  intended,  the  standard  or  the  Rationale
should comment on it.

     The difference between char, unsigned char, and signed
char must be discussed.  If a program declares

          char *p;
          unsigned char *u;
          signed char *s;

is it possible to make assignments like

          p = u;
          u = p;
          u = s;
          s = u;
          p = s;
          s = p;

This question arises because char may be signed in some
implementations and unsigned in others.  As a  result,  some
of  the above assignments will be valid on some machines but
not on others.

     We suggest that the  assignment  rules  be  changed  to
allow the (uncast) assignment of char to unsigned char and
vice versa.  The same should  apply  to  pointers  to  these
types.

     Note  that  people writing portable programs will never
use the char type; they will use signed char when they are
using the value arithmetically and unsigned char when they
are using the character as a character.  Using the plain
char type will be non-portable.  However, this runs into
other problems.  In particular, suppose someone writes

          unsigned char a[] = "string";
          unsigned char *cp;
              ...
          cp = "abc";

These operations will work on a system where char is
unsigned, but not if char is signed.  To make such
operations possible, it must be possible to intermix plain
char and unsigned char types in the ways shown above.

3.3.16.2  Compound Assignment:

     According to the standard, an operation like

          int i;
          i /= 3.5;

would be performed using floating point division.   However,
the  Berkeley  C  compiler  uses integer division.  For this
reason, this should be marked as a quiet change.
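
     The difference is easy to see with a value such as 10,
where the two readings give different answers.  The function
names below are hypothetical; the second function spells out
the integer-division reading with an explicit cast rather
than relying on any particular compiler.

```c
/* Compound assignment as the draft describes it: the division is
   carried out in floating point, then converted back to int. */
int divide_in_floating_point(int i)
{
    i /= 3.5;           /* 10 / 3.5 is about 2.857, stored as 2 */
    return i;
}

/* The integer-division reading, written out explicitly: 3.5 is
   converted to int (giving 3) before the division. */
int divide_in_integer(int i)
{
    i /= (int)3.5;      /* 10 / 3 is 3 */
    return i;
}
```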

3.3.17  Comma Operator:

     The standard  states  that  the  comma  operator  is  a
sequence  point, but it is not clear what point of the comma
operation is *the* point.  For example, consider the
expression

          A = ((B=1),B) + ((B=2),B) + ((B=3),B);

What  should A equal?  (We note that the Berkeley C compiler
assigns the value 9 to A in the expression above.) Does  the
sequence  point  take place at the comma (i.e. when only the
left half of the expression has been evaluated) or  does  it
take place when both sides of the comma have been evaluated?
Are there actually two sequence points?  The  same  sort  of
problem obviously occurs with

          func( (b=1,b) , (b=2,b) );

In  fact, the question generalizes.  Several operators (e.g.
"&&", "||") are said to be sequence points, when the  opera-
tion  actually  has  several  "points"  to it.  The standard
should be more explicit, e.g.

     There is a sequence point after the evaluation  of
     the left operand.

or

     There is a sequence point after the evaluation  of
     the result of the operator.

The  second  paragraph  of  3.3 makes some effort to address
this problem, but it is too nebulous to be much help.
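
     Under the first wording (a sequence point after the
evaluation of the left operand, which is what the standard
as later published says), a single comma expression is well
defined, while the three-term sum in the text is still best
written as separate statements.  A sketch with hypothetical
function names:

```c
/* Well defined: the sequence point after the left operand guarantees
   the assignment is complete before b is read on the right. */
int comma_then_read(void)
{
    int b = 0;
    int a = (b = 5, b + 1);
    return a;
}

/* Separate statements remove any doubt about evaluation order in
   expressions like the three-term sum in the text. */
int sum_in_statements(void)
{
    int b, a = 0;
    b = 1; a += b;
    b = 2; a += b;
    b = 3; a += b;
    return a;
}
```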

3.4  Constant Expressions:

     We  point  out  that  if  the   "offsetof"   macro   is
implemented  as suggested in the rationale, it will not be a
constant expression according to the rules of this  section.
Since  we  like the suggested implementation, we suggest the
definition  of  constant  expressions   be   modified.    In
particular, the standard should say that the implementation's
behavior is *undefined* if a constant expression does not
comply with the given rules.  This gives an implementation
the freedom to support an expanded definition of
constant expressions if desired.
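
     The practical concern is that programs want to use
"offsetof" wherever a constant is required.  The sketch
below uses the macro from <stddef.h> as supplied by the
implementation (however it is defined there) in two such
places.  The structure and the other names are hypothetical.

```c
#include <stddef.h>

struct record {          /* hypothetical structure for illustration */
    char tag;
    long value;
};

/* Used where an integer constant expression is required:
   an array bound and an enumeration constant. */
static char pad[offsetof(struct record, value)];
enum { VALUE_OFFSET = offsetof(struct record, value) };

/* Size of the array whose bound was computed with offsetof. */
size_t pad_size(void)
{
    return sizeof pad;
}
```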

_3._5  _D_e_c_l_a_r_a_t_i_o_n_s:

     For  the sake of readability, it should not be legal to
enclose an entire declarator in parentheses, as in

          int (x);

A function prototype containing such a declaration  is  very
deceptive.  For example,

          int f(int (x));

means  that  "f" has an integer parameter named "x"...unless
"x" happens to be the name  of  a  type  as  declared  in  a
typedef  statement,  in  which case the argument of "f" is a
function that takes an argument of type "x" and  returns  an
integer.   Confusion  can  be  avoided  if  such  extraneous
parentheses are not allowed.
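
     The deceptive case can be sketched as follows.  The
names x_t, apply and twice are hypothetical.  The prototype
and the definition agree only if the parenthesized name is
read as a type, so a compiler that takes the typedef reading
accepts the pair without complaint.

```c
typedef int x_t;   /* hypothetical type name standing in for "x" */

/* Because x_t names a type, the parameter is read as "function
   taking x_t and returning int" (adjusted to a pointer), not as
   an int parameter named x_t. */
int apply(int (x_t));

static int twice(x_t v)
{
    return 2 * v;
}

/* The definition spells the same parameter type out in full.  It
   would conflict with the prototype if that meant "int apply(int)". */
int apply(int (*fn)(x_t))
{
    return fn(21);
}
```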

     We were surprised  that  the  standard  allows  storage
class  specifiers  to be intermixed with type specifiers, as
in

          const int extern long a;

We were even more surprised to discover that the Berkeley  C
compiler  already supports such constructs.  We don't really
understand why it is necessary to support this sort of thing
--  we  would be surprised if any existing programs make use
of it.  An implementation that accepts this kind of code has
a  good  deal  of  trouble  generating  comprehensible error
messages, since it cannot be so rigid  in  its  approach  to
parsing.  _A_l_l programmers  will  receive  poorer  diagnostic
messages  in  the  interests  of  catering  to  the very few
programmers who would want to ignore  very  well-established
code-writing conventions.

_3._5._1  _S_t_o_r_a_g_e-_C_l_a_s_s _S_p_e_c_i_f_i_e_r_s:

     The  semantic description of the register storage class
should be reworded to the following:

     A   declaration   with   storage-class   specifier
     register  is an auto declaration with a suggestion
     that the object will be frequently  accessed,  and
     thus  that the compiler should attempt to speed up
     access to the object.  One restriction applies  to
     an  object  declared  with storage-class specifier
     register:  the  unary  "&"  (address-of)  operator
     must  not  be  applied  to  it.  Since the program
     cannot  legitimately  generate  a  pointer  to  an
     object  with the storage-class specifier register,
     a frequently-used  optimization  is  to  keep  the
     object  in  fast  storage which cannot be accessed
     through a pointer, e.g. a hardware register.

By rephrasing the definition this way, you give the register
storage-class  more  meaning.   In  particular, you open the
door to compilers that perform  global  optimizations  using
the fact that register variables can never have their values
changed by indirection through a pointer.  The compiler  can
optimize the use of register variables because it can always
know when the register values are used and changed.

     As currently defined in the standard,  register  is  an
all-or-nothing  optimization.   We  feel that machines which
can't give "all" (due to a shortage of registers)  shouldn't
be forced to give "nothing".

     The  standard  might  also  make some statement on what
implementations should do  if  there  are  several  register
declarations  and  only  some  of  these  can  be  used  for
optimizations.  We propose that the standard  say  that  the
declarations  which  come  lexically first will be optimized
first.  This gives  a  programmer  some  way  of  indicating
preference of optimization.

_3._5._2._1  _S_t_r_u_c_t_u_r_e _a_n_d _U_n_i_o_n _S_p_e_c_i_f_i_e_r_s:

     We suggest that the definition for "struct-declaration"
be changed to

          struct-declaration:
              type-specifier-list struct-declarator-list;
              struct-or-union-specifier;

The added possibility lets you define an unnamed element  of
this type.  The sub-elements will appear as first-level ele-
ments in the enclosing structure.  Using  this  scheme,  the
example in 3.3.2.3 could become

          struct {
              int type;
              union {
                  int intnode;
                  double doublenode;
              };
          } u;
          /* ... */
          u.type = 1;
          u.doublenode = 3.14;
          /* ... */
          if (u.type == 1)
              /* ... */ sin(u.doublenode) /* ... */
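
     For comparison, the 2011 revision of the C standard
accepts unnamed union members of much this shape.  The sketch
below assumes a C11 compiler; the struct tag and the accessor
function are hypothetical.

```c
/* Requires a C11 (or later) compiler. */
struct node {
    int type;              /* 0: intnode valid, 1: doublenode valid */
    union {
        int    intnode;
        double doublenode;
    };                     /* unnamed member: fields are promoted */
};

/* Reads whichever member the type field says is current. */
double node_as_double(const struct node *n)
{
    return n->type == 1 ? n->doublenode : (double)n->intnode;
}
```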


_3._5._2._2  _S_t_r_u_c_t_u_r_e _a_n_d _U_n_i_o_n _T_a_g_s:

     The form

          struct y;

now has a special meaning.  Suppose we define

          typedef struct y z;

Does the code

          z;

have the same effect as the following?

          struct y;

We note that you can  use  pointers  to  structures  without
having  to define the structure itself.  Do you ever have to
define a structure's contents in a particular source file?

_3._5._2._3  _E_n_u_m_e_r_a_t_i_o_n _T_y_p_e_s:

     If we have

          enum E1 { e1 } var;
          enum E2 { e2 };

is it legal to say the following?

          var = e2;

The  answer  is almost certainly yes...but we would be happy
if we were allowed to give a warning or an error message for
the  operation, if there is no explicit cast.  Similarly, we
would like to give a warning for things like

          var = e2 + 1;


_3._5._2._4  _c_o_n_s_t _a_n_d _v_o_l_a_t_i_l_e:

     According to our reading of the standard, the following
code is illegal.

          f1() {
              extern const x;
                  ...
          }
          f2() {
              extern x;
                  ...
          }
          int x;

On the other hand, it would be very convenient if one function
could  declare  an object const while another did not.
This would let a function indicate when it did not intend to
change  the  value  of an external object, and thereby allow
local optimizations.  The actual definition  of  the  object
would  establish  whether or not the object really was const
(and therefore suitable for allocation in read-only memory).

     The same principle would hold for volatile.

_3._5._3._3  _F_u_n_c_t_i_o_n _D_e_c_l_a_r_a_t_o_r_s:

     The last sentence of the Semantics section reads

     If the list is empty  in  a  function  declaration
     that  is  part of a function definition, the func-
     tion has no parameters.

What does this say about a function definition like

          int (*F(int a))() {...

Since the empty identifier list appears as part of  a  func-
tion definition, the function pointed to by F's return value
takes no arguments.  This rules out returning a  pointer  to
an arbitrary integer function.
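
     One way to keep the returned pointer's parameter list
out of doubt is to name the pointed-to function type in a
typedef.  This is a sketch with hypothetical names; it states
one specific prototype and so does not restore the "arbitrary
integer function" case discussed above.

```c
static int add_one(int v) { return v + 1; }
static int negate(int v)  { return -v; }

/* Hypothetical typedef naming the pointed-to function type. */
typedef int (*int_fn)(int);

/* Same shape as F in the text, with the parameter list of the
   returned function stated explicitly through the typedef. */
int_fn select_fn(int which)
{
    return which ? negate : add_one;
}
```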

_3._5._5  _T_y_p_e _D_e_f_i_n_i_t_i_o_n_s _a_n_d _T_y_p_e _E_q_u_i_v_a_l_e_n_c_e:

     The  standard should discuss structs that have the same
tag but different internal structures.

     We also have some questions about the situation where a
typedef  declares  a  named  type  with  the  same name as a
variable defined in an enclosing scope.  Inside the scope of
the  named  type,  is the variable completely invisible?  Or
can the variable be visible in contexts where  the  compiler
can clearly determine that the named type is not valid?

     As another questionable construction, consider the code

          typedef int X;
          typedef X *Y;
          f(void)
          {
              typedef char X;
              Y b;

The definition of X inside the function  clearly  supersedes
the  external  definition of X.  However, it is not clear if
"b" is a pointer to an integer (using the definition of X at
the  time  Y  was defined) or a pointer to a character (as X
was defined at the time "b" was declared).

     An even more subtle situation is

          struct X { /* definition 1 */ };
          typedef struct X *Y;
               ...
          f(void)
          {
              struct X { /* new definition */ };
              Y Z;
              ...

Is  Z  a  pointer to the old X structure or the new one?  We
believe that most people would expect Z to be a  pointer  to
the  old  X  structure.   However,  a  strict reading of the
definition of typedef suggests otherwise.  The standard says
that  a  typedef  type is not a new type; it is a name for a
type that could be defined in another way.  Since a  pointer
to the old X structure could _n_o_t be defined in  another  way
after  the  declaration  of the new X structure, the typedef
type Y would have to refer to the new structure.   To  avoid
such  hair-splitting,  the  standard  should state precisely
what happens in such a case.
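
     The two situations can be probed with "sizeof".  The
sketch below assumes the behavior of C as it was eventually
standardized, where a typedef keeps the meaning it had at the
point of its own declaration.  The struct is named S rather
than X, and all function names are hypothetical.

```c
#include <stddef.h>

typedef int X;
typedef X *Y;                    /* Y is "pointer to int" from here on */

struct S { int a, b, c; };
typedef struct S *SP;            /* SP points to the file-scope struct S */

size_t y_pointee_size(void)
{
    typedef char X;              /* hides the outer X inside this block */
    X shadow = 0;
    Y b = 0;

    (void)shadow;
    return sizeof *b;            /* sizeof does not evaluate *b */
}

size_t sp_pointee_size(void)
{
    struct S { char only; };     /* a new, unrelated struct S */
    struct S local = { 0 };
    SP z = 0;

    (void)local;
    return sizeof *z;
}
```

On such a compiler the first function reports the size of an
int and the second the size of the outer structure.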

_3._5._6  _I_n_i_t_i_a_l_i_z_a_t_i_o_n:

     Is the following initialization legal?

          int f(int a)
          {
              const int b = a*2;

     Consider

          struct X {
              int a,b;
          };
          f() {
              struct X Z;
              int junk = (Z.a=1,Z.b=2,7);
              int more_declarations;

Is this allowed?  Can we initialize  an  auto  structure  in
this  way?  Can we use the side effects in an initializer to
initialize another object?  (We note that the  designers  of
the  standard  ruled out the use of non-constant expressions
to initialize  auto  aggregates  precisely  because  of  the
problem  of side effects.  The above example shows that side
effects are still possible.)

     Can auto initializers make use  of  external  variables
with  the  same  name  as the symbol being initialized?  For
example, is the following valid?

          int i = 1;
          f() {
              char i = i * 2;
              ...

This  sort of construction is allowed and used in Berkeley C
code.
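
     If the scope of the new "i" is taken to begin before
its initializer (as in C as later standardized), the
initializer cannot see the external "i".  A sketch of a form
that works under either reading, using a hypothetical
intermediate name:

```c
int i = 1;

int doubled_outer(void)
{
    int outer = i;                   /* still the file-scope i */
    {
        char i = (char)(outer * 2);  /* the new i is not read here */
        return i;
    }
}
```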

     It appears that the standard says that the following is
legal.

          int i = {{{{{10}}}}};

Is this really intended?

     The paragraph beginning at line 542 (about  initializa-
tion  of subaggregates inside aggregates) is very confusing.
At the very least, it should be reworded to be  more  clear.
We  also  believe  that it might not say what you mean it to
say, but it's too hard to construe for us to be  sure.   For
example, how is the following interpreted?

          int a[4][5][6] =
          {
              { 1, 2 },
              { 3, 4, 5 },
              { 6, 7, 8, 9 }
          };
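
     One plausible reading, and the one followed by the
compilers we are aware of, is that each inner brace group
initializes one a[i] and that its values fill a[i][0] from
the left, with everything else zero.  The sketch below checks
for that layout; the function name is hypothetical.

```c
static int a[4][5][6] =
{
    { 1, 2 },
    { 3, 4, 5 },
    { 6, 7, 8, 9 }
};

/* Returns 1 if the layout matches the reading described above. */
int layout_matches(void)
{
    return a[0][0][0] == 1 && a[0][0][1] == 2 && a[0][1][0] == 0
        && a[1][0][2] == 5 && a[2][0][3] == 9 && a[3][0][0] == 0;
}
```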


_3._6._4._2  _T_h_e _s_w_i_t_c_h _S_t_a_t_e_m_e_n_t:

     The  Rationale  says  that  ranges  in case labels were
rejected  because  many  current  compilers  would  generate
excessive  amounts of code.  This does not seem to be a good
reason for rejecting something that could be  quite  useful.
Making  a  compiler  generate  the  equivalent "if" code for a
switch  range  is  trivial  compared  with   (for   example)
requiring  both  signed  and  unsigned  characters.  This is
indeed a minor extension.

     It is our belief that a compiler  is  usually  able  to
produce better code for case ranges than a programmer trying
to do it by hand using "if" statements.  Therefore  efficiency
is actually improved by supporting case ranges.

     If the committee does not want to sanctify case ranges
as part of the standard, it should still recognize that case
ranges  are  likely  to be common extensions to the language
and should be listed in Section 5.6.4.  More to  the  point,
the  committee should develop some syntax for case labels so
that implementations that want to offer the extension can do
so in a consistent way.

     The  ".."  notation  mentioned  in the Rationale is not
acceptable because of the tokenizing  rules:  1..3  will  be
interpreted  as  the two floating point numbers 1.0 and 0.3.
We  would  suggest  using  the  tilde  as  the  case   range
separator, as in

          case 1~10: ...

This does not introduce a new operator and yet it is easy to
parse  because  tilde  has no binary meaning.  It also looks
good (i.e. the visual appearance suggests its meaning).

     To  improve  switch  statements  even  more,  we  would
recommend  provisions  for "open-ended" case ranges as well,
of the form

          case >n:
          case >=n:
          case <n:
          case <=n:

These avoid the example

          case 0..65535:

given in the Rationale, since the case just becomes

          case >=0:

Moreover,  _t_h_i_s  form  is  completely  portable,  since it's
independent of the number of bits in the switch variable.
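
     Since the "~" and open-ended forms are only proposals,
the sketch below spells out the comparisons they would stand
for.  The function is hypothetical.  (GNU C offers a related
extension written "case low ... high:", which is not
portable.)

```c
/* Hypothetical classifier: what "case 1~10:" and "case >10:"
   would have to mean, written with ordinary comparisons. */
int classify(int n)
{
    if (n >= 1 && n <= 10)   /* proposed: case 1~10: */
        return 1;
    if (n > 10)              /* proposed: case >10:  */
        return 2;
    return 0;                /* default:             */
}
```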

_3._7._1  _F_u_n_c_t_i_o_n _D_e_f_i_n_i_t_i_o_n_s:

     The standard seems to allow extraneous declarations  in
a function heading, as in

          f(a,b,c)
          int a;
          typedef struct X ...;
          int b;
          int c;
          { ...

Was this the intention?  It strikes us as a poor idea.

     Also note that the UNIX C compiler currently allows the
form

          int (*f())(a,b,c) {...

in function definitions, but the standard will require

          int (*f(a,b,c))() {...

We believe this is a quiet change.

_3._8._1  _C_o_n_d_i_t_i_o_n_a_l _I_n_c_l_u_s_i_o_n:

     The directive #elif  should  be  renamed  to  the  more
mnemonic  #elseif.   As  an  alternative, the compiler might
recognize the following.

          #else if
          #else ifdef
          #else ifndef

     If an undefined identifier appears in  an  #if  expres-
sion, _a_n_d _i_f _i_t_s _v_a_l_u_e _w_a_s _n_e_e_d_e_d, an error should be given.
Thus if A is not defined,

          #if A

gives an error.  However, if B is defined and non-zero

          #if (B||A)

does  not  give  an  error,  because  the  value  of  A   is
irrelevant.

     Note  that the confusion of an undefined symbol meaning
"zero" does not arise from something simple like

          #if UNDEF_SYMBOL

but from code like

          #define X (5*y)
          int y = 0;
             ...
          printf("%d ",X);
          #if X
          printf("is non-zero");
          #else
          printf("is zero");
          #endif

In the #if directive, the X is replaced with  "(5*y)".   The
preprocessor then checks to see if "y" is a #defined symbol.
It isn't, so it turns into zero and the final result of  the
#if condition is zero, even though X itself was defined.  If
the definition of X is changed to some different  expression
(e.g. a simple constant), the #if condition suddenly becomes
true.

     If the user really wants to  assume  "undefined"  means
zero, he or she should write

          #if defined(A)&&A

or

          #ifndef A
          #define A 0
          #endif
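
     The mismatch between the directive and the running
program can be shown in a few lines.  This sketch assumes the
draft's rule that an identifier left over in an #if expression
is taken as zero; X_IN_IF and the two function names are
hypothetical.

```c
#define X (5*y)

static int y = 3;

/* In the #if, y is not a macro, so it is replaced by 0 and the
   controlling expression becomes (5*0). */
#if X
#define X_IN_IF 1
#else
#define X_IN_IF 0
#endif

int x_at_run_time(void)  { return X; }        /* uses the variable y */
int x_in_directive(void) { return X_IN_IF; }  /* uses the "0" rule   */
```

Here the program sees X as 15, while the directive treats it
as zero.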


_3._8._3  _M_a_c_r_o _R_e_p_l_a_c_e_m_e_n_t:

     What happens if a macro with parameters is invoked with
the  wrong  number of parameters or no parameters?  Is it an
error, or is the text preserved?

     The fourth paragraph on page 80 (lines 7 through 12) is
very  difficult  to  understand.  An example would certainly
help clarify what it is trying to say.

_S_e_c_t_i_o_n _4: _G_e_n_e_r_a_l _N_o_t_e_s:

     It should be stated that if a program mixes  macro  and
non-macro  invocations  of  the  same library functions, the
results are unpredictable.  For example, characters  written
with   the   "putchar"  macro  and  the  "putchar"  function
intermixed may come out in the wrong order (or perhaps won't
come out at all).

     Many  functions  return  pointers  to  values  that are
created by the system.  For example,  "strerror"  returns  a
pointer  to  a  string  that  the  system sets up.  Are such
values placed in static storage areas or in memory that  has
been  dynamically  allocated  (by  "malloc")?  The answer to
this question must be stated exactly in each  case  to  make
for uniformity across systems.  The difference is important,
since static storage makes a function dangerous  to  use  in
exception  handlers.  In addition, storage allocated through
"malloc" can be freed if  it  is  no  longer  needed,  while
static storage cannot be.

_4._1._1  _T_e_r_m_s _a_n_d _C_o_m_m_o_n _D_e_f_i_n_i_t_i_o_n_s:

     The  file <stddef.h> should be required for stand-alone
operation.  However,  it  should  not  mention  the  "errno"
value.    All   the   other   contents   of  <stddef.h>  are
characteristics of the hardware and the implementation.  The
"errno"  value is related to the library and should have its
own header <errno.h>.

     The standard implies that  headers  which  need  symbol
definitions  that  are  "officially"  in  other headers will
redefine the symbols.  For example, <stdlib.h> needs to  use
"size_t"  in function prototypes, so it will include its own
definition of "size_t".  We feel that it  makes  more  sense
for  <stdlib.h>  to  explicitly  #include  the  header  that
defines "size_t" rather  than  giving  its  own  definition.
Multiple definitions of the same symbol always mean trouble.

     A similar problem is raised with the functions "strtod"
and  "strtol".   Their  definition  implies  that  including
<stdlib.h>  is  all  you  have  to  do to use the functions.
However, the user may also need to use the symbols HUGE_VAL,
ERANGE,   LONG_MAX,   LONG_MIN,  and  "errno".   Should  the
<stdlib.h> file make these available (by defining the values
directly  or  including  the  appropriate  header  files) or
should the user have  to  include  the  appropriate  headers
explicitly?  The standard should answer this question.

_4._1._2  _H_e_a_d_e_r_s:

     The first paragraph contains the sentence

     If  the  program  redefines  a  reserved  external
     identifier, even with  a  semantically  equivalent
     form, the behavior is implementation-defined.

The  term  "implementation-defined"  should  be  changed  to
"undefined".  By definition, "implementation-defined" refers
to  behavior  of a correct program construct.  We believe it
is too broad-sweeping to say that redefinition of a reserved
external  identifier  should  always  be allowed; therefore,
"undefined" is the better term, giving  implementations  the
choice  to  accept  or  not  accept  the  construct.   Also,
"implementation-defined"  implies  that  the  implementation
must   document   how  it  behaves.   The  ramifications  of
redefining a library symbol  may  be  too  unpredictable  to
document.

_4._3._1._9  _T_h_e _i_s_s_p_a_c_e _F_u_n_c_t_i_o_n:

     The  "isspace"  function should also test for the line-
feed character if it is not identical with the new-line.

_4._5._4._6  _T_h_e _m_o_d_f _F_u_n_c_t_i_o_n:

     We feel "modf" should behave in the same way  as  float
to integer conversions.  This means that "*iptr" should have
the same value as

          (double)(long) value

when this  operation  does  not  cause  an  overflow.   This
definition  is  more consistent in the (-1,0) range than the
definition  proposed  in  the  standard.   Even   when   the
"(double)(long)"  conversion would cause an overflow, "modf"
should still behave as if it  is  performing  this  sort  of
conversion, in the interests of consistency.

     If  the integer part of "value" is exactly equal to the
most  negative  long  integer,  a   problem   arises.    The
"(double)(long)"  approach  is likely to give one lower than
the most negative integer.  The "modf" code should recognize
this problem and issue an EDOM error in such cases.
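
     A minimal sketch of the proposed behavior, assuming the
conversion from double to long truncates toward zero and that
the value fits in a long.  The helper name is hypothetical,
and the overflow and EDOM cases described above are not
handled.

```c
/* Hypothetical helper: splits value so that the integer part is
   (double)(long) value.  Valid only while value fits in a long. */
double split_by_cast(double value, double *ipart)
{
    *ipart = (double)(long)value;   /* conversion drops the fraction */
    return value - *ipart;          /* what is left of value */
}
```

With a truncating conversion, -3.75 splits into -3.0 and
-0.75, and -0.5 splits into an integer part of zero and a
fraction of -0.5.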

_4._5._6._5  _T_h_e _f_m_o_d _F_u_n_c_t_i_o_n:

     "fmod"  should follow the same principle as "modf".  In
the expression

          x == i*y + f

the sign of "f" should be such that

          i == (long) (x/y)

Alternatively, you might declare that "f"  is  always  posi-
tive.   Either alternative is better than declaring that "f"
has the same sign as "x".
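
     The first alternative can be sketched directly from the
two expressions above.  The helper name is hypothetical; the
code assumes y is non-zero and that x/y fits in a long.

```c
/* Hypothetical helper: remainder f such that x == i*y + f with
   i == (long)(x/y).  Assumes y != 0 and that x/y fits in a long. */
double rem_by_cast(double x, double y)
{
    long i = (long)(x / y);     /* quotient converted to long */
    return x - (double)i * y;
}
```

Note that the sign of the result follows from how the
conversion to long rounds.  Where the conversion truncates
toward zero, the remainder is zero or has the sign of "x";
where it rounds toward negative infinity, the remainder is
zero or has the sign of "y".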

_4._7  _S_i_g_n_a_l _H_a_n_d_l_i_n_g:

     Does the SIGABRT signal catch other  abnormal  termina-
tions  besides one raised by "raise" or "abort"?  We believe
it should not.

_4._7._2._1  _T_h_e _r_a_i_s_e _F_u_n_c_t_i_o_n:

     Must "raise" be able to generate _e_v_e_r_y valid signal, or
is  the  implementation  allowed  to  restrict  the  sort of
signals that "raise" can send?  Is it allowed to issue  more
than the standard signals?

_4._8._1  _V_a_r_i_a_b_l_e _A_r_g_u_m_e_n_t _L_i_s_t _A_c_c_e_s_s _M_a_c_r_o_s:

     The standard does not explain why these routines should
be implemented as macros.  We realize  that  the  reason  is
that  the parameters aren't necessarily expressions, but the
standard should say this; otherwise, it just sounds  like  a
petty rule.

_4._9._1  _I/_O _I_n_t_r_o_d_u_c_t_i_o_n:

     What happens if the BUFSIZ default value depends on the
type  of device that is connected to the I/O stream?  Making
this a fixed constant may be inadvisable.

_4._9._2  _S_t_r_e_a_m_s:

     The sentence beginning at line 59 should read

     Data  read  in  from  a  text  stream   will   not
     necessarily  compare  equal  to the data that were
     earlier written out to  that  stream,  unless  the
     data consist only of complete _n_o_n-_n_u_l_l lines, _w_i_t_h
     _n_o _t_r_a_i_l_i_n_g _b_l_a_n_k_s, and composed only of printable
     characters  and  the control characters horizontal
     tab, new-line, vertical tab, and form feed.

     Also, we do not know why  the  backspace  was  excluded
from  the set of characters that could be safely written and
read on a text stream.

     The committee obviously believes that binary files will
map  into  some machine-dependent idea of what a binary file
is.  This is not necessarily so.  For  example,  it  is  not
obvious  how  to  map the binary file concept into a record-
based file system.  Such systems can have random  access  to
records, but if records do not have a fixed length, there is
no simple relationship between the UNIX  concept  of  random
access and the file system's.

     The  committee  says that the contents of a binary file
stream  will  be   exactly   what   is   written   with   an
implementation-defined  number  of  NUL characters appended.
This is a  curious  change  on  existing  UNIX  file  system
concepts.   One  of  the most important principles of binary
file streams on UNIX is that you can write a file, then read
it  and  get back _e_x_a_c_t_l_y what was written.  The addition of
extra NUL characters violates this principle.

     Evidently,  the  designers  allowed   the   extra   NUL
characters  in  order to accommodate systems that might need
to pad files out to a certain length.  However,  it  is  not
clear  that  the freedom to add NUL characters is sufficient
to satisfy arbitrary file  system  requirements.   The  file
system  may  be  just as upset at extra NUL characters as it
would be with data that was not padded to  some  appropriate
boundary.  For this reason, we feel that the standard should
simply state that reading from a binary  file  stream  gives
precisely  what was written to the file stream, and leave it
up to the implementation to figure out how to provide such a
service.

     It is not  the  business  of  a  portable  standard  to
describe   how   to  perform  non-portable  operations.   In
particular, we believe it is a mistake to encourage the  use
of  binary  streams  when  creating files in system-specific
formats.  A program that builds formatted files in  a  byte-
by-byte  manner  will  certainly  not be portable to systems
that use different file formats.  If  someone  does  try  to
port such a program, it is better for the program to fail in
a very obvious way than to write out a distorted version  of
some  other  system's  file  format.   If  an implementation
believes users will need to create certain kinds of  system-
specific  files,  the  implementation should provide its own
routines to accomplish such tasks.

_4._9._6._1  _T_h_e _f_p_r_i_n_t_f _F_u_n_c_t_i_o_n:

     The description of the "%f"  specifier  says  that  the
output should have six decimal places (if there is no preci-
sion field) and that the number should  be  widened  to  the
appropriate number of digits.  Since the IEEE floating point
standards indicate that floating point  numbers  may  be  as
great  as  10**308,  the  standard  may result in widening a
floating point number to as many as 314 (308+6) digits.   We
recommend  that implementations be allowed to use scientific
notation ("%e" format) in cases  where  the  other  approach
would  widen the value beyond the maximum possible number of
significant digits.  This would probably require the defini-
tion  of a macro in <float.h> to indicate the maximum number
of significant digits.
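
     The width in question is easy to measure on a current
system.  The sketch below assumes a C99 library, where
"snprintf" with a null buffer returns the length that would be
produced; the function name is hypothetical.

```c
#include <stdio.h>

/* Number of characters "%f" produces for value, not counting the
   terminating null.  snprintf with a null buffer is a C99 feature. */
int f_width(double value)
{
    return snprintf(NULL, 0, "%f", value);
}
```

For 1.0 the result is 8 ("1.000000"); for a value near
10**308 it runs to more than three hundred characters.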

     The standard explicitly states that the  "#"  qualifier
has  no  effect  on  "%s".   We  see  no  reason why this is
necessary.  In fact, we believe that a  natural  interpreta-
tion  of  "%#s"  would be to print out a string using escape
sequences for non-printable characters.  While this behavior
need  not  be  required by the standard, we don't see why it
should be explicitly ruled out when it would  clearly  be  a
useful  facility.   The  same  point  applies to "%#c".  All
things being considered, it would be easier to say that  the
use   of  "#"  in  "%c",  "%d",  "%i",  "%s",  and  "%u"  is
implementation-defined.

     The Environmental Limit section reads

     The minimum value for  the  number  of  characters
     produced  by  any  single  conversion  shall be at
     least 509.

Obviously, what you really mean is

     Implementations may place a maximum on the  number
     of characters produced by any  single  conversion,
     but this maximum cannot be less than 509.

     It   seems   perverse   that   long  double  conversion
specifiers must use an upper case 'L' while long  ones  must
use  lower  case.  It is more sensible to allow either upper
or lower case in both instances.

_4._9._6._2  _T_h_e _f_s_c_a_n_f _F_u_n_c_t_i_o_n:

     The  last  sentence  of  the  first   paragraph   seems
redundant.  The excess arguments will obviously be evaluated
before they are passed to "fscanf".  What you mean to say is
that  no  error  occurs if too many arguments are specified,
but the excess arguments are ignored.

     It seems odd that "fscanf" returns EOF if  input  items
cannot  be  read.   EOF  is conceptually a special character
value (though of course, it is an integer).  Since  "fscanf"
returns  an  integer  in all other cases, it would make more
sense for "fscanf" to return -1.

_4._9._6._7-_9  _v_f_p_r_i_n_t_f, _v_p_r_i_n_t_f, _v_s_p_r_i_n_t_f:

     The Rationale states that a format for  variable-length
argument   lists   was   rejected   because   the  functions
"vfprintf",  etc.  were  "more  controlled".   This  comment
confuses   us,   because  we  don't  understand  what  "more
controlled" means.  Very clearly,  the  "vfprintf"  approach
offers less freedom and therefore is less useful.

     We  suggest  that  "printf"  and  friends  obtain a new
specifier "%v", which accepts two arguments:  a  new  format
string  and a "va_list" of items to format.  This is similar
to the existing "%r" construct on UNIX systems.

     Given the "%v" specifier, writing functions to  perform
the  work of "vprintf" and friends is trivial.  However, the
opposite  is  _n_o_t  true  --  "vprintf"  and   friends   have
significant  difficulty  in  simulating  many of the results
that are possible with "%v".

     The "%v" approach is simply faster, more readable,  and
more  versatile  than  using  "vprintf"  and  friends.   For
example, a call to "printf" could take several normal  argu-
ments,  followed  by  a  "va_list"  argument  pointing  to a
variable list, followed  by  more  normal  arguments.   This
avoids  the  problem  of having to make three calls, one for
the normal arguments, one for the variable list, and one for
the remaining normal arguments.
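
     For reference, this is roughly what the multi-call
pattern looks like with the "v" functions: one call for the
fixed part and one for the variable list.  It is a sketch
only.  The helper name is hypothetical, and "snprintf" and
"vsnprintf" are C99 functions that are not part of the draft
under discussion.

```c
#include <stdarg.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical helper: writes "[tag] " followed by the formatted
   variable arguments into buf.  Returns the length written, or -1
   if the result does not fit. */
int format_tagged(char *buf, size_t size, const char *tag,
                  const char *fmt, ...)
{
    va_list ap;
    int head, tail;

    head = snprintf(buf, size, "[%s] ", tag);   /* call 1: fixed part */
    if (head < 0 || (size_t)head >= size)
        return -1;

    va_start(ap, fmt);                          /* call 2: variable list */
    tail = vsnprintf(buf + head, size - (size_t)head, fmt, ap);
    va_end(ap);
    if (tail < 0 || (size_t)tail >= size - (size_t)head)
        return -1;

    return head + tail;
}
```

A call such as format_tagged(buf, sizeof buf, "io", "%d files", 3)
leaves "[io] 3 files" in buf.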




_4._9._1_0._2  _T_h_e _f_e_o_f _F_u_n_c_t_i_o_n:

     The semantics of the EOF "indicator" are based  on  the
UNIX  stream I/O implementation.  Not all systems treat end-
of-file in this manner, so we suggest adopting the following
simple and consistent rule:

     "feof"  should return TRUE if and only if the next
     "getchar" will return  EOF  and  the  most  recent
     "getchar" also returned EOF.

(The  second  part  of  the  provision  is  needed  to avoid
Pascal's problem of having to read ahead.)

     Thus  "fseek"  should  _n_o_t  clear  the  EOF  indicator;
instead, it should re-evaluate it.  After a call like

          ungetc(non_EOF_character, stream);

"feof" should return FALSE.

     If  a program reaches end-of-file, then another program
grows the file, it should be possible  to  continue  reading
without explicitly clearing the EOF indicator.

_4._1_0._1._4  _T_h_e _s_t_r_t_o_d _F_u_n_c_t_i_o_n:

     What  do  "strtod"  and  related  functions  assign  to
"*endptr" if there is a range error?

_4._1_0._3  _M_e_m_o_r_y _M_a_n_a_g_e_m_e_n_t _F_u_n_c_t_i_o_n_s:

     The standard states that  pointer  values  returned  by
"malloc"  et  al may be assigned to a pointer to any type of
object, then used to access such  an  object  in  the  space
allocated.   We suggest that this be changed to read "may be
assigned to a pointer to any type of object  _w_h_o_s_e  _s_i_z_e  _i_s
_l_e_s_s  _t_h_a_n  _t_h_e  _a_m_o_u_n_t  _o_f  _m_e_m_o_r_y _r_e_q_u_e_s_t_e_d".  This allows
greater  efficiency  of  memory  allocation,  especially  on
machines  that  have  a  high alignment requirement for some
data types.  For  example,  some  machines  require  32-byte
alignment for their highest precision floating point, but it
is silly to hand out memory in 32 byte chunks when the  user
only requests a few bytes.

     It  would  also  be  useful  to  have  a  library func-
tion/macro similar to "malloc" that would take both a length
and  an  alignment as arguments.  This would allow for finer
allocation of memory, to shorter alignment boundaries.

     In order  to  make  such  a  function/macro  useful  in
portable   programs,  an  alignof  operator  would  be  very
convenient.  This operator would behave in much the same way
as sizeof:  it would return an integral value indicating the
alignment of a type or object.  For example,  if  a  machine
has  words  containing four bytes and a particular type must
be aligned on a word boundary, the result of  alignof  would
be  4  (indicating four-byte alignment).  The actual type of
the result of alignof would be  implementation-defined  like
"size_t".

     Note  that  alignof would allow programs to write their
own efficient portable memory allocators.  Memory  could  be
"nibbled"  away  in  alignments  suitable  to  whatever data
object needed the storage.  It would not be necessary to get
the largest possible alignment for _e_v_e_r_y object.
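
     As a rough sketch of the idea, the block below
approximates the proposed operator with a macro and uses it
in a small allocator that nibbles storage from a fixed pool.
The names ALIGNOF, Arena and arena_alloc are hypothetical
and appear in no draft; the macro relies on the familiar
"offsetof" trick and on the C99 type "uintptr_t", and it
only handles simple type names.  (C11 later standardized a
similar operator as _Alignof.)

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the proposed operator: the offset of a
   member placed after a single char is the alignment the compiler
   chose for that member's type. */
#define ALIGNOF(type) offsetof(struct { char pad; type member; }, member)

/* A fixed pool that is handed out piece by piece ("nibbled"). */
typedef struct {
    unsigned char *base;   /* start of the pool */
    size_t         size;   /* total bytes in the pool */
    size_t         used;   /* bytes consumed so far, padding included */
} Arena;

/* Return `size` bytes whose address is a multiple of `align`, or
   NULL if `align` is zero or the pool cannot satisfy the request. */
void *arena_alloc(Arena *a, size_t size, size_t align)
{
    uintptr_t next;
    size_t pad, room;

    if (align == 0)
        return NULL;
    next = (uintptr_t)(a->base + a->used);
    pad  = (size_t)((align - next % align) % align);  /* bytes to skip */
    room = a->size - a->used;
    if (pad > room || size > room - pad)
        return NULL;
    a->used += pad + size;
    return a->base + (a->used - size);
}
```

     A caller would request storage with, for example,
arena_alloc(&a, sizeof(double), ALIGNOF(double)), so that a
one-byte object costs one byte rather than a full
worst-case alignment unit.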

_4._1_0._4._3  _T_h_e _g_e_t_e_n_v _F_u_n_c_t_i_o_n:

     The   description  of  this  function  should  read  as
follows.

     The  "getenv"  function  searches  an  _e_n_v_i_r_o_n_m_e_n_t
     _l_i_s_t,  provided  by  the  host environment, for an
     entry identified  by  the  string  pointed  to  by
     "name".   The  set  of  environment  names and the
     method  for  altering  the  environment  list  are
     implementation-defined.
          The "getenv" function returns a pointer to  a
     string  containing  the  value associated with the
     given name.

Our point is that the

          name=value

format is strictly a UNIX concept and need  not  be  grafted
onto other techniques for handling environment variables.

     The  standard  should decide whether the returned value
is stored in a static storage area or  in  storage  obtained
through "malloc".
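
     Until the storage question is settled, a cautious
program can copy the value before doing anything else with
it.  The following is only a sketch; the helper name
getenv_copy is hypothetical, and it assumes nothing about
where "getenv" keeps its result.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: return a malloc'ed copy of the value
   associated with `name`, or NULL if the name is not in the
   environment list or memory is exhausted.  The caller frees it. */
char *getenv_copy(const char *name)
{
    const char *value = getenv(name);
    char *copy;

    if (value == NULL)
        return NULL;
    copy = malloc(strlen(value) + 1);
    if (copy != NULL)
        strcpy(copy, value);
    return copy;
}
```

     The copy remains valid whether the library returns a
pointer into a static area or into allocated storage.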

_4._1_0._4._4  _T_h_e _o_n_e_x_i_t _F_u_n_c_t_i_o_n:

     Why isn't "onexit" defined as follows?

          int onexit(void (*f)(void));

This simplifies the definition considerably.

_4._1_0._4._5  _T_h_e _s_y_s_t_e_m _F_u_n_c_t_i_o_n:

     The explanation of "system" should be expanded to make
it clearer that passing a null pointer is a query about
the existence of a command processor.
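
     The query form can be wrapped as shown in this minimal
sketch.  The function name have_command_processor is
hypothetical.

```c
#include <stdlib.h>

/* Hypothetical wrapper: returns 1 if the host environment reports
   a command processor, 0 if it does not.  No command is run on
   the program's behalf; the null pointer makes this a pure query. */
int have_command_processor(void)
{
    return system(NULL) != 0;
}
```

     A program would test have_command_processor() before
building a command string to pass to "system".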

_4._1_0._6._2  _T_h_e _d_i_v _F_u_n_c_t_i_o_n:

     We certainly recognize the need to  implement  a  well-
specified  integer  division and remainder operation, but we
do not believe the given "div" function suits the need.

     First, "div" is an inappropriate name  for  a  function
that performs both a division and a remainder operation.  In
fact, we believe that the function should _n_o_t  perform  both
operations.  Instead, you should have

          int _div(int numer,int denom);
          int _rem(int numer,int denom);

This approach has several advantages.

(a)  You  do  not  have  the  overhead  of  calculating  the
     remainder  when  you want the quotient, and vice versa.
     While it is true that many machines generate a quotient
     and remainder simultaneously, this practice is far from
     universal.  VAX machines, for example, can only perform
     division.   To calculate A%B, the machine must make the
     calculation A-(B*(A/B)).  It is expensive to  calculate
     this number when it may not even be needed.

(b)  On some machines, the two functions could be
     implemented as macros.  With a single function
     returning a structure, macros could never be used, even
     if the hardware did the division and remainder
     operations in the prescribed manner.

     We also note that the operation prescribed by the
standard's "div" function is the less useful of the two
alternatives.  In our experience, the operation that you
usually want to perform is the one that always gives a
positive remainder.  For example, it is much more common to
want (-2)/3 to have a quotient of -1 and a remainder of +1
than to have a quotient of 0 and a remainder of -2.  You
almost always want to move negative quotients towards
negative infinity, not towards zero.
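
     The two separate operations, with quotients moved
towards negative infinity, might look like the sketch below.
The names floor_div and floor_rem are hypothetical stand-ins
for the "_div" and "_rem" suggested above.  The code assumes
a nonzero denominator and is written to give the same answer
whether the built-in "/" and "%" truncate or floor.

```c
/* Quotient rounded towards negative infinity. */
int floor_div(int numer, int denom)
{
    int q = numer / denom;
    int r = numer % denom;

    /* A nonzero remainder whose sign differs from the divisor's
       means the built-in quotient was rounded towards zero. */
    if (r != 0 && ((r < 0) != (denom < 0)))
        q -= 1;
    return q;
}

/* Remainder that goes with floor_div: zero, or the same sign as
   the divisor (so never negative for a positive divisor). */
int floor_rem(int numer, int denom)
{
    int r = numer % denom;

    if (r != 0 && ((r < 0) != (denom < 0)))
        r += denom;
    return r;
}
```

     With these definitions floor_div(-2, 3) is -1 and
floor_rem(-2, 3) is +1, and in every case
numer == denom * floor_div(numer, denom) + floor_rem(numer, denom).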

_4._1_1._3._2  _T_h_e _s_t_r_n_c_a_t _F_u_n_c_t_i_o_n:

     It seems odd that "strncat" always adds a trailing '\0'
but "strncpy" does not.
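
     The difference can be observed directly.  In this
sketch the helper names and BUF_SIZE are hypothetical; each
helper fills a scratch buffer with '#' so that any '\0'
found afterwards must have been written by the library
function.

```c
#include <string.h>

#define BUF_SIZE 16   /* hypothetical scratch size */

/* Returns 1 if strncpy left a '\0' within the first n bytes of the
   destination, 0 if it did not, -1 if n is out of range. */
int strncpy_terminates(const char *src, size_t n)
{
    char buf[BUF_SIZE];

    if (n == 0 || n >= BUF_SIZE)
        return -1;
    memset(buf, '#', sizeof buf);
    strncpy(buf, src, n);
    return memchr(buf, '\0', n) != NULL;
}

/* Returns 1 if strncat, appending to an empty string, left a '\0'
   within the first n + 1 bytes, 0 if not, -1 if n is out of range. */
int strncat_terminates(const char *src, size_t n)
{
    char buf[BUF_SIZE];

    if (n == 0 || n >= BUF_SIZE)
        return -1;
    memset(buf, '#', sizeof buf);
    buf[0] = '\0';
    strncat(buf, src, n);
    return memchr(buf, '\0', n + 1) != NULL;
}
```

     For a source longer than n, such as "abcdef" with n
equal to 3, strncpy_terminates yields 0 while
strncat_terminates yields 1.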

_4._1_1._4  _C_o_m_p_a_r_i_s_o_n _F_u_n_c_t_i_o_n_s:

     In  the  interests  of  portability,  we  believe  that
character comparisons for "memcmp", "strcmp", and  "strncmp"
should  be  made  using  unsigned  char  instead   of   the
implementation-defined approach specified in the standard.
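
     A comparison carried out on unsigned char values, as we
propose, could be sketched as follows.  The name ustrcmp is
hypothetical; the point is only that the result no longer
depends on whether plain char is signed.

```c
/* Hypothetical strcmp variant: compares the strings byte by byte
   as unsigned char and returns a negative, zero or positive value. */
int ustrcmp(const char *s1, const char *s2)
{
    const unsigned char *p = (const unsigned char *)s1;
    const unsigned char *q = (const unsigned char *)s2;

    /* Advance while the bytes match and the end is not reached. */
    while (*p != '\0' && *p == *q) {
        p++;
        q++;
    }
    return (*p > *q) - (*p < *q);
}
```

     Under this rule a byte with the value 0x80 always sorts
after one with the value 0x7f, on every implementation.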

_4._1_1._5._6  _T_h_e _s_t_r_s_p_n _F_u_n_c_t_i_o_n:

     For greater  uniformity,  the  name  of  this  function
should  be changed to "strpspn".  This emphasizes the way it
parallels "strpbrk".

_4._1_1._6._2  _T_h_e _s_t_r_e_r_r_o_r _F_u_n_c_t_i_o_n:

     The standard should be more explicit about the
connection between the "errnum" argument for "strerror" and
the possible values of "errno".
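
     The usage we assume is intended is sketched below: a
library call sets "errno", and that value is then handed to
"strerror".  The function name range_error_text is
hypothetical, and the sketch relies on "strtol" reporting an
out-of-range conversion through "errno".

```c
#include <errno.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical example: provoke a range error in strtol and return
   the message strerror associates with the resulting errno value,
   or NULL if errno was unexpectedly left at zero. */
const char *range_error_text(void)
{
    errno = 0;
    (void)strtol("999999999999999999999999999999", NULL, 10);
    return errno != 0 ? strerror(errno) : NULL;
}
```

     The text of the message is implementation-defined, so a
portable program can display it but should not parse it.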

_4._1_2._1  _C_o_m_p_o_n_e_n_t_s _o_f _T_i_m_e:

     Again, we wonder why vowels have fallen into disrepute.
CLK_TCK could easily be named _CLK_TICK or _CLOCK_TICK.

     It  should  be  explicitly  stated  that values of type
"time_t" may not represent time in meaningful units and  may
not even give values that are uniformly distributed.

_4._1_2._2._1  _T_h_e _c_l_o_c_k _F_u_n_c_t_i_o_n:

     "clock"  is  a  poor  name  for a function that returns
processor time.   A  name  like  "processor_time"  would  be
better.

     The  description  of  "clock" says it returns processor
time used since some point in time related only  to  program
invocation.   We  believe  that  it  should  instead  return
processor time accumulated  since  some  previous  point  in
time,  e.g.  the  time  when  the user logged on.  To time a
particular  program,  the  user  would  make  two  calls  to
"clock":   one  at the beginning of execution and one at the
end (or whenever a time check is required).
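
     The two-call pattern can be sketched as follows.  The
names cpu_seconds and busy_work are hypothetical.  The
divisor is written CLOCKS_PER_SEC, the name found in
present-day <time.h>; the draft under review calls it
CLK_TCK.

```c
#include <time.h>

/* Hypothetical workload, present only so there is something to time. */
void busy_work(void)
{
    volatile unsigned long i;

    for (i = 0; i < 100000UL; i++)
        ;
}

/* Hypothetical helper: run work() and return the processor time it
   consumed, in seconds, or -1.0 if clock() cannot report it.  Only
   the difference between the two readings is used, so the starting
   point from which clock() counts does not matter. */
double cpu_seconds(void (*work)(void))
{
    clock_t start = clock();
    clock_t end;

    work();
    end = clock();
    if (start == (clock_t)-1 || end == (clock_t)-1)
        return -1.0;
    return (double)(end - start) / CLOCKS_PER_SEC;
}
```

     Because only a difference is taken, this code works
equally well if "clock" counts from program invocation or
from some earlier point such as the start of the session.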

     The reason for our suggestion  is  that  many  non-UNIX
systems  have  no  system  call  to get per-process timings.
Instead, many just keep track of  total  session  time.   If
implementations  are  forced to support "clock" as it is now
described, many  implementations  will  have  to  put  "time
check"  code  into  the  set-up routine for every C program.
This seems very inefficient, especially because  "clock"  is
not the sort of function that will be used frequently.

     If  a  program calls another process using the "system"
function, it may be more efficient on some systems  for  the
processor  time  of  the child process to be included in the
parent's time, while on other systems it is  more  efficient
not  to  include  the child's CPU time.  Thus, this behavior
should be implementation-defined.

_4._1_2._2._4  _T_h_e _t_i_m_e _F_u_n_c_t_i_o_n:

     The standard states that "time" returns

          ((time_t)-1)

if the current time is not available.  However, -1 may  well
be a valid time value on many systems.

     If   you   are   going   to  select  a  reserved  value
arbitrarily, choosing 0 makes more sense,  since  it  allows
tests of the form

          if (time(p)) ...

A   better  solution  would  be  to  create  a  macro  named
_TIME_UNAVAILABLE with

          #define _TIME_UNAVAILABLE ( (time_t) X )

where X is some implementation-defined value.  "time"  would
return this value if the current time was not available.
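
     A program written against such a macro might look like
this sketch.  Since names with a leading underscore belong
to the implementation, the sketch defines its own
hypothetical TIME_UNAVAILABLE, giving it the value the draft
currently prescribes; current_time is likewise hypothetical.

```c
#include <time.h>

/* Hypothetical user-level stand-in for the proposed macro, set to
   the value the draft currently specifies. */
#define TIME_UNAVAILABLE ((time_t)-1)

/* Hypothetical helper: store the current time in *out and return 1,
   or return 0 and leave *out alone if the time is not available. */
int current_time(time_t *out)
{
    time_t now = time((time_t *)0);

    if (now == TIME_UNAVAILABLE)
        return 0;
    *out = now;
    return 1;
}
```

     The caller tests the return value and never compares a
"time_t" against a literal, so the reserved value can differ
from one implementation to another.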

_4._1_2._3  _T_i_m_e _M_a_n_i_p_u_l_a_t_i_o_n _F_u_n_c_t_i_o_n_s:

     It  has  always been a nuisance to get the current time
of day in string format because you must  declare  your  own
variable  of  type  "time_t".   The library needs a function
that behaves like "ctime" but which is declared with

          char *timefunc(time_t timer);

We could then use

          timefunc( time( (time_t *) 0 ) )

to get the current time-of-day string.
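
     Given the existing "ctime", such a function is a one
line wrapper.  This is a sketch only; "timefunc" is the
placeholder name used above, not a name from any draft.

```c
#include <time.h>

/* Sketch of the proposed function: like ctime, but the time is
   passed by value, so the caller needs no time_t variable.  The
   result points to the same storage that ctime uses. */
char *timefunc(time_t timer)
{
    return ctime(&timer);
}
```

     The call timefunc(time((time_t *) 0)) then yields the
current time-of-day string in a single expression.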

_S_u_m_m_a_r_y:

     In order to avoid a deluge of reserved words, all newly
introduced symbols should follow a simple rule, e.g.
beginning with an underscore.  Ambiguities in the
definitions of structures, unions, and typedef constructs
should be clarified or eliminated.

     If you have any questions or comments about any of  the
material  in  this  document,  please  contact Peter Fraser,
manager  of  the  Software  Development  Group,   at   (519)
888-4546.