C/C++ coding/doc standards (LONG!)

loki mwjester at wsuiar.uucp
Thu Oct 18 23:43:06 AEST 1990


In article <8143 at scolex.sco.COM> markd (Mark Diekhans) writes:
>Can anyone point me to C and/or C++ coding standards or guidelines.....

Late last year, someone posted to the net the Indian Hill style guide, as
revised by Henry Spencer and others, in troff form.  As an exercise in
teaching myself LaTeX, I did the conversion to cstyle.tex.  The file is
appended below.

Note that stuff which was commented out in the troff file does not appear
here, so it may not be as complete as the one mentioned in someone's message
on ftp locations.  Many thanks to the original poster who put it up!
(Sorry, I forgot who!)

------------------ snip, snip ------------------------------------------------

\documentstyle [twoside,11pt,titlepage] {article}
\setlength{\parskip}{.1in}
\title{\Huge\bf Recommended C Style\\
and\\
Coding Standards}
\author{L.W. Cannon\\
R.A. Elliott\\
L.W. Kirchhoff\\
J.H. Miller\\
J.M. Milner\\
R.W. Mitze\\
E.P. Schan\\
N.O. Whittington\\
\it Bell Labs\\
\\
Henry Spencer\\
\it Zoology Computer Systems\\
\it University of Toronto\\
\\
David Keppel\\
\it EECS, UC Berkeley\\
\it CS, University of Washington}
\date{November 18, 1989}
\begin{document}
\hoffset=.5in
\maketitle
\begin{abstract}
 This document is an updated version of the Indian Hill C Style and Coding 
Standards paper, with modifications by the last two authors. It describes a 
recommended coding standard for C programs. The scope is coding style, not 
functional organization. 
\end{abstract}
\pagenumbering{roman}
\pagestyle{headings}
\tableofcontents
\newpage
\pagenumbering{arabic}
\section{Introduction}
 This document is a modified version of a document from a committee formed at
Indian Hill to establish a common set of coding standards and recommendations
for the Indian Hill community. The scope of this work is C coding style, rather
than the functional organization of programs or general issues such as the use
of gotos. We\footnote{The opinions in this document do not reflect the opinions
of all authors.  This is still an evolving document.  Please send comments and
suggestions to pardo at cs.washington.edu or
(rutgers,cornell,ucsd,ubc-cs,tektronix)!uw-beaver!june!pardo}
  have tried to combine previous work [1,6,8] on C style into a
uniform set of standards that should be appropriate for any project using C,
although parts are biased towards particular systems. Of necessity, these
standards cannot cover all situations. Experience and informed judgement count
for much. Programmers who encounter unusual situations should consult (1)
experienced C programmers or (2) code written by experienced C programmers,
preferably following these rules. 

 The standards in this document are not of themselves required, but individual
institutions or groups may adopt part or all of them as a part of program
acceptance. It is therefore likely that others at your institution will code in
a similar style. Ultimately, the goal of these standards is to increase
portability, reduce maintenance, and above all improve clarity. 

 Many of the style choices here are somewhat arbitrary. Mixed coding style is
harder to maintain than bad coding style. When changing existing code it is
better to conform to the style (indentation, spacing, commenting, naming
conventions) of the existing code than it is to blindly follow this document. 

{\em ``To be clear is professional; not to be clear is unprofessional.'' ---
Sir Ernest  Gowers}.
\newpage
\section{File Organization}
 A file consists of various sections that should be separated by several blank
lines. Although there is no maximum length limit for source files, files with
more than about 1000 lines are cumbersome to deal with. The editor may not have
enough temp space to edit the file, compilations will go more slowly, etc. Many
rows of asterisks, for example, present little information compared to the time
it takes to scroll past, and are discouraged. Lines longer than 80 columns are
not handled well by all terminals and should be avoided if possible.
Excessively long lines which result from deep indenting are often a symptom of
poorly-organized code. 

\subsection{File Naming Conventions}

 File names are made up of a base name, and an optional period and suffix. The
first character of the name should be a letter, and all characters (except the
period) should be lowercase letters or digits. The base name should be
8 or fewer characters and the suffix should be 3 or fewer characters (four, if
you include the period). These rules apply to both program files and default
files used and produced by the program (e.g., ``rogue.sav''). 

 Some compilers and tools require certain suffix conventions for names of
files [5]. The following suffixes are required: 
\begin{itemize}
\item C source file names must end in {\em .c}
\item Assembler source file names must end in {\em .s}
\end{itemize}
The following conventions are universally followed:
\begin{itemize}
\item Relocatable object file names end in {\em .o}
\item Include header file names end in {\em .h}
\footnote {An alternate convention that may be preferable in multi-language
environments is to suffix both the language type and {\em .h} (e.g., ``foo.c.h''
or ``foo.ch'').}
\item Yacc source file names end in {\em .y}
\item Lex source file names end in {\em .l}
\end{itemize}
 C++ has compiler-dependent suffix conventions, including {\em .c}, {\em ..c},
{\em .cc}, {\em .c.c}, and {\em .C}. Since much C code is also C++ code, there
is no clear solution. 

 In addition, it is conventional to use `Makefile' (not `makefile') for the
control file for {\em make} (for systems that support it) and `README' for a
summary of the contents of the directory or directory tree. 

\subsection{Program Files}

 The suggested order of sections for a program file is as follows: 
\begin{enumerate}
\item First in the file is a prologue that tells what is in that file. A
description of the purpose of the objects in the files (whether they be
functions, external data declarations or definitions, or something else) is
more useful than a list of the object names. The prologue may optionally
contain author(s), revision control information, references, etc. 
\item Any header file includes should be next. If the include is for a
non-obvious reason, the reason should be commented. In most cases, system
include files like {\em stdio.h} should be included before user include files. 
\item Any defines and typedefs that apply to the file as a whole are next. One
common order is to have ``constant'' macros first, then ``function'' macros,
then typedefs and enums. 
\item Next come the global (external) data declarations, usually in the order:
externs, non-static globals, static globals. If a set of defines applies to a
particular piece of global data (such as a flags word), the defines should be
immediately after the data declaration or embedded in structure declarations,
indented to put the {\em defines} one level deeper than the first keyword of the
declaration to which they apply.
\item The functions come last, and should be in some sort of meaningful order.
Like functions  should appear together. A ``breadth-first'' approach (functions
on a similar level of  abstraction together) is preferred over depth-first
(functions defined as soon as possible  before or after their calls).
Considerable judgement is called for here. If defining large  numbers of
essentially-independent utility functions, consider alphabetical order. 
\end{enumerate}
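As a rough illustration, here is a small file laid out in that order. The file
name {\em fleet.c}, the limit {\em MAXBOATS} and the functions {\em addboat}
and {\em report} are hypothetical; only the order of the sections matters.

\begin{verbatim}
/*
 *      fleet.c: keep a running count of boats, and of sailboats.
 *      Hypothetical file; it exists only to show the section order.
 */

#include <stdio.h>                      /* for printf() in report() */

#define MAXBOATS        100             /* "constant" macros first */
#define ISSAIL(t)       ((t) != MOTOR)  /* then "function" macros */

enum bt_t { KETCH, YAWL, SLOOP, SQRIG, MOTOR };  /* then typedefs and enums */

int             nboats;                 /* non-static global: all boats */
static int      nsail;                  /* static global: sailboats only */

/*
 *      Record one boat of the given type.
 *      Returns 0, or -1 if the fleet is already full.
 */
        int
addboat(enum bt_t type)
{
        if (nboats >= MAXBOATS)
                return (-1);
        nboats++;
        if (ISSAIL(type))
                nsail++;
        return (0);
}

/*
 *      Print the counts on the standard output.
 */
        void
report(void)
{
        printf("%d boats, %d under sail\n", nboats, nsail);
}
\end{verbatim}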
\subsection {Header Files}

Header files are files that are included in other files prior to compilation
by the C preprocessor. Some are defined at the system level like {\em stdio.h}
which must be included by any program using the standard I/O library. Header
files are also used to contain data declarations and defines that are needed by
more than one program. Header files should be functionally organized, i.e.,
declarations for separate subsystems should be in separate header files. Also,
if a set of declarations is likely to change when code is ported from one
machine to another, those declarations should be in a separate header file. 

 Avoid private header filenames that are the same as library header filenames.
The statement \verb|#include "math.h"| will include the standard library math
header file if the intended one is not found in the current directory. If this
is what you want to happen, comment this fact. Don't use absolute pathnames for
header files. Use the $<$name$>$ construction for getting them from a standard
place, or define them relative to the current directory. The ``include-path''
option of the C compiler (-I on many systems) is the best way to handle
extensive private libraries of header files; it permits reorganizing the
directory structure without having to alter source files. 

 Defining variables in a header file is often a poor idea. Frequently it is a
symptom of poor partitioning of code between files. Some objects like typedefs
and initialized data definitions cannot be seen twice by the compiler in one
compilation. On some systems, repeating uninitialized declarations without
the {\em extern} keyword also causes problems. Repeated declarations can happen
if include files are nested and will cause the compilation to fail. 
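The usual remedy is to put only {\em extern} declarations in the header and the
single defining declaration in one {\em .c} file. In the sketch below both
halves are shown in one listing so that it compiles on its own; the names
{\em nerrors} and {\em note\_error} are hypothetical.

\begin{verbatim}
/* ---- errs.h: declarations only; harmless if seen more than once ---- */
extern int      nerrors;                /* defined in errs.c */
extern int      note_error(void);       /* defined in errs.c */

extern int      nerrors;                /* a repeated extern is legal */

/* ---- errs.c: the one and only definition ---- */
int     nerrors = 0;                    /* a second initialized definition
                                           would not compile */

/*
 *      Count one more error and return the new total.
 */
        int
note_error(void)
{
        return (++nerrors);
}
\end{verbatim}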

 Header files should not be nested. The prologue for a header file should,
therefore, describe what other headers need to be \#included for the header to
be functional. In extreme cases, where a large number of header files are to be
included in several different source files, it is acceptable to put all common
\#includes in one include file. 

 It is common to put the following into each .h file to prevent accidental
double-inclusion. 
\begin{verbatim}
#ifndef EXAMPLE_H
#define EXAMPLE_H
\end{verbatim}
  ... {\em /* body of example.h file */}
\begin{verbatim}
#endif /* EXAMPLE_H */ 
\end{verbatim}

This double-inclusion mechanism should not be relied upon, particularly to
perform nested includes. 
\newpage
\section{Comments}

{\em ``When the code and the comments disagree, both are probably wrong.'' ---
Norm Schreyer}

 The comments should describe what is happening, how it is being done, what
parameters mean, which globals are used and which are modified, and any
restrictions or bugs. Avoid, however, comments that are clear from the code.
Such information rapidly gets out of date. Comments that disagree with the code
are of negative value. Short comments should be {\em what} comments, such as
``compute mean value'', rather than {\em how} comments such as ``sum of values
divided by n''. C is not assembler; putting a comment at the top of a 3-10 line
section telling what it does overall is often more useful than a comment on
each line describing micrologic. 

 Comments should justify offensive code. The justification should be that
something bad will happen if unoffensive code is used. Just making code faster
is not enough to rationalize a 
hack; the performance must be {\em shown} to be unacceptable without the hack.
The comment should explain the unacceptable behavior and describe why the hack
is a ``good'' fix. 

 Comments that describe data structures, algorithms, etc., should be in block
comment form with the opening /* in column one, a * in column 2 before each
line of comment text, and the closing */ in columns 2-3. An alternative is to
have ** in columns 1-2, and put the closing */ also in columns 1-2. 

\begin{verbatim}
/*
 *      Here is a block comment.
 *      The comment text should be tabbed or spaced over uniformly.
 *      The opening slash-star and closing star-slash are alone on a line.
 */

/*
** Alternate format for block comments
*/ 
\end{verbatim}

 Note that \verb|grep '^.\*'| will catch all block comments in the file
\footnote{Some automated program-analysis packages use different characters
before comment lines as a marker for lines  with specific items of information.
In particular, a line with a `-' in a comment preceding a function is sometimes
assumed to be a one-line summary of the function's purpose.}. Very long
block comments such as drawn-out discussions and copyright notices often start
with /* in column one, no leading * before lines of text, and the closing */ in
columns 1-2. Block comments inside a function are appropriate, and they should
be tabbed over to the same tab setting as the code that they describe. One-line
comments alone on a line should be indented to the tab setting of the code that
follows.

\begin{verbatim}
 if (argc > 1) {
        /* Get input file from command line. */
        if (freopen(argv[1], "r", stdin) == NULL) {
                perror (argv[1]);
        }
 }
\end{verbatim}

 Very short comments may appear on the same line as the code they describe, and
should be tabbed over to separate them from the statements. If more than one
short comment appears in a block of code they should all be tabbed to the same
tab setting. 

\begin{verbatim}
 if (a == 2) {
        return(TRUE);                 /* special case */
 }  else  {
        return(isprime(a));           /* works only for odd a */
 }
\end{verbatim}
\newpage
\section{Declarations}

 Global declarations should begin in column 1. All external data declarations
should be preceded by the extern keyword. If an external variable is an array
that is defined with an explicit size, then the array bounds must be repeated
in the extern declaration unless the size is always encoded in the array (e.g.,
a read-only character array that is always null-terminated). Repeated size
declarations are particularly beneficial to someone picking up code written by
another. 

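 The ``pointer'' qualifier, `*', should be with the variable name rather than
with the type. In C the `*' applies to each declared name separately, so the
second form below declares only {\em s} as a pointer; {\em t} and {\em u} are
plain {\em char}s.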
\begin{verbatim}
 char     *s, *t, *u; 
\end{verbatim}
instead of 
\begin{verbatim}
 char*    s, t, u; 
\end{verbatim}
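A quick way to confirm which names are pointers is to compare the sizes of the
declared objects; in this sketch the names {\em p1}, {\em p2}, {\em q1} and
{\em q2} are arbitrary.

\begin{verbatim}
char    *p1, *p2;       /* both are pointers to char */
char*   q1, q2;         /* only q1 is a pointer; q2 is a plain char */
\end{verbatim}

Here {\em sizeof q2} is always 1, while the three pointers have the
(machine-dependent) size of a {\em char *}.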

 Unrelated declarations, even of the same type, should be on separate lines. A
comment describing the role of the object being declared should be included,
with the exception that a list of \#defined constants does not need comments if
the constant names are sufficient documentation. The names, values, and
comments should be tabbed so that they line up underneath each other. Use the
tab character rather than blanks. For structure and union template
declarations, each element should be alone on a line with a comment describing
it. The opening brace (\{) should be on the same line as the structure tag,
and the closing brace (\}) should be in column 1. 

\begin{verbatim}
 struct boat {
        int     wllength;       /* water line length in meters */
        int     type;           /* see below */
        long    sailarea;       /* sail area in square mm */
 }; 

 /*
  * defines for boat.type
  */
 #               define KETCH (1)
 #               define YAWL  (2)
 #               define SLOOP (3)
 #               define SQRIG (4)
 #               define MOTOR (5) 
\end{verbatim}

These defines are sometimes put right after the declaration of type, within the
struct declaration, with enough tabs after the `\#' to indent {\em define} one
level more than the structure member declarations. When the actual values are
unimportant, the enum facility is better\footnote{enums might be better
anyway}.
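A sketch of that nested layout, reusing the boat example with the same values;
only the placement of the defines differs.

\begin{verbatim}
struct boat {
        int     wllength;       /* water line length in meters */
        int     type;           /* see below */
#               define  KETCH   (1)
#               define  YAWL    (2)
#               define  SLOOP   (3)
#               define  SQRIG   (4)
#               define  MOTOR   (5)
        long    sailarea;       /* sail area in square mm */
};
\end{verbatim}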

\begin{verbatim}
 enum bt_t { KETCH, YAWL, SLOOP, SQRIG, MOTOR };
 struct boat {
        int             wllength;       /* water line length in meters */
        enum bt_t       type;           /* what kind of boat */
        long            sailarea;       /* sail area in square mm */
 };
\end{verbatim}

 Any variable whose initial value is important should be explicitly
initialized, or at the very least should be commented to indicate that C's
default initialization to zero is being relied upon. The empty initializer,
``\{\}'', should never be used. Structure initializations should be fully
parenthesized with braces. Constants used to initialize longs should be
explicitly long. 

\begin{verbatim}
 int            x = 1;
 char           *msg = "message";
 struct boat    winner[] = {
        { 40, YAWL, 6000000L },
        { 28, MOTOR, 0L },
        { 0 },
 }; 
\end{verbatim}

 In any file which is part of a larger whole rather than a self-contained
program, maximum use should be made of the static keyword to make functions and
variables local to single files.
Variables in particular should be accessible from other files only when there
is a clear need that cannot be filled in another way. Such usages should be
commented to make it clear that another file's variables are being used; the
comment should name the other file. If your debugger hides static objects you
need to see during debugging, declare them as STATIC and \#define STATIC as
needed. 
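One way to arrange this is sketched below. It assumes a hypothetical
{\em DEBUG} macro that is defined only for debugging builds; {\em ncalls} and
{\em bump} are likewise invented for the example.

\begin{verbatim}
#ifdef DEBUG
#define STATIC                  /* empty: object gets external linkage */
#else
#define STATIC  static
#endif

STATIC int      ncalls;         /* calls to bump() so far */

/*
 *      Count one call and return the running total.
 */
        int
bump(void)
{
        return (++ncalls);
}
\end{verbatim}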

 The most important few types should be highlighted by typedeffing them, even
if they are only integers, as the unique name makes the program easier to read
(as long as there are only a {\em few} things typedeffed to integers!).
Structures may be typedeffed when they are declared. Give the struct and
the typedef the same name. 

\begin{verbatim}
 typedef struct splodge_t {
         int sp_count;
         char *sp_name, *sp_alias;
 } splodge_t; 
\end{verbatim}

 The return type of functions should always be declared. If function prototypes
are available, use them. One common mistake is to omit the declaration of
external math functions that return double. The compiler then assumes that the
return value is an integer and the bits are dutifully converted into a
(meaningless) floating point value. 
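Including the proper header supplies the missing declaration. The sketch below
uses {\em strtod} from $<$stdlib.h$>$, which also returns {\em double} and is
open to the same mistake; the wrapper {\em parse\_real} is hypothetical. (C99
removed implicit function declarations, but many compilers still accept them
with only a warning.)

\begin{verbatim}
#include <stdlib.h>     /* declares double strtod(const char *, char **) */

/*
 *      Convert the leading part of s to a double.
 *      Without the #include, an old compiler would assume strtod()
 *      returns int and garble the result.
 */
        double
parse_real(const char *s)
{
        return (strtod(s, (char **) NULL));
}
\end{verbatim}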
\newpage
\section{Function Declarations}

 Each function should be preceded by a block comment prologue that gives a
short description of what the function does and (if not clear) how to use it.
Discussion of non-trivial design decisions and side-effects is also
appropriate. Avoid duplicating information clear from the code. 

 The function return type should be alone on a line, indented one
stop\footnote{``Tabstops'' can be blanks (spaces) inserted by your editor
in clumps of 2, 4, or 8. Use actual tabs where possible.}. Do
not default to int; if the function does not return a value then
it should be given return type void\footnote
{\#define void or \#define void int for compilers without the void
keyword.}. If the value returned requires a long explanation,
it should be given in the prologue; otherwise it can be on the same line as the
return type, tabbed over. The function name (and the formal parameter list)
should be alone on a line, in column 1. Destination (return value) parameters
should generally be first (on the left). All formal parameter declarations,
local declarations and code within the function body should be tabbed over one
stop. The opening brace of the function body should be alone on a line
beginning in column 1. 

 Each parameter should be declared (do not default to int). In general each
variable declaration should be on a separate line with a comment describing the
role played by the variable in the function. Loop counters called ``i'', and
string pointers called ``s'' are typically excluded. If a group of functions
all have a like parameter or local variable, it helps to call the repeated
variable by the same name in all functions. Like parameters should also appear
in the same place in the various argument lists. 

 Comments for parameters and local variables should be tabbed so that they line
up underneath each other. Local variable declarations should be separated
from the function's statements by a blank line. 
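Putting these rules together with an ANSI-style parameter list gives a layout
like the following. The helper {\em copyname} is hypothetical, chosen so that
its destination parameter comes first.

\begin{verbatim}
#include <stddef.h>             /* size_t */

/*
 *      Copy at most n-1 characters of src into dst and always
 *      terminate dst with a NUL.  n must be at least 1.
 */
        void
copyname(char *dst, const char *src, size_t n)
{
        size_t  i;              /* index into both strings */

        for (i = 0; i + 1 < n && src[i] != '\0'; i++)
                dst[i] = src[i];
        dst[i] = '\0';
}
\end{verbatim}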

 Be careful when you use or declare functions that take a variable number of
arguments (``varargs''). There is no truly portable way to do varargs in C.
Better to design an interface that uses a fixed number of arguments. If you
must have varargs, use the library macros for declaring functions with variant
argument lists. 
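With an ANSI C compiler, the library macros in question are those of
$<$stdarg.h$>$ (older systems used $<$varargs.h$>$, which has a different
interface). A minimal sketch, assuming a hypothetical {\em sumints} whose first
argument gives the number of {\em int} arguments that follow:

\begin{verbatim}
#include <stdarg.h>

/*
 *      Return the sum of the n int arguments that follow n.
 */
        int
sumints(int n, ...)
{
        va_list ap;             /* walks the variable arguments */
        int     total = 0;      /* running sum */
        int     i;              /* arguments consumed so far */

        va_start(ap, n);
        for (i = 0; i < n; i++)
                total += va_arg(ap, int);
        va_end(ap);
        return (total);
}
\end{verbatim}

Nothing checks that the caller really passes {\em n} further {\em int}s, which
is one more reason to prefer a fixed argument list.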

 If the function uses any external variables (or functions) that are not
declared globally in the file, these should have their own declarations in the
function body using the {\em extern} keyword. 

 Avoid local declarations that override declarations at higher levels. In
particular, local variables should not be redeclared in nested blocks. Although
this is valid C, the potential confusion is enough that {\em lint} will complain
about it when given the -h option.
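A small, hypothetical illustration of the confusion: the declaration inside the
loop hides the outer {\em count}, so the function below always returns 0.

\begin{verbatim}
/*
 *      Intended to count the elements of v[0..n-1] equal to want.
 *      BUG (deliberate): the inner "count" hides the outer one.
 */
        int
count_bad(const int *v, int n, int want)
{
        int     count = 0;      /* the count we mean to return */
        int     i;

        for (i = 0; i < n; i++) {
                int     count = 0;      /* hides the outer count */

                if (v[i] == want)
                        count++;        /* bumps the inner copy only */
        }
        return (count);         /* still 0 */
}
\end{verbatim}

Deleting the inner declaration makes the function return the intended count.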
\newpage
\section{Whitespace}
\begin{verbatim}
 int i;main(){for(;i["]<i;++i){--i;}"];read('-'-'-',i+++"hell\
 o, world!\n",'/'/'/'));}read(j,i,p){write(j/p+p,i---j,i/i);}
\end{verbatim}
{\em Dishonorable mention, Obfuscated C Code Contest, 1984.  Author requested
anonymity.}

 Use whitespace generously, vertically and horizontally. Indentation and
spacing should reflect the block structure of the code; e.g., there should be
at least 2 blank lines between the end of one function and the comments for the
next. 

 A long string of conditional operators should be split onto separate lines. 
\begin{verbatim}
 if (foo->next==NULL && totalcount<needed && needed<=MAX_ALLOT
        && server_active(current_input)) { ... 
\end{verbatim}
might be better as 
\begin{verbatim}
 if (foo->next == NULL
        && totalcount < needed
        && needed <= MAX_ALLOT
        && server_active(current_input))  {   ... 
\end{verbatim}
Similarly, elaborate for loops should be split onto different lines. 
\begin{verbatim}
 for (curr = *listp, trail = listp;
        curr != NULL;
        trail = &(curr->next), curr = curr->next )
 {
        ... 
\end{verbatim}
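In context, such a loop might appear in a function like the hypothetical
{\em unlink\_node} below. Because {\em trail} always points at the link that
leads to {\em curr}, a node can be spliced out without a special case for the
head of the list.

\begin{verbatim}
#include <stddef.h>             /* NULL */

typedef struct node_t {
        int             value;  /* payload */
        struct node_t   *next;  /* next node, or NULL */
} node_t;

/*
 *      Unlink the first node holding want from the list *listp.
 *      Returns that node, or NULL if want is not in the list.
 */
        node_t *
unlink_node(node_t **listp, int want)
{
        node_t  *curr;          /* node being examined */
        node_t  **trail;        /* the link that points at curr */

        for (curr = *listp, trail = listp;
                curr != NULL;
                trail = &(curr->next), curr = curr->next)
        {
                if (curr->value == want) {
                        *trail = curr->next;    /* splice curr out */
                        return (curr);
                }
        }
        return (NULL);
}
\end{verbatim}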

Other complex expressions, particularly those using the ternary (?:) operator,
are best split on to several lines, too. 
\begin{verbatim}
 c = (a == b)
        ? d + f(a)
        : f(b) - d; 
\end{verbatim}
\newpage
\section{Examples}
\begin{verbatim}
 /*
  *     Determine if the sky is blue by checking that it isn't night.
  *     CAVEAT: Only sometimes right. May return TRUE when the answer
  *     is FALSE.
  *     NOTE: Uses `hour' from `hightime.c'. Returns `int' for
  *     compatibility with the old version.
  */
        int                              /* TRUE or FALSE */
  skyblue()
  {
        extern int      hour;            /* current hour of the day */ 

        if (hour < MORNING || hour > EVENING) {
                return (FALSE);          /* black */
        } else {
                return (TRUE);           /* blue */
        }
  }


 /*
  *      Find the last element in the linked list
  *      pointed to by nodep and return a pointer to it.
  *      Return NULL if there is no last element.
  */
         node_t *
 tail(nodep)
         node_t          *nodep;         /* pointer to head of list */
 {
         register node_t *np;            /* advances to NULL */
         register node_t *lp;            /* follows one behind np */ 

         if (nodep == NULL)
                 return (NULL);
         np = lp = nodep;
         while ((np = np->next) != NULL) {
                 lp = np;
         }
         return (lp);
 }  
\end{verbatim}
\newpage
\section{Simple Statements}
 There should be only one statement per line unless the statements are very
closely related. 
\begin{verbatim}
 case FOO:   oogle (zork);  boogle (zork);  break;
 case BAR:   oogle (bork);  boogle (zork);  break;
 case BAZ:   oogle (gork);  boogle (bork);  break;
\end{verbatim}

Always document a null body for a for or while statement so that it is clear
that the null body is intentional and not missing code. 
\begin{verbatim}
 while (*dest++ = *src++)
         ;         /* VOID */ 
\end{verbatim}

Do not default the test for non-zero, i.e. 
\begin{verbatim}
 if (f() != FAIL) 
\end{verbatim}
is better than 
\begin{verbatim}
 if (f()) 
\end{verbatim}
even though FAIL may have the value 0, which C considers to be false. An
explicit test will help you out later when somebody decides that a failure
return should be -1 instead of 0. Explicit comparison should be used even if
the comparison value will never change; e.g., ``if (!(bufsize \% sizeof(int)))''
should be written instead as ``if ((bufsize \% sizeof(int)) == 0)'' to reflect
the numeric (not boolean) nature of the test. A frequent trouble spot is using
strcmp to test for string equality, where the result should {\em never} 
{\em ever} be defaulted. The preferred approach is to define a macro
 {\em STREQ}. 

\begin{verbatim}
 #define STREQ(a, b) (strcmp((a), (b)) == 0) 
\end{verbatim}
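A short sketch of both points, using {\em STREQ} with a hypothetical
{\em rigcode} function whose failure value {\em FAIL} is $-1$. A defaulted test
such as {\tt if (rigcode(name))} would treat that failure as success.

\begin{verbatim}
#include <string.h>

#define STREQ(a, b)     (strcmp((a), (b)) == 0)
#define FAIL            (-1)    /* hypothetical failure return */

/*
 *      Map a rig name to a small code.  Returns FAIL for an unknown name.
 */
        int
rigcode(const char *name)
{
        if (STREQ(name, "ketch"))
                return (1);
        if (STREQ(name, "yawl"))
                return (2);
        return (FAIL);
}
\end{verbatim}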

The non-zero test is often defaulted for predicates and other functions or
expressions which meet the following restrictions: 
\begin{itemize}
\item Returns 0 for false, nothing else. 

\item Is named so that the meaning of (say) a `true' return is absolutely
obvious. Call a predicate isvalid or valid, not checkvalid. 
\end{itemize}

 It is common practice to declare a boolean type ``bool'' in a global include
file. The special names improve readability immensely. 

\begin{verbatim}
 typedef int     bool;
 #define FALSE   0
 #define TRUE    1
\end{verbatim}
or 
\begin{verbatim}
 typedef enum { NO=0, YES } bool; 
\end{verbatim}

Even with these declarations, do not check a boolean value for equality with 1
(TRUE, YES, etc.); instead test for inequality with 0 (FALSE, NO, etc.). Most
functions are guaranteed to return 0 if false, but only non-zero if true. Thus,
\begin{verbatim}
 if (func() == TRUE) { ... 
\end{verbatim}
must be written 
\begin{verbatim}
 if (func() != FALSE) { ... 
\end{verbatim}
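The standard character-class functions are a familiar case: {\em isdigit} from
$<$ctype.h$>$ returns some non-zero value for a digit, not necessarily 1. A
minimal sketch, with a hypothetical {\em ndigits}:

\begin{verbatim}
#include <ctype.h>

#define FALSE   0
#define TRUE    1

/*
 *      Count the decimal digits in s.
 */
        int
ndigits(const char *s)
{
        int     n = 0;          /* digits seen so far */

        for (; *s != '\0'; s++) {
                /* compare with FALSE: isdigit() need not return TRUE */
                if (isdigit((unsigned char) *s) != FALSE)
                        n++;
        }
        return (n);
}
\end{verbatim}

The cast to {\em unsigned char} keeps negative {\em char} values away from
{\em isdigit}, which is only defined for {\em EOF} and values representable as
{\em unsigned char}.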

 There is a time and a place for embedded assignment statements. In some
constructs there is no better way to accomplish the results without making the
code bulkier and less readable.

\begin{verbatim}
 while ((c = getchar()) != EOF) {
         /* process the character */
 }
\end{verbatim}
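In the loop above, {\em c} must be declared {\em int} rather than {\em char},
so that {\em EOF} stays distinguishable from every valid character. The same
embedded-assignment idiom applies when scanning a string, as in this
hypothetical {\em count\_lines}:

\begin{verbatim}
/*
 *      Count the newline characters in s.
 */
        int
count_lines(const char *s)
{
        int     c;              /* current character */
        int     n = 0;          /* newlines seen so far */

        while ((c = *s++) != '\0') {
                if (c == '\n')
                        n++;
        }
        return (n);
}
\end{verbatim}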

The ++ and -- operators count as assignment statements. So, for many purposes,
do functions with side effects. Using embedded assignment statements to
improve run-time performance is also possible. However, one should consider
the tradeoff between increased speed and decreased maintainability that results
when embedded assignments are used in artificial places. For example, 
\begin{verbatim}
 a = b + c;
 d = a + r; 
\end{verbatim}
should not be replaced by 
\begin{verbatim}
 d = (a = b + c) + r; 
\end{verbatim}
even though the latter may save one cycle. In the long run the time difference
between the two will decrease as the optimizer gains maturity, while the
difference in ease of maintenance will increase as the human memory of what's
going on in the latter piece of code begins to fade. 

 Goto statements should be used sparingly, as in any well-structured code. The
main place where they can be usefully employed is to break out of several
levels of switch, for, and while nesting, although the need to do such a thing
may indicate that the inner constructs should be broken out into a separate
function, with a success/failure return code. 

\begin{verbatim}
        for (...) {
                while (...) {
                        ...
                        if (disaster)
                                goto error; 

                }
        }
        ...
error:
        /* clean up the mess */
\end{verbatim}

When a goto is necessary the accompanying label should be alone on a line and
tabbed one stop to the left of the code that follows. The goto should be
commented (possibly in the block header) as to its utility and purpose.
Continue should be used sparingly and near the top of the loop. Break is less
troublesome. 
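A sketch of a goto that leaves two loops at once. The grid dimensions and the
function {\em findcell} are hypothetical; the destination parameters come first
and the label sits one stop to the left of the code that follows it.

\begin{verbatim}
#define NROWS   3
#define NCOLS   4

/*
 *      Find want in grid.  On success store its row and column in
 *      *ip and *jp and return 0; return -1 if want is absent.
 *      The goto leaves both loops as soon as a match is seen.
 */
        int
findcell(int *ip, int *jp, int grid[NROWS][NCOLS], int want)
{
        int     i, j;           /* row and column being examined */

        for (i = 0; i < NROWS; i++) {
                for (j = 0; j < NCOLS; j++) {
                        if (grid[i][j] == want)
                                goto found;
                }
        }
        return (-1);            /* want is not in the grid */

found:
        *ip = i;
        *jp = j;
        return (0);
}
\end{verbatim}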
\newpage
\section{Compound Statements}
 A compound statement is a list of statements enclosed by braces. There are
many common ways of formatting the braces. Be consistent with your local
standard, if you have one, or pick one and use it consistently. When editing
someone else's code, {\em always} use the style used in that code. 

\begin{verbatim}
 control {
        statement;
        statement;
 } 
\end{verbatim}

The style above is called ``K\&R style'', and is preferred if you haven't
already got a favorite. With K\&R style, the {\em else} part of an
{\em if-else} statement and the {\em while} part of a {\em do-while} statement
should appear on the same line as the close brace. With most other styles, the
braces are always alone on a line.

 When a block of code has several labels (unless there are a lot of them), the
labels are placed on separate lines. The fall-through feature of the C
{\em switch} statement (that is, when there is no break between a code
segment and the next case statement) must be commented for future
maintenance. A lint-style comment/directive is best. 

\begin{verbatim}
 switch (expr) {
         case ABC:
         case DEF:
                 statement;
                 break;
         case UVW:
                 statement;
                 /*FALLTHROUGH*/
         case XYZ:
                 statement;
                 break;
 } 
\end{verbatim}

 Here, the last break is not strictly necessary, but it should be kept because it
prevents a fall-through error if another case is added later after the last one. The
default case, if used, should be last and does not require a break. 
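A small sketch in which each case deliberately falls through to the next; the
enumeration {\em level} and the function {\em nchecks} are hypothetical.

\begin{verbatim}
enum level { LOW, MEDIUM, HIGH };

/*
 *      Return how many checks to run at the given level; each level
 *      runs its own check plus all the checks of the levels below it.
 */
        int
nchecks(enum level lev)
{
        int     n = 0;          /* checks counted so far */

        switch (lev) {
                case HIGH:
                        n++;
                        /*FALLTHROUGH*/
                case MEDIUM:
                        n++;
                        /*FALLTHROUGH*/
                case LOW:
                        n++;
                        break;
        }
        return (n);
}
\end{verbatim}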

 Whenever an if-else statement has more than one statement in the if or else
section, the statements of both the if and else sections should be
enclosed in braces (called {\em fully} {\em bracketed} {\em syntax}). 

\begin{verbatim}
 if (expr) {
         statement;
 } else {
         statement;
         statement;
 } 
\end{verbatim}

An {\em if-else} with many {\em else} {\em if} statements should be
written with the {\em else} conditions left-justified.

\begin{verbatim}
 if (STREQ (reply, "yes")) {
         statements for yes
         ...
 } else if (STREQ (reply, "no")) {
         ...
 } else if (STREQ (reply, "maybe")) {
         ...
 } else {  
         statements for default
         ...
 } 
\end{verbatim}

The format then looks like a generalized {\em switch} statement and the
tabbing reflects the switch between exactly one of several alternatives
rather than a nesting of statements. 

The following code is very dangerous: 

\begin{verbatim}
 #ifdef CIRCUIT
 #       define CLOSE_CIRCUIT(circno)     { close_circ(circno); }
 #else
 #       define CLOSE_CIRCUIT(circno)
 #endif

         ...
         if (expr)
                 statement;
         else
                 CLOSE_CIRCUIT(x)
         ++i;
\end{verbatim}

Note that on systems where CIRCUIT is not defined the statement ``++i;'' will
only get executed when expr is false! This example points out both the value
of naming macros with CAPS and of making code fully-bracketed. 
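With the {\em if-else} fully bracketed, the empty expansion becomes harmless. In
this sketch {\em CIRCUIT} is left undefined, the ``statement'' is a
hypothetical {\tt i += 10}, and {\em step} is likewise invented for the
example.

\begin{verbatim}
/* CIRCUIT is deliberately not defined, so the macro expands to nothing. */
#ifdef CIRCUIT
#       define CLOSE_CIRCUIT(circno)    { close_circ(circno); }
#else
#       define CLOSE_CIRCUIT(circno)
#endif

/*
 *      Return i + 11 when expr is true and i + 1 otherwise: the ++i
 *      runs on both paths.  Written like the dangerous version (no
 *      braces, no semicolon after the macro), expr true gives i + 10.
 */
        int
step(int expr, int i)
{
        if (expr) {
                i += 10;
        } else {
                CLOSE_CIRCUIT(i);
        }
        ++i;
        return (i);
}
\end{verbatim}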
\newpage
\section{Operators}

 Generally, all binary operators except `.' and `-$>$' should be
separated from their operands by blanks. Some judgement is called
for in the case of complex expressions, which may be clearer if the
``inner'' operators are not surrounded by spaces and the ``outer''
ones are. In addition, keywords that are followed by expressions in
parentheses should be separated from the left parenthesis by a blank.
(Sizeof is an exception.) Blanks should also appear after commas in
argument lists to help separate the arguments visually. On the other
hand, macro definitions with arguments must not have a blank between
the name and the left parenthesis. The C preprocessor requires the
left parenthesis to be immediately after the macro name or else the
argument list will not be recognized. Unary operators should not be
separated from their single operand. 
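For example, with a hypothetical {\em SQUARE} macro:

\begin{verbatim}
/* Correct: the ( follows the macro name immediately. */
#define SQUARE(x)       ((x) * (x))

/*
 * Wrong: with a blank, SQUARE would be an object-like macro whose
 * replacement text is "(x) ((x) * (x))", so SQUARE(3) would expand
 * to "(x) ((x) * (x))(3)" and fail to compile.
 *
 * #define SQUARE (x)   ((x) * (x))
 */
\end{verbatim}

The parentheses around each use of {\em x} matter too: {\tt SQUARE(1 + 2)}
yields 9 only because the argument is parenthesized in the expansion.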

If you think an expression will be hard to read, consider breaking
it across lines. Splitting at the lowest-precedence operator near the
break is best. Since C has some unexpected precedence rules,
expressions involving mixed operators should be parenthesized. Too many
parentheses, however, can make a line {\em harder} to read because
humans aren't good at parenthesis-matching. 
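A classic instance is that {\tt ==} binds more tightly than {\tt \&}. In this
sketch ({\em MASK} and both functions are hypothetical) the first version
silently tests the wrong thing.

\begin{verbatim}
#define MASK    0x4             /* the bit of interest */

/* Intended: is the MASK bit clear in x?  Parses as x & (MASK == 0). */
        int
bitclear_wrong(int x)
{
        return (x & MASK == 0);
}

/* Parenthesized, so it means what it says. */
        int
bitclear(int x)
{
        return ((x & MASK) == 0);
}
\end{verbatim}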

 There is a time and place for the binary comma operator, but generally it
should be avoided. The comma operator is most useful to provide multiple
initializations or operations, as in {\em for} statements. Complex
expressions, for instance those with nested ?: (ternary) operators,
can be confusing and should be avoided if possible. There are some
macros like getchar where both the ternary operator and comma operators
are useful. The logical expression operand before the ?: should be
parenthesized and both return values must be the same type. 
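The following small sketch shows two of the legitimate uses mentioned above: a comma operator stepping two indices in a {\em for} statement, and a conditional expression with a parenthesized test. MAX and reverse() are illustrative names, not library routines. MAX is in CAPS because it may evaluate its arguments twice.

\begin{verbatim}
#include <string.h>

/*
 * Condition parenthesized; both results have the same type.
 * Upper case because a and b may be evaluated twice.
 */
#define MAX(a, b)       (((a) > (b)) ? (a) : (b))

/* Reverse s in place, stepping two indices with the comma operator. */
void
reverse(char *s)
{
        size_t i, j;
        char tmp;

        if (*s == '\0')
                return;
        for (i = 0, j = strlen(s) - 1; i < j; i++, j--) {
                tmp = s[i];
                s[i] = s[j];
                s[j] = tmp;
        }
}
\end{verbatim}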
\newpage
\section{Naming Conventions}

 Individual projects will no doubt have their own naming conventions. There are
some general rules however.
\begin{itemize}
\item Names with leading and trailing underscores are reserved for system
purposes and should not be used for any user-created names. Most systems use
them for names that the user should not have to know. If you must have your
own private identifiers, begin them with a letter or two identifying the
package to which they belong.

\item \#define constants should be in all CAPS. 

\item Enum tags are Capitalized or in all CAPS.

\item Function, structure tag, typedef, and variable names should be in lower
case. 

\item Many macro ``functions'' are in all CAPS. Some macros (such as getchar and
putchar)  are in lower case since they may also exist as functions. Lower-case
macro names are
only acceptable if the macros behave like a function call, that is, they
evaluate their parameters exactly once and do not assign values to named
parameters. Sometimes it is impossible to write a macro that behaves like a
function even though the arguments are evaluated exactly once.

\item Avoid names that differ only in case, like foo and Foo. Similarly, avoid
foobar and foo\_bar. The potential for confusion is considerable. 
\end{itemize}

 In general, global names (including enums) should have a common prefix
identifying the module that they belong with. They may alternatively be grouped
in a global structure. Typedeffed names often have ``\_t'' appended to their
name. 

 Avoid names that might conflict with various standard library names. Some
systems will include more library code than you want. Also, your program may be
extended someday. 
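The following sketch applies these conventions to a hypothetical buffer module whose global names all begin with ``bf\_''. Every name below was invented for the example. The enum starts with an error tag, following the advice in the Debugging section.

\begin{verbatim}
#include <stddef.h>

#define BF_SIZE         64              /* #define constant: all CAPS */

/* Enum tags in CAPS; the typedef name ends in _t. */
typedef enum { BF_ERR, BF_EMPTY, BF_PARTIAL, BF_FULL } bf_state_t;

/* Structure tag and members in lower case. */
typedef struct bf_buffer {
        char data[BF_SIZE];
        size_t used;
} bf_buffer_t;

/* Every global name carries the module prefix `bf_'. */
bf_state_t
bf_state(const bf_buffer_t *bp)
{
        if (bp == NULL)
                return (BF_ERR);
        if (bp->used == 0)
                return (BF_EMPTY);
        if (bp->used < BF_SIZE)
                return (BF_PARTIAL);
        return (BF_FULL);
}
\end{verbatim}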
\newpage
\section{Constants}

 Numerical constants should not be coded directly. Symbolic constants make the
code easier to change and easier to read. At the very least, any directly-coded
numerical constant must have a comment explaining the derivation of the value. 

 The \#define feature of the C preprocessor should be used to give constants
meaningful names. Defining the value in one place also makes it easier to
administer large programs since the constant value can be changed uniformly by
changing only the \#define. The enumeration data type is a better way to declare
variables that take on only a discrete set of values, since additional type
checking is often available. 

 Constants should be defined consistently with their use; e.g. use 540.0 for a
float instead of 540 with an implicit float cast. There are some cases where
the constants 0 and 1 may appear as themselves instead of as defines. For
example, if a {\em for} loop indexes through an array, then 
\begin{verbatim}
 for (i = 0; i < ARYBOUND; i++) 
\end{verbatim}
is reasonable while the code 
\begin{verbatim}
 qval = opens(door[i], 7);
 if (qval == 0)
         error("can't open %s\n", door[i]);
\end{verbatim}
is not. In the last example qval is a pointer. When a value is a
pointer it should be compared to NULL instead of 0. NULL is available
either as part of the standard I/O library's header file {\em stdio.h}
or in {\em stdlib.h} for newer systems. Even simple values like 1 or 0
are often better expressed using defines like TRUE and FALSE
(sometimes YES and NO read better). 

 Simple character constants should be defined as character literals rather than
numbers. Non-text characters are discouraged as non-portable. If non-text
characters are necessary, particularly if they are used in strings, they
should be written using an escape sequence of three octal digits rather than
one (e.g., \verb|'\007'| rather than \verb|'\7'|). Such usage should be considered machine-dependent
and treated as such.
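The fragment below uses hypothetical names throughout. It uses a bare 0 as a loop start, compares pointers with NULL, returns TRUE or FALSE, and spells a non-text character as a three-digit octal escape. The value of \verb|'\007'| is always 7. What that character does on a given terminal is the machine-dependent part.

\begin{verbatim}
#include <stdio.h>

#define ARYBOUND        8       /* hypothetical number of doors */
#define TRUE            1
#define FALSE           0
#define BELL            '\007'  /* bell in ASCII; meaning is machine-dependent */

const char *door[ARYBOUND];     /* unnamed doors are NULL */

/* TRUE if any door has been given a name. */
int
any_named(void)
{
        int i;

        for (i = 0; i < ARYBOUND; i++)  /* a bare 0 is fine here */
                if (door[i] != NULL)    /* a pointer: compare with NULL */
                        return (TRUE);
        return (FALSE);
}
\end{verbatim}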
\newpage
\section{Macros}

 Complex expressions can be used as macro parameters, and operator-precedence
problems can arise unless all occurrences of parameters have parentheses around
them. There is little that can be done about the problems caused by side
effects in parameters except to avoid side effects in expressions (a good idea
anyway) and, when possible, to write macros that evaluate their parameters
exactly once. There are times when it is impossible to write macros that act
exactly like functions. 
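The following sketch shows the precedence problem, using hypothetical macro names. The ``BAD'' versions lack parentheses. The others are protected against precedence surprises but still evaluate their argument twice. This is why the function square() is the safer choice when the argument has side effects.

\begin{verbatim}
/* Unsafe: the parameter and/or the whole body are unparenthesized. */
#define SQUARE_BAD(x)   x * x
#define TWICE_BAD(x)    (x) + (x)

/* Safe against precedence problems, but x is still evaluated twice,
 * so SQUARE(i++) would modify i twice (undefined behavior). */
#define SQUARE(x)       ((x) * (x))
#define TWICE(x)        ((x) + (x))

/* A function evaluates its argument exactly once. */
int
square(int x)
{
        return (x * x);
}
\end{verbatim}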

 Some macros also exist as functions (e.g., getc and fgetc). The macro should
be used in implementing the function so that changes to the macro will be
automatically reflected in the function. Care is needed when interchanging
macros and functions since functions pass their parameters by value whereas
macros pass their arguments by name substitution. Carefree use of macros
requires care when they are defined. 
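The following minimal sketch shows the macro-plus-function pattern. It is modeled on getc and fgetc but uses a hypothetical in-memory buffer type, struct buf. The function is written in terms of the macro, so a change to BUF\_GETC reaches both. The macro stays in CAPS because it evaluates its argument more than once.

\begin{verbatim}
#include <stdio.h>      /* EOF, size_t */

struct buf {
        const char *data;
        size_t len;
        size_t pos;
};

/* Macro version: fast, but evaluates `bp' several times. */
#define BUF_GETC(bp)    ((bp)->pos < (bp)->len \
                            ? (int) (unsigned char) (bp)->data[(bp)->pos++] \
                            : EOF)

/* Function version, built from the macro so the two cannot drift apart. */
int
buf_fgetc(struct buf *bp)
{
        return (BUF_GETC(bp));
}
\end{verbatim}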

 Macros should avoid using globals, since the global name may be covered by a
local declaration. Macros that change named parameters (rather than the storage
they point at) or may be used as the left-hand side of an assignment should
mention this in their comments. Macros that take no parameters but reference
variables, are long, or are aliases for function calls should be given an empty
parameter list, e.g., 

\begin{verbatim}
 #define OFF_A() (a_global+OFFSET)
 #define BORK() (zork())
 #define SP3() if (b) { av+=1; bv+=1; cv+=1; } 
\end{verbatim}

 Macros save function call/return overhead, but when a macro gets long, the
effect of the call/return becomes negligible, so a function should be used
instead. 

 In some cases it is appropriate to make the compiler ensure that a macro is
terminated with a semicolon. 

\begin{verbatim}
 if (x==3)
        SP3();
 else
        BORK(); 
\end{verbatim}

If the semicolon is omitted after the call to SP3, then the else will
(silently!) become associated with the if in the SP3 macro. With the semicolon,
the else doesn't match any if! The macro SP3 can be written safely as 

\begin{verbatim}
 #define SP3() do { if (b) { av+=1; bv+=1; cv+=1; } } while (0)
\end{verbatim}

Writing out the enclosing do-while by hand is awkward and some compilers and
tools may complain that there is a constant in the ``while'' conditional. A
macro for declaring statements may make programming easier. 

\begin{verbatim}
 #ifdef lint
        static int ZERO;
 #else
 #      define ZERO 0
 #endif
 #define STMT(stuff)       do { stuff } while (ZERO) 
\end{verbatim}

Declare SP3 with 

\begin{verbatim}
 #define SP3()       STMT( if (b) { av+=1; bv+=1; cv+=1; } )
\end{verbatim}

Using STMT will help prevent small typos from silently changing programs. 

Except for hacks such as the above, macros should contain keywords
only if the entire macro is surrounded by braces. 
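Putting the pieces together, the following is a compilable sketch of STMT in an {\em if}/{\em else}. The globals and the function zork() are hypothetical stand-ins for the names used above.

\begin{verbatim}
#ifdef lint
        static int ZERO;
#else
#       define ZERO 0
#endif
#define STMT(stuff)     do { stuff } while (ZERO)

int av, bv, cv, b;              /* hypothetical globals used by SP3 */
int zork_calls;

int
zork(void)
{
        return (++zork_calls);
}

#define BORK()  (zork())
#define SP3()   STMT( if (b) { av += 1; bv += 1; cv += 1; } )

void
dispatch(int x)
{
        if (x == 3)
                SP3();          /* the semicolon is required and harmless */
        else
                (void) BORK();
}
\end{verbatim}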
\newpage
\section{Debugging}

 If you use enums, the first tag should have a non-zero value, or the first tag
should indicate an error. 
\begin{verbatim}
 typedef enum { STATE_ERR, STATE_START, STATE_NORMAL, STATE_END } state_t;
 typedef enum { VAL_NEW=1, VAL_NORMAL, VAL_DYING, VAL_DEAD } value_t; 
\end{verbatim}
Uninitialized values will then often ``catch themselves''. 
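The following small sketch shows how the error tag catches zero-filled storage, such as static variables or memory from calloc. The structure machine and the function machine\_ok() are hypothetical.

\begin{verbatim}
typedef enum { STATE_ERR, STATE_START, STATE_NORMAL, STATE_END } state_t;

struct machine {
        state_t state;
        int count;
};

/* A machine nobody initialized reads as STATE_ERR and is rejected. */
int
machine_ok(const struct machine *mp)
{
        return (mp->state != STATE_ERR);
}
\end{verbatim}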

 Check for error return values, even from functions that ``can't'' fail.
Consider that close() and fclose() can and do fail, even when all prior file
operations have succeeded. Write your own functions so that they test for
errors and return error values or abort the program in a well-defined way.
Include a lot of debugging and error-checking code and leave most of it in 
the finished product. Check even for ``impossible'' errors. [8] 
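For instance, a routine that writes a file should report a failure from fclose(), since buffered output is often written only when the stream is closed. The function write\_file() is a hypothetical example.

\begin{verbatim}
#include <stdio.h>

/*
 * Write `text' to `path'.  Returns 0 on success and -1 on any
 * failure, including one reported only by fclose().
 */
int
write_file(const char *path, const char *text)
{
        FILE *fp = fopen(path, "w");

        if (fp == NULL)
                return (-1);
        if (fputs(text, fp) == EOF) {
                (void) fclose(fp);
                return (-1);
        }
        if (fclose(fp) == EOF)  /* buffered data is flushed here and can fail */
                return (-1);
        return (0);
}
\end{verbatim}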

 Use the assert facility to insist that each function is being passed
well-defined values, and that intermediate results are well-formed. 
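The following sketch uses the standard assert macro with a hypothetical helper, isqrt(). One assertion checks the argument, and the other checks that the result is well-formed before it is returned.

\begin{verbatim}
#include <assert.h>

/* Integer square root by simple counting (a hypothetical helper). */
unsigned long
isqrt(unsigned long x)
{
        unsigned long r = 0;

        assert(x < 0x10000000UL);       /* keeps (r + 1) * (r + 1) in range */
        while ((r + 1) * (r + 1) <= x)
                r++;
        assert(r * r <= x && x < (r + 1) * (r + 1));    /* result is sane */
        return (r);
}
\end{verbatim}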

 Build in the debug code using as few \#ifdefs as possible. For instance, if
``mm\_malloc'' is a debugging memory allocator, then MALLOC will select the
appropriate allocator, avoid littering the code with \#ifdefs, and make clear
the difference between allocation calls being debugged and extra memory that is
allocated only during debugging. 

\begin{verbatim}
 #ifdef DEBUG
 #       define MALLOC(size) (mm_malloc(size))
 #else
 #       define MALLOC(size) (malloc(size))
 #endif
\end{verbatim}
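The following sketch shows one way the debugging allocator might look. The name mm\_malloc comes from the text above, but its body and the counter mm\_count are assumptions made for illustration.

\begin{verbatim}
#include <stdio.h>
#include <stdlib.h>

#ifdef DEBUG
unsigned long mm_count;         /* hypothetical: debug allocations so far */

/* Hypothetical debugging allocator: counts calls, reports failures. */
void *
mm_malloc(size_t size)
{
        void *p = malloc(size);

        mm_count++;
        if (p == NULL)
                fprintf(stderr, "mm_malloc: %lu bytes failed\n",
                    (unsigned long) size);
        return (p);
}
#       define MALLOC(size) (mm_malloc(size))
#else
#       define MALLOC(size) (malloc(size))
#endif
\end{verbatim}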

 Check bounds even on things that ``can't'' overflow. A function that writes
to variable-sized storage should take an argument maxsize that is the size of
the destination. If there are times when the size of the destination is
unknown, some `magic' value of maxsize should mean ``no bounds checks''. When
bound checks fail, make sure that the function does something useful such as
abort or return an error status. 

\begin{verbatim}
 /*
  *      INPUT: A null-terminated source string `src' to copy from and
  *      a `dest' string to copy to. `maxsize' is the size of `dest'
  *      or UINT_MAX if the size is not known. `src' and `dest' must
  *      both be shorter than UINT_MAX, and `src' must be no longer than
  *      `dest'.
  *      OUTPUT: The address of `dest' or NULL if the copy fails.
  *      `dest' is modified even when the copy fails.
  */

        char *
 copy (dest, maxsize, src)
        char *dest, *src;
        unsigned maxsize;
 {
        char *dp = dest;

        while (maxsize-- > 0)
                if ((*dp++ = *src++) == '\0')
                        return (dest);

        return (NULL);
 }
\end{verbatim}

 In all, remember that a program that produces wrong answers twice as fast is
infinitely slower. The same is true of programs that crash occasionally or
clobber valid data. 

{\em ``C Code. C code run. Run, code, run... PLEASE!!!'' ---  Barbara Toungue}
\newpage
\section{Conditional Compilation}

 Conditional compilation is useful for things like machine-dependencies,
debugging, and for setting certain options at compile-time. Beware of
conditional compilation. Various controls can easily combine in unforeseen
ways. If you \#ifdef machine dependencies, make sure that when no machine is
specified, the result is an error, not a default. If you \#ifdef optimizations,
the default should be the unoptimized code rather than an uncompilable program.
Be sure to test the unoptimized code. 
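The following sketch implements ``an error, not a default'' using the ANSI \#elif and \#error directives. The symbols MACH\_VAX, MACH\_PDP11, and WORD\_BITS are hypothetical. MACH\_VAX is defined inside the fragment only to stand in for a compiler flag.

\begin{verbatim}
/* Normally supplied on the command line, e.g. cc -DMACH_VAX. */
#define MACH_VAX

#if defined(MACH_VAX)
#       define WORD_BITS        32
#elif defined(MACH_PDP11)
#       define WORD_BITS        16
#else
        /* No default: an unknown machine must not compile quietly. */
#       error "no machine selected: define MACH_VAX or MACH_PDP11"
#endif

int
word_bits(void)
{
        return (WORD_BITS);
}
\end{verbatim}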

 Put \#ifdefs in header files instead of source files when possible. Use the
\#ifdefs to define macros that can be used uniformly in the code. For instance,
a header file for checking memory allocation might look like (omitting
definitions for REALLOC and FREE): 

\begin{verbatim}
 #ifdef DEBUG
        extern char *mm_malloc();
 #      define MALLOC(size) (mm_malloc(size))
 #else
        extern char *malloc();
 #      define MALLOC(size) (malloc(size))
 #endif 
\end{verbatim}

 Conditional compilation should generally be on a feature-by-feature basis.
Machine or operating system dependencies should be avoided in most cases. 
\begin{verbatim}
 #ifdef BSD4
        long t = time((long *)NULL);
 #endif 
\end{verbatim}
The preceding code is poor for two reasons: there may be 4BSD systems for which
there is a better choice, and there may be non-4BSD systems for which the above
is the best code. Instead, use define symbols such as TIME\_LONG and
TIME\_STRUCT and define the appropriate one in a configuration file such as
config.h. 
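The following sketch shows the feature-symbol approach. The meanings given here to TIME\_LONG and TIME\_STRUCT are assumptions: a scalar from time(), or a struct timeval from the POSIX gettimeofday(). now\_seconds() is a hypothetical wrapper.

\begin{verbatim}
/* config.h (hypothetical): define exactly one of these. */
#define TIME_LONG
/* #define TIME_STRUCT */

#include <time.h>
#ifdef TIME_STRUCT
#include <sys/time.h>
#endif

/* Current time in seconds; no system name appears in the source. */
long
now_seconds(void)
{
#if defined(TIME_LONG)
        return ((long) time((time_t *) NULL));
#elif defined(TIME_STRUCT)
        struct timeval tv;

        (void) gettimeofday(&tv, NULL);
        return ((long) tv.tv_sec);
#else
#       error "config.h must define TIME_LONG or TIME_STRUCT"
#endif
}
\end{verbatim}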
\newpage
\section{Portability}

{\em ``C combines the power of assembler with the portability of assembler.''
 --- Bill Thacker, misquoted by anonymous.}

 The advantages of portable code are well known. This section gives some
guidelines for writing portable code. Here, ``portable'' means that a source
file can be compiled and executed on different machines with the only change
being the inclusion of possibly different header files and the use of different
compiler flags. The header files will contain \#defines and typedefs that may
vary from machine to machine. In general, a new ``machine'' is different
hardware, a different operating system, a different compiler, or any
combination of these. Reference [1] contains useful information on both style
and portability. The following is a list of pitfalls to be avoided and
recommendations to be considered when designing portable code: 
\begin{itemize}
\item Write portable code first, worry about detail optimizations only on
machines where they  prove necessary. Optimized code is often obscure.
Optimizations for one machine may  produce worse code on another. Document
performance hacks and localize them as much  as possible. Documentation should
explain how it works and why it was needed (e.g.,  ``loop executes 6 zillion
times''). 

\item Recognize that some things are inherently non-portable. Examples are code
to deal with particular hardware registers such as the program status word,
and code that is designed to support a particular piece of hardware, such as
an assembler or I/O driver. Even in these cases there are many routines and
data organizations that can be made machine independent.

\item Organize source files so that the machine-independent code and the
machine-dependent code are in separate files. Then if the program is to be
moved to a new machine, it is a much easier task to determine what needs to be
changed. Comment the machine dependence in the headers of the appropriate
files.

\item Any behavior that is described as ``implementation defined'' should be
treated as a machine (compiler) dependency. Assume that the compiler or
hardware does it some completely screwy way. 

\item Pay attention to word sizes. Objects may be non-intuitive sizes. Pointers
are not always the same size as ints, the same size as each other, or freely
interconvertible. The following table shows bit sizes for basic types in C
for various machines and compilers.

\vspace{.25in}
\begin{center}
\begin{tabular}{|l|l|l|l|l|l|l|l|}
\hline
type & pdp11 & vax & 68000 & Cray-2 & Unisys & Harris & 80386 \\
 & series & & family & & 1100 & H800 &  \\ \hline
char & 8 & 8 & 8 & 8 & 9 & 8 & 8 \\
short & 16 & 16 & 8/16 & 64(32) & 18 & 24 & 8/16 \\
int & 16 & 32 & 16/32 & 64(32) & 36 & 24 & 16/32 \\
long & 32 & 32 & 32 & 64 & 36 & 48 & 32 \\
char* & 16 & 32 & 32 & 64 & 72 & 24 & 16/32/48 \\
int* & 16 & 32 & 32 & 64(24) & 72 & 24 & 16/32/48 \\
int(*)() & 16 & 32 & 32 & 64 & 576 & 24 & 16/32/48 \\ \hline
\end{tabular}
\end{center}
\vspace{.25in}

 Some machines have more than one possible size for a given type. The size you
get can depend both on the compiler and on various compile-time flags. The
following table shows ``safe'' type sizes on the majority of systems. Unsigned
numbers are the same bit size as signed numbers. 

\vspace{.25in}
\begin{center}
\begin{tabular}{|l|c|c|}
\hline
Type & Minimum & No Smaller \\
 & \# Bits & Than \\
\hline
char & 8 &  \\
short & 16 & char \\
int & 16 & short \\
long & 32 & int \\
float & 24 &  \\
double & 38 & float \\
any * & 14 &  \\
char * & 15 & any * \\
void * & 15 & any * \\
\hline
\end{tabular}
\end{center}
\vspace {.25in}

\item The void* type is guaranteed to have enough bits of precision to hold a
pointer to any data  object. The void(*)() type is guaranteed to be able to
hold a pointer to any function. Use  these types when you need a generic
pointer. (Use char* and char(*)(), respectively, in  older compilers). Be sure
to cast pointers back to the correct type before using them. 

\item Even when, say, a void* and a char* are the same {\em size}, they may
have different {\em formats}.  For example, the following will fail on some
machines that have sizeof(int*) equal to sizeof(char*), if no prototype for
free is in scope. The code fails because free expects a char* and is passed an
int* without conversion. 
\begin{verbatim}
 int *p = (int *) malloc (sizeof(int));
 free (p);
\end{verbatim}

\item Note that the {\em size} of an object does not guarantee the
{\em precision} of that object. The Cray-2 may use 64 bits to store an int,
but a {\em long} cast into an
int and back to a long may be truncated to 32 bits. 

\item The integer constant zero may be cast to any pointer type. The resulting
pointer is called a {\em null pointer} for that type, and is different from any
other pointer of that type. A null pointer always compares equal to the
constant zero. A null pointer might {\em not} compare equal with a variable
that has the value zero. Null pointers are {\em not} always stored with all
bits zero. Null pointers for two different types are sometimes different. A
null pointer of one type cast into a pointer of another type will be converted
to the null pointer for that second type. 

\item On ANSI compilers, when two pointers of the same type access the same
storage, they will compare as equal. When non-zero integer constants are cast
to pointer types, they may become identical to other pointers. On non-ANSI
compilers, pointers that access the same storage may compare as different. The
following two pointers, for instance, may or may not compare equal, and they
may or may not access the same storage.
\begin{verbatim}
 ((int *) 2 )
 ((int *) 3 ) 
\end{verbatim}

If you need `magic' pointers other than NULL, either allocate some storage or
treat the pointer as a machine dependence. 

\begin{verbatim}
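 /* Either point at a dummy object that is never otherwise used, */
 extern int x_int_dummy;                /* in x.c */
 #define X_FAIL (NULL)
 #define X_BUSY (&x_int_dummy)

 /* or take the value from a machine-dependent header. */
 #define X_FAIL (NULL)
 #define X_BUSY MD_PTR1                 /* MD_PTR1 from "machine.h" */ 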
\end{verbatim}

\item Floating-point numbers have both a {\em precision} and a {\em range}.
These are independent of the size of the object. Thus, overflow (underflow)
for a 32-bit floating-point number will happen at different values on
different machines. Also, 4.99999999999 times 5.00000000001 will yield two
different numbers on two different machines. Differences in rounding and
truncation can give surprisingly different answers.

\item On some machines, a double may have {\em less} range or precision than
a float. 

\item On some machines the first half of a double may be a float with similar
value. Do {\em not} depend on this. 

\item Watch out for signed characters. On the VAX, for instance, characters are
sign extended when used in expressions, which is not the case on many other
machines. Code that assumes signed/unsigned is unportable. For example, a[c]
won't work if c is supposed to be positive and is instead signed and negative.
If you must assume signed or unsigned characters, comment them as SIGNED or
UNSIGNED.

\item Avoid assuming ASCII. If you must assume, document and localize. Remember
that characters may hold (much) more than 8 bits. 

\item Code that takes advantage of the two's complement representation of
numbers on most machines should not be used. Optimizations that replace
arithmetic operations with equivalent shifting operations are particularly
suspect. If absolutely necessary, machine-dependent code should be \#ifdeffed
or operations should be performed by \#ifdeffed macros. You should weigh the
time savings against the potential for obscure and difficult bugs when your code
is moved. 

\item In general, if the word size or value range is important, typedef
``sized'' types. Large programs should have a central header file which
supplies typedefs for commonly-used width-sensitive types, to make it easier
to change them and to aid in finding width-sensitive code. Unsigned types
other than unsigned int are highly compiler-dependent. If a simple loop
counter is being used where either 16 or 32 bits will do, then use int, since 
it will get the most efficient (natural) unit for the current machine. 

\item Data {\em alignment} is also important. For instance, on various
machines a 4-byte integer may start at any address, start only at an even
address, or start only at a multiple-of-four address. Thus, a particular
structure may have its elements at different offsets on different machines,
even when given elements are the same size on all machines. Indeed, a
structure of a 32-bit pointer and an 8-bit character may be 3 sizes on 3
different machines. As a corollary, pointers to objects may not be
interchanged freely; saving an integer through a pointer to 4 bytes
starting at an odd address will sometimes work, sometimes cause a core
dump, and sometimes fail silently (clobbering other data in the process).
Pointer-to-character is a particular trouble spot on machines which do
not address to the byte. Alignment considerations and loader peculiarities
make it very rash to assume that two consecutively-declared variables are
together in memory, or that a variable of one type is aligned appropriately
to be used as another type.

\item The bytes of a word are of increasing significance with increasing address
on machines  such as the VAX (little-endian) and of decreasing significance
with increasing address on  other machines such as the 68000 (big-endian).
Hence any code that depends on the  left-right orientation of bits in a word
deserves special scrutiny. Bit fields within structure  members will only be
portable so long as two separate fields are never concatenated and  treated as
a unit. [1,3] Actually, it is nonportable to concatenate {\em any} two
variables. 

\item There may be unused holes in structures. Suspect unions used for type
cheating.  Specifically, a value should not be stored as one type and retrieved
as another. An explicit tag field for unions may be useful. 

\item Different compilers use different conventions for returning structures.
This causes a  problem when libraries return structure values to code compiled
with a different compiler.  Structure pointers are not a problem. 

\item Do not make assumptions about the parameter passing mechanism, especially
pointer sizes and parameter evaluation order, size, etc. The following code,
for instance, is {\em very} nonportable.
\begin{verbatim}
        c = foo (*cp++, *cp++);

        char
 foo (c1, c2, c3)
        char c1, c2, c3;
 {
        char bar = *(&c1 + 1);
        return (bar);                  /* often won't return c2 */
 }
\end{verbatim}

 This example has lots of problems. The stack may grow up or down (indeed,
there need  not even be a stack!). Parameters may be widened when they are
passed, so a char might  be passed as an int, for instance. Arguments may be
pushed left-to-right, right-to-left, in  arbitrary order, or passed in
registers (not pushed at all). The order of evaluation may  differ from the
order in which they are pushed. One compiler may use several (incompatible)
calling conventions.

\item On some machines, the null character pointer ((char *)0) is treated the
same way as a  pointer to a null string. Do {\em not} depend on this. 

\item Do not modify string constants\footnote{Some libraries attempt to modify
and then restore read-only string variables. Programs sometimes won't port
because of these broken libraries. The libraries are getting better.}.
One particularly notorious (bad) example is 
\begin{verbatim}
 s = "/dev/tty??";
 strcpy (&s[8], ttychars); 
\end{verbatim}

\item The address space may have holes. Simply {\bf computing} the address of an
unallocated element in an array (before or after the actual storage of the
array) may crash the program.  If the address is used in a comparison,
sometimes the program will run but clobber data,  give wrong answers, or loop
forever. The only exception is that a pointer into an array of  objects may
legally point to the first element after the end of the array. This ``outside''
 pointer may not be dereferenced. 

\item Only the == and != comparisons are defined for all pointers of a given
type. It is only portable to use $<$, $<=$, $>$, or $>=$ to compare pointers
when they both point into (or to the first element after) the same array. It is
likewise only portable to use arithmetic operators on pointers that both point
into the same array or the first element afterwards.

\item Word size also affects shifts and masks. The following code will clear
only the three right-most bits of an {\em int} on {\em some} 68000s. On other
machines it will also clear the upper two bytes. 
\begin{verbatim}
 x &= 0177770;
\end{verbatim}
 Use instead 
\begin{verbatim}
 x &= ~07;
\end{verbatim}
 which works properly on all machines\footnote{The or operator ( $|$ ) does not
have these problems, nor do bitfields.}.

\item Side effects within expressions can result in code whose semantics are
compiler-dependent, since C's order of evaluation is explicitly undefined in
most places. Notorious  examples include the following. 
\begin{verbatim}
 a[i] = b[i++];
\end{verbatim}
In the above example, we know only that the subscript into b has not been
incremented.  The index into a could be the value of i either before or after
the increment. 
\begin{verbatim}
 struct bar_t { struct bar_t *next; } *bar, *tmp;
 bar->next = bar = tmp; 
\end{verbatim}
In the second example, the address of ``bar-$>$next'' may be computed before the
value is assigned to ``bar''. Compilers do differ. 

\item Be suspicious of numeric values appearing in the code (``magic numbers'').

\item Avoid preprocessor tricks. Tricks such as using /**/ for token pasting and
macros that rely on argument string expansion will break reliably. 
\begin{verbatim}
 #define FOO(string)    (printf("string = %s",(string)))
 ...
 FOO(filename); 
\end{verbatim}
Will only sometimes be expanded to 
\begin{verbatim}
 (printf("filename = %s",(filename))) 
\end{verbatim}
Be aware, however, that tricky preprocessors may cause macros to break
{\em accidentally} on  some machines. Consider the following two versions
of a macro.
\begin{verbatim}
 #define LOOKUP(c)      (a['c'+(c)])         /* Sometimes breaks. */
 #define LOOKUP(chr)    (a['c'+(chr)])       /* Works. */
\end{verbatim}
The first version of LOOKUP can be expanded in two different ways and will
cause code to break mysteriously. 

\item Become familiar with existing library functions and defines.
(But not {\em too} familiar. The internal details of library facilities,
as opposed to their external interfaces, are subject to change without
warning. They are also often quite unportable.) You should not be writing
your own string compare routine, terminal control routines, or making your
own defines for system structures. ``Rolling your own'' wastes your time
and makes your code less readable, because another reader has to figure
out whether you're doing something special in that reimplemented stuff to
justify its existence. It also prevents your program from taking advantage
of any microcode assists or other means of improving performance of system
routines. Furthermore, it's a fruitful source of bugs. If possible, be aware
of the {\em differences} between the
common libraries (such as ANSI, POSIX, and so on). 

\item Use {\em lint}\/\footnote{{\em Lint} is not available on many systems.}.
It is a valuable tool for finding machine-dependent constructs as well as other
inconsistencies or program bugs that pass the compiler. If your compiler has
switches to turn on warnings, use them. 

\item Suspect labels inside blocks with the associated switch or goto outside
the block.

\item Wherever the type is in doubt, parameters should be cast to the
appropriate type. Always cast NULL when it appears in non-prototyped function
calls. Do not use function calls as a place to do type cheating. C has
confusing promotion rules, so be careful.

\item Use explicit casts when doing arithmetic that mixes signed and unsigned
values.

\item The inter-procedural goto, longjmp, should be used with caution. Many
implementations ``forget'' to restore values in registers. Declare critical
values as volatile if you can or comment them as VOLATILE.

\item Some linkers convert names to lower-case and some only recognize the first
six letters as unique. Programs may break quietly on these systems.

\item Beware of compiler extensions. If used, document and consider them as
machine dependencies. 

\item A program cannot generally execute code in the data segment or write
into the code segment. Even when it can, there is no guarantee that it can do so
reliably. 
\end{itemize}
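The following sketch collects a few of the points above. It shows a width-sensitive typedef chosen from {\em limits.h}, a cast that keeps a possibly-signed char from becoming a negative subscript, and a mask written so that it widens with the type. The names i32\_t, count\_chars(), and clear\_low3() are hypothetical.

\begin{verbatim}
#include <limits.h>

/*
 * Fragment of a hypothetical central "sized types" header: i32_t is
 * whichever of int and long has at least 32 bits of range.
 */
#if INT_MAX >= 2147483647
typedef int i32_t;
#else
typedef long i32_t;
#endif

/*
 * Count characters.  The cast keeps a char that happens to be signed
 * and negative from becoming a negative subscript.
 */
void
count_chars(const char *s, long counts[UCHAR_MAX + 1])
{
        for (; *s != '\0'; s++)
                counts[(unsigned char) *s]++;
}

/*
 * Clear the three low-order bits whatever the width of x.  The mask
 * ~07u widens with the type, unlike a spelled-out constant such as
 * 0177770, and the u suffix keeps the complement unsigned.
 */
unsigned
clear_low3(unsigned x)
{
        return (x & ~07u);
}
\end{verbatim}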
\newpage
\section{ANSI C}

 Modern C compilers support some or all of the ANSI proposed standard C. Write
code to run under standard C whenever possible and use features such as
function prototypes, constant storage, and volatile storage. Standard C
improves program performance by giving better information to optimizers.
Standard C improves portability by ensuring that all compilers accept the same
input language and by providing mechanisms that try to hide machine
dependencies or emit warnings about code that may be machine-dependent.

\subsection{Compatibility}

 Write code that is easy to port to older compilers. For instance,
conditionally \#define new (standard) keywords such as const and volatile in a
global {\em .h} file. Standard compilers predefine the preprocessor symbol
\_\_STDC\_\_. The void* type is hard to get right simply, since some older
compilers understand void but not void*. It is easiest to create a new
(machine- and compiler- dependent) VOIDP type, usually char* on older
compilers.
\begin{verbatim}
 #ifdef __STDC__
        typedef void *VOIDP;
 #      define COMPILER_SELECTED
 #endif
 #ifdef A_TARGET
 #      define const
 #      define volatile
 #      define void int
        typedef char *VOIDP;
 #      define COMPILER_SELECTED
 #endif
 #ifdef ...
        ...
 #endif
 #ifdef COMPILER_SELECTED
 #      undef COMPILER_SELECTED
 #else
        { NO TARGET SELECTED! }
 #endif 
\end{verbatim}

\subsection{Formatting}

 The style for ANSI C is the same as for regular C, with two notable
exceptions: storage qualifiers and parameter lists. 

 Because const and volatile have strange binding rules, each const or volatile
object should have a separate declaration. 
\begin{verbatim}
 int const *s;          /* YES */
 int const *s, *t;      /* NO */ 
\end{verbatim}
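The hypothetical function const\_demo() shows the difference between a pointer to const and a const pointer:

\begin{verbatim}
/*
 * `s' points to const int: *s may not be assigned, but s may move.
 * `t' is a const pointer: t may not move, but *t may be assigned.
 * Note that "int *const p, q;" would make q a plain int, which is
 * one reason to give each qualified object its own declaration.
 */
int
const_demo(int *buf)
{
        int const *s = buf;
        int *const t = buf;

        *t = 5;         /* allowed: the int is not const through t */
        s++;            /* allowed: s itself is not const */
        return (*s + *t);
}
\end{verbatim}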

 Prototyped functions merge parameter declaration and definition into one
list. Parameters should be commented in the function comment. 
\begin{verbatim}
 /*
  *      `bp': boat trying to get in.
  *      `stall': a list of stalls, never NULL.
  *       returns stall number, 0 => no room.
  */
        int
 enter_pier (boat_t const *bp, stall_t *stall)
 {
        ... 
\end{verbatim}
\subsection{Prototypes}

 Function prototypes should be used to make code more robust and to make it run
faster. Unfortunately, the prototyped {\bf declaration}
\begin{verbatim}
 extern void bork (char c); 
\end{verbatim}
is incompatible with the {\bf definition}
\begin{verbatim}
        void
 bork (c)
        char c;
   ... 
\end{verbatim}
The prototype says that c is to be passed as the most natural type for the
machine, probably a byte. The non-prototyped (backwards-compatible) definition
implies that c is always passed as an int\footnote{Such automatic type
promotion is called widening. For older compilers, the
widening rules require that all char and short parameters are passed as ints
and that float parameters are passed as doubles.}.
If a function has promotable
parameters then the caller and callee must be compiled identically. Either
both must use function prototypes or neither can use prototypes. The problem
can be avoided if parameters are promoted when the program is designed. For
example, bork can be defined to take an int parameter. 

 The above declaration works if the definition is prototyped. 
\begin{verbatim}
        void
 bork (char c)
 {
        ... 
\end{verbatim}
Unfortunately, the prototyped syntax will cause non-ANSI compilers to reject
the program. 

It {\em is} easy to write external declarations that work with both
prototyping and with older compilers\footnote{Note that using PROTO
violates the rule ``don't change the syntax via macro substitution.''
It is regrettable that there isn't a better solution.}.
\begin{verbatim}
 #ifdef __STDC__
 #      define PROTO(x) x
 #else
 #      define PROTO(x) ()
 #endif

 extern char **ncopies PROTO((char *s, short times));
\end{verbatim}
Note that PROTO must be used with double parentheses. 
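The following sketch combines PROTO with the earlier advice to avoid promotable parameters. The hypothetical function bork() takes an int, so prototyped and non-prototyped callers pass its argument the same way. The definition itself uses prototype syntax, so this fragment assumes an ANSI compiler. The PROTO declaration is the part that would be shared through a header.

\begin{verbatim}
#ifdef __STDC__
#       define PROTO(x) x
#else
#       define PROTO(x) ()
#endif

/* int rather than char: nothing is left to widen. */
extern int bork PROTO((int c));

int
bork (int c)
{
        return (c == 'x');
}
\end{verbatim}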

In the end, it may be best to write in only one style (e.g., with prototypes)
and, when a non-prototyped version is needed, to generate it with an automatic
conversion tool.

\subsection{Pragmas}

Pragmas are used to introduce machine-dependent code in a controlled way.
Obviously, pragmas should be treated as machine dependencies. Unfortunately,
the syntax of ANSI pragmas makes it impossible to isolate them in
machine-dependent headers. 

Pragmas fall into two classes. Optimization pragmas may safely be ignored;
pragmas that change the system behavior (``required pragmas'') may not.
Required pragmas should be \#ifdefed so that compilation will abort if no
pragma is selected.

 Two compilers may use a given pragma in two very different ways. For instance,
one compiler may use ``haggis'' to signal an optimization. Another might use it
to indicate that a given statement, if reached, should terminate the program.
Thus, when pragmas are used, they must always be enclosed in machine-dependent
\#ifdefs. Pragmas must always be \#ifdefed out for non-ANSI compilers. Be sure
to indent the octothorpe (\#) on the \#pragma, as older preprocessors will halt
on it otherwise. 
\begin{verbatim}
 #if defined(__STDC__) && defined(USE_HAGGIS_PRAGMA)
         #pragma (HAGGIS)
 #endif
\end{verbatim}

``The `\#pragma' command is specified in the ANSI standard to have an arbitrary
implementation-defined effect. In the GNU C preprocessor, `\#pragma' first
attempts to run the game `rogue'; if that fails, it tries to run the game
`hack'; if that fails, it tries to run GNU Emacs displaying the Tower of
Hanoi; if that fails, it reports a fatal error. In any case, preprocessing
does not continue.'' --- {\em Manual for the GNU C preprocessor}
for GNU CC 1.34.
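As an illustration of a required pragma, the sketch below packs a structure
whose layout must not contain padding. It assumes GCC or Clang, both of which
accept \#pragma pack(1) and \#pragma pack(); the selection macro
HAVE\_PACK\_PRAGMA is hypothetical and would normally come from a
machine-dependent header. If no pragma is selected, \#error aborts the
compilation instead of silently producing the wrong layout, and each \#pragma
line is indented as recommended above.
\begin{verbatim}
 /*
  * HAVE_PACK_PRAGMA is a hypothetical selection macro.  Here it is
  * derived from the compiler so the sketch builds with GCC and Clang.
  */
 #if defined(__GNUC__)
 #      define HAVE_PACK_PRAGMA
 #endif

 #if defined(__STDC__) && defined(HAVE_PACK_PRAGMA)
         #pragma pack(1)
 #else
         #error "no structure-packing pragma selected for this compiler"
 #endif

 /* Record read from the wire: length must directly follow tag. */
 struct wire_header {
        char    tag;
        int     length;
 };

 #if defined(__STDC__) && defined(HAVE_PACK_PRAGMA)
         #pragma pack()         /* restore the default alignment */
 #endif
\end{verbatim}
Because packing changes the layout that every module sees, this is exactly the
kind of pragma that must not be silently ignored.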
\newpage
\section{Special Considerations}

 This section contains some miscellaneous do's and don'ts. 
\begin{itemize}
\item Don't change syntax via macro substitution. It makes the program
unintelligible to all but the perpetrator. 

\item Don't use floating-point variables where discrete values are needed. Using
a float for a loop counter is a great way to shoot yourself in the foot.
Always test floating-point numbers with $<$= or $>$=; never use an exact
comparison (== or !=).

\item Compilers have bugs. Common trouble spots include structure assignment and
bitfields. You cannot generally predict which bugs a compiler has. You could
write a program that avoids every construct known to be broken on any
compiler, but you won't be able to write anything useful, you might still
encounter bugs, and the compiler might get fixed in the meantime. Thus, you
should write ``around'' compiler bugs only when you are forced to use a
particular buggy compiler.

\item Do not rely on automatic beautifiers. The main person who benefits from
good program style is the programmer him/herself, especially during the early
design of handwritten algorithms or pseudo-code. Automatic beautifiers can
only be applied to complete, syntactically correct programs and hence are
not available when the need for attention to white space and indentation is
greatest. Programmers can do a better job of making clear the complete visual
layout of a function or file, with the normal attention to detail of a careful
programmer (in other words, some of the visual layout is dictated by intent
rather than syntax, and beautifiers cannot read minds). Sloppy programmers
should learn to be careful programmers instead of relying on a beautifier to
make their code readable. Finally, since beautifiers are non-trivial programs
that must parse the source, the effort of building a sophisticated beautifier
is not worth the benefits it provides. Beautifiers are best for gross
formatting of machine-generated code.

\item Accidental omission of the second ``='' of the logical compare is a
problem. Use explicit tests. Avoid assignment with implicit test.
\begin{verbatim}
 abool = bbool;
 if (abool) { ... 
\end{verbatim}
 When embedded assignment is used, make the test explicit so that it doesn't
get ``fixed''  later. 
\begin{verbatim}
 while ((abool = bbool) != FALSE) { ... 


 while (abool = bbool) { ... /* VALUSED */ 


 while (abool = bbool, abool) { ... 
\end{verbatim}

\item Explicitly comment variables that are changed outside the normal control
flow, and any other code that is likely to break during maintenance.

\item Modern compilers will put variables in registers automatically. Use the
register keyword sparingly, to indicate the variables that you think are most
critical. In extreme cases, mark the 2--4 most critical values as register and
mark the rest as REGISTER. The latter can be \#defined to register on machines
with many registers, and to nothing elsewhere.
\end{itemize}
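The floating-point advice above can be seen in a small sketch. The function
names are hypothetical, and the behavior described assumes IEEE-754 double
arithmetic, as on most current machines. Adding 0.1 ten times yields a value
just below 1.0 rather than 1.0 itself, so the first loop's exact test never
ends the loop and only the guard n $<$ 100 stops it. The second function counts
with an integer and derives the floating-point value from the counter, so the
number of passes is exact.
\begin{verbatim}
 /* Fragile: a floating-point loop counter tested for exact equality. */
 int
 steps_float_counter (void)
 {
        double  x;
        int     n = 0;

        /* 0.1 is not exact in binary, so x never equals 1.0 exactly;
           the n < 100 guard only keeps this sketch from looping forever. */
        for (x = 0.0; x != 1.0 && n < 100; x += 0.1)
                n++;
        return n;
 }

 /* Robust: an integer counter, with x derived from it on each pass. */
 int
 steps_int_counter (void)
 {
        int     i, n = 0;

        for (i = 0; i <= 10; i++) {
                double x = i / 10.0;    /* 0.0, 0.1, ..., 1.0 */

                (void) x;               /* a real loop would use x here */
                n++;
        }
        return n;
 }
\end{verbatim}
On such a machine steps\_float\_counter returns 100, while steps\_int\_counter
returns the intended 11.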
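The register/REGISTER convention from the last item might look like the
following sketch. MANY\_REGISTERS is a hypothetical configuration macro (set,
say, with -DMANY\_REGISTERS on machines with a large register file); without
it, REGISTER expands to nothing and the compiler is free to allocate registers
itself.
\begin{verbatim}
 #ifdef MANY_REGISTERS
 #      define REGISTER register
 #else
 #      define REGISTER  /* nothing */
 #endif

 /* Sum the first n elements of a. */
 long
 sum_array (const int *a, int n)
 {
        register long   total = 0;      /* most critical: the accumulator */
        register int    i;              /* ... and the loop index */
        REGISTER const int *p = a;      /* less critical: register only
                                           where registers are plentiful */

        for (i = 0; i < n; i++)
                total += p[i];
        return total;
 }
\end{verbatim}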
\newpage
\section{Lint}

{\em Lint} is a C program checker [2] that examines C source files to detect and
report type incompatibilities, inconsistencies between function definitions and
calls, potential program bugs, etc. The use of {\em lint} on all programs is
strongly recommended, and it is expected that most projects will require
programs to use {\em lint} as part of the official acceptance procedure. 

It should be noted that the best way to use {\em lint} is not as a barrier that
must be overcome before official acceptance of a program, but rather as a tool
to use during and after changes or additions to the code. {\em Lint} can find
obscure bugs and ensure portability before problems occur. Many messages from
{\em lint} really do indicate something wrong. One fun story is about
a program that was missing an argument to `fprintf'.
\begin{verbatim}
 fprintf ("Usage: foo -bar <file>"); 
\end{verbatim}
The {\em author} never had a problem. But the program dumped core every time an
ordinary user made a mistake on the command line. Many versions of {\em lint}
will catch this.
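In the broken call, the format string sits where fprintf expects a FILE
pointer, and the format argument itself is missing. A corrected call supplies
the stream first. The sketch below wraps it in a hypothetical usage function
that takes the stream as a parameter and returns fprintf's result, so that a
caller can check it.
\begin{verbatim}
 #include <stdio.h>

 /* Print the usage message; fprintf's first argument is the stream. */
 int
 usage (FILE *fp)
 {
        return fprintf (fp, "Usage: foo -bar <file>\n");
 }
\end{verbatim}
A program would typically call usage (stderr) and then exit with a failure
status.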

The -h, -p, -a, -x, and -c options are worth learning. All of them will
complain about some legitimate things, but they will also pick up many botches.
Note that -p checks function-call type-consistency for only a subset of Unix
library routines, so programs should be linted both with and without -p for the
best ``coverage''.

{\em Lint} also recognizes several special comments in the code. These comments
both silence {\em lint} where the code would otherwise make it complain and
document unusual code.
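Two of the traditional special comments are sketched below: /* ARGSUSED */
placed before a function suppresses complaints about arguments the function
does not use, and /* NOTREACHED */ marks a point that control cannot reach (see
Darwin and Collyer [8]). The function names and the color type are
hypothetical.
\begin{verbatim}
 #include <stdlib.h>

 enum color { RED, GREEN, BLUE };

 /*
  * Every callback receives a context pointer; this one does not need
  * it, and ARGSUSED tells lint that this is intentional.
  */
 /* ARGSUSED */
 int
 ignore_context (void *context)
 {
        return 0;
 }

 const char *
 color_name (enum color c)
 {
        switch (c) {
        case RED:
                return "red";
        case GREEN:
                return "green";
        case BLUE:
                return "blue";
        }
        abort ();       /* can't happen: every enumerator is handled above */
        /* NOTREACHED */
 }
\end{verbatim}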
\newpage
\section{Make}

 One other very useful tool is {\em make} [7]. During development, {\em make}
recompiles only those modules that have been changed since the last time
{\em make} was used. Some common conventions include: 
\vspace{.25in}

\begin{tabular}{r@{--}l}
all & always makes all binaries\\
clean & remove all intermediate files\\
debug & make a test binary `a.out' or `debug' \\
depend & make transitive dependencies \\
install & install binaries \\
lint & run lint \\
print/list & make a hard copy of all source files \\
shar & make a shar of all source files \\
spotless & make clean, use revision control to put away sources.  \\
 & Note: doesn't remove Makefile, although it is a source file \\
sources & undo what spotless did \\
tags & run ctags (using the -t flag is suggested) \\
rdist & distribute sources to other hosts \\
{\em file.c} & check out the named file 
\end{tabular}
\vspace{.25in}

In addition, command-line defines can be given to define either Makefile values
(such as ``CFLAGS'') or values in the program (such as ``DEBUG''). 
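For example, a program might guard its tracing code with DEBUG, which can then
be turned on from the command line with make CFLAGS=-DDEBUG, since a variable
assignment on the make command line overrides the Makefile's own CFLAGS. The
TRACE macro and the functions below are hypothetical.
\begin{verbatim}
 #include <stdio.h>
 #include <stdlib.h>

 #ifdef DEBUG
 #      define TRACE(msg)  ((void) fprintf (stderr, "trace: %s\n", (msg)))
 #else
 #      define TRACE(msg)  ((void) 0)
 #endif

 /* Report whether this file was compiled with DEBUG defined. */
 int
 debug_enabled (void)
 {
 #ifdef DEBUG
        return 1;
 #else
        return 0;
 #endif
 }

 /* Convert a decimal string, tracing the input in debug builds. */
 int
 parse_count (const char *s)
 {
        TRACE (s);
        return atoi (s);
 }
\end{verbatim}
Because {\em make} does not notice that CFLAGS changed, object files built
without DEBUG are not recompiled automatically; running make clean first
forces a consistent rebuild.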
\newpage
\section{Project Dependent Standards}

 Individual projects may wish to establish additional standards beyond those
given here. The following issues are some of those that should be addressed by
each project program administration group. 
\begin{itemize}
\item What additional naming conventions should be followed? In particular,
systematic prefix conventions for functional grouping of global data and also
for structure or union member names can be useful. 

\item What kind of include file organization is appropriate for the project's
particular data hierarchy? 

\item What procedures should be established for reviewing {\em lint} complaints?
A tolerance level needs to be established in concert with the {\em lint} options
to prevent unimportant complaints from hiding complaints about real bugs or
inconsistencies.

\item If a project establishes its own archive libraries, it should plan on
supplying a lint library file [2] to the system administrators. The lint
library file allows {\em lint} to check for compatible use of library
functions. 

\item What kind of revision control needs to be used? 
\end{itemize}
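A lint library source is traditionally a C file that begins with the
/* LINTLIBRARY */ comment and gives each library function a stub definition,
so that {\em lint} learns the argument and return types without seeing the
real code. The sketch below assumes a hypothetical project archive libqueue.a
with hypothetical functions q\_create, q\_put, and q\_get; how such a file is
named, processed, and installed varies between {\em lint} versions.
\begin{verbatim}
 /* LINTLIBRARY */
 /*
  * Lint library for the hypothetical archive libqueue.a.  The stub
  * bodies exist only to declare argument and return types to lint;
  * the real implementations live in the library sources.
  */
 #include <stddef.h>

 typedef struct queue queue_t;   /* opaque to users of the library */

 queue_t *
 q_create (size_t capacity)
 {
        return (queue_t *) 0;
 }

 int
 q_put (queue_t *q, void *item)
 {
        return 0;
 }

 void *
 q_get (queue_t *q)
 {
        return (void *) 0;
 }
\end{verbatim}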
\newpage
\section{Conclusion}

 A set of standards has been presented for C programming style. Among the most
important points are: 
\begin{itemize}
\item The proper use of white space and comments so that the structure of the
program is evident from the layout of the code. The use of simple
expressions, statements, and functions so that they may be understood
easily. 

\item To keep in mind that you or someone else will likely be asked to modify
code or make it run on a different machine sometime in the future. Craft code
so that it is portable to obscure machines. Localize optimizations since they
are often confusing and may be ``pessimizations'' on other machines. 

\item Many style choices are arbitrary. Having a style that is consistent
(particularly with group standards) is more important than following absolute
style rules. Mixing styles is worse than using any single bad style. 

\end{itemize}

 As with any standard, it must be followed if it is to be useful. If you have
trouble following any of these standards, don't just ignore them. Talk with
your local guru or an experienced programmer at your institution.

\newpage
\appendix
\section
{References}
\begin{enumerate}
\item B.A. Tague, {\em C Language Portability}, Sept 22, 1977. This document
issued by department 8234 contains three memos by R.C. Haight, A.L. Glasser,
and T.L. Lyon dealing with  style and portability. 

\item S.C. Johnson, {\em Lint, a C Program Checker}, USENIX UNIX$\dagger$
Supplementary Documents,  November 1986. 

\item R.W. Mitze, {\em The 3B/PDP-11 Swabbing Problem}, Memorandum for File,
1273-770907.01MF, September 14, 1977.

\item R.A. Elliott and D.C. Pfeffer, {\em 3B Processor Common Diagnostic
Standards - Version 1},  Memorandum for File, 5514-780330.01MF, March 30, 1978. 

\item R.W. Mitze, {\em  An Overview of C Compilation of UNIX User Processes on
the 3B}, Memorandum for File, 5521-780329.02MF, March 29, 1978. 

\item B.W. Kernighan and D.M. Ritchie, {\em The C Programming Language},
Prentice-Hall, 1978. 

\item S.I. Feldman, {\em Make: A Program for Maintaining Computer Programs},
USENIX UNIX Supplementary Documents, November 1986. 

\item I. Darwin and G. Collyer, {\em Can't Happen or /* NOTREACHED */ or Real
Programs Dump Core}, Proceedings of the USENIX Association Winter Conference,
Dallas, 1985.

\item B.W. Kernighan and P.J. Plauger, {\em The Elements of Programming
Style}, McGraw-Hill, 1974.

\item J.E. Lapin, {\em Portable C and Unix System Programming},
Prentice-Hall, 1987.
\end{enumerate}


$\dagger$ UNIX is a trademark of Bell Laboratories. 
\newpage
\section{Stuff to Remember}
\begin{center}
{\Large The Ten Commandments for C Programmers}\\
\vspace {.25in}
Henry Spencer\\
\end{center}
\begin{enumerate}
\item Thou shalt run lint frequently and study its pronouncements with care, for
verily its perception and judgement oft exceed thine. 

\item Thou shalt not follow the NULL pointer, for chaos and madness await thee
at its end. 

\item Thou shalt cast all function arguments to the expected type if they
are not of that type already, even when thou art convinced that this is
unnecessary, lest they take cruel vengeance upon thee when thou least expect
it. 

\item If thy header files fail to declare the return types of thy library
functions, thou shalt declare them thyself with the most meticulous care, lest
grievous harm befall thy program. 

\item Thou shalt check the array bounds of all strings (indeed, all arrays),
for surely where thou  typest ``foo'' someone someday shall type
``supercalifragilisticexpialidocious''. 

\item If a function be advertised to return an error code in the event of
difficulties, thou shalt check for that code, yea, even though the checks
triple the size of thy code and produce aches in thy typing fingers, for if
thou thinkest ``it cannot happen to me'', the gods shall surely punish thee
for thy arrogance. 

\item Thou shalt study thy libraries and strive not to re-invent them without
cause, that thy code may be short and readable and thy days pleasant and
productive.

\item Thou shalt make thy program's purpose and structure clear to thy
fellow man by using the One True Brace Style, even if thou likest it not,
for thy creativity is better used in solving problems than in creating
beautiful new impediments to understanding. 

\item Thy external identifiers shall be unique in the first six characters,
though this harsh discipline be irksome and the years of its necessity stretch
before thee seemingly without end, lest thou tear thy hair out and go mad on
that fateful day when thou desirest to make thy program run on an old system. 

\item Thou shalt foreswear, renounce, and abjure the vile heresy which claimeth
that ``All the  world's a VAX'', and have no commerce with the benighted
heathens who cling to this  barbarous belief, that the days of thy program may
be long even though the days of thy current machine be short. 
\end{enumerate}
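The fifth and sixth commandments work together in practice. In the following
sketch, the hypothetical helper copy\_name checks the bounds of its destination
before copying and reports failure with an error code that the caller is then
bound to check.
\begin{verbatim}
 #include <string.h>

 /*
  * Copy src into dst, which holds dstsize bytes.  Returns 0 on
  * success, or -1 (leaving dst untouched) if src and its terminating
  * NUL do not fit.
  */
 int
 copy_name (char *dst, size_t dstsize, const char *src)
 {
        size_t len = strlen (src);

        if (len >= dstsize)             /* also rejects dstsize == 0 */
                return -1;
        (void) memcpy (dst, src, len + 1);      /* includes the NUL */
        return 0;
 }
\end{verbatim}
A caller then writes, for instance, if (copy\_name (buf, sizeof buf, input)
!= 0) and handles the error, rather than assuming that the copy succeeded.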

--  
pardo at cs.washington.edu \\
{rutgers,cornell,ucsd,ubc-cs,tektronix}!uw-beaver!june!pardo 

\end{document}


