Notes on Writing Portable Programs in C: part2.tex

George V. Reilly gvr at cs.brown.edu
Fri Nov 30 18:35:43 AEST 1990


% You must concatenate part1.tex and part2.tex together to form
% portableC.tex before LaTeXing.

%+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
\section{Using Floating-Point Numbers}
%+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++

It is an understatement to say that implementing numerical
algorithms that behave identically across a wide variety of
platforms is difficult. This section provides very little help,
but we hope it is worth reading. Any additional suggestions and
information are {\em very much\/} appreciated as we would like
to expand this section.


%=============================================================================
\subsection{Machine Constants}
%=============================================================================

One problem when writing numerical algorithms is obtaining
machine constants. Typical values one needs are:

\begin{itemize}

\item
The radix of the floating-point representation.

\item
The number of digits in the floating-point significand expressed
in terms of the radix of the representation.

\item
The number of bits reserved for the representation of the exponent.

\item
The smallest positive floating-point number $\epsilon$ such that
$ 1.0 + \epsilon \neq 1.0$.

\item
The smallest non-vanishing normalized floating-point power of
the radix.

\item
The largest finite\footnote{Some representations have reserved
values for $+{\it inf}$ and $-{\it inf}$.} floating-point
number.

\end{itemize}

On Suns, they can be found in \file{<values.h>}.  The ANSI~C
Standard requires that such constants be defined in the header
file \file{<float.h>}.
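
Where a conforming \file{<float.h>} exists, most of the values listed
above can be read from it directly. The following is only a sketch
under that assumption; the function name \<show\_machine\_constants> is
ours and not part of any library. Note that \<DBL\_EPSILON> is defined
as the difference between 1.0 and the next larger representable value,
which is close to, but not the same as, the $\epsilon$ defined above,
and that the exponent range is reported rather than the number of
exponent bits.

\begin{verbatim}
#include <float.h>
#include <stdio.h>

/* Print the double-precision machine constants from <float.h>.
   Returns the number of constants printed. */
int show_machine_constants(void)
{
    int n = 0;

    printf("radix               = %d\n", FLT_RADIX);    n++;
    printf("significand digits  = %d\n", DBL_MANT_DIG); n++;
    printf("minimum exponent    = %d\n", DBL_MIN_EXP);  n++;
    printf("maximum exponent    = %d\n", DBL_MAX_EXP);  n++;
    printf("epsilon             = %g\n", DBL_EPSILON);  n++;
    printf("smallest normalized = %g\n", DBL_MIN);      n++;
    printf("largest finite      = %g\n", DBL_MAX);      n++;
    return n;
}
\end{verbatim}

The corresponding \<FLT\_> and \<LDBL\_> constants describe \<float>
and \<long~double>.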

Suns and standards apart, these values are not always readily
available, \e.g. on Tektronix workstations running UTek. One
solution is to use a modified version of a program called
\cmd{machar}, which can be obtained from the network.
\cmd{Machar} is described in \cite{machar} and can be obtained by
anonymous FTP from the \id{netlib}.\footnote{Email (Internet)
address is \id{netlib at ornl.gov}. For more information, send a
message containing the line \<send index> to that address.}

It is straightforward to modify the C~version of \cmd{machar} to
generate a C~preprocessor file that can be included directly by
C~programs.

There is also a publicly available program called
\file{config.c\/} that attempts to determine many properties of
the C~compiler and machine that it is run on. It can generate
the ANSI~C header files \file{<float.h>} and \file{<limits.h>}
among other useful features. This program was submitted to
\ng{comp.sources.misc}.\footnote{The archive site of
\ng{comp.sources.misc} is \site{uunet.uu.net}.} The latest
version, 4.2, is available by FTP from \site{mcsun.eu.net} in
directory \file{misc} and is called \file{config42.c} (the next
version, 4.3, will be called \file{enquire.c}). Version~4.2 is
also distributed with \cmd{gcc}, where it is called
\file{hard-params.c}.

%=============================================================================
\subsection{Floating-Point Arguments}
%=============================================================================

In the days of K\&R {\cite{KR1}} one was ``encouraged'' to use
\<float> and \<double> interchangeably\footnote{In fact one wonders
why they even bothered to define two representations for
floating-point numbers considering the rules applied to them.}
since all expressions with such data types were always
evaluated using the \<double> representation --- a real
nightmare for those implementing efficient numerical algorithms
in~C\@. This rule applied, in particular, to floating-point
arguments and for most compilers around, it does not matter
whether one defines the argument as \<float> or \<double>.

According to the ANSI~C Standard, such programs will continue to
exhibit the same behavior {\em as long as one does not prototype
the function}. Therefore, when prototyping functions, make sure
that the prototype is included when the function definition is
compiled so the compiler can check if the arguments match.
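
The following sketch, which assumes an ANSI~C compiler, shows the safe
arrangement: the prototype is visible where the function is defined,
so any disagreement is diagnosed at compile time. The function name
\<half> is hypothetical.

\begin{verbatim}
/* Prototype, normally kept in a header that is included both by
   the callers and by the file that defines the function. */
float half(float x);

/* Prototype-style definition: x really is passed as a float. */
float half(float x)
{
    return x / 2.0f;
}

/* By contrast, the old-style definition
       float half(x) float x; { ... }
   expects its argument to arrive widened to double, and is
   therefore not compatible with the prototype above. */
\end{verbatim}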


%=============================================================================
\subsection{Floating-Point Arithmetic}
%=============================================================================

Be careful when using the \<==> and \<!=> operators to compare
floating-point types. Expressions such as
\begin{center}
\<if (\fe1 == \fe2)>
\end{center}
will seldom be satisfied due to {\em rounding errors}.  To get a
feeling about rounding errors, try evaluating the following
expression using your favorite C~compiler \cite{fparith}:
\[
10^{50} + 812 - 10^{50} + 10^{55} + 511 - 10^{55} = 812 + 511 = 1323
\]

Most computers will produce zero regardless of whether one uses
\<float> or \<double>. Although the {\em absolute error\/} is
large, the {\em relative error\/} is quite small and probably
acceptable for many applications.
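
The expression can be tried with a few lines of~C. This is only a
sketch; the function name \<rounding\_demo> is ours, and it assumes a
\<double> whose significand is far too short to hold both $10^{50}$
and 812 exactly, as is the case for the IEEE 64-bit format.

\begin{verbatim}
/* Evaluate the expression from the text, left to right, in double. */
double rounding_demo(void)
{
    double big1 = 1.0e50;
    double big2 = 1.0e55;
    double sum  = big1 + 812.0 - big1 + big2 + 511.0 - big2;

    return sum;     /* exact arithmetic would give 1323 */
}
\end{verbatim}

With IEEE 64-bit doubles, each small term is lost as soon as it is
added to the large one, and the function returns zero.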

It is rather better to use expressions such as
$\left| \fe1 - \fe2 \right| \leq K$ or
$\bigl| \left| {\fe1}/{\fe2} \right| - 1.0 \bigr| \leq K$
(if $\fe2 \neq 0.0$), where $0 < K < 1$ is a function of:
\begin{enumerate}
\item
The floating type, \e.g. \<float> or \<double>,
\item
the machine architecture (the machine constants defined in the
previous section), and
\item
the precision of the input values and the rounding errors
introduced by the numerical method used.
\end{enumerate}

Other possibilities exist and the choice depends on the application.
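
The relative test can be sketched as below. This is a variant, not the
exact formula above: it compares $\left| \fe1 - \fe2 \right|$ with $K$
times the larger of the two magnitudes, which avoids the division and
hence the special case for $\fe2 = 0.0$. The names \<nearly\_equal>,
\<absval>, and \<k> are ours; choosing a suitable~$K$ remains the hard
part and is not addressed here.

\begin{verbatim}
/* Absolute value, written out to avoid needing the math library. */
static double absval(double v)
{
    return v < 0.0 ? -v : v;
}

/* Return 1 if a and b agree to within the relative tolerance k,
   0 otherwise.  k plays the role of K in the text (0 < k < 1). */
int nearly_equal(double a, double b, double k)
{
    double diff  = absval(a - b);
    double scale = absval(a) > absval(b) ? absval(a) : absval(b);

    return diff <= k * scale;
}
\end{verbatim}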

The development of reliable and robust numerical algorithms is a
very difficult undertaking. Methods for certifying that the
results are correct within reasonable bounds must usually be
implemented.  A reference such as \cite{NRC} is always useful.

\begin{itemize}

\item
Keep in mind that the \<double> representation does not
necessarily increase the {\em precision}. Actually, in some
implementations the precision decreases, but the {\em range\/}
increases.

\item
Do not use \<double> unnecessarily, since in many cases there is
a large performance penalty. Furthermore, there is no point in
using higher precision, if the additional bits that would be
computed are garbage anyway.  The precision one needs depends
mostly on the precision of the input data and the numerical
method used.

\end{itemize}


%=============================================================================
\subsection{Exceptions}
%=============================================================================

Floating-point exceptions (overflow, underflow, division by
zero, etc.)\ are not signaled automatically on some systems. In
that case, they must be explicitly enabled.

{\em Always\/} enable floating-point exceptions, since they may
be an indication that the method is unstable. Otherwise, one
must be sure that such events do not affect the output.


%+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
\section{VMS}
%+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++

This section reports some common problems encountered when
porting a C~program to a VMS environment that we have not
mentioned previously.


%=============================================================================
\subsection{File Specifications}
%=============================================================================

Under VMS, one can use two flavors of command interpreters: DCL
and DEC/Shell. The syntax of file specifications under DCL
differs significantly from the Unix syntax.

Some C~run-time library functions in VMS that take file
specifications as arguments, or return file specifications to the
caller, accept an additional argument indicating which syntax is
preferred. It is useful to call these run-time library functions
via macros as follows:

\begin{verbatim}
#ifdef  VMS
#  ifndef VMS_CI        /* Which Command Interpreter to use */
#    define VMS_CI  0   /* 0 for DEC/Shell, 1 for DCL */
#  endif

#  define  Getcwd(buff,siz)   getcwd((buff),(siz),VMS_CI)
#  define  Getname(fd,buff)   getname((fd),(buff),VMS_CI)
#  define  Fgetname(fp,buff)  fgetname((fp),(buff),VMS_CI)

#else  /* !VMS */
#  define  Getcwd(buff,siz)   getcwd((buff),(siz))
#  define  Getname(fd,buff)   getname((fd),(buff))
#  define  Fgetname(fp,buff)  fgetname((fp),(buff))

#endif /* !VMS */
\end{verbatim}

More pitfalls await the unwary who accept file specifications
from the user or take them from environment variables (\e.g. using
the \<getenv> function).


%=============================================================================
\subsection{Miscellaneous}
%=============================================================================

\begin{description}

\item[\<end>, \<etext>, \<edata>:]
these global symbols are not available under VMS\@.

\item[\<struct> assignments:]
VAX~C allows assignment of different types of \<struct>s if both types 
have the same size. {\em This is not a portable feature.}

\item[The system function:]
the \<system> function under VMS has the same {\em
functionality\/} as the Unix version, except that one must take
care that the command interpreter also provides the same
functionality. If the user is using DCL, then the application
must send a DCL-like command.

\item[The linker:]
what follows applies only to modules stored in
libraries.\footnote{This does not really belong in this
document, but whenever one is porting a program to a VMS
environment one is bound to come across this strange behavior
which can result in a lot of wasted time.} If none of the global
{\em functions\/} are explicitly used (referenced by another
module), then the module is not linked {\em at all}. It does not
matter whether one of the global {\em variables\/} is used. As a
side effect, the initialization of variables is not done.

The easiest solution is to force the linker to add the module
using the \cmd{/INCLUDE} command modifier. Of course, there is
the possibility that the command line may exceed
256~characters\ldots(*sigh*).

\end{description}


%+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
\section{General Guidelines}
%+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++


%=============================================================================
\subsection{Types and Pointers} \label{tp}
%=============================================================================

\begin{description}

\item[Type sizes:]
{\em Never\/} make any assumptions about the size of a given
type, especially pointers \cite{style}. Statements such as \<x
\&= 0177770> make implicit use of the size of~\<x>. If the
intention is to clear the lowest three bits, then it is best to
use \<x \&= \twiddle07>. The first alternative will also clear
the high-order 16~bits if~\<x> is 32~bits wide.

\item[Byte ordering:] \label{byteorder}
There are two possibilities for byte ordering: {\em
little-endian\/} and {\em big-endian\/} architectures. This
problem is illustrated by the code below:

\begin{verbatim}
   long int str[2] = {0x41424344, 0x0}; /* ASCII "ABCD" */
   printf ("%s\n", (char *)&str);
\end{verbatim}

A little-endian machine (\e.g. VAX) will print ``\<DCBA>'' whereas a
big-endian machine (\e.g. MC68000 microprocessors) will print
``\<ABCD>''\@. (As a side note, there is also {\em PDP-endian\/}
that would print ``\<BADC>'', followed by many smileys.)

Note: The example will only function correctly if a \<long~int>
is 32~bits wide. Although not portable, it serves well as an example for the
given problem.

\item[Alignment constraints:]
Beware of alignment constraints when allocating memory and using
pointers. Some architectures restrict the addresses that certain
operands may be assigned to (that is, addresses of the form~$2^kE$,
where~$k > 0$).  Code such as

\begin{verbatim}
   char *s = "bla"; /* allocated by compiler */
   int  *v = (int *)s;
\end{verbatim}

would most probably fail if the alignment constraints of \<int>
types are more strict than those of \<char> types (the usual
case for RISC architectures). The code would not fail due to
alignment constraints if the memory indicated by~\<s> had been
allocated by \<malloc> and friends.

\item[Pointer formats:] \cite{style}
Pointers to objects may have the same size but different
formats. This is illustrated by the code below:

\begin{verbatim}
   int *p = (int *) malloc(...); ... free(p);
\end{verbatim}

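This code may malfunction on architectures where \<int~*> and
\<char~*> have different representations because \<free> expects
a pointer of the latter type. Without a prototype for \<free> in
scope, the argument should be cast explicitly, as in
\<free((char~*)p)>.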

Pointers to different types of objects may have different sizes as well.
For instance, there are platforms where a \<char~*> is larger than an
\<int~*> or where a pointer to a function
will not fit in, \e.g. \<char~*> or \<void~*> (although such cross-assignments
work on many platforms, \<void~*> is only guaranteed to be
large enough to hold a pointer to any {\em data\/} object). Therefore,
it is not portable to assign to an object of type \<void~*> a pointer
to a function. Pointers to functions are further discussed below.

\item[Pointers to functions:]
If you need a generic function pointer, then use \<void(*)(void)>.
Be sure to cast the pointer back to the original type before using it.
That is, the type signature of the function pointer at the point that
the function is called must {\em exactly\/} match the type signature at
the point at which the function is defined.

For example, it is not possible to (portably) use \<varargs>
functions\footnote{There is a difference between variadic
functions defined by the Standard and the pre-Standard \<varargs>
as defined by \file{varargs.h} which is still widely used.
Here we are referring to the former, and the differences between
both are explored in~\S\ref{ansic}.}
(that is, functions that take a variable number of arguments) and
fixed-argument functions interchangeably, even if the overlapping types
match (that is, even if the first~$n$ arguments to the fixed-argument
function are the same as the first~$n$ arguments to the \<varargs>
function).
For instance, a function that is declared as having an integer as the first
argument and an optional (integer) second argument cannot be called as
a function that takes two integer arguments.
Similarly, \<varargs> functions of various type signatures cannot be
interchanged.
Such type cheating will break on systems that use different
conventions for calling fixed-argument and \<varargs> functions
and on systems that use different conventions for passing the fixed
and \<varargs> parts of the argument lists.

As a corollary, it is necessary that the declarations of
external variadic functions be visible at the point of their
use, \e.g. for library functions such as \<printf>.

\item[Pointer operators:] \cite{style}
Only the operators \<==> and \<!=> are defined for all pointers
of a given type. The remaining comparison operators (\<<>,
\<<=>, \<\GT>, and \<\GE>) can only be used when both operands
point into the same array or to the first element after the
array. The same applies to arithmetic operators on
pointers.\footnote{One of the reasons for these rules is that in
some architectures, pointers are represented as a pair of values
and only equality is a well-defined operator for arbitrary pairs
of values. The other operators are only well-defined when one of
the values of both pairs is guaranteed to match, in which case
the situation is analogous to ``ordinary'' architectures.}

\item[\<NULL> pointer:]
{\em Never\/} redefine the \<NULL> symbol. The \<NULL> symbol
should always be the {\em constant\/} zero. A null pointer of a
given type will always compare equal to the {\em constant\/}
zero, whereas comparison with a {\em variable\/} with value zero
or to some non-zero constant has implementation-defined
behavior. (In other words, the constant zero has two meanings.)

A null pointer of a given type will always convert to a null
pointer of another type if implicit or explicit conversion is
performed.  (See `Pointer Operators' above.)

The contents of a null pointer may be anything the implementor
wishes, and dereferencing it may cause strange things to
happen\ldots.

\end{description}
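
Two of the points above, byte ordering and alignment, can be
sidestepped when reading external data by assembling values one byte
at a time instead of casting pointers. The following is a minimal
sketch, assuming 8-bit bytes; the names \<is\_little\_endian> and
\<get\_be32> are hypothetical.

\begin{verbatim}
/* Return 1 if the least significant byte of an unsigned int is
   stored at the lowest address, 0 otherwise. */
int is_little_endian(void)
{
    unsigned int probe = 1;

    return *(unsigned char *)&probe == 1;
}

/* Assemble a 32-bit value stored most-significant byte first at p.
   Works for any alignment of p and any host byte order. */
unsigned long get_be32(const unsigned char *p)
{
    return ((unsigned long)p[0] << 24) |
           ((unsigned long)p[1] << 16) |
           ((unsigned long)p[2] <<  8) |
            (unsigned long)p[3];
}
\end{verbatim}

With the bytes of the earlier example,
\<get\_be32((const unsigned char~*)"ABCD")> yields \<0x41424344> on an
ASCII machine regardless of the host's byte order.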


%=============================================================================
\subsection{Compiler Differences}
%=============================================================================


%-----------------------------------------------------------------------------
\subsubsection{Conversion Rules}
%-----------------------------------------------------------------------------

In arithmetic expressions, integral types may be converted in
two ways: {\em unsigned-preserving\/} or {\em value-preserving}.
In the unsigned-preserving model, \<char>s, \<short>s, and
bit-fields are converted to \<unsigned int> or \<signed int> if
the original types have the modifiers \<unsigned> or \<signed>,
respectively.

The Standard determines that the value-preserving model must be
used, meaning that a value of a narrower \<unsigned> type is
promoted to \<signed int>, or simply \<int>, if \<int> can
represent all the values of the original type; otherwise it is
converted to \<unsigned int>.  (See \S3.2 of the Standard.)

The following example illustrates the problem.  On a machine
with a 16-bit \<short~int>, and 32-bit \<int>, the code fragment

\begin{verbatim}
   unsigned short int x = 1;
   if (x < -1) printf ("unsigned-preserving");
   else printf ("value-preserving");
\end{verbatim}

prints \<unsigned-> or \<value-preserving> accordingly. Plenty
of other examples can be derived, such as initializing~\<x>
with~$2^{15}$ and using the predicate \<(x*x*2~\GT~0)>. The expression
\<x*x*2> would probably result in the same bit pattern in both
models but would cause arithmetic overflow in the
value-preserving model.
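
One way to make the comparison above behave identically under both
models is to convert explicitly before comparing. The following is a
sketch, assuming that \<long> can represent every \<unsigned short>
value (true on the machine described above); the function name
\<below> is hypothetical.

\begin{verbatim}
/* Compare an unsigned short against a possibly negative bound.
   The cast makes the signed comparison explicit, so the result
   does not depend on the compiler's promotion rules.
   Assumes long can hold every unsigned short value. */
int below(unsigned short x, long bound)
{
    return (long)x < bound;
}
\end{verbatim}

Here \<below(1, -1L)> is false under either model.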

%-----------------------------------------------------------------------------
\subsubsection{Compiler Limitations}
%-----------------------------------------------------------------------------
% particularly IBM PC, GNU Compiler on Sun-4's, VMS compiler, etc.

In practice, one runs into unstated compiler limitations much
too frequently:

\begin{itemize}

\item
Some of these {\em limitations\/} are {\em bugs}. Many of these
bugs are in the optimizer and therefore when dealing with a new
environment it is best to explicitly disable optimization until
one gets the application ``going''.

\item
Some compilers cannot handle large modules or ``large''
statements.\footnote{Programs that generate other programs,
\e.g. \cmd{yacc}, can generate, for instance, very large
\<switch> statements.} Therefore, it is advisable to keep the
size of modules within reasonable bounds.  Besides, large
modules are more cumbersome to edit and understand.

\end{itemize}

% arl:	o	MSC has serious problem .. when you write big modules, or
%		yacc or other generator generates them, compiler can't
%		handle them ... ugh
%	o	MSC can't also handle big switch statements ... so if
%		you want to write big state machine ... tough luck

%-----------------------------------------------------------------------------
\subsubsection{ANSI~C}\label{ansic}
%-----------------------------------------------------------------------------

The Standard has both codified current practice and introduced
new features, but as we all know, not many compilers conform to
the Standard yet.  Among the features that are not yet widely
supported, we mention here only a few:

\begin{description}

\item[Constant suffixes:]
Many compilers allow for suffixes to be appended to constants,
such as~\<10L> to indicate a \<long> constant. The Standard
allows further typing of constants, such as~\<10UL> to indicate
an \<unsigned~long> constant.  However, multiple suffixes are
not supported by many compilers.

\item[New types:]
Besides the type \<void~*> which is mentioned in the next
section, the Standard has introduced the type \<long~double>.

\item[Variadic functions:]
Variadic functions, as defined by the
Standard, differ significantly from \file{<varargs.h>}.
Besides the ellipsis notation, the Standard requires that at
least one named argument precede the ellipsis and that
\file{<stdarg.h>} be used instead (see~\S\ref{varargsh}).
Therefore, it is not possible to define a variadic function
which takes no fixed arguments.

\end{description}
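
The Standard's form of a variadic function, as described in the last
item above, can be sketched as follows. The function \<sum\_ints> is a
hypothetical example; the caller must pass exactly \<count> further
arguments of type \<int>.

\begin{verbatim}
#include <stdarg.h>

/* Sum `count' int arguments.  The named parameter `count' is what
   va_start needs in order to locate the variable part of the
   argument list. */
int sum_ints(int count, ...)
{
    va_list ap;
    int     total = 0;

    va_start(ap, count);
    while (count-- > 0)
        total += va_arg(ap, int);
    va_end(ap);
    return total;
}
\end{verbatim}

For example, \<sum\_ints(3, 1, 2, 3)> returns~6.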

%-----------------------------------------------------------------------------
\subsubsection{Miscellaneous}\label{misc}
%-----------------------------------------------------------------------------

\begin{description}

\item[\<char> types:]
When \<char> types are used in expressions, most implementations
will treat them as \<unsigned> {\em but there are many others
that treat them as\/ \<signed>} (\e.g. VAX~C and HP-UX).  It is
advisable to always cast \<char>s when they are used in
arithmetic expressions.

\item[Initialization:]
Do not rely on the initialization of \<auto> variables and
of memory returned by \<malloc>. In particular, since not all
\<NULL> pointers are represented by a bit pattern of all-zeroes,
it is good practice to always initialize pointers appropriately.

The \<calloc> library function returns an area of memory that
has been cleared to zero.  Although this can be used to
initialize arrays and \<struct>s on many architectures, not all
architectures represent \<NULL> pointers internally with a zero
bit-pattern.  Similarly, it is not safe to assume that all
architectures represent the floating-point constant~\<0.0> using
a zero bit-pattern.

The semantics of many library functions differ from system to
system.  Also, the specifications of some library functions have
been changed in the ANSI~C Standard.  For example, \<realloc> is
now required to behave like \<malloc> when called with a \<NULL>
argument; formerly, many implementations would dump core if
handed \<NULL>\@.

\item[Bit fields:]
Some compilers, \e.g. VAX~C, require that bit fields within
\<struct>s be of type \<int> or \<unsigned>. Furthermore, the
upper bound on the length of the bit field may differ among
different implementations.

\item[\<sizeof>:]
\begin{enumerate}
\item
The result of \<sizeof> may be \<unsigned> or \<signed>.
\item
If~\<p> is a pointer, then \<sizeof(*p)> is allowed by the
Standard and many compilers even if~\<p> does not contain a
valid address (\e.g. if it is \<NULL>). However, some compilers
dereference the pointer, causing programs to crash.
\end{enumerate}

\item[\<void> and \<void *>:]
Some very old compilers do not recognize \<void> [{\em sic\/}].
Although required by the Standard, some compilers recognize
\<void> but fail to recognize \<void~*>. The following code
might prove useful:

\begin{verbatim}
#if __STDC__
#  define  HAS_VOIDP
#endif
#ifdef HAS_VOIDP
   typedef void *voidp;
#else
   typedef char *voidp;
#endif
\end{verbatim}

\item[Functions as arguments:]
When calling functions passed as arguments, always dereference
the pointer. In other words, if~\<f> is a pointer to a function,
use~\<(*f)()> instead of simply~\<(f)()>, because some compilers
may not recognize the latter.

\item[String constants:]
Do not modify string constants since many implementations place
them in read-only memory. Furthermore, that is what the Standard
requires --- and that is how a constant should behave!

Note: In statements such as ``\<char~*s = "string">'', \<"string"> is
a string constant, whereas in ``\<char~s[] = "string">'' it is not
and it is legal to modify~\<s>.

\item[\<struct> comparisons:]
Some compilers might allow for \<struct>s to be compared for
equality or inequality. Such an extension is not included in the
Standard (meaning it is not portable).

\item[Initialization of aggregates:]
Some compilers cannot initialize \<auto> aggregate types.
Statements such as:

\begin{verbatim}
{
   typedef struct {double x,y;} Interval;
   Interval range = {0.0,0.0};
   ...
}
\end{verbatim}

are not allowed by some compilers unless the modifier \<static>
is used or if \<range> has file scope. Although declaring all
such variables \<static> would handle most situations, the most
portable solution is to add code that performs the
initialization.

\item[Nested comments:]
Nested comments were never allowed in the C~language, but they
are allowed by some compilers. Nested comments are used by some
to comment out source code containing comments. However, the
same effect can be obtained using an \<\#if~0> and \<\#endif>
pair.

\item[Shift operators:]
When shifting \<signed int>s right, the vacated bits might be
filled with zeroes or with copies of the sign bit. \<unsigned
int>s will be filled with zeroes.

\item[Division and remainder:]
When both operands are non-negative, then the remainder
is non-negative and smaller than the divisor; if not,
it is guaranteed only that the absolute value of the
remainder is smaller than the absolute value of the
divisor.	% See K&R II, p. 205

\end{description}
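
The uncertainty about the sign of the remainder, noted in the last
item above, can be hidden behind a small wrapper. The following is a
sketch, assuming a positive divisor; the name \<mod\_nonneg> is
hypothetical.

\begin{verbatim}
/* Return a mod b in the range 0 .. b-1, whichever way the compiler
   rounds the quotient of negative operands.  Assumes b > 0. */
long mod_nonneg(long a, long b)
{
    long r = a % b;     /* magnitude of r is smaller than b */

    if (r < 0)
        r += b;
    return r;
}
\end{verbatim}

For example, \<mod\_nonneg(-7L, 3L)> returns~2 whether the
implementation computes \<-7 \% 3> as~$-1$ or as~2.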


%=============================================================================
\subsection{Files}
%=============================================================================

%-----------------------------------------------------------------------------
\subsubsection{General Guidelines}
%-----------------------------------------------------------------------------

Remember that not all operating systems share Unix's simple
notion of a file as a stream of bytes.  MS-DOS, for instance,
has text files and binary files; it is important to open files
in the correct mode.  VMS has many different file types and each
file is viewed as being a collection of structured records. 
% RMS == Horror-Mess!

MS-DOS provides a ``poor man's'' implementation of pipes and
redirection.  It does not expand wildcards, however.  The program
must do the wildcard expansion itself using \<findfirst> and
\<findnext>.  Under VMS, the program must also expand wildcards, and
parse \<argv> for redirection directives manually.
% I think that all of the above is correct.  Actually, DECUS C can do
% redirection and wildcard expansion automatically, I think.

Different operating systems use widely different syntax to
specify pathnames.  This is a potential source of problems.
Some compilers may provide run-time pathname translation to
translate between Unix syntax and the host's syntax.

%-----------------------------------------------------------------------------
\subsubsection{Source Files}
%-----------------------------------------------------------------------------
\begin{itemize}

\item
Keep files reasonably small in order not to upset some
compilers.

\item
File names should not exceed 14~characters (many
System~V-derived systems impose this limit, whereas in
BSD-derived systems a limit of~15 is usually the case).  In some
implementations this limit can be as low as 8~characters.  These
limits are often {\em not\/} imposed by the operating system but
by system utilities such as \cmd{ar}.

\item
Do not use special characters in file names, especially multiple
dots (dots have a very special meaning under VMS).

\end{itemize}


%=============================================================================
\subsection{Miscellaneous}
%=============================================================================

\begin{description}

\item[System dependencies:]
Isolate system-dependent code in separate modules and use
conditional compilation.

\item[Utilities:]
Utilities for compiling and linking such as \cmd{Make} simplify
considerably the task of moving an application from one
environment to another. Even better, use \cmd{Imake} since
\cmd{Make} files are very unportable. \cmd{Imake} is distributed
with the X~Window System by MIT\@. One of the authors of this
document has used it extensively with very good results.

Many of the tools and libraries that one takes for granted on
Unix, such as \cmd{lex}, \cmd{yacc}, \cmd{curses}, \cmd{sed},
\cmd{awk}, and the various shells, are often not available on
other operating systems.  Public-domain versions of most of the
useful tools are available at many archive sites.  However, the
so-called copyleft restrictions on many of these programs may
prove to be problematic to some would-be porters.
% I think, correct me if I'm wrong!

\item[Name space pollution:]
Minimize the number of global symbols in the application. One of
the benefits is the lower probability that any conflicts will
arise with system-defined functions.

\item[Character sets:]
Do not assume that the character set is ASCII\@.  If the character
set in question is not [American] English, then other characters
will also be alphabetic, and their lexicographic ordering will
not necessarily have any relationship to their positions within
the character set.  If the character set is Asian, then
``characters'' may be of type \<wchar\_t>, not \<char>, and
will, in general, require two or more bytes of storage each.
The library string functions should be capable of handling these
correctly.  Code that iterates through arrays of \<char>s may
need to be changed to handle multibyte characters correctly.

If the program's messages are likely to be translated into other
languages, take care to modularize the code for easy
translation.  Consider keeping all text in a ``language'' file.
Be aware that carefully formatted reports and printing routines
may need major surgery.

% Finns must know more about this stuff than most Anglophones!

\item[Binary Data:]
Great care must be taken when reading and writing binary data.
For example, a file of floating-point numbers in binary format
written by machine~$A$ is unlikely to be usable on machine~$B$.

\end{description}
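
For single-byte character sets, the advice on character sets above
largely amounts to using the classification functions of
\file{<ctype.h>} rather than comparing against character constants.
The following sketch does not address multibyte characters; the name
\<count\_alpha> is hypothetical. The cast to \<unsigned char> is needed
because plain \<char> may be signed (see~\S\ref{misc}).

\begin{verbatim}
#include <ctype.h>

/* Count the alphabetic characters in s according to the current
   locale, without assuming anything about character codes. */
int count_alpha(const char *s)
{
    int n = 0;

    for (; *s != '\0'; s++)
        if (isalpha((unsigned char)*s))
            n++;
    return n;
}
\end{verbatim}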


%=============================================================================
\subsection{Writing Portable Code}
%=============================================================================

Write code under the assumption that it will be ported to many
strange machines.  It is considerably easier to port code to a
new environment when the code has been written with porting in
mind, than it is to ``retrofit'' portability.

One school of thought advocates ``Port early, port often.''
That is, whenever the code reaches a certain level of stability
on the development system, port it to other systems.  This
method has the advantage that portability problems are
discovered early, and the possible disadvantage that potentially
far more time could be spent in porting than would be the case
if the code were just ported once, when complete.

Code in ANSI~C whenever possible.  Many of the extensions ---
prototypes, stronger type-checking, etc.\ --- enhance
portability.  The more widely ANSI~C is used, the quicker it
will gain acceptance.  Of course, this may not be an option if
the code must be ported to platforms without ANSI~C compilers.
The short-term solution is to use the various tricks discussed
in~\cite{style} and elsewhere; the long-term solution is to
force vendors to release ANSI~C compilers for their systems.
Alternatively, a converter such as \cmd{protoize} (available
via anonymous FTP from \site{prep.ai.mit.edu}) can convert
between ANSI and non-ANSI programs.

Make complete, correct declarations; don't let parameters
default to \<int>.  Include all of the necessary header files.
Declare functions with no return value as \<void>.  Check the
results of system calls.
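
As a small illustration of these points, the sketch below declares
everything completely and checks every call that can fail. The same
discipline applies to library calls as to system calls proper. The
function name \<count\_bytes> is hypothetical.

\begin{verbatim}
#include <stdio.h>

/* Count the bytes in the named file.  Returns -1L if the file
   cannot be opened or if a read error occurs. */
long count_bytes(const char *name)
{
    FILE *fp = fopen(name, "rb");   /* binary mode; see Files */
    long  n  = 0;
    int   failed;

    if (fp == NULL)
        return -1L;
    while (getc(fp) != EOF)
        n++;
    failed = ferror(fp);            /* EOF may also mean an error */
    if (fclose(fp) == EOF || failed)
        return -1L;
    return n;
}
\end{verbatim}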

Use \cmd{lint}.  Programs that fail to pass \cmd{lint} quietly
will undoubtedly be difficult to port.  Compile code with as
many different compilers as possible with all warnings enabled.

\cite{style} has more to say about this.


%+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
\section{Further Reading}
%+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++

One can argue that portability and ``well-written'' code go
hand-in-hand.  Loosely defined, well-written code is code that is
``easy'' to understand {\em and\/} ``easy'' to maintain, and
there are several style guides in the public domain expressing
various views on the subject.

Besides the style guide mentioned in the foreword, there are a
few more that can be obtained from \site{cs.toronto.edu}
[128.100.1.65] in \file{\twiddle{}ftp/doc/programming}.  We also
recommend \file{standards.text} from the Free Software
Foundation, which can be found at various sites, \e.g.
\site{prep.ai.mit.edu} [18.71.0.38] in
\file{\twiddle{}ftp/pub/gnu}.

For those who have access to the Usenet newsgroup
\ng{comp.lang.c}, we highly recommend reading the Frequently
Asked Questions List (known as the {\em FAQL\/}) which is posted
at the beginning of every month.


%+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
\section{Acknowledgements}
%+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++

We are grateful for the early help of A.~Louko (HTKK/Lsk) and
J.~Helminen (HTKK). The following persons have commented on and
corrected previous revisions of this document: Geoffrey
H.~Cooper and Guy Harris.  Special thanks go to Steven
Pemberton, the main author of \file{config.c}, for making
available such a useful tool.  We thank all the contributors to
the Usenet newsgroups \ng{comp.std.c} and \ng{comp.lang.c} from
where we have taken a lot of information. Some information
within was obtained from \cite{HP}\@.


%+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
\section{Trademarks}
%+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++

{
\footnotesize
DEC, PDP-7, VMS and VAX are trademarks of Digital Equipment Corporation. \\
HP is a trademark of Hewlett-Packard, Inc.\\
MC68000 is a trademark of Motorola.\\
{\sc PostScript} is a registered trademark of Adobe Systems, Inc.\\
Sun is a trademark of Sun Microsystems, Inc. \\
Unix is a registered trademark of AT\&T\@. \\
X Window System is a trademark of MIT\@.\\
}

\newcommand{\newblock}{}
\bibliography{portableC}

\end{document}


