% Notes on Writing Portable Programs in C: part1.tex
% George V. Reilly
% gvr at cs.brown.edu
% Fri Nov 30 18:34:10 AEST 1990
% You must concatenate part1.tex and part2.tex together to form
% portableC.tex
% remove [portableC] from the \documentstyle command below if you
% prefer the old format. However, be sure to somehow include the
% section marked `% incorporate any additional commands I find necessary'.
\documentstyle[portableC]{article}
\pagestyle{headings}
\begin{document}
\bibliographystyle{alpha}
% The number between brackets is the minor revision number which
% must be removed when we finally agree on the contents.
\title{{\bf Notes on Writing\\Portable Programs in C}\\
{\small (Nov 1990, 8th Revision)}
}
\author{A. Dolenc%
\protect\thanks{Internet: \id{ado at sauna.hut.fi}.}
\\ A. Lemmke \\
{\em Helsinki University of Technology} \\
D. Keppel%
\protect\thanks{Internet: \id{pardo at cs.washington.edu}.} \\
{\em CS\&E, University of Washington} \\
{\normalsize and} \\
G. V. Reilly%
\protect\thanks{Internet: \id{gvr at cs.brown.edu}.} \\
{\em Dept.\ of Computer Science, Brown University}
}
\maketitle
{
\abstract
\parskip=4pt plus 1pt
\parindent=0pt
This document describes the features and non-features of
different C~preprocessors, compilers, and environments. As such,
it is an incomplete document, growing as information is gathered.
It contains some material concerning ANSI~C but it is not a
substitute for the Standard itself.
We assume the reader is familiar with the C~programming language.
\endabstract
}
\pagebreak
\tableofcontents
\pagebreak
\parskip=4pt plus 1pt
\parindent=0pt
\raggedbottom
%+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
\section{Foreword}
%+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
We will call a program {\em portable\/} if adapting it to a new
environment is easier than rewriting it for that environment.
This document is mainly for those who have {\em never\/} ported
a program to another platform --- a specific hardware and
software environment --- and, evidently, for those who plan to
write large systems which must be used across different vendor
machines. If you have already done some porting, you may not
find the information herein very useful.
We suggest that \cite{style} be read in conjunction with this
document.\footnote{\cite{style} can be obtained via {\em
anonymous FTP\/} from \site{cs.washington.edu} in
\file{\twiddle{}ftp/pub/cstyle.tar.Z}\@.} Posters to the newsgroup
\ng{comp.lang.c} have repeatedly recommended \cite{MH} and
\cite{AK} (none of the information herein has been taken from
those two references).
{\bf Disclaimer:} We will attempt to keep the information herein
updated, but it can happen that some of it may be incorrect at
the time of reading. The code fragments presented are intended
to make applications ``more'' portable, meaning that they may
fail with some compilers and/or environments.
{\footnotesize
This document can be obtained via anonymous FTP from
\site{sauna.hut.fi} [130.233.251.253] in
\file{\twiddle{}ftp/pub/CompSciLab/doc}. The files
\file{portableC.tex}, \file{portableC.sty},
\file{portableC.bib}, and \file{portableC.ps.Z} are the \LaTeX\
source and style files, {\sc Bib}\TeX\ and the compressed {\sc PostScript},
respectively. Alternatively, there is a site in the US
from which one can obtain all four
files, \site{cs.washington.edu} [128.95.1.4] in
\file{\twiddle{}ftp/pub/cport.tar.Z}\@. All files are in the
public domain. Comments, suggestions, flames, eggs, and requests
for copies via e-mail should be directed to
\id{ado at sauna.hut.fi}.
}
%+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
\section{Introduction}
%+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
The aim of this document is to collect the experience of several
people who have had to write and/or port programs written in~C
to more than one platform.
In order to keep this document within reasonable bounds, we must
restrict ourselves to programs which must execute under
Unix-like operating systems and those which implement a
reasonable Unix-like environment. The only exception we will
consider is VMS\@.
A wealth of information can be obtained from programs that have
been written to run on several platforms. This is the case of
publicly available software such as that developed by the Free
Software Foundation and the MIT X~Consortium.
When discussing portability, one focuses on two issues:
\begin{description}
\item[The language,]
which includes the preprocessor and the syntax and the semantics
of the language.
\item[The environment,]
which includes the location and contents of header files and the
run-time library.
\end{description}
We include in our discussions the standardization efforts upon
the language and the environment. Special attention will be
given to floating-point representations and arithmetic, to
limitations of specific compilers, and to VMS\@.
Our main focus will be {\em boiler-plate\/} problems. Systems
programming, \e.g. raw I/O from terminals, and twisted code
associated with bizarre interpretations of \cite{ansi} ---
henceforth referred to as the Standard --- are not extensively
covered in this document.\footnote{We regard this document as a
living entity growing as needed and as information is gathered.
Future versions of this document may contain a lot of such
information.}
%+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
\section{Standardization Efforts}
%+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
All standards have a good side and an evil side. Due to the
nature of this document, we are forced to focus our attention on
the latter.
The American National Standards Institute (ANSI) has recently
approved a standard for the C~programming language
\cite{ansi}. The Standard concentrates on the syntax and
semantics of the language and specifies a minimum environment
(the name and contents of some header files and the
specification of some run-time library functions).
Copies of the ANSI~C Standard (ANSI X3.159--1989) can be
obtained from the following address:
{\small
\begin{center}
\begin{tabular}{l}
American National Standards Institute\\
Sales Department\\
1430 Broadway\\
New York, NY 10018\\
(Voice) (212) 642--4900\\
(Fax) (212) 302--1286\\
\end{tabular}
\end{center}
}
%=============================================================================
\subsection{ANSI~C}
%=============================================================================
%-----------------------------------------------------------------------------
\subsubsection{Translation Limits}
%-----------------------------------------------------------------------------
We first bring to the reader's attention the fact that the
Standard states some environmental limits. These limits are {\em
lower bounds}, meaning that a correct (compliant) compiler may
refuse to compile an otherwise-correct program that exceeds one
of those limits.\footnote{Maybe there {\em are\/} people out
there who still write compilers in FORTRAN after all\ldots.}
Below are the limits that we judge to be the most important. The
ones related to the preprocessor are listed first.
\begin{itemize}
\item
{\em 8~nesting levels of conditional inclusion.}
\item
{\em 8~nesting levels for \<\#include>d files.}
\item
{\em 32~nesting levels of parenthesized expressions
within a full expression.} This limit is most likely to be
reached through macro expansion.
\item
{\em 1024~macro identifiers simultaneously defined.} This can
happen if one includes too many header files.
\item
{\em 509~characters in a logical source line.}
This is a serious restriction if it applies {\em after\/}
preprocessing. Since a macro expansion always results in one
line, this affects the maximum size of a macro. It is unclear
what the Standard means by a logical source line in this context
and in most implementations this limit will probably apply {\em
before\/} macro expansion.
\item
{\em 6~significant initial characters in an external
identifier.} Usually this constraint is imposed by the
environment, \e.g. the linker, and not by the compiler.
\item
{\em 127~members in a single structure or union.}
\item
{\em 31~arguments in one function call.} This may cause
trouble with functions that accept a variable number of
arguments. Therefore, when designing such functions, it is
advisable either to keep the number of arguments within
reasonable bounds or to supply alternative interfaces,
\e.g. using arrays.
\end{itemize}
It is really unfortunate that some of these limits may force a
programmer to code in a less elegant way. We are of the opinion
that the remaining limits stated in the Standard can usually be
obeyed if one follows ``good'' programming practices.
However, these limits may break programs that {\em generate\/}
C~code such as compiler-compilers and many \C++~compilers.
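As a hedged illustration of the ``alternative interface'' advice
given for the limit on function call arguments, the sketch below
accepts an array and a count instead of a long argument list. The
name \<sum\_array> is hypothetical and does not come from any library.
\begin{verbatim}
#include <stddef.h>

/* Hypothetical helper: sums n values passed through an array,
   so that a call never needs more than two arguments. */
long sum_array(const int *values, size_t n)
{
    long total = 0;
    size_t i;

    for (i = 0; i < n; i++)
        total += values[i];
    return total;
}
\end{verbatim}
A caller with many values fills an array and passes its length, which
stays well inside the limit no matter how many values there are.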
%-----------------------------------------------------------------------------
\subsubsection{Unspecified and Undefined Behavior}
%-----------------------------------------------------------------------------
The following are examples of unspecified and undefined
behavior:
\begin{enumerate}
\item
The order in which the function designator and the arguments
in a function call are evaluated.
\item
The order in which the preprocessor operators \<\#> (stringizing)
and \<\#\#> (concatenation) are evaluated during macro substitution.
\item
The representation of floating-point types.
\item
An identifier is used that is not visible in the current scope.
\item
A pointer is converted to something
other than an integral or pointer type.
\end{enumerate}
The list is long. One of the main reasons for explicitly
defining what is {\em not\/} covered by the Standard is to allow
the implementor of the C~environment to make use of the most
efficient alternative.
%=============================================================================
\subsection{POSIX}
%=============================================================================
% arl: We should order the release9 (10 ?) manual \ldots maybe LK does ?
The objective of the POSIX working group P1003.1 is to define a
common interface for Unix. Granted, the ANSI~C standard does
specify the contents of some header files and the behavior of
some library functions but it falls short of defining a useful
environment. This is the task of P1003.1.
We do not know how far P1003.1 addresses the problems presented
in this document as at the moment we lack proper documentation.
Hopefully, this will be corrected in a future release of this
document.
%+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
\section{Preprocessors}
%+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
Preprocessors can behave differently in several ways. For those
who need them, there are good publicly available preprocessors
that are ANSI~C--compliant. One such preprocessor is the one
distributed with the X~Window System developed by the MIT
X~Consortium.
%=============================================================================
\subsection{Command Options}
%=============================================================================
The interpretation of the \<-I> command option can differ from
one system to another. Besides, it is not covered by the
Standard. For example, the directive \<\#include "dir/file.h">
in conjunction with \<-I..> would cause most preprocessors in a
Unix-like environment to search for \file{file.h} in
\file{../dir}, but under VMS, \file{file.h} is only searched for
in the subdirectory \file{dir} in the current working directory.
%=============================================================================
\subsection{\<\#pragma> and \<\#elif>}
%=============================================================================
Directives are very much the same in all preprocessors, except
that some preprocessors may not know about the \<defined>
operator in a \<\#if> directive nor about the \<\#pragma> and
\<\#elif> directives.
The \<\#pragma> directive should pose no problems even to old
preprocessors {\em if it comes indented}.\footnote{Old
preprocessors only take directives that begin with \<\#> in the
first column.} Furthermore, it is advisable to enclose them with
\<\#ifdef>s in order to document under which platform they make
sense:
\begin{verbatim}
#ifdef <platform-specific-symbol>
  #pragma ...
#endif
\end{verbatim}
Beware of \<\#pragma> directives that alter the semantics of the
program and consider the case when they are not recognized by a
particular compiler. Evidently, if the behavior of the program
relies on their correct interpretation then, in order for the
program to be portable, all target platforms must recognize them
properly.
%=============================================================================
\subsection{Concatenation}
%=============================================================================
Concatenation of symbols has two variants. One is the old K\&R
\cite{KR1} style that simply relied on the fact that the
preprocessor replaced comments such as \</**/> with nothing.
Obviously, that does not result in concatenation if the
preprocessor replaces the comment with a space instead. The ANSI~C
Standard defines the operator \<\#\#> and (implicit)
concatenation of adjacent strings. Since both styles are a fact
of life it is useful to include the following in one's header
files:\footnote{Some have suggested using \<\#if \_\_STDC\_\_>
instead of simply \<\#ifdef \_\_STDC\_\_> to test if the
compiler is ANSI-compliant because of compilers that are {\em
not}, but define \<\_\_STDC\_\_> equal to zero.}
\begin{verbatim}
#ifdef __STDC__
# define GLUE(a,b) a##b
#else
# define GLUE(a,b) a/**/b
#endif
\end{verbatim}
If needed, one could define similar macros to \<GLUE> several
arguments.\footnote{\<GLUE(a,GLUE(b,c))> would not result in the
concatenation of \<a>, \<b>, and \<c>.}
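Since nesting \<GLUE> does not work, a three-argument variant has to
be written out directly. The following is a minimal sketch in the same
style as \<GLUE>; the names \<GLUE3> and \<xyz\_count> are ours and
are chosen for illustration only.
\begin{verbatim}
#ifdef __STDC__
# define GLUE3(a,b,c) a##b##c
#else
# define GLUE3(a,b,c) a/**/b/**/c
#endif

/* Hypothetical use: declares a variable named xyz_count. */
int GLUE3(xyz,_,count) = 3;
\end{verbatim}
In the ANSI branch the order in which the two \<\#\#> operators are
applied is unspecified, so every intermediate result should itself be a
valid token, as is the case here.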
%=============================================================================
\subsection{Token Substitution}
%=============================================================================
Some preprocessors perform token substitution within quotes
while others do not. Therefore, this is intrinsically
non-portable. The Standard disallows it but provides a mechanism
to obtain the same results. The following should work with
ANSI-compliant preprocessors or with the ones that perform token
substitution within quotes:
\begin{verbatim}
#ifdef __STDC__
# define MAKESTRING(s) # s
#else
# define MAKESTRING(s) "s"
#endif
\end{verbatim}
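With an ANSI-compliant preprocessor, the argument of the \<\#>
operator is not macro-expanded, so \<MAKESTRING> applied to a macro
name yields the name itself. A second macro level is commonly used
when the expansion is wanted instead. The sketch below covers only the
ANSI branch; the names \<MAKESTRING\_EXPANDED>, \<REVISION>,
\<revision\_name>, and \<revision\_text> are hypothetical.
\begin{verbatim}
#define MAKESTRING(s) # s
/* Hypothetical second level: the argument is macro-expanded
   before MAKESTRING turns it into a string. */
#define MAKESTRING_EXPANDED(s) MAKESTRING(s)

#define REVISION 8

const char *revision_name = MAKESTRING(REVISION);          /* "REVISION" */
const char *revision_text = MAKESTRING_EXPANDED(REVISION); /* "8" */
\end{verbatim}
Preprocessors that substitute within quotes may behave differently, so
this idiom should be kept inside the \<\_\_STDC\_\_> branch.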
%=============================================================================
\subsection{Miscellaneous}
%=============================================================================
\begin{itemize}
\item
We would {\em not\/} trust the following to work on {\em all\/}
preprocessors:
\begin{verbatim}
#define D define
#D this that
\end{verbatim}
The Standard does not allow such a syntax (see~\S3.8.3 \P20 in
\cite{ansi}).
\item
Many preprocessors ignored, or still ignore, text after the
\<\#else> and \<\#endif> directives. However, the
Standard forbids anything but comments after these directives.
\item
Some preprocessors will consider it an error to \<\#undef>
something that has not been \<\#define>d, although the Standard
allows it.
\item
Finally, we must add that the Standard has fortunately included
a \<\#error> directive with obvious semantics. Indent the
\<\#error> since old preprocessors do not recognize it.
\end{itemize}
%+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
\section{The Language}
%+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
%=============================================================================
\subsection{The Syntax}
%=============================================================================
The syntax defined in the Standard is a {\em superset\/} of the
one defined in K\&R~\cite{KR1}. It follows that if one restricts
oneself to the latter, there should be no problems with an
ANSI~C--compliant compiler {\em with respect to syntax}. The
{\em semantics\/} are, however, another problem altogether and
are covered superficially in the next section.
The Standard extends the syntax with the following:
\begin{enumerate}
\item
The inclusion of the keywords \<const>, \<enum>, \<signed>,
\<void>, and \<volatile>.
\item
The inclusion of additional constant suffixes to indicate their
type.
\item
The ellipsis (``\<...>'') notation to indicate a variable number
of arguments.
\item
Function prototypes.
\item
Trigraph notation for specifying otherwise-unobtainable
characters in restricted character sets.
\end{enumerate}
We encourage the use of the reserved words \<const> and
\<volatile> since they aid in documenting the code. It is
useful to add the following to one's header files if the code
must be compiled by a non-conforming compiler as well:
\begin{verbatim}
#ifndef __STDC__
# define const
# define volatile
#endif
\end{verbatim}
However, one must then make sure that the behavior of the
application does not depend on the presence of such keywords.
(Evidently, programs that contain identifiers with those names
must be modified to conform to the Standard.)
The trigraph notation can bring unexpected results when a
program is compiled by an ANSI-compliant compiler, \e.g. strings
such as~\<"??!"> will produce~\<"|">. Watch out!
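One way to keep such a string intact, offered here as a suggestion
under the assumption that the compiler accepts the ANSI escape
sequences, is to precede one of the question marks with a backslash.
The variable name \<surprised> is hypothetical.
\begin{verbatim}
/* The backslash breaks up the three-character sequence, so the
   string keeps both question marks and the exclamation mark. */
const char *surprised = "What?\?!";
\end{verbatim}
A pre-ANSI compiler may not recognize this escape sequence, so the
result should be checked on each such platform.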
%=============================================================================
\subsection{The Semantics}
%=============================================================================
The syntax does not pose any problem with regard to
interpretation because it can be defined precisely. However,
programming languages are always described using a natural
language, \e.g. English, and this can lead to different
interpretations of the same text.
Evidently, \cite{KR1} does not provide an unambiguous definition
of the C~language otherwise there would have been no need for a
standard. Although the Standard is much more precise, there is
still room for different interpretations in situations such as
\<f(p=\&a, p=\&b, p=\&c)>. Does this mean \<f(\&a,\&b,\&c)> or
\<f(\&c,\&c,\&c)>? Even ``simple'' cases such as \<a[i] =
b[i++]> are compiler-dependent \cite{style}.
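Such expressions can be made unambiguous by giving each side effect
its own statement. The sketch below shows one possible rewriting of
\<a[i] = b[i++]>; the function name \<copy\_next> is hypothetical, and
we assume that the intended meaning was to store into the element
indexed by the old value of~\<i>.
\begin{verbatim}
/* Copies b[*i] to a[*i] and then advances *i, in an order
   that does not depend on the compiler. */
void copy_next(int *a, const int *b, int *i)
{
    a[*i] = b[*i];
    *i = *i + 1;
}
\end{verbatim}
The same approach applies to the function call example: assign to~\<p>
in separate statements and pass the addresses explicitly.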
As stated in the Introduction, we would like to exclude such
topics. The reader is instead directed to the Usenet newsgroups
\ng{comp.std.c} or \ng{comp.lang.c} where such discussions take
place and from where the above example was taken. {\em The
Journal of C~Language Translation}\footnote{Address is 2051,
Swans Neck Way, Reston, Virginia 22091, USA\@.} could, perhaps,
be a good reference. Another possibility is to obtain a
clarification from the Standards Committee, whose address is:
{\small
\begin{center}
\begin{tabular}{l}
X3 Secretariat, CBEMA\\
311 1st St NW Ste 500\\
Washington DC, USA\\
\end{tabular}
\end{center}
}
Finally, we mention that a complete list of the differences
between ``ordinary''~C and ANSI~C can be found in the Second
Edition of~K\&R~\cite{KR2}. A slightly less up-to-date list can
also be found in~\cite{HS}.
%+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
\section{Unix Flavors: System~V and BSD}
%+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
A long time ago (1969), Unix said ``{\tt papa}'' for the first
time at AT\&T (then called Bell Laboratories, or Ma Bell for the
intimate) on a PDP-7. Everyone liked Unix very much and its
widespread use we see today is probably due to the relative
simplicity of its design and of its implementation. (It is
written, of course, mostly in~C\@.)
However, these facts also contributed to everyone developing
their own dialect. In particular, the University of California at
Berkeley distributes the so-called BSD\footnote{Berkeley
Software Distribution} Unix whereas AT\&T now distributes (sells)
System~V Unix. All other versions of Unix are descendants of one
of these major dialects.
The differences between these two major flavors should not upset
most application programs. In fact, we would even say that most
differences are just annoying.
BSD~Unix has an enhanced signal handling capability and
implements sockets. However, {\em all\/} Unix flavors differ
significantly in their raw I/O interface (that is, the \<ioctl>
system call), and this should be avoided if possible.
The reader interested in knowing more about the past and future
of Unix can consult \cite{unix1,unix2}.
%+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
\section{Header Files}
%+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
Many useful system header files are in different places in
different systems, or they define different symbols. We will
assume henceforth that the application has been developed on a
BSD-like Unix and must be ported to a System~V-like Unix or VMS
or a Unix-like system with header files that comply with the
Standard.
In the following sections, we show how to handle the simplest
cases that arise in practice. Some of the code that appears
below was derived from the header file \file{Xos.h} which is
part of the X~Window System distributed by MIT\@. We have added
changes, \e.g. to support VMS\@.
Many header files are unprotected in many systems, notably those
derived from BSD version~4.2 and earlier. By ``unprotected'' we
mean that an attempt to include a header file more than once
will either cause compilation errors (\e.g. due to recursive or
nested includes) or, in some implementations, warnings from the
preprocessor stating that symbols are being redefined. It is
good practice to protect header files.
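A header file is usually protected by wrapping its contents in a
conditional. The following is a minimal sketch for a hypothetical
header \file{mydefs.h}; the guard symbol \<MYDEFS\_H> and the contents
are placeholders.
\begin{verbatim}
/* mydefs.h: safe to include more than once */
#ifndef MYDEFS_H
#define MYDEFS_H

#define MYDEFS_REVISION 8

struct point {
    int x;
    int y;
};

#endif /* MYDEFS_H */
\end{verbatim}
On a second inclusion the guard symbol is already defined, so the
preprocessor skips the body and nothing is redefined.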
%=============================================================================
\subsection{\file{ctype.h}}
%=============================================================================
\file{ctype.h} provides {\em almost\/} the same functionality
on all systems, except that some symbols must be renamed.
\begin{verbatim}
#ifdef SYSV
# define _ctype_ _ctype
# define toupper _toupper
# define tolower _tolower
#endif
\end{verbatim}
Under Sys~V, \<toupper> and \<tolower> are also defined and will
check the validity of their arguments and perform the conversion
only if necessary. Under BSD-derived systems, one must normally
remember to check the validity of the arguments. The following
solution might be acceptable to most:
\begin{verbatim}
#ifdef SYSV
# define TOUPPER(c) toupper(c)
#else /* !SYSV */
# define TOUPPER(c) (islower(c)?toupper(c):(c))
#endif
\end{verbatim}
{\em The definitions in \file{<ctype.h>} are not portable across
character sets.}
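Note that \<TOUPPER>, when defined with \<islower> as above, evaluates
its argument twice, so a call such as \<TOUPPER(*p++)> misbehaves. A
function avoids this at the cost of a call. The sketch below is one
possibility; the name \<to\_upper\_checked> is hypothetical.
\begin{verbatim}
#include <ctype.h>

/* Converts c to upper case only if it is a lower-case letter;
   c is evaluated exactly once.  As for the <ctype.h> functions,
   c must be representable as an unsigned char or be EOF. */
int to_upper_checked(int c)
{
    return islower(c) ? toupper(c) : c;
}
\end{verbatim}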
%=============================================================================
\subsection{\file{fcntl.h} and \file{sys/file.h}}
%=============================================================================
Many files that a BSD-like system expects to find in the
\file{sys} directory are placed in \file{/usr/include} in
System~V\@. Other systems, such as VMS, do not even have a
\file{sys} directory.\footnote{Under VMS, since a path such as
\file{<sys/file.h>} will evaluate to \file{sys:file.h}, it is
sufficient to equate the logical name \file{sys} to
\file{sys\$library}.}
The symbols used in the \<open> function call are defined in
different header files in the two types of systems:
\begin{verbatim}
#ifdef SYSV
# include <fcntl.h>
#else
# include <sys/file.h>
#endif
\end{verbatim}
In some systems, \e.g. BSD~4.3 and SunOS, it does not make a
difference which one is used because both define the \<O\_xxxx>
symbols.
%=============================================================================
\subsection{\file{errno.h}}
%=============================================================================
The semantics of the error number may differ from one system to
another and the list may differ as well (\e.g. BSD systems have
more error numbers than System~V). Some systems, \e.g. SunOS,
define the global symbol \<errno> which will hold the last error
detected by the run-time library. This symbol is not {\em
declared\/} in most systems, although it is required by the
Standard that such a symbol be defined (see~\S4.1.3 of
\cite{ansi}). It is, of course, available in all Unix
implementations.
The most portable way to print error messages is to use
\<perror>.
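A minimal sketch of this advice follows. The helper name
\<open\_for\_reading> is hypothetical, and we assume, as is the case
under Unix, that a failed \<fopen> leaves a meaningful value in
\<errno>.
\begin{verbatim}
#include <stdio.h>

/* Hypothetical helper: opens path for reading and, on failure,
   lets perror describe the error recorded in errno. */
FILE *open_for_reading(const char *path)
{
    FILE *fp = fopen(path, "r");

    if (fp == NULL)
        perror(path);
    return fp;
}
\end{verbatim}
Because \<perror> produces the message text itself, the program never
needs to know the numeric value or the symbolic name of the error.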
%=============================================================================
\subsection{\file{math.h}}
%=============================================================================
System~V has more definitions in this header file than BSD-like
systems. The corresponding library has more functions as well.
This header file is unprotected under VMS and Cray, and in that
case we must do it ourselves:
\begin{verbatim}
#if defined(CRAY) || defined(VMS)
# ifndef __MATH__
# define __MATH__
# include <math.h>
# endif
#endif
\end{verbatim}
%=============================================================================
\subsection{\file{strings.h} {\em vs.\ }\file{string.h}}
%=============================================================================
Some systems cannot be treated as System~V or BSD, but are
really special cases, as one can see in the following:
\begin{verbatim}
#ifdef SYSV
# ifndef SYSV_STRINGS
# define SYSV_STRINGS
# endif
#endif
#ifdef _STDH_ /* ANSI C Standard header files */
# ifndef SYSV_STRINGS
# define SYSV_STRINGS
# endif
#endif
#ifdef macII
# ifndef SYSV_STRINGS
# define SYSV_STRINGS
# endif
#endif
#ifdef vms
# ifndef SYSV_STRINGS
# define SYSV_STRINGS
# endif
#endif
#ifdef SYSV_STRINGS
# include <string.h>
# define index strchr
# define rindex strrchr
#else
# include <strings.h>
#endif
\end{verbatim}
As one can easily observe, System~V-like Unix systems use
different names for \<index> and \<rindex> and place them in
a different header file. Although VMS generally supports System~V
features better than BSD ones, it must be treated as a special case.
%=============================================================================
\subsection{\file{time.h} and \file{types.h}}
%=============================================================================
When using \file{time.h}, one must also include \file{types.h}.
The following code does the trick:
\begin{verbatim}
#ifdef macII
# include <time.h> /* on a Mac II we need this one as well */
#endif
#ifdef SYSV
# include <time.h>
#else
# ifdef vms
# include <time.h>
# else
# ifdef CRAY
# ifndef __TYPES__ /* it is not protected under CRAY */
# define __TYPES__
# include <sys/types.h>
# endif
# else
# include <sys/types.h>
# endif /* of ifdef CRAY */
# include <sys/time.h>
# endif /* of ifdef vms */
#endif
\end{verbatim}
The above is not sufficient for the code to be portable
since the structure that defines time values is not the same in
all systems. Different systems also vary in the way \<time\_t>
values are represented. The Standard, for instance, only
requires that it be an arithmetic type. Recognizing this
difficulty, the Standard defines a function called \<difftime>
to compute the difference between two time values of type
\<time\_t>, and \<mktime>, which takes a broken-down time
(a \<struct tm>) and produces a value of type \<time\_t>.
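The two functions can be combined so that the program never does
arithmetic on \<time\_t> values directly. The sketch below assumes an
ANSI-compliant \file{<time.h>}; the function name \<elapsed\_seconds>
is hypothetical.
\begin{verbatim}
#include <time.h>

/* Returns the number of seconds from *start to *end, both given
   as local broken-down times.  Copies are passed to mktime
   because it may normalize the structure it is given.
   Error returns from mktime are not checked in this sketch. */
double elapsed_seconds(const struct tm *start, const struct tm *end)
{
    struct tm s = *start;
    struct tm e = *end;
    time_t t0 = mktime(&s);
    time_t t1 = mktime(&e);

    return difftime(t1, t0);
}
\end{verbatim}
Setting the \<tm\_isdst> member to a negative value before the call
asks \<mktime> to determine for itself whether daylight saving time is
in effect.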
%=============================================================================
\subsection{\file{varargs.h} {\em vs.\ }\file{stdarg.h}}\label{varargsh}
%=============================================================================
In some systems the definitions in both header files are
contradictory. For instance, the following will produce
compilation errors, \e.g. under VMS:
\begin{verbatim}
#include <varargs.h>
#include <stdio.h>
\end{verbatim}
This is because \file{<stdio.h>} includes \file{<stdarg.h>}
which in turn redefines all the symbols (\<va\_start>,
\<va\_end>, etc.)\ in \file{<varargs.h>}. This is incorrect behavior
because Standard header files should not include other Standard header
files. Furthermore, the method used in \file{<varargs.h>}
for defining variadic functions is incompatible with the Standard
(see~\S\ref{ansic} for more information on variadic functions).
The solution we adopt
is to always include \file{<varargs.h>} last and not to define
in the same module both functions that use \file{<varargs.h>}
and functions that use the ellipsis notation.
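For reference, the sketch below shows a function written with the
ellipsis notation and \file{<stdarg.h>}, which is the form defined by
the Standard. The name \<sum\_ints> is hypothetical. A function written
for \file{<varargs.h>} is declared differently and, following the
advice above, belongs in a separate module.
\begin{verbatim}
#include <stdarg.h>

/* Adds up `count' int arguments that follow the count itself. */
int sum_ints(int count, ...)
{
    va_list ap;
    int total = 0;
    int i;

    va_start(ap, count);
    for (i = 0; i < count; i++)
        total += va_arg(ap, int);
    va_end(ap);
    return total;
}
\end{verbatim}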
%=============================================================================
\subsection{\file{sys/wait.h}}
%=============================================================================
This one is lacking in some systems (\e.g. Altos and Xenix).
HP-UX does define it but one must use macros to access the
fields of the \<wait struct>, instead of using the names of the
fields. The \<wait struct> uses bit-fields and if the platform
does not define it one must do it oneself and care must be taken
with respect to byte ordering (see {\bf Byte ordering} in~\S\ref{tp}).
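Where status macros are available, the status can be decoded without
naming any fields. The sketch below assumes that the platform provides
the POSIX macros \<WIFEXITED> and \<WEXITSTATUS> in
\file{<sys/wait.h>}; the function name \<exit\_code\_from\_status> is
hypothetical.
\begin{verbatim}
#include <sys/types.h>
#include <sys/wait.h>

/* Returns the exit code encoded in a status obtained from wait(),
   or -1 if the child did not terminate normally. */
int exit_code_from_status(int status)
{
    if (WIFEXITED(status))
        return WEXITSTATUS(status);
    return -1;
}
\end{verbatim}
Confining the macros to one such function keeps the system dependency
in a single place.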
%+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
\section{Run-time Library}
%+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
% System~V vs. BSD
% The Tektronix manual has some good stuff about this
% arl: o I think hpux manuals have too. hpux is a sysV based
% system which has nowadays lots of bsd features.
% o Sun is also sysV based 'all the goodies' from bsd
% implemented os. Mostly you can program with it like
% bsd or sysV or mixed \ldots it tries (?) to support both.
% o some X11 manuals might help, because X is 'portable'
% o 88open manuals & stuff. 88open is a consortium
% which describes portability of software & binaries
% between Motorola 88k based computers.
% o we should have here something about signals too ?
% the stuff is not so portable, but in extensive hacking
% you need signals \ldots I have some information of that.
% ado: Many functions have the same functionality in various systems
% but they differ on (i) the type of value they return and (ii)
% the setting of errno. E.g., printf&friends,rewind.
This section admittedly contains very little information if compared
to \cite{MH}. We direct the reader to that reference for more information.
Time and time again, it happens that the target platform does not have all
the library functions needed by a given application. This is particularly
true with mathematical functions. We would like to remind the reader
that the sources to 4.3BSD are publicly available, and may be obtained
at several sites, \e.g. \site{funic.funet.fi} [128.214.6.100] in
\file{\twiddle{}ftp/pub/bsd-sources}, the contents of which are cloned from
\site{uunet.uu.net}. Read the copyright notices before using them.
%=============================================================================
\subsection{Mathematical Functions}
%=============================================================================
%-----------------------------------------------------------------------------
\subsubsection{\<cbrt> and \<pow>}
%-----------------------------------------------------------------------------
\<cbrt(x)> evaluates the cube root of its argument, that
is,~$x^{1/3}$. \<pow(x,y)> evaluates~$x^y$. Some systems implement
neither of these, or just the latter. In that case, one can
define \<pow> as a function of \<exp> and \<log>, and if one has
\<pow> but not \<cbrt>, one can write the latter as a function
of the former:
\begin{verbatim}
#define pow(x,y) (exp(log(x)*(y)))
#define cbrt(x) (pow((x),1./3.))
\end{verbatim}
Thus defined, \<pow> only admits strictly positive values
of~\<x>. If~\<x> is negative, a result can still be evaluated
when~\<y> is an integer, but one must implement such a function
oneself (a predicate which determines if~\<y> is an integer is
usually not available).
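For a negative base and an integral exponent, one possible hand-written
replacement is repeated squaring. The function below is a sketch under
that assumption. The name \<int\_pow> is hypothetical, the exponent is
passed as an \<int> so that no integer test is needed, and no attempt
is made to handle overflow or a zero base with a negative exponent.
\begin{verbatim}
/* Raises x to the integer power n by repeated squaring.
   Works for negative x; returns 1.0 when n is zero. */
double int_pow(double x, int n)
{
    unsigned int e = (n < 0) ? 0u - (unsigned int)n : (unsigned int)n;
    double result = 1.0;

    while (e != 0) {
        if (e & 1u)
            result *= x;
        x *= x;
        e >>= 1;
    }
    return (n < 0) ? 1.0 / result : result;
}
\end{verbatim}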
The definitions given above are a ``poor man's'' solution to the
problem but acceptable in many situations. In order to obtain
numerically robust and accurate results one must investigate
other alternatives such as obtaining the source code for the
4.3BSD implementation via anonymous FTP as mentioned at the
beginning of this Section.
Implementations also differ on the result when the argument~\<y>
is zero. The 4.3BSD implementation always returns~$1.0$; others
may return an undefined value, flag an error, or return
not-a-number.
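
As a sketch of the negative-argument case discussed above, the
following assumes that the exponent fits in a \<long>. The names
\<is\_integral> and \<int\_pow> are ours and are not part of any
library. It is an illustration under those assumptions, not a
replacement for a carefully written \<pow>:
\begin{verbatim}
/* Nonzero if y holds an integral value small enough to convert
   safely to long (long is at least 32 bits wide).  The test is
   written so that a not-a-number argument is rejected too. */
int is_integral(double y)
{
    if (!(y > -2147483647.0 && y < 2147483647.0))
        return 0;
    return (double) (long) y == y;
}

/* x raised to the integral power n, by repeated squaring.
   x may be negative; x must be nonzero when n is negative. */
double int_pow(double x, long n)
{
    unsigned long k = (n < 0) ? 0UL - (unsigned long) n
                              : (unsigned long) n;
    double r = 1.0;

    while (k != 0) {
        if (k & 1)
            r *= x;
        x *= x;
        k >>= 1;
    }
    return (n < 0) ? 1.0 / r : r;
}
\end{verbatim}
When \<is\_integral(y)> holds, \<int\_pow(x,(long)y)> can stand in for
\<pow(x,y)> whatever the sign of~\<x>. Note that this sketch
returns~$1.0$ for a zero exponent, following the 4.3BSD convention.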
%-----------------------------------------------------------------------------
\subsubsection{\<rand>}
%-----------------------------------------------------------------------------
\<rand> returns a pseudo-random integer in the range
0 to~\<RAND\_MAX>, which is guaranteed only to be at least
32,767. Do not rely on \<rand> returning results over a
much wider range.
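
The sketch below draws a value from a small range while assuming
nothing about \<RAND\_MAX> beyond the guaranteed minimum. The name
\<random\_below> is hypothetical, and~\<n> must lie between 1
and~32,767:
\begin{verbatim}
#include <stdlib.h>

/* Pseudo-random integer in 0..n-1, for 1 <= n <= 32767. */
int random_below(int n)
{
    /* Split the range of rand() into n buckets of equal size. */
    unsigned long bucket =
        ((unsigned long) RAND_MAX + 1UL) / (unsigned long) n;
    unsigned long r;

    do {
        r = (unsigned long) rand();
    } while (r >= bucket * (unsigned long) n);  /* uneven tail */
    return (int) (r / bucket);
}
\end{verbatim}
Dividing by the bucket size, rather than taking a remainder, uses
the high-order bits of the result. Values in the uneven tail are
discarded so that every outcome gets the same number of chances.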
%=============================================================================
\subsection{Memory allocation and initialization}
%=============================================================================
%-----------------------------------------------------------------------------
\subsubsection{\<alloca>}
%-----------------------------------------------------------------------------
\<alloca(n)> allocates the number of bytes specified by~\<n>
and returns a pointer to the allocated memory. For all practical
purposes, this space is automatically deallocated (freed) when the
block scope is exited. More specifically,
the storage is deallocated {\em no sooner\/} than the exit from
the block scope; the implementation is allowed to do the freeing at
function exit, upon the next call to \<alloca>, or at any other moment
deemed appropriate. The example below illustrates {\em incorrect\/}
usage of \<alloca>:
\begin{verbatim}
foo ()
{
    char *sto;

    {
        sto = alloca (10);
        use (sto);      /* Correct. */
    }
    use (sto);          /* Error: storage may have been freed. */
}
\end{verbatim}
Conceptually, the space is allocated on a stack, so allocation can
be as fast as just adjusting the stack pointer if the machine has one,
and several regions can be freed at once by simply readjusting the stack
pointer. However, it is hard to implement \<alloca> both portably and
efficiently.
\<alloca> is not required by the Standard and is not available on
all platforms. However,
there are public domain implementations that work in a wide variety of
cases, but which can be slow and which can delay freeing
arbitrarily\footnote{A public domain implementation of \<alloca>
can be obtained
from the Free Software Foundation (GNU); try \site{prep.ai.mit.edu}
in \file{\twiddle{}ftp/pub/gnu}.}.
Thus, while it is very desirable to use \<alloca> when it is
available, because of efficiency considerations, it is highly
recommended that the code be written so that \<malloc> and \<free> can
easily replace it, if and when necessary.
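
One way to keep that replacement easy is to hide the allocator
behind a pair of macros. In this sketch \<TMP\_ALLOC>, \<TMP\_FREE>,
\<HAVE\_ALLOCA>, and \<count\_words> are hypothetical names of ours.
The header that declares \<alloca>, if there is one, varies between
systems and is assumed to be supplied by the local configuration:
\begin{verbatim}
#include <stdlib.h>
#include <string.h>

#ifdef HAVE_ALLOCA              /* hypothetical configuration symbol */
#define TMP_ALLOC(n)  alloca(n) /* freed no sooner than block exit */
#define TMP_FREE(p)   ((void) 0)
#else
#define TMP_ALLOC(n)  malloc(n)
#define TMP_FREE(p)   free(p)
#endif

/* Count the blank-separated words in s without modifying s;
   returns -1 if no scratch storage could be obtained. */
int count_words(const char *s)
{
    char *tmp = (char *) TMP_ALLOC(strlen(s) + 1);
    char *w;
    int n = 0;

    if (tmp == NULL)
        return -1;
    strcpy(tmp, s);
    for (w = strtok(tmp, " "); w != NULL; w = strtok(NULL, " "))
        n++;
    TMP_FREE(tmp);              /* always paired, even when a no-op */
    return n;
}
\end{verbatim}
The scheme works only if every \<TMP\_ALLOC> is matched by a
\<TMP\_FREE> in the same block and the pointer is never used after
that block is left.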
%-----------------------------------------------------------------------------
\subsubsection{\<bcopy> {\em vs.\ }\<memcpy> and \<memmove>}
%-----------------------------------------------------------------------------
\<bcopy(s1,s2,n)> copies \<n>~bytes from~\<s1> to~\<s2>, whereas
\<memcpy(s1,s2,n)> copies \<n>~bytes from~\<s2> to~\<s1>; note that
the source and destination arguments are reversed. \<bcopy> is
found on BSD-like systems, and some implementations handle
overlapping regions, while others do not. \<memcpy> and
\<memmove> come from the other camp (System~V) and are required by
the Standard; \<memcpy> does not handle overlapping regions, whereas
\<memmove> does.
The normal solution is to use macros.
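
Such macros might look as follows. This is only a sketch:
\<COPY\_BYTES>, \<MOVE\_BYTES>, and the configuration symbol
\<USE\_BCOPY> are names we made up, and the \<bcopy> branch assumes
a BSD-like system that provides it:
\begin{verbatim}
#include <string.h>

#ifdef USE_BCOPY                /* hypothetical: BSD-like system */
/* bcopy takes (source, destination, count). */
#define COPY_BYTES(dst, src, n)  bcopy((src), (dst), (n))
#else
/* memcpy takes (destination, source, count). */
#define COPY_BYTES(dst, src, n)  memcpy((dst), (src), (n))
/* memmove is safe even when the two regions overlap. */
#define MOVE_BYTES(dst, src, n)  memmove((dst), (src), (n))
#endif
\end{verbatim}
\<COPY\_BYTES> must be used only for regions that do not overlap.
No \<MOVE\_BYTES> is offered in the \<bcopy> branch because not every
\<bcopy> copes with overlap; on such a system an overlap-safe copy
has to be written by hand.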
%-----------------------------------------------------------------------------
\subsubsection{\<bzero> {\em vs.\ }\<memset>}
%-----------------------------------------------------------------------------
\<bzero(s,n)> is equivalent to \<memset(s,0,n)>. The former is
implemented in BSD-like systems, whereas the latter is implemented in
System~V-like systems and is required by the Standard.
See also {\bf Initialization} in~\S\ref{misc}.
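
A macro can hide this difference as well. In the sketch below,
\<ZERO\_BYTES> and the configuration symbol \<USE\_BZERO> are
hypothetical names:
\begin{verbatim}
#include <string.h>

#ifdef USE_BZERO                /* hypothetical: BSD-like system */
#define ZERO_BYTES(p, n)  bzero((p), (n))
#else
#define ZERO_BYTES(p, n)  memset((p), 0, (n))
#endif
\end{verbatim}
A typical call is \<ZERO\_BYTES(buf, sizeof buf)>, where \<buf> is an
array.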
%-----------------------------------------------------------------------------
\subsubsection{\<malloc> and \<free>}
%-----------------------------------------------------------------------------
\<malloc> is available in all C~implementations and its behavior is
very well defined except in boundary conditions. Not all implementations
accept a zero-sized request. There are other minor differences such as
the return type being \<char~*> in some implementations and \<void~*>
in others.
In a similar vein, some implementations of \<free> do not accept
\<NULL> as an argument. Worse, some implementations allow the
caller to use the pointer even {\em after\/} it has been \<free>d,
so long as no other call to \<malloc> intervenes. Do not rely on
such behavior.
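
The boundary cases can be kept away from the library with two small
wrappers. The names \<safe\_malloc> and \<safe\_free> are hypothetical,
and the sketch assumes an ANSI-style \<malloc> returning \<void~*>;
where it returns \<char~*> the declarations need adjusting:
\begin{verbatim}
#include <stdlib.h>

/* Never pass a zero-sized request to malloc. */
void *safe_malloc(size_t n)
{
    return malloc(n != 0 ? n : 1);
}

/* Never pass NULL to free. */
void safe_free(void *p)
{
    if (p != NULL)
        free(p);
}
\end{verbatim}
Rounding a zero-sized request up to one byte is a design choice; it
lets the caller treat a \<NULL> result as an allocation failure in
every case.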
%-----------------------------------------------------------------------------
\subsubsection{\<realloc>}
%-----------------------------------------------------------------------------
\<realloc(sto,n)> takes a pointer to a region allocated with
\<malloc> and grows or shrinks the region so that it is of size~\<n>.
The return value from \<realloc> is a pointer to the resized storage;
if the storage was grown ``in place'', the return value is the same as
\<sto>.
If the region was moved, then the old contents are copied to the new
storage (if~\<n> is smaller than the old size, then only the
first~\<n> units are copied).
If the region is grown, the new storage at the end is uninitialized
and may contain garbage.
Under ANSI C:
\begin{itemize}
\item If \<sto == NULL>, then \<realloc> acts like \<malloc>.
\item If \<n == 0>, then \<realloc> acts like \<free>.
\item If \<sto == NULL> {\em and\/} \<n == 0>, the results are
undefined.
\end{itemize}
For non-ANSI versions of \<realloc>, specifying \<NULL> as the storage
or \<0>~as the new size causes undefined behavior.
Thus, it is recommended that portable programs,
{\em even those written in ANSI~C}, not use these features.
If it is necessary to rely on those features, use a macro or write
a function that can be configured to check for those cases
explicitly.
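
Such a function might be sketched as follows. The name
\<safe\_realloc> is hypothetical, and the treatment of the case where
both arguments are ``empty'' (nothing is allocated and \<NULL> is
returned) is our choice, not something the Standard prescribes:
\begin{verbatim}
#include <stdlib.h>

/* realloc with the NULL and zero-size cases handled here, so that
   the library routine only ever sees a valid region and a nonzero
   size. */
void *safe_realloc(void *sto, size_t n)
{
    if (sto == NULL)
        return (n == 0) ? NULL : malloc(n);
    if (n == 0) {
        free(sto);
        return NULL;
    }
    return realloc(sto, n);
}
\end{verbatim}
A \<NULL> result therefore means failure only when \<n> is nonzero;
in that case the original region is still allocated and still owned
by the caller.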
%=============================================================================
\subsection{Miscellaneous}
%=============================================================================
%-----------------------------------------------------------------------------
\subsubsection{\<scanf>}
%-----------------------------------------------------------------------------
\<scanf> can behave differently on different platforms because
its descriptions, including the one in the Standard, allow for
different interpretations under some circumstances. The most
portable input parser is the one you write yourself.
Some versions of the \<scanf> family modify and then restore arguments
which are string constants. These implementations cause problems when
string constants are placed in read-only memory (see {``String
constants''} in~\S\ref{misc}). If the string is actually a
constant, then some workaround is needed; usually a compiler flag
may be used to indicate that such constants should be placed in
writable memory instead. If such a flag is not available then the code
must be modified.
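
As an illustration of writing the parser oneself, the sketch below
converts a line that has already been read (for instance with
\<fgets>) into a \<long>. The name \<parse\_long> is hypothetical.
\<strtol> is required by the Standard but may be missing from older
libraries, in which case the digits must be converted by hand:
\begin{verbatim}
#include <ctype.h>
#include <errno.h>
#include <stdlib.h>

/* Store the decimal integer found in line into *out.
   Returns 1 on success; returns 0 if the line holds no number,
   holds trailing garbage, or the value does not fit in a long. */
int parse_long(const char *line, long *out)
{
    char *end;
    long v;

    errno = 0;
    v = strtol(line, &end, 10);
    if (end == line || errno == ERANGE)
        return 0;
    while (isspace((unsigned char) *end))
        end++;                  /* allow a trailing newline */
    if (*end != '\0')
        return 0;
    *out = v;
    return 1;
}
\end{verbatim}
Because the line is parsed in memory, a malformed field cannot leave
the input stream in an unknown state.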
%-----------------------------------------------------------------------------
\subsubsection{\<setjmp> and \<longjmp>}
%-----------------------------------------------------------------------------
Quoting anonymously from \ng{comp.std.c}, ``pre-X3.159
implementations of \<setjmp> and \<longjmp> often did not meet
the requirements of the Standard. Often they didn't even meet
their own documented specs. And the specs varied from system to
system. Thus it is wise not to depend too heavily on the exact
standard semantics for this facility\ldots''.
In other words, you need not avoid them, but be careful when you
use them. Furthermore, the behavior of a
\<longjmp> invoked from a nested signal handler\footnote{That
is, a function invoked as a result of a signal raised during the
handling of another signal. See~\S4.6.2.1 \P15 in
\cite{ansi}.} is undefined.
Finally, the symbols \<\_setjmp> and \<\_longjmp> are only
defined under SunOS, BSD, and HP-UX\@. Some systems do not
implement \<setjmp> and friends at all.
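
A cautious use, sketched below, keeps to forms that the Standard
explicitly permits: \<setjmp> appears only as the controlling
expression of an \<if>, compared against zero, and a local variable
that changes after the \<setjmp> and is read after the \<longjmp>
is declared \<volatile>. The names \<recover>, \<fail>, and
\<run\_guarded> are hypothetical:
\begin{verbatim}
#include <setjmp.h>

static jmp_buf recover;

static void fail(void)
{
    longjmp(recover, 1);        /* setjmp will appear to return 1 */
}

/* Returns 2 if both stages complete, or minus the number of the
   stage that was running when fail() was called. */
int run_guarded(int should_fail)
{
    volatile int stage = 0;     /* changed after setjmp, read after
                                   longjmp, hence volatile */

    if (setjmp(recover) != 0)
        return -stage;
    stage = 1;
    if (should_fail)
        fail();
    stage = 2;
    return stage;
}
\end{verbatim}
Here \<run\_guarded(0)> returns~2 and \<run\_guarded(1)> returns~$-1$.
Note that the jump is made only while \<run\_guarded> is still
active; jumping to a \<jmp\_buf> whose function has already returned
is an error.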
%-----------------------------------------------------------------------------
\subsubsection{Signal Handling}
%-----------------------------------------------------------------------------
We would like to point out one problem when handling signals
generated by hardware, such as \<SIGFPE> and \<SIGSEGV>\@. There
are two possibilities on a normal exit from the signal handler:
(i)~the offending instruction is re-executed, or (ii)~it is not.
The first possibility may cause an infinite loop, and the only
portable solution is to \<longjmp> out of the signal handler.
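
The following sketch shows the shape of that solution. The names
\<guarded\_call>, \<on\_fpe>, \<raise\_fpe>, and \<do\_nothing> are
hypothetical, and \<raise> is used here only to provoke the signal
in a predictable way:
\begin{verbatim}
#include <setjmp.h>
#include <signal.h>

static jmp_buf fpe_env;

static void on_fpe(int sig)
{
    (void) sig;
    longjmp(fpe_env, 1);        /* do not return to the instruction */
}

/* Run op; return 1 if it completed, 0 if SIGFPE was raised. */
int guarded_call(void (*op)(void))
{
    void (*old)(int) = signal(SIGFPE, on_fpe);
    int ok;

    if (setjmp(fpe_env) == 0) {
        op();
        ok = 1;
    } else {
        ok = 0;                 /* arrived here via longjmp */
    }
    signal(SIGFPE, old);        /* restore the previous handler */
    return ok;
}

void raise_fpe(void)  { raise(SIGFPE); }
void do_nothing(void) { }
\end{verbatim}
On some systems the signal may remain blocked after the jump out of
the handler, so code that expects to catch the same signal again
should consult the local documentation.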