libm in SunOS 4.0
David Hough
dgh at sun.com
Wed May 3 23:17:55 AEST 1989
In recent Sun-Spots, Peter Lamb has complained about libm in SunOS 4.0.
He's raised a number of interesting points. The following examines the
issues.
We'll repeat his timing experiments in a slightly simpler form. The
otherwise worthless "savage" benchmark happens to be ideal for the task at
hand, since its inner loop consists almost entirely of elementary
transcendental functions; I added two register declarations:
/*
 * savage.c -- floating point speed and accuracy test. C version derived
 * from BASIC version which appeared in Dr. Dobb's Journal, Sep. 1983, pp.
 * 120-122.
 */
#define ILOOP 100000
#include <stdio.h>
extern double tan(), atan(), exp(), log(), sqrt();
main()
{
    int i;
    register double a=1, one=1;
    for (i = 1; i <= (ILOOP - 1); i++)
        a = tan(atan(exp(log(sqrt(a * a))))) + one;
    printf("a-ILOOP = %g\n", a - ILOOP);
    exit(0); /* Better get in the habit of adding this! */
}
Here are some compile lines and timing results from a Sun-3/140:
[[ I removed "savage.c" from each compile line to make the table fit in 80
columns. --wnl ]]
SunOS  Compile line                             a.out time  residual      meets
                                                (seconds)   a-ILOOP       SVID?
3.5    cc -O4 -f68881 -lm                        26         -1.34482e-06  no
3.5    cc -O4 -f68881 /usr/lib/f68881.il -lm     19         -1.34482e-06  no
4.0    cc -O4 -f68881 -lm                       153         -1.34482e-06  yes
4.0    cc -O4 -f68881 /usr/lib/f68881/libm.il    17         -1.34482e-06  no
4.0    cc -O4 -f68881 math.S                     19         -1.34482e-06  no
4.0    cc -O4 -f68881 math.il                    13          4.83633e-08  no
math.S and math.il are listed later. What conclusions
does this table suggest?
* In 3.5->4.0 the fast got faster.
* In 3.5->4.0 the slow got slower.
* In 4.0 it is possible to obtain some SVID (System V
Interface Definition) compliance even with -f68881. It
doesn't matter for this program but it does if you run
the SV Validation Suite.
* In 4.0 both functions and inline expansion templates
could have been faster.
* The last executable listed is smallest, fastest, and
most accurate, for indeed its inner loop is:
main+0x16: fmulx fp7,fp7
main+0x1a: fsqrtx fp7,fp7
main+0x1e: flognx fp7,fp7
main+0x22: fetoxx fp7,fp7
main+0x26: fatanx fp7,fp7
main+0x2a: ftanx fp7,fp7
main+0x2e: faddx fp6,fp7
main+0x32: addql #1,d7
main+0x34: cmpl #0x1869f,d7
main+0x3a: bles main+0x16
which could scarcely be improved upon. This is the main benefit of
inline expansion of function calls: when they work well, all the
direct and indirect effects of function calls are eliminated.
Let's examine each of those possible conclusions.
* In SunOS 3.5 the compiler generates some workarounds for A79J
68881's. These were removed for 4.0, so most 68881's can run faster.
That made the inline templates more effective. Thus the fast got
faster. Also the SunOS 4.0 compiler invokes a global optimizer but
that doesn't affect this program much.
* In SunOS 3.5, if you compiled with -f68881 or -ffpa the libm didn't
meet the SVID requirements for errno and matherr. That was fixed in
4.0, at a significant performance penalty; given that, I figured
that anybody who cared about floating-point performance in C was
going to use the inline expansion templates all the time, so I
optimized them and didn't bother with the corresponding libm
functions. The SVID requirements are wrong-headed; X3J11 saw half
the light and removed matherr without grasping that the arguments
they used to remove matherr were equally appropriate for errno.
Anyway, if you don't use the inline expansion templates in 4.0 you
conform to the SVID whether you need to or not. Thus the slow got
slower. Indeed avoiding the SVID performance penalties is one of the
main reasons that C programmers would use the inline expansion
templates in 4.0.
* SunOS 4.0 libm functions would obviously be faster if they ignored
the SVID. Here is a corresponding math.S file:
#define FUNC(F,G) \
.globl _/**/F ;\
_/**/F: movel sp@+,a0 ; \
f/**/G/**/d sp@,fp0 ; \
fmoved fp0,sp@ ; \
movel sp@,d0 ; \
movel sp@(4),d1 ; \
jmp a0@
FUNC(sqrt,sqrt)
FUNC(exp,etox)
FUNC(log,logn)
FUNC(tan,tan)
FUNC(atan,atan)
* What wasn't apparent until Peter Lamb provoked an investigation is
that the 4.0 inline templates weren't well matched with the
capabilities of c2, the local optimizer that follows the inline
expansion. c2 likes to see sp@+ and sp@- but not sp@; a revised
math.IL file:
#define FUNC(F,G) \
.inline _/**/F,8 ;\
f/**/G/**/d sp@+,fp0 ; \
fmoved fp0,sp@- ; \
movel sp@+,d0 ; \
movel sp@+,d1 ; \
.end
FUNC(sqrt,sqrt)
FUNC(exp,etox)
FUNC(log,logn)
FUNC(tan,tan)
FUNC(atan,atan)
which can be converted to a math.il this way
cpp math.IL | sed 'y/;/\n/'
since cc doesn't handle .IL files! Anyway the inline
expansion templates have been revised correspondingly
for SunOS 4.1.
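To make the SVID cost discussed above a little more concrete, here is a
minimal sketch in C of the kind of checking layer that errno conformance
implies. It is not Sun's libm source. The names raw_sqrt and checked_sqrt are
hypothetical, raw_sqrt is only a portable stand-in for a bare fsqrt
instruction, the value returned on a domain error is chosen for illustration,
and the matherr hook is left out entirely.

```c
#include <errno.h>

/*
 * Hypothetical stand-in for a bare hardware square root: Newton's
 * iteration with no argument checking and no errno traffic.
 * Only meaningful for x >= 0.
 */
double raw_sqrt(double x)
{
    double g = (x > 1.0) ? x : 1.0;   /* start at or above the root */
    double next;

    if (x == 0.0)
        return 0.0;
    for (;;) {
        next = 0.5 * (g + x / g);
        if (!(next < g))              /* stopped decreasing (or NaN): done */
            break;
        g = next;
    }
    return g;
}

/*
 * Hypothetical errno-setting wrapper: every call pays for the domain
 * test, and the error path stores to the global errno.
 */
double checked_sqrt(double x)
{
    if (x < 0.0) {
        errno = EDOM;                 /* domain error, as ANSI C names it */
        return 0.0;                   /* illustrative return value only */
    }
    return raw_sqrt(x);
}
```

A caller that clears errno, calls checked_sqrt(-1.0), and then tests errno
against EDOM sees the error; a caller of raw_sqrt gets no such report and
pays for no such test. An inline template corresponds to the second case.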
Why Sun-3?
If you have a Sun-3 on your desk, as I do, then naturally you want to
make the most of it. But when your budget permits you may well want to
upgrade to a Sun-4. As announced today, the entry price has been
substantially reduced. Since the SPARC architecture, unlike MC68881,
defines fsqrt but no elementary transcendental function instructions, the
libm performance penalty related to SVID is much reduced.
Why C?
Why program numerical work in C when Fortran is almost always more
efficient? Examples supporting the latter assertion: sqrt is an operator
in Fortran, a function in C; Fortran pointers (parameters) can be assumed
to be unaliased, but not in C. The issues Peter Lamb raised don't exist
in Sun Fortran; fsqrt instructions are simply generated inline as needed
without resorting to libm or .il files.
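The aliasing point can be seen in a few lines of C. This is a minimal sketch
with a hypothetical function name, add_scaled; it only shows that a C
compiler must allow for the scale factor living inside the destination array,
so it cannot keep *k in a register across the loop the way a Fortran compiler
may for a dummy argument.

```c
/*
 * Hypothetical example: dst[i] += (*k) * src[i] for 0 <= i < n.
 * Nothing in C says that k does not point into dst, so *k has to be
 * reloaded after every store to dst[i].
 */
void add_scaled(double *dst, const double *src, const double *k, int n)
{
    int i;

    for (i = 0; i < n; i++)
        dst[i] += *k * src[i];
}
```

With dst = {1, 2, 3}, src = {1, 1, 1} and k pointing at a separate variable
holding 1, the result is {2, 3, 4}. With k pointing at dst[0], the first store
changes the factor to 2 and the result is {2, 4, 5}. Both outcomes are legal
C, which is exactly what blocks the optimization.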
Of course creating a complete application by combining numerical Fortran
code with non-numerical C code is not very easy to do in a
machine-independent way; I tried to get X3J11 interested in that problem,
so much more significant than errno, without success.
Why Inline Expansion Templates?
Sun's inline expansion template facility is probably not exactly like
anybody else's, and thus unfamiliar. The facility was originally intended
to provide a quick fix to some pernicious problems such as complex
arithmetic performance in Fortran prior to implementation of the
definitive solution in the rest of the compiler. The best way to think of
it is that you can redesign parts of the compiler with inline expansion
templates. Sun-supplied algorithm too slow or too accurate? Write your
own.
Questions for the Reader
Tell me what you think about the following:
* Should SunOS provide two versions of libm, one that conforms to SVID,
X3J11, and X/Open requirements and one that doesn't compromise
performance?
* Should SunOS provide means of EASILY obtaining maximum performance
without having to read many pages of obscure manuals? Note that
bundling additional options into -O or -O4 might NOT be a good idea
since optimization levels are somewhat independent of other types
of optimizations such as inline expansion templates. Embedded
systems with limited physical memory, for instance, may prefer to
call a function rather than suffer code expansion. So the question is
whether a new bundled compiler option such as "-allopts" would be
appropriate.
For More Information
Check out the SVID Volume 1 and the X3J11 draft and rationale, and maybe
the MC68881/2 manual. And (once again) the Floating-Point Programmer's
Guide in your SunOS doc crate and especially the 4.0 addendum in the
Programmer's Guides Minibox Read This First. If you are curious about C's
shortcomings in the numerical area, I have written a much longer
memorandum as part of the X3J11 public review; I will send troff source on
request. If you are even more curious then contact Rex Jaeschke
(uunet.uu.net!aussie!rex) about the Numerical C Extensions Group.