What does the C optimiser do? -- Responses
David F. Hinnant
dfh at scirtp.UUCP
Fri Oct 4 02:30:03 AEST 1985
I had several requests to post the responses I received on what the C
optimizer really does. So.... Here they are.
I also uncovered form my net.sources archives a series of postings a
year or so ago from mi-cec!dvk concerning how the UNIX C optimiser doesn't
optimise, it neatens. If anyone would like a copy of these postings,
let me know.
David Hinnant
SCI Systems Inc
...akgua!mcnc!rti-sel!scirtp!dfh
===============================================================================
From: Arnold Robbins <rti-sel!vrdxhq!seismo!gatech!arnold> <arnold at seism.UUCP>
Subject: Re: What does the optimiser optimise?
An interesting thing that c2 does (under BSD, anyway) is the following:
switch (x) {
case 1:
foo ("x");
break;
case 2:
foo("y");
break;
....
}
where all it does is call foo() with different arguments for different
values of x. The code will determine the argument based on the value of x, and
then call foo(). In other words, the procedure call code is only generated
once! I found this out when trying to optomize such a routine by just
assigning a character pointer and then calling the routine. The code
got bigger!
============================
From: rti-sel!mcnc!clyde!ulysses!sfmag!mjs <mjs at c.UUCP>
Subject: Re: What does the optimiser optimise?
Most optimizers (actually `assembly language improvers') do common tail
merging, unreachable code removal, fixup branches to branches, and a host of
other machine independant improvements. Depending on the code generated, many
will also reuse registers known to have constant values (instead of using an
immediate operand), collapse some sequences of instructions into shorter or
faster sequences (often both, but typically only by accident), using both
different instructions and/or different addressing modes. The arithmetic
example you posted is unlikely to be optimizable because the compiler tries
to reuse scratch registers too quickly, so the "(a+b)" in your example is
unlikely to stay in a scratch register long enough (i.e., across enough
instructions) to be reused.
================================
From: rti-sel!mcnc!bellcore!hammond (Rich A. Hammond) <hammond at bellcor.UUCP>
Subject: Re: What does the optimiser optimise?
1) Common sub-expressions are tricky in C, for example :
x = (a+b);
(receive signal, change state of a or b)
y = (a+b) +c;
or even if you don't worry about signals, longjumps, ...
then you must be sure that:
a) None of the operands in the CSE could have their address taken,
b) No indirection is done,
c) No function calls (unless you do complete flow analysis).
For these reasons, most optimisers for C don't play with CSE's much.
Besides, in C using pointer arithmetic gets rid of the frequent and
hidden to the programmer CSE's from array indexing arithmetic.
I wrote an optimiser for the IBM Series/1 C compiler we used at
U of Delaware and it only found a few (1 or 2) CSE's in an awful
lot of code run through it.
2) For, while loops:
On the PDP-11, which had a decrement and branch instruction,
register int i;
for (i = n; --i>0; ) would be turned into the dbr instruction.
I suspect that machines with add one and branch and subtract one and branch
with loop counters in registers might try and take advantage of those
instructions, but this is not guaranteed.
3) Common code sequences: (another place besides IF ELSE is in SWITCH CASE)
Yes, the PDP-11 optimiser found them and merged them, note that this
is a space optimization and not a time optimization, since one sequence
will be replaced by an unconditional branch plus the sequence.
4) Code that is never reached is often removed, note that the way the
PDP-11 optimiser handled the branch-tail merging was to insert the
unconditional branch ahead of the sequence and leave the sequence
to be removed on the next pass by the unreachable code remover.
In general, the quality of optimisers varies tremendously, and no
two do the same set of "optimisations." I tried the trivial program
register int i; register double a, b, c;
for (i = 0; i < 1000000; i++) {
a = b + c; /* 100 copies of this line */
}
and only one optimiser noticed the CSE, even though this was the trivial
case with all the variables in registers, no indirection, ...
It was smart enough to just do the statement once per loop.
Made the floating point unit look nice compared to the other machines.
===================================
From: rti-sel!mcnc!decvax!ucbvax!ucdavis!deneb!ccrrick (Rick Heli) <ccrrick at .UUCP>
Subject: Re: What does the optimiser optimise?
Code that can never reached should be flagged as an error!
--rick heli
(... ucbvax!ucdavis!groucho!ccrrick)
===================================
From: Vincent P. Broman <rti-sel!mcnc!decvax!sdcsvax!noscvax!broman> <broman at .UUCP>
Subject: C optimizer
David Hinnant,
One way to investigate such questions is to run /lib/c2 with the -d
option standalone. It prints out how many of what kind of improvements it did,
at least on 4.xBSD UNIX.
===================================
From: rti-sel!mcnc!decvax!astrovax!tl-vaxa!harbison (Samuel P. Harbison) <harbison at .UUCP>
Subject: Optimization examples
Dear Mr. Hinnant,
Regarding your info-c post about compiler optimizations, I put your example
programs--and some variants of my own--through our Tartan C compiler for
Vax/UNIX, and I have included the generated code below. Most of the major
optimizations we do are shown.
I am not a disinterested observer insofar as I am the development manager for
this product. I am quite proud of our compiler, which I think is the best
optimizing C compiler available under UNIX.
The compiler used in the examples is our latest update, release 2.1, which
will be shipped to current and new customers within a month.
Sam Harbison
Tartan Laboratories Incorporated
int x,y,z,a,b,c,d;
/* Here's the first example, slightly modified from the original.
We eliminate the common subexpressions (a+b) and (a+b)+c. */
int f1a()
{
x = (a+b);
y = (a+b) +c;
z = ((a+b) +c) / d;
return 0;
}
/* Generated code:
_f1a:
.word 0
addl3 _b,_a,r0
movl r0,_x
addl2 _c,r0
movl r0,_y
divl3 _d,r0,_z
clrl r0
ret
*/
/* Here's the previous example in its original form. Because the
same variable is being overwritten, only the last assignment is kept.
*/
int f1b()
{
x = (a+b);
x = (a+b) +c;
x = (a+b) +c / d;
return 0;
}
/* Generated code:
_f1b:
.word 0
addl3 _b,_a,r0
divl3 _d,_c,r1
addl3 r1,r0,_x
clrl r0
ret
*/
/* Here are some looping examples. Note that the local variables go into
registers even though they are not declared "register". */
int ary[10][10];
int f2()
{
int i, sum=0;
/* The compiler can't know if the following loop will be executed even once,
so it must perform the test first: */
for (i=1; i < x; i++) printf("First\n");
/* Generated code:
movl $1,r6
cmpl _x,$1
bleq L45
L34: pushab Def001
calls $1,_printf
aoblss _x,r6,L34
L45: ...
.data
Def001:
.ascii "First"
.byte 10, 0
*/
/* The compiler knows this loop will be done at least once, so eliminates
the top-of-loop test: */
for (i=1; i< 10; i++) {printf("Second\n");}
/* Generated code:
movl $1,r6
L36: pushab Def002
calls $1,_printf
aobleq $9,r6,L36
...
.data
Def002:
.ascii "Second"
.byte 10, 0
*/
/* The compiler knows this loop will never be executed, and so generates
no code: */
for (i=10; i<10; i++) {printf("Third\n");}
/* Generated code:
*/
/* This one is more interesting; it is an example of "strength reduction".
The multiplication i*40 in the address computation of ary[i][i]
has been converted to an addition in the loop ("addl2 $40,r3"). */
for (i=0; i<10; i++)
sum += ary[i][i];
return sum;
}
/* Generated code:
clrl r7 # r7 is sum
...
clrl r6 # r6 is i
movab _ary,r3 # r3 is &ary[i][0]
L43: addl2 (r3)[r6],r7
addl2 $40,r3
aobleq $9,r6,L43
movl r7,r0
ret
*/
/* Here's an example of hoisting an invariant expression out of a loop */
int f2b()
{
int i,sum=0;
for(i=0;i<10;i++)
sum += x*(a+b); /* x*(a+b) is invariant */
return sum;
}
/* Generated code:
_f2b:
.word 0
clrl r4 # r4 is sum
clrl r3 # r3 is i
addl3 _b,_a,r0
mull2 _x,r0 # r0 is x*(a+b)
L45: addl2 r0,r4
aobleq $9,r3,L45
movl r4,r0
ret
*/
/* Here's your example of cross jumping. (The resulting code
has a superfluous test; you can't win them all...) */
int f3()
{
if (x) { x = 1; y = 1; z = 1; }
else { x = 1; y = 1; z = 1; }
return 0;
}
/* Generated code:
_f3:
.word 0
tstl _x
movl $1,_x
movl $1,_y
movl $1,_z
clrl r0
ret
*/
/* Dead code elimination (and constant propagation): */
int f4()
{
int t=1;
if (t) x=1; else x=2;
return 0;
}
/* Generated code:
_f4:
.word 0
movl $1,_x
clrl r0
ret
*/
/* Finally, here's one I don't think you'll find in any other C compiler:
inline expansion of functions */
int abs(i) int i; { return i<0 ? -i : i; }
int f5(x)
int x;
{
return abs(x);
}
/* Generated code for f5(): [ Code for abs() is also generated but not shown.]
We should have used r0 instead of r3, but...
_f5:
.word 0
movl 4(ap),r3
tstl r3
bgeq L55
mnegl r3,r3
L55: movl r3,r0
ret
*/
==============================
--
David Hinnant
SCI Systems, Inc.
{decvax, akgua}!mcnc!rti-sel!scirtp!dfh
More information about the Comp.lang.c
mailing list