What does the C optimiser do? -- Responses

Fri Oct 4 02:30:03 AEST 1985

I had several requests to post the responses I received on what the C
optimizer really does.  So....  Here they are.

I also uncovered form my net.sources archives a series of postings a
year or so ago from mi-cec!dvk concerning how the UNIX C optimiser doesn't
optimise, it neatens.  If anyone would like a copy of these postings,
let me know.
				David Hinnant
				SCI Systems Inc
				...akgua!mcnc!rti-sel!scirtp!dfh

===============================================================================
From: Arnold Robbins <rti-sel!vrdxhq!seismo!gatech!arnold> <arnold at seism.UUCP>
Subject: Re: What does the optimiser optimise?
An interesting thing that c2 does (under BSD, anyway) is the following:

	switch (x) {
	case 1:
		foo ("x");
		break;

	case 2:
		foo("y");
		break;
	....
	}

where all it does is call foo() with different arguments for different
values of x.  The code will determine the argument based on the value of x, and
then call foo(). In other words, the procedure call code is only generated
once!  I found this out when trying to optomize such a routine by just
assigning a character pointer and then calling the routine. The code
got bigger!
============================

From: rti-sel!mcnc!clyde!ulysses!sfmag!mjs <mjs at c.UUCP>
Subject: Re: What does the optimiser optimise?
Most optimizers (actually `assembly language improvers') do common tail
merging, unreachable code removal, fixup branches to branches, and a host of
other machine independant improvements.  Depending on the code generated, many
will also reuse registers known to have constant values (instead of using an
immediate operand), collapse some sequences of instructions into shorter or
faster sequences (often both, but typically only by accident), using both
different instructions and/or different addressing modes.  The arithmetic
example you posted is unlikely to be optimizable because the compiler tries
to reuse scratch registers too quickly, so the "(a+b)" in your example is
unlikely to stay in a scratch register long enough (i.e., across enough
instructions) to be reused.

================================
From: rti-sel!mcnc!bellcore!hammond (Rich A. Hammond) <hammond at bellcor.UUCP>
Subject: Re: What does the optimiser optimise?
1) Common sub-expressions are tricky in C, for example :
 x = (a+b);
 (receive signal, change state of a or b)
 y = (a+b) +c;
 or even if you don't worry about signals, longjumps, ...
 then you must be sure that:
	a) None of the operands in the CSE could have their address taken,
	b) No indirection is done,
	c) No function calls (unless you do complete flow analysis).
 For these reasons, most optimisers for C don't play with CSE's much.
 Besides, in C using pointer arithmetic gets rid of the frequent and
 hidden to the programmer CSE's from array indexing arithmetic.
 I wrote an optimiser for the IBM Series/1 C compiler we used at
 U of Delaware and it only found a few (1 or 2) CSE's in an awful
 lot of code run through it.

2) For, while loops:
 On the PDP-11, which had a decrement and branch instruction,
	register int i;
	for (i = n; --i>0; ) would be turned into the dbr instruction.
 I suspect that machines with add one and branch and subtract one and branch
 with loop counters in registers might try and take advantage of those
 instructions, but this is not guaranteed.

3) Common code sequences: (another place besides IF ELSE is in SWITCH CASE)
 Yes, the PDP-11 optimiser found them and merged them, note that this
 is a space optimization and not a time optimization, since one sequence
 will be replaced by an unconditional branch plus the sequence.

4) Code that is never reached is often removed, note that the way the
 PDP-11 optimiser handled the branch-tail merging was to insert the
 unconditional branch ahead of the sequence and leave the sequence
 to be removed on the next pass by the unreachable code remover.

In general, the quality of optimisers varies tremendously, and no
two do the same set of "optimisations."  I tried the trivial program
	register int i; register double a, b, c;
	for (i = 0; i < 1000000; i++) {
		a = b + c; /* 100 copies of this line	*/
	}
and only one optimiser noticed the CSE, even though this was the trivial
case with all the variables in registers, no indirection, ...
It was smart enough to just do the statement once per loop.
Made the floating point unit look nice compared to the other machines.
===================================

From: rti-sel!mcnc!decvax!ucbvax!ucdavis!deneb!ccrrick (Rick Heli) <ccrrick at .UUCP>
Subject: Re: What does the optimiser optimise?
Code that can never reached should be flagged as an error!
					--rick heli
					(... ucbvax!ucdavis!groucho!ccrrick)
===================================

From: Vincent P. Broman <rti-sel!mcnc!decvax!sdcsvax!noscvax!broman> <broman at .UUCP>
Subject: C optimizer
David Hinnant,
	One way to investigate such questions is to run /lib/c2 with the -d
option standalone. It prints out how many of what kind of improvements it did,
at least on 4.xBSD UNIX.
===================================

From: rti-sel!mcnc!decvax!astrovax!tl-vaxa!harbison (Samuel P. Harbison) <harbison at .UUCP>
Subject: Optimization examples
Dear Mr. Hinnant,

Regarding your  info-c post about compiler  optimizations, I  put your example
programs--and  some  variants  of  my own--through  our Tartan  C compiler for
Vax/UNIX, and I have included the generated  code below.   Most  of the major
optimizations we do are shown.  

I am not a disinterested observer insofar as I am the  development manager for
this product.  I am quite proud  of our  compiler, which  I think  is the best
optimizing C compiler available under UNIX.  

The compiler used in  the examples  is our  latest update,  release 2.1, which
will be shipped to current and new customers within a month.  

Sam Harbison
Tartan Laboratories Incorporated

int x,y,z,a,b,c,d;

/* Here's the first example, slightly modified from the original.
We eliminate the common subexpressions (a+b) and (a+b)+c. */

int f1a()
{
  x = (a+b);
  y = (a+b) +c;
  z = ((a+b) +c) / d;
  return 0;
}

/* Generated code:
_f1a:
	.word 	0
	addl3		_b,_a,r0
	movl		r0,_x
	addl2		_c,r0
	movl		r0,_y		
	divl3		_d,r0,_z
	clrl		r0
	ret	
*/

/* Here's the previous example in its original form.  Because the 
   same variable is being overwritten, only the last assignment is kept.
*/

int f1b()
{
  x = (a+b);
  x = (a+b) +c;
  x = (a+b) +c / d;
  return 0;
}

/* Generated code:
_f1b:
	.word 	0
	addl3		_b,_a,r0
	divl3		_d,_c,r1
	addl3		r1,r0,_x
	clrl		r0
	ret	
*/

/* Here are some looping examples.  Note that the local variables go into
   registers even though they are not declared "register". */

int ary[10][10];

int f2()
{
  int i, sum=0;

  /* The compiler can't know if the following loop will be executed even once,
     so it must perform the test first: */

  for (i=1; i < x; i++) printf("First\n");

/* Generated code:
	movl		$1,r6
	cmpl		_x,$1
	bleq		L45
L34:	pushab		Def001
	calls		$1,_printf
	aoblss		_x,r6,L34
L45:	...
	.data
Def001:	
	.ascii "First" 
	.byte 10, 0
*/

/* The compiler knows this loop will be done at least once, so eliminates
   the top-of-loop test:  */

  for (i=1; i< 10; i++) {printf("Second\n");}

/* Generated code:
	movl		$1,r6
L36:	pushab		Def002
	calls		$1,_printf
	aobleq		$9,r6,L36
	...
	.data
Def002:	
	.ascii "Second" 
	.byte 10, 0
*/

/* The compiler knows this loop will never be executed, and so generates
   no code: */

  for (i=10; i<10; i++) {printf("Third\n");}

/* Generated code:
*/

/* This one is more interesting; it is an example of "strength reduction".
   The multiplication i*40 in the address computation of ary[i][i] 
   has been converted to an addition in the loop ("addl2 $40,r3"). */

  for (i=0; i<10; i++) 
	sum += ary[i][i];
  return sum;
}

/* Generated code:
	clrl		r7		# r7 is sum
	...
	clrl		r6		# r6 is i
	movab		_ary,r3 	# r3 is &ary[i][0]
L43:	addl2		(r3)[r6],r7
	addl2		$40,r3
	aobleq		$9,r6,L43
	movl		r7,r0
	ret
*/

/* Here's an example of hoisting an invariant expression out of a loop */

int f2b()
{
  int i,sum=0;
  for(i=0;i<10;i++)
    sum += x*(a+b);	/* x*(a+b) is invariant */
  return sum;
}

/* Generated code:
_f2b:
	.word 	0
	clrl		r4		# r4 is sum
	clrl		r3		# r3 is i
	addl3		_b,_a,r0
	mull2		_x,r0		# r0 is x*(a+b)
L45:	addl2		r0,r4
	aobleq		$9,r3,L45
	movl		r4,r0
	ret	
*/

/* Here's your example of cross jumping.  (The resulting code
   has a superfluous test; you can't win them all...) */

int f3()
{
if (x) { x = 1;	y = 1;	z = 1; } 
else   { x = 1;	y = 1;	z = 1; }
return 0;
}

/*  Generated code:
_f3:
	.word 	0
	tstl		_x
	movl		$1,_x
	movl		$1,_y
	movl		$1,_z
	clrl		r0
	ret	
*/

/* Dead code elimination (and constant propagation): */

int f4()
{
  int t=1;  
  if (t) x=1; else x=2;
  return 0;
}

/* Generated code:
_f4:
	.word 	0
	movl		$1,_x
	clrl		r0
	ret	
*/

/* Finally, here's one I don't think you'll find in any other C compiler:
   inline expansion of functions */

int abs(i) int i; { return i<0 ? -i : i; }

int f5(x)
  int x;
{
  return abs(x);
}

/* Generated code for f5():  [ Code for abs() is also generated but not shown.]
   We should have used r0 instead of r3, but...
_f5:
	.word 	0
	movl		4(ap),r3
	tstl		r3
	bgeq		L55
	mnegl		r3,r3
L55:	movl		r3,r0
	ret	
*/
==============================
-- 
				David Hinnant
				SCI Systems, Inc.
				{decvax, akgua}!mcnc!rti-sel!scirtp!dfh