C "optimization" (1 of 8)

Wed Feb 15 05:19:39 AEST 1984

This is a continuation of my diatribe on "C doesn't optimize, it neatens".
In this and other articles, I compare a true optimizing compiler (Bliss-32
running under VMS) to a code neatener (C running under BSD 4.1c).  Any and
all counterexamples are welcome.  However, this is NOT a comparison of the
languages.  Both C and Bliss have their good and bad points.  This is simply
a comparison of the code they generate.  As in all examples, the source
code and uncensored assembly code is presented.  In all examples, the C source
and Bliss source are as nearly identical as language differences permit.  I
have not taken advantage of any "tricks" to get either language to perform
better or worse than the other.  The optimizer was enabled for both languages.

		-Dan Klein, Mellon Institute, Pittsburgh	(412)578-3382
=============================================================================

In this comparison, I present a test which selects between two routines,
each of which has the identical arguments passed to it.  Now, since either
one or the other routine is called (i.e. flow control analysis can readily
determine that there is no condition under which neither or both routines
are called), a great optimizer would first push all the arguments onto the
stack, and then test the condition "t", and call the appropriate routine.
Regrettably, even Bliss is not smart enough to do this.  However, it is
smarter than C in that:

	1) It uses two MOVQ instructions to load the procedure arguments
instead of 4 PUSHL instructions (this is a space optimization, but gives
a small speed optimization since there are two less instructions to execute).
If Bliss were assured of the target processor, it could use a MOVO (move
octaword).  However, the 11/730 doesn't support that instruction, so Bliss
avoids it.
	2) C uses a common exit point from the routine.  Since no cleanup
code is used, the "jbr L19" is wasteful.  Bliss instead simply generates
two RET instructions where required.  This is a drawback of the peephole
optimizer that C uses.  It can eliminate jumps to jumps, but it does not
eliminate jumps to returns.  It should, since that optimization is both a
speed and space improvement.

----------------------------------------+-------------------------------------
external routine alpha;			|	extern  int	alpha();
external routine beta;			|	extern  int	beta();
					|
routine test(p,q,r,s,t) : novalue =	|	test(p,q,r,s,t)
begin					|	{
    if .t neq 0				|	    if (t!=0)
	then alpha(.p,.q,.r,.s)		|		alpha(p,q,r,s);
	else beta(.p,.q,.r,.s);		|	    else
end;					|		beta(p,q,r,s);
					|	}
					|
	.TITLE  FOO			|		.data
					|		.text
	.PSECT  $CODE$,NOWRT,2		|	LL0:	.align  1
					|		.globl  _test
TEST:	.WORD	^M<>			|		.set	L14,0x0
	TSTL	20(AP)			|		.data
	BEQL	1$			|		.text
	MOVQ	12(AP), -(SP)		|	_test:  .word	L14
	MOVQ	4(AP), -(SP)		|		tstl	20(ap)
	CALLS	#4, W^ALPHA		|		jeql	L18
	RET				|		pushl	16(ap)
1$:	MOVQ	12(AP), -(SP)		|		pushl	12(ap)
	MOVQ	4(AP), -(SP)		|		pushl	8(ap)
	CALLS	#4, W^BETA		|		pushl	4(ap)
	RET				|		calls	$4,_alpha
					|		jbr	L19
					|	L18:	pushl	16(ap)
					|		pushl	12(ap)
					|		pushl	8(ap)
					|		pushl	4(ap)
					|		calls	$4,_beta
					|	L19:	ret