C "optimization"

Fri Feb 24 02:16:46 AEST 1984

Dan Klein's comparison between C and Bliss really compared pcc
with BLISS-32.  Using such results to point out the inferior aspects of
C is like saying UNIX's user interface is horrible based on one's
experiences with version 6 and 3bsd.

I compiled Dan's test programs on the DEC VMS C compiler, version 1.0-09.
This compiler shares its back end with other highly optimizing
VMS compilers such as PL/1.  The generated code is in some cases better
than BLISS-32's, and in some cases it's worse than pcc's.

Program 1:
extern  int	alpha();	    test:   .entry  test,^m<r2>
extern  int	beta();			    moval   (ap),r2
					    tstl    14(r2)
test(p,q,r,s,t)				    beql    vcg.1
{					    pushl   10(r2)
    if (t!=0)				    pushl   0C(r2)
	alpha(p,q,r,s);			    pushl   08(r2)
    else				    pushl   04(r2)
	beta(p,q,r,s);			    calls   #4,alpha
}					    ret
				    vcg.1:  pushl   10(r2)
					    pushl   0C(r2)
					    pushl   08(r2)
					    pushl   04(r2)
					    calls   #4,beta
					    ret

The code here is about the same as pcc's.  Note that VMS C
likes to keep the address of the base of the argument list in a register,
even though it doesn't really buy anything.

Program 2:

extern  int	alpha();	    test:   .entry  test,^m<r2>
					    moval   (ap),r2
test(p,q,r,s,t)				    tstl    14(r2)
{					    beql    vcg.1
    if (t!=0)				    pushl   #0
	alpha(p,q,r,s,0);		    pushl   10(r2)
    else				    pushl   0C(r2)
	alpha(p,q,r,s,1);		    pushl   08(r2)
}					    pushl   04(r2)
					    calls   #5,alpha
					    ret
				    vcg.1:  pushl   #1
					    pushl   10(r2)
					    pushl   0C(r2)
					    pushl   08(r2)
					    pushl   04(r2)
					    calls   #5,alpha
					    ret

Well, VMS C loses here.  It doesn't share identical code sections.

Program 3:
test(a)				    test:   .entry  test,^m<r2>
int	a;				    moval   (ap),r2
{					    clrl    r1
    int i;				    addl3   #5,04(r2),r0
					    divl2   #2,r0
    for (i=0; i<=(5+a)/2; i++) ;	    cmpl    r1,r0
}					    bgtr    vcg.2
				    vcg.1:  aobleq  r0,r1,vcg.1
				    vcg.2:  ret

Almost as good as BLISS-32's.  It doesn't use BLISS-32's neat trick of setting
the value to be one less than the real starting value and flowing
into the aobleq.

Program 4:
extern  alpha();		    test:   .entry  test,^m<>
					    pushl   #2
test()					    calls   #1,alpha
{					    pushl   #5
    int a, b;				    calls   #1,alpha
					    ret
    a = 2;
    alpha(a);
    b = 5;
    alpha(b);
}

This beats BLISS-32, by pushing the constant values immediately rather than
moving each to a register first.

Program 5:
Identical to BLISS-32's code.

Program 6:
Identical to BLISS-32's code.

Program 7:

#define P1	3		    test:   .entry  test,^m<>
#define P2	5			    cmpl    #3,#5
					    bgeq    vcg.1
extern int alpha();			    clrl    r0
					    ret
test()				    vcg.1:
{					    movl    #1,r0
    int loc1 = P1,			    ret
	loc2 = P2;

    if (loc1 < loc2)
	return 0;
    else
	return 1;
}

This beats BLISS-32 in the same way as it did with Program 4 - it uses constant
values in immediate mode rather than moving them to registers first.
I wonder why it didn't just go further and collapse the program
down to "clrl r0 ; ret".

Program 8:
typedef struct dp *DP;		    test:   .entry  test,^m<>
struct dp {				    movl    08(ap),r0
    int d;				    movl    04(r0),r0
    DP  p;				    movl    04(r0),r0
    }					    movl    04(r0),r0
					    movl    @04(r0),(r0)
test(a)					    ret
DP a;
{
    a->p->p->p->d =
	a->p->p->p->p->d;
}

Identical to BLISS-32's code, except that the arg is at 8(ap) rather
than 4(ap).  This is because the function is structure-valued, and
4(ap) contains the address of some space set aside in the caller
for the return value.

Program 9:

extern  alpha();		    test:   .entry  test,^m<>
					    clrl    ap
test()					    movl    ap,r0
{					    pushl   #0
    register int a, b, c;		    pushl   ap
					    pushl   r0
    a = b = c = 0;			    calls   #3,alpha
    alpha(a, b, c);			    ret
}

This code looks truly bizarre, and has to do with VMS C's methods
of register allocation.  Variable "c" has been optimized out of
the code, "b" resides in ap (since we don't have any parameters,
we can use ap as just another register), and "a" resides in r0.

Before you all rush out and switch to VMS because of its C compiler,
note the following (all based on version 1.0-09; I don't know if
another version has come out that fixes these problems):
	- it uses registers very heavily, allocating them based on a static
	  analysis of the program.  It also IGNORES "register" declarations in
	  your C code.  This may lead to nonoptimal code, as well as slower
	  procedure startup (because of all the registers that may need to
	  be saved).
	- it tries to minimize code space, but not data space.  The
	  prime example of this is that it always uses a branch table
	  for implementing switches, no matter how sparsely the cases
	  are distributed.  A switch with a "case 0" and a "case 32000"
	  generates an awfully big branch table.
	- it sometime dies with a fatal parser error when presented with
	  some legal C constructs.
But, it's reasonably fast, comes with a very good manual, and can
produce compilation listings with storage maps, cross references, etc.

	Mark Plotnick
	WH 1C-244 x6955