C "optimization"
Mark Plotnick
mp at whuxle.UUCP
Fri Feb 24 02:16:46 AEST 1984
Dan Klein's comparison between C and Bliss really compared pcc
with BLISS-32. Using such results to point out the inferior aspects of
C is like saying UNIX's user interface is horrible based on one's
experiences with version 6 and 3bsd.
I compiled Dan's test programs on the DEC VMS C compiler, version 1.0-09.
This compiler shares its back end with other highly optimizing
VMS compilers such as PL/1. The generated code is in some cases better
than BLISS-32's, and in some cases it's worse than pcc's.
Program 1:
extern int alpha(); test: .entry test,^m<r2>
extern int beta(); moval (ap),r2
tstl 14(r2)
test(p,q,r,s,t) beql vcg.1
{ pushl 10(r2)
if (t!=0) pushl 0C(r2)
alpha(p,q,r,s); pushl 08(r2)
else pushl 04(r2)
beta(p,q,r,s); calls #4,alpha
} ret
vcg.1: pushl 10(r2)
pushl 0C(r2)
pushl 08(r2)
pushl 04(r2)
calls #4,beta
ret
The code here is about the same as pcc's. Note that VMS C
likes to keep the address of the base of the argument list in a register,
even though it doesn't really buy anything.
Program 2:
extern int alpha(); test: .entry test,^m<r2>
moval (ap),r2
test(p,q,r,s,t) tstl 14(r2)
{ beql vcg.1
if (t!=0) pushl #0
alpha(p,q,r,s,0); pushl 10(r2)
else pushl 0C(r2)
alpha(p,q,r,s,1); pushl 08(r2)
} pushl 04(r2)
calls #5,alpha
ret
vcg.1: pushl #1
pushl 10(r2)
pushl 0C(r2)
pushl 08(r2)
pushl 04(r2)
calls #5,alpha
ret
Well, VMS C loses here. It doesn't share identical code sections.
Program 3:
test(a) test: .entry test,^m<r2>
int a; moval (ap),r2
{ clrl r1
int i; addl3 #5,04(r2),r0
divl2 #2,r0
for (i=0; i<=(5+a)/2; i++) ; cmpl r1,r0
} bgtr vcg.2
vcg.1: aobleq r0,r1,vcg.1
vcg.2: ret
Almost as good as BLISS-32's. It doesn't use BLISS-32's neat trick of setting
the value to be one less than the real starting value and flowing
into the aobleq.
Program 4:
extern alpha(); test: .entry test,^m<>
pushl #2
test() calls #1,alpha
{ pushl #5
int a, b; calls #1,alpha
ret
a = 2;
alpha(a);
b = 5;
alpha(b);
}
This beats BLISS-32, by pushing the constant values immediately rather than
moving each to a register first.
Program 5:
Identical to BLISS-32's code.
Program 6:
Identical to BLISS-32's code.
Program 7:
#define P1 3 test: .entry test,^m<>
#define P2 5 cmpl #3,#5
bgeq vcg.1
extern int alpha(); clrl r0
ret
test() vcg.1:
{ movl #1,r0
int loc1 = P1, ret
loc2 = P2;
if (loc1 < loc2)
return 0;
else
return 1;
}
This beats BLISS-32 in the same way as it did with Program 4 - it uses constant
values in immediate mode rather than moving them to registers first.
I wonder why it didn't just go further and collapse the program
down to "clrl r0 ; ret".
Program 8:
typedef struct dp *DP; test: .entry test,^m<>
struct dp { movl 08(ap),r0
int d; movl 04(r0),r0
DP p; movl 04(r0),r0
} movl 04(r0),r0
movl @04(r0),(r0)
test(a) ret
DP a;
{
a->p->p->p->d =
a->p->p->p->p->d;
}
Identical to BLISS-32's code, except that the arg is at 8(ap) rather
than 4(ap). This is because the function is structure-valued, and
4(ap) contains the address of some space set aside in the caller
for the return value.
Program 9:
extern alpha(); test: .entry test,^m<>
clrl ap
test() movl ap,r0
{ pushl #0
register int a, b, c; pushl ap
pushl r0
a = b = c = 0; calls #3,alpha
alpha(a, b, c); ret
}
This code looks truly bizarre, and has to do with VMS C's methods
of register allocation. Variable "c" has been optimized out of
the code, "b" resides in ap (since we don't have any parameters,
we can use ap as just another register), and "a" resides in r0.
Before you all rush out and switch to VMS because of its C compiler,
note the following (all based on version 1.0-09; I don't know if
another version has come out that fixes these problems):
- it uses registers very heavily, allocating them based on a static
analysis of the program. It also IGNORES "register" declarations in
your C code. This may lead to nonoptimal code, as well as slower
procedure startup (because of all the registers that may need to
be saved).
- it tries to minimize code space, but not data space. The
prime example of this is that it always uses a branch table
for implementing switches, no matter how sparsely the cases
are distributed. A switch with a "case 0" and a "case 32000"
generates an awfully big branch table.
- it sometime dies with a fatal parser error when presented with
some legal C constructs.
But, it's reasonably fast, comes with a very good manual, and can
produce compilation listings with storage maps, cross references, etc.
Mark Plotnick
WH 1C-244 x6955
More information about the Comp.lang.c
mailing list