[4bsd-f77 #25] f77 seriously breaks exponentiations
4.2 BSD f77 bug reports
4bsd-f77 at utah-cs.UUCP
Wed Aug 1 10:47:34 AEST 1984
From: Donn Seeley <donn at utah-cs.arpa>
Subject: f77 seriously breaks certain expressions with exponentiations
Index: usr.bin/f77/src/f77pass1/optcse.c 4.2BSD
Description:
If an expression or series of expressions contain
subexpressions of the form 'x ** N(i)' where N(i) are integer
constants, f77 can generate code that turns all such
expressions into 'x ** M' where M is one such N(i). This
bug was brought to my attention by Sean Byrne.
Repeat-By:
Create the following program and compile it with the
optimizer on:
----------------------------------------------------------------
program xmpl
integer i
i = 5
i = i ** 3 + i ** 5
print *, i
stop
end
----------------------------------------------------------------
When run, the program prints '250' when it should print '3250'.
Fix:
Here's the relevant material from the assembly code that's
generated, prettied up slightly:
----------------------------------------------------------------
movl $5,{i}
movl {i},-4(fp)
mull3 -4(fp),-4(fp),r0 # r0 <- i ** 2
mull3 -4(fp),r0,-8(fp) # -8(fp) <- i ** 3
movl {i},-12(fp)
mull2 -12(fp),-12(fp) # -12(fp) <- i ** 2
addl3 -8(fp),-8(fp),{i} # i <- i ** 3 + i ** 3
----------------------------------------------------------------
Notice how the value in -12(fp) is developed but never gets
used for anything.
If you turn off common subexpression elimination and copy
propagation in the compiler by supplying the flag '-d7,13' as
an argument you get the following correct code instead (you
need to turn off copy propagation if you haven't installed my
earlier copy propagation bug fix, otherwise turning off CSE
isn't enough!):
----------------------------------------------------------------
movl $5,{i}
movl {i},-4(fp)
movl -4(fp),-8(fp)
+ mull3 -4(fp),-4(fp),r0 # r0 <- i ** 2
+ mull3 -8(fp),r0,-12(fp) # -12(fp) <- i ** 3
movl {i},-16(fp)
movl -16(fp),-20(fp)
- mull2 -16(fp),-16(fp) # -16(fp) <- i ** 2
* mull3 -16(fp),-16(fp),r0 # r0 <- i ** 4
* mull3 -20(fp),r0,-24(fp) # -24(fp) <- i ** 5
addl3 -24(fp),-12(fp),{i} # i <- i ** 3 + i ** 5
----------------------------------------------------------------
What is happening in the former example is that the common
subexpression elimination code has noticed that the two
computations marked '+' and '*' in the latter example are
identical; hence it eliminates the second subexpression in
favor of the first. Unfortunately the two subexpressions are
NOT computing the same value, because the assignment which
results in the instruction marked '-' in the latter example
changes the value of the temporary.
The problem is that the CSE code doesn't realize that the '*='
operator is different from the '=' operator. The '*=' is
generated by the code which handles the inline expansion of
exponentiations with constant integer exponents, and it is a
'*=' expression which produces the instruction marked '-'
above. If it were an '=' expression instead of a '*=', the CSE
would be correct. The fix is to make the CSE code understand
that the lvalue of a '*=' operator does not receive the value
of the right hand side (in function scantree() in optcse.c):
----------------------------------------------------------------
*** /tmp/,RCSt1028888 Thu Jul 19 19:21:54 1984
--- optcse.c Thu Jul 19 12:03:55 1984
***************
*** 595,602
switch (p->exprblock.opcode) {
case OPASSIGN :
- case OPPLUSEQ :
- case OPSTAREQ :
{
Addrp ap;
--- 604,609 -----
switch (p->exprblock.opcode) {
case OPASSIGN :
{
Addrp ap;
***************
*** 609,614
}
new = newnode(p,idp->initval,NULL,NULL);
appendnode(new);
new->rs = new;
return(new->rs);
}
--- 616,638 -----
}
new = newnode(p,idp->initval,NULL,NULL);
appendnode(new);
+ new->rs = new;
+ return(new->rs);
+ }
+
+ /*
+ * Don't optimize these... they're a real hassle.
+ */
+ case OPPLUSEQ :
+ case OPSTAREQ :
+ {
+ Addrp ap;
+
+ ap = (Addrp) p->exprblock.leftp;
+ idp = findid(ap);
+ killdepnodes(idp);
+ idp->assgnval = NULL;
+ new = newnode(p,lnode,rnode,NULL);
new->rs = new;
return(new->rs);
}
----------------------------------------------------------------
As you can see by the comment, the effect of this fix is to
largely prevent CSE between exponentiations. But a substantial
amount of time can be wasted doing extra multiplies in the
computation of (say) a polynomial if you don't take advantage
of the fact that (say) 'x ** 8' can be computed from 'x ** 4 *
x ** 4', where 'x ** 4' was computed for an earlier term.
I have two different attacks on this problem. Both attacks
involve the removal of '*=' operators so that CSE can proceed.
The first line of attack is to produce a new version of the
code that does inline expansion of exponentiations, which
follows the same strategy as the old code but uses simple
multiplies instead of '*='. This version has the problem that
CSE can't remove some of the computations, because it doesn't
realize that some of the temporaries are only going to be used
to compute the exponentiation and can be thrown away if their
value isn't needed. The second line of attack is to avoid
creating any temporaries at all in the exponentation code, and
let CSE do all the work of pruning the expression tree. This
has the advantage of drastically simplifying the inline
expansion code, but there is a cost. The problem is that the
expression tree can get very large (on the order of the size of
the exponent) before CSE gets a chance to prune it (to
something closer to the log of the exponent in size). What I
did is simply pick a largest exponent beyond which
exponentiation is not done inline. This is not perhaps as
reasonable as it sounds, because the worst case exponent (2 **
31 - 1) only requires 60-odd multiplies under the previous
system, whereas the new system requires a subroutine call.
Since most constant integer exponents are small, though, this
probably won't hurt very much.
To install the latter fix, replace the routine buffpower() in
optim.c with the following two routines:
----------------------------------------------------------------
/*
* Called by expand() to eliminate exponentiations to integer constants.
*/
LOCAL expptr buffpower( p )
expptr p;
{
expptr base;
Addrp newtemp;
expptr storetemp = ENULL;
expptr powtree();
expptr result;
ftnint exp;
if ( ! ISICON( p->exprblock.rightp ) )
fatal( "buffpower: bad non-integer exponent" );
base = p->exprblock.leftp;
exp = p->exprblock.rightp->constblock.const.ci;
if ( exp < 2 )
fatal( "buffpower: bad exponent less than 2" );
if ( exp > 64 ) {
/*
* Let's be reasonable, here...
*
* We need to fox the constant exponent to get mkpower() to
* do its job properly.
*/
newtemp = mktemp( TYINT, ENULL );
storetemp = mkexpr( OPASSIGN,
cpexpr( (expptr) newtemp ),
mkintcon( exp ) );
result = mkexpr( OPPOWER, cpexpr( base ), cpexpr( newtemp ) );
result = fixexpr( result );
result = mkexpr( OPCOMMA, storetemp, result );
return ( result );
}
/*
* If the base is not a simple variable, evaluate it and copy the
* result into a temporary.
*/
if ( ! (base->tag == TADDR && ISCONST( base->addrblock.memoffset )) ) {
newtemp = mktemp( base->headblock.vtype, PNULL );
storetemp = mkexpr( OPASSIGN,
cpexpr( (expptr) newtemp ),
cpexpr( base ) );
base = (expptr) newtemp;
}
result = powtree( base, exp );
if ( storetemp != ENULL )
result = mkexpr( OPCOMMA, storetemp, result );
frexpr( p );
return ( result );
}
/*
* powtree( base, exp ) -- Create a tree of multiplications which computes
* base ** exp. The tree is built so that CSE will compact it if
* possible. The routine works by creating subtrees that compute
* exponents which are powers of two, then multiplying these
* together to get the result; this gives a log2( exp ) tree depth
* and lots of subexpressions which can be eliminated.
*/
LOCAL expptr powtree( base, exp )
expptr base;
register ftnint exp;
{
register expptr r = ENULL, r1;
register int i;
for ( i = 0; exp; ++i, exp >>= 1 )
if ( exp & 1 )
if ( i == 0 )
r = (expptr) cpexpr( base );
else {
r1 = powtree( base, 1 << (i - 1) );
r1 = mkexpr( OPSTAR, r1, cpexpr( r1 ) );
r = (r ? mkexpr( OPSTAR, r1, r ) : r1);
}
return ( r );
}
----------------------------------------------------------------
I made a couple other aesthetic changes to the rest of optim.c
as well, in expand() and buffmnmx():
----------------------------------------------------------------
*** /tmp/,RCSt1029088 Thu Jul 19 20:00:10 1984
--- /tmp/,RCSt2029088 Thu Jul 19 20:00:13 1984
*** 476,482
{
Addrp t;
expptr q;
! Addrp buffmnmx(), buffpower();
if (!p)
return (ENULL);
--- 487,493 -----
{
Addrp t;
expptr q;
! expptr buffmnmx(), buffpower();
if (!p)
return (ENULL);
***************
*** 592,598
* local version of routine putmnmx in putpcc.c, called by expand
*/
! LOCAL Addrp buffmnmx(p)
register expptr p;
{
int op, type;
--- 603,609 -----
* local version of routine putmnmx in putpcc.c, called by expand
*/
! LOCAL expptr buffmnmx(p)
register expptr p;
{
int op, type;
***************
*** 636,642
frtemp(newtemp);
frchain( &p0 );
! return ( (Addrp) result);
}
--- 647,653 -----
frtemp(newtemp);
frchain( &p0 );
! return (result);
}
----------------------------------------------------------------
The code that this generates looks like this (for the same
expressions which were displayed above):
----------------------------------------------------------------
movl $5,{i}
mull3 {i},{i},-4(fp)
mull3 {i},-4(fp),r0
mull3 -4(fp),-4(fp),r1
mull2 {i},r1
addl3 r1,r0,{i}
----------------------------------------------------------------
Even this is not optimal -- a multiply could have been saved if
the tree had been structured so that 'x ** 3' was a
subexpression in the tree for 'x ** 5'. (The algorithm favors
exponents of the form '2 ** n' or '2 ** (n-1)', which you can
see if you spend a little while looking at powtree().) But it's
simple, and it's also the best I have for now.
Donn Seeley University of Utah CS Dept donn at utah-cs.arpa
40 46' 6"N 111 50' 34"W (801) 581-5668 decvax!utah-cs!donn
More information about the Comp.bugs.4bsd.ucb-fixes
mailing list