[4bsd-f77 #25] f77 seriously breaks exponentiations

Wed Aug 1 10:47:34 AEST 1984

From: Donn Seeley <donn at utah-cs.arpa>

Subject: f77 seriously breaks certain expressions with exponentiations
Index:	usr.bin/f77/src/f77pass1/optcse.c 4.2BSD

Description:
	If an expression or series of expressions contain
	subexpressions of the form 'x ** N(i)' where N(i) are integer
	constants, f77 can generate code that turns all such
	expressions into 'x ** M' where M is one such N(i).  This
	bug was brought to my attention by Sean Byrne.

Repeat-By:
	Create the following program and compile it with the
	optimizer on:

	----------------------------------------------------------------
		program xmpl

		integer i

		i = 5
		i = i ** 3 + i ** 5

		print *, i

		stop
		end
	----------------------------------------------------------------

	When run, the program prints '250' when it should print '3250'.

Fix:
	Here's the relevant material from the assembly code that's
	generated, prettied up slightly:

	----------------------------------------------------------------
		movl	$5,{i}
		movl	{i},-4(fp)
		mull3	-4(fp),-4(fp),r0	# r0 <- i ** 2
		mull3	-4(fp),r0,-8(fp)	# -8(fp) <- i ** 3
		movl	{i},-12(fp)
		mull2	-12(fp),-12(fp)		# -12(fp) <- i ** 2
		addl3	-8(fp),-8(fp),{i}	# i <- i ** 3 + i ** 3
	----------------------------------------------------------------

	Notice how the value in -12(fp) is developed but never gets
	used for anything.

	If you turn off common subexpression elimination and copy
	propagation in the compiler by supplying the flag '-d7,13' as
	an argument you get the following correct code instead (you
	need to turn off copy propagation if you haven't installed my
	earlier copy propagation bug fix, otherwise turning off CSE
	isn't enough!):

	----------------------------------------------------------------
		movl	$5,{i}
		movl	{i},-4(fp)
		movl	-4(fp),-8(fp)
	+	mull3	-4(fp),-4(fp),r0	# r0 <- i ** 2
	+	mull3	-8(fp),r0,-12(fp)	# -12(fp) <- i ** 3
		movl	{i},-16(fp)
		movl	-16(fp),-20(fp)
	-	mull2	-16(fp),-16(fp)		# -16(fp) <- i ** 2
	*	mull3	-16(fp),-16(fp),r0	# r0 <- i ** 4
	*	mull3	-20(fp),r0,-24(fp)	# -24(fp) <- i ** 5
		addl3	-24(fp),-12(fp),{i}	# i <- i ** 3 + i ** 5
	----------------------------------------------------------------

	What is happening in the former example is that the common
	subexpression elimination code has noticed that the two
	computations marked '+' and '*' in the latter example are
	identical; hence it eliminates the second subexpression in
	favor of the first.  Unfortunately the two subexpressions are
	NOT computing the same value, because the assignment which
	results in the instruction marked '-' in the latter example
	changes the value of the temporary.

	The problem is that the CSE code doesn't realize that the '*='
	operator is different from the '=' operator.  The '*=' is
	generated by the code which handles the inline expansion of
	exponentiations with constant integer exponents, and it is a
	'*=' expression which produces the instruction marked '-'
	above.  If it were an '=' expression instead of a '*=', the CSE
	would be correct.  The fix is to make the CSE code understand
	that the lvalue of a '*=' operator does not receive the value
	of the right hand side (in function scantree() in optcse.c):

	----------------------------------------------------------------
	*** /tmp/,RCSt1028888	Thu Jul 19 19:21:54 1984
	--- optcse.c	Thu Jul 19 12:03:55 1984
	***************
	*** 595,602
		
			switch (p->exprblock.opcode) {
				case OPASSIGN :
	- 			case OPPLUSEQ :
	- 			case OPSTAREQ :
					{
					Addrp ap;
		

	--- 604,609 -----
		
			switch (p->exprblock.opcode) {
				case OPASSIGN :
					{
					Addrp ap;
	  
	***************
	*** 609,614
					}
					new = newnode(p,idp->initval,NULL,NULL);
					appendnode(new);
					new->rs = new;
					return(new->rs);
					}

	--- 616,638 -----
					}
					new = newnode(p,idp->initval,NULL,NULL);
					appendnode(new);
	+ 				new->rs = new;
	+ 				return(new->rs);
	+ 				}
	+ 
	+ 			/*
	+ 			 * Don't optimize these...  they're a real hassle.
	+ 			 */
	+ 			case OPPLUSEQ :
	+ 			case OPSTAREQ :
	+ 				{
	+ 				Addrp ap;
	+ 
	+ 				ap = (Addrp) p->exprblock.leftp;
	+ 				idp = findid(ap);
	+ 				killdepnodes(idp);
	+ 				idp->assgnval = NULL;
	+ 				new = newnode(p,lnode,rnode,NULL);
					new->rs = new;
					return(new->rs);
					}
	----------------------------------------------------------------

	As you can see by the comment, the effect of this fix is to
	largely prevent CSE between exponentiations.  But a substantial
	amount of time can be wasted doing extra multiplies in the
	computation of (say) a polynomial if you don't take advantage
	of the fact that (say) 'x ** 8' can be computed from 'x ** 4 *
	x ** 4', where 'x ** 4' was computed for an earlier term.

	I have two different attacks on this problem.  Both attacks
	involve the removal of '*=' operators so that CSE can proceed.
	The first line of attack is to produce a new version of the
	code that does inline expansion of exponentiations, which
	follows the same strategy as the old code but uses simple
	multiplies instead of '*='.  This version has the problem that
	CSE can't remove some of the computations, because it doesn't
	realize that some of the temporaries are only going to be used
	to compute the exponentiation and can be thrown away if their
	value isn't needed.  The second line of attack is to avoid
	creating any temporaries at all in the exponentation code, and
	let CSE do all the work of pruning the expression tree.  This
	has the advantage of drastically simplifying the inline
	expansion code, but there is a cost.  The problem is that the
	expression tree can get very large (on the order of the size of
	the exponent) before CSE gets a chance to prune it (to
	something closer to the log of the exponent in size).  What I
	did is simply pick a largest exponent beyond which
	exponentiation is not done inline.  This is not perhaps as
	reasonable as it sounds, because the worst case exponent (2 **
	31 - 1) only requires 60-odd multiplies under the previous
	system, whereas the new system requires a subroutine call.
	Since most constant integer exponents are small, though, this
	probably won't hurt very much.

	To install the latter fix, replace the routine buffpower() in
	optim.c with the following two routines:

	----------------------------------------------------------------
	/*
	 * Called by expand() to eliminate exponentiations to integer constants.
	 */
	LOCAL expptr buffpower( p )
		expptr p;
	{
		expptr base;
		Addrp newtemp;
		expptr storetemp = ENULL;
		expptr powtree();
		expptr result;
		ftnint exp;

		if ( ! ISICON( p->exprblock.rightp ) )
			fatal( "buffpower: bad non-integer exponent" );

		base = p->exprblock.leftp;
		exp = p->exprblock.rightp->constblock.const.ci;
		if ( exp < 2 )
			fatal( "buffpower: bad exponent less than 2" );

		if ( exp > 64 ) {
			/*
			 * Let's be reasonable, here...
			 *
			 * We need to fox the constant exponent to get mkpower() to
			 *	do its job properly.
			 */
			newtemp = mktemp( TYINT, ENULL );
			storetemp = mkexpr( OPASSIGN,
				      cpexpr( (expptr) newtemp ),
				      mkintcon( exp ) );
			result = mkexpr( OPPOWER, cpexpr( base ), cpexpr( newtemp ) );
			result = fixexpr( result );
			result = mkexpr( OPCOMMA, storetemp, result );
			return ( result );
		}

		/*
		 * If the base is not a simple variable, evaluate it and copy the
		 *	result into a temporary.
		 */
		if ( ! (base->tag == TADDR && ISCONST( base->addrblock.memoffset )) ) {
			newtemp = mktemp( base->headblock.vtype, PNULL );
			storetemp = mkexpr( OPASSIGN,
				      cpexpr( (expptr) newtemp ),
				      cpexpr( base ) );
			base = (expptr) newtemp;
		}

		result = powtree( base, exp );

		if ( storetemp != ENULL )
			result = mkexpr( OPCOMMA, storetemp, result );
		frexpr( p );

		return ( result );
	}


	/*
	 * powtree( base, exp ) -- Create a tree of multiplications which computes
	 *	base ** exp.  The tree is built so that CSE will compact it if
	 *	possible.  The routine works by creating subtrees that compute
	 *	exponents which are powers of two, then multiplying these
	 *	together to get the result; this gives a log2( exp ) tree depth
	 *	and lots of subexpressions which can be eliminated.
	 */
	LOCAL expptr powtree( base, exp )
		expptr base;
		register ftnint exp;
	{
		register expptr r = ENULL, r1;
		register int i;

		for ( i = 0; exp; ++i, exp >>= 1 )
			if ( exp & 1 )
				if ( i == 0 )
					r = (expptr) cpexpr( base );
				else {
					r1 = powtree( base, 1 << (i - 1) );
					r1 = mkexpr( OPSTAR, r1, cpexpr( r1 ) );
					r = (r ? mkexpr( OPSTAR, r1, r ) : r1);
				}

		return ( r );
	}
	----------------------------------------------------------------

	I made a couple other aesthetic changes to the rest of optim.c
	as well, in expand() and buffmnmx():

	----------------------------------------------------------------
	*** /tmp/,RCSt1029088	Thu Jul 19 20:00:10 1984
	--- /tmp/,RCSt2029088	Thu Jul 19 20:00:13 1984
	*** 476,482
	  {
	  Addrp t;
	  expptr q;
	! Addrp buffmnmx(), buffpower();
	  
	  if (!p)
		return (ENULL);

	--- 487,493 -----
	  {
	  Addrp t;
	  expptr q;
	! expptr buffmnmx(), buffpower();
	  
	  if (!p)
		return (ENULL);
	***************
	*** 592,598
	   *  local version of routine putmnmx in putpcc.c, called by expand
	   */
	  
	! LOCAL Addrp buffmnmx(p)
	  register expptr p;
	  {
	  int op, type;

	--- 603,609 -----
	   *  local version of routine putmnmx in putpcc.c, called by expand
	   */
	  
	! LOCAL expptr buffmnmx(p)
	  register expptr p;
	  {
	  int op, type;
	***************
	*** 636,642
	  frtemp(newtemp);
	  frchain( &p0 );
	  
	! return ( (Addrp) result);
	  }
	  
	  
	--- 647,653 -----
	  frtemp(newtemp);
	  frchain( &p0 );
	  
	! return (result);
	  }
	  
	  
	----------------------------------------------------------------

	The code that this generates looks like this (for the same
	expressions which were displayed above):

	----------------------------------------------------------------
		movl	$5,{i}
		mull3	{i},{i},-4(fp)
		mull3	{i},-4(fp),r0
		mull3	-4(fp),-4(fp),r1
		mull2	{i},r1
		addl3	r1,r0,{i}
	----------------------------------------------------------------

	Even this is not optimal -- a multiply could have been saved if
	the tree had been structured so that 'x ** 3' was a
	subexpression in the tree for 'x ** 5'.  (The algorithm favors
	exponents of the form '2 ** n' or '2 ** (n-1)', which you can
	see if you spend a little while looking at powtree().) But it's
	simple, and it's also the best I have for now.

Donn Seeley    University of Utah CS Dept    donn at utah-cs.arpa
40 46' 6"N 111 50' 34"W    (801) 581-5668    decvax!utah-cs!donn