Is this (gasp) a chip bug?

Tue Jul 24 08:33:41 AEST 1990

Here is a little program derived from a problem we tracked down in a
driver:

	main()
	{
	    unsigned y = 0x7ffff0ff;
	    unsigned x = 0x7ffff000;

	    while (1) {
		if ( ((int)(y - x)) < 0 )
		    printf("WRONG x=%x y=%x (y - x)=%x\n",x,y,y-x);
		x++;
		y++;
	    }
	}

This program `fails' on both a 386 and another cpu in-house -- the
range of values for the `failure' are:

	0x80000000 <= y <= 0x800000fe
	0x7fffff01 <= x <= 0x7fffffff

The assembly instructions generated for both cpu's were:

	sub	x from y
	bge	skip		# branch greater than or equal
	push	args for printf...
	call	printf
skip:

At first glance this seems inexplicable; (y-x) will always be
0xff, and 0xff is *never* < 0.  So the next thing looked at was the
condition code bits (this was not done on the 386).  The condition
code bits after 0x80000000 - 0x7fffff01 were VNCZ=1100 (V=overflow,
N=negative, C=carry, Z=zero)

The BGE instruction takes the branch if (Z==1 || N==0).  The program
does NOT take the branch when the values of x & y have the above values.

I don't understand this.  Admittedly, I'm digging the following up
from my 20 year old memory of two's compliment arithmetic.

As I recall, the subtraction y-x in two's compliment arithmetic consists of:

	take the two's compliment of x
	add to y
	if there is a carry out of the sign bit, set C
	if the carry into the sign bit does not match the carry out
		of the sign bit, set V
	if the sign bit == 1, set N
	if all bits are zero, set Z

So,	x = 0x7fffff01 = 0111 1111 1111 1111 1111 1111 0000 0001
			 1000 0000 0000 0000 0000 0000 1111 1111  (2's of x)
	y = 0x80000000 = 1000 0000 0000 0000 0000 0000 0000 0000
		       1 0000 0000 0000 0000 0000 0000 1111 1111  (sum)

		       | |
		carry--+ |
		sign bit-+

	there was a carry from the high order bit, so set C=1
	there was no carry into the sign bit and there was a carry from
		the high order bit, so V=1
	the sign bit is zero, so N=0
	the result is not zero, so Z=0

The BGE instruction takes the branch if (Z==1 || N==0), so the branch
should be taken.  *BUT*, when I modify the program to save and print
the condition code bits, I find C=0, V=1, N=1, Z=0 immediately after
the subtraction!

Here's my analysis of the subtraction for the preceeding iteration:

	x = 0x7fffff00 = 0111 1111 1111 1111 1111 1111 0000 0000
			 1000 0000 0000 0000 0000 0001 0000 0000  (2's of x)
	y = 0x7fffffff = 0111 1111 1111 1111 1111 1111 1111 1111
		       1 0000 0000 0000 0000 0000 0000 1111 1111  (sum)

		       | |
		carry--+ |
		sign bit-+

	there was a carry from the high order bit, so set C=1
	there was a carry into the sign bit and there was a carry from
		the high order bit, so V=0
	the sign bit is zero, so N=0
	the result is not zero, so Z=0

In this case the branch is taken because (Z==1 || N==0).

What really concerns me is that I (and probably anyone else) would, if
given the task of writing an assembler routine to subtract two unsigned
values and compare for a result >= 0, write the *exact* same sequence of
instructions!  This idiosyncracy of SUB/BGE would probably not be uncovered by
testing unless the values happened to be such that the program fails --
and then tracking the problem down would probably be very time-consuming.
We "solved" our problem with `if ( (int)((y - x) & 0xffffffff) < 0 )' --
this reset the condition codes correctly, and the program behaves as
expected.

*I'm* not going to take out an ad and claim the 386 is broken! :-)

But what's going on here?  The above program *works* on a SUN, a 80286
[msdos and unix, ints and longs], and also a MIPS cpu.  Is it the compiler?
Is it the chip?  Does the program work on *your* cpu?

Thoughtful suggestions will be appreciated -- flames will be ignored.
Thanks!

Dave Bodenstab
...att!iwtmx!imdave