Is this (gasp) a chip bug?
david.e.bodenstab
imdave at cbnewsc.att.com
Tue Jul 24 08:33:41 AEST 1990
Here is a little program derived from a problem we tracked down in a
driver:
main()
{
unsigned y = 0x7ffff0ff;
unsigned x = 0x7ffff000;
while (1) {
if ( ((int)(y - x)) < 0 )
printf("WRONG x=%x y=%x (y - x)=%x\n",x,y,y-x);
x++;
y++;
}
}
This program `fails' on both a 386 and another cpu in-house -- the
range of values for the `failure' are:
0x80000000 <= y <= 0x800000fe
0x7fffff01 <= x <= 0x7fffffff
The assembly instructions generated for both cpu's were:
sub x from y
bge skip # branch greater than or equal
push args for printf...
call printf
skip:
At first glance this seems inexplicable; (y-x) will always be
0xff, and 0xff is *never* < 0. So the next thing looked at was the
condition code bits (this was not done on the 386). The condition
code bits after 0x80000000 - 0x7fffff01 were VNCZ=1100 (V=overflow,
N=negative, C=carry, Z=zero)
The BGE instruction takes the branch if (Z==1 || N==0). The program
does NOT take the branch when the values of x & y have the above values.
I don't understand this. Admittedly, I'm digging the following up
from my 20 year old memory of two's compliment arithmetic.
As I recall, the subtraction y-x in two's compliment arithmetic consists of:
take the two's compliment of x
add to y
if there is a carry out of the sign bit, set C
if the carry into the sign bit does not match the carry out
of the sign bit, set V
if the sign bit == 1, set N
if all bits are zero, set Z
So, x = 0x7fffff01 = 0111 1111 1111 1111 1111 1111 0000 0001
1000 0000 0000 0000 0000 0000 1111 1111 (2's of x)
y = 0x80000000 = 1000 0000 0000 0000 0000 0000 0000 0000
1 0000 0000 0000 0000 0000 0000 1111 1111 (sum)
| |
carry--+ |
sign bit-+
there was a carry from the high order bit, so set C=1
there was no carry into the sign bit and there was a carry from
the high order bit, so V=1
the sign bit is zero, so N=0
the result is not zero, so Z=0
The BGE instruction takes the branch if (Z==1 || N==0), so the branch
should be taken. *BUT*, when I modify the program to save and print
the condition code bits, I find C=0, V=1, N=1, Z=0 immediately after
the subtraction!
Here's my analysis of the subtraction for the preceeding iteration:
x = 0x7fffff00 = 0111 1111 1111 1111 1111 1111 0000 0000
1000 0000 0000 0000 0000 0001 0000 0000 (2's of x)
y = 0x7fffffff = 0111 1111 1111 1111 1111 1111 1111 1111
1 0000 0000 0000 0000 0000 0000 1111 1111 (sum)
| |
carry--+ |
sign bit-+
there was a carry from the high order bit, so set C=1
there was a carry into the sign bit and there was a carry from
the high order bit, so V=0
the sign bit is zero, so N=0
the result is not zero, so Z=0
In this case the branch is taken because (Z==1 || N==0).
What really concerns me is that I (and probably anyone else) would, if
given the task of writing an assembler routine to subtract two unsigned
values and compare for a result >= 0, write the *exact* same sequence of
instructions! This idiosyncracy of SUB/BGE would probably not be uncovered by
testing unless the values happened to be such that the program fails --
and then tracking the problem down would probably be very time-consuming.
We "solved" our problem with `if ( (int)((y - x) & 0xffffffff) < 0 )' --
this reset the condition codes correctly, and the program behaves as
expected.
*I'm* not going to take out an ad and claim the 386 is broken! :-)
But what's going on here? The above program *works* on a SUN, a 80286
[msdos and unix, ints and longs], and also a MIPS cpu. Is it the compiler?
Is it the chip? Does the program work on *your* cpu?
Thoughtful suggestions will be appreciated -- flames will be ignored.
Thanks!
Dave Bodenstab
...att!iwtmx!imdave
More information about the Comp.lang.c
mailing list