Optimal for loop on the 68020.
Inst.f.Techn.Informatik
inst182 at tuvie
Fri Jun 9 21:58:00 AEST 1989
I tried the same kinds of for loops on a XT-compatible (8086, 8MHz)
using Turbo-C 2.0. (tcc -G -O -Z -S)
I also tested
for (i = COUNT; i --;);
because I thought, it would be the fastest.
For integers, the fastest loop was
for (i = COUNT; -- i >= 0;);
which turned out to be 1.7 times as fast as the slowest. It is
remarkable, that this is the same factor as Jef Poskanzer has timed on
his Sun.
#define COUNT 30000
----------------------------------------------------------------
for (i = 0; i < COUNT; i ++);
127 ms
xor si,si
jmp short @5
@4:
inc si
@5:
cmp si,30000
jl @4
Comparation against a non-zero value
----------------------------------------------------------------
for (i = 0; i < COUNT; ++ i);
115 ms
xor si,si
jmp short @9
@8:
inc si
@9:
cmp si,30000
jl @8
Can anybody explain, why this is faster than the above ????
----------------------------------------------------------------
for (i = 0; ++ i <= COUNT;);
132 ms
xor si,si
@13:
inc si
mov ax,si
cmp ax,30000
jle @13
Moving si to ax seems rather useless to me.
----------------------------------------------------------------
for (i = 0; i++ < COUNT;);
132 ms
xor si,si
@17:
mov ax,si
inc si
cmp ax,30000
jl @17
Ok, here it is neccessary
----------------------------------------------------------------
for (i = COUNT; i > 0; i --);
94 ms
mov si,30000
jmp short @21
@20:
dec si
@21:
or si,si
jg @20
The separation of decrement and comparation causes an additional or
----------------------------------------------------------------
for (i = COUNT; i > 0; --i);
93 ms
mov si,30000
jmp short @25
@24:
dec si
@25:
or si,si
jg @24
Looks exactly like the version above. The 1 ms difference could be a
timing error.
----------------------------------------------------------------
for (i = COUNT; --i >= 0;);
77 ms
mov si,30000
@29:
dec si
jge @29
This surely is the optimal loop
----------------------------------------------------------------
for (i = COUNT; i-- > 0;);
115 ms
mov si,30000
@33:
mov ax,si
dec si
or ax,ax
jg @33
The former value of the counter has to be saved for the comparison.
----------------------------------------------------------------
for (i = COUNT; i--;);
115 ms
mov si,30000
@37:
mov ax,si
dec si
or ax,ax
jne @37
================================================================
Then I tried the same with a long counter (instead of int), which
changed the results radically. Not only is the difference between the
fastest and the slowest loop much lesser (The factor is now only
1.13), but version 7, which has been the fastest for integers is now
the second slowest, and version 9 now is the fastest.
Versions 3 and 4 have remained at the slow end of the charts.
#define COUNT 60000
----------------------------------------------------------------
for (i = 0; i < COUNT; i ++);
938 ms
mov word ptr [bp-2],0
mov word ptr [bp-4],0
jmp short @5
@4:
add word ptr [bp-4],1
adc word ptr [bp-2],0
@5:
cmp word ptr [bp-2],0
jl @4
jne @38
cmp word ptr [bp-4],-5536
jb @4
@38:
----------------------------------------------------------------
for (i = 0; i < COUNT; ++ i);
938 ms
mov word ptr [bp-2],0
mov word ptr [bp-4],0
jmp short @9
@8:
add word ptr [bp-4],1
adc word ptr [bp-2],0
@9:
cmp word ptr [bp-2],0
jl @8
jne @39
cmp word ptr [bp-4],-5536
jb @8
@39:
----------------------------------------------------------------
for (i = 0; ++ i <= COUNT;);
1056 ms
mov word ptr [bp-2],0
mov word ptr [bp-4],0
@13:
add word ptr [bp-4],1
adc word ptr [bp-2],0
mov dx,word ptr [bp-2]
mov ax,word ptr [bp-4]
or dx,dx
jl @13
jg @40
cmp ax,-5536
jbe @13
@40:
Now the compiler seems to have forgotten that a memory location can be
compared against a constant.
----------------------------------------------------------------
for (i = 0; i++ < COUNT;);
1009 ms
mov word ptr [bp-2],0
mov word ptr [bp-4],0
@17:
mov dx,word ptr [bp-2]
mov ax,word ptr [bp-4]
add word ptr [bp-4],1
adc word ptr [bp-2],0
or dx,dx
jl @17
jne @41
cmp ax,-5536
jb @17
@41:
----------------------------------------------------------------
for (i = COUNT; i > 0; i --);
938 ms
mov word ptr [bp-2],0
mov word ptr [bp-4],-5536
jmp short @21
@20:
sub word ptr [bp-4],1
sbb word ptr [bp-2],0
@21:
cmp word ptr [bp-2],0
jg @20
jne @42
cmp word ptr [bp-4],0
ja @20
@42:
Now it seems to have regained this knowledge and is fast as before.
Comparing against zero is not faster here than comparing against
another constant.
----------------------------------------------------------------
for (i = COUNT; i > 0; -- i);
938 ms
mov word ptr [bp-2],0
mov word ptr [bp-4],-5536
jmp short @25
@24:
sub word ptr [bp-4],1
sbb word ptr [bp-2],0
@25:
cmp word ptr [bp-2],0
jg @24
jne @43
cmp word ptr [bp-4],0
ja @24
@43:
----------------------------------------------------------------
for (i = COUNT; --i >= 0;);
1051 ms
mov word ptr [bp-2],0
mov word ptr [bp-4],-5536
@29:
sub word ptr [bp-4],1
sbb word ptr [bp-2],0
mov dx,word ptr [bp-2]
mov ax,word ptr [bp-4]
or dx,dx
jg @29
jl @44
or ax,ax
jae @29
@44:
Again, the compiler fetches the counter into registers before
comparing. Here comparing against zero saves a little time.
But letting the counter in memory would have saved more.
----------------------------------------------------------------
for (i = COUNT; i-- > 0;);
1004 ms
mov word ptr [bp-2],0
mov word ptr [bp-4],-5536
@33:
mov dx,word ptr [bp-2]
mov ax,word ptr [bp-4]
sub word ptr [bp-4],1
sbb word ptr [bp-2],0
or dx,dx
jg @33
jne @45
or ax,ax
ja @33
@45:
----------------------------------------------------------------
for (i = COUNT; i--;);
934 ms
mov word ptr [bp-2],0
mov word ptr [bp-4],-5536
@37:
mov dx,word ptr [bp-2]
mov ax,word ptr [bp-4]
sub word ptr [bp-4],1
sbb word ptr [bp-2],0
or dx,ax
jne @37
Checking a long for *equality* to zero saves lots of time, so the
overhead for loading the registers is more than compensated.
----------------------------------------------------------------
???
374 ms
mov si,0
mov di,30000
@_l:
mov dx,si
mov ax,di
sub di,1
sbb si,0
or dx,ax
jne @_l
This would have been the optimal loop, but Turbo-C does not put long
variables into registers.
================================================================
Please email to address below, because the account I was posting this from is not
mine.
_______________________________________________________________
| __ | |
| | | \ | Peter J. Holzer |
| |___|__/ | Technische Universitaet Wien |
| | | | |
| | | | ...!uunet!mcvax!tuvie!asupa!honey!hjphr |
| ____/ |--------------------------------------------------|
| | Think of it as evolution in action -- Tony Rand |
|____________|__________________________________________________|
More information about the Comp.unix.wizards
mailing list