A study in code optimization in C
Bruce Worden
bruce at seismo.gps.caltech.edu
Sun Jul 29 06:38:00 AEST 1990
In article <1349 at proto.COM> joe at proto.COM (Joe Huffman) writes:
>In article <1990Jul26.144134.16053 at ux1.cso.uiuc.edu>, mcdonald at aries.scs.uiuc.edu (Doug McDonald) writes:
>> In article <133 at smds.UUCP> rh at smds.UUCP (Richard Harter) writes:
>> >
>> >The macro shown below is an optimized memory to memory copy macro.
>> >It is probably faster than memcopy on your machine -- I have checked
>> >it on several machines and have always found it to be faster.
>> !!!!!!
>> Oh My!.
>> Time on my computer, in seconds, for 1000 copies of a 20 kilobyte array:
>>               His code    library memcpy
>> Compiler 1:
>>   (chars)       12.6          2.7
>>   (ints)         6.9          2.7
>> Compiler 2:
>>   (chars)       23.6          1.3
>>   (ints)         6.9          1.3
>[Stuff deleted... compilers were Microsoft and Microway NDPC, machine was
>20 MHz 386]
>
>I just ran it on a 20 MHz 386 running SCO UNIX. The timings were done with
>5000 copies but then divided by 5 to make the numbers comparable.
>                                      His code    library memcpy
>SCO supplied MSC 5.1
> (chars)                                14.0          2.05
>Zortech
> 386 code generator not available                     1.80
Here are the results on some machines I could find the other day. The
compilers are the native compilers unless otherwise stated. I used
whatever compiler optimizations I could. Times are in seconds for 1000
copies of a 20 kbyte array:
Sun Sparcstation 1+
             Him    memcpy
  chars:     7.6      2.0
  ints:      2.0      2.0

Sun 4/280
             Him    memcpy
  chars:     9.8      2.8
  ints:      2.5      2.8

Sun Sparcstation SLC
             Him    memcpy
  chars:     9.9      2.6
  ints:      2.5      2.6

Sun 386i
             Him    memcpy
  chars:     9.5      2.6
  ints:      2.4      2.6

Sun 3/160
             Him    memcpy
  chars:    13.7      4.5
  ints:      3.4      4.5

Inmos T800 (Meiko, 25MHz, kind-of unfair because of block_copy instruction)
             Him    memcpy
  chars:    37.6      1.6
  ints:      8.4      1.6

i860 (Meiko, 40MHz, Green Hills C-I860 1.8.5, beta assembler 1.41, beta
linker 1.2)
             Him    memcpy
  chars:     2.1      3.9
  ints:      0.9      3.9

Convex C120 (Vector--yes his code vectorizes nicely, memcpy not available,
used bcopy)
             Him    memcpy
  chars:     3.0      1.0
  ints:      1.0      1.0

Convex C120 (Scalar, memcpy not available, used bcopy)
             Him    memcpy
  chars:    28.4      1.5
  ints:      7.5      1.5

BBN TC2000 (Motorola 88000-based, Green Hills C-88000 2.35(1.8.4))
             Him    memcpy
  chars:    10.3     12.0
  ints:      4.9     12.0
In general, I'd say Richard's code does a pretty good job when moving ints,
and it also beats the library memcpy on the newer machines (the BBN and the
Meiko i860). In addition, his code is about 20% faster than a simple "for"
loop on my Sparc 1+, so it illustrates a useful principle as well. I intend
to use it in some selected applications. Thanks for posting it.
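
For readers without the original macro at hand: it is not reproduced in
this article, so the sketch below is only an illustration. It assumes the
"useful principle" is loop unrolling, which is a guess rather than
something stated here. The names copy_simple and copy_unrolled are made up
for the example, and the unroll factor of four is arbitrary.

```c
#include <assert.h>
#include <stddef.h>

/* Baseline: the simple "for" loop, one element per iteration. */
void copy_simple(int *dst, const int *src, size_t n)
{
    size_t i;

    for (i = 0; i < n; i++)
        dst[i] = src[i];
}

/*
 * Hypothetical unrolled variant (NOT Richard's macro): four elements per
 * iteration, so the loop test and increment run a quarter as often.
 * A short tail loop handles the n % 4 leftover elements.
 */
void copy_unrolled(int *dst, const int *src, size_t n)
{
    size_t i = 0;

    assert(dst != NULL && src != NULL);

    for (; i + 4 <= n; i += 4) {
        dst[i]     = src[i];
        dst[i + 1] = src[i + 1];
        dst[i + 2] = src[i + 2];
        dst[i + 3] = src[i + 3];
    }
    for (; i < n; i++)      /* tail: 0 to 3 remaining elements */
        dst[i] = src[i];
}
```

Whether the unrolled form wins depends on the compiler and the machine, as
the tables above suggest, so it is worth timing both before committing to
either.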
BIG TIME DISCLAIMER: I in no way intended this to be a comparison of
different machines, but of the performance of a piece of C code on each of
several different machines. There are a lot of ways to do timings, and most
of them aren't very good, so please don't flame me if I didn't do justice to
some machine's absolute performance; it is the relative timings that matter.
If I screwed that up, flame away (though a nice note explaining the error
might be more instructive).
Bruce
P.S. For timing I used getusecclock() on the BBN, ticks() on the Meikos, and
getrusage() on everything else.
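
A getrusage()-based timing of the memcpy column might look roughly like the
sketch below. It is not the harness actually used for the numbers above.
The names user_seconds and time_memcpy, the exact array size of 20000
bytes, and the choice to count user CPU time only are all assumptions made
for illustration.

```c
#include <assert.h>
#include <string.h>
#include <sys/time.h>
#include <sys/resource.h>

#define NBYTES  20000   /* "20 kbyte" array; exact size is an assumption */
#define NCOPIES 1000    /* number of copies per timing run */

static char src_buf[NBYTES];
static char dst_buf[NBYTES];

/* User CPU time consumed so far by this process, in seconds. */
double user_seconds(void)
{
    struct rusage ru;
    int rc = getrusage(RUSAGE_SELF, &ru);

    assert(rc == 0);
    (void)rc;   /* silence unused warning when NDEBUG is defined */
    return (double)ru.ru_utime.tv_sec + (double)ru.ru_utime.tv_usec / 1e6;
}

/*
 * Time NCOPIES calls to memcpy and return the user CPU seconds spent.
 * Note: an aggressive optimizer may drop repeated copies whose results
 * are never read, so check the generated code before trusting the number.
 */
double time_memcpy(void)
{
    double start, stop;
    int i;

    memset(src_buf, 'x', NBYTES);

    start = user_seconds();
    for (i = 0; i < NCOPIES; i++)
        memcpy(dst_buf, src_buf, NBYTES);
    stop = user_seconds();

    return stop - start;
}
```

Timing the other column would mean replacing the memcpy call with the copy
code under test and keeping everything else the same, so that only the
relative numbers are compared.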