Array indexes vs. Pointers
Dave Ciemiewicz
ciemo at bananapc.wpd.sgi.com.SGI.COM
Sat May 20 07:48:23 AEST 1989
To use pointers or array indexes, that is the question. The SGI (MIPS)
compilers for the 4D are optimization monsters. The optimizer is "smart"
enough to recognize special case use of array indexes and convert them to
pointers internally. This removes the infamous multiply within your loop
when using array indices. The fact of the matter is that there are cases
where use of indexes will generate more efficient code than the corresponding
code using pointers. The following is a C source file which demonstrates
this.
----- optimizer.c -------------------------------------------------------------
/* 1 */extern v3f(float [3]);
/* 2 */
/* 3 */dovertices_indexed(float v[][3], unsigned int len) {
/* 4 */ register unsigned int i;
/* 5 */
/* 6 */ for (i=0; i<len; i++) {
/* 7 */ v3f(v[i]);
/* 8 */ }
/* 9 */}
/* 10 */
/* 11 */dovertices_pointer(float v[][3], unsigned int len) {
/* 12 */ register float ** p;
/* 13 */
/* 14 */ for (p=v; p != v+len*3; p+=3) {
/* 15 */ v3f(*p);
/* 16 */ }
/* 17 */}
----- optimizer.c -------------------------------------------------------------
"cc -O2 -c optimizer.c" will generate an optimized object file "optimizer.o".
Use "dis optimizer.o" to disassemble the object code produced.
The array indexed code generates the following innerbody of the loop.
[test.c: 7] 0x38: 0c000000 jal v3f
* [test.c: 7] 0x3c: 02202021 move a0,s1
[test.c: 8] 0x40: 2610000c addiu s0,s0,12
[test.c: 8] 0x44: 0212082b sltu at,s0,s2
[test.c: 8] 0x48: 1420fffb bne at,zero,0x38
* [test.c: 8] 0x4c: 2631000c addiu s1,s1,12
a0 is the argument the function call v3f. Yes it does get loaded
*after* the jump to v3f is initiated. A jump takes more than a
cycle to compute so there is a delay. A delay slot is left open
for an executable instruction. There is enough time to load the
argument after initiating the jump but before completing the jump.
(Tricky huh?) s1 is the value of v[i]. s0 is the value of i*4.
s2 is the value of len*4. Guess where the multiply went. 6
intructions were generated for the loop. The time to execute is
6 cycles.
The pointer version generates the following innerbody of the loop.
[test.c: 15] 0x98: 8e040000 lw a0,0(s0)
[test.c: 15] 0x9c: 0c000000 jal v3f
* [test.c: 15] 0xa0: 00000000 nop
[test.c: 16] 0xa4: 2610000c addiu s0,s0,12
[test.c: 16] 0xa8: 1630fffb bne s1,s0,0x98
* [test.c: 16] 0xac: 00000000 nop
a0 is the argument to the function call v3f. This time the loading of
a0 is done from memory instead of a copy. This load has a delay
slot associated with it that the jump fills. The jump delay slot
is filled by a no-op. s0 is p. s1 is v+len*3. The final
instruction is a no-op to fill the branch delay slot. Only 4 "real"
instructions were generated for the innerloop. The 2 no-ops could
be replaced by other instructions in a different code example.
The time to execute for this example is the same as before, 6 cycles.
However, there is a potential for a data cache miss with the load
instruction that would be some number of cycles for the first load,
depending on the CPU type.
With the different setups involved for the two type of array traversals
-- indexes versus pointers -- and the potentials for data cache misses,
it is difficult to clearly say one style is superior to another. The
advantages of one style or another is dependent on the instructions performed
in the loop and even on the CPU executing the loop. From an asthetics point
of view, I prefer the use of array indexes. Array indexes are easier to
understand than pointers.
My overall recommendation is to avoid language tricks when developing code.
After the code is working, use the profiler for locating areas that might
benefit from using the tricks.
--
Dave (commonplace) "Boldly going where no one cares to go."
Ciemiewicz (incomprehensible)
ciemo (infamous)
More information about the Comp.sys.sgi
mailing list