memcpy versus assignment
D. Chadwick Gibbons
chad at csd4.csd.uwm.edu
Wed Dec 27 09:41:51 AEST 1989
In several books I've seen the claim that assignment of structures is
usually more efficient than using memcpy(), at least on most modern
processors. I did a few experiments to see if this is true. Using
the following short program, I examined the machine code produced
on different machines.
#include <string.h>

struct bozo {
    int one;
    char two;
    long three;
} foo, bar;

int main()
{
    foo = bar;
    (void)memcpy((char *)&foo, (char *)&bar, sizeof(struct bozo));
    return 0;
}
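To look at each copy in isolation, it may help to put the two forms in separate functions and ask the compiler for assembly output (for example with cc -S). The following is only a sketch under that assumption: the names copy_assign, copy_memcpy and copies_agree are made up for illustration and were not part of the experiment above.

```c
#include <assert.h>
#include <string.h>

struct bozo {
    int one;
    char two;
    long three;
};

/* Copy by structure assignment; the compiler chooses the instructions. */
void copy_assign(struct bozo *dst, const struct bozo *src)
{
    *dst = *src;
}

/* Copy by memcpy(); this may become a library call or be expanded inline. */
void copy_memcpy(struct bozo *dst, const struct bozo *src)
{
    memcpy(dst, src, sizeof *dst);
}

/* Hypothetical self-check: returns 1 if both routines copy every member. */
int copies_agree(void)
{
    struct bozo src = { 1, 'x', 123456L };
    struct bozo a = { 0, 0, 0L };
    struct bozo b = { 0, 0, 0L };

    copy_assign(&a, &src);
    copy_memcpy(&b, &src);

    /* Members are compared one by one because padding bytes may differ. */
    assert(a.one == src.one && a.two == src.two && a.three == src.three);
    return a.one == b.one && a.two == b.two && a.three == b.three;
}
```

Comparing the code generated for copy_assign and copy_memcpy then shows what a given compiler does with each form.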
On an 8086 CPU, the compiler - MSC5.1 (yuck!) - produces the
following code for the assignment when full optimization is on:
; foo = bar
lea di, WORD PTR[bp-8] ; foo
lea si, WORD PTR[bp-16] ; bar
push ss
pop es
movsw ; the four movsw instructions are more
movsw ; space/speed efficient than a
movsw ; mov cx,sizeof(foo)/2
movsw ; rep movsw combination....
On a VAX using gcc, the following code is produced:
; foo = bar;
subl3 $76,fp,sp
movab -64(fp),r1
movab -76(fp),r0
movl $12,r2
movblk
The VAX naturally produces the more efficient code, but I would
imagine the 8086 would do just as good a job with larger
structures, so that a
mov cx, sizeof(struct bozo)/2
rep movsw
could be used under appropriate circumstances.
However, this is only half the question. Does the assignment win
over memcpy? On the 8086, the following code is produced:
; (void)memcpy((char *)&foo, (char *)&bar, sizeof(struct bozo));
lea ax, WORD PTR[bp-16] ; bar (bp-16 is bar in the listing above)
mov WORD PTR[bp-18], ax
mov cx, 8
lea di, WORD PTR[bp-8] ; foo
lea si, WORD PTR[bp-16] ; bar
mov ax, ss
shr cx, 1
rep movsw
adc cx, cx
rep movsb
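The shr cx,1 / rep movsw / adc cx,cx / rep movsb sequence copies whole 16-bit words first and then picks up a leftover odd byte from the carry flag. A rough C rendering of that idea, as a sketch only (copy_words_then_byte and word_copy_matches_memcpy are hypothetical names, not anything the compiler provides), might look like this:

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: copy n bytes as 2-byte units plus an optional odd byte. */
void copy_words_then_byte(void *dst, const void *src, size_t n)
{
    unsigned char *d = dst;
    const unsigned char *s = src;
    size_t words = n >> 1;   /* shr cx, 1: number of 16-bit words */
    size_t odd = n & 1;      /* the bit shifted into carry, recovered by adc cx, cx */

    while (words-- > 0) {    /* rep movsw */
        d[0] = s[0];
        d[1] = s[1];
        d += 2;
        s += 2;
    }
    if (odd)                 /* rep movsb with cx equal to 0 or 1 */
        *d = *s;
}

/* Hypothetical self-check: returns 1 if the helper agrees with memcpy(). */
int word_copy_matches_memcpy(size_t n)
{
    unsigned char src[16], a[16], b[16];
    size_t i;

    assert(n <= sizeof src);
    for (i = 0; i < sizeof src; i++) {
        src[i] = (unsigned char)(i + 1);
        a[i] = 0;
        b[i] = 0;
    }
    copy_words_then_byte(a, src, n);
    memcpy(b, src, n);
    return memcmp(a, b, sizeof a) == 0;
}
```

For the 8-byte struct bozo the odd-byte step never fires, which is why the assignment can get away with four plain movsw instructions.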
The compiler is smart enough to make memcpy an intrinsic function,
so as to avoid a costly call instruction. On the VAX, a call to
memcpy (or in this case bcopy(), which does the same job with the
source and destination arguments swapped) was produced, so I wasn't
able to analyze the code directly. However, using gcc on bcopy.c
produces the following code:
.globl _bcopy
_bcopy:
.word 0x0
movl 4(fp),r4
movl 8(fp),r3
movl 12(fp),r2
tstl r2
jeql L1
cmpl r4,r3
jeql L1
L2:
decl r2
tstl r2
jneq L2
L4:
movl r3,r0
addl2 $4,r3
movl r4,r1
addl2 $4,r4
movl (r1),(r0)
decl r2
tstl r2
jneq L4
ret
That seems like quite a bit compared to the assignment. However,
in almost all C code I have seen, the comments state something
along the lines of "/* use memcpy for structures larger than
int */", which seems to go against the results shown above.
In _general_, what is the rule for copying large structures:
memcpy or assignment? Which is generally better?
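A rough way to check this on a particular machine is to time both forms. The sketch below is only an assumption about how one might do that: time_assign, time_memcpy and last_three are made-up names, and an optimizing compiler may hoist or delete the loops, so the generated code should still be inspected before trusting the numbers.

```c
#include <assert.h>
#include <string.h>
#include <time.h>

struct bozo {
    int one;
    char two;
    long three;
};

static struct bozo foo, bar;

/* Time `iters` structure assignments; returns elapsed CPU seconds. */
double time_assign(long iters)
{
    clock_t start;
    long i;

    assert(iters >= 0);
    start = clock();
    for (i = 0; i < iters; i++) {
        bar.three = i;       /* vary the source so each copy differs */
        foo = bar;
    }
    return (double)(clock() - start) / CLOCKS_PER_SEC;
}

/* Time `iters` memcpy() calls on the same structures. */
double time_memcpy(long iters)
{
    clock_t start;
    long i;

    assert(iters >= 0);
    start = clock();
    for (i = 0; i < iters; i++) {
        bar.three = i;
        memcpy(&foo, &bar, sizeof foo);
    }
    return (double)(clock() - start) / CLOCKS_PER_SEC;
}

/* Expose the last value copied so a caller can confirm the copies happened. */
long last_three(void)
{
    return foo.three;
}
```

Calling time_assign(1000000L) and time_memcpy(1000000L) and comparing the two results gives a first impression; a structure as small as struct bozo may need many iterations before the difference rises above the clock's resolution.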