memcpy versus assignment

Wed Dec 27 09:41:51 AEST 1989

In several books I've seen the claim that assignment of structures is
usually more efficient than using memcpy(), at least on most modern
processors.  I did a few experiments to see if this is true.  Using
the following short program, I looked at the machine code produced
on different machines.

#include <string.h>	/* declares memcpy() */

struct bozo {
    int one;
    char two;
    long three;
} foo, bar;

int main(void)
{
    foo = bar;		/* structure assignment */
    (void)memcpy((char *)&foo, (char *)&bar, sizeof(struct bozo));
    return 0;
}
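
As a sanity check that the two statements really are interchangeable
for this structure, here is a minimal sketch.  The helper names
same_members() and copies_agree() are invented for illustration and
are not part of the program that was compiled for the listings.

```c
#include <string.h>

struct bozo {
    int one;
    char two;
    long three;
};

/* Hypothetical helper: compare member by member, because padding
 * bytes are not guaranteed to match after a structure assignment. */
int same_members(const struct bozo *a, const struct bozo *b)
{
    return a->one == b->one && a->two == b->two && a->three == b->three;
}

/* Copy src both ways and report whether both results match it. */
int copies_agree(const struct bozo *src)
{
    struct bozo by_assign, by_memcpy;

    by_assign = *src;                           /* structure assignment */
    memcpy(&by_memcpy, src, sizeof by_memcpy);  /* library copy */

    return same_members(&by_assign, src) && same_members(&by_memcpy, src);
}
```

Calling copies_agree() on any initialized struct bozo should return
nonzero; the interesting difference between the two forms is in the
generated code, not in the result.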

On an 8086 CPU, the compiler - MSC5.1 (yuck!) - produces the
following code for the assignment when full optimization is on:

; foo = bar
	lea	di, WORD PTR[bp-8]	; foo
	lea	si, WORD PTR[bp-16]	; bar
	push	ss
	pop	es
	movsw				; the four movsw instructions are more
	movsw				; space/speed efficient than a
	movsw				; mov cx,sizeof(foo)/2
	movsw				; rep movsw combination....

On a VAX using gcc, the following code is produced:

; foo = bar;
	subl3 $76,fp,sp
	movab -64(fp),r1
	movab -76(fp),r0
	movl $12,r2
	movblk

The VAX naturally produces the more efficient code, but I would
imagine the 8086 would do just as good a job with larger
structures, where a
	mov cx, sizeof(struct bozo)/2 
	rep movsw 
could be used under appropriate circumstances.

However, this is only half the question.  Does the assignment win
over memcpy?  On the 8086, the following code is produced:

; (void)memcpy((char *)&foo, (char *)&bar, sizeof(struct bozo));
	lea	ax, WORD PTR[bp-16]	; bar
	mov	WORD PTR[bp-18], ax
	mov	cx, 8
	lea	di, WORD PTR[bp-8]	; foo
	lea	si, WORD PTR[bp-16]	; bar
	mov	ax, ss
	shr	cx, 1
	rep	movsw
	adc	cx, cx
	rep	movsb
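
For readers who don't speak 8086, the tail of that sequence (shr,
rep movsw, adc, rep movsb) can be paraphrased in C roughly as
follows.  This is only a sketch of the idea; the function name
copy_words_then_byte() is made up, and it is not the compiler's or
the library's actual implementation.

```c
#include <stddef.h>

/* Hypothetical helper, not library code: copy n bytes the way the
 * inline sequence does, as n/2 two-byte moves plus one odd byte. */
void *copy_words_then_byte(void *dst, const void *src, size_t n)
{
    unsigned char *d = dst;
    const unsigned char *s = src;
    size_t words = n >> 1;      /* shr cx,1: word count, odd bit goes to carry */
    size_t odd = n & 1;         /* adc cx,cx: cx becomes 0 or 1 */

    while (words-- > 0) {       /* rep movsw */
        d[0] = s[0];
        d[1] = s[1];
        d += 2;
        s += 2;
    }
    if (odd)                    /* rep movsb, at most one byte */
        *d = *s;
    return dst;
}
```

For the 8-byte structure here the odd-byte step does nothing, so the
inline memcpy does the same four word moves as the assignment, plus
the setup around them.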

The compiler is smart enough to make memcpy an intrinsic function,
so as to avoid a costly call instruction.  On the VAX, a call to
memcpy (or in this case bcopy(), which does the same job) was
produced, so I wasn't able to analyze the code directly.  However,
using gcc on bcopy.c produces the following code:

.globl _bcopy
_bcopy:
	.word 0x0
	movl 4(fp),r4
	movl 8(fp),r3
	movl 12(fp),r2
	tstl r2
	jeql L1
	cmpl r4,r3
	jeql L1
L2:
	decl r2
	tstl r2
	jneq L2
L4:
	movl r3,r0
	addl2 $4,r3
	movl r4,r1
	addl2 $4,r4
	movl (r1),(r0)
	decl r2
	tstl r2
	jneq L4
	ret

That seems like quite a bit of code compared to the assignment.
However, in almost all the C code I have seen, comments state
something along the lines of "/* use memcpy for structures larger
than int */", which seems to go against the results shown above.
In _general_, what is the rule for copying one large structure to
another?  memcpy vs. assignment?  Which is generally better?
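
One way to get numbers rather than listings would be a small timing
harness along these lines.  This is a rough sketch under stated
assumptions: struct big and its 1024-byte size are arbitrary, the
function names are invented, and clock() is coarse, so the repeat
count has to be large to measure anything.  An optimizer may also
turn both loops into the same code, or remove work it can prove is
unused, so results will depend on the compiler, its options, and the
machine.

```c
#include <string.h>
#include <time.h>

/* Hypothetical "large" structure; the size is arbitrary. */
struct big {
    char data[1024];
};

static struct big src_big, dst_big;

/* CPU seconds spent doing reps structure assignments. */
double time_assign(long reps)
{
    clock_t start = clock();
    long i;

    for (i = 0; i < reps; i++) {
        src_big.data[0] = (char)i;      /* vary the source a little */
        dst_big = src_big;
    }
    return (double)(clock() - start) / CLOCKS_PER_SEC;
}

/* CPU seconds spent doing reps memcpy() calls of the same size. */
double time_memcpy(long reps)
{
    clock_t start = clock();
    long i;

    for (i = 0; i < reps; i++) {
        src_big.data[0] = (char)i;
        memcpy(&dst_big, &src_big, sizeof dst_big);
    }
    return (double)(clock() - start) / CLOCKS_PER_SEC;
}

/* First byte of the last copy, so a caller can check the loops ran. */
int last_copied(void)
{
    return dst_big.data[0];
}
```

Calling time_assign() and time_memcpy() with the same repeat count
and comparing the two returned values gives a crude answer for one
compiler and one structure size; it says nothing general.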