structure alignment question

Mon Sep 22 06:11:24 AEST 1986

In article <1705 at mcc-pp.UUCP> tiemann at mcc-pp.UUCP (Michael Tiemann) writes:
>... The last 68000 compiler I used aligned strings on WORD boundaries.
>This would cost one byte per string, half the time. But there was
>a big speed payoff: I could do word operations in my strnlen,
>strncmp, strncpy, and whatever other string processing functions
>I happened to write. ... all this "fast" code actually runs slower
>than a "dumb" byte-copy model [on a Sun-3], because the 68020 faults
>itself to death reading in 32-bit words on odd boundaries, and
>doesn't run at all on a Sun-2 because the 68010 cannot read odd words.

(Does the 68020 really fault?  I thought it just did two bus accesses.)

It is not difficult to do copies in word mode iff the strings
are aligned:

	| Sun mnemonics

	| /*LINTLIBRARY*/
	| strcpy(to, from) char *to, *from; { *to = *from; return (to); }
	| /*UNTESTED!*/
		ENTRY(strcpy)
	TO	=	a0		| I think this works
	FROM	=	a1
		movl	sp@(4),TO	| to
		movl	sp@(8),FROM	| from
	| btst does not accept an address register, so test copies in d0/d1.
		movl	TO,d0
		btst	#0,d0		| test for odd destination
		bnes	odd0		| handle odd dst, unknown src
		movl	FROM,d1
		btst	#0,d1		| test for odd source
		bnes	hardway		| handle even dst, odd src

	| both addresses are even; do a fast strcpy
	fastcopy:
		movw	FROM@+,d0	| grab entire word
		movw	d0,d1		| need to test high byte first
		lsrw	#8,d1		| throw out low byte
		beqs	fastend		| if high byte zero, go terminate dst
		movw	d0,TO@+		| copy entire word
		tstb	d0		| and see if we are now done
		bnes	fastcopy	| do more if not
		movl	sp@(4),d0	| set return value
		rts			| and return
	fastend:
		movql	#0,d0
		movb	d0,TO@		| terminate destination string
		movl	sp@(4),d0	| set return value
		rts			| and return

	odd0:
		movl	FROM,d1		| btst needs a data register
		btst	#0,d1		| test for odd source
		beqs	hardway		| handle odd dst, even src
		movb	FROM@+,TO@+	| copy one byte to make even
		bnes	fastcopy	| and do rest with fast copy
		movl	sp@(4),d0	| set return value
		rts			| and return

	| one address is even, the other odd, so do it a byte at a time.
	hardway:
		movl	TO,d0		| set return value
	hardloop:
		movb	FROM@+,TO@+	| copy ...
		bnes	hardloop	| until we copy a null
		rts			| return
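
For anyone who does not read 68000 assembler, the control flow above
can be sketched in C.  This is an illustration only: the name
word_strcpy is made up here, and the two-byte memcpy stands in for the
movw.  Like the assembler, the word loop may fetch one byte beyond the
terminating null.  That is harmless for an aligned word on the hardware
but is more than C strictly promises, so the sketch assumes the source
buffer has a spare byte after the string.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical C rendering of the assembler routine; not a library API. */
char *word_strcpy(char *to, const char *from)
{
    char *ret = to;
    int to_odd = (int)((uintptr_t)to & 1);
    int from_odd = (int)((uintptr_t)from & 1);

    if (to_odd != from_odd) {
        /* "hardway": one address even, one odd, so copy bytewise */
        while ((*to++ = *from++) != '\0')
            ;
        return ret;
    }

    if (to_odd) {
        /* "odd0": both odd, copy one byte to make both even */
        if ((*to++ = *from++) == '\0')
            return ret;
    }

    /* "fastcopy": both addresses are even */
    for (;;) {
        unsigned char pair[2];

        memcpy(pair, from, 2);      /* word fetch; may read 1 byte past the null */
        if (pair[0] == '\0') {      /* first byte (high byte on a 68000) is null */
            *to = '\0';             /* "fastend": terminate destination */
            return ret;
        }
        memcpy(to, pair, 2);        /* copy entire word */
        if (pair[1] == '\0')        /* second byte was the terminator */
            return ret;
        from += 2;
        to += 2;
    }
}
```

Testing the bytes of the pair individually keeps the sketch independent
of byte order; the assembler can shift because the 68000 puts the first
byte in the high half of the word.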

I wonder, though, if this is truly faster.  Should not a movb/bnes
pair run in loop mode?  (Perhaps not; `dbcc' loops do, though, and
one could use a dbra surrounded by a bit of extra logic.)  Machine
dependent `fast' code is often CPU dependent as well, and one must
be prepared to modify marked inner loops when moving among
implementations of one architecture.
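
The shape of a `dbcc' version can be sketched in C, assuming the inner
loop is a 16-bit counted loop that also exits once a null has been
copied, and that the `extra logic' is an outer loop reloading the
counter.  The name strcpy_counted is hypothetical, and whether this
shape really gets loop mode on a given CPU is something to measure, not
to assume.

```c
/* Hypothetical sketch of a dbcc-style counted copy; not a library API. */
char *strcpy_counted(char *to, const char *from)
{
    char *ret = to;

    for (;;) {                          /* outer logic: reload the counter */
        unsigned int count = 0xFFFF;    /* dbcc counts in 16 bits */

        do {
            if ((*to++ = *from++) == '\0')
                return ret;             /* condition met: fall out of the loop */
        } while (count-- != 0);         /* counter expired after 65536 passes */
    }
}
```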
-- 
In-Real-Life: Chris Torek, Univ of MD Comp Sci Dept (+1 301 454 1516)
UUCP:	seismo!umcp-cs!chris
CSNet:	chris at umcp-cs		ARPA:	chris at mimsy.umd.edu