structure alignment question
Chris Torek
chris at umcp-cs.UUCP
Mon Sep 22 06:11:24 AEST 1986
In article <1705 at mcc-pp.UUCP> tiemann at mcc-pp.UUCP (Michael Tiemann) writes:
>... The last 68000 compiler I used aligned strings on WORD boundaries.
>This would cost one byte per string, half the time. But there was
>a big speed payoff: I could do word operations in my strnlen,
>strncmp, strncpy, and whatever other string processing functions
>I happened to write. ... all this "fast" code actually runs slower
>than a "dumb" byte-copy model [on a Sun-3], because the 68020 faults
>itself to death reading in 32-bit words on odd boundaries, and
>doesn't run at all on a Sun-2 because the 68010 can't read odd words.
(Does the 68020 really fault? I thought it just did two bus accesses.)
It is not difficult to do copies in word mode iff the strings
are aligned:
| Sun mnemonics
| /*LINTLIBRARY*/
| char *strcpy(to, from) char *to, *from; { *to = *from; return (to); }
| /*UNTESTED!*/
ENTRY(strcpy)
TO = a0 | I think this works
FROM = a1
movl sp@(4),TO | to
movl sp@(8),FROM | from
| btst does not accept an address register operand, so test a copy in d0.
movl TO,d0
btst #0,d0 | test for odd destination
bnes odd0 | handle odd dst, unknown src
movl FROM,d0
btst #0,d0 | test for odd source
bnes hardway | handle even dst, odd src
| both addresses are even; do a fast strcpy
fastcopy:
movw FROM@+,d0 | grab entire word
movw d0,d1 | need to test high byte first
lsrw #8,d1 | throw out low byte
beqs fastend | if high byte zero, go terminate dst
movw d0,TO@+ | copy entire word
tstb d0 | and see if we are now done
bnes fastcopy | do more if not
movl sp@(4),d0 | set return value
rts | and return
fastend:
movql #0,d0
movb d0,TO@ | terminate destination string
movl sp@(4),d0 | set return value
rts | and return
odd0:
movl FROM,d0 | btst needs a data register
btst #0,d0 | test for odd source
beqs hardway | handle odd dst, even src
movb FROM@+,TO@+ | copy one byte to make even
bnes fastcopy | and do rest with fast copy
movl sp@(4),d0 | set return value
rts | and return
| one address is even, the other odd, so do it a byte at a time.
hardway:
movl TO,d0 | set return value
hardloop:
movb FROM@+,TO@+ | copy ...
bnes hardloop | until we copy a null
rts | return
I wonder, though, if this is truly faster. Should not a movb/bnes
pair run in loop mode? (Perhaps not; `dbcc' loops do, though, and
one could use a dbra surrounded by a bit of extra logic.) Machine
dependent `fast' code is often CPU dependent as well, and one must
be prepared to modify marked inner loops when moving among
implementations of one architecture.
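
For comparison, the control flow of the assembly routine can be sketched
in portable C. This is only a sketch under stated assumptions, not a
tuned replacement: the name aligned_strcpy is hypothetical, uintptr_t
from <stdint.h> is assumed to be available, and the two-byte memcpy
stands in for the movw (a compiler may or may not turn it into a single
word access). Like the assembly, the even/even path reads a whole word
even when its first byte is the terminator, so the source buffer is
assumed to extend to an even length.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical name; mirrors the structure of the assembly above. */
char *aligned_strcpy(char *to, const char *from)
{
    char *ret = to;
    unsigned char pair[2];

    /* One address even, the other odd: they can never both become
     * even, so fall back to a byte-at-a-time copy ("hardway"). */
    if ((((uintptr_t)to ^ (uintptr_t)from) & 1) != 0) {
        while ((*to++ = *from++) != '\0')
            continue;
        return ret;
    }

    /* Both odd: copy one byte so that both become even ("odd0"). */
    if (((uintptr_t)to & 1) != 0) {
        if ((*to++ = *from++) == '\0')
            return ret;
    }

    /* Both even: move two bytes per iteration ("fastcopy"). */
    for (;;) {
        /* Assumption: reading from[1] is safe even if from[0] is NUL. */
        memcpy(pair, from, 2);
        if (pair[0] == '\0') {
            *to = '\0';         /* terminate only; do not write 2 bytes */
            return ret;
        }
        memcpy(to, pair, 2);    /* second byte is a character or the NUL */
        if (pair[1] == '\0')
            return ret;
        from += 2;
        to += 2;
    }
}
```

Whether the two-byte path beats the plain byte loop depends on the
compiler and the CPU, which is the same caveat that applies to the
assembly version.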
--
In-Real-Life: Chris Torek, Univ of MD Comp Sci Dept (+1 301 454 1516)
UUCP: seismo!umcp-cs!chris
CSNet: chris at umcp-cs ARPA: chris at mimsy.umd.edu