Explanation, please!
Kenneth Goodwin
klg at njsmu.UUCP
Fri Sep 2 03:19:39 AEST 1988
In article <9064 at pur-ee.UUCP>, hankd at pur-ee.UUCP (Hank Dietz) writes:
> In article <189 at bales.UUCP>, nat at bales.UUCP (Nathaniel Stitt) writes:
> > Here is my own personal version of the "Portable Optimized Copy" routine.
> 2. If the number of items/bytes is not known, then build a binary tree of
> such structs and copy half, then half of what remains, etc. This is
> struct t512 { int t[512]; };
> struct t256 { int t[256]; };
> struct t128 { int t[128]; };
.... etc .....
> if (n & 512) {
> *((struct t512 *) q) = *((struct t512 *) p); q+=512; p+=512;
> }
> if (n & 256) {
> *((struct t256 *) q) = *((struct t256 *) p); q+=256; p+=256;
> }
... etc ...
> Incidentally, this ran about 8x faster (on a VAX 11/780) than using
> the usual copy loop. Unfortunately, the above code should have been
> written as:
>
> if (n & 512) {
> *(((struct t512 *) q)++) = *(((struct t512 *) p)++);
> }
> ...
But this is where unions come in handy. I used a similar, though
briefer, technique for a faster version of a bmov() (byte move)
subroutine on our PDP11-70 a while ago, and subsequently ported
it to memcpy when we updated from V6 to System V.
The basic idea is to create a union of long, int,
(short), and char pointers, use the character pointer to achieve
the needed alignment, and then use the largest available pointer
to do the copy. There is no reason why a structure copy could not be
used, although I suspect that on non-VAX systems it may actually
be detrimental in some cases.
The PDP11 C compiler used to stuff registers onto the stack
and create a 16-bit word copy loop to do structure copies
using the freed registers, restoring them when it was done.
So a structure copy would be the same as a word copy on that style
of system (i.e., ones without block move instructions).
So in the case of your example, a modified, brief version of it
would be:
union ptr_types {
struct t512 { int t512[512]; } *t512;
....
struct t32 { int t32[32]; } *t32;
long *t_long;
int *t_int;
short *t_short;
char *t_char;
} ;
(The long and short pointers and their related tests
could probably be dispensed with.)
memcpy(a, b, len)
char *a, *b;
{
register union ptr_types a_ptr, b_ptr;
a_ptr.t_char = a;
b_ptr.t_char = b;
while(NOT ON A WORD BOUNDARY AND CHARS LEFT) {
*a_ptr.t_char++ = *b_ptr.t_char++;
len--;
}
if(len >= sizeof(int) * 512) {
/* if we can use a 512 int structure copy */
*a_ptr.t512++ = *b_ptr.t512++;
len -= (512 * sizeof(int));
}
/* the biggest win is that the pointers increment correctly;
len -= sizeof(*element pointer) is the correct form over
N INTS * sizeof int */
.......
I guess the rest is obvious; some GLUE may be needed
that has not been shown.... :-)
Boundaries should be checked on both the source and destination
addresses to avoid memory faults, since you may be given
incompatible source and destination addresses that require a
full char-by-char copy. The first test loop sort of does this,
but all the other copies should also check for proper address
alignment before proceeding.
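As a rough illustration of the missing glue and the alignment checks, here is a minimal sketch in present-day C. It is not the original bmov()/memcpy code: the names union_copy and cptr_types are hypothetical, only a single 32-int block size is used instead of the full 512..32 ladder, and it assumes non-overlapping buffers, that all object pointers share one representation, and that a pointer converted to uintptr_t reflects its byte address. Modern compilers also apply type-based aliasing rules, so treat this as a sketch under those assumptions rather than a drop-in replacement.

```c
#include <stddef.h>
#include <stdint.h>

struct t32 { int t[32]; };      /* block of 32 ints, copied by struct assignment */

union ptr_types {               /* destination pointer, viewed three ways */
    struct t32 *t32;
    int        *t_int;
    char       *t_char;
};

union cptr_types {              /* hypothetical const twin for the source */
    const struct t32 *t32;
    const int        *t_int;
    const char       *t_char;
};

/* union_copy is a hypothetical name, chosen so it does not clash with memcpy */
void *union_copy(void *dst, const void *src, size_t len)
{
    union ptr_types  a;
    union cptr_types b;

    a.t_char = dst;
    b.t_char = src;

    /* Wide copies are only safe when source and destination have the
       same offset from an int boundary (assumption: uintptr_t values
       reflect byte addresses). */
    if ((uintptr_t)a.t_char % sizeof(int) == (uintptr_t)b.t_char % sizeof(int)) {
        /* byte copy until both pointers sit on an int boundary */
        while (len > 0 && (uintptr_t)a.t_char % sizeof(int) != 0) {
            *a.t_char++ = *b.t_char++;
            len--;
        }
        /* structure assignment moves 32 ints per step */
        while (len >= sizeof *a.t32) {
            *a.t32++ = *b.t32++;
            len -= sizeof *a.t32;   /* size taken from the pointer's own type */
        }
        /* then single ints */
        while (len >= sizeof *a.t_int) {
            *a.t_int++ = *b.t_int++;
            len -= sizeof *a.t_int;
        }
    }

    /* leftover bytes, or the whole copy when the alignments are incompatible */
    while (len > 0) {
        *a.t_char++ = *b.t_char++;
        len--;
    }
    return dst;
}
```

When the two addresses cannot be brought onto an int boundary together, the sketch falls straight through to the char-by-char loop, which is the "incompatible addresses" case described above.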
Ken Goodwin
NJSMU.