fastest way to copy hunks of memory
Dan Mercer
mercer at ncrstp.StPaul.NCR.COM
Fri May 4 08:09:53 AEST 1990
In article <5531 at helios.ee.lbl.gov> tierney at ux1.lbl.gov (Brian Tierney) writes:
:
:Which method is fastest?
:
:1. char *p1, *p2;
:
: for (j = 0; j < size; j++)
: *p1++ = *p2++;
:
:2. bcopy(p2,p1,size);
:
:3. memcpy(p1,p2,size);
:
:In general, system calls are slower, right? (ie, 1 faster that 2 and 3)
:
:BTW, what's the difference between bcopy and memcpy anyway??
:
:THANKS.
:
:
:/---------------------------------------v-------------------------------------\
:| Brian Tierney, Computer Graphics Lab | internet: tierney at ux1.lbl.gov |
:| Lawrence Berkeley Laboratory | or BLTierney at lbl.gov |
:| "I am her hero, and she is my heroin": Rocky Ericson |
It's c language implementation independent. Bcopy or memcpy may be
implemented as inline assembly (as strcpy is on the NCR Tower's SVr3
machines if <istring.h> is included). They may also be implemented
in other fashions that are quicker than standard C function calls.
If the hardware architecture supports move instructions the savings can
be significant.
for (j = 0; j < size; j++) *p1++ = *p2++;
At the minimal, this breaks down to:
exclusive or register x against register x - 1 cycle
load register a with size - 1 cycle
label:
move character based at value in x plus base y to position at x plus
base z - 1 cycle
increment x - 1 cycle
compare x to a - 1 cycle
branch not equal to label - 1 cycle
Cost = 2 + (size * 4) cycles
In fact, if the architecture supports a branch and decrement
instruction, the following c code would be faster -
for (j=size;j;j--) *p1++ = *p2++;
load register x with size - 1 cycle
label:
move character based at value in x plus base y to position at x plus
base z - 1 cycle
decrement x and branch to label if not zero - 1 cycle
Cost = 2 + (size * 2) cycles
However, if the bcopy or memcpy is expanded inline and the architecture
support move character instructions, the savings may be far greater.
For instance, we have a processor that does an MVC - move characters.
While the bytes being moved are not on full words, or the length to
be moved is less that a full word, one cycle is used for each byte.
But if full words can be moved, only one cycle is required to move
each 4 byte chunk.
Socrates once came upon a bunch of philosophers arguing about how many
teeth a horse had. One group argued for a certain number based on
the length of the horses snout, a second upon the diet the horse
ate, a third group based upon number theory. Socrates disappeared
for a minute and came back and told them the exact number. They were
incensed. How could he be so sure of his number, when he hadn't even
offered a theory to explain it. Simple, he replied, while they were
arguing, he had gone back to the stables and counted.
So, if you want the real answer, I suggest you do the same. If
you can't decipher the answer from the assembly generated, put
together a nice looping package and try that.
--
Dan Mercer
Reply-To: mercer at ncrstp.StPaul.NCR.COM (Dan Mercer)
More information about the Comp.unix.wizards
mailing list