Execution time bottleneck: How to speed up execution?
John Rowe
JRowe at exua.exeter.ac.uk
Fri Feb 15 06:36:22 AEST 1991
In article <21658 at yunexus.YorkU.CA> racine at yunexus.yorku.ca (Jeff Racine)
writes:
Execution time query:
I have the following nested loop in which I must sum (over all i), for each
x[j], e raised to the power x[i]-x[j] squared (times a constant).
for (j=1; j <= n; j++) {
sum = 0.0;
for (i=1; i<= n; i++) {
sum += exp( con * (x[i]-x[j]) * (x[i]-x[j]) );
}
a[j] = k*sum;
}
Jeff needs to speed it up. The C guys will tell you how to speed up
the loops; being a physicist, ie really just a FORTRAN programmer, I
would just point out to halve the number of them and maybe take out a
multiply. With minimal recoding maybe something along the lines of:
/* Just typed in - maybe typos */
foo = sqrt(con > 0 ? con : -con);
for (j = 1; j <= n; ++j) {
a[j] = 1.0; /* i = j case */
y[j] = x[j] * foo;
}
if ( con > 0.0 )
for (j = 1; j < n; ++j)
for (i = j + 1; i <= n; ++i) {
a[j] += foo = exp( (y[i]-y[j]) * (y[i]-y[j]) );
a[i] += foo; /* Two sums per exp */
}
else
for (j = 1; j < n; ++j)
for (i = j + 1; i <= n; ++i) {
a[j] += foo = exp( - (y[i]-y[j]) * (y[i]-y[j]) );
a[i] += foo; /* NB ^ only difference.*/
}
for (j=1; j <= n; ++j) a[j] *= k;
[pious platitudes about first improving the algorithm, taking
advantage of symmetry, etc. omitted]. I don't know your machine but
it's the halving of the number of exponentials that I would expect to
give you the greatest gain, 50%. If you've got a modern, floating
point orientated RISC machine (RS6000, i860??) floating point
multiplies may be so fast it's not it may not be worth the hassle of
the vector y and the if ( con < 0.0 ) bit. Any FORTRAN compiler would
optimise out the (y[i]-y[j]) * (y[i]-y[j]) in its sleep, so should
your C compiler. If not you could do it by hand as I see someone has
sugested, but that's *probably* even less of a gain. I would guess
that the question of c v assembler hangs upon how long your cpu takes
to do a floating point multiply and exponential against the quality of
your compiler's optimiser.
Halving the number of exponentials is only a factor of two but it's
zero effort and it's always there however you end up coding it.
But I may have missed really something obvious as to why you can't do
that; I had a late night this morning.
Good luck and hope this helps.
John Rowe
Exeter University Computational Physics Group
Exeter
U.K.
More information about the Comp.lang.c
mailing list