Fortran optimization - THE ANSWER!
Preston Briggs
preston at ariel.rice.edu
Fri Apr 5 16:25:36 AEST 1991
I wrote:
>If you must unroll, unroll the outermost loop, giving
>
> DO N=1, NX, 4
>   DO J = 1, JX
>     DO I=1, IX
>       A(I, J) = A(I, J) * B(I, J) + C(I, J)
>       A(I, J) = A(I, J) * B(I, J) + C(I, J)
>       A(I, J) = A(I, J) * B(I, J) + C(I, J)
>       A(I, J) = A(I, J) * B(I, J) + C(I, J)
>     ENDDO
>   ENDDO
> ENDDO
On further thought (!), I'd unroll the middle loop a little
(use moderation in your experiments). Something like:
DO N=1, NX
  DO J = 1, JX, 4
    DO I=1, IX
      A(I, J+0) = A(I, J+0) * B(I, J+0) + C(I, J+0)
      A(I, J+1) = A(I, J+1) * B(I, J+1) + C(I, J+1)
      A(I, J+2) = A(I, J+2) * B(I, J+2) + C(I, J+2)
      A(I, J+3) = A(I, J+3) * B(I, J+3) + C(I, J+3)
    ENDDO
  ENDDO
ENDDO
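The idea is that the compiler will be better able to schedule
this code. Instead of 1 expression, we now get 4 independent expressions
(each touches a different column of A) that can be run in parallel,
hopefully filling the pipelines. As written, this assumes JX is a
multiple of 4; otherwise the leftover columns need a cleanup loop.
Experiment a little with the amount of unrolling and see what happens.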
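Here is a minimal sketch of one such experiment, assuming free-form
Fortran 90. The array sizes, the test data, the JMAIN helper, and the
REF array are made up for illustration and are not from the original
code. It unrolls the middle loop by 4, adds a cleanup loop for the case
where JX is not a multiple of 4, and compares the result against the
plain loop nest:

```fortran
PROGRAM UNROLL_SKETCH
  IMPLICIT NONE
  ! Illustrative sizes (assumptions); JX is deliberately not a multiple of 4.
  INTEGER, PARAMETER :: IX = 64, JX = 10, NX = 3
  REAL :: A(IX, JX), B(IX, JX), C(IX, JX), REF(IX, JX)
  INTEGER :: I, J, N, JMAIN

  ! Arbitrary test data; B = 0.5 keeps the repeated update bounded.
  DO J = 1, JX
    DO I = 1, IX
      REF(I, J) = REAL(I) - REAL(J)
      B(I, J) = 0.5
      C(I, J) = REAL(I + J)
    ENDDO
  ENDDO
  A = REF

  ! Reference: the plain, un-unrolled loop nest.
  DO N = 1, NX
    DO J = 1, JX
      DO I = 1, IX
        REF(I, J) = REF(I, J) * B(I, J) + C(I, J)
      ENDDO
    ENDDO
  ENDDO

  ! Middle loop unrolled by 4. JMAIN is the last column the
  ! unrolled loop may touch without running past JX.
  JMAIN = JX - MOD(JX, 4)
  DO N = 1, NX
    DO J = 1, JMAIN, 4
      DO I = 1, IX
        A(I, J+0) = A(I, J+0) * B(I, J+0) + C(I, J+0)
        A(I, J+1) = A(I, J+1) * B(I, J+1) + C(I, J+1)
        A(I, J+2) = A(I, J+2) * B(I, J+2) + C(I, J+2)
        A(I, J+3) = A(I, J+3) * B(I, J+3) + C(I, J+3)
      ENDDO
    ENDDO
    ! Cleanup loop for the 0 to 3 leftover columns.
    DO J = JMAIN + 1, JX
      DO I = 1, IX
        A(I, J) = A(I, J) * B(I, J) + C(I, J)
      ENDDO
    ENDDO
  ENDDO

  ! Each element sees the same sequence of updates in both versions,
  ! so the difference should be zero or within rounding.
  PRINT *, 'max abs difference:', MAXVAL(ABS(A - REF))
END PROGRAM UNROLL_SKETCH
```

To try other amounts of unrolling, change the step of the J loop, the
divisor in MOD, and the number of statements in the loop body together.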
Preston Briggs