parallelization problems with fpp under cf77 4.0.3 (long)
Hemant Dandekar
hemant at acsu.buffalo.edu
Sun Apr 21 00:01:30 AEST 1991
Hi,
I am having trouble parallelizing the outermost loop (do 20) when I
use the fpp preprocessor on a CRAY-2s running UNICOS 6.1 and cf77 4.0.3.
This loop does parallelize when I use Parallel FORTRAN under IBM
FORTVS 2.4 and 2.5. I don't see any dependence in this loop at the
k level.
The subroutine calculates the convective term in a 3D CFD code. I
have removed the comment statements to save bandwidth. Here is the
routine:
      subroutine convc1 (nlo,nhi,mlo,mhi,klo,khi,
     $                   f,indgeo,isimpl,confac)
      parameter (nmax=65,mmax=45,kmax=45)
      dimension f(0:nmax,0:mmax,0:kmax),
     1          co1(0:nmax,0:mmax,0:kmax),
     1          co2(0:nmax,0:mmax,0:kmax),
     1          co3(0:nmax,0:mmax,0:kmax),
     1          co4(0:nmax,0:mmax,0:kmax)
c
      common / admsh1 / x(0:nmax,0:mmax),y(0:nmax,0:mmax),
     1                  xne(0:nmax,0:mmax),yne(0:nmax,0:mmax)
      common / amsh1d / we(0:kmax),z(0:kmax),zne(0:kmax)
      common / conflu / cof(0:nmax,0:mmax,0:kmax)
      common / dotpro / dpl(0:nmax,0:mmax,0:kmax),
     1                  dpr(0:nmax,0:mmax,0:kmax),
     1                  dpb(0:nmax,0:mmax,0:kmax),
     1                  dpt(0:nmax,0:mmax,0:kmax)
      common / datpro / dw1(0:nmax,0:mmax,0:kmax),
     1                  dw2(0:nmax,0:mmax,0:kmax),
     1                  dw3(0:nmax,0:mmax,0:kmax),
     1                  dw4(0:nmax,0:mmax,0:kmax),
     1                  dpw1(0:nmax,0:mmax,0:kmax),
     1                  dpw2(0:nmax,0:mmax,0:kmax),
     1                  dpw3(0:nmax,0:mmax,0:kmax),
     1                  dpw4(0:nmax,0:mmax,0:kmax)
      do 10 k=klo,khi
      do 10 j=mlo,mhi
      do 10 i=nlo,nhi
         cof(i,j,k) = 0.0
   10 continue
      if(indgeo.eq.0) then
         confac = 2.0*2.0*8.0
         confc1 = confac * 0.5
      else
         confac = 2.0*4.0*4.0*4.0*6.0*6.0
         confc1 = confac * 0.5
      end if
      do 20 k=klo,khi
      do 20 j=mlo,mhi-1
      do 20 i=nlo,nhi-1
         f1 = f(i,j,k)
         f2 = f(i+1,j+1,k)
         f3 = f(i,j+1,k)
         f4 = f(i+1,j,k)
         f5 = f(i,j,k+1)
         f6 = f(i+1,j+1,k+1)
         f7 = f(i,j+1,k+1)
         f8 = f(i+1,j,k+1)
c
         zl = ( (z(k+1) - z(k-1)) + (zne(k+1) - zne(k-1))) / 4.0
c
         vnl = dpl(i,j,k)
         vnr = dpr(i,j,k)
         vnb = dpb(i,j,k)
         vnt = dpt(i,j,k)
c
         dwp1 = dpw1(i,j,k)
         dwp2 = dpw2(i,j,k)
         dwp3 = dpw3(i,j,k)
         dwp4 = dpw4(i,j,k)
c
         flb = f1
         flt = f3
         frb = f4
         frt = f2
         fbl = f1
         fbr = f4
         ftl = f3
         ftr = f2
c
         f1b = f1
         f1t = f5
         f2b = f2
         f2t = f6
         f3b = f3
         f3t = f7
         f4b = f4
         f4t = f8
c
         col = (vnl + abs(vnl))*flb + (vnl - abs(vnl))*flt
         cor = (vnr + abs(vnr))*frb + (vnr - abs(vnr))*frt
         cob = (vnb + abs(vnb))*fbl + (vnb - abs(vnb))*fbr
         cot = (vnt + abs(vnt))*ftl + (vnt - abs(vnt))*ftr
c
         co1(i,j,k) = (dwp1 + abs(dwp1))*f1b + (dwp1 - abs(dwp1))*f1t
         co2(i,j,k) = (dwp2 + abs(dwp2))*f2b + (dwp2 - abs(dwp2))*f2t
         co3(i,j,k) = (dwp3 + abs(dwp3))*f3b + (dwp3 - abs(dwp3))*f3t
         co4(i,j,k) = (dwp4 + abs(dwp4))*f4b + (dwp4 - abs(dwp4))*f4t
c
         cof(i,j,k) = cof(i,j,k) - (col + cob)*zl -
     $                co1(i,j,k)*confc1
         cof(i+1,j+1,k) = cof(i+1,j+1,k) + (cor + cot)*zl -
     $                    co2(i,j,k)*confc1
         cof(i,j+1,k) = cof(i,j+1,k) + (col - cot)*zl -
     $                  co3(i,j,k)*confc1
         cof(i+1,j,k) = cof(i+1,j,k) - (cor - cob)*zl -
     $                  co4(i,j,k)*confc1
   20 continue
      return
      end
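One way to sanity-check the claim that the do 20 loop carries no dependence in k is to run a cut-down version of the update with k ascending, then with k descending, and compare the two results. The sketch below does that. The program name kindep, the subroutine sweep, the array sizes and the simplified update term t are all made up for illustration and are not part of the original code. Agreement between the two orders is consistent with independent k iterations, but it is not a proof.

```fortran
      program kindep
c     Hypothetical check, not part of the original code: a reduced
c     version of the do 20 update, run once with k ascending and
c     once with k descending, then compared element by element.
      parameter (n=8,m=6,l=5)
      dimension f(0:n,0:m,0:l+1),a(0:n,0:m,0:l),b(0:n,0:m,0:l)
      do 5 k=0,l+1
      do 5 j=0,m
      do 5 i=0,n
         f(i,j,k) = real(i) + 2.0*real(j) + 3.0*real(k)
    5 continue
      call sweep (f,a,n,m,l,0,l,1)
      call sweep (f,b,n,m,l,l,0,-1)
      diff = 0.0
      do 30 k=0,l
      do 30 j=0,m
      do 30 i=0,n
         diff = max(diff,abs(a(i,j,k)-b(i,j,k)))
   30 continue
      print *, 'max difference between k orders: ', diff
      end
c
      subroutine sweep (f,cof,n,m,l,k1,k2,kstep)
      dimension f(0:n,0:m,0:l+1),cof(0:n,0:m,0:l)
      do 10 k=0,l
      do 10 j=0,m
      do 10 i=0,n
         cof(i,j,k) = 0.0
   10 continue
      do 20 k=k1,k2,kstep
      do 20 j=0,m-1
      do 20 i=0,n-1
c        every write goes to plane k of cof; f is only read
         t = f(i,j,k) + f(i+1,j+1,k+1)
         cof(i,j,k)     = cof(i,j,k)     - t
         cof(i+1,j+1,k) = cof(i+1,j+1,k) + t
         cof(i,j+1,k)   = cof(i,j+1,k)   + 0.5*t
         cof(i+1,j,k)   = cof(i+1,j,k)   - 0.5*t
   20 continue
      return
      end
```

The i and j loops still carry a dependence here, as in the full routine, because neighbouring iterations accumulate into the same element of cof. Only the k ordering is being tested.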
Another section of the code, which is exactly the same except that the
terms flb to ftr are computed with the following lines, does parallelize:
         flb = f4 + 3.0*f1
         flt = f2 + 3.0*f3
         frb = f1 + 3.0*f4
         frt = f3 + 3.0*f2
         fbl = f3 + 3.0*f1
         fbr = f2 + 3.0*f4
         ftl = f1 + 3.0*f3
         ftr = f4 + 3.0*f2
Is fpp doing this because it finds insufficient work in the first case
and enough work in the second (where there are 8 additional
multiplications inside the loop)? Any comments or suggestions would be
appreciated.
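If fpp is rejecting the loop on a work estimate rather than on a dependence, one thing that might be worth trying is to request concurrency on the k loop explicitly instead of relying on the automatic analysis. The sketch below assumes the Cray autotasking directive form CMIC$ DO ALL with SHARED and PRIVATE lists. That spelling is written from memory and should be checked against the cf77 manual for 4.0.3. The program name doall, the array sizes and the update term t are hypothetical stand-ins for convc1. On a compiler that does not recognize the directive, the CMIC$ line is an ordinary fixed-form comment, so the program still runs serially.

```fortran
      program doall
c     Hypothetical stand-in for the do 20 loop of convc1.
      parameter (n=8,m=6,l=5)
      dimension f(0:n,0:m,0:l+1),cof(0:n,0:m,0:l)
      do 10 k=0,l+1
      do 10 j=0,m
      do 10 i=0,n
         f(i,j,k) = real(i) + 2.0*real(j) + 3.0*real(k)
         if (k.le.l) cof(i,j,k) = 0.0
   10 continue
c     Assumed directive spelling: every scalar assigned inside the
c     loop is listed as private, the arrays as shared.
CMIC$ DO ALL SHARED(f,cof) PRIVATE(i,j,k,t)
      do 20 k=0,l
      do 20 j=0,m-1
      do 20 i=0,n-1
         t = f(i,j,k) + f(i+1,j+1,k+1)
         cof(i,j,k)     = cof(i,j,k)     - t
         cof(i+1,j+1,k) = cof(i+1,j+1,k) + t
   20 continue
      print *, 'cof(1,1,0) = ', cof(1,1,0)
      end
```

In the full routine the private list would have to name every scalar temporary assigned in the loop body (f1 to f8, zl, vnl to vnt, dwp1 to dwp4, flb to ftr, f1b to f4t, col, cor, cob, cot) along with the loop indices. The local arrays co1 to co4 would be shared, since each iteration writes a distinct element.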
thanks,
hemant
--
----------------------------------------------------------------------------
Bitnet: v092qghg at ubvms.bitnet | Hemant W. Dandekar
Internet: hemant at asterix.eng.buffalo.edu| 303, Furnas Hall, Chem. Eng.
(716)636-2631 | SUNY at Buffalo, Buffalo NY 14260
More information about the Comp.unix.cray mailing list