Fortran vs. C for numerical work (SUMMARY)
Dan Bernstein
brnstnd at kramden.acf.nyu.edu
Fri Nov 30 17:15:38 AEST 1990
Several of you have been missing the crucial point.
Say there's a 300 to 1 ratio of steps through a matrix to random jumps.
On a Convex or Cray or similar vector computer, those 300 steps will run
20 times faster. Suddenly it's just a 15 to 1 ratio, and a slow instruction
outside the loop begins to compete in total runtime with a fast
floating-point multiplication inside the loop.
Anyone who doesn't think shaving a day or two off a two-week computation
is worthwhile shouldn't be talking about efficiency.
In article <7339 at lanl.gov> ttw at lanl.gov (Tony Warnock) writes:
> Model     Multiplication Time    Memory Latency
> YMP       5 clock periods        18 clock periods
> XMP       4 clock periods        14 clock periods
> CRAY-1    6 clock periods        11 clock periods
Um, I don't believe those numbers. Floating-point multiplications and
24-bit multiplications might run that fast, but 32-bit multiplications?
Do all your matrices really fit in 16MB?
> Compaq    25 clock periods       4 clock periods
Well, that is a little extreme; I was talking about real computers.
> For an LU
> decomposition with partial pivoting, one does roughly N/3 constant
> stride memory accesses for each "random" access. For small N, say
> 100 by 100 size matrices or so, one would do about 30
> strength-reduced operations for each memory access. For medium
> (1000 by 1000) problems, the ratio is about 300 and for large
> (10000 by 10000) it is about 3000.
And divide those ratios by 20 for vectorization. 1.5, 15, and 150. Hmmm.
---Dan