Processor efficiency
Robert G. Brown
rgb at PHY.DUKE.EDU
Fri Jun 15 13:34:12 AEST 1990
We have a Power Series 220S in our department as a compute server.
It has 24 MB of RAM, no graphics console, and two processors.
My question is this: we have empirically observed that small jobs
written in C or F77 for a single processor and optimized run at
around 3.5 MFLOPS (as advertised). The problem is that if one takes
these jobs (typically a loop containing just one equation with a
multiply, a divide, an add, and a subtract) and scales them up by
making the loop set every element of a vector, increasing the size
of both the vector and the loop, there is a point (which I have not
yet tried to pinpoint precisely) where the speed degrades
substantially -- by more than a factor of two.
This point is >>far<< short of saturating the available RAM, and seems
independent of "normal" system load (which is usually carried by one
processor when the other is running a numerical task like this).
My current hypothesis is that this phenomenon is caused by saturation
of some internal cache on the R3000. Has anyone else noticed or
documented this? Is there a technical explanation that someone could
post? Since we (of course) want to use the SG machine for fairly
large jobs, it is important for us to learn where these performance
cutoffs lie so that we can tune our code around them. On the other
hand, if there is something wrong with our SG-220, we'd like to
learn that too...
Thanks,
Dr. Robert G. Brown
System Administrator
Duke University Physics Dept.
Durham, NC 27706
(919)-684-8130 Fax (24hr) (919)-684-8101
rgb at phy.duke.edu rgb at physics.phy.duke.edu
More information about the Comp.sys.sgi mailing list