Processor efficiency
Robert G. Brown
rgb at PHY.DUKE.EDU
Fri Jun 15 13:34:12 AEST 1990
We have a Power Series 220S in our department as a compute server.
It has 24 MB of RAM, no graphics console, and two processors.
My question is this: we have empirically observed that small jobs
written in C or F77 for a single processor and optimized run at
around 3.5 MFLOPS (as advertised). The problem is that if one takes
these jobs (typically a loop containing just one equation with a
multiply, a divide, an add, and a subtract) and scales them up by
making the loop set every element of a vector, increasing the size
of both the vector and the loop, there is a point (which I have not
yet tried to pinpoint precisely) where the speed degrades
substantially -- by more than a factor of two.
This point is >>far<< short of saturating the available RAM, and seems
independent of "normal" system load (which is usually carried by one
processor when the other is running a numerical task like this).
My current hypothesis is that this phenomenon is caused by saturation
of some internal cache on the R3000. Has anyone else noticed or
documented this? Is there a technical explanation that someone could
post? Since we (of course) want to use the SG machine for fairly
large jobs, it is important for us to learn where these performance
cutoffs lie so that we can tune our code around them. On the other
hand, if there is something wrong with our SG-220, we'd like to
learn that too...
Thanks,
Dr. Robert G. Brown
System Administrator
Duke University Physics Dept.
Durham, NC 27706
(919)-684-8130 Fax (24hr) (919)-684-8101
rgb at phy.duke.edu rgb at physics.phy.duke.edu
More information about the Comp.sys.sgi mailing list