Multi-Processor Performance Problem
Mike Muuss
mike at BRL.MIL
Thu Jan 11 20:39:57 AEST 1990
I have been running RT, BRL's parallel-processing ray-tracing code,
on our 4D/240 and 4D/280 machines. I have noticed that there seems
to be an unusual amount of time recorded by gr_osview (and regular
osview) in the "system" category. When I am lucky, about 10% of
all processor time is consumed this way; when I am unlucky, about 60%.
Thanks to the superb DBX that SGI provides, I was able to isolate this
activity to the library routine _hsetlock() calling the system call
sginap(0). Very odd. I fussed around for a while, and eventually
determined that the routine _hsetlock() only tries to acquire
the hardware interlock 20 times (in a *very* tight loop) before
giving up, and calling sginap(0).
This constant of 20 would seem to come from the <ulocks.h> constant _USDEFSPIN:
#define _USDEFSPIN 20 /* default spin for lock */
Suspecting the worst, I wrapped my calls to the library locking
routines with my own spin-lock checking first, and got an ENORMOUS
speedup -- virtually all the system time went away.
I would therefore request that in the next IRIX release, either (a)
the built-in constant be chosen so that the system call isn't performed
until at least 1 microsecond of looping has passed, or (b) that this
constant be user-settable, perhaps via the usconfig() call.
I suppose that this should be sent to the hotline, but I'm working nights
this week, so you get E-mail instead. Somebody at SGI please forward this
to the right folk(s).
Best,
-Mike
-----------
PS: For the curious, here is a chunk of the code I'm using in order to
handle the locks on the SGI:
#ifdef SGI_4D
# include <stdio.h>
# include <sys/types.h>
# include <sys/prctl.h>
# include <ulocks.h>

static char lockfile[] = "/usr/tmp/rtmplockXXXXXX";	/* writable: mktemp() edits it in place */
static usptr_t *lockstuff = 0;

/* Bookkeeping arrays, defined elsewhere in RT. */
extern int lock_usage[];
extern int lock_busy[];
extern int lock_spins[];
extern int lock_waitloops[];

void
RES_INIT(p)
register int *p;
{
	register int i = p - (&rt_g.res_syscall);
	ulock_t ltp;

	if( !rt_g.rtg_parallel )  return;
	if( lockstuff == 0 )  {
		(void)mktemp(lockfile);
		if( rt_g.debug & DEBUG_PARALLEL )  {
			if( usconfig( CONF_LOCKTYPE, _USDEBUGPLUS ) == -1 )
				perror("usconfig CONF_LOCKTYPE");
		}
		lockstuff = usinit(lockfile);
		if( lockstuff == 0 )  {
			fprintf(stderr,
			    "RES_INIT: usinit(%s) failed, unable to allocate lock space\n",
			    lockfile);
			exit(2);
		}
	}
	ltp = usnewlock(lockstuff);
	if( ltp == 0 )  {
		fprintf(stderr,
		    "RES_INIT: usnewlock() failed, unable to allocate another lock\n");
		exit(2);
	}
	*p = (int) ltp;			/* stash the lock handle in the caller's word */
	lock_usage[i] = 0;
}

void
RES_ACQUIRE(ptr)
register int *ptr;
{
	register int i = ptr - (&rt_g.res_syscall);

	if( !rt_g.rtg_parallel )  return;
	/* Attempt to reduce frequency of library calling sginap() */
	if( lock_busy[i] )  {
		lock_spins[i]++;	/* non-interlocked */
		while( lock_busy[i] )  lock_waitloops[i]++;
	}
	ussetlock( (ulock_t) *ptr );
	lock_busy[i] = 1;
	lock_usage[i]++;		/* interlocked */
}

void
RES_RELEASE( ptr )
register int *ptr;
{
	register int i = ptr - (&rt_g.res_syscall);

	if( !rt_g.rtg_parallel )  return;
	lock_busy[i] = 0;		/* interlocked */
	usunsetlock( (ulock_t) *ptr );
}
#endif /* SGI_4D */
PPS: The 4D/280 is **fast**!