How can a user program hang the system?
Scott E. Townsend
fsset at neptune.lerc.nasa.gov
Fri Jun 8 00:28:03 AEST 1990
I've got a question regarding intermittent system hangs. A 'system hang'
means that all apparent activity on the Personal Iris monitor freezes,
including the mouse cursor.
Here's my setup:
1. There are two ordinary user programs communicating with a
remote system via shared memory in the remote system's
VME rack. One program loads & monitors the remote system's
execution, the other program provides a 'real-time' display
of remote data via a 3D surface plot.
2. The Personal Iris's VME adaptor is connected to the remote
system's VME rack via a VME repeater, the model 2000 repeater
from HVE Engineering Inc.
3. The remote system's shared memory is mapped via the mmap()
call:
cube_data = mmap(0, MEMORY_SIZE, PROT_READ | PROT_WRITE, MAP_SHARED,
mem_fd, (VME_A24SBASE | 0XA0000000) + CUBE_ADDR);
where MEMORY_SIZE is 1 Meg and CUBE_ADDR is set at 4 Meg.
(This call is for the surface plotter; a similar call is made for
the loader/monitor, mapping 4 Meg starting at 0.)
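For reference, here is a sketch of the same kind of mapping with the checks it needs. It is written against present-day POSIX mmap() rather than the 1990 IRIX interface, whose return type and failure conventions may differ. map_window() and open_standin() are hypothetical names, and a sparse temporary file stands in for mem_fd so the sketch can run anywhere. On the Iris the offset would instead be (VME_A24SBASE | 0XA0000000) + CUBE_ADDR. The one rule it enforces that applies to device mappings as well is that the file offset must be a multiple of the page size.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#include <sys/types.h>
#include <sys/mman.h>

#define MEG (1024L * 1024L)

/* Hypothetical helper: map len bytes of fd starting at byte offset off.
 * POSIX requires off to be a multiple of the page size, so a misaligned
 * offset is rejected up front, and an mmap() failure is reported instead
 * of handing MAP_FAILED back to a caller that might dereference it. */
static void *map_window(int fd, off_t off, size_t len)
{
    long page = sysconf(_SC_PAGESIZE);

    assert(len > 0);                      /* mmap() rejects len == 0 */
    if (page <= 0 || off % page != 0) {
        fprintf(stderr, "map_window: offset 0x%lx is not page aligned\n",
                (unsigned long)off);
        return NULL;
    }
    void *p = mmap(NULL, len, PROT_READ | PROT_WRITE, MAP_SHARED, fd, off);
    if (p == MAP_FAILED) {
        perror("map_window: mmap");
        return NULL;
    }
    return p;
}

/* Hypothetical stand-in for the VME memory device: a sparse temporary
 * file of `size` bytes, unlinked at once so it disappears on close. */
static int open_standin(off_t size)
{
    char path[] = "/tmp/vmewinXXXXXX";
    int fd = mkstemp(path);

    if (fd < 0)
        return -1;
    unlink(path);
    if (ftruncate(fd, size) != 0) {
        close(fd);
        return -1;
    }
    return fd;
}
```

Usage: with fd = open_standin(5 * MEG), the call map_window(fd, 4 * MEG, MEG) yields the 1 Meg window at 4 Meg, corresponding to the plotter's mapping, and stores through the returned pointer land in the underlying file.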
Here are my symptoms:
1. The system usually runs fine, but some days (phase of the moon?)
it will hang after running for 30 seconds or so (enough time
to record 5-10 plots). This is semi-repeatable.
2. It seems to hang only if the surface plotter is running. The
surface plotter is just a simple tmesh algorithm, nothing
fancy, typically with a 34 x 34 mesh. Note that when
the system is using the surface plotter there is significantly
more communication across the VME repeater. Also, I can't run
the plotter by itself; I need the loader/monitor to get things
running.
3. There is no console output, even when I have a console running
on the Iris display.
4. When the system hangs, the remote system keeps running until its
communications time out waiting for the frozen Iris.
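To put a rough number on the extra VME traffic in symptom 2, here is a sketch that assumes the plotter draws the surface as one triangle strip per pair of adjacent rows and fetches each vertex's height directly from the mapped region. strip_order() is a hypothetical helper; in IRIS GL each index it produces would become one v3f() call inside a bgntmesh()/endtmesh() pair. For a 34 x 34 mesh it visits 33 strips x 68 vertices = 2244 vertices per frame, although the grid holds only 1156 values, because every interior row is fetched once for the strip below it and once for the strip above.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical layout: an n x n grid of heights stored row-major.
 * Fills `order` (capacity 2 * n * (n - 1)) with the grid indices visited
 * when the surface is drawn as one triangle strip per pair of adjacent
 * rows, zigzagging (r,c), (r+1,c), (r,c+1), (r+1,c+1), ...
 * Returns the number of vertices emitted, which equals the number of
 * shared-memory reads if every vertex fetches its value directly. */
static size_t strip_order(int n, int *order)
{
    size_t k = 0;

    assert(n >= 2);
    for (int r = 0; r < n - 1; r++) {        /* one strip per row pair */
        for (int c = 0; c < n; c++) {
            order[k++] = r * n + c;          /* vertex on the lower row */
            order[k++] = (r + 1) * n + c;    /* vertex on the upper row */
        }
    }
    return k;
}
```

If the plotter really does read the grid this way, copying the 1156 values into local memory once per frame and drawing from the copy would cut the per-frame reads across the repeater to 1156, which may help show whether the hang tracks the amount of VME traffic.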
So now the questions:
1. Has anyone had any experience with hanging repeaters on the
Iris's VME adaptor?
2. Under what conditions could a normal user program freeze the
system? I would think a programming error would simply cause
a segmentation fault or something similar.
3. Any suggestions on how I could debug this thing? I have access to
a VME bus monitor, but its triggering facilities are a bit
primitive.
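Since the remote system keeps running after the Iris freezes (symptom 4), one low-tech way to approach question 3 is to leave breadcrumbs in the shared memory: the Iris records which phase it is entering and bumps a counter once per frame, and the remote side reads both back after a hang. This is a sketch under assumptions. struct breadcrumb, the step codes, and mark(), beat() and stalled() are all hypothetical, and it assumes two spare 32-bit words at an agreed offset in the shared region and that each store reaches that memory intact. Stores to the step word could also give the VME bus monitor a fixed address to trigger on.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical debugging words placed at an agreed offset inside the
 * shared VME memory.  The Iris side writes them; the remote side, which
 * keeps running after the Iris freezes, reads them back. */
struct breadcrumb {
    volatile uint32_t heartbeat;   /* incremented once per completed frame */
    volatile uint32_t step;        /* phase the Iris most recently entered */
};

/* Hypothetical step codes; 0 is reserved for "nothing recorded yet". */
enum { STEP_READ_GRID = 1, STEP_DRAW_MESH = 2, STEP_SWAP = 3 };

/* Iris side: record which phase is about to start. */
static void mark(struct breadcrumb *bc, uint32_t step)
{
    assert(step != 0);
    bc->step = step;
}

/* Iris side: call once per completed frame. */
static void beat(struct breadcrumb *bc)
{
    bc->heartbeat++;
}

/* Remote side: nonzero if the heartbeat has not moved since last_seen.
 * After a hang, bc->step tells which phase the Iris stopped in. */
static int stalled(const struct breadcrumb *bc, uint32_t last_seen)
{
    return bc->heartbeat == last_seen;
}
```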
Thanks for any and all help you can provide.
--
------------------------------------------------------------------------
Scott Townsend | Phone: 216-433-8101
NASA Lewis Research Center | Mail Stop: 5-11
Cleveland, Ohio 44135 | Email: fsset at neptune.lerc.nasa.gov
------------------------------------------------------------------------
More information about the Comp.sys.sgi
mailing list