Possible corruption of Message Queues on XENIX 386
Mark Delany
MDelany%hbapn1.prime.com at relay.cs.net
Tue May 8 13:02:49 AEST 1990
Has anyone else seen Message Queue corruption on XENIX 386 (SysV 2.3.1)
when under heavy load, particularly when the system is paging?
We're suspicious as ipcs gives strange values for CBYTES and QNUM. To
wit:
--------------------
Standard IPC package status
IPC status from /dev/kmem as of Thu May 3 11:01:02 1990
T  ID  KEY         MODE         OWNER  GROUP  CREATOR  CGROUP  CBYTES  QNUM   QBYTES  LSPID  LRPID  STIME     RTIME     CTIME
Message Queues:
q  10  0x712806a1  SRrw-rw-rw-  cacs   group  cacs     group   65404   65535  1028    389    331    10:50:53  10:50:53  10:29:42
q  11  0x712806a2  -Rrw-rw-rw-  cacs   group  cacs     group   40      2      8192    331    389    10:50:53  10:50:53  10:29:42
...
--------------------
and on another occasion
--------------------
Standard IPC package status
IPC status from /dev/kmem as of Thu May 3 16:12:47 1990
T  ID  KEY         MODE         OWNER  GROUP  CREATOR  CGROUP  CBYTES  QNUM   QBYTES  LSPID  LRPID  STIME     RTIME     CTIME
Message Queues:
q  20  0x712806a1  SRrw-rw-rw-  cacs   group  cacs     group   65445   0      1028    511    446    16:05:24  16:05:24  15:34:09
q  21  0x712806a2  -Rrw-rw-rw-  cacs   group  cacs     group   20      1      8192    446    511    16:05:24  16:05:24  15:34:09
...
--------------------
CBYTES and QNUM are 16-bit counters, so values like 65535 and 65404 look
pretty much like an underflow (a decrement below zero wrapping around) to
me...
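To make the arithmetic concrete, here is a small sketch of how an unsigned 16-bit counter behaves when it is decremented past zero. The helper names dec16 and wrapped_deficit are made up for illustration, and the assumption that the kernel does an unchecked subtraction is just that, an assumption.

```c
#include <stdint.h>

/* Hypothetical helper: decrement a 16-bit unsigned counter with no
   bounds check, as an unguarded counter update would. */
static uint16_t dec16(uint16_t counter, uint16_t amount)
{
    return (uint16_t)(counter - amount);   /* wraps modulo 65536 */
}

/* Hypothetical helper: given an observed (wrapped) value, how far
   below zero the counter would have had to go to produce it. */
static unsigned wrapped_deficit(uint16_t observed)
{
    return 65536u - observed;
}
```

Read this way, QNUM 65535 is a count that went one message below zero, and CBYTES 65404 and 65445 are byte counts that went 132 and 91 bytes below zero respectively.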
It only seems to occur when the system is heavily loaded and most likely
paging too. Further, the programs in question are making fairly extensive
use of Message Q's (as well as shared memory - if that's relevant) and it
is highly likely that more than one process is trying to access the same Q
at the same time. In other words, if there are any flaws in the locks
protecting these structures, then the progs will find them real soon!
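For what it's worth, here is a toy model of the kind of lost update I have in mind. It is NOT the XENIX kernel code (I haven't seen it); struct toyq and the function names are invented, and the interleaving is hard-wired so the result is deterministic.

```c
#include <stdint.h>

/* Toy model only: stands in for the queue header's message count. */
struct toyq {
    uint16_t qnum;
};

/* Two senders enqueue without a working lock: both read the same old
   count, so one of the two increments is lost. */
static void racy_double_send(struct toyq *q)
{
    uint16_t seen_by_a = q->qnum;          /* sender A reads the count */
    uint16_t seen_by_b = q->qnum;          /* sender B reads before A writes */
    q->qnum = (uint16_t)(seen_by_a + 1);   /* A stores its update */
    q->qnum = (uint16_t)(seen_by_b + 1);   /* B overwrites A's update */
}

/* A receiver dequeues one message with an unchecked decrement. */
static void receive_one(struct toyq *q)
{
    q->qnum = (uint16_t)(q->qnum - 1);
}

/* Two messages are queued but the count says 1; draining both
   messages drives the count below zero. */
static uint16_t simulate(void)
{
    struct toyq q = { 0 };
    racy_double_send(&q);
    receive_one(&q);
    receive_one(&q);
    return q.qnum;
}
```

In this model simulate() ends with the count wrapped to 65535, the same value ipcs showed for QNUM on queue 10. That proves nothing about the real cause, but it shows that a single lost update is enough to produce it.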
Once this corruption occurs, all the programs wedge on message Qs. In
addition, the system often hangs after this has happened. The only
solution we've found so far is to re-boot :-(
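A monitoring process could at least spot the bad state by sanity-checking the counters. A minimal sketch follows, assuming the three values come from ipcs or from msgctl() with IPC_STAT (fields msg_cbytes, msg_qnum and msg_qbytes); the function name queue_counters_suspect is made up, and the checks are heuristics rather than anything documented.

```c
#include <stdbool.h>

/* Hypothetical check: returns true when a queue's counters look
   inconsistent with each other. */
static bool queue_counters_suspect(unsigned long cbytes,
                                   unsigned long qnum,
                                   unsigned long qbytes)
{
    /* Bytes on the queue should not exceed the queue's byte limit
       (barring a later reduction of the limit via IPC_SET). */
    if (cbytes > qbytes)
        return true;

    /* An empty queue should hold zero bytes. */
    if (qnum == 0 && cbytes != 0)
        return true;

    return false;
}
```

Fed the figures from the listings above, this flags queues 10 and 20 (CBYTES far above QBYTES of 1028) and passes queues 11 and 21. It only detects the condition; it does nothing to repair the queue.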
What I'd like to know is: Has anyone else come across this? Were you able
to effect a work-around?
Naturally I've already called our supplier for help, but they're an indirect
supplier (i.e. not SCO) and, er, haven't been able to come up with any
solution or work-around for us thus far.
Thanks.
More information about the Comp.unix.wizards
mailing list