SCO NFS dies when heavily used
Larry Philps
larryp at sco.COM
Mon Jun 17 03:03:27 AEST 1991
In <342 at harem.clydeunix.com> wes at harem.clydeunix.com (Barnacle Wes) writes:
> In article <1991Jun06.171047.15327 at nss1.com>, mrm at nss1.simpact.com (Michael R. Miller) writes:
> > The SCO OS/NFS is exporting its directory. A SUN OS/NFS is importing
> > the directory. Large numbers of reads and writes are going back and
> > forth for some time -- sometimes just a few minutes to an hour, other
> > times a couple of days -- and then the software decides to lay over
> > and play dead. We need to reboot the machine to breathe life into its
> > networking support.
> >
> > The SUN's NFS continues to operate although that window is "dead"
> > with the program running in the window waiting for a never-to-be-answered
> > NFS request. We have determined that the SUN isn't at fault by successfully
> > reading and writing another NFS mounted directory exported by another SUN.
> > The SUN is an OS 4.1 product.
>
> This doesn't necessarily mean that the Sun NFS is correct, or bug-free,
> but just that Sun NFS has a bug-set that is compatible with (surprise!)
> Sun NFS. If you have another SCO system, try doing the same test with
> an SCO client & server. This may help to narrow the possibilities.
>
> Also, when you encounter this problem, does the entire network on the
> SCO box die, or just NFS? In other words, do telnet, ping, finger, etc.
> still work? If so, it may just be a problem with SCO-NFS. If it is
> crashing the entire network, including inetd, the problem may be in
> your TCP/IP software rather than the NFS server. Does nfsstat show any
> problems before or after the crash, such as lots of rpc badcalls?
>
> Good luck bug-hunting.
I sent mail to Michael Miller regarding this problem, but since the
question has now resurfaced a week later, I figured I should let
everybody in on the scoop.
This *bug* has already been found and fixed. Please note that the
problem is in the WD8003 driver, not NFS.
It turns out that in certain circumstances (transmitting while under
extremely heavy receive loads), the WD8003 card can drop a transmit
interrupt. The driver did not check for, and thus did not recover from,
this situation. This will produce exactly the symptoms Michael is
seeing.
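For anyone curious what recovering from a lost transmit interrupt can look
like, here is a minimal sketch in C. It is not the SCO driver code: the
structure, the function names, and the timeout value are all hypothetical,
and a real driver would reset the transmitter where the comment says so.
The idea is simply that a periodic timer notices a transmit that has been
pending too long and clears it instead of waiting forever.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical driver state; none of these names come from the SCO driver. */
struct tx_state {
    bool     busy;        /* a frame was handed to the card, completion pending */
    uint32_t start_tick;  /* clock tick at which that transmit was started */
    uint32_t recoveries;  /* number of lost interrupts recovered from */
};

/* Assumed timeout, in clock ticks; a real value would need tuning. */
enum { TX_TIMEOUT_TICKS = 5 };

/* Called when a frame is handed to the card. */
void tx_start(struct tx_state *tx, uint32_t now)
{
    tx->busy = true;
    tx->start_tick = now;
}

/* Called from the interrupt handler when the card reports completion. */
void tx_complete_intr(struct tx_state *tx)
{
    tx->busy = false;
}

/*
 * Called from a periodic timer. If a transmit has been pending for too
 * long, assume the completion interrupt was lost and recover.
 * Returns true if a recovery was performed.
 */
bool tx_watchdog(struct tx_state *tx, uint32_t now)
{
    /* Unsigned subtraction stays correct when the tick counter wraps. */
    if (tx->busy && (uint32_t)(now - tx->start_tick) >= TX_TIMEOUT_TICKS) {
        /* A real driver would reset the transmitter and restart its queue. */
        tx->busy = false;
        tx->recoveries++;
        return true;
    }
    return false;
}
```

Without something like tx_watchdog(), busy stays set after a dropped
interrupt and nothing further is ever transmitted, which matches the
"network plays dead until reboot" behaviour described above.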
We also found that under even heavier loads, the entire system could
hang. This turned out to be the result of the NIC chip on the board
putting a bogus value into the next packet pointer register. If this
bogus value was 0, the driver would loop forever at spl5.
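The general defence against this kind of hang is to sanity-check the
pointer before following it. The sketch below is an illustration only, not
the actual fix: the ring bounds, the next_of[] table standing in for the
card's packet headers, and the function names are all assumptions made for
the example. It rejects any next-packet pointer that lies outside the
receive ring (0 included) and also caps the number of steps, so a corrupt
chain cannot keep the loop spinning.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical ring bounds, in pages; real values depend on the card setup. */
enum { RING_START = 0x26, RING_STOP = 0x40 };

/* A next-packet pointer is usable only if it lies inside the receive ring. */
bool next_ptr_valid(uint8_t next)
{
    return next >= RING_START && next < RING_STOP;
}

/*
 * Walk the receive ring from page `cur` until page `end` is reached.
 * next_of[p] stands in for the next-packet pointer stored with the packet
 * that starts at page p.
 *
 * Returns the number of packets walked, or -1 if a bogus pointer (or a
 * chain longer than the ring itself) is found, in which case the caller
 * should reset the card instead of continuing.
 */
int drain_ring(const uint8_t next_of[256], uint8_t cur, uint8_t end)
{
    int packets = 0;
    int max_steps = RING_STOP - RING_START;  /* at most one packet per page */

    while (cur != end) {
        uint8_t next = next_of[cur];

        if (!next_ptr_valid(next) || packets >= max_steps)
            return -1;  /* bail out rather than loop forever */

        packets++;
        cur = next;
    }
    return packets;
}
```

The step cap matters as much as the range check: a pointer can be inside
the ring and still form a cycle that never reaches `end`.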
Both bugs have been fixed in the current driver, which is now shipping
as part of the LLI Drivers EFS. You can get this from support for a
fee of approximately $50 (I think), or download it for free via uucp
from sosco or via ftp from sco-archive on uunet.
---
Larry Philps, SCO Canada, Inc.
Postman: 130 Bloor St. West, 10th floor, Toronto, Ontario. M5S 1N5
InterNet: larryp at sco.COM or larryp%scocan at uunet.uu.net
UUCP: {uunet,utcsri,sco}!scocan!larryp
Phone: (416) 922-1937