help with mbuf leak problem?
karl_kleinpaste at cis.ohio-state.edu
Sun Sep 16 00:33:41 AEST 1990
Pyramid 98xe, OSx4.4c, nfsd 8, biod 8.
One of my Pyrs has developed a nasty problem in the last 16
hours or so: serious mbuf lossage.
Here's the netstat -m output from just before its last reboot, about
10 minutes ago:
2003/2032 mbufs in use:
1877 mbufs allocated to data
12 mbufs allocated to packet headers
109 mbufs allocated to routing table entries
3 mbufs allocated to socket names and addresses
2 mbufs allocated to interface addresses
128/128 mapped pages in use
510 Kbytes allocated to network (99% in use)
1 requests for memory denied
Note the excessive data mbuf allocation and the 99% utilization.
Compare the same output from its twin in the next cabinet, which
looks quite normal and has been up for days:
86/288 mbufs in use:
3 mbufs allocated to data
4 mbufs allocated to packet headers
75 mbufs allocated to routing table entries
2 mbufs allocated to socket names and addresses
2 mbufs allocated to interface addresses
28/96 mapped pages in use
228 Kbytes allocated to network (29% in use)
0 requests for memory denied
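For anyone watching a pool drain like this, the interesting numbers in those two snapshots can be pulled out and compared mechanically rather than by eyeball. A small sketch (the parser is hypothetical, and assumes the netstat -m output format shown above):

```python
import re

def parse_netstat_m(text):
    """Pick the interesting fields out of a BSD-style `netstat -m`
    report, in the format shown in the snapshots above."""
    stats = {}
    m = re.search(r'(\d+)/(\d+) mbufs in use', text)
    if m:
        stats['mbufs_used'] = int(m.group(1))
        stats['mbufs_total'] = int(m.group(2))
    m = re.search(r'(\d+) mbufs allocated to data', text)
    if m:
        stats['data_mbufs'] = int(m.group(1))
    m = re.search(r'(\d+) requests for memory denied', text)
    if m:
        stats['denied'] = int(m.group(1))
    return stats

# The sick machine's figures, as pasted above:
sick = parse_netstat_m("""2003/2032 mbufs in use:
1877 mbufs allocated to data
1 requests for memory denied""")

# Data mbufs account for nearly the whole pool, and the pool is
# nearly exhausted -- the signature of a leak, not normal load.
print(sick['data_mbufs'],
      round(100 * sick['mbufs_used'] / sick['mbufs_total']))
# -> 1877 99
```

Run periodically (say, from cron) and logged, this would show whether the data-mbuf count climbs steadily or in bursts, which might point at a particular client or time-of-day pattern.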
This leakage started happening sometime around 5pm or 6pm last
evening. I have had to reboot almost hourly just to keep the @#$%
machine alive. I've experimented with several things, trying to find
the cause. Killing off assorted network daemons didn't help;
sendmail, nntp, inetd as a whole, routed, pcnfsd were all killed, and
yet the data mbuf allocation keeps ratcheting upward. I tried
rebooting with 16 nfsd/biod but this was no help either. Killing off
all nfsd/biod and the portmapper didn't help. Renicing nfsd and/or
biod didn't help. As near as I can see, nothing running on the Pyr
itself is the cause of this.
"etherfind -r -n src victim-pyr or dst victim-pyr" run from a nearby
SunOS4.1 Sun3 shows a great deal of NFS traffic, of this form:
UDP from another-pyr.1023 to victim-pyr.2049 128 bytes
RPC Call prog 200000 proc 1 V1 [93dc7]
UDP from victim-pyr.2049 to another-pyr.1023 104 bytes
RPC Reply [93dc7] AUTH_NULL Success
UDP from another-pyr.1023 to victim-pyr.2049 172 bytes
RPC Call prog 200000 proc 9 V1 [93dc8]
UDP from victim-pyr.2049 to another-pyr.1023 36 bytes
RPC Reply [93dc8] AUTH_NULL Success
UDP from another-pyr.1023 to victim-pyr.2049 172 bytes
RPC Call prog 200000 proc 9 V1 [93dc9]
UDP from victim-pyr.2049 to another-pyr.1023 36 bytes
RPC Reply [93dc9] AUTH_NULL Success
UDP from another-pyr.1023 to victim-pyr.2049 128 bytes
RPC Call prog 200000 proc 1 V1 [93dca]
UDP from victim-pyr.2049 to another-pyr.1023 104 bytes
RPC Reply [93dca] AUTH_NULL Success
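Since the traffic comes from assorted hosts, one way to find the worst offender is to tally calls per client from the etherfind output itself. A sketch, assuming the line format shown above ("some-sun" is a made-up host name for illustration):

```python
import re
from collections import Counter

# A few lines in the etherfind format pasted above; in practice
# this would be the captured trace file.
trace = """\
UDP from another-pyr.1023 to victim-pyr.2049 128 bytes
UDP from victim-pyr.2049 to another-pyr.1023 104 bytes
UDP from some-sun.1021 to victim-pyr.2049 172 bytes
UDP from victim-pyr.2049 to some-sun.1021 36 bytes
"""

calls = Counter()
for line in trace.splitlines():
    # Count only calls INTO port 2049 (NFS) on the victim;
    # replies back out are ignored.
    m = re.match(r'UDP from (\S+)\.\d+ to victim-pyr\.2049', line)
    if m:
        calls[m.group(1)] += 1

print(calls.most_common())
```

If one client dominates the tally, that's the machine to go look at; if the calls are spread evenly, the problem is more likely on the server side.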
But not all of this traffic is coming from another-pyr -- assorted
Pyrs, Suns, and the occasional HP show up.
I'm also getting messages like
NFS server write failed: (err=13, dev=0xffa610a4, ino=0xffa69bd0).
on the console occasionally. Errno 13 is EACCES. ???
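For what it's worth, the err=13 decodes the way I read it; a modern convenience check against the standard errno tables (obviously not something that ran on the Pyr):

```python
import errno
import os

# Console message said err=13; look up its symbolic name and
# the usual strerror text.
print(errno.errorcode[13], '-', os.strerror(13))
# -> EACCES - Permission denied
```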
The only anomalous thing about this Pyr's configuration is that it's
the departmental /usr/spool/mail NFS server. But that's been the case
for a couple of years now, nothing new or unusual about that.
As I said, I'm rebooting roughly hourly at this point to keep it
alive. It seems to perform admirably right up until the end, when the
2032/2032 mbuf condition hits. It reboots in 10 minutes and is fine
again for the next hour, while the mbuf count goes up.
Clues, anyone? I can't think of anything that would have been started
at 5pm on a Friday evening which might cause this sort of thing. What
sort of activity on the Pyr or elsewhere on my network should I be
looking for?
--karl
More information about the Comp.sys.pyramid mailing list