NULs in NFS files
mp at allegra.att.com
Sat Dec 31 00:00:22 AEST 1988
We've encountered a rather frustrating problem with NFS files (under 4.0
and 4.0.1 on Sun-3's) getting corrupted, usually by having short sequences
of NULs appear in them.
So far the only corrupted files we've found are some of the small .o files
generated when a new kernel is made on a diskless client. The new kernel
is compiled in /var/sys (an NFS filesystem; if /var is mounted on a small
local disk, there's no problem). /var/sys consists, where possible, of
symbolic links to the original files in /sys. The kernel I'm comparing
things against is called CLIENT, which is a result of a GENERIC config
file with one difference: vmunix is specified as having its default root
and swap on type nfs. Sometimes a .o file (usually ioconf.o) will have no
namelist, but what usually happens is that 3 or 4 of the files will each
have a streak of a few NULs and the resulting kernel won't behave right.
Some files have problems much more frequently than others: these are
stubs.o, sc_conf.o, in_proto.o, and mcp_conf.o. Here are some sample
differences:
diff between CLIENT/stubs.o and OMEGA/stubs.o
text    data    bss     dec     hex
32      72      0       104     68      stubs.o
? map
b1 = 0x0 e1 = 0x20 f1 = 0x20 `stubs.o'
b2 = 0x0 e2 = 0x68 f2 = 0x20 `stubs.o'
cmp -l gives
129 156 0
130 151 0
131 164 0
od | diff gives
*** CLIENT Tue Nov 8 10:13:44 1988
--- OMEGA Tue Nov 8 10:13:44 1988
***************
0000160 000 000 000 000 000 000 000 000 000 000 000 000 000 000 000 000
! 0000200 156 151 164 000 000 000 000 000 000 000 000 042 000 000 006 100
0000220 000 000 000 004 006 000 000 000 000 000 000 040 000 000 000 014
---
0000160 000 000 000 000 000 000 000 000 000 000 000 000 000 000 000 000
! 0000200 000 000 000 000 000 000 000 000 000 000 000 042 000 000 006 100
0000220 000 000 000 004 006 000 000 000 000 000 000 040 000 000 000 014
diff between CLIENT/ioconf.o and OMEGA/ioconf.o
text    data    bss     dec     hex
0       8400    0       8400    20d0    ioconf.o
? map
b1 = 0x0 e1 = 0x0 f1 = 0x20 `ioconf.o'
b2 = 0x0 e2 = 0x20d0 f2 = 0x20 `ioconf.o'
cmp -l gives
19 6 0
20 60 0
31 6 11
32 100 140
od | diff gives
*** /tmp/CLIENT Tue Dec 20 13:00:28 1988
--- /tmp/OMEGA Tue Dec 20 13:00:19 1988
***************
0000000 000 002 001 007 000 000 000 000 000 000 040 320 000 000 000 000
! 0000020 000 000 006 060 000 000 000 000 000 000 000 000 000 000 006 100
0000040 000 000 000 000 000 000 000 000 000 000 000 104 000 000 000 000
---
0000000 000 002 001 007 000 000 000 000 000 000 040 320 000 000 000 000
! 0000020 000 000 000 000 000 000 000 000 000 000 000 000 000 000 011 140
0000040 000 000 000 000 000 000 000 000 000 000 000 104 000 000 000 000
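For anyone who wants to scan a whole build tree for this kind of damage rather than eyeballing cmp -l output, here is a minimal sketch in Python. It is not a tool used in the investigation above: the function names cmp_l and nul_runs are hypothetical, and it assumes you have a known-good copy of each file to compare against. It reports the same 1-based decimal offsets that cmp -l prints, and groups consecutive bytes that became NUL into runs.

```python
def cmp_l(reference: bytes, suspect: bytes):
    """Yield (offset, ref_byte, suspect_byte) for each differing byte.

    Mirrors `cmp -l`: offsets are 1-based decimal. Byte values are
    returned as ints (cmp -l would print them in octal).
    """
    for offset, (a, b) in enumerate(zip(reference, suspect), start=1):
        if a != b:
            yield offset, a, b


def nul_runs(reference: bytes, suspect: bytes):
    """Return [(start_offset, length), ...] for runs of bytes that are
    NUL in `suspect` but non-NUL in `reference` (offsets are 1-based)."""
    runs = []
    for offset, _ref, sus in cmp_l(reference, suspect):
        if sus != 0:
            continue  # a difference, but not a NUL; ignore it here
        if runs and runs[-1][0] + runs[-1][1] == offset:
            # Extends the previous run by one byte.
            runs[-1] = (runs[-1][0], runs[-1][1] + 1)
        else:
            runs.append((offset, 1))
    return runs


def from_od(line: str) -> bytes:
    """Convert one line of octal byte values (od -b style, without the
    leading offset column) into bytes."""
    return bytes(int(field, 8) for field in line.split())


if __name__ == "__main__":
    # The 0000200 line of stubs.o from the od | diff output above,
    # preceded by 128 bytes of padding so the offsets line up.
    pad = bytes(128)
    good = pad + from_od("156 151 164 000 000 000 000 000 "
                         "000 000 000 042 000 000 006 100")
    bad = pad + from_od("000 000 000 000 000 000 000 000 "
                        "000 000 000 042 000 000 006 100")
    print(nul_runs(good, bad))
```

Fed the stubs.o line above, this reports one run of 3 NULs starting at byte 129, which agrees with the cmp -l output. On the ioconf.o line it would report only bytes 19 and 20, since bytes 31 and 32 differ but are not NUL in either copy.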
Environment: Server is Sun-3/280 with xy451, 2 supereagles. Client is a
diskless 3/260. Server has about 3 clients, but problem occurs even when
the other clients are idle. Both client and server are on DELNI's. Both
are running SunOS 4.0; the bug occurs whether running the out-of-the-box 4.0
kernel, one compiled and linked using the GENERIC config file, or one
containing 4.0.1 fixes related to nfs problems (nfs_vnodeops.o,
nfs_client.o, vm_hat.o, and kudp_fastsend.o with subsequent enabling of
udpcksum). [Of course, these kernels are being compiled on the server,
not on the client!] Problem occurs even if a different 3/260 is used.
Problem occurs even if 2 xy451's are used in the server (we and Sun
initially thought it might be the old
xylogics-controller-can't-handle-2-disks bug, especially since I've
arranged the server's filesystems so that /usr is on a disk different from
the clients' root and swap, which hopefully keeps both disks' arms going
simultaneously.) There are no error messages on the consoles. There is
plenty of free disk space in the client root partition (about 20MB). The
client mounts its NFS partitions using whatever defaults Sun provides -
the options in the fstab are "rw" for / and "ro" for /usr.
By the way, here's a separate problem that I ran into when investigating
the above problem: when the additional xy451 controller was added I
thought I'd do the clients a favor and not make them reboot. So rather
than xy1 becoming xy3 (because it would be drive 1 on controller 1) and
invalidating their mounted NFS filesystems, I made a kernel that had xy1
be xyc1 drive 1, and commented out the lines for xy2 and xy3. Was I
sorry! The nightly "find" that searches for core files crashed the server
each night! It seems that just statting /export/root/.../dev/xy2a causes
a kernel mode bus error near specvp().
Mark Plotnick
allegra!mp