Problems with the rexd and rshd daemons in Interactive 386/ix 2.0.2

Tue Sep 18 23:58:52 AEST 1990

	Hello, out there. 

	This is a repost of an article posted to comp.unix.i386 at the time
of the comp.unix.* reorganisation.  I didn't get any answer then, so this
time I'll broaden the distribution to include comp.protocols.{nfs,tcp-ip}
in addition to the new, correct newsgroup: comp.unix.sysv386!

	We have an ethernet network with three nodes, all of them running
NFS.  One of the most useful commands is on(1), which runs commands on another
node but retains the environment (including the current directory).  Very
neat!  My problem is: I can't use this facility to run programs on our
386/ix (2.0.2 core, 1.1.2 TCP/IP, 2.0 NFS).  Sometimes I get the error message
"on: af clnt_call..RPC: Unable to receive", and sometimes I don't get any
error message at all!  The logfile /tmp/rexd.log looks something like this:

Sep  6 09:00 (Rpchild/10444): Child #10444 processing RPC for request 
        REXD INFO: errno=22, msg="Invalid argument" 
Sep  6 09:00 (Rpchild/10444): About to fork execution child; cmd='ls' 
        REXD INFO: errno=9, msg="Bad file number" 
Sep  6 09:00 (Rpchild/10444): [RPC Child: svc_fds == 0, shutting down] 
        REXD INFO: errno=9, msg="Bad file number" 

or like this:

Sep  6 09:02 (Rpchild/10446): Child #10446 processing RPC for request
	REXD INFO: errno=22, msg="Invalid argument"
Sep  6 09:02 (Rpchild/10446): About to fork execution child; cmd='ls'
	REXD INFO: errno=9, msg="Bad file number"
Sep  6 09:02 (Rpchild/10446): [RPC Child: svc_fds == 0, shutting down]
	REXD INFO: errno=4, msg="Interrupted system call"
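
For reference, the errno numbers in these logs can be mapped to their symbolic
names.  The following is only a minimal sketch in Python, not part of rexd; the
function name decode_rexd_info and the SAMPLE excerpt are made up for
illustration, and the names come from the machine running the script (4, 9 and
22 are the traditional Unix assignments).  Note that errno is only meaningful
right after a failing call, so a value logged next to a step that succeeded
may simply be left over from an earlier call; whether rexd logs it that way is
an assumption.

```python
import errno
import re

# Matches the indented "REXD INFO" lines shown in the log excerpts.
REXD_INFO = re.compile(r'REXD INFO: errno=(\d+), msg="([^"]*)"')

def decode_rexd_info(log_text):
    """Return (number, symbolic name, logged message) per REXD INFO line."""
    decoded = []
    for match in REXD_INFO.finditer(log_text):
        number = int(match.group(1))
        # errno.errorcode maps numbers to names on the local platform.
        name = errno.errorcode.get(number, "UNKNOWN")
        decoded.append((number, name, match.group(2)))
    return decoded

# Excerpt copied from the second log sample.
SAMPLE = '''
Sep  6 09:02 (Rpchild/10446): Child #10446 processing RPC for request
    REXD INFO: errno=22, msg="Invalid argument"
Sep  6 09:02 (Rpchild/10446): About to fork execution child; cmd='ls'
    REXD INFO: errno=9, msg="Bad file number"
Sep  6 09:02 (Rpchild/10446): [RPC Child: svc_fds == 0, shutting down]
    REXD INFO: errno=4, msg="Interrupted system call"
'''

if __name__ == "__main__":
    for number, name, message in decode_rexd_info(SAMPLE):
        print(number, name, message)
```

The only difference between the two log samples is the last value: EBADF in
the first, EINTR in the second.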

In the other direction everything works as expected (e.g. running
a command on our NCR Tower using the 386/ix on(1) command).  Even
local use, such as "on localhost ls", fails!  What have I done wrong?
Is there some magic kernel parameter that is set wrongly?  Please help!

	And then there's the "remote shell" handled by /etc/rshd on the
386/ix.  Very often (not always, though) my client "remsh" on another node
hangs after sending the standard input to the foreign shell.  Very
annoying indeed!  After I kill the client the daemon continues as if nothing
had happened.  It seems like the EOF gets lost on the way, but reappears if
I kill the client.
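
On a TCP socket, end-of-file in one direction is normally signalled by a
half-close: the sender shuts down its writing side, which sends a FIN while
the other direction stays open for the reply.  The following is a minimal
sketch of that mechanism over the loopback interface, written in Python and
not taken from the actual remsh or rshd code; the function names are made up
for illustration.

```python
import socket
import threading

def read_until_eof(listener, seen):
    """Accept one connection, read until the peer's EOF, then reply."""
    conn, _ = listener.accept()
    with conn:
        chunks = []
        while True:
            data = conn.recv(4096)
            if not data:  # an empty read means the peer sent its FIN
                break
            chunks.append(data)
        received = b"".join(chunks)
        seen.append(received)
        conn.sendall(received.upper())  # reply only after seeing EOF

def run_half_close_demo():
    """Send input, half-close, and collect the reply on the same socket."""
    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    listener.bind(("127.0.0.1", 0))  # port 0: let the system pick one
    listener.listen(1)
    seen = []
    worker = threading.Thread(target=read_until_eof, args=(listener, seen))
    worker.daemon = True
    worker.start()

    client = socket.create_connection(listener.getsockname(), timeout=5)
    client.sendall(b"ls\n")
    client.shutdown(socket.SHUT_WR)  # half-close: EOF to the peer, still readable
    reply = b""
    while True:
        data = client.recv(4096)
        if not data:
            break
        reply += data
    client.close()
    worker.join(5)
    listener.close()
    return seen[0], reply

if __name__ == "__main__":
    print(run_half_close_demo())
```

If the FIN from the half-close is never delivered to the reading process, the
reader stays blocked in its read loop until the whole connection goes away,
which would match the behaviour described above.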

Another possibly related weirdness of our 386/ix system is the presence
of all these strange TIME_WAIT, CLOSE_WAIT, FIN_WAIT_2 & CLOSED connections
that never go away from our netstat output:

Active Internet connections 
Proto Recv-Q Send-Q  Local Address          Foreign Address        (state) 
tcp        0      0  ix.1224                ix.111                 TIME_WAIT 
tcp        0      0  ix.1182                nix.1181               CLOSE_WAIT 
tcp        0      0  ix.1181                nix.1039               CLOSE_WAIT 
tcp        0      0  ix.shell               appli.1023             CLOSED 

The corresponding lines from our node "nix" (only the ones concerning "ix"):
tcp        0      0  nix.1181               ix.1182                FIN_WAIT_2 
tcp        0      0  nix.1039               ix.1181                FIN_WAIT_2 
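
In the TCP state machine, CLOSE_WAIT means the host has received the peer's
FIN but the local program has not yet closed its own end, and FIN_WAIT_2 is
the matching state on the side that closed first.  The following minimal
Python sketch tallies the states and pairs up the two listings; the function
names and the IX and NIX samples (copied from the output above) are made up
for illustration.

```python
from collections import Counter

def parse_tcp_lines(netstat_text):
    """Return (local, foreign, state) for each tcp line of netstat output."""
    rows = []
    for line in netstat_text.splitlines():
        fields = line.split()
        # Expected layout: proto, recv-q, send-q, local, foreign, state.
        if len(fields) == 6 and fields[0] == "tcp":
            rows.append((fields[3], fields[4], fields[5]))
    return rows

def count_states(netstat_text):
    """Count how many connections are in each TCP state."""
    return Counter(state for _, _, state in parse_tcp_lines(netstat_text))

def half_closed_pairs(passive_text, active_text):
    """Pair CLOSE_WAIT rows on one host with FIN_WAIT_2 rows on the other."""
    waiting = {
        (local, foreign)
        for local, foreign, state in parse_tcp_lines(passive_text)
        if state == "CLOSE_WAIT"
    }
    return sorted(
        (foreign, local)
        for local, foreign, state in parse_tcp_lines(active_text)
        if state == "FIN_WAIT_2" and (foreign, local) in waiting
    )

IX = """\
Proto Recv-Q Send-Q  Local Address          Foreign Address        (state)
tcp        0      0  ix.1224                ix.111                 TIME_WAIT
tcp        0      0  ix.1182                nix.1181               CLOSE_WAIT
tcp        0      0  ix.1181                nix.1039               CLOSE_WAIT
tcp        0      0  ix.shell               appli.1023             CLOSED
"""

NIX = """\
tcp        0      0  nix.1181               ix.1182                FIN_WAIT_2
tcp        0      0  nix.1039               ix.1181                FIN_WAIT_2
"""

if __name__ == "__main__":
    print(count_states(IX))
    print(half_closed_pairs(IX, NIX))
```

Both CLOSE_WAIT entries on "ix" pair up with a FIN_WAIT_2 entry on "nix",
so in each case "nix" closed first and the process on "ix" has not closed
its side of the connection.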

There is no corresponding line for the CLOSED connection to node "appli" in
the output from netstat on that node.  What's going on here?  Most things
do work, like X11, NFS, rlogin, rcp etc.  It's just "rexd" & "rshd" that
fail!  Any ideas?

					Niklas
---
Niklas Hallqvist	Phone: +46-(0)31-19 14 85
Applitron Datasystem	Fax:   +46-(0)31-19 80 89
N. Gubberogatan 30	Email: niklas at appli.se
S-416 63  GOTEBORG	       sunic!chalmers!appli!niklas
Sweden