Loss of RFNMs on ARPAnet hosts (the REAL FIX)
Clyde W. Hoover
clyde at ut-ngp.UTEXAS
Wed Oct 30 04:22:41 AEST 1985
Index: /sys/vaxif/if_acc.c 4.2BSD
APOLOGIA:
The previous fix posted was **** ALL WRONG. ****
My colleague who tracked down this bug did not (by his own admission)
explain the nature of the bug sufficiently, hence the wrong
'fix'. This person, who will remain nameless, has suffered
and will continue to suffer the pains of the damned
because *I* ended up looking stupid on USENET. (I know, USENET
is full of stupid-looking people, but I was saving that for
net.singles).
Thanks to Art Berggreen <Art at ACC.ARPA> for his analysis of the
problem (included below) and to my nameless colleagues for spending
hours poring over logic diagrams to figure out just how this bloody
box works.
NOTE: This is **not applicable** unless the modifications from Chris Kent
(cak at purdue.ARPA, posted 21 March 1984) have been made to
/sys/netinet/tcp_output.c. These modifications advertise a
maximum TCP segment size that is tuned per network interface.
Description:
Connections to certain hosts on the ARPAnet will start failing with
"out of buffer space" messages. Doing a 'netstat -h' shows
that the host (or the gateway to it) has a RFNM count of 8.
The RFNM count never drops below 8 and so the network path is
unusable until the system is rebooted.
The problem lies in the LH/DH-11 IMP interface.
Sometimes, most likely always, it will not set the <END OF MESSAGE>
flag in the control & status register if the input buffer is filled
at the same time that <LAST BIT SIGNAL> from the
IMP comes up.
This causes the LH/DH driver to append the next
incoming message from the IMP to the previous message.
This process (appending of messages) will continue until
a message SHORTER than the input buffer size is sent --
a RFNM response does nicely.
This results in the LOSS of the succeeding messages (e.g. RFNMs)
since the 1822 protocol handling code expects to get only
<ONE> message from the LH/DH at a time.
This problem happens when the IMP MTU is advertised as the TCP
maximum segment size (as is done by the TCP changes from cak at purdue).
This allows an incoming message to be 1006 + 12 bytes long, which
equals the size of the 1018 byte input buffer in
the IMP (I believe) and so exercises the bug in the LH/DH.
The described problem would appear to happen ONLY if a message
from the IMP is one word longer than the buffer being read into.
When the buffer fills, leaving the data that contains the Last
Bit in the LH/DH data buffer, the Receive DMA terminates and
the EOM flag is NOT ON (because the user has not yet DMA'd
the End-of-Message into memory). What should happen when the
Receive DMA is restarted is that the remaining word is read into memory
and the DMA terminates with the EOM flag ON. If, when the DMA is
restarted, the internal EOM status is lost, the following message would
be concatenated with the end of the previous message.
A better solution than reducing IMPMTU (which doesn't really
fix the problem) would be to use I/O buffers that are slightly
larger than IMPMTU (and, of course, to set the Receive Byte Counter
to be larger than any expected message).
Fix:
/sys/vaxif/if_acc.c:
163c164
< (int)btoc(IMPMTU)) == 0) {
---
> (int)btoc(IMPMTU+2)) == 0) {
190c191
< addr->iwc = -(IMPMTU >> 1);
---
> addr->iwc = -((IMPMTU + 2) >> 1);
328,330c329,331
< len = IMPMTU + (addr->iwc << 1);
< if (len < 0 || len > IMPMTU) {
< printf("acc%d: bad length=%d\n", len);
---
> len = IMPMTU+2 + (addr->iwc << 1);
> if (len < 0 || len > IMPMTU+2) {
> printf("acc%d: bad length=%d\n", unit, len);
362c363
< addr->iwc = -(IMPMTU >> 1);
---
> addr->iwc = -((IMPMTU + 2)>> 1);
This fix really does the job properly.
--
Shouter-To-Dead-Parrots @ Univ. of Texas Computation Center; Austin, Texas
"All life is a blur of Republicans and meat." -Zippy the Pinhead
clyde at ngp.UTEXAS.EDU, clyde at sally.UTEXAS.EDU
...!ihnp4!ut-ngp!clyde, ...!allegra!ut-ngp!clyde