TIME_WAIT sockets clog system
Steven M. Schultz
sms at wlv.imsd.contel.com
Tue Jul 4 16:59:19 AEST 1989
In article <12417 at bloom-beacon.MIT.EDU> scs at adam.pika.mit.edu (Steve Summit) writes:
>There is an interesting discussion going on in comp.bugs.2bsd
>about an out-of-mbufs problem caused by an mget in ftp. The
>problem obviously occurs primarily on a pdp11 with its limited
>memory, but the 2.10bsd code is taken directly from the VAX
>version, and I have noticed the same problem (and indeed the
>original submittor acknowledges the possibility in the excerpt
>from his posting I've reproduced below) when doing an mput (as I
>recall) on an overloaded MicroVAX being used as a file server.
ahhh, so others have seen the problem on larger machines. i had
not seen any other references before, so i thought it only a
'theoretical' possibility to run a vax out of network memory.
>There is some debate about the efficacy of the proposed fix,
>which involves fleshing out the (previously stubbed) tcp_drain
>routine.
the pitfalls of my proposed change to the mbuf allocator
have been made known to me (i really should have known better).
an alternative solution is-being/has-been prepared.
a small change is made to mbuf.h: a new 'wait' flag is added
and the MGET macro is modified to test whether it is safe
(i.e. the caller was not at splimp) to manipulate the tcb chain(s).
the 0340 and 0100 are the processor priority mask and the network
priority level (2) for the pdp-11, but hopefully the idea is clear.
ideally the appropriate symbolic names should be used, but
"real work" reared its head ;-)
the idea is to add another state that will NOT sleep, but WILL
invoke the drain code if the network code was at splnet. (thanks
to Dan Lanciani - ddl at harvard.harvard.edu for pointers in this
area).
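
The post does not show m_more itself, so the following is only a user-space sketch of how a third flag value could behave: M_DONTWAIT returns failure at once, while M_DONTWAITLONG still never sleeps but is allowed to call the drain code first. The names m_more_sketch, drain_protocols, pool, npool and drains are hypothetical stand-ins, not 2.10bsd code.

```c
#include <stddef.h>

/* flag values as given in the post */
#define M_DONTWAIT     0
#define M_WAIT         1
#define M_DONTWAITLONG 2

struct mbuf { struct mbuf *m_next; };

static struct mbuf *mfree;       /* free list head, starts empty */
static struct mbuf pool[4];      /* hypothetical: memory a drain can release */
static int npool = 4;            /* buffers still held by "protocols" */
static int drains;               /* how many times the drain code ran */

/* Hypothetical stand-in for the protocol drain routines (e.g. tcp_drain):
 * hands every held buffer back to the free list. */
static void drain_protocols(void)
{
	drains++;
	while (npool > 0) {
		struct mbuf *m = &pool[--npool];
		m->m_next = mfree;
		mfree = m;
	}
}

/* Hypothetical m_more: any flag other than M_DONTWAIT may drain.
 * A real kernel would sleep for M_WAIT when the drain frees nothing;
 * that part is omitted here. */
static struct mbuf *m_more_sketch(int canwait)
{
	struct mbuf *m;

	if (mfree == NULL && canwait != M_DONTWAIT)
		drain_protocols();
	if ((m = mfree) == NULL)
		return NULL;
	mfree = m->m_next;
	m->m_next = NULL;
	return m;
}
```

With an empty free list, m_more_sketch(M_DONTWAIT) fails without touching the protocols, whereas m_more_sketch(M_DONTWAITLONG) runs the drain once and then succeeds if anything was released.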
it would be enlightening to know why sockets stay around so long
in a TIME_WAIT state (especially on a LAN) and what would break
if the timeout interval were reduced.
the tcp_drain() modification, with the unnecessary splimp call
removed, seems adequate. here's what tcp_drain() looks like at
the moment:
tcp_drain()
{
	register struct inpcb *ip, *ipnxt;
	register struct tcpcb *tp;

	/*
	 * Search through tcb's and look for TIME_WAIT states to liberate,
	 * these are due to go away soon anyhow and we're short of space or
	 * we wouldn't be here...
	 */
	ip = tcb.inp_next;
	if (ip == 0)
		return;
	for (; ip != &tcb; ip = ipnxt) {
		ipnxt = ip->inp_next;
		tp = intotcpcb(ip);
		if (tp == 0)
			continue;
		if (tp->t_state == TCPS_TIME_WAIT)
			tcp_close(tp);
	}
}
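
The loop above fetches ipnxt before calling tcp_close(), presumably because closing unlinks the control block from the chain. A minimal user-space sketch of that pattern, walking a circular doubly linked list and unlinking entries during the walk, is given below. struct pcb, the ST_ values, pcb_insert, pcb_remove and drain_time_wait are hypothetical names for illustration, not the kernel's.

```c
#include <stddef.h>

/* Hypothetical stand-in for the inpcb chain: circular, with a dummy head. */
struct pcb {
	struct pcb *next, *prev;
	int state;
};

#define ST_ESTABLISHED 4	/* arbitrary illustrative values */
#define ST_TIME_WAIT   10

/* Link p in right after the head. */
static void pcb_insert(struct pcb *head, struct pcb *p)
{
	p->next = head->next;
	p->prev = head;
	head->next->prev = p;
	head->next = p;
}

/* Unlink p and poison its links, so following p->next afterwards
 * would be an obvious error. */
static void pcb_remove(struct pcb *p)
{
	p->prev->next = p->next;
	p->next->prev = p->prev;
	p->next = p->prev = NULL;
}

/* Remove every TIME_WAIT entry; returns how many were removed. */
static int drain_time_wait(struct pcb *head)
{
	struct pcb *p, *nxt;
	int freed = 0;

	for (p = head->next; p != head; p = nxt) {
		nxt = p->next;	/* save the successor before p is unlinked */
		if (p->state == ST_TIME_WAIT) {
			pcb_remove(p);
			freed++;
		}
	}
	return freed;
}
```

Advancing with p = p->next after pcb_remove(p) would follow a dead link; saving the successor first keeps the walk valid whatever happens to the current entry.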
and the change to mbuf.h:
/* flags to m_get */
#define M_DONTWAIT 0
#define M_WAIT 1
#define M_DONTWAITLONG 2 /* THIS IS NEW */
...
#define MGET(m, i, t) \
	{ int ms = splimp(); \
	  if ((m)=mfree) \
		{ if ((m)->m_type != MT_FREE) panic("mget"); (m)->m_type = t; \
		  mbstat.m_mtypes[MT_FREE]--; mbstat.m_mtypes[t]++; \
		  mfree = (m)->m_next; (m)->m_next = 0; \
		  (m)->m_off = MMINOFF; } \
	  else \
		(m) = m_more((((ms&0340) <= 0100) && (i==M_DONTWAIT)) ? M_DONTWAITLONG : i, t); \
	  splx(ms); }
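
The (ms&0340) <= 0100 test in MGET looks at the value splimp() returned, that is, the priority in effect before the macro raised it. As a sketch of what the same test might look like with symbolic names, assuming the pdp-11 layout described in the post (priority in bits 5-7 of the PS word, network level 2): PS_IPL_MASK, PS_IPL_SHIFT, IPL_NET, PS_NET and at_or_below_net are hypothetical names, not taken from the 2.10bsd headers.

```c
/* Hypothetical symbolic names; the post uses the raw octal constants. */
#define PS_IPL_MASK	0340	/* pdp-11 PS priority field, bits 5-7 */
#define PS_IPL_SHIFT	5
#define IPL_NET		2	/* network priority level, per the post */
#define PS_NET		(IPL_NET << PS_IPL_SHIFT)	/* == 0100 octal */

/* Nonzero when the saved PS shows the caller was running at or below
 * network priority, i.e. when MGET would upgrade M_DONTWAIT to
 * M_DONTWAITLONG and let the drain code run. */
static int at_or_below_net(int ps)
{
	return (ps & PS_IPL_MASK) <= PS_NET;
}
```

The low bits of the PS word are masked off before the comparison, so only the priority field decides the outcome: a saved PS of 0100 passes, while 0140 (level 3) or anything higher does not.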