4.2 vs. 4.3 sockets
Flame Bait
joshua at athertn.Atherton.COM
Tue Dec 5 00:10:56 AEST 1989
I've got a programming problem which I hope someone already has the
answer for. It has already caused me much grief. (None of it good :-)
I have a client/server program which works just fine on BSD 4.2 type
systems (like SunOS 3.5), but it fails on BSD 4.3 type systems (like
SunOS 4.0.3 and AIX 2.2.1). I already changed the select system call
so that it uses FD_SET, FD_ISSET, fd_set, and friends.
Are there any other 4.2/4.3 differences? The changes I made were very
small. I used FD_SET instead of a bit set before the select call, FD_SETSIZE
instead of sizeof(int) in the select call, and FD_ISSET instead of a
bit test after the call. Is there anything else I need to change?
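
For reference, the fd_set form of the call described above usually looks something like the sketch below. The original code is not shown in this post, so the function name wait_readable and its timeout parameter are hypothetical. Two details are easy to miss when converting from a bare bit mask: select() overwrites the set it is given, so the set has to be rebuilt before every call, and the first argument is a descriptor count (highest descriptor plus one), not a size in bytes. This is not offered as a diagnosis of the failure.

```c
#include <errno.h>
#include <stddef.h>
#include <sys/select.h>   /* on 4.3BSD-era systems the macros came from <sys/types.h> */
#include <sys/time.h>

/* Hypothetical helper: wait up to timeout_sec seconds for fd to
 * become readable.  Returns 1 if readable, 0 on timeout, -1 on error. */
int wait_readable(int fd, long timeout_sec)
{
    if (fd < 0 || fd >= FD_SETSIZE)
        return -1;                /* FD_SET is undefined outside this range */

    for (;;) {
        fd_set readfds;
        struct timeval tv;
        int n;

        /* select() modifies the set (and, on some systems, the
         * timeout), so both are rebuilt before every call. */
        FD_ZERO(&readfds);
        FD_SET(fd, &readfds);
        tv.tv_sec = timeout_sec;
        tv.tv_usec = 0;

        /* First argument: highest descriptor of interest plus one. */
        n = select(fd + 1, &readfds, NULL, NULL, &tv);
        if (n < 0 && errno == EINTR)
            continue;             /* interrupted by a signal: retry */
        if (n < 0)
            return -1;
        return (n > 0 && FD_ISSET(fd, &readfds)) ? 1 : 0;
    }
}
```

Passing FD_SETSIZE as the first argument also works as an upper bound; fd + 1 simply asks the kernel to scan fewer descriptors.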
Some other facts: this error happens after 3962+/-10 identical operations,
and is very consistent. If I start a client and run 3000 operations,
kill it and run a second client for 3000 operations, then all is well.
The client application looks like this:
    listen for a TCP connection
    repeat 4000 times:
        send a UDP packet
        recv a response via TCP
        close the accepted TCP connection
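
A minimal C sketch of that outline, under stated assumptions, might look like the following. The names open_listener and request_once are hypothetical, the listener binds to INADDR_ANY on a kernel-chosen port, and the real client waits in select() before accepting, which is left out here to keep the example short. It is only an illustration of the request/response shape, not the actual program.

```c
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: create a listening TCP socket on a port chosen
 * by the kernel.  Returns the socket, or -1 on error; the port (in
 * network byte order) is stored in *port_out. */
int open_listener(int backlog, unsigned short *port_out)
{
    struct sockaddr_in addr;
    socklen_t len = sizeof addr;
    int s = socket(AF_INET, SOCK_STREAM, 0);

    if (s < 0)
        return -1;
    memset(&addr, 0, sizeof addr);
    addr.sin_family = AF_INET;
    addr.sin_addr.s_addr = htonl(INADDR_ANY);
    addr.sin_port = 0;                    /* let the kernel pick a port */
    if (bind(s, (struct sockaddr *)&addr, sizeof addr) < 0 ||
        listen(s, backlog) < 0 ||
        getsockname(s, (struct sockaddr *)&addr, &len) < 0) {
        close(s);
        return -1;
    }
    *port_out = addr.sin_port;
    return s;
}

/* Hypothetical helper for one iteration: send the UDP request, accept
 * the server's TCP connection, read until EOF or until buf is full,
 * then close the accepted socket.  Returns bytes read, or -1. */
long request_once(int usock, int lsock, const struct sockaddr_in *server,
                  const void *req, size_t reqlen, char *buf, size_t buflen)
{
    long total = 0;
    ssize_t n = 0;
    int conn;

    if (sendto(usock, req, reqlen, 0,
               (const struct sockaddr *)server, sizeof *server) < 0)
        return -1;
    conn = accept(lsock, NULL, NULL);     /* blocks until the server connects */
    if (conn < 0)
        return -1;
    while ((size_t)total < buflen &&
           (n = read(conn, buf + total, buflen - (size_t)total)) > 0)
        total += n;
    close(conn);                          /* one close per accepted descriptor */
    return n < 0 ? -1 : total;
}
```

The 4000-operation run is then a plain loop calling request_once() with the same UDP and listening sockets each time.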
The reason for the UDP/TCP switch is that the server will respond using
UDP if it will fit in one UDP packet; if not, TCP is used. To tickle the
bug I need to make a huge number of UDP request/TCP responses. The server
is writing to the client, and the client is in the select call waiting for
the server, but they never make contact. They had made contact for the
3900-odd calls before this, and they make contact on a BSD 4.2 machine. Weird.
This bug seems far too consistent for a timing problem, and the client runs
too many times for it to be running out of some resource like file descriptors.
Things that I have tried and have failed:
I replaced FD_SETSIZE with getdtablesize().
I got paranoid about writes only writing some of their data (I put in code
to check the return value, and loop to write the rest, if needed.)
I got paranoid about signals interrupting my read/write calls. (AIX, where
this first hit me, is mostly System V).
I changed all my listen(sock,1) calls to listen(sock,5) calls. (For when
the client was listening for a TCP response).
I added a shutdown(sock,2) before closing the socket which the client
accepts from the server.
Another general question (hopefully unrelated):
If a write call on a System V type UNIX returns -1 with errno==EINTR, what
should be done? There is no way to know if part of the data did
get written. Or, is it always safe to restart the call from scratch?
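
The partial-write check and the EINTR question can be sketched together as one loop. The name write_all is hypothetical. The retry branch relies on the behaviour that current POSIX specifies: write() returns -1 with EINTR only when it was interrupted before transferring any data, and returns a short count otherwise. That guarantee is an assumption here; it may not hold on the 1989-era System V systems this question is about, where restarting from scratch could duplicate data.

```c
#include <errno.h>
#include <stddef.h>
#include <unistd.h>

/* Hypothetical helper: write exactly len bytes from buf to fd.
 * Returns 0 on success, -1 on error (errno is left as set by write). */
int write_all(int fd, const void *buf, size_t len)
{
    const char *p = buf;

    while (len > 0) {
        ssize_t n = write(fd, p, len);

        if (n < 0) {
            if (errno == EINTR)
                continue;   /* assumes nothing was written (see note above) */
            return -1;
        }
        p += n;             /* short write: advance past what was sent */
        len -= (size_t)n;
    }
    return 0;
}
```

A caller replaces each bare write(sock, buf, len) with write_all(sock, buf, len) and treats a nonzero return as a failed connection.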
I'm at my wits' end. Email or call with any ideas you have. Thanks.
Joshua Levy joshua at atherton.com home:(415)968-3718
{decwrl|sun|hpda}!athertn!joshua work:(408)734-9822
More information about the Comp.unix.questions mailing list