Bizarre code in UUCP
mcg at tekecs.UUCP
Tue Jun 14 10:50:22 AEST 1983
I just finished battling a strange problem with UUCP over 4.1a TCP/IP
sockets. It was caused by some code in 'cico.c' whose reasoning I could
not fully understand. (Jump to the end of the message to see the
questions and my fix.)
The problem is this:
Symptoms:
Two systems are talking to each other over a TCP/IP channel.
When one system is lightly loaded and the other is heavily
loaded, the LCK..sysname lock file is left behind on the
heavily loaded system, preventing further polling until it
is manually removed.
Cause:
The following two pieces of code are the culprits:
    /* the very end of the mainline in cico.c ... */
    alarm(MAXMSGTIME);
    omsg('O', "OOOOO", Ofn);
    DEBUG(4, "send OO %d,", ret);
    if (!setjmp(Sjbuf)) {
        for (;;) {
            omsg('O', "OOOOO", Ofn);
            ret = imsg(msg, Ifn);
            if (ret != 0)
                break;
            if (msg[0] == 'O')
                break;
        }
    }
    alarm(0);
There is a window between the two omsg() calls during which
the lightly loaded system may have sent BOTH of its omsg()'s
and called imsg(). It (the lightly loaded system) receives the
"OOOOO" sent by the heavily loaded system's first omsg() call,
and exits, implicitly closing its end of the channel.
In the meantime, the heavily loaded system has finally gotten
around to executing the second omsg() call, and gets an error
because there is nothing/nobody to write to. In 4.1A, writing
to a socket which the other end has closed causes a SIGPIPE!
UUCP doesn't catch SIGPIPE, and uucico dies suddenly, silently,
and mysteriously, without a chance to clean up.
Second Problem:
Assume that the above problem didn't occur, or was fixed.
After the ending handshake, the routine cleanup() is called.
There is some code in cleanup() as follows:
    cleanup(code)
    int code;
    {
        ....
        /* toward the end of cleanup(), in cico.c */
        if (Role == MASTER) {
            write(Ofn, EOTMSG, strlen(EOTMSG));
        }
There is the same problem here, i.e. uucico is writing to
a neighbor who may very well be dead and gone. A SIGPIPE
will occur here as well, if implicit delays have allowed the
other side to actually close.
THE QUESTION(S):
1) Why do the two uucicos LOOP, sending "OOOOO"'s to each other?
What's the point?
2) Why is there an initial call to omsg(), when it is immediately
called again, right before the imsg()? Is this necessary?
3) In cleanup(), what is the real purpose of the EOTMSG?
Is this intended to cause the other system to turn the line off?
Is it really needed?
My Fix:
A bit of a kludge, I'm afraid. I did not want to change the code
under normal circumstances, fearing I would introduce an
unforeseen incompatibility with other uucico's. Thus, I merely
conditionally execute the first omsg('O', "OOOOO", Ofn), and the
write(..., EOTMSG), executing them ONLY on a NON-TCP/IP channel.
This solved my problem.
Also, to some extent this problem is caused by silly 4.1A
sending SIGPIPE in these circumstances, which seems completely
unreasonable.
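The shape of the conditional fix described above can be sketched as follows. This is only an illustration under stated assumptions: the chan_kind enum, send_closing_messages(), and counting_send() are hypothetical names, and the send callback is a stand-in for omsg() rather than the real routine.

```c
/* Hypothetical: how the connection was made. Not a real uucico type. */
enum chan_kind { CHAN_DIALUP, CHAN_TCP };

/* Stand-in for omsg(): any function that sends one message. */
typedef int (*send_fn)(const char *text);

/*
 * Send the closing messages that precede the first read.
 * Returns how many messages were sent.
 */
int send_closing_messages(enum chan_kind kind, send_fn send)
{
    int sent = 0;

    if (kind != CHAN_TCP) {     /* extra leading message: non-TCP only */
        send("OOOOO");
        sent++;
    }
    send("OOOOO");              /* the message paired with the read */
    sent++;
    return sent;
}

/* Trivial sender used for demonstration: it only counts its calls. */
static int calls;

int counting_send(const char *text)
{
    (void)text;
    return ++calls;
}
```

On a TCP channel only the message paired with the read goes out, so there is no unpaired write for the peer to miss; other channels keep the original two-message behavior.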
Does anyone have any answers to my questions? My feeling is that it
is a historical (hysterical) accident.
S. McGeady
{decvax,ucbvax,zehntel}!tektronix!tekecs!mcg
P.S.:
I am copying this to the unix-wizards and bugs.uucp lists.
Feel free to reply to the list as well as me if you think others
would be interested.