sendmail abort while running the queue
dplatt at teknowledge-vaxc.UUCP
Fri Feb 20 06:50:13 AEST 1987
I'm running into a strange sendmail abort and haven't been able to pin
it down... can anybody give me a hint?
The situation is as follows: I'm on a Sun 3/52 workstation, running
SunOS 3.2. My sendmail daemon has been invoked with the "-bd -q15m"
options. Occasionally, the queue-running daemon aborts (the forked
child, not the parent).
The conditions appear to be the following:
1) I've sent a message to a host that is down or unreachable.
2) sendmail has made repeated tries to deliver the message.
3) The system has been up (without a reboot) for at least a day.
4) The abort typically occurs between midnight and 8 AM.
The symptoms appear to be:
a) sendmail aborts quietly and dumps core; it doesn't write a message
to the system log or to the console.
b) the "d" (data) file of the undelivered mailgram remains in the
/usr/spool/mqueue directory.
c) The "l" (lock) file is apparently being left in the mqueue directory,
as the next queue run generates an "id: locked" message in the
syslog. This happens only once, though... the queue run 30 minutes
after the abort does not report "id: locked", so it appears that
somebody is deleting the lock file.
d) the "q" (control) file is being deleted at some point, although I'm
not sure when; it's gone when I come to work.
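For anyone who wants to check their own queue for the same leftovers, here is a rough sketch of a checker. It assumes the queue files use the usual "df", "qf" and "lf" name prefixes followed by the queue id, and it is written for a POSIX-style sh rather than tested on SunOS 3.2; the function name find_orphans is made up for this example.

```shell
#!/bin/sh
# Sketch: report queue ids that have a data (df) file but no control (qf)
# file, and say whether a lock (lf) file is still lying around.
# Assumes file names of the form df<id>, qf<id>, lf<id>.
find_orphans() {
    qdir=${1:-/usr/spool/mqueue}
    for df in "$qdir"/df*; do
        [ -f "$df" ] || continue      # the glob matched nothing
        id=${df##*/df}                # strip the directory and the "df" prefix
        if [ ! -f "$qdir/qf$id" ]; then
            if [ -f "$qdir/lf$id" ]; then
                echo "$id orphaned (lock file present)"
            else
                echo "$id orphaned"
            fi
        fi
    done
}
```

Calling find_orphans with no argument looks in /usr/spool/mqueue; running it right after an abort and again after the next couple of queue runs should show when the lock file goes away.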
My sendmail.cf is derived from the "sendmail.cf.subsidiary" file that
came with SunOS 3.2, with a couple of mods:
- I use the "or10m" option to cause SMTP connections to time out if the
foreign host doesn't respond within 10 minutes.
- I have two mailers ("ether" and "localether") which are defined with
the P=[IPC], A=IPC options. Ruleset 0 selects the "localether"
mailer for outbound mail being sent to hosts that don't have a domain
specification (i.e. are on our local Ethernet), and "ether" for hosts
with a domain spec. The "localether" mailer delivers mail directly;
the "ether" mailer passes the mail to our local Internet relay
host for delivery, and hacks the "From:" address to include the relay
host's name rather than the sending Sun's name (which isn't
registered on the Internet).
- I have a "frozen" sendmail.fc, derived from the sendmail.cf after the
last set of changes was made.
Any ideas what might be going on here? I've seen some symptoms in the
past that lead me to suspect that the SunOS 3.2 sendmail may begin to
suffer from "bit decay" after the system has been up for a prolonged
period of time (strange aborts, curable only by a reboot... killing
all copies of sendmail and restarting the daemon does NOT cure the
problem... sticky-pages damaged, perhaps?). Anybody else seen these
symptoms, or have a cure or a diagnosis procedure?
As a possible workaround, I've removed the "-q15m" from the daemon
invocation in /etc/rc.local, and have added a queue-running command in
crontab. It'll be interesting to see if the problem goes away!
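In outline, the workaround looks something like the fragment below. This is a sketch rather than a copy of the actual files: it assumes the classic five-field crontab format (some cron versions also want a user-name field before the command), and it keeps the original 15-minute interval.

```shell
# /etc/rc.local: start the daemon as an SMTP listener only (no -q15m)
/usr/lib/sendmail -bd

# crontab: run the queue once every 15 minutes instead
0,15,30,45 * * * * /usr/lib/sendmail -q
```

With this arrangement each queue run is a fresh process started by cron rather than a child forked from the long-running daemon, which should help show whether the age of the daemon has anything to do with the aborts.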
Dave Platt
Internet: dplatt at teknowledge-vaxc.arpa
Usenet: {hplabs|sun|ucbvax}!dplatt%teknowledge-vaxc.arpa
Voice: (415) 424-0500
USnail: Teknowledge, Inc.
1850 Embarcadero Road
Palo Alto, CA 94303