help: cron runs commands twice (is it timed's fault?)

rainbow rainbow at altger.UUCP
Sun Aug 26 20:56:14 AEST 1990


PRELUDE:
1) If this has been discussed on the net lately, please send me a short mail.
2) WARNING: If no one can help me further here, I'll post this
            article to comp.unix.wizards! ;-)

The following involves some of my current employer's equipment,
so I avoid naming any hardware or software (forgive me).

THE PROBLEM ITSELF: CRON RUNS COMMANDS TWICE
We are running a network with about 20 UNIX machines. The networking
software is an NFS lookalike (based on Ethernet, TCP/IP).
Now on some machines we have the problem that cron(1M) appears to run
some commands twice.

For example, I have a job that is to be executed once per week, say at
Friday noon. But then, Friday 12:00, cron decides to run my job twice
(there are two processes running (different PIDs, of course, but same
PPID), I receive two mails, and it is logged twice in cron.log).
This doesn't happen every Friday, nor does it happen on all machines
within the network, but it does happen on some. Now an obvious
workaround is to have all jobs use some kind of locking mechanism
(as I have them now), but this is not the kind of solution I am
looking for. Up to now, no one could tell me why this silly cron
behaves so badly, so I decided to post this mail.

THE THEORY (OR: DOES CRON REMEMBER THE LAST COMMAND IT EXECUTED?):
To make it somewhat easier, I already have a theory about what might
happen. Even if this is not the reason, it is a possible scenario,
that is to be avoided (is it already?).
  ((Unfortunately I have *absolutely* no UNIX source code available, so
  could someone please look up whether cron works the way I believe it
  to work?))
When cron is running, it wakes up every minute, takes a look at the
real-time clock, checks whether there is any work to do, optionally
does that work and then puts itself to sleep again.
It probably does this using the alarm(2) and pause(2) system calls.
As far as I know, alarm(2) is implemented so that it schedules
a signal to be sent to the caller after HZ*seconds clock ticks.
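
To make the assumed wake-up cycle concrete, here is a minimal sketch in C.
It is only a guess at how such a loop could look, not taken from any real
cron source; the names sleep_with_alarm() and wake_once() are made up for
this illustration.

```c
#include <assert.h>
#include <signal.h>
#include <time.h>
#include <unistd.h>

/* Set by the SIGALRM handler; hypothetical name for this sketch. */
static volatile sig_atomic_t alarm_fired = 0;

static void on_alarm(int signo)
{
    (void)signo;
    alarm_fired = 1;
}

/* Sleep for `seconds` with alarm(2) and pause(2).
 * Note: the wait is counted by the kernel's timer, not by the
 * time-of-day clock that a clock daemon may be adjusting.
 * A production version would block SIGALRM and use sigsuspend(2)
 * to close the small race between the flag test and pause(). */
static void sleep_with_alarm(unsigned seconds)
{
    assert(seconds > 0);
    alarm_fired = 0;
    signal(SIGALRM, on_alarm);   /* re-install; SysV resets handlers */
    alarm(seconds);
    while (!alarm_fired)
        pause();
}

/* One wake-up cycle: read the clock, (pretend to) do the work that
 * is due in that minute, then sleep. Returns the minute number seen. */
static long wake_once(unsigned interval)
{
    long minute = (long)(time(NULL) / 60);
    /* A real cron would scan its tables here and start due jobs. */
    sleep_with_alarm(interval);
    return minute;
}
```

A daemon built this way would simply call wake_once(60) in an endless loop.
Nothing in the loop compares the minute it sees now with the minute it saw
on the previous wake-up.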

Now this is just what might cause the problem:
Everything will work fine, as long as your real-time clock is
incremented every HZ clockticks, too.
But, as we have all machines *synchronizing their clocks* within
our NFS-network, you can't rely on that.
On machines that have a faster local clock, it might happen that
timed slows down the local clock in order to synchronize it with
the others. Therefore a minute might be somewhat longer than
60*HZ clockticks (up to ten percent, I think, which would be 66*HZ
ticks).  Now when cron is woken up after 60*HZ clockticks the
real-time clock might still show the same time as it did the last
time cron woke up. If cron doesn't somehow remember that it
already did all scheduled jobs for that time, it will do them
once again!
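
The scenario can be played through with a small simulation in C. This is a
sketch under assumptions: HZ = 100 is an arbitrary example value, the
wake-ups are taken to fall exactly on multiples of 60*HZ ticks, and
count_starts() and its `remember` flag are hypothetical names, not anything
from a real cron.

```c
#include <assert.h>

enum { HZ = 100 };  /* assumed tick rate, for illustration only */

/* Minute shown by the real-time clock after `ticks` clock ticks,
 * when one real-time minute lasts `ticks_per_minute` ticks
 * (60*HZ normally, up to 66*HZ while the clock is slowed down). */
static long wall_minute(long ticks, long ticks_per_minute)
{
    assert(ticks_per_minute > 0);
    return ticks / ticks_per_minute;
}

/* Count how often a job due at minute `due` gets started when the
 * daemon wakes up every 60*HZ ticks, `wakeups` times in total.
 * With remember == 0 the job is started on every wake-up that sees
 * minute `due`; with remember != 0 the minute of the last start is
 * kept and the same minute is never served twice. */
static int count_starts(long due, int wakeups, long ticks_per_minute,
                        int remember)
{
    long last_started = -1;
    int starts = 0;

    for (int i = 0; i < wakeups; i++) {
        long minute = wall_minute((long)i * 60 * HZ, ticks_per_minute);
        if (minute != due)
            continue;
        if (remember && minute == last_started)
            continue;           /* already served this minute */
        last_started = minute;
        starts++;
    }
    return starts;
}
```

With a minute stretched to 66*HZ ticks, the wake-ups at 66000 and 72000
ticks both read minute 10, so count_starts(10, 20, 66 * HZ, 0) yields two
starts, while minute 5 is seen only once. That would fit the observation
that the double run does not happen every time. Passing remember = 1
brings the count back to one.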

THE FINAL QUESTIONS:
1) Has anyone seen the described problem on machines where timed
   was running?
2) Is this scenario well known, and has it long been solved?
3) Does timed somehow inform programs like cron about time "changes"?
4) Does cron work that way? Or does it sleep continuously until
   there is some work to be done (this could work on AT&T SysV,
   since cron has all necessary tables loaded and could be
   signaled by crontab(1) or at(1) when someone makes any changes)?
   Or does cron(1M) remember the commands it already started?
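
The "sleep until there is work" alternative from question 4 could look
roughly like this in C. It is a sketch only; next_due_after() and
seconds_until() are hypothetical helpers, and a job is assumed to be due at
whole multiples of `period` seconds.

```c
#include <assert.h>
#include <time.h>

/* Next due time strictly after `now` for a job that is due at
 * every whole multiple of `period` seconds. */
static time_t next_due_after(time_t now, time_t period)
{
    assert(period > 0);
    return (now / period + 1) * period;
}

/* Seconds left until `next_due`, or 0 if that time has been reached.
 * The daemon would re-read the clock after every wake-up and call
 * this again, so a slowed-down clock only means one more short sleep
 * instead of a second start of the same job. */
static unsigned seconds_until(time_t now, time_t next_due)
{
    if (next_due <= now)
        return 0;
    return (unsigned)(next_due - now);
}
```

Because the next due time is always strictly later than the one just
served, the same due time cannot be reached twice as long as the clock
never steps backwards.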

Please mail me or post your replies to comp.unix.questions.

Thanx in advance
Joerg

(please excuse any mistakes; English is not my native language!)
Joerg F. Trinitis	Kaerntner Platz 4, 8000 Muenchen 21, West Germany
(rainbow at altger.alt.sub.org) BANG: ...uunet!mcvax!unido!altger!rainbow


