Is HDB locking safe?
Thomas Truscott
trt at rti.rti.org
Thu Aug 16 05:46:12 AEST 1990
> ... HDB assumes that if the pid recorded
> in the lock file no longer corresponds to an active process, the lock file is
> defunct and can safely be removed. I can't for the life of me figure out a
> safe way of doing this.
A crucial detail in recovering from a breakdown in the lock protocol
is avoiding a race between two or more processes that are simultaneously
attempting recovery. Usually a strategic pause is all that is needed,
and as you can see in the HDB code below there is just such a pause.
> static int
> checklock(lockfile)
> char *lockfile;
> {
>     ...
>     if ((lfd = open(lockfile, 0)) < 0)
>         return(0);
>     ...
>     if ((kill(lckpid, 0) == -1) && (errno == ESRCH)) {
>         /*
>          * If the kill was unsuccessful due to an ESRCH error,
>          * that means the process is no longer active and the
>          * lock file can be safely removed.
>          */
>         unlink(lockfile);
>         sleep(5); /* avoid a possible race */
>         return(1);
>     }
>
> In this code there is no guarantee that lfd and lockfile correspond to the
> same file at the time of the unlink.
But there *is* a guarantee -- the "sleep(5);"!!
[I changed the sleep() line above from HDB's "sleep(1);" to match
the one in 4.3 BSD uucp "ulockf.c"]
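Consider a process "X" that discovers that the locking
process has terminated. X unlinks the lockfile,
but then it *pauses* before it attempts to grab the lock for itself
(done by code not shown above). The pause gives any other process
that has already seen the dead pid time to finish its own unlink()
before a new lockfile can appear.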
Now consider scenario #1 for another process "Y":
At nearly the same instant Y discovers
the dead lock, so it also unlinks the lockfile
(of course only one unlink can succeed) and it *also pauses*.
Whichever of X and Y resumes first finds no lock present and
grabs it in the usual way (code not shown above); the other
finds a live lock and just has to wait.
Now consider scenario #2 for Y:
Just after X has unlinked the lockfile, Y calls checklock()
and discovers no lock is present. No problem, it just
attempts to grab the lock in the usual way (code not shown above).
When X awakes from its slumber it will discover that Y has
already grabbed the lock, so X will just have to wait.
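
To make the two scenarios concrete, here is a minimal sketch of the whole
sequence, including the "grab the lock" step that the quoted code does not
show. It is not the HDB or BSD source. The function names, the ASCII pid
format, and the use of open() with O_EXCL to create the lock are all
assumptions made for illustration.

```c
#include <errno.h>
#include <fcntl.h>
#include <signal.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: create the lock atomically and record our pid.
 * Returns 0 on success, -1 if the lock already exists (or on error). */
static int try_create_lock(const char *path)
{
    char buf[32];
    int fd = open(path, O_WRONLY | O_CREAT | O_EXCL, 0444);
    if (fd < 0)
        return -1;
    int n = snprintf(buf, sizeof buf, "%ld\n", (long)getpid());
    if (write(fd, buf, (size_t)n) != n) {
        close(fd);
        unlink(path);
        return -1;
    }
    close(fd);
    return 0;
}

/* Hypothetical helper: return the pid stored in the lock file,
 * or -1 if the file is missing, empty, or unparsable. */
static long read_lock_pid(const char *path)
{
    char buf[32];
    int fd = open(path, O_RDONLY);
    if (fd < 0)
        return -1;
    ssize_t n = read(fd, buf, sizeof buf - 1);
    close(fd);
    if (n <= 0)
        return -1;
    buf[n] = '\0';
    long pid = strtol(buf, NULL, 10);
    return pid > 0 ? pid : -1;
}

/* Returns 0 if we now hold the lock, -1 if the caller must wait and retry. */
static int acquire_lock(const char *path, unsigned pause_secs)
{
    if (try_create_lock(path) == 0)
        return 0;                      /* no lock was present */

    long pid = read_lock_pid(path);
    if (pid > 0 && kill((pid_t)pid, 0) == -1 && errno == ESRCH) {
        unlink(path);                  /* dead lock: remove it ...          */
        sleep(pause_secs);             /* ... and pause, as X and Y do      */
        return try_create_lock(path);  /* fails if someone else got there   */
    }
    return -1;                         /* live (or unreadable) lock: wait   */
}
```

A caller would loop on acquire_lock(path, 5) until it returns 0. In
scenario #2, Y succeeds at the first try_create_lock() while X is asleep,
and X's second try_create_lock() then fails, so X waits.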
The HDB code is nice, but does have flaws:
(a) A "sleep(1);" is not enough to avoid a race on a very busy system.
(b) Lock recovery is obscure, so the sleep() call should be commented.
(c) Protocol breakdown is a bad thing, and should be reported:
        logent(lockfile, "DEAD LOCK");
The 4.3 BSD ulockf.c routine has all of these nice features.
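
As an illustration of points (a) through (c), a recovery step with all
three changes might look roughly like the sketch below. This is not the
BSD ulockf.c source. The function name recover_dead_lock(), its
parameters, and the fprintf() call standing in for uucp's logent() are
assumptions made for the example.

```c
#include <errno.h>
#include <signal.h>
#include <stdio.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: if the process that wrote the lock is gone,
 * report it, remove the lock, and pause. Returns 1 if the lock was
 * removed, 0 if the owner still appears to be alive. */
static int recover_dead_lock(const char *lockfile, pid_t lckpid,
                             FILE *log, unsigned pause_secs)
{
    /* Guard against pid <= 0, which kill() treats as a process group. */
    if (lckpid > 0 && kill(lckpid, 0) == -1 && errno == ESRCH) {
        /* (c) A protocol breakdown is worth a log entry; fprintf()
         * stands in here for logent(lockfile, "DEAD LOCK"). */
        fprintf(log, "%s: DEAD LOCK (pid %ld)\n", lockfile, (long)lckpid);

        unlink(lockfile);

        /* (a), (b) Pause long enough that any other process which
         * also saw the dead pid finishes its own unlink() before
         * anyone creates a fresh lock file. */
        sleep(pause_secs);
        return 1;
    }
    return 0;
}
```

A caller would pass 5 for pause_secs to match the quoted sleep(5).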
Tom Truscott