lockf, NFS, and file locking issues
rkc at xn.ll.mit.edu
Thu May 2 02:58:13 AEST 1991
=This is a slight modification of a posting that has occurred elsewhere.
=It was suggested that I post these questions to these newsgroups.
I have written an application, similar to a network database application,
in which data is stored in an NFS-accessible file. To protect against
multiple simultaneous updates, I use the lockf subroutine to lock the
entire file. I have had numerous problems with the lockf routine "locking
up". The symptoms vary:
S1. The client dies and the server doesn't realize it. In order to
avoid processes being killed when they own the lock, I catch the
following signals:
signal( SIGHUP, clnp );
signal( SIGQUIT, clnp );
signal( SIGINT, clnp );
signal( SIGILL, clnp );
signal( SIGIOT, clnp );
signal( SIGEMT, clnp );
signal( SIGFPE, clnp );
signal( SIGBUS, clnp );
signal( SIGSEGV, clnp );
signal( SIGSYS, clnp );
signal( SIGTERM, clnp );
Should I catch more?
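For illustration, here is a minimal sketch of how such a cleanup handler might be installed with sigaction. The names lock_fd, cleanup_on_signal and install_cleanup_handlers are hypothetical (the body of clnp is not shown here), and the sketch assumes that closing the descriptor is enough to drop the lock. SIGPIPE and SIGALRM are added only as candidates, since their default action also terminates the process. SIGKILL and SIGSTOP cannot be caught, so no list of handlers can cover every way a lock holder might die.

```c
#define _XOPEN_SOURCE 700
#include <signal.h>
#include <string.h>
#include <unistd.h>

/* Hypothetical global: descriptor of the locked data file, -1 if none. */
static volatile sig_atomic_t lock_fd = -1;

/* Cleanup in the spirit of clnp. Closing the descriptor releases any
 * lockf/fcntl lock this process holds on the file. Only
 * async-signal-safe calls (close, _exit) are used here. */
static void cleanup_on_signal(int signo)
{
    if (lock_fd >= 0)
        close((int)lock_fd);
    _exit(128 + signo);
}

/* Install the handler for each signal in the list.
 * Returns 0 on success, -1 if any sigaction call fails. */
int install_cleanup_handlers(void)
{
    static const int signals[] = {
        SIGHUP, SIGQUIT, SIGINT, SIGILL, SIGABRT, SIGFPE,
        SIGBUS, SIGSEGV, SIGTERM,
        SIGPIPE, SIGALRM        /* assumption: extra candidates */
#ifdef SIGSYS
        , SIGSYS
#endif
#ifdef SIGEMT
        , SIGEMT                /* not defined on every system */
#endif
    };
    struct sigaction sa;
    size_t i;

    memset(&sa, 0, sizeof sa);
    sa.sa_handler = cleanup_on_signal;
    sigemptyset(&sa.sa_mask);

    for (i = 0; i < sizeof signals / sizeof signals[0]; i++)
        if (sigaction(signals[i], &sa, NULL) == -1)
            return -1;
    return 0;
}
```

The program would set lock_fd right after a successful lock and reset it to -1 after unlocking, so the handler only closes a descriptor that is actually in use.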
FYI, here's what the lock code looks like:
    for (NumAttempts = 0; NumAttempts <= NUMPOLLS; NumAttempts++) {
        if (lockf(fd, F_TLOCK, 0L) != -1) {
            success = TRUE;
            break;
        }
        sleep(2);
    }
I avoid the indefinite-wait lock (F_LOCK) because it appears to increase the
probability that an error will occur.
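As a variation on the polling approach, the sketch below separates "the file is locked by someone else" from other lockf failures, so that an error such as ENOLCK is reported instead of being retried silently. The function name poll_for_lock and its return convention are made up for this example; it is a sketch of the idea, not the code the application actually runs.

```c
#define _XOPEN_SOURCE 700
#include <errno.h>
#include <unistd.h>

/* Hypothetical helper: poll for a whole-file lock.
 * Returns 0 once the lock is held, 1 if the file is still locked by
 * another process after max_attempts tries, and -1 on any other
 * error (errno is left set by lockf). */
int poll_for_lock(int fd, int max_attempts, unsigned delay_secs)
{
    int attempt;

    for (attempt = 0; attempt < max_attempts; attempt++) {
        if (lockf(fd, F_TLOCK, 0L) == 0)
            return 0;                   /* lock acquired */
        if (errno != EAGAIN && errno != EACCES)
            return -1;                  /* not contention: give up */
        if (attempt + 1 < max_attempts)
            sleep(delay_secs);          /* held elsewhere: wait, retry */
    }
    return 1;                           /* still busy */
}
```

A caller could log errno when the result is -1, which may help tell lock daemon trouble apart from ordinary contention.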
S2. Sometimes the client doesn't die; it just hangs. Attaching a debugger
to the hung program shows that it is stuck inside fcntl.
S3. Occasionally, I get messages like
unknown klm_reply proc(0)
unknown klm_reply proc(40)
Does anyone have any idea where these come from?
Other questions include:
1. Is there any known way to unconfuse our machines and reset
state without rebooting the things? Killing statd and lockd is not always
sufficient.
2. I was once told that Sun released patches to their lock daemon, but
no one could direct me to them. Does a wizard know where such things exist?
3. If lockf cannot be made to work, would I be at risk using the old
technique of creating a "lock directory"? I've read that with NFS this won't
work, but I've never read a good explanation of the problems with this approach.
Are there other workarounds (semaphores, etc.) that I should try?
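To make the "lock directory" technique concrete, here is a minimal sketch of what is meant. The names lockdir_acquire and lockdir_release are hypothetical. The sketch relies on mkdir failing with EEXIST when the directory already exists; whether that test-and-create step stays reliable across NFS clients is exactly the open question above, so this shows the idea rather than a verified fix.

```c
#define _XOPEN_SOURCE 700
#include <errno.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/* Try to take the lock by creating the directory.
 * Returns 0 if we created it (lock taken), 1 if it already exists
 * (lock held by someone else), -1 on any other error. */
int lockdir_acquire(const char *path)
{
    if (mkdir(path, 0700) == 0)
        return 0;
    return (errno == EEXIST) ? 1 : -1;
}

/* Release the lock by removing the directory.
 * Returns 0 on success, -1 on error (errno set by rmdir). */
int lockdir_release(const char *path)
{
    return rmdir(path);
}
```

Unlike lockf, nothing removes the directory if the holder dies, so a crashed client leaves a stale lock that has to be detected and cleared by some other means.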
I would prefer to get this to work properly using lockf, since this seems to
be exactly what lockf is designed for.
Our network consists of SPARCstation 1+ and IPC machines running either 4.0.1,
4.1 or 4.1.1, and Sun3s running 4.0.3. Currently the client is on one of the Sun3s.
In the near future we will also be using DG's aviion/UX workstations.
Thanks for any help you can provide,
-Rob
More information about the Comp.unix.internals
mailing list