file locking issues, NFS, lockf
rkc at xn.ll.mit.edu
Wed May 1 05:21:17 AEST 1991
=This is a slight modification of a posting that appeared in comp.sys.sun.
=I received only a few answers, which seemed to open as many questions as they
=answered. I now call upon the Unix wizards to help me out.
I have written an application, similar to a network database, in which the
data is stored in an NFS-accessible file. To protect against multiple
simultaneous updates, I use the lockf subroutine to lock the entire file.
I have had numerous problems with the lockf routine "locking up". The
symptoms vary:
S1. The client dies and the server doesn't realize it. To keep a
process from being killed while it owns the lock, I catch the
following signals:
    signal( SIGHUP, clnp );
    signal( SIGQUIT, clnp );
    signal( SIGINT, clnp );
    signal( SIGILL, clnp );
    signal( SIGIOT, clnp );
    signal( SIGEMT, clnp );
    signal( SIGFPE, clnp );
    signal( SIGBUS, clnp );
    signal( SIGSEGV, clnp );
    signal( SIGSYS, clnp );
    signal( SIGTERM, clnp );
Should I catch more?
FYI, here's what the lock code looks like:
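    for (NumAttempts = 0; NumAttempts <= NUMPOLLS; NumAttempts++) {
        /* non-blocking attempt; a length of 0 locks through end of file */
        if (lockf(fd, F_TLOCK, 0L) != -1) {
            success = TRUE;
            break;
        }
        sleep(2);   /* attempt failed; wait before polling again */
    }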
I avoid the indefinite-wait lock (F_LOCK) because it appears to increase the
probability that an error will occur.
S2. Sometimes the client doesn't die; it just hangs. Attaching a debugger
to the hung program shows that it is stuck inside fcntl.
S3. Occasionally, I get messages like
unknown klm_reply proc(0)
unknown klm_reply proc(40)
Does anyone have any idea where these come from?
Other questions include:
1. Is there any known way to unconfuse our machines and reset the lock
state without rebooting them? Killing statd and lockd is not
sufficient.
2. I was once told that Sun released patches to their lock daemon, but
no one could direct me to them. Does a wizard know where such things exist?
3. If lockf cannot be made to work, would I be at risk using the old
technique of creating a "lock directory"? I've read that this won't work
with NFS, but I've never read a good explanation of the problems with this
approach. Are there other workarounds (semaphores, etc.) that I should try?
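To make the "lock directory" technique concrete, here is a minimal sketch
of it. The names dirlock_acquire and dirlock_release are hypothetical. The
idea is that mkdir either creates the directory or fails with EEXIST, so
whoever creates it holds the lock. Whether that guarantee survives NFS is
exactly the open question, and the sketch does not settle it.

```c
#include <errno.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Try to take the lock by creating the directory.
 * Returns 0 if we created it (lock acquired), 1 if it already exists
 * (someone else holds the lock), -1 on any other error.
 */
static int dirlock_acquire(const char *lockdir)
{
    if (mkdir(lockdir, 0700) == 0)
        return 0;
    return (errno == EEXIST) ? 1 : -1;
}

/* Release the lock by removing the directory; returns 0 on success. */
static int dirlock_release(const char *lockdir)
{
    return rmdir(lockdir);
}
```

Unlike a lockf lock, a lock directory is not cleaned up when its owner
dies, so a crashed client leaves a stale lock that has to be removed by
some other means.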
I would prefer to get this to work properly using lockf, since this seems to
be exactly what lockf is designed for.
Our network consists of SPARCstation 1+ and IPC machines running SunOS 4.0.1,
4.1, or 4.1.1, and Sun-3s running 4.0.3. In the near future we will also be
using DG AViiON (DG/UX) workstations.
Thanks for any help you can provide,
-Rob