Mail and file locking in distributed disk systems
Dale Worley
worley at compass.com
Wed Nov 8 07:02:40 AEST 1989

We have been having some problems with using sendmail in a network of
diskless Sun workstations that all mount the same filesystems via NFS from
a set of file servers. Our fundamental desire is to have all of the
workstations see the same file system, and particularly, to see the same
set of mailboxes. As part of this philosophy, we have centralized system
management and have enabled superuser access over the network. (Of
course, each workstation has the same system of user names, implemented
via Yellow Pages.)

In the past we have had all the workstations mount one /usr/spool/mail
directory, and each workstation's sendmail (and any mail readers) write
and read the mailboxes from that directory. In practice, we haven't had
any trouble. However, people have warned me that in such a case the
sendmails don't interlock each other correctly, and sometimes lose mail.
A torture test (46 sendmails on 46 workstations trying to send ten
messages to the same mailbox simultaneously) showed that while they
usually interlock correctly, they don't always. (In fact, they work
better than would be expected if they didn't interlock each other at all
-- why is this?) Trying to figure out the best way to solve this has
raised a number of questions, which I'm trying to find answers to. Please
mail replies to worley at compass.com, and I will compose a summary if people
are interested.
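
A single-host sketch of a torture test in the spirit of the one described
above may make the setup concrete. It assumes flock() is the lock in use (as
question 2 below suggests); the names torture() and deliver() and the
one-line message format are hypothetical, not anything from sendmail. Because
every writer runs on one machine, it only exercises the case where locking is
expected to work; the NFS failure would need the writers on separate hosts.

```c
#include <assert.h>
#include <fcntl.h>
#include <stdio.h>
#include <sys/file.h>
#include <sys/wait.h>
#include <unistd.h>

/* Child body (hypothetical): append nmsg one-line messages, each written
 * while holding an exclusive flock() on the mailbox. Never returns. */
static void deliver(const char *path, int id, int nmsg)
{
    char line[64];
    int i, fd, len;

    for (i = 0; i < nmsg; i++) {
        fd = open(path, O_WRONLY | O_APPEND | O_CREAT, 0600);
        if (fd < 0 || flock(fd, LOCK_EX) != 0)
            _exit(1);
        len = snprintf(line, sizeof line, "msg from %d #%d\n", id, i);
        if (write(fd, line, (size_t)len) != len)
            _exit(1);
        flock(fd, LOCK_UN);
        close(fd);
    }
    _exit(0);
}

/* Hypothetical harness: start nproc writers against one mailbox, wait for
 * them all, and return the number of lines that arrived (-1 on error). */
long torture(const char *path, int nproc, int nmsg)
{
    FILE *fp;
    long lines = 0;
    int i, c, status, failed = 0;

    assert(path != NULL && nproc > 0 && nmsg > 0);
    unlink(path);                       /* start from an empty mailbox */

    for (i = 0; i < nproc; i++) {
        pid_t pid = fork();
        if (pid == 0)
            deliver(path, i, nmsg);     /* child: does not return */
        if (pid < 0) {
            failed = 1;
            break;
        }
    }
    while (wait(&status) > 0)           /* reap every child */
        if (!WIFEXITED(status) || WEXITSTATUS(status) != 0)
            failed = 1;
    if (failed)
        return -1;

    fp = fopen(path, "r");
    if (fp == NULL)
        return -1;
    while ((c = getc(fp)) != EOF)
        if (c == '\n')
            lines++;
    fclose(fp);
    return lines;
}
```

Comparing the return value with nproc * nmsg shows whether any message was
lost; a shortfall would indicate writers overwriting each other.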

1. One solution to the problem is to set the "OR" option in sendmail.cf.
This causes each sendmail to SMTP messages to the sendmail on the disk
server which provides the /usr/spool/mail directory. Sendmails on this
one machine will interlock each other correctly, so it eliminates the
simultaneous-access problem. Unfortunately, in SunOS 4.0.3, OR is buggy
-- (1) it causes the client sendmails to send *all* mail to the sendmail
on the mailbox server, not just mail to be delivered locally, and (2) it
makes sendmail unable to figure out the sending user name when its
stdin/etc. are pipes.

2. It appears that sendmail uses the flock() file locking mechanism rather
than the lockf() mechanism. According to the manual pages, flock() only
locks the file on a particular CPU -- it does not interlock across the
network. On the other hand, lockf() appears to be a variant of the
fcntl() locking mechanism, which does work across the network, using the
services of the lockd locking daemon. Why does sendmail use flock(),
rather than lockf()? How much work would it be to convert sendmail (and
all the mail readers) to use lockf()?

3. I have read somewhere that sendmail delivers mail into mailboxes by
exec-ing /bin/mail. If this is so, then the "local" mailer entry in
sendmail.cf could be redirected to use a different program, so that the
conversion of sendmail to use lockf() could be done without modifying
sendmail itself. Is this really true?

4. Why are there two entirely independent locking mechanisms, and why does
only one work over the network? This seems to be a very strange "feature"
of NFS.
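
To make questions 2 and 3 concrete, here is a minimal sketch of what a
replacement delivery routine might look like if it took an fcntl() lock (the
mechanism lockf() appears to be built on) instead of flock(). The name
deliver_stream() is hypothetical, and the sketch assumes the message arrives
on a stdio stream and is simply appended to the mailbox; it is not how
/bin/mail or sendmail actually does it.

```c
#include <assert.h>
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

/* Hypothetical helper: copy one message from `in` onto the end of the
 * mailbox `mbox`, holding an fcntl() write lock on the whole file for the
 * duration. Returns 0 on success, -1 on any error. */
int deliver_stream(FILE *in, const char *mbox)
{
    struct flock fl;
    char buf[4096];
    size_t n;
    int fd, rc = 0;

    assert(in != NULL && mbox != NULL);

    fd = open(mbox, O_WRONLY | O_APPEND | O_CREAT, 0600);
    if (fd < 0)
        return -1;

    memset(&fl, 0, sizeof fl);
    fl.l_type = F_WRLCK;        /* exclusive (write) lock */
    fl.l_whence = SEEK_SET;
    fl.l_start = 0;
    fl.l_len = 0;               /* 0 = through end of file, as it grows */

    if (fcntl(fd, F_SETLKW, &fl) != 0) {    /* block until granted */
        close(fd);
        return -1;
    }

    while ((n = fread(buf, 1, sizeof buf, in)) > 0) {
        if (write(fd, buf, n) != (ssize_t)n) {
            rc = -1;
            break;
        }
    }
    if (ferror(in))
        rc = -1;

    fl.l_type = F_UNLCK;        /* release before closing */
    fcntl(fd, F_SETLK, &fl);
    close(fd);
    return rc;
}
```

A small main() that calls deliver_stream(stdin, path) would turn this into a
standalone program of the kind question 3 has in mind. Mail readers would
have to take the same kind of lock for the interlock to mean anything.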

Dale Worley Compass, Inc. worley at compass.com