Undocumented features
Guy Harris
guy at rlgvax.UUCP
Tue May 29 05:35:10 AEST 1984
> When I saw the note about the file sync'ing undocumented feature, I thought
> "Great! The people we have working on databases may have seen this, but if
> they haven't, I'll pass the note on to them." Their replies:
> > Person A:
> >
> > Person B will correct me if I'm wrong, but I believe this is a bit we had
> > already heard about. It would be quite useful, except for one minor
> > problem - it was put in for kernel use, and while it writes the data block
> > synchronously, it does NOT write the inode before returning to the user. It
> > was too bad; we thought we had found something useful before Person B spent
> > a few hours on the phone to [the USG UNIX people].
> >
> > Person B:
> >
> > Person A is correct about the utility (or lack thereof) of this feature.
> > Thanks for the information though.
> So BEWARE if you use this "undocumented feature"! There's a reason for it
> being undocumented!
This is correct. The problem is that "writei" calls "bwrite" instead of
"bdwrite" if the FSYNC flag is set in the file descriptor, *but* that's
not enough. If the B_ASYNC (asynchronous write - "bawrite") or B_DELWRI
(delayed write - "bdwrite") flag is already set in the buffer, the write
will be treated as an asynchronous or delayed write. For this to work,
you'd have to clear both those flags in the buffer before "bwrite"ing it.
The FSYNC bit is used only when writing superblocks in "update", and directory
entries in "unlink" (to make sure the directory entry is reamed out before
the inode it refers to is); since 1) you can't open a directory for writing
and 2) you obviously aren't going to "fsck" a cooked device corresponding
to a mounted file system, presumably that block of the file system will only
be written with a "bwrite". Unfortunately, it ain't so; a "link" system
call calls "wdir" which does a "bdwrite". This isn't a problem for "link",
as the S5 "link" code makes sure the inode is written to disk before writing
the directory entry, but could surprise a later "unlink" if that block
remains unwritten in the cache with B_DELWRI on. 4.2BSD (and, I believe,
4.1BSD, whose file system is the same V7 file system as S3 and S5 use)
flatly says "ALL WRITES TO DIRECTORY FILES WILL BE SYNCHRONOUS. PERIOD."
As such, I'd vote for S5 turning off any B_ASYNC or B_DELWRI bits if the
descriptor has the FSYNC flag set, and then making the FSYNC flag a documented
and official bit with an O_FSYNC flag for "open" and "fcntl". The only side
effect might be occasional performance degradation on directory I/O (less
overlap), but in return we'd gain more file system integrity *and* the ability to provide database
integrity. A pretty good tradeoff, in my opinion. Besides, any such overlap
due to a directory being (mistakenly) written with a "bdwrite" is an accident
anyway.
Guy Harris
{seismo,ihnp4,allegra}!rlgvax!guy
More information about the Comp.unix.wizards mailing list