Serious 4.2 kernel bug causes files and directories to be mangled

Tue Dec 13 12:31:28 AEST 1983

From:  Jeff Mogul <mogul at navajo>

Index: kernel 4.2BSD

Description:
	Symptom: files get trashed, with text mysteriously
	appearing at random places in random files.  Sometimes,
	directories become corrupted (by the way, fsck core dumps
	if you ask it to try to fix these.)

	One process is writing output to a file (usually, to a terminal
	or pseudo-terminal).  Somewhere on the system, another process
	opens a file (any file).  The output from the first process ends
	up in the file opened by the second process.
	
	This bug is so obviously similar to the 4.1BSD mx2.c mxclose()
	bug that it took me only a few minutes to find the superficial
	cause: iput() is being called with an inode whose i_count is already
	less than 1.  Thus, at some later point the inode shows up both
	allocated and free, and gets allocated a second time.  The
	disaster is inevitable once this occurs.
	
	I installed a bugcheck in iput() and quickly found that the
	real problem is that closef() (in sys/kern_descrip.c) is
	being called with a file argument whose f_count field is
	already zero.  Since nobody checks that this count is positive,
	the inode referenced by f_data is passed to iput() for a second
	time.
	
	By installing another bugcheck in closef() I verified this
	hypothesis.  However, after a week of placing debugging code
	into parts of the kernel (and reading a lot of it) I have not
	been able to find where the reference count is going wrong.
	Two possible theories are:
	
	(1)	Something is copying a struct file * but is not bumping
	the reference count
	(2)	There's a race going on somewhere.
	
	Theory #2 had some initial support from a possible bug between
	closef() and ino_close().  Both set f_count to zero, and ino_close
	may block (if I've understood things) between the point where
	it zeros the count and the point where closef() zeros the count.
	If another process allocates the (now apparently free) file
	in the interim, the potential for disaster is obvious.  However,
	in spite of a few traps I set this race is not leaving the traces
	I would have expected.
	
	Even so, this looks like it might be a bug in certain circumstances.
	the line in ino_close() that sets the count to 0 is crypticly
	commented "/* XXX Should catch */", which I interpret to be
	an indication of trouble.  The reorganization of the code between
	4.1BSD and 4.2BSD is significant; this particular race could not
	occur in the 4.1BSD code.
	
	One other "fact":  when closef() is called with f_count == 0,
	f_data->i_number is apparently always that of /dev/ttyp?
	Since we always use telnet (rarely, rlogin) because the system
	has no serial lines, these are the only "slow" output devices
	on the system.
	

Repeat-by:
	First, install the two panic()s given in the next section,
	or your files will be trashed.
	
	Then, run the system and login via telnet connections.  Our
	750 with a user community of about 8 (a development machine)
	crashes about twice a day.  I can't repeat it any more
	reliably than that.

Fix:
	To prevent files from being trashed,

	in sys/kern_descrip.c, in closef(), just before the line
	
	(*fp->f_ops->fo_close)(fp);
	
	insert:
	
	if (fp->f_count < 1)
		panic("closef: starting count < 1");

	and, to be safe (since a related bug has already popped up
	once before) in sys/ufs_inode.c, in iput() before the first line,
	insert:
	
	if (ip->i_count < 1)
		panic("iput: starting count < 1");

	This WON'T solve the problem; it will simply keep your files
	intact.