System V/AT vanishing inode problem fixed
Bob Thrush
rd at tarpit.uucp
Mon Jul 17 12:00:22 AEST 1989
I am finally posting the fix to the dreaded "vanishing inode
problem" for V/AT. The problem, as many have already indicated, is
due to an incorrect implementation of the inode caching algorithm in
the kernel ialloc and ifree functions. Rather than trying to patch
the problem, the fix replaces alloc.o in lib1. I expected to
have posted this much earlier; however, it seems that many things
conspired to hold it up. Anyway, with the advent of C News, this fix
is even more appropriate. This posting is followed by the
instructions for installing the fix.
Several people have tested the attached patch and have reported
success. I have been using it since March 28 and it has
been weathering a full incoming news feed plus a few full/partial
outgoing feeds. I have not had a problem related to suddenly
vanishing inodes. Thanks to Mike Murphy and John Limpert for
independently testing the fix. (There were others whose names I
have misplaced. Sigh ;-})
I had gotten a very good description of the problem courtesy of
wayne at teemc. During several long nights, I managed to decompile
the errant alloc.o kernel module, add the few changes that were
mentioned in the attached posting, and remake the system.
On Nov. 24, 1987, Mayer Ilovitz posted a complete description of
the problem and a simple test to illustrate it. This helped me
understand the problem, fix it and exercise it. I have attached a
shar of that posting for those who wish to delve deeper;
otherwise hit 'n' now.
#! /bin/sh
# This is a shell archive. Remove anything before this line, then unpack
# it by saving it into a file and typing "sh file". To overwrite existing
# files, type "sh file -c". You can also feed this as standard input via
# unshar, or by typing "sh <file", e.g.. If this archive is complete, you
# will see the following message at the end:
# "End of shell archive."
# Contents: inode.notes
# Wrapped by rd at support on Sun Jul 16 20:01:48 1989
PATH=/bin:/usr/bin:/usr/ucb ; export PATH
if test -f 'inode.notes' -a "${1}" != "-c" ; then
echo shar: Will not clobber existing file \"'inode.notes'\"
else
echo shar: Extracting \"'inode.notes'\" \(14555 characters\)
sed "s/^X//" >'inode.notes' <<'END_OF_FILE'
XReplied: Thu, 03 Mar 88 02:04:40 EST
XReplied: sharkey!mailrus!rutgers!ksuvax1.cis.ksu.edu!scott (Scott Hammond)
XForwarded: Thu, 03 Mar 88 02:02:17 EST
XForwarded: Jeff Lewis <pur-ee!lewie>
XReturn-Path: uucp
X>From mailrus!rutgers!ksuvax1.cis.ksu.edu!/dev/null Wed Mar 1 12:17:45 1989 remote from sharkey
XReceived: by mailgw.cc.umich.edu (5.59/1.0)
X id AA10725; Wed, 1 Mar 89 12:17:45 EST
XReceived: by mailrus.cc.umich.edu (5.59/1.0)
X id AA04178; Wed, 1 Mar 89 12:22:13 EST
XReceived: from ksuvax1.UUCP by rutgers.edu (5.59/SMI4.0/RU1.1/3.03) with UUCP
X id AA07118; Wed, 1 Mar 89 12:05:02 EST
XReceived: by ksuvax1.cis.ksu.edu
X (5.59++/CIS1.1) id AA05544; Wed, 1 Mar 89 11:04:53 CST
XDate: Wed, 1 Mar 89 09:02:24 PST
XFrom: sharkey!mailrus!rutgers!ksuvax1.cis.ksu.edu!scott (Scott Hammond)
XMessage-Id: <8903011702.AA07131 at rdahp.UUCP>
XTo: mailrus!sharkey!teemc!wayne, mailrus!sharkey!teemc!wayne
XSubject: Re: Dreaded System V inode problem
XNewsgroups: comp.bugs.sys5
XIn-Reply-To: <39256 at teemc.UUCP>
XOrganization: R & D Associates, Marina del Rey, CA
XCc:
X
XIn article <39256 at teemc.UUCP> you write:
X>
X> Several months ago, there was a discussion here about the way
X>System V eats inodes under certain circumstances (notably on the news
X>partition). Could someone who has corrected this please mail me a complete
X>description that I could forward to my vendor? I saved what I believed to
X>be pertinent articles at the time but they appear insufficient for him to
X>locate and correct the problem.
X>
X
XThis is the article I saved describing the problem. I have System V
Xsource code, and this explanation was sufficient for me to be able to
Xfind the bug myself. Unfortunately I haven't been able to 'fix' the
Xproblem because our source does not correspond to the release running
Xon our news machine, nor does it match the hardware.
X
X[I've been having some trouble with my main mail relay, so I'm sending
X this two ways. My apologies if you get two copies. Also beware the
X reply address.]
X--
XScott Hammond
XR & D Associates, Marina del Rey, CA
XEmail: rdahp!scott, rdahp!scott at sm.unisys.com, scott at harris.cis.ksu.edu
X--
XArticle 1730 of news.admin:
XPath: ksuvax1!uxc!tank!oddjob!uwvax!rutgers!njin!princeton!njsmu!mccc!pjh
X>From: pjh at mccc.UUCP (Pete Holsberg)
XNewsgroups: news.admin
XSubject: Re: Alzheimer's Syndrome
XKeywords: LONG!
XMessage-ID: <176 at mccc.UUCP>
XDate: 26 Sep 88 16:35:54 GMT
XReferences: <576 at mbph.UUCP> <406 at amyerg.UUCP> <10611 at ulysses.homer.nj.att.com> <12608 at ncoast.UUCP>
XReply-To: pjh at mccc.UUCP (Pete Holsberg)
XOrganization: The College On The Other Side of Route 1
XLines: 251
X
XHere's what I found in my archives about the inode problem:
X
XPath: mccc!princeton!rutgers!cmcl2!phri!cooper!mayer
X>From: mayer at cooper.cooper.EDU (Mayer Ilovitz )
XNewsgroups: comp.sys.att,comp.unix.wizards
XSubject: Analysis & test for 3b inode problem: applies to ALL users of SYSTEM V
XKeywords: 3b, SYSTEM V, inodes
XMessage-ID: <1133 at cooper.cooper.EDU>
XDate: 24 Nov 87 20:06:46 GMT
XOrganization: The Cooper Union (NY, NY)
XLines: 233
X
X
X Since I haven't seen anyone post a full description of the problem or
Xa test for it, here is my contribution.
X
X This document contains what I believe is a complete analysis of
Xthe System V inode allocation system and the problem that everyone is having.
XI have included a test procedure which should detect the problem on a UNIX
Xsystem and included a program that will help you perform the tests. Also I
Xhave some suggestions on properly fixing the bug.
X
X To begin with, let me describe what I have available to me.
XWe have a number of pretty standard Unix-PCs running System V 3.5 and
XSystem V 3.0 . We have a pair of 3b2-400 s running System V 3.0 Version 2. These
Xmachines each have a floppy diskette system. We also have an OLD 3b5 running
XSystem V Release 2 Version 2. I have access to the source for the Unix on the
X3b5 and the 3b2 systems. Our 3b5 runs our newsfeed using the rnews package.
XThis system has suffered the inode problems that everyone has been mentioning
Xon the net for the last few weeks. Since this system has no expendable files
Xsystems, I ran the tests on the 3b2 and the Unix-PC . Both of these systems
Xshowed the same error. From this, I suspect that all versions of ATT System V
Xunix have the problem. Furthermore, this problem may very well be in any
XATT System-V compatible version of Unix and may well have been present in
XSystem-III Unix. I therefore suggest, just to be on the safe side, that you
Xrun the test described below.
X
X The analysis and test was based on the source from the 3b5. A cursory
Xexamination of the source to the 3b2 showed the code to be essentially the
Xsame in the critical area though there are what appear to be minor cosmetic
Xchanges. For those of you with access to the souces, The file that needs
Xchanging is called alloc.c or s5alloc.c . If you don't have a file by this
Xname, look for a file that closely matches one of these names. The function
Xthat is causing the problem is called s5ialloc() or ialloc() .
X
X As far as I can tell ialloc and ifree are the low level inode
Xallocation control system. When an inode is needed, a call to ialloc() is made.
XWhen a file/directory is deleted, ifree() is used to release the inode.
XThese 2 functions use certain parameters that are kept in the superblock of
Xevery file system. tinode is the total number of free inodes in a file system.
XTo speed up inode allocation and freeing, the superblock maintains a table of
Xfree inodes. This table is called inode[]. The size of this table is given
Xby the #defined value NICINOD and is usually 100. ninode specifies the number
Xof free inodes available in inode[].
X
X When ifree() releases an inode, it first checks to see
Xif the inode table is full. If it isn't, the inode is added to the top of the
Xtable and ninode is adjusted. If the table is full and the inode being released
Xis less than the inode stored in inode[0], the newly released inode is put into
Xinode[0]. In this way, the allocation system knows where in the i-list a group
Xof free inodes are likely to be.
X
X When ialloc() is called, it tries to give the requesting process an
Xinode from inode[]. If none are available, ialloc() searches the i-list for
Xmore free inodes to reload inode[]. ialloc() will start this search begining
Xat the location of the last allocated inode as indicated by inode[] and
Xninode. The search continues untill NICINOD inodes are located or the end of
Xthe i-list is reached. inode[] will be reloaded from the top of the table
Xworking down to inode[0]. A mark is put in inode[] if less than 100 nodes were
Xfound. The next time inode[] runs out of nodes, this mark tells it to search
Xthe i-list from the very begining. If NO inodes were found during the search,
Xninode is SET TO 0 and the out of inodes error is printed on the system console.
X
X The problem that everyone is having is caused by the following
Xsituation. At the last reloading of inode[] exactly NICINOD inodes were found.
XTherefore, the inode at inode[0] is where the next search for inodes will begin.
XAs the system runs, more inodes are allocated and freed. Eventually, the last
Xfree inode in inode[] is allocated. The system waits until the next call to
Xialloc to determine if it needs to reload inode[]. If a node is released before
Xthe inode table is reloaded, the freed inode will go into inode[0], replacing
Xthe old value which would be used for searching the i-list. If the freed inode
Xwas higher in the i-list than the one it replaced in the table, ialloc will no
Xlonger know that it should check the lower portion of the i-list for free
Xinodes. It will think that everything below inode[0] is allocated already.
XIf a significant number of lower valued inodes are not freed before ialloc
Xhas to reload the inode table, ialloc will fail to find any free inodes even
Xthough they exist. Furthermore, because of the coding of ialloc(), unless an
Xinode is freed at some point, every time it tries looking for more inodes, it
Xwill start at the same place. So until the file system is dismounted and fsck'd,
Xunless some inodes are freed, the system will be stuck repeating the same search
Xand reporting the same failure.
X
X The original intent of the ialloc() - ifree() system is to minimize
Xthe time to find more free nodes by remembering the best location to start
Xsearching for more free inodes. Therefore, the best fix to ialloc would be
Xto first try to give the requesting process a free node. ialloc() should
Xthen IMMEDIATELY check to see if that was the last free inode it had, and if
Xit was, try reloading the inode table right then. This will prevent the
Xpossibility of the system from forgetting about the best place to search for
Xinodes. A side result of this is that the out of inodes message will appear
Xwhen the last free inode is allocated and not when ialloc failed to give
Xan inode. An argument could be made either way as to wether this side effect
Xis good or not. The other fix is to put a kludge into ialloc that, in the
Xevent that NO free inodes were found, it would immediately recycle through
Xthe i-list from the very beginning looking for inodes before deciding that there
Xare no free inodes left. If the i-list is large, this can be somewhat
Xinefficient.
X
X
X PROCEDURE TO TEST FOR THE 3B INODE ALLOCATION BUG
X
X
X This test is intended to be run on a floppy-based file system or an
Xexpendable file system. It is assumed that NICINOD, the number of inodes that
Xare stored in the superblock inode table is 100. If not, the test will have
Xto be adjusted accordingly.
X
X 1. create a file system with ~ 280 inodes using mkfs
X fsck the disk and mount it /mnt
X
X 2. verify with a df -t as to the number of free inodes and the total
X number of inodes in this file system.
X
X 3. allocate all the inodes on this filesystem. You can use the program
X fillnode given at the bottom of this document to help you do
X the job. The final result is that there should be 0 inodes left.
X Each file that you made on this disk should be named after its
X respective inode.
X
X 4. unmount the filesystem, do an fsck of the disk, remount and
X verify with a df -t that there are no free inodes.
X
X 5. free up the files with inodes 3-202. This will give you 200 free
X inodes on the filesystem. Verify this using step 4.
X
X 6. at this point, the file system will be mounted and the superblock
X inode table will contain inodes 3-102 for immediate allocation.
X
X 7. use fillnode to reallocate inodes 3-102. at this point you will have
X 100 free inodes when you do a df. This is the correct number of
X free nodes. At this point the superblock inode table will be
X empty.
X
X 8. use fillnode to allocate 1 inode. the inode that will be allocated is
X inode # 103. At this point the superblock inode table will have
X been reloaded from the i-list. the 0 element in the table will
X be inode 202 and the 99th element will be inode 103, which you
X just allocated.
X
X 9. Delete in order the files with inodes 30-39. At this point, the 0
X element in the inode table will be inode 31 while the 99th
X element will be inode 30. When you released inode 30, the
X table was not full, so it was put onto the top of the table.
X When inode 31 was released, the table was full so ifree checked
X to see if the just freed inode was less than the inode in the
X 0th element of the table. Since the 0th element up to this time
X was 202, ifree replaced the 0th element with inode 31. Note,
X The inode table is now full, containing 100 free inodes, the
X lowest free inode in the entire i-list being in the 0th element
X of the table. As you release inodes 32-39, they will fail the
X test by ifree, the result being that these inodes ARE free but
X simply aren't in the inode table. This is alright since when
X ialloc must reload its inode table it will start looking with
X the inode referenced in the 0th element of the table.
X
X 10. allocate another 100 inodes. fillnode will allocate in order
X inodes 30,104-201 and inode 31. At this point the superblock
X inode table is empty again. However, as always, ialloc will
X leave the table empty until it must allocate an inode and finds
X no inodes in the table.
X
X 11. free inode 240. At this point you have sealed your doom ! .
X ifree will put this inode into the lowest available entry in the inode table, DESTROYING ANY MEMORY THAT THE LOWEST FREE INODE IS
X AROUND INODE 31.
X
X 12. Do a df -t to confirm that you still have ~ 10 free inodes.
X
X 13. allocate an inode. This inode will be inode 240.
X
X 14. Do a df -t to confirm that you now have ~ 9 free inodes.
X
X 15. Call fillnode again and say goodbye to your free inodes!
X At this point you will get an out of inodes error on your
X console and the allocation attempt will return failure. A df -t
X say that there are NO free inodes. What happened was that after
X step 13 there were no free nodes in the superblock inode table.
X At this point, ialloc went searching through the i-list for
X more free inodes starting at the inode specified in the 0th
X element of the inode table. BUT this no longer references inode
X 31, where we know there is more space, but inode 240. ialloc
X searches from inode 240 to the end of the i-list, but all those
X inodes are allocated, so ialloc decides that there are no more
X free inodes and reports the out of inodes error,EVEN though
X you still have free inodes!.
X
X 16. unmount the filesystem. Do an fsck. This will report a bad inode
X count in the Superblock ( Sound familiar ) which you must
X fix. Remount and do an df -t to confirm that you really do
X still have a number of free inodes.
X
X IF THE SITUATIONS DESCRIBED IN THIS TEST HAPPEN TO YOU
X
X AND YOU ARE HAVING PROBLEMS BECAUSE OF THIS BUG
X
X CONTACT YOUR ATT CUSTOMER/TECH SUPPORT REP AND REPORT THE PROBLEM
X
Xbelow is the code for fillnode.c . This program will create a file in /mnt.
XThe file created will be named after the inode to which it was allocated.
XThe file will have 0 blocks allocated to it.
X
X#include <fcntl.h>
X#include <sys/types.h>
X#include <sys/stat.h>
Xmain()
X{
X int link(),open(),close(),fstat();
X struct stat buf;
X int fd;
X char name[30];
X
X if( (fd = open("/mnt/XXX",O_CREAT | O_WRONLY,0666) ) < 0 )
X {
X printf("can't open file\n");
X exit(2);
X }
X if( fstat(fd,&buf) < 0 )
X {
X printf("error fstating file\n");
X exit(3);
X }
Xprintf("inode is %d\n",buf.st_ino);
X sprintf(name,"/mnt/%d",buf.st_ino);
X close(fd);
X if( link("/mnt/XXX",name) < 0 )
X {
X printf("can't link to new name\n");
X exit(3);
X }
X if( unlink("/mnt/XXX") < 0 )
X {
X printf("can't unlink old file /mnt/XXX\n");
X exit(3);
X }
X exit(0);
X}
END_OF_FILE
if test 14555 -ne `wc -c <'inode.notes'`; then
echo shar: \"'inode.notes'\" unpacked with wrong size!
fi
# end of 'inode.notes'
fi
echo shar: End of shell archive.
exit 0
--
Bob Thrush UUCP: {ucf-cs,rtmvax}!tarpit!rd
Automation Intelligence, 1200 W. Colonial Drive, Orlando, Florida 32804