Help! Altos 5.3.1 fork is failing!

Jim Rosenberg jr at oglvee.UUCP
Sun Oct 15 08:43:49 AEST 1989


We just recently "upgraded" [sic] an Altos 2000 from Xenix 5.2c to UNIX 5.3d.
uname reports the operating system as 5.3.1.  We have 4M RAM and before the
upgrade the machine just screamed.  Now we are paging like mad and getting
sporadic fork failures.  The increased paging activity has my users bitching
and moaning, but the fork failures are like a sniper loose in my system gunning
down processes sporadically.  The problem is surely *not* insufficient
process table slots.  crash(1) reports we have 180 slots (NPROC is 0 in the
tuning parameter file, which on this system is called /usr/sys/master.d/kernel)
and we've got nowhere within a country mile of that many processes.  The
per-user limit is 30, and we're getting fork failures where that's not exceeded
either.  The system error reporting is filled with messages like this:


000146 07:50:06 00e6f0f6 ... 0000 00 NOTICE: getcpages - waiting for 1 contiguous pages
000147 08:13:16 00e80082 ... 0000 00 
000148 08:13:16 00e80082 ... 0000 00 NOTICE: getcpages - Insufficient memory to  allocate 1  contiguous page - system call failed
                               ^^^^^^^^^^^^^^^^^^

In many cases I can exactly correlate one of these "system call failed"
messages with a fork failure.

According to the man page for fork(2) there are 3 ways a fork can fail:  No
process table slots left, exceeding the per-user limit, and a truly obscure
third one:  "Total amount of system memory available when reading via
raw IO is temporarily insufficient".  Either the man page lies or this third
one is it.  I took a blind stab and guessed that the parameter involved here
is PBUF.  Altos recommends PBUF=8 straight across the board no matter how
much memory you have.  Sounds pretty odd to me, since on a 6386 running V.3.2
with 2 Meg RAM I've got 20, and never fiddled with it.  I jacked up PBUF to 16
-- but it made no difference.  So, my questions are:

What the bleep is getcpages?  It sounds like an internal kernel routine to get
contiguous pages in RAM.  Is this call issued by the paging daemon?  How could
it fail on a request to get only 1 page unless I'm out of swap space?  (Which
I'm not.  We're getting these with many many thousand blocks of free swap
space -- we have a swap(1) which will show these.)

Is there a tunable parameter that will rescue me here?  

Altos seems to think that a failed fork should only get a "NOTICE".  Yeah,
well, I notice all right.  It's bad enough when the shell reports "No more
processes" -- you just try again and it works.  But we have all kinds of
batch jobs that spawn uux requests and other such things and they're just
getting shot right out of the sky.

Any words of wisdom gratefully accepted!  I skimmed over the likeliest parts 
of Bach to see if the light would dawn -- looks like I better go back and
reread the section on demand paging pretty carefully.
-- 
Jim Rosenberg                        pitt
Oglevee Computer Systems                 >--!amanue!oglvee!jr
151 Oglevee Lane                      cgh
Connellsville, PA 15425                                #include <disclaimer.h>


