Berkeley file system tuning

Sat Jan 14 02:46:35 AEST 1989

In article <4787 at macom1.UUCP> larry at macom1.UUCP (Larry Taborek) writes:
>... You can also determine whether you have 2, 4, or 8 fragments per
>block, but I believe that 4 is about right.  Too high a fragment to
>block count (8), and the data from fragments may have to be copied
>up to 7 times to rebuild into a block (this would happen when a file
>would grow beyond the size that 7 fragments could hold, and the file
>system would copy these fragments into a block).  Too low a fragment to
>block count (2), and the block/fragment concept isn't helping
>very much.

This is not quite how things work, and there is little reason to
worry about fragment expansion in 4.3BSD.  (It *is* a problem in 4.2BSD
if you use `vi', for instance, although how much of one varies.)

Only the last part of a file ever occupies a fragment.  When extending
a file, the kernel decides whether it needs a full block or whether a
fragment will suffice.  If a fragment will do, the kernel looks for an
existing block (in the right cylinder group) that is already appropriately
fragmented.  If one exists and has sufficient space, it is used;
otherwise the kernel allocates a full block and carves it up.
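
To make the block/fragment split concrete, here is a minimal sketch in C
of how a file's size divides into full blocks plus a fragment tail.  The
names file_layout and struct layout are invented for illustration; this
is not kernel code, and it does only the arithmetic, ignoring the search
for a suitably fragmented block.

```c
#include <assert.h>

/* Hypothetical result type: full blocks plus fragments in the tail. */
struct layout {
	long full_blocks;
	int  tail_frags;
};

/*
 * Hypothetical helper: split `size' bytes into full blocks of `bsize'
 * bytes and a tail rounded up to whole fragments of `fsize' bytes.
 */
static struct layout
file_layout(long size, long bsize, long fsize)
{
	struct layout l;
	long tail;

	assert(size >= 0 && bsize > 0 && fsize > 0 && bsize % fsize == 0);
	l.full_blocks = size / bsize;
	tail = size % bsize;
	l.tail_frags = (int)((tail + fsize - 1) / fsize);
	/* A tail that needs every fragment of a block is just a full block. */
	if (l.tail_frags == bsize / fsize) {
		l.full_blocks++;
		l.tail_frags = 0;
	}
	return l;
}
```

On a 4k/1k file system, file_layout(5000, 4096, 1024) gives one full
block plus one fragment, while file_layout(8000, 4096, 1024) rounds up
to two full blocks and no fragments.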

In 4.3BSD, Kirk added an `optimisation' flag (space/time; tunefs -o)
which is normally set to `time'.  The kernel automatically switches it
to `space' if the file system becomes alarmingly fragmented, then back
to `time' when things are cleaned up.  This flag does not exist in
4.2BSD; in essence, 4.2 always chooses `space'.

Now, suppose a file that already ends in a fragment grows to a new size
that still needs less than a full block.  If the flag is set to `space', the kernel
uses the usual best-fit search.  But if the flag is set to `time', the
kernel finds a fragment that can be expanded in place to a full block,
or takes a full block if no such fragments exist.
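
The difference between the two settings can be sketched as a choice of
how much space to ask for when the tail has to move.  This is a
simplification under my own names (enum optim, realloc_request), not
the kernel's actual reallocation code; it shows only the policy
described above.

```c
#include <assert.h>

/* Stand-ins for the two optimisation preferences; names are illustrative. */
enum optim { OPT_TIME, OPT_SPACE };

/*
 * Hypothetical helper: bytes to request when a fragment tail of new
 * size `nsize' cannot simply stay where it is.
 */
static long
realloc_request(enum optim pref, long nsize, long bsize)
{
	assert(nsize > 0 && nsize <= bsize);
	if (pref == OPT_SPACE)
		return nsize;	/* best fit: exactly what is needed now */
	return bsize;		/* time: room to grow in place to a full block */
}
```

With `space' a 2048-byte tail asks for 2048 bytes and may have to move
again on the next write; with `time' it asks for room up to a full
4096-byte block, so later growth needs no copying.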

All of this affects only poorly-behaved programs that write files a
little bit at a time.  In 4.2BSD, vi always wrote 1024 bytes, which in
a 4k/1k file system is as bad as possible.  It was possible for every
write system call to have to allocate a new set of fragments, copying
the data from the old fragments to the new.  In 4.3BSD, even such
programs only lose once per fragment expansion, because the next three
(in a 4:1 FS) can always be done in place (provided that fs->fs_optim
is FS_OPTTIME).  vi was fixed in 4.3BSD to write statb.st_blksize blocks.
(And enbugged at the same time: if st_blksize is greater than the
MAXBSIZE with which vi was compiled, it scribbles over some of its
own variables.  I keep telling them that compiling in MAXBSIZE is
wrong....  Yes, it *does* break, if you speak NFS with a Pyramid
for instance.)
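
For what it is worth, the safe way to use st_blksize is to size the
buffer at run time rather than trusting a compiled-in constant.  A
minimal sketch, assuming a hypothetical helper alloc_io_buffer and an
arbitrary 8192-byte fallback (neither comes from vi's source):

```c
#include <assert.h>
#include <stdlib.h>
#include <sys/types.h>
#include <sys/stat.h>

/*
 * Hypothetical helper: allocate an I/O buffer whose length is taken
 * from the file's st_blksize, so no compile-time limit can be exceeded.
 * Stores the chosen length in *lenp; returns NULL if malloc fails.
 */
static char *
alloc_io_buffer(int fd, size_t *lenp)
{
	struct stat st;
	size_t len = 8192;	/* arbitrary fallback if fstat() fails */

	assert(lenp != NULL);
	if (fstat(fd, &st) == 0 && st.st_blksize > 0)
		len = (size_t)st.st_blksize;
	*lenp = len;
	return malloc(len);
}
```

The caller then writes in chunks of the returned length and frees the
buffer when done; a file system that reports a large block size (over
NFS, say) just yields a larger allocation.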

[and on paging:]
>What I noticed on BSD systems I used to administer was that the 
>SECOND swap area was used exclusively until it filled, and then
>the swap overflow went to the first.  To me, this made sense as the
>second swap area was on our second physical disk, which generally
>has less i/o than the first physical disk is expected to have.  (Any
>comments to this are appreciated).

No:  Swap space is created in dmmax-sized segments scattered evenly
across all paging devices; its allocation approximates a uniform random
distribution.  (See swfree() in /sys/sys/vm_sw.c and swpexpand() in
/sys/sys/vm_drum.c.)
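
As a rough illustration of what `scattered evenly' means, here is a
sketch that maps a block in the combined swap space to a device and an
offset on that device, assuming the dmmax-sized segments are dealt out
round-robin.  The function swap_locate and struct swloc are my own
names, and the round-robin layout is an assumption made for the
example, not a transcription of vm_sw.c.

```c
#include <assert.h>

/* Hypothetical result type: which paging device, and which block on it. */
struct swloc {
	int  dev;
	long blkno;
};

/*
 * Hypothetical helper: locate virtual swap block `vblk' when segments
 * of `dmmax' blocks are interleaved across `nswdev' devices.
 */
static struct swloc
swap_locate(long vblk, long dmmax, int nswdev)
{
	struct swloc loc;
	long seg, off;

	assert(vblk >= 0 && dmmax > 0 && nswdev > 0);
	seg = vblk / dmmax;		/* which segment overall */
	off = vblk % dmmax;		/* offset within that segment */
	loc.dev = (int)(seg % nswdev);	/* segments alternate devices */
	loc.blkno = (seg / nswdev) * dmmax + off;
	return loc;
}
```

With two devices and dmmax of 1024, consecutive segments land on
devices 0, 1, 0, 1, and so on, so neither disk is used exclusively
before the other.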
-- 
In-Real-Life: Chris Torek, Univ of MD Comp Sci Dept (+1 301 454 7163)
Domain:	chris at mimsy.umd.edu	Path:	uunet!mimsy!chris