Berkeley file system tuning
Chris Torek
chris at mimsy.UUCP
Sat Jan 14 02:46:35 AEST 1989
In article <4787 at macom1.UUCP> larry at macom1.UUCP (Larry Taborek) writes:
>... You can also determine whether you have 2, 4, or 8 fragments per
>block, but I believe that 4 is about right. Too high a fragment to
>block count (8), and the data from fragments may have to be copied
>up to 7 times to rebuild into a block (this would happen when a file
>would grow beyond the size that 7 fragments could hold, and the file
>system would copy these fragments into a block). Too low a fragment to
>block count (2), and the block/fragment concept isn't helping
>very much.
This is not quite how things work, and there is little reason to
worry about fragment expansion in 4.3BSD. (It *is* a problem in 4.2BSD
if you use `vi', for instance, although how much of one varies.)
Only the last part of a file ever occupies a fragment. When extending
a file, the kernel decides whether it needs a full block or whether a
fragment will suffice. If a fragment will do, the kernel looks for an
existing block (in the right cg) that is already appropriately
fragmented. If one exists and has sufficient space, it is used;
otherwise the kernel allocates a full block and carves it up.
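As a rough illustration of the block-versus-fragment decision, here is a small sketch that works out how a file of a given size would be laid out. The function name and the 4k/1k defaults are made up for this example; it ignores indirect blocks and any other rules the real kernel applies, so treat it as arithmetic only, not as the allocator.

```python
def tail_layout(size, bsize=4096, fsize=1024):
    """Return (full_blocks, tail_frags) for a file of `size` bytes.

    Illustrative only: ignores indirect blocks and whatever other rules
    the kernel uses to decide which files may end in a fragment.
    """
    frags_per_block = bsize // fsize
    full_blocks, rest = divmod(size, bsize)
    tail_frags = -(-rest // fsize)  # ceiling division
    if tail_frags == frags_per_block:
        # The tail needs every fragment of a block, so it is simply a block.
        full_blocks, tail_frags = full_blocks + 1, 0
    return full_blocks, tail_frags
```

For example, a 5000-byte file on a 4k/1k file system comes out as one full block plus a one-fragment tail, while a 4000-byte file just takes a whole block.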
In 4.3BSD, Kirk added an `optimisation' flag (space/time; tunefs -o)
which is normally set to `time'. The kernel automatically switches it
to `space' if the file system becomes alarmingly fragmented, then back
to `time' when things are cleaned up. This flag does not exist in
4.2BSD; in essence, 4.2 always chooses `space'.
Now, when expanding a file that already ends in a fragment to a new
size that still fits in a fragment, if the flag is set to `space', the
kernel uses the usual best-fit search. But if the flag is set to `time',
the kernel finds a fragment that can be expanded in place to a full
block, or takes a full block if no such fragments exist.
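The difference between the two settings can be seen in a toy model that counts how often a growing tail has to be moved. This is not kernel code: the function and its parameters are invented here, and the `space' case assumes the worst outcome (best-fit always picks a hole with no room to grow), which a real file system will not always hit.

```python
def copies_while_growing(total_frags, frags_per_block=4, policy="time"):
    """Count tail relocations as a file grows one fragment per write.

    Toy model, not kernel code. Assumptions:
      - "space": best-fit always lands in a hole with no room to grow,
        so every tail expansion moves the whole tail (worst case).
      - "time": the first expansion of a tail moves it to a spot with a
        whole block of room; later expansions happen in place.
    Returns (moves, frags_copied).
    """
    moves = 0          # number of times the tail had to be relocated
    frags_copied = 0   # fragments' worth of data copied by those moves
    tail = 0           # fragments currently held by the file's tail
    roomy = False      # True once the tail sits where it can grow in place
    for _ in range(total_frags):
        if tail > 0 and not roomy:
            moves += 1
            frags_copied += tail
            roomy = (policy == "time")
        tail += 1
        if tail == frags_per_block:
            # The tail became a full block; the next write starts a new tail.
            tail, roomy = 0, False
    return moves, frags_copied
```

Under these assumptions, growing a file by eight 1k fragments on a 4:1 file system costs two relocations with `time' and six with `space'.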
All of this affects only poorly-behaved programs that write files a
little bit at a time. In 4.2BSD, vi always wrote 1024 bytes, which in
a 4k/1k file system is as bad as possible. It was possible for every
write system call to have to allocate a new set of fragments, copying
the data from the old fragments to the new. In 4.3BSD, even such
programs only lose once per fragment expansion, because the next three
(in a 4:1 FS) can always be done in place (provided that fs->fs_optim
is FS_OPTTIME). vi was fixed in 4.3BSD to write statb.st_blksize blocks.
(And enbugged at the same time: if st_blksize is greater than the
MAXBSIZE with which vi was compiled, it scribbles over some of its
own variables. I keep telling them that compiling in MAXBSIZE is
wrong.... Yes, it *does* break, if you speak NFS with a Pyramid
for instance.)
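For what it is worth, here is a minimal sketch of writing in st_blksize units with the buffer size taken at run time rather than compiled in. It is Python rather than vi's C, the function name and the fallback value are arbitrary choices for this example, and it assumes a system whose stat results carry st_blksize (it falls back otherwise).

```python
import os

def copy_in_preferred_blocks(src_path, dst_path, fallback=8192):
    """Copy src to dst, issuing writes of st_blksize bytes at a time.

    The chunk size is read at run time from the destination file, so no
    compile-time maximum is baked in. `fallback` is an arbitrary value
    used where st_blksize is missing or reported as zero.
    """
    with open(src_path, "rb") as src, open(dst_path, "wb", buffering=0) as dst:
        st = os.fstat(dst.fileno())
        blksize = getattr(st, "st_blksize", 0) or fallback
        while True:
            chunk = src.read(blksize)
            if not chunk:
                break
            view = memoryview(chunk)
            while view:
                # An unbuffered write may be partial; keep going until done.
                written = dst.write(view)
                view = view[written:]
    return blksize
```

Because the buffer is sized from whatever the file system reports, an unexpectedly large st_blksize only means larger reads and writes, not an overrun.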
[and on paging:]
>What I noticed on BSD systems I used to administer was that the
>SECOND swap area was used exclusively until it filled, and then
>the swap overflow went to the first. To me, this made sense as the
>second swap area was on our second physical disk, which generally
>has less i/o than the first physical disk is expected to have. (Any
>comments to this are appreciated).
No: swap space is created in dmmax-sized segments scattered evenly
across all paging devices; its allocation approximates a uniform random
distribution. (See swfree() in /sys/sys/vm_sw.c and swpexpand() in
/sys/sys/vm_drum.c.)
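One simple way to picture "scattered evenly" is a round-robin mapping of segments onto devices. The sketch below assumes exactly that; the function names are invented, and it is not taken from vm_sw.c or vm_drum.c, so read it as an illustration of the idea rather than of the actual code.

```python
def segment_device(segment_index, ndevices):
    """Device holding a given dmmax-sized swap segment (assumed round-robin)."""
    return segment_index % ndevices

def segments_per_device(nsegments, ndevices):
    """Count how many of the first `nsegments` segments land on each device."""
    counts = [0] * ndevices
    for i in range(nsegments):
        counts[segment_device(i, ndevices)] += 1
    return counts
```

With two paging devices, ten segments split five and five, so neither disk is filled before the other is touched.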
--
In-Real-Life: Chris Torek, Univ of MD Comp Sci Dept (+1 301 454 7163)
Domain: chris at mimsy.umd.edu Path: uunet!mimsy!chris