Disk striping? (4.3 BSD)
Don Speck
mangler at cit-vax.Caltech.Edu
Mon Feb 15 19:19:12 AEST 1988
On December 10 I wrote that disk striping has a couple of rather
serious restrictions. At the beginning of February I finally had
a pressing need for disk striping (to piece together two small
partitions into a usable-size filesystem after losing a disk),
so I debugged the striping pseudo-device driver that I'd
written, and found that neither restriction was necessary.
The basic method is for the strategy routine to copy the buf, fudge
the dev/blkno fields in the copy, and set B_CALL in the copy (NOT in
the original). At iodone time, a routine is called, which copies
b_resid, b_error, and (only) the B_ERROR bit of b_flags back to the
original buf, and does an iodone() on the original. The temporary
buf is then freed.
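
A rough user-space sketch of that copy-and-callback scheme follows. It
is not the code from stripe.tar: the struct is a cut-down mock of the
kernel's struct buf, the flag values are made up, iodone() is a
stand-in for the kernel routine, and b_orig, stripe_prepare() and
stripe_done() are hypothetical names chosen for illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Mock of the few struct buf fields the text mentions.
 * Flag values are illustrative, not the kernel's. */
#define B_ERROR 0x1
#define B_CALL  0x2
#define B_DONE  0x4

struct buf {
    int   b_flags;
    int   b_dev;
    long  b_blkno;
    long  b_resid;
    int   b_error;
    void (*b_iodone)(struct buf *);
    struct buf *b_orig;   /* hypothetical back-pointer to the original */
};

/* Stand-in for the kernel's iodone(): if B_CALL is set, call the
 * buf's completion routine instead of doing the usual wakeup. */
static void iodone(struct buf *bp)
{
    bp->b_flags |= B_DONE;
    if (bp->b_flags & B_CALL) {
        bp->b_flags &= ~B_CALL;
        bp->b_iodone(bp);
    }
}

/* Completion routine for the copy: pass status back, finish the
 * original, release the copy. */
static void stripe_done(struct buf *cp)
{
    struct buf *bp = cp->b_orig;

    assert(bp != NULL);
    bp->b_resid = cp->b_resid;
    bp->b_error = cp->b_error;
    bp->b_flags |= cp->b_flags & B_ERROR;   /* only the error bit */
    iodone(bp);
    cp->b_orig = NULL;   /* a real driver returns cp to its pool here */
}

/* Strategy-time step: copy the buf, redirect the copy to the
 * underlying device and block, and set B_CALL on the copy only. */
static void stripe_prepare(struct buf *bp, struct buf *cp,
                           int dev, long blkno)
{
    *cp = *bp;
    cp->b_dev = dev;
    cp->b_blkno = blkno;
    cp->b_flags |= B_CALL;
    cp->b_iodone = stripe_done;
    cp->b_orig = bp;
}
```

After stripe_prepare(), the copy would be handed to the underlying
disk driver's strategy routine; the original buf is left untouched
until the copy completes.
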
To avoid the possibility of having to sleep on buf allocation,
requests that cannot immediately allocate a buf are linked into a
list. By having a private pool of bufs, we're assured that a buf
will soon be freed up by an interrupt, and when that happens the
list of waiting requests is examined. Ripping off swap buffers
doesn't work, since the swapper may hog them and the only way it
has to tell you when one becomes free is via sleep/wakeup, which
strategy routines are NOT supposed to use.
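
The allocate-or-queue idea can be sketched in user space as below.
This is an illustration under assumptions, not the driver's code:
struct req stands in for a pending buf, the pool size of 2 is
arbitrary, the waiting list is assumed to be FIFO, and
stripe_submit(), stripe_release() and start_io() are hypothetical
names. A real driver would also have to block disk interrupts while
touching the pool and the list.

```c
#include <assert.h>
#include <stddef.h>

#define NSTRIPEBUF 2          /* private pool size; arbitrary here */

struct req {                  /* stands in for a pending struct buf */
    int         id;
    int         started;      /* set once a pool buf was assigned */
    struct req *next;         /* link on the waiting list */
};

static int         pool_free = NSTRIPEBUF;  /* free bufs in the pool */
static struct req *wait_head, *wait_tail;   /* deferred requests */

/* Placeholder for "copy the buf and pass it to the real disk". */
static void start_io(struct req *r)
{
    r->started = 1;
}

/* Strategy-time path: take a buf if one is free, otherwise link the
 * request onto the waiting list. Never sleeps. */
static void stripe_submit(struct req *r)
{
    r->started = 0;
    r->next = NULL;
    if (pool_free > 0) {
        pool_free--;
        start_io(r);
    } else {
        if (wait_tail != NULL)
            wait_tail->next = r;
        else
            wait_head = r;
        wait_tail = r;
    }
}

/* Completion-time path: a buf has just come free. Give it to the
 * oldest waiter if there is one, else return it to the pool. */
static void stripe_release(void)
{
    struct req *r = wait_head;

    if (r != NULL) {
        wait_head = r->next;
        if (wait_head == NULL)
            wait_tail = NULL;
        start_io(r);
    } else {
        assert(pool_free < NSTRIPEBUF);
        pool_free++;
    }
}
```

Because every buf in the pool is tied to an I/O already in flight, a
queued request only waits until the next completion calls
stripe_release().
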
With those changes, it should be safe to use with Sun ND, etc.
(I don't see why it couldn't be used recursively if you wanted).
I tried various interleave factors, and found that with a single
disk controller, it's best to interleave by cylinders. Trying to
interleave by filesystem blocks messes up the rotdelay optimization.
Reading large files does not go any faster than with a single disk;
you only gain throughput if you have several independent readers.
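
For reference, here is one conventional way to turn a logical block
number into a (disk, block) pair for a given interleave factor. The
post does not spell out the mapping the driver uses, so this
round-robin scheme, the stripe_map() name and the struct are all
assumptions. Setting ileave to the number of blocks in a cylinder
gives interleaving by cylinders.

```c
#include <assert.h>

struct stripe_loc {
    int  disk;     /* which underlying disk */
    long blkno;    /* block number on that disk */
};

/* Round-robin striping: logical blocks are grouped into units of
 * `ileave` blocks, and successive units go to successive disks. */
static struct stripe_loc stripe_map(long lblk, long ileave, int ndisks)
{
    struct stripe_loc loc;
    long unit, off;

    assert(lblk >= 0 && ileave > 0 && ndisks > 0);
    unit = lblk / ileave;          /* which stripe unit */
    off  = lblk % ileave;          /* offset inside the unit */
    loc.disk  = (int)(unit % ndisks);
    loc.blkno = (unit / ndisks) * ileave + off;
    return loc;
}
```

With a cylinder-sized unit, consecutive blocks of a file stay on one
disk for a whole cylinder, so the block layout within a cylinder looks
the same as on an unstriped disk.
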
cit-vax has been using this to hold netnews since February 7.
The code can be obtained by anonymous ftp from csvax.caltech.edu
(10.1.0.54), file pub/stripe.tar. Feedback is welcome; this is
still pretty experimental.
Don Speck speck at vlsi.caltech.edu {amdahl,ames!elroy}!cit-vax!speck