Fast File System throughput

Thu Jul 26 10:11:15 AEST 1984

For most folks the interleave factor is pure magic, set by
trial and error. Nothing could be farther from the truth:

The interleave factor is a skewing in the physical placement or logical
numbering of sectors to match the cpu service time to the rotation time
between consecutively read sectors. ... hmm ok so what?
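
A minimal sketch of the "logical numbering" form of the skew, in Python. The function name and the 8-sector track are made up for illustration, and real formatters differ in detail. Here an interleave factor of k is taken to mean that consecutive logical sectors sit k physical slots apart.

```python
def interleave_layout(sectors_per_track, interleave):
    """Return a list where entry p is the logical sector stored in physical slot p."""
    layout = [None] * sectors_per_track
    slot = 0
    for logical in range(sectors_per_track):
        # If the target slot is already taken, slide forward to the next free one.
        while layout[slot] is not None:
            slot = (slot + 1) % sectors_per_track
        layout[slot] = logical
        # Skip ahead by the interleave factor for the next logical sector.
        slot = (slot + interleave) % sectors_per_track
    return layout


if __name__ == "__main__":
    for factor in (1, 2, 3):
        print(factor, interleave_layout(8, factor))
```

With 8 sectors and a factor of 2 the track reads 0 4 1 5 2 6 3 7, so after logical sector 0 goes by the cpu has one sector time to get the next transfer started before logical sector 1 arrives.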

cpu service time:

	interrupt latency +
	device driver interrupt processing time +
	device driver transfer start time +
	( possible reschedule and context switch +)
	( cpu time for higher priority kernel processes +)
	( cpu time for higher priority interrupts +)
	read system call time +
	application processing time =

	interlace time in msec.

interlace time in sectors is:

	(sectors per track / rotation time in msec) * interlace time in msec =

	interlace time in sectors

interleave factor is:

	interlace time in sectors + 1
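
The arithmetic above can be sketched in a few lines of Python. The function names are hypothetical, the 2.5 msec service time is an invented figure (measure your own), and rounding the sector count up to a whole sector is my assumption, since a fractional interleave cannot be formatted.

```python
import math


def interlace_sectors(sectors_per_track, rotation_msec, service_msec):
    """Sectors that pass under the head while the cpu services one sector."""
    sectors_per_msec = sectors_per_track / rotation_msec
    # Assumption: round up, a partial sector still has to go by.
    return math.ceil(sectors_per_msec * service_msec)


def interleave_factor(sectors_per_track, rotation_msec, service_msec):
    """Interlace time in sectors + 1."""
    return interlace_sectors(sectors_per_track, rotation_msec, service_msec) + 1


if __name__ == "__main__":
    rotation_msec = 60000 / 3600  # 3600 rpm is about 16.7 msec per rotation
    # Illustrative only: 17 sectors per track, 2.5 msec total service time.
    print(interleave_factor(17, rotation_msec, 2.5))
```

For these made-up numbers 2.5 msec covers about 2.55 sectors, which rounds up to 3, giving an interleave factor of 4.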

Gotchas:

	1) Missing the interlace timing results in one full rotation (plus
	   a little) of lost time. As misses become more frequent, net
	   throughput approaches 1 sector per rotation.

	2) Most disks rotate 60 times per second, so at typical clock
	   frequencies the clock interrupts (callout processing and
	   possible wakeups) use up 1-3 sectors of interlace time
	   per revolution.

	   Thus minimizing use of high frequency callouts and the cpu time
	   they consume is mandatory (new device driver programmers seldom
	   worry about this).

	   If the interleave factor is set exactly (not counting clock interrupt
	   times), then one rotation time will be lost per clock interrupt
	   ... if there is little or no callout processing, then increasing the
	   interleave factor by one or two will cover clock interrupts
	   with no reduction in throughput.

	3) Serial receive and transmit interrupts for 9600 baud occur
	   at one msec intervals for DZ and SIO type devices, and for
	   19200 baud occur at 500 usec intervals. Thus a single 9600
	   baud line will, when active, invoke about 18 interrupts per rotation
	   at 3600rpm ... for most 5mb 5-1/4 drives this is one interrupt
	   per sector (512byte) on a track, and for 10mb drives one interrupt
	   per (1k byte) block on a track.

	   To prevent large step reductions in throughput when terminals
	   are active on Programmed I/O lines, the interlace factor
	   must be adjusted to allow some average number of lines to be
	   active.

	   NOTE: Pseudo DMA is almost mandatory to get interrupt service times
	   down to 50-150usec/char from the normal 500-1500usec/char of
	   generalized C coded service routines.

	4) Serial cpu times for either Pseudo or real DMA approaches are in
	   the area of 50-400usec/char, incurred once per completed buffer.
	   Since dma buffer lengths are often 16-32 bytes this is basically
	   one long completion interrupt per tty line during each rotation.
	   The net effect: to prevent large step reductions in disk
	   throughput the interleave factor must also be adjusted to
	   cover the average number of tty lines active.

	5) Input traffic from other computers adds substantial load
	   when serviced in raw/cbreak mode. Every input character
	   requires wakeup processing at interrupt service time.

	6) Readahead has little effect on reducing the interleave factor.
	   The net effect is that it allows the service times for two
	   consecutive sectors to be averaged dynamically, resulting in
	   fewer misses due to infrequent events.

	7) All of this scales in a non-linear fashion depending on cpu speed.
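
Gotchas 1 and 2 can be put in rough numbers with a toy model. This is a sketch under my own assumptions, not a measurement: each sector normally costs "interleave" sector times, a miss costs exactly one extra rotation (the "plus a little" is ignored), and misses are spread evenly. The function name and the 17-sector track are illustrative.

```python
def sectors_per_rotation(sectors_per_track, interleave, miss_rate):
    """Average sectors transferred per rotation under the toy miss model.

    miss_rate is the fraction of sectors (0.0 to 1.0) that miss the
    interlace timing and so wait one additional full rotation.
    """
    # Cost of one sector, measured in sector times.
    sector_times_per_read = interleave + miss_rate * sectors_per_track
    return sectors_per_track / sector_times_per_read


if __name__ == "__main__":
    # Tight interleave with occasional misses vs. a looser one with none.
    print(sectors_per_rotation(17, 4, 0.0))
    print(sectors_per_rotation(17, 4, 0.2))
    print(sectors_per_rotation(17, 5, 0.0))
    print(sectors_per_rotation(17, 4, 1.0))
```

Under these assumptions a factor of 4 with 20% misses moves about 2.3 sectors per rotation, while a factor of 5 with no misses moves 3.4. When every sector misses, the rate drops to a bit under 1 sector per rotation.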

All this sounds hopeless? ... For single user workstations at most one
line is active ... for larger multiuser systems disk throughput
is generally terrible (as are the resulting response times).
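
To see why the number of active lines matters so much, here is a back-of-envelope sketch of the serial load per disk rotation. The function names are hypothetical, and 10 bits per character (start + 8 data + stop) is my assumption. The usec/char figures plugged in are the ranges quoted in gotcha 3.

```python
def chars_per_rotation(baud, rpm, bits_per_char=10):
    """Characters (one interrupt each on programmed I/O) per disk rotation."""
    chars_per_sec = baud / bits_per_char  # assumes 10 bits on the wire per char
    rotations_per_sec = rpm / 60
    return chars_per_sec / rotations_per_sec


def serial_msec_per_rotation(baud, rpm, usec_per_char, active_lines=1):
    """Cpu time in msec spent on serial interrupts during one rotation."""
    return active_lines * chars_per_rotation(baud, rpm) * usec_per_char / 1000.0


if __name__ == "__main__":
    print(chars_per_rotation(9600, 3600))
    # Generalized C service routine vs. Pseudo DMA, one busy line each.
    print(serial_msec_per_rotation(9600, 3600, 500))
    print(serial_msec_per_rotation(9600, 3600, 50))
```

With these assumptions one busy 9600 baud line produces about 16 interrupts per rotation in one direction, the same ballpark as the "about 18" above. At 500usec/char that is roughly 8 msec out of a 16.7 msec rotation; at 50usec/char it is under 1 msec.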

Setting the proper interlace factor requires a combination of measurements
from a good logic analyzer and tradeoff decisions after doing a good 
performance/control flow study.

The only fix is to use disk controllers that can handle multiple outstanding
requests -- few hardware systems handle this.

The above is a general view on interlace factors ... more important to
traditional 512/1kbyte filesystems ... but still a non-trivial problem
for tuning 4.2 filesystems.

I will be giving a talk at the annual UNIOPS Conference in San Francisco
next week (8/2/84) which goes into a lot of detail on filesystem performance
issues ... of which interlace factors are a minor but important part.

John Bass (Systems Performance and Arch Consultant)
{dual,fortune,hpda,idi}!dmsd!bass        408-996-0557