4.2bsd kernel auto-nicing, scheduling
Michael Wagner
wagner at utcs.uucp
Fri Mar 7 02:33:56 AEST 1986
I've just been catching up on unix-wizards, which I haven't looked at in
a loooong time, and I found this discussion of 4.2 scheduling.
Interestingly, the Maryland stuff sounds *very* close to the
(rather heuristic) scheduler in CP (the 'monitor', if you will) of VM.
It (basically) has:
1) a starting priority for every user
2) no explicit upper/lower limits per user, but certain other constructs
generate something like a probability cloud around that starting
priority
3) a number of queues that users reside on. Aside from the ones that
designate resource unavailability (notably storage overcommitment),
there are basically 3 queues for 'runable' users, called (with typical
lack of imagination) Q1, Q2 and Q3. They correspond to very interactive,
somewhat interactive, and batch (by batch I mean it runs to completion
once started, not that you send it off elsewhere). In the HPO
(High Performance Option) variant of the product, the issue is clouded
(but vastly improved, I'm told...I'll shortly be able to report on this)
by the splitting of each of these into waiting-for-CPU-dispatch and
waiting-for-other-quick-services (page fault, etc) queues. They are
informally called the real-run-list and the run-list, respectively.
I can't recall the formal names right now. Interrupts move users
between the run-list and the real-run-list, and only the real-run-list
need be scanned on most redispatches.
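The run-list/real-run-list split described above can be sketched in a few lines of Python. This is an illustrative toy, not CP's actual data structures; the list names follow the informal ones in the post, and the helper functions are my own invention:

```python
# Sketch of the HPO split: users ready for CPU sit on the real-run-list,
# users waiting on a quick service (page fault, etc.) sit on the run-list.
# Interrupts move users between the two, and only the (shorter)
# real-run-list is scanned on most redispatches.

real_run_list = ["u1"]       # dispatchable: waiting only for CPU
run_list = ["u2", "u3"]      # waiting on some quick service

def service_complete(user):
    """Interrupt handler (hypothetical): the awaited service finished,
    so the user becomes dispatchable and moves to the real-run-list."""
    run_list.remove(user)
    real_run_list.append(user)

def dispatch():
    """Redispatch scans only the real-run-list, not the full run-list."""
    return real_run_list[0] if real_run_list else None

service_complete("u2")   # u2's page-in finishes; u2 joins the CPU queue
```

The payoff is that the dispatcher's scan is proportional to the number of CPU-ready users, not the total number of runnable users.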
4) A transaction can be loosely defined as the time between pressing the
enter key, and getting the last of the results. (for those who don't
know, S/370 moves terminal interaction (irrevocably) into front-end
hardware, so there is no raw-mode equivalent. The entire input is
presented to the mainframe in one interrupt when it is completed.
Completion is signalled by a special key (enter). Some (limited)
local editing is possible before hitting enter.) When a transaction
is started, it is given a 'deadline' based on the basic priority of
the user, some acknowledgement of the load average (called the expansion
factor in CP), and an initial (recent-history-based, I think)
characterization of the transaction. After that, the run lists are
sorted and serviced in 'deadline' order. This effectively prevents
indefinite postponement (but, as pointed out in earlier postings,
indefinite postponement strategies are rendered ineffective when
faced with inadequate hardware).
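The deadline scheme in (4) can be sketched as a priority queue ordered by deadline. The formula below is a guess at the general shape (priority-weighted cost stretched by the expansion factor), not CP's actual arithmetic, and all the numbers are made up:

```python
import heapq

def assign_deadline(now, base_priority, expansion_factor, est_cost):
    """Hypothetical deadline formula: a higher (worse) base priority and a
    heavier load (larger expansion factor) both push the deadline out."""
    return now + base_priority * est_cost * expansion_factor

# Run list kept as a heap keyed on deadline, so dispatch always services
# the transaction whose deadline comes first. Every transaction gets a
# finite deadline, so none can be postponed indefinitely.
run_list = []
heapq.heappush(run_list, (assign_deadline(0.0, 2, 1.5, 10.0), "batch-user"))
heapq.heappush(run_list, (assign_deadline(0.0, 1, 1.5, 1.0), "editor-user"))

deadline, user = heapq.heappop(run_list)   # interactive user comes out first
```

Sorting by deadline rather than raw priority is what gives the anti-starvation property: even a low-priority transaction's deadline eventually becomes the earliest one on the list.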
5) While a transaction is 'runable', it is thrown into the pool and gets
time-slices in rough proportion to its deadline priority. The short-term
scheduler strategy is a modified round-robin (I think someone once
called it a round-turkey :-) ). If a transaction sucks up enough
resource, the scheduler decides it was wrong to characterize this
transaction as interactive, and it moves it to Q2 (which also has the
effect of recalculating its deadline, because it has now shown itself
to be less interactive). A similar shift can occur Q2->Q3. Living in
'lower' queues has the effect of getting you larger slices of resource,
but less frequently.
There's more, but you get the point (and this is getting long). One of
the things I found amazing (coming from a background with more
sophisticated schedulers) was how well this all works, in the face of
tremendous loads. We used to run 3 large UTS systems (a 370 UNIX-like
system) and a smattering of little CMS users (mostly development and
maintenance) on our system. Even when the machine ran flat out, editing
response for CMS users was unaffected by the load.
(There were other problems, but they basically weren't CP's fault, and
are really another story).
As another example, we recently disconnected one channel path to DASD
(in order to provide a test path for a new system being built concurrently
with production). That cuts our access to disk in half.
No one seems to notice in their editing response. I can see it in the
performance reports, but it's minimal. I can also now flog the system
with artificially constructed workloads and actually affect other users
(something I basically couldn't do when both channels were connected).
I think we're seeing the effect of inadequate hardware there, though, and
not bad scheduling decisions.
One place where this all falls down is that the scheduler basically makes
only CPU dispatching decisions. It has no influence on I/O queue decisions,
nor paging queue decisions. This all works if you have overdesigned your
I/O configuration relative to the rest of the machine, but would fail
badly otherwise. This is relatively easy to do with 370 arch. I/O gear,
but (seemingly) much harder on VAXEN. I'm curious how this is compensated
for in scheduling for 4.2.
Well, enough for now. I don't know how interested people on this list are
in hearing about work being done on other systems (especially blue ones :-)).
Michael Wagner (wagner at utcs)