Controlling high load on 4.2 unix
Keith Muller
muller at sdcc3.UUCP
Sat Oct 27 21:36:00 AEST 1984
In response to the high number of requests for the load control system
I described about one week ago, I am going to post it to net.sources
sometime next week. Enclosed below is a rough draft of the manual page
for the load control server.
Keith Muller
UCSD Academic Computer Center
ucbvax!sdcsvax!sdcc3!muller
---------------------------------------------------------------------
.TH LDD 8 "25 May 1984"
.UC 4
.ad
.SH NAME
ldd \- load system server (daemon)
.SH SYNOPSIS
.B /etc/ldd
[ -L load ] [ -M max_time ] [ -T alarm ]
.SH DESCRIPTION
.TP
.B \-L
changes the load average that ldd attempts to maintain to
.I load
instead of the default (usually 10).
.TP
.B \-M
changes the maximum time (in seconds) that a job can be queued to
.I max_time
seconds instead of the default (usually 14400 seconds or 4 hours).
.TP
.B \-T
changes the time (in seconds) that the
.I ldd
server waits between load average checks to
.I alarm
seconds instead of the default (usually 60 seconds).
.PP
.I Ldd
is the load control server (daemon) and is normally invoked
at boot time from the
.IR rc (8)
file.
The
.I ldd
server attempts to maintain the system load average (number
of
.I runnable
processes) below a preset value so interactive programs like
.IR vi (1)
remain responsive.
When the system load average (the 1 minute figure shown by
.IR uptime (1))
is above the preset limit,
.I ldd
will "block" specific \f2cpu intensive\f1 processes from running and place
them in a queue.
These blocked jobs are not \f2runnable\f1 and therefore do not
contribute to the system load. When the load average drops below the preset
limit,
.I ldd
will remove jobs from the queue and tell them to continue
execution.
The system administrator determines which programs are
considered \f2cpu intensive\f1 and places control of their execution under the
.I ldd
server.
.PP
A front end
.I client
program replaces each of the programs to be controlled by the
.I ldd
server.
Each time a user requests execution of a controlled program, the
.I client
enters the "request state",
sends a "request to run" datagram to the server and waits for a response. The
waiting client is "blocked" waiting for the response from the
.I ldd
server.
If the
.I client
determines that the
.I ldd
server is not running, the requested
program is executed as if there were no load control system.
A process will not block if the
.I ldd
server is not running.
.PP
The
.I ldd
server can send one of four different messages to the client.
A "queued message" indicates that the client has
been entered into the queue and should wait.
A "poll message" indicates that the request should be resent (the server
did not receive it).
A "terminate message" indicates that this request cannot be honored
and the client should exit abnormally.
A "run message" indicates the requested program should be run.
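.PP
As an illustration only, the four replies can be modelled as a small
dispatch function. The sketch below is in Python rather than the C of the
actual system, and the names Msg and next_action are hypothetical; the
manual page does not describe the wire format of the messages.
.nf
```python
from enum import Enum


class Msg(Enum):
    # Hypothetical names for the four server replies described above.
    QUEUED = "queued"
    POLL = "poll"
    TERMINATE = "terminate"
    RUN = "run"


def next_action(msg):
    """Map a server reply to what the client does next."""
    if msg is Msg.QUEUED:
        return "wait"      # stay blocked until another message arrives
    if msg is Msg.POLL:
        return "resend"    # send the "request to run" again
    if msg is Msg.TERMINATE:
        return "exit"      # request cannot be honored; exit abnormally
    if msg is Msg.RUN:
        return "exec"      # run the controlled program
    raise ValueError("unknown message")
```
.fi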
.PP
If the client does not receive an answer to a request after a certain
period of time has elapsed (usually 90 seconds), the request is resent.
If no response is obtained from the server after the request has been
resent a preset number of times,
the requested program
is executed. This prevents the process from blocking forever
if
.I ldd
fails to respond to the requests (due to a failure).
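.PP
The resend rule above might be sketched as follows. This is a Python
model, not the real client: send and wait_reply are hypothetical
callables standing in for the datagram code, and the default of three
resends is an assumption, since the text only says the count is preset.
.nf
```python
def request_to_run(send, wait_reply, timeout=90, max_resends=3):
    """Return the server's reply, or "run" if the server never answers.

    send()           -- hypothetical callable that sends the request datagram
    wait_reply(secs) -- hypothetical callable; returns a reply, or None
                        if nothing arrived within secs seconds
    max_resends      -- assumed default; the manual page gives no number
    """
    # One initial attempt plus the allowed number of resends.
    for _ in range(1 + max_resends):
        send()
        reply = wait_reply(timeout)
        if reply is not None:
            return reply
    # Still no answer: run anyway so the job never blocks forever.
    return "run"
```
.fi
A caller would treat the fallback "run" exactly like a "run message"
from the server.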
.PP
After receiving the "queued message" the client enters the "queued state"
and waits for another command
from the server (usually getting the run command).
If the user does not have the environment variable "LOAD" set to "quiet",
the status string "queued" will be printed on stderr.
If no further commands
are received after a preset time has elapsed (usually 15 minutes),
the client re-enters the "request state" and sends the request
to the server again.
This guards against the server having terminated or
failed since the time the client was queued.
.PP
The
.I ldd
server logs all recoverable and unrecoverable errors in a logfile. Advisory
locks are used to prevent more than one executing server at a time.
When the
.I ldd
server first begins execution, it scans the spool directory for clients that
might have been queued from a previous
.I ldd
server and sends them a "poll message".
Waiting
.I clients
will resend their "request to run" message to the new
.I ldd
server, and re-enter the "request state".
The
.I ldd
server will rebuild the queue of waiting tasks
ordered by the time each client began execution.
This allows the
.I ldd
server to terminate and be re-started without
loss or blockage of any waiting clients.
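.PP
The single-server guarantee could be obtained with an advisory lock on
the lock file, roughly as sketched below in Python. That the server uses
flock(2) specifically is an assumption (the text says only "advisory
locks"), and acquire_server_lock is a hypothetical name.
.nf
```python
import fcntl
import os


def acquire_server_lock(path):
    """Return an open file holding an exclusive advisory lock, or None.

    None means the lock is already held through another open file,
    which is taken to mean a server is already running.
    """
    fd = os.open(path, os.O_RDWR | os.O_CREAT, 0o644)
    f = os.fdopen(fd, "r+")
    try:
        # Non-blocking: fail at once instead of waiting for the holder.
        fcntl.flock(f, fcntl.LOCK_EX | fcntl.LOCK_NB)
    except OSError:
        f.close()
        return None
    # Record our pid; the FILES section says the lock file contains it.
    f.truncate(0)
    f.write(str(os.getpid()))
    f.flush()
    return f
```
.fi
The lock is released when the returned file is closed or the process
exits, so a crashed server does not leave a stale lock behind.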
.PP
When the server receives a "request to run",
it has to determine if the job should run immediately, or be queued.
If the queue is not empty, the request is added to the queue,
and the client is sent a "queued message" to indicate that
it has been placed in the queue.
If the queue is empty,
the server checks the current load average, and
if it is below the limit,
the client is sent a "run message".
Otherwise the server queues the request, sends the client a "queued message",
and starts the interval timer.
The interval timer is bound to a handler that checks the system load at a
fixed interval (usually every 60 seconds).
If the handler finds the current load average is below the limit,
jobs are removed from the queue and sent a "run message".
The number of jobs
sent "run messages" depends on how much the current load average has
dropped below the limit.
If the queue becomes empty the handler
will shut off the interval timer (as it is no longer needed).
If the handler finds the load average is above the limit, it checks
how long the oldest process has been waiting to run.
If that time is greater than a preset limit (usually 4 hours) the job is
removed from the queue and told
to run regardless of the load.
This prevents jobs from being blocked forever due to load averages that
remain above the limit for long periods of time.
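.PP
The queueing rules in this section can be condensed into a toy model.
The Python sketch below leaves out sockets and the real interval timer;
the class LoadQueue and its methods are hypothetical, and the rule for
how many jobs to release per check is an assumption, because the text
says only that it depends on how far the load has dropped.
.nf
```python
import time
from collections import deque


class LoadQueue:
    """Toy model of the queueing rules; no sockets or timers involved."""

    def __init__(self, limit=10.0, max_wait=4 * 3600):
        self.limit = limit        # load average to stay below
        self.max_wait = max_wait  # seconds a job may stay queued
        self.queue = deque()      # (job, time queued), oldest first
        self.timer_on = False

    def request(self, job, load, now=None):
        """Handle a "request to run"; return "run" or "queued"."""
        now = time.time() if now is None else now
        if not self.queue and load < self.limit:
            return "run"
        self.queue.append((job, now))
        self.timer_on = True
        return "queued"

    def tick(self, load, now=None):
        """Periodic check; return the jobs to send a "run message"."""
        now = time.time() if now is None else now
        released = []
        if load < self.limit:
            # Assumption: release one job per whole unit of headroom,
            # and at least one. The manual page gives no exact formula.
            count = max(1, int(self.limit - load))
            while self.queue and len(released) < count:
                released.append(self.queue.popleft()[0])
        elif self.queue and now - self.queue[0][1] > self.max_wait:
            # Oldest job has waited too long: run it regardless of load.
            released.append(self.queue.popleft()[0])
        if not self.queue:
            self.timer_on = False
        return released
```
.fi
In a real server the load passed to tick would come from the kernel's
1 minute load average, and tick would be called from the timer handler.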
.PP
Commands can be sent to the server by the
.IR ldc (8)
control program. These commands can manipulate the queue and change the
values of the various preset limits used by the server.
.SH FILES
.nf
.ta \w'/usr/spool/ldd/cntrlsock 'u
/usr/spool/ldd	ldd spool directory
/usr/spool/ldd/msgsock	name of server datagram socket
/usr/spool/ldd/cntrlsock	name of server socket for control messages
/usr/spool/ldd/list	list of queued jobs (not always up to date)
/usr/spool/ldd/lock	lock file (contains pid of server)
/usr/spool/ldd/errors	log file of server errors
.fi
.SH "SEE ALSO"
ldc(8),
ldq(1),
ldrm(1).