SUMMARY: To "nice" or not to "nice" [LONG]
Sheryl Coppenger
sheryl at seas.gwu.edu
Thu May 30 01:04:51 AEST 1991
My original posting was:
>Historically (since before I worked here), there was a policy for
>users running large jobs which ran along these lines:
> 1) Run them in the background
> 2) Start them up with "nice" to lower priority
> 3) Only one such job per machine
>I have a user challenging that policy on the grounds that UNIX will
>take care of it automatically. I am aware that some systems have
>that capability built in to the kernel, but I am not sure to what
>extent ours do or how efficient they are. I have looked
>in the manuals for both of our systems (Sun and HP) and in the
>Nemeth book, but they are pretty sketchy.
>
I went on to ask if other sites had such a policy and if anyone
had information specific to the machines we used (SunOS 4.1/4.1.1, HP-UX 7.0).
I waited until after the holiday weekend to summarize, in case other sites
have short expires on their news articles.
Response was mostly from administrators and overwhelmingly FOR the policy.
Details varied depending upon the type of machine, the environment and other
factors (including, perhaps, the temperament of the administrator).
People who said the policy was unnecessary seemed to be, like the user
challenging the policy here, quoting the general texts about what SHOULD
happen in the UNIX operating system (the Maurice Bach book or the BSD Daemon
Book). The administrators were more likely to quote O'Reilly & Associates'
_System Performance Tuning_. I was read a statement over the phone along
the lines of "Users will tell you that 'nice' doesn't have any effect --
don't believe them". Some of those responding either had or were writing
software to automatically renice or start and stop processes. I have a copy of
one package and will try to get others and experiment here.
Some replies contained assumptions about the type of programs being run and
pointed out that programs which were I/O bound or doing a lot of paging would
not be affected by nicing. Something along those lines may be what's
happening when ksh or finger processes run wild and take over the CPU.
Unfortunately, I haven't had a chance to run experiments here. Blair Houghton
was kind enough to do so and post the results here, but since they were for
Ultrix I doubt I will get the same results on our systems.
Some interesting statements taken out of context:
"nohup" automatically nices jobs
(True on SunOS but not HP, and part of the problem is that
users are running from the shell and not backgrounding jobs)
Kernels that do renice default to 4 which is insufficient. On SunOS,
a "nice +4" will still allow large jobs to interfere with nfsd and
inhibit file server function.
(NFS interference was noticed here, and often we found large
jobs on file servers because users called in to complain
that they couldn't login to a workstation or got NFS "not
responding" errors.)
SunOS won't renice processes but HP-UX will. However, HP-UX will
pop the priority up high again after a time.
(I heard about the automatic renicing first in an HP context.
HP-UX handles realtime priorities, and I think you have the
option of loading daemons as realtime processes in order
to improve NFS, etc. Large processes have been less of a
problem on our HPs, but our graphics users notice a difference
when they're trying to run animations on an HP9000/835 and
jobs are running in the background. We also have a problem
with runaway ksh processes and the kernel never seems to
detect those and lower the priority enough to allow interactive
users to get their work done).
Ksh and csh do NOT change the priority of background jobs, but the
Bourne shell will.
(We run ksh mostly, occasionally csh or bash).
Users should be required to use the "batch" command instead of the
"nice" command, because "batch" lowers priority.
(I can find no evidence in the man pages that batch does
this. In the most recent version of the policy, we require
the users to batch AND nice jobs. Batch schedules according
to load according to the manual. It also lets users run
multiple jobs serially).
Many thanks to all who responded (and to those who probably will respond
to this posting too).
Below I include edited copies of the replies I received by mail. If anyone
didn't see the follow-up postings, I will be glad to mail them a copy (I
have 4, I think).
===============================================================================
>From nick at fwi.uva.nl Thu May 16 05:18:23 1991
The nice value is just one input the scheduler uses to give each process a
priority. The evaluation of priority takes into account recent CPU usage, and
is weighted heavily in favour of interactive processes that require CPU time
in short bursts. Typically a 'renice 19' on big processes has little effect,
since they will be constantly paging, which is unaffected by nice values
(it's in the bottom half of the kernel, I believe). A `renice -19` will
however quite possibly stop your system, by giving other processes virtually
no CPU time. One of the best sources of info on this side of UN*X is the BSD
Daemon book by Leffler, McKusick and Karels. If you don't have access to the
book I can find out the full name and ISBN number if you need it. The book
is _Mega_, a sort of BSD bible.
>From onward at freefall Thu May 16 09:49:33 1991
There is no definite answer to the question you have.
As much of it is a matter of etiquette as it is OS specific,
plus, it depends very much on what the jobs do.
However, here are some points to think about:
0. Unix does not take care of it automatically. It only tries.
1. nice only modifies the Scheduling priority, not the execution/cpu
priority. (Internals).
2. processes lose priority if they are constantly runnable (i.e. more
or less CPU bound). When they get an I/O interrupt, their priorities
jump up, so that they can complete their I/O call, but if they then
hang on the CPU, the priority drops quickly again. (Job Type).
3. multiple large jobs do drag down a machine. Statistically, this is
NOT due to the cpu resource being exhausted, but due to the amount
of paging involved with large processes. (Job Size)
Suns do not do context switching too well when the number of runnables
goes past 8 (or was it 16 -- there was much discussion about this
in comp.arch 2 months ago)
3a. if the machine is diskless and 8 MB in real memory, even one large
job is noticeable if someone is also working on the workstation.
4. Policy suggestion: on hp9000 s300, s400, s3 and s4, don't worry
too much if they are functioning as single user workstations and not
multiuser servers. s800 machines were designed to be multiuser
servers, except perhaps for the 815, so you may not want more than
3 or 4 large jobs running on it simultaneously.
5. Try to make a balance between:
a) online user response time
b) turnaround time for large jobs
c) machine resources (machines with lots of real memory tend
to run large processes much better)
d) machine dedication
e) do your users really need to run that many large jobs ? or
are they just letting the computer do their thinking for
them ? Remember the old days when resources were REALLY
expensive, and people tried out their models by hand before
putting them through the system.
>From kelly at remus.ee.byu.edu Thu May 16 10:43:23 1991
We have about 50 hp 9000/300 and we always ask users to start long
jobs niced as much as they can (19). We have enough machines that one can
usually be found with not much running. This does not hurt the owner
because as you probably know if nothing else is going on they will still
get all of the cycles even if they are niced.
Interactive users pay the price if the job occupies a lot of memory
as the process swaps in and out. We have solved this problem for most
users with a program called real-nice. It monitors keystrokes about every
minute and will not swap a large job in unless the keyboard has been idle
for more than a minute.
I have had no problems convincing users to use these tools and the users
pretty much police each other. Once in a while I get a complaint and so we
have written a renice for hp-ux and that solves all of these complaints.
>From chip at pender.ee.upenn.edu Thu May 16 12:31:51 1991
I administrate a Sun 4/280. During my tenure it ran everything from
4.0 to 4.1. It is used for CPU intensive processes that last from
seconds to weeks. It is also our mail and news server for the
department, so we have to keep interactive performance up.
After a year of hand nicing processes in various combinations, I have
settled on a policy and written a program to implement it. This policy was
designed to meet the following criteria:
1) Interactive use should not be significantly degraded by system
load.
2) Since people frequently run CPU intensive processes in foreground,
some other way must be used to distinguish interactive from
non-interactive use.
3) Since people frequently will use screenlock rather than logout on
their personal workstations, interactive processes may accumulate
significant total CPU usage.
4) The implementation of this policy should not require users to do
anything.
5) People running CPU-intensive jobs should each get an equal portion
of the CPU. Specifically, someone running two processes should not
get twice as much CPU as someone running one process.
Here's the procedure I use now, with comments about what I'd like to
improve. I am planning on rewriting this program over the summer so
that I can use it on all of the machines that I administrate, and so
that I can distribute it to other sysadmins.
<some garbage was inserted here. I'm not sure how much was lost>
specific values were determined empirically.
I have a program that runs every six minutes. It creates a list of
all processes that have used more than 2.5 CPU minutes. In my
environment this excludes two week old emacs sessions while catching
most CPU intensive processes in the first 6 or 12 minutes.
This list is then sorted by user, and each process is niced according
to the total number of processes owned by that user. I.e., each user's
processes all run at the same nice value. Remember that we are
ignoring all interactive processes, so they are not reniced.
Nice values are assigned according to the following table:
Number of jobs 1 2 3 4 5 6 7
"nice" value 5 9 12 14 15 16 17
In general, I have had great success with this system. The users
prefer it to getting yelled at when they forget (or didn't know to)
renice their jobs. The users who did renice their jobs like the fact
that they don't have to bother, and no one else can "cheat". The
interactive users like the fact that system performance is pretty
stable.
Here are the things I'd like to improve:
1) These values "encourage" people to run one job at a time. If two
people are running one job each, and two people are running two jobs
each, and another person is running three jobs, the last person's jobs
are effectively stopped. A better scheme would be to renice the first
job to 4 and all other jobs to some high nice value. When the first
job finished, the next job would be reniced to 4, etc. I am
concerned, though, about the possibility of someone running a long,
CPU intensive pipeline of commands. I haven't come up with a better
way to handle this while still maintaining "fairness".
When I notice "stopped" jobs of this sort, I send a form letter to the
user explaining that while running multiple jobs is not forbidden,
they would run much faster if done sequentially. This letter explains
how to use "batch" to run jobs sequentially. Most of my users were
not specifically choosing to run in parallel, but were simply trying
to run all three jobs overnight.
2) I'd like to add a test for the size of the jobs, so that one user
cannot use up the entire virtual memory of the machine. I am
considering killing single jobs if they use more than 48 Meg, and
multiple jobs if they total more than 32 Meg.
The reasoning is that while generally I don't want people using more
than 32 Meg, I understand that some jobs legitimately need more. But
if you are running a huge job that requires over 32 Meg, you
shouldn't be running other jobs at the same time.
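As a sketch, the proposed size test reduces to a decision function like this
(an editorial illustration; the 48/32 Meg thresholds are the ones proposed
above, and where the sizes come from is assumed):

```python
# Sketch of the proposed per-user memory policy: a single job may use
# up to 48 MB, but a user running several jobs is held to 32 MB total.
# The thresholds are the ones proposed above; everything else is assumed.

SINGLE_JOB_LIMIT_MB = 48
MULTI_JOB_LIMIT_MB = 32

def over_limit(job_sizes_mb):
    """job_sizes_mb: sizes of one user's CPU-intensive jobs, in MB.
    Returns True if the user's jobs should be killed under the policy."""
    if len(job_sizes_mb) == 1:
        return job_sizes_mb[0] > SINGLE_JOB_LIMIT_MB
    return sum(job_sizes_mb) > MULTI_JOB_LIMIT_MB
```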
I realize that this information is somewhat disorganized. Please feel
free to write me for further explanation or more information.
>From kaul at ee.eng.ohio-state.edu Thu May 16 12:48:16 1991
Most BSD derived systems will do that, but the reduction is
insignificant. They will nice the long running background job to
level 2 by default (SunOS is an example of this), but that still is
high enough to seriously interfere with a multi-user system.
What are other system administrators doing about this issue?
We have a policy that varies with the type of machine. On our Sparc2s
we allow more long running jobs, but ask people to keep the load below
8. In general, our policy is that anything that's going to take more
than 1/2 hour should be niced to level 20, a user can run one job on a
machine and no more than 2 longterm jobs on the network at once.
Further, for most of our machines (SLCs, Sun3s) we require that no
more than 2 longterm jobs be running, but we allow up to 8 on our
Sparc2s. Penalties for violating policy include a warning the first
time, a conference with the advisor and offender the second time and
the death penalty the third time (with no appeal possible).
If there are good reasons for the policy, I want to be able to
justify it as well as enforce it.
The reason is that we want people to be able to get work done. We
have had a few grad students who submitted 8 jobs to one machine and
brought everybody else's work to a halt. One of them didn't last long ;-)
-rich
ps. One thing you'll notice is that "nice" has no effect on I/O
limited programs. That's a small trouble around here, though, since
much of our work is numerically intensive.
>From bernhold at qtp.ufl.edu Thu May 16 12:52:19 1991
We use a similar policy around here, on our network of Sun 3/50s and
4/380 file servers, 4/490, FPS-500, and IBM RS/6000-530 compute
servers. The number of jobs depends on the machine, and is subject to
revision, since we are trying to find the best balance. On the 3/50s
we don't care how many jobs -- they are all on desktops and allocated
to individuals. On the file servers, we currently allow two at a time
- one long-running and one less than 1 hr. On the compute servers,
the limits are somewhat higher, but we try to avoid having too many
jobs at once so that there is ample virtual memory for the running
jobs.
The Suns, running SunOS 4.1.1, perform much better with the jobs
niced. Otherwise they are competing against nfsds on basically an
equal level, which impairs the file server function. Running them
"nice" shifts the balance towards the file server capability --
basically the batch jobs run in the "holes". SunOS's scheduling
algorithm doesn't seem to do this "automatically" -- at least not to
the extent we want.
The RS/6000 and FPS-500 are run as compute servers, so we're less
concerned about niceness on them, though on the FPS, we are using
different levels of niceness to give priority to the group that paid
for the machine over those who get a free ride.
>From appmag!curly!pa at hub.ucsb.edu Thu May 16 14:04:00 1991
Don't know about HPUX's. Back at Carnegie Mellon, our bsd 4.[23] would
renice anything to +4 that had accumulated more than 5-10 minutes of
CPU time. And it wasn't enough. Empirically, `nice +8' (csh syntax)
was better, i.e. it would preserve interactive response. The
interactive users would get all the cycles they needed, and the
background jobs would compete for the rest. That's for CPU cycles.
Now if a memory hog came in, the machine could very well start to
thrash or run out of paging space. In this case only, it would be
important to limit the number of jobs. This was on a VAX 785.
I repeat, the system's handling was inadequate. I had to periodically
post instructions on the local bboard, because too many users didn't
know how to lower their priority manually.
On AIX and DGUX workstations (both SysV derivatives) I never noticed
any attempt by the system to change priorities. If I want to keep my
interactive response, I have to nice jobs to 12. If I had lots of
naive users, I would probably write a little renicing daemon...
>From octela!octelb.octel.com!jfd at mips.com Thu May 16 14:25:28 1991
I run SunOS (mostly 4.0.3, but some 4.1.1) so can't speak for HP-UX.
I don't believe that SunOS "automagically" prioritizes jobs for you
(oh that it were true!). I have users fire up multiple large jobs that
beat the heck out of the machine. Nice'ing these makes a world of difference,
particularly for interactive response (my servers are CPU & NFS servers, and
handle 10-20 logins). Multiple large jobs really kill performance, quickly
putting the machine into thrashing mode (particularly compiles on the same
spindle). Of course these are 3/480's, my 4/490 does a little better :)
The kernel may handle prioritizing things like NFS service vs. local I/O, but not
multiple user jobs. Unless you count multi-tasking, which means you can run
multiple jobs, and they should get near equal time (depending on a lot of
factors). But what you really want is to prioritize interactive vs. long
term CPU jobs such that big compiles don't affect josephine user's rn session :-)
But it all boils down to politics: what the local "policies" are, what
management can/will support, and how creative the users get at submitting jobs.
My experience is once you get management to agree to specific policies, stick
to them, once you allow an exception you open the floodgates. But, it is a
good idea to have the policy specify exceptions, and when/how they are allowed.
So, at the end of the fiscal year, with deadlines looming, we can say "Yes, you
can run multiple jobs, but it requires xxx permission". The neat trick is to
get xxx to understand when to give permission.
Are you running sps/ps/vmstat to look at what the system is doing? This might
help "prove" the OS isn't scheduling intelligently. I also found "System
Performance Tuning" (O'Reilly & Assoc.) useful.
>From jmattson at UCSD.EDU Thu May 16 14:35:47 1991
Here, we have about 40 Sun-3's and about 35 Sun-4's running 4.1.1, along
with a few other oddballs (HP, Vaxen, etc.). We are responsible for
faculty, staff, and graduate student machines in offices and labs. Faculty
and staff are rarely a problem, but the graduate students are working with
insufficient computing resources, and there have been several problems with
people being inconsiderate of others.
On the primary graduate student machine (a sun 4/370 w/32mb of memory), we
don't allow long-running jobs at all. We run a daemon that enforces nice 19
on all jobs with over 5 minutes accumulated CPU time (with the exception of
shells, editors, etc.). Furthermore, if the system performance gets really
bad for interactive use, we will look for any long-running jobs that are in
violation of the usage policy and ask their owners to kill them.
We do provide one machine explicitly for long-running jobs (a sun 4/280 with
56mb of memory and LOTS of swap). The same daemon enforces nice 4 on all
jobs with over 5 minutes accumulated CPU time here (same exceptions).
We also have problems with a graduate student lab of 12 Sparcstations.
People tend to leave long-running jobs on these machines which can really
degrade interactive performance (especially since these machines only have
16mb of memory). Here, the same daemon will STOP any job with over 5
minutes accumulated CPU time if there is an active (idle less than 5
minutes) console user, and if the job doesn't belong to the console user.
Stopped jobs will be continued when the console user logs off or goes idle
for more than 5 minutes.
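The stop/continue rule described above might be sketched like this (an
editorial illustration only; the 5-minute thresholds are the quoted ones,
while idle detection and signal delivery are simplified assumptions):

```python
# Sketch of the lab-workstation rule described above: a long-running
# job is stopped while an active console user (idle < 5 minutes) is
# present, unless the job belongs to that user. Thresholds as quoted;
# how idle time is measured and how signals are sent are assumptions.
import signal

CPU_LIMIT_SECS = 5 * 60      # jobs over 5 CPU minutes are candidates
IDLE_LIMIT_SECS = 5 * 60     # console user counts as active below this

def action_for(job_owner, job_cpu_secs, console_user, console_idle_secs):
    """Return signal.SIGSTOP, signal.SIGCONT, or None for one job."""
    if job_cpu_secs < CPU_LIMIT_SECS:
        return None                      # not a long-running job
    active = console_user is not None and console_idle_secs < IDLE_LIMIT_SECS
    if active and job_owner != console_user:
        return signal.SIGSTOP            # stop it for the console user
    return signal.SIGCONT                # console idle or own job: let it run
```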
We have found that the biggest problem with long-running jobs is not their
CPU usage, but their memory usage. On a machine like a Sparcstation with
SCSI drives, paging is just too slow. Once the physical memory of these
machines is exhausted, paging starts and performance drops by an absolutely
incredible amount. (The machine can be idle 75% of the time waiting on disk
pages, even with several jobs in the run queue.) This is the primary reason
for not allowing long-running jobs when there are interactive users on the
machines.
SunOS 4.1.1 uses the same priority scheme that 4.2 BSD used. Jobs with the
best priority are scheduled in round-robin fashion. Every second,
priorities are recalculated, so that jobs which have not obtained much CPU
"recently" will get better priorities. This ensures that no one starves.
The nice value is used in the priority calculation, to reduce the demand
that a particular job makes on the CPU. However, even a very nice job will
get some CPU every now and then--even on a heavily loaded system. The
problem is that if there are high demands on physical memory, the nice job
will probably have lost all of its pages while waiting, and it will
immediately page fault when it gets scheduled to run. With enough big jobs
running in the background, your machine will start to thrash. Nothing in
SunOS checks for this or attempts to do anything to alleviate it.
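For reference, the 4.2BSD-style recalculation described above works roughly
like this (a simplified editorial sketch; the constants and scaling are
approximations of the BSD scheme, not the exact SunOS kernel arithmetic):

```python
# Simplified sketch of the 4.2/4.3BSD-style priority recalculation the
# paragraph above describes. Higher number = worse priority. Constants
# and scaling are approximations, not the real kernel's fixed-point math.

PUSER = 50          # assumed base user priority
MAXPRI = 127

def user_priority(estcpu, nice):
    """Priority from recent CPU use (estcpu, in ticks) and nice value."""
    return min(MAXPRI, PUSER + estcpu // 4 + 2 * nice)

def decay_estcpu(estcpu, load_avg, nice):
    """Once-a-second decay: jobs that haven't run recently get better
    (numerically lower) priority; the decay weakens as load rises."""
    return (2 * load_avg) / (2 * load_avg + 1) * estcpu + nice
```

This is why even a very nice job eventually runs: its estcpu decays while it
waits, so its priority keeps improving until it gets scheduled.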
>From chs!danq at jetson.UUCP Thu May 16 15:14:29 1991
Well, experiment will probably quickly convince you that for instance
running multiple troff jobs at once will be slower than running the
same jobs sequentially. Unix will attempt to be fair about running
several large jobs, in the sense that it will attempt to give them all
equal parts of the cpu over relatively short periods of time
(seconds). Because of the time spent context switching between jobs,
and (if they're large enough) the swapping resulting from using more
than available physical memory, several large jobs at once will take
longer to run than the same jobs in sequential order.
The scheduler may have some bias toward interactive jobs built in, but
it is quite easy for large jobs on Sun's to make life miserable for the
interactive user. Nicing these jobs will help. Running only one at a
time will help. Running large jobs at odd hours (using at) will help.
The scheduler does not have the smarts to do these things itself. I
don't think any internal knowledge about Unix is necessary; the effects
of running large jobs is immediately evident in slower response time.
If you don't see slower response time, then it's probably not worth
worrying about. If you do, experiment with renicing the job(s) in
question.
The "top" command (available from the sun archives at Rice) is helpful
in telling what jobs are actually eating up the cpu. You might want to
try running that.
>From chris at suntan.ncsl.nist.gov Thu May 16 15:28:52 1991
Someone already posted about specific systems that automatically
renice a cpu-bound process. Most don't, however. It's a good
solution for those who don't follow the policies you've outlined.
I would omit the third policy though. If a process is niced, I
haven't seen any significant performance degradation if there are one
or five of them. That is, processes sitting in the ready-to-run queue
(but not running due to low priority) have little effect on system
performance on SunOS systems I've worked with. You should perform the
same experiment on your system. Yes, the load average WILL go up (all
that tells you is the # of processes *ready* to run, not actually
running), but interactive response should be more than adequate.
However, I've taken this problem and cut it off at the head. All our
users run "tcsh" which executes /etc/Login if it exists. All
workstations have this file; my personal Sun 386i workstation running
Sunos 4.0.2 is called "suntan":
# tcsh file exec-ed by all users before ~/.cshrc
#
if ( $HOST == suntan && $USER != chris && $USER != root ) then
    /etc/renice +20 $$ >& /dev/null
    echo "System response may seem a bit sluggish..."
endif

# Stan is a special case. On *all* systems he gets niced.
#
if ( $USER == stan ) /etc/renice +15 $$ >& /dev/null
A little confusing, but basically if it's not me (Chris) or root
logging into my system, their login shell gets reniced severely and
all their subprocesses inherit the login shell's nice level. On other
machines they don't get reniced at all, since I don't use those machines. :-)
This has the unfortunate side-effect that although their cpu-intensive
processes don't interfere with me, all their processes run at the same
priority. Thus, if they launch something into the background, a current
editing session will run at the same (low) priority. Now that I think
about it, I should nice them in their login shell to 15 so they can
nice their background jobs to 20 should they desire.
Surprisingly, this setup works REALLY well! Most users don't even
notice the subtle login message when they get reniced.
You might want to run your shell's executable through "strings" to see
if it executes any files prior to the users home .login/.cshrc/.profile.
>From revell at uunet.uu.net Thu May 16 18:46:03 1991
I think your user may have been talking about "nohup"'ed jobs. Nohup
increments the priority by 5. I don't know of any systems that alter
the priority just because the job is changed to the background.
>From @jhereg.osa.com:nightowl!det at tcnet.uucp Fri May 17 04:57:21 1991
What shell(s) are your users using? Ksh and csh do not change the priority of
jobs in the background, while sh will automatically "nice" the job by four.
Here are the results of the command "sleep 300 &" under the shells ksh, sh,
and csh, respectively:
F S UID PID PPID C PRI NI ADDR SZ WCHAN TTY TIME COMD
10 S 1001 13787 13784 0 39 20 4026b4 11 e0000000 ttyF01 0:00 sleep
10 S 1001 13792 1 0 39 24 4026b4 11 e0000000 ttyF01 0:00 sleep
10 S 1001 13814 13813 0 39 20 4026b4 11 e0000000 ttyF01 0:00 sleep
(note the NI column: 24 under sh, versus the default 20 under ksh and csh)
>From cantin at nrccsb3.di.nrc.ca Fri May 17 18:02:44 1991
I insist my users use "batch" instead of nice. I forbid them to use &
because it penalizes interactive users too much. "batch" runs at a
lower priority AND sends any generated output to the user via mail.
You can also limit the number of "batch" jobs running on the system by
modifying /usr/spool/cron/queuedefs. This way, users can submit
many jobs for execution, but if the maximum limit is reached, those jobs
will simply be queued to run when others complete.
>From mcorrigan at UCSD.EDU Fri May 17 23:32:34 1991
>
> 1) Run them in the background
>
Yes,
> 2) Start them up with "nice" to lower priority
>
Yes.
> 3) Only one such job on a machine
>
Depends on how *big*, since two medium jobs could add up to one big one.
>I have a user challenging that policy on the grounds that UNIX
>will take care of it automatically. I am aware that some systems
Not so. Some UNIXes ( BSD ) will automatically renice a job to
a nice of 4 after a certain number of minutes, but HP-UX does
not do this. It is true that the UNIX scheduling algorithm
lowers the priority of a job based on how much time it has gotten recently,
but then the priority pops back up if it has gotten little for a while.
The algorithms are sophisticated with at least 2 regimes of
scheduling, but all these are intended for interactive response
to be maintained at an acceptable level , or that a fair share be given
at all times to ALL jobs. When a job lasts for 50 hours, it
just doesn't make sense to allow it to get a fair share alongside the
interactive users. If you lower the priority to as low as it can go (nice
== 19) then I find that the job may get no time for part of the
day, but whenever the system is idle the job gets right back in
there for 100% of the cpu (like from midnight to 9 am).
For canned software packages I do the renicing myself
by writing a C program that sits in the path ahead of the real
package: it nices itself and then calls the real package with all the
same args, so it runs at low priority to start with.
>From bach!chuckp at ncr-mpd.ftcollinsco.NCR.COM Mon May 20 19:22:32 1991
see batch(1). It's part of SVr[34] and included on Suns. Don't know about
the others.
>From cks at hawkwind.utcs.toronto.edu Tue May 21 22:58:08 1991
Blair Houghton has already posted some nice numbers on this subject
(and some formulae). My local experience has been that a nice value of
between 10 and 15 will keep the interactive users from feeling the
extra load, even if the niced jobs are thrashing around on the disk a
fair bit. I've run parallel kernel builds at +10 to +19 on not too
studly Vaxen and only had people monitoring the load notice (load
averages of around 20+). However, even one or two processes grinding
away at nice 4 (the default 'renice' value on the few kernels that do
this to processes) will be easily noticed by the users.
You might want to see if you can get some sort of job batching
system; there are a number of nice ones floating around. The better
ones do things like stop the running job(s) once the load average
climbs too high, or stop the running job(s) when N people are logged
on, and so on. Better yet, you get the source, so you can put in
custom hacks if necessary to adapt them to local conventions.
--
Sheryl Coppenger SEAS Computing Facility Staff sheryl at seas.gwu.edu
The George Washington University (202) 994-6853
More information about the Comp.unix.internals
mailing list