Workstation benchmarks (long!!)
Paul Raveling
raveling at isi.edu
Sat Jul 14 07:10:42 AEST 1990
Last week I ran some benchmarks with interesting results
for evaluating some combinations of workstations, operating systems,
and C compilers. Following the formfeed below is a rather long
report on the results. The two systems being most seriously
compared were an HP 9000/370 and a Sun 4, but various results
are included for a Sun 3 and a VAX 8650.
Beyond some obvious conclusions about which {hardware/OS/compiler}
is fastest in various circumstances, one result that I find
interesting supports an old hypothesis of mine in the area
of OS theory. This hypothesis is essentially that context switch
overhead is the principal determinant of OS performance in the
presence of a typical multi-process workload.
Please note that this is cross-posted to several newsgroups
that may have an interest in the machines, OS's, and compilers
that were compared. It would be appropriate to edit the
Newsgroups line in any followups. Also, please be aware that
I don't subscribe to most of these newsgroups; the best way
to get a question to me would be by email or by a followup
to comp.sys.hp.
----------------
Paul Raveling
Raveling at isi.edu
Last week I ran two suites of benchmarks to compare various
combinations of workstation hardware, operating systems, and
C compilers. Emphasis was on:
** HP 9000/370 vs Sun 4 vs Sun 3
** HP-UX vs BSD
** Native C compilers versus gcc
One suite was the small collection that I've been using for
a couple of years; the other was the BYTE UNIX benchmarks published
recently on comp.sources.unix.
Some Conclusions
----------------
-- Comparisons between HP-UX and BSD on HP 9000/370's
indicate that BSD is generally much faster. The main
differences are in speed of context switching and i/o.
-- Context switch overhead is probably a key determinant
of overall system performance. The HP-UX/BSD comparison
shows strong similarity between relative speed ratios for
BYTE's system loading test and context switch benchmarks;
the same correlation does not apply well to other low level
benchmarks.
-- The C compiler that produced the fastest code at maximum
optimization was the vendor's C compiler on both the
HP 9000/370 and the Sun 4. However, gcc may produce
faster floating point code on the Sun 4.
-- Processor speed tests show that the HP 9000/370 and
Sun 4 are about equally matched, except in two areas:
The Sun 4 is faster in floating point and recursion.
-- BYTE's I/O throughput tests showed that the Sun 4 was
surprisingly slow. Both the HP and a Sun 3 were faster.
Measured Results
----------------
All results that follow are expressed as relative speed ratios
based on some measured quantity: User process time, system time,
real time, or i/o rates.
1 is assigned to the fastest measured result.
n means "n times slower than fastest"; "n" is
expressed to 2 fractional digits (e.g. "1.23")
I.e., the lower the speed ratio, the faster the performance.
In a few cases two or more different machines/systems/compilers
produced a dead tie for the fastest measured result. In this
case both show "1" as their relative speed ratio. "1.00"
indicates a speed very slightly slower than the fastest,
for which the ratio rounds to 1.00.
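As a minimal sketch of the convention just described (not the scripts actually used for this report), the Python fragment below turns raw times into ratio strings. The function name `speed_ratios` and the sample timings are made up for illustration.

```python
def speed_ratios(times):
    """Map each label to its relative speed ratio string (lower is faster).

    `times` maps a label to a measured time in seconds. The fastest entry,
    and any exact tie with it, is shown as "1"; everything else is shown
    to 2 fractional digits, so a near-tie appears as "1.00".
    """
    fastest = min(times.values())
    ratios = {}
    for label, t in times.items():
        if t == fastest:
            ratios[label] = "1"
        else:
            ratios[label] = f"{t / fastest:.2f}"
    return ratios


# Hypothetical timings, in seconds.
sample = {"A": 10.0, "B": 12.3, "C": 10.01, "D": 10.0}
print(speed_ratios(sample))
# {'A': '1', 'B': '1.23', 'C': '1.00', 'D': '1'}
```

For rate-based results such as the "(r)" ratings, where a higher number is better, the ratio would presumably be inverted (fastest rate divided by measured rate) so that lower still means faster.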
1. Best optimizing compiler:
On HP 9000/370's it was HP-UX's compiler. On Sun 4's it
was Sun's, except that gcc was better in BYTE's floating point
math tests. Measured results were user process time, and
on benchmarks marked with "(r)", the "{dhry/whet}stones/second"
rating reported by the benchmark.
Compilers on the HP were:
"HP-UX cc": Native compiler from HP-UX 6.5
"gcc": gcc 1.37.1
"BSD cc": gcc 1.34, as supplied by Utah for BSD
Compilers on the Sun 4 were:
"Sun cc": Native compiler from SunOS 4.0.3
"gcc": gcc 1.37.1
                  HP 9000/370                 Sun 4
Benchmark         HP-UX cc  gcc     BSD cc    Sun cc    gcc
---------         --------  ---     ------    ------    ---
dhrystone         1         1.26    1.19      1         1.17
dhrystone(r)      1         1.25    1.43      1         1.17
whetstone         1         1.15    1.10      1.01      1
whetstone(r)      1         1.16    1.07      1.01      1
tak               1         2.10    2.06      1         1.24
dhrystone2a(r)    1         1.15    1.41      1         1.76
dhrystone2b(r)    1         1.14    1.39      1         1.79
arithoh           1         2.18    1.76      1         1
register          1.01      1.02    1         1         10.51
short             1.12      1       1.00      1.00      1
int               1.01      1.03    1         1         1.03
long              1.01      1.03    1         1         1.02
float             1.07      1       1.60      2.42      1
double            1         1.10    1.04      1.14      1
tower of hanoi    1         1.83    1.83      1         1
2. Relative processing [hardware] speeds:
These results also are based on user process time.
For the HP and Sun 4, the measurements used are those for
whichever compiler's executable was fastest. Only the
installed "cc" was used on the Sun 3 and the VAX.
This doesn't precisely show relative hardware speed
because it's at the mercy of the available C compilers.
Benchmark         HP 9K/370  Sun 4   Sun 3   VAX 8650
---------         ---------  -----   -----   --------
dhrystone         1.18       1       5.02    2.44
dhrystone(r)      1.18       1       5.03    2.45
whetstone         1          1.01    23.65   2.62
whetstone(r)      1          1       24.34   3.77
tak               1.88       1       3.91    3.47
dhrystone2a(r)    1.13       1       3.73    2.08
dhrystone2b(r)    1.12       1       3.74    2.16
arithoh           1          1.18    4.12    (Test failed on VAX)
register          1.38       1.40    2.39    1
short             1          1.40    2.01    1.16
int               1.26       1.50    2.17    1
long              1          1.20    1.73    1.37
float             2.23       1       67.62   4.91
double            1.54       1       39.15   2.94
tower of hanoi    1.50       1       4.25    (Test failed on VAX)
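The selection rule used for this table (take each machine's fastest compiler, then normalize across machines) can be sketched as follows. This is only an illustration under assumed inputs: `hardware_ratios` is a hypothetical helper and the user times are invented, not the raw measurements behind the table.

```python
def hardware_ratios(user_times):
    """Return {machine: ratio} using each machine's fastest compiler.

    `user_times` maps machine -> {compiler: user process time in seconds}.
    The machine with the lowest best time gets ratio 1.0.
    """
    # Best (lowest) user time per machine, over all compilers tried there.
    best = {machine: min(per_compiler.values())
            for machine, per_compiler in user_times.items()}
    fastest = min(best.values())
    return {machine: round(t / fastest, 2) for machine, t in best.items()}


# Hypothetical user times, in seconds, for one benchmark.
times = {
    "HP 9K/370": {"HP-UX cc": 11.8, "gcc": 14.9, "BSD cc": 14.0},
    "Sun 4": {"Sun cc": 10.0, "gcc": 11.7},
    "Sun 3": {"cc": 50.2},
}
print(hardware_ratios(times))
```

Because only one compiler was available on the Sun 3 and the VAX, their entries reduce to that single measurement.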
See item 4 below for a comparison of relative i/o speeds.
These would be largely dependent on hardware, but as item 3
shows, choice of operating system is also significant.
3. Relative operating system speeds:
Direct comparison of HP-UX 6.5 and BSD 4.3 on identical
HP 9000/370's. Tests included 3 types of benchmarks:
-- Low level processor-intensive tests
-- Low level i/o-intensive tests
-- High level tests of a simulated workload
Low level processor-intensive tests:
                          System Time     Real Time
                          :::::::::::     :::::::::
Benchmark                 HP-UX   BSD     HP-UX   BSD
---------                 -----   ---     -----   ---
pt [context switch]       2.08    1       2.21    1
iocall                    1       1.15    1       1.13
system call overhead      1       1.19    1       1.19
pipe throughput           1.33    1       1.28    1
pipe-based context sw.    2.74    1       2.15    1
process creation          1.33    1       1.15    1
execl throughput          1.45    1       1       1.28
Low level i/o-intensive tests:
Filesystem throughput, based on reported KBytes/second.
Test Time   System   Read    Write   Copy
---------   ------   ----    -----   ----
1 sec       HP-UX    1       1.17    1.27
            BSD      1.08    1       1
10 sec      HP-UX    1.27    1.29    1.91
            BSD      1       1       1
20 sec      HP-UX    1.48    1.48    1.67
            BSD      1       1       1
High level tests of a simulated workload:
Bourne shell script and UNIX utilities
Concurrent Background                       ........Time........
Processes               System & Compiler   User    System  Real
---------               -----------------   ----    -----   ----
1                       HP-UX cc            1.04    2.09    2.02
                        HP-UX gcc           1       2.22    1.98
                        BSD cc              1.29    1       1
2                       HP-UX cc            1       2.22    2.15
                        HP-UX gcc           1.02    2.28    2.22
                        BSD cc              1.36    1       1
4                       HP-UX cc            1.03    2.21    2.43
                        HP-UX gcc           1       2.20    2.12
                        BSD cc              1.32    1       1
8                       HP-UX cc            1       2.29    2.16
                        HP-UX gcc           1.01    2.30    1.73
                        BSD cc              1.30    1       1
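The similarity between the system loading results and the context switch results, noted in the conclusions, can be checked roughly against the real-time figures in this item. The sketch below is an informal illustration, not a statistical test. It expresses each low level result as HP-UX time relative to BSD (inverting the ratio where BSD was slower) and ranks the tests by how close they come to the mean HP-UX cc load ratio. The helper name `rank_by_closeness` is an assumption.

```python
from statistics import mean

# HP-UX real time relative to BSD (BSD = 1), from the low level table.
# Where BSD was the slower system, the ratio is inverted, e.g. 1 / 1.13.
low_level = {
    "pt [context switch]": 2.21,
    "iocall": 1 / 1.13,
    "system call overhead": 1 / 1.19,
    "pipe throughput": 1.28,
    "pipe-based context sw.": 2.15,
    "process creation": 1.15,
    "execl throughput": 1 / 1.28,
}

# HP-UX cc real-time ratios for the simulated workload (1, 2, 4, 8 processes).
load = [2.02, 2.15, 2.43, 2.16]


def rank_by_closeness(tests, load_ratios):
    """Sort test names by distance from the mean workload ratio."""
    target = mean(load_ratios)
    return sorted(tests, key=lambda name: abs(tests[name] - target))


for name in rank_by_closeness(low_level, load):
    print(f"{name:<24}{low_level[name]:.2f}")
```

With these figures the mean load ratio is about 2.19, and the two context switch tests (2.21 and 2.15) are the only low level results within 0.1 of it; the next closest, pipe throughput, is at 1.28.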
4. Net relative OS-related system speeds, comparing
all tested combinations of hardware, OS's, and C compilers.
Comparisons in the immediately following table are based on
measured real time, except for the "n-sec" i/o benchmarks.
                  HP 9000/370          Sun 4         Sun 3  VAX
                  :::::::::::::::::::  :::::         :::::  :::
                  HP-UX         BSD    SunOS         SunOS  BSD
                  :::::::::::   :::    :::::::::::   :::::  :::
Benchmark         cc     gcc    cc     cc     gcc    cc     cc
---------         --     ---    --     --     ---    --     --
pt                2.39   2.45   1.08   1      1.10   2.29   1.52
iocall            2.08   2.34   2.34   1.04   1      3.92   2.27
sys call ovhd     1.12   1.20   1.33   1      1.02   3.61   1.20
pipe th'put       2.09   3.08   1.63   1.03   1      3.36   1.44
context sw.       2.15   3.69   1      1      1      2.28   1
process creat'n   1.37   1.54   1.19   3.46   3.40   7.05   1
execl th'put      1.01   1.03   1.29   1.84   1.79   3.51   1
1-sec read        1.03   1.10   1.15   1.26   1.31   1      [0.17]
1-sec write       1.14   1.21   1      1.40   1.45   1.10   [0.08]
1-sec copy        1.43   1.15   1      1.31   1.31   1.17   [0.36]
10-sec read       1.29   1.25   1      2.50   2.25   1.50   [0.13]
10-sec write      1.29   1.29   1      2.50   2.25   1.50   [0.11]
10-sec copy       1.87   1.95   1      3.07   2.26   1.65   [0.25]
20-sec read       1.44   1.53   1      2.55   2.55   1.53   [0.14]
20-sec write      1.44   1.53   1      2.55   2.55   1.53   [0.12]
20-sec copy       1.64   1.71   1      2.12   2.25   1.33   [0.27]
sh+ut load(1)     2.02   1.98   1      1.72   1.51   1.64   1.33
sh+ut load(2)     2.15   2.22   1      5.20   1.99   1.98   1.29
sh+ut load(4)     2.43   2.12   1      1.65   1.75   1.99   1.33
sh+ut load(8)     2.16   1.73   1      1.49   1.47   1.78   1.12
Hardware Configurations
-----------------------
HP 9000/370:  24 MB RAM, 68881 floating point (no FPA)
              I/O via NFS mounts to another HP 9K/370
              on local ethernet
Sun 4:        24 MB RAM, programs loaded from local disk,
              other I/O via NFS mounts to VAX 8650
Sun 3/80:     8 MB RAM, programs loaded from local disk,
              other I/O via NFS mounts to VAX 8650
VAX 8650:     20 MB RAM, I/O to local disk
Notes
-----
1. All tests were run at least 3 times, and the BYTE benchmarks
ran many tests 6 times. The results reported are mean values
for all trials.
2. Measurements based on real time should be treated with
a bit of suspicion, particularly on the VAX, which supports
a substantial amount of activity in both user jobs and
NFS i/o. The BYTE benchmarks reported 95 interactive users
when they started on the VAX.
** A notable case is that variance was unusually high for
the "whetstones per second" rate reported on the VAX.
However, user process times reported for the same tests
were much more consistent.
The workstations should be fairly safe from loading by local
processes, but their i/o speeds are vulnerable to loading on
the local ethernet and file servers.
** And yes, gcc-generated code WAS slower by an order of
magnitude on the Sun 4 "register" benchmark. This is so
blatantly odd that I repeated both compiling and running
this test to be sure the numbers were correct and consistent.
3. The VAX's I/O was MUCH faster than the workstations,
sometimes by up to an order of magnitude. This may be
partly due to use of only local disks rather than NFS-mounted
file systems on the VAX. However, older benchmarks also
had suggested that workstations using local disks still
offered much less data bandwidth than the VAX.
In order to provide a meaningful comparison among workstations
for i/o, performance ratio "1" was assigned to the fastest
workstation. This is why the VAX's performance is fractional.
** A particularly interesting result was that i/o to/from
an NFS-mounted file system was slower on the Sun 4
than on the Sun 3. Both machines were using the same
file system on the same server.
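The i/o normalization described in note 3 can be sketched as below, assuming the inputs are throughput rates in KBytes/second (higher is better). The function name `io_ratios` and the sample rates are hypothetical. The baseline is the fastest workstation only, so a faster non-workstation system comes out below 1 and is shown in brackets.

```python
def io_ratios(rates_kb_per_s, workstations):
    """Return {system: ratio string} relative to the fastest workstation.

    `rates_kb_per_s` maps system -> throughput in KBytes/second.
    `workstations` is the set of systems eligible to be the baseline.
    Systems outside that set are bracketed and may be fractional.
    """
    best = max(rates_kb_per_s[name] for name in workstations)
    ratios = {}
    for name, rate in rates_kb_per_s.items():
        ratio = best / rate  # higher rate -> lower (better) ratio
        if name not in workstations:
            ratios[name] = f"[{ratio:.2f}]"
        elif rate == best:
            ratios[name] = "1"
        else:
            ratios[name] = f"{ratio:.2f}"
    return ratios


# Hypothetical throughput figures, in KBytes/second.
rates = {"HP BSD": 400.0, "Sun 4": 160.0, "Sun 3": 250.0, "VAX": 4000.0}
print(io_ratios(rates, {"HP BSD", "Sun 4", "Sun 3"}))
# {'HP BSD': '1', 'Sun 4': '2.50', 'Sun 3': '1.60', 'VAX': '[0.10]'}
```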