A Correction to IMSL/NAG comparison
Booker Bense
benseb at nic.cerf.net
Fri Mar 8 11:13:29 AEST 1991
This is in reference to a previous article I posted. It seems I have
put the NAG libraries in a bad light by choosing the incorrect
subroutine for comparison. First the IMSL people get on my case and
now I've got the NAG people irked at me %-)! What large software
organization can I offend in my next post?!! Seriously, this is only one
set of results for one particular class of problem on one particular
machine. Like most benchmarks, it's probably meaningless to your
particular problem. People at both organizations have been very
helpful in advising me. Ask them about your particular problem and
you might be surprised at the response.
I have obtained some interesting preliminary results and have
some ***MORE**** retractions to make.
--First, as far as I can tell, NAG does not use BLAS in any form.
-- Examining the loadmaps from the code
--that ran these tests reveals that the NAG routines are largely
--self-contained; the only calls they make are to error handling and
--machine constant routines.
- I was using the wrong NAG routine. Someone from NAG kindly
corrected my mistake. I was originally going to use F03AFF
(recommended in the documentation) but it was doing more than the
other routines (i.e. computing the determinant to high accuracy). So
I looked for something that was simpler; however, in this case that
turned out to be the wrong thing to do.
- F03AFF does use BLAS level 2, SGEMV and STRSV from libsci.
-- IMSL uses BLAS level 1 calls from the
--system libraries and has its own version of some BLAS level 2
--routines ( SGEMV in this example). These times are determined by
--querying the hardware performance monitor before and after the
--subroutine call. The test matrices in this case were the best possible
--case i.e:
--
-- cond(A) ~= 1
-- A(i,i) > A(i,j) i != j.
--
--Each routine returned results accurate to machine precision.
--More difficult cases will be included in the final version.
--
--
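As a rough illustration of the ``best possible case'' described above -- a
diagonally dominant matrix with cond(A) close to 1 -- here is a small sketch.
The original tests were Fortran; this is Python, and the function name is
invented for illustration, not part of any of the libraries discussed:

```python
import random

def best_case_matrix(n, seed=0):
    """Build an n x n matrix with tiny off-diagonal entries and a
    dominant diagonal, so that cond(A) ~= 1 and A(i,i) > A(i,j), i != j.
    Illustrative sketch only -- not the benchmark's actual test code."""
    rng = random.Random(seed)
    a = [[rng.uniform(0.0, 0.01) for _ in range(n)] for _ in range(n)]
    for i in range(n):
        # Make each diagonal entry exceed the sum of the rest of its row,
        # which guarantees strict diagonal dominance.
        a[i][i] = 1.0 + sum(a[i][j] for j in range(n) if j != i)
    return a

a = best_case_matrix(4)
assert all(a[i][i] > sum(a[i][j] for j in range(4) if j != i)
           for i in range(4))
```

Any factorization routine should handle a matrix like this without pivoting
trouble, which is what makes it a best-case timing input.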
--SGEFA - CRI libsci optimized version of the LINPACK routines
--FO1BTF - NAG Mark 13 (references an algorithm by Croz, Nugent, Reid & Taylor)
--LFTRG - IMSLmath version 10.0 (uses the LINPACK algorithm)
--GENERIC - Fortran LINPACK compiled with vector optimization on.
--
--All rates are in Mflops (millions of floating-point operations per second). A = A(size,size)
--
--
--Size 101 203 407 815
--
--SGEFA 99.955 131.174 148.675 158.382
--
--FO1BTF 77.289 105.933 131.063 146.328
--
--LFTRG 72.544 156.559 218.848 257.777
--
--
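For reference, Mflops figures like the ones in these tables follow from the
standard operation count for LU factorization, roughly 2n^3/3 flops. A minimal
sketch of the conversion (the function names are illustrative, not the actual
test harness, which queried the hardware performance monitor):

```python
def lu_flops(n):
    # Standard operation count for an n x n LU factorization:
    # about (2/3) * n**3 floating-point operations.
    return 2.0 * n**3 / 3.0

def mflops(n, seconds):
    """Convert a measured factorization time into an Mflops rate."""
    return lu_flops(n) / seconds / 1.0e6

# Example: at the quoted 158.382 Mflops, an 815 x 815 factorization
# takes roughly lu_flops(815) / 158.382e6 seconds (about 2.3 s).
t = lu_flops(815) / 158.382e6
rate = mflops(815, t)
```

This is only the factorization flop count; routines that also refine the
determinant or estimate the condition number do extra work that a raw
Mflops rate hides.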
--The next set of results is from forcing IMSL to use the libsci
--version of SGEMV.
--
--
--Size 101 203 407 815
--
--SGEFA 97.777 130.377 149.025 157.939
--
--FO1BTF 72.429 108.292 132.440 147.396
--
--LFTRG 105.384 213.625 255.089 289.730
--
--This result is from a run using generic fortran BLAS and Linpack routines
--from the slatec libraries.
--
--Size 101 203 407 815
--
--GENERIC 35.94 64.359 96.345 136.265
--
--
--This set of results is from using BLAS level 1 and SGEFA
--from bcslib
--
--Size 101 203 407 815
--
--BcsSGEFA 175.777 238.377 274.025 292.939
--
*****NEW******
F03AFF 128.968 189.378 238.028 277.606
*****NEW******
--
--LFTRG 139.384 218.625 269.089 289.730
--
--
--
--- The Mflops rates are all from runs on 1 cpu of an 8 cpu YMP in
--multi-user mode (UNICOS 5.1), i.e. around 0% idle time. I would say that
--the results have a repeatability of around 5%, with results from the
--small sizes being more repeatable. Due to the way the YMP memory is
--organized, memory fetches are a function of system load, and the larger
--problems are more affected by this.
--
---Conclusions:
--
--1. It pays to read the loadmap; the only difference between run 1 and
--run 2 was in the load command.
--
-- 1: segldr -limslmath,nag *.o
-- 2: segldr -lsci,imslmath,nag *.o
--
--2. These are only best case results. I wanted to find out the
--fastest possible speed for these routines. The routines in question
--are the simplest possible; in a real problem you would probably want
--to use the more sophisticated versions and do some checking on the
--condition number before you believe the results.
--
--3. IMSL is a lot faster than I would have expected; I thought the
--speed of SGEFA would be consistently faster than either IMSL or
--NAG. 290 Mflops is as fast as any code I've run on a single processor;
--330 is the speed you're guaranteed never to exceed. The algorithm
--quoted in the NAG reference manual is one designed for paging
--machines; I don't know how much they massaged it for the YMP. All of
--these numbers do reflect some effort at machine optimization (compare
--with generic).
F03AFF does Crout LU decomposition and is obviously a far better choice
than the original subroutine that I used. The documentation mentions
something about ``higher precision'' used for inner products. This makes
it somewhat of an ``apples & oranges'' comparison. Perhaps the difference
will be noticeable when I get the ``bad case'' version running.
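For readers unfamiliar with the Crout variant: it computes A = L*U with the
unit diagonal on U, and its inner loops are exactly the inner products that,
per the documentation, F03AFF accumulates in higher precision. A toy sketch
in Python, with no pivoting (a real library routine pivots; this is for
illustration only):

```python
def crout_lu(a):
    """Crout LU decomposition: A = L*U, with U having a unit diagonal.
    No pivoting -- adequate for diagonally dominant test matrices,
    but not a substitute for a library routine."""
    n = len(a)
    lower = [[0.0] * n for _ in range(n)]
    upper = [[0.0] * n for _ in range(n)]
    for j in range(n):
        upper[j][j] = 1.0
        for i in range(j, n):
            # The inner product below is the accumulation that F03AFF's
            # documentation says is carried in higher precision.
            lower[i][j] = a[i][j] - sum(lower[i][k] * upper[k][j]
                                        for k in range(j))
        for i in range(j + 1, n):
            upper[j][i] = (a[j][i] - sum(lower[j][k] * upper[k][i]
                                         for k in range(j))) / lower[j][j]
    return lower, upper
```

Because each entry is produced by one long inner product, the rounding
behavior depends heavily on how that sum is accumulated -- which is why the
extra-precision accumulation makes it an ``apples & oranges'' comparison.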
--
--4. Subroutine calls are expensive; the large difference between the
--generic version and the libsci version can in part be explained by
--the increased number of subroutine calls. The libsci versions of both SGEMV
--and SGEFA have had almost all of their subroutine calls inlined. As
--the size of the problem becomes larger the generic version approaches
--the optimized version, because the number of subroutine calls is roughly
--quadratic in the problem size while the number of required flops is
--cubic. This also explains the large difference between IMSL with and without
--the libsci SGEMV for small problems.
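A back-of-the-envelope model of that argument (the cost constants are
illustrative, not measured): an unblocked LINPACK-style factorization makes
on the order of n**2/2 BLAS-1 calls against about 2*n**3/3 flops, so the
share of time lost to call overhead falls off like 1/n:

```python
def overhead_fraction(n, cost_per_call=1.0, cost_per_flop=0.01):
    """Rough model of call overhead in an unblocked LU factorization.
    About n**2 / 2 BLAS-1 calls (one SAXPY per column per remaining
    column) versus about 2*n**3/3 flops; the cost constants are made
    up for illustration, not measured on any machine."""
    calls = n**2 / 2.0
    flops = 2.0 * n**3 / 3.0
    call_time = calls * cost_per_call
    flop_time = flops * cost_per_flop
    return call_time / (call_time + flop_time)

# The overhead share shrinks as the problem grows, which is why the
# generic version closes the gap with the inlined libsci version
# at the larger matrix sizes.
small = overhead_fraction(101)
large = overhead_fraction(815)
assert small > large
```

Under this toy model the overhead fraction at n = 101 is several times the
fraction at n = 815, matching the pattern in the tables above.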
--
--
- It's hard to draw conclusions on speed with the routines doing
somewhat different tasks. One result that appears from the F03AFF data
is that Level 2 BLAS does not provide you with any speed advantage
until you reach a certain minimum size. One thing to note is that the
``unsophisticated'' user (i.e. one that doesn't read loadmaps) %-)
would not see the speed advantage that using the libsci BLAS gives
IMSL. Whether we messed up in our installation is another question
entirely.
- Booker C. Bense
preferred: benseb at grumpy.sdsc.edu "I think it's GOOD that everyone
NeXT Mail: benseb at next.sdsc.edu becomes food " - Hobbes
More information about the Comp.unix.cray
mailing list