Basic Linear Algebra Subroutines (BLAS)
Jean-Pierre Panziera
jpp at pipo.corp.sgi.com
Sat Jul 14 10:05:56 AEST 1990
In article <90Jul13.100737edt.8304 at ephemeral.ai.toronto.edu>,
tff at na.toronto.edu (Tom Fairgrieve) writes:
> From: tff at na.toronto.edu (Tom Fairgrieve)
> Subject: Basic Linear Algebra Subroutines (BLAS)
> Date: 13 Jul 90 14:08:02 GMT
> Organization: Department of Computer Science, University of Toronto
>
> Does SGI have an optimized version of the BLAS (Basic Linear Algebra
> Subroutines) available for the 4d/240? If so, how does the performance
> of this version compare to a version produced by the f77 compiler with
> -O3 optimization level set? I'm interested in all 3 levels of the BLAS.
>
> Thanks for any information,
> Tom Fairgrieve
> tff at na.utoronto.ca
As far as I know, SGI does not have an official version of BLAS3,
but I may be wrong.
However, I have optimized and parallelized a Fortran version of
the BLAS3 matrix multiplication routines.
I get pretty good results on a 220-GTX:
    dgemm    5-11 Mflops
    zgemm   10-14 Mflops
    sgemm   10-16 Mflops
    cgemm   12-17 Mflops
The lowest figures are for A * trans(B), the highest for trans(A) * B.
I am sure it can be improved, and I do not guarantee that it is bug-free.
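
One plausible reason for the gap between A * trans(B) and trans(A) * B is
memory stride. The sketch below is not the optimized Fortran code; it is a
minimal illustration in Python that assumes column-major storage (as in
Fortran) and a plain inner-product loop order. The names idx, gemm_tn,
gemm_nt and inner_strides are hypothetical helpers made up for this example.
Under those assumptions, trans(A) * B walks both operands with stride 1 in
the innermost loop, while A * trans(B) jumps by a whole leading dimension
on every step.

```python
# Sketch only: column-major storage and an inner-product loop order are
# assumptions for illustration, not details taken from the post.

def idx(i, j, ld):
    """Offset of element (i, j) in a column-major array with leading dimension ld."""
    return i + j * ld

def gemm_tn(m, n, k, a, b):
    """C = trans(A) * B, with A stored as k x m and B as k x n (column-major)."""
    c = [0.0] * (m * n)
    for j in range(n):
        for i in range(m):
            s = 0.0
            for p in range(k):
                # Both operands advance by one element as p increases.
                s += a[idx(p, i, k)] * b[idx(p, j, k)]
            c[idx(i, j, m)] = s
    return c

def gemm_nt(m, n, k, a, b):
    """C = A * trans(B), with A stored as m x k and B as n x k (column-major)."""
    c = [0.0] * (m * n)
    for j in range(n):
        for i in range(m):
            s = 0.0
            for p in range(k):
                # A advances by m elements and B by n elements as p increases.
                s += a[idx(i, p, m)] * b[idx(j, p, n)]
            c[idx(i, j, m)] = s
    return c

def inner_strides(m, n, k):
    """Element strides of (A, B) in the innermost loop for each variant."""
    return {
        "trans(A)*B": (idx(1, 0, k) - idx(0, 0, k), idx(1, 0, k) - idx(0, 0, k)),
        "A*trans(B)": (idx(0, 1, m) - idx(0, 0, m), idx(0, 1, n) - idx(0, 0, n)),
    }
```

Both routines compute the same kind of product; only the access pattern
differs. Whether this is what actually drives the numbers on the 220-GTX
would need to be checked by measurement.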