Basic Linear Algebra Subroutines (BLAS)

Sat Jul 14 10:05:56 AEST 1990

In article <90Jul13.100737edt.8304 at ephemeral.ai.toronto.edu>,
tff at na.toronto.edu (Tom Fairgrieve) writes:
> From: tff at na.toronto.edu (Tom Fairgrieve)
> Subject: Basic Linear Algebra Subroutines (BLAS)
> Date: 13 Jul 90 14:08:02 GMT
> Organization: Department of Computer Science, University of Toronto
> 
> Does SGI have an optimized version of the BLAS (Basic Linear Algebra 
> Subroutines) available for the 4d/240?  If so, how does the performance
> of this version compare to a version produced by the f77 compiler with
> -O3 optimization level set?  I'm interested in all 3 levels of the BLAS.
> 
> Thanks for any information,
>   Tom Fairgrieve
>   tff at na.utoronto.ca

As far as I know, SGI does not have an official version of BLAS3,
though I may be wrong.

However, I have optimized and parallelized a Fortran version of
the matrix multiplication routines of BLAS3.

I get pretty good results on a 220-GTX:

dgemm 5-11 Mflops
zgemm 10-14 Mflops
sgemm 10-16 Mflops
cgemm 12-17 Mflops

The lowest figures are for A * trans(B), the highest for trans(A) * B.
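
A possible explanation for that ordering, offered as an assumption rather
than anything measured in this post, is Fortran's column-major storage.
If the inner loop is a dot product over k, then trans(A) * B walks both
operands with stride 1, while A * trans(B) walks both with a stride equal
to the leading dimension. The sketch below only computes those strides;
the function names are hypothetical, and a tuned gemm may order its loops
differently.

```python
def offset(i, j, ld):
    """0-based offset of element (i, j) in column-major storage
    with leading dimension ld (the Fortran layout)."""
    return i + j * ld


def inner_loop_strides(trans_a, trans_b, lda, ldb):
    """Memory strides of A and B as k advances by 1 in
    C(i,j) += op(A)(i,k) * op(B)(k,j).

    Assumes a dot-product (k innermost) loop order, which is an
    illustration only, not necessarily what an optimized gemm does.
    """
    i = j = 0
    if trans_a:
        # op(A)(i,k) = A(k,i): k moves down a column of A
        a_stride = offset(1, i, lda) - offset(0, i, lda)
    else:
        # op(A)(i,k) = A(i,k): k moves along a row of A
        a_stride = offset(i, 1, lda) - offset(i, 0, lda)
    if trans_b:
        # op(B)(k,j) = B(j,k): k moves along a row of B
        b_stride = offset(j, 1, ldb) - offset(j, 0, ldb)
    else:
        # op(B)(k,j) = B(k,j): k moves down a column of B
        b_stride = offset(1, j, ldb) - offset(0, j, ldb)
    return a_stride, b_stride


if __name__ == "__main__":
    print("trans(A) * B :", inner_loop_strides(True, False, 100, 100))
    print("A * trans(B) :", inner_loop_strides(False, True, 100, 100))
```

With lda = ldb = 100, trans(A) * B gives strides (1, 1) and A * trans(B)
gives (100, 100), so the second case touches a new column on every step
of the inner loop.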

I am sure it can be improved, and I do not guarantee that it is bug-free.