Optimized BLAS1,2,3 available via anonymous ftp
Olivier Schreiber
schreiber at schreiber.asd.sgi.com
Sat Mar 2 06:23:53 AEST 1991
% ftp sgi.com
or
% ftp 192.48.153.1
Connected to 192.48.153.1.
220 SGI.COM FTP server (version 5.60 IRIX 02/25/91 16:25) ready.
Name (192.48.153.1:guest): anonymous
331 Guest login ok, type your name as password.
Password:
230 Guest login ok, access restrictions apply.
Remote system type is UNIX.
Using binary mode to transfer files.
ftp> cd /pub/lib/libblas
ftp> pwd
257 "/pub/lib/libblas" is current directory.
ftp> ls
200 PORT command successful.
150 Opening ASCII mode data connection for '/bin/ls'.
total 1595
-rw-r--r-- 1 ftp guest 2735 Mar 1 11:04 README
-rw-r--r-- 1 ftp guest 813228 Feb 26 10:20 libblas.a.Z
226 Transfer complete.
BLAS (Basic Linear Algebra Subprograms) is the library used as a toolkit
for the LAPACK project.
LAPACK ("Linear Algebra Package") is a project originated by Jack Dongarra
of Oak Ridge National Laboratory.
This project, supported by the National Science Foundation (NSF), will put
together a new set of linear algebra routines intended to supersede
both the LINPACK and EISPACK packages.
To achieve maximum efficiency across all types of hardware, the
LAPACK routines are built on the matrix-matrix BLAS 3 routines (e.g. DGEMM).
This approach performs much better than one based on the vector-vector
BLAS 1 routines (e.g. DAXPY), or even the matrix-vector BLAS 2 routines
(e.g. DGEMV), because a BLAS 3 routine performs O(n^3) floating-point
operations on only O(n^2) data, so each element brought into cache can be
reused many times.
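The advantage of BLAS 3 can be seen from a back-of-the-envelope count of floating-point operations per memory reference for one representative routine from each level. The sketch below uses the usual leading-order textbook counts for square n-by-n operands and assumes each operand element moves between memory and cache only once (DGEMM gets close to that only with blocking); the function names are illustrative and the numbers are not measurements of the SGI library.

```python
"""Idealized flop and memory-reference counts for DAXPY, DGEMV and DGEMM."""

def blas_counts(level, n):
    """Return (flops, memory_refs) for a size-n routine of the given BLAS level."""
    if level == 1:    # DAXPY: y <- alpha*x + y      (read x, read y, write y)
        return 2 * n, 3 * n
    if level == 2:    # DGEMV: y <- alpha*A*x + beta*y  (reading A dominates)
        return 2 * n * n, n * n + 3 * n
    if level == 3:    # DGEMM: C <- alpha*A*B + beta*C  (read A, B, C; write C)
        return 2 * n ** 3, 4 * n * n
    raise ValueError("BLAS level must be 1, 2 or 3")

def reuse_ratio(level, n):
    """Flops performed per memory reference; higher means more reuse from cache."""
    flops, refs = blas_counts(level, n)
    return flops / refs

for level in (1, 2, 3):
    print(f"BLAS {level}: {reuse_ratio(level, 1000):.2f} flops per memory reference")
# BLAS 1: 0.67 / BLAS 2: 1.99 / BLAS 3: 500.00
```

The ratio stays constant for BLAS 1 and BLAS 2 but grows linearly with n for BLAS 3, which is why a matrix-matrix formulation can keep the floating-point units busy even when main memory is slow.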
Expected release date for LAPACK: April 1991
------------------------------------------------------------------------------
WARNING --- The current version is an "alpha version"!
- The real (prefix S) and double precision (prefix D) routines of BLAS2
  and BLAS3 have been hand optimized and parallelized (in Fortran).
- The only complex routines hand optimized/parallelized are CGEMM and
  ZGEMM (in Fortran).
- The BLAS1 routines are not parallelized; the most important ones are
  hand-coded in assembly language.
- Although these routines have been extensively tested, a few bugs may
  remain.
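The S/D/C/Z letters above follow the standard BLAS naming convention, where the first letter of a routine name encodes its data type. The small helper below is a hypothetical illustration of that convention only; it handles the common one-letter prefixes and deliberately ignores irregular names such as IDAMAX or DZNRM2, whose prefixes do not follow the simple rule.

```python
# Standard BLAS naming: the first letter of most routine names gives the data type.
PRECISION_PREFIX = {
    "S": "single precision real",
    "D": "double precision real",
    "C": "single precision complex",
    "Z": "double precision complex",
}

def blas_precision(routine):
    """Return the data type encoded in a simple BLAS name such as 'DGEMM'.

    Irregular names (IDAMAX, SCASUM, DZNRM2, ...) are not handled correctly.
    """
    prefix = routine[:1].upper()
    if prefix not in PRECISION_PREFIX:
        raise ValueError(f"unrecognized BLAS prefix in {routine!r}")
    return PRECISION_PREFIX[prefix]

# The four precision variants of the matrix-matrix multiply:
for name in ("SGEMM", "DGEMM", "CGEMM", "ZGEMM"):
    print(name, "->", blas_precision(name))
```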
------------------------------------------------------------------------------
To install on your machine:
uncompress libblas.a
------------------------------------------------------------------------------
Example of performance:
DGEMM (double precision) exceeds 60 Mflops on a 4D/380.
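A Mflops figure for DGEMM is conventionally derived from the operation count 2*m*n*k (one multiply and one add per inner-product term, ignoring the smaller alpha/beta scaling work) divided by the elapsed time. The sketch below applies that formula; the 1000x1000 problem size is an arbitrary illustration, since the post does not say which matrix sizes were timed.

```python
def gemm_flops(m, n, k):
    """Conventional operation count for C <- alpha*A*B + beta*C (A m-by-k, B k-by-n)."""
    return 2 * m * n * k

def mflops(m, n, k, seconds):
    """Rate in millions of floating-point operations per second."""
    return gemm_flops(m, n, k) / seconds / 1e6

def seconds_at_rate(m, n, k, rate_mflops):
    """Time a GEMM of this shape would take at a sustained rate in Mflops."""
    return gemm_flops(m, n, k) / (rate_mflops * 1e6)

# A 1000x1000 DGEMM is 2e9 operations; at 60 Mflops that takes about 33 seconds.
t = seconds_at_rate(1000, 1000, 1000, 60)
print(f"{t:.1f} s")                                   # 33.3 s
print(f"{mflops(1000, 1000, 1000, t):.1f} Mflops")    # 60.0 Mflops
```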
------------------------------------------------------------------------------
Known problem:
Performance may drop when arrays are aligned exactly on cache-size
boundaries. This can happen when the "leading" dimensions are powers of two.
For example, it is better to declare a matrix as (1025,1024) rather than
(1024,1024).
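To see why a power-of-two leading dimension hurts, consider where the elements of one row of a column-major (Fortran-order) double precision array land in a direct-mapped cache. The cache geometry below (64 KB, 16-byte lines, direct-mapped) is a hypothetical choice for illustration, not the actual cache of the 4D/380, and the helper names are made up for this sketch.

```python
DOUBLE_BYTES = 8                 # size of one double precision element
CACHE_BYTES = 64 * 1024          # hypothetical direct-mapped cache size
LINE_BYTES = 16                  # hypothetical cache line size
NUM_LINES = CACHE_BYTES // LINE_BYTES

def cache_line(i, j, lda):
    """Cache line used by element (i, j) (0-based) of a column-major array."""
    offset = (i + j * lda) * DOUBLE_BYTES      # byte offset from the array start
    return (offset // LINE_BYTES) % NUM_LINES  # direct-mapped: one possible line

def distinct_lines_for_row(lda, ncols, i=0):
    """How many different cache lines row i occupies across ncols columns."""
    return len({cache_line(i, j, lda) for j in range(ncols)})

for lda in (1024, 1025):
    print(lda, distinct_lines_for_row(lda, 64))
# 1024 8
# 1025 64
```

With a leading dimension of 1024, each column is exactly 8 KB long, so 64 elements of one row compete for only 8 cache lines and keep evicting each other. Padding the leading dimension to 1025 shifts each column by one element and spreads the same 64 elements over 64 different lines.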
------------------------------------------------------------------------------
Send comments/complaints/bug reports to:
Jean-Pierre Panziera
Silicon Graphics
fax : (415)962-9601
E-Mail: jpp at corp.sgi.com
--
Olivier Schreiber Technical Marketing schreiber at sgi.com (415)335 7353 MS/7L580
Silicon Graphics Inc., 2011 North Shoreline Blvd. Mountain View, Ca 94039-7311