Optimized BLAS 1, 2, 3 available by anonymous ftp

Sat Mar 2 06:23:53 AEST 1991

% ftp sgi.com 
or
% ftp 192.48.153.1 
Connected to 192.48.153.1.
220 SGI.COM FTP server (version 5.60 IRIX 02/25/91 16:25) ready.
Name (192.48.153.1:guest): anonymous
331 Guest login ok, type your name as password.
Password:
230 Guest login ok, access restrictions apply.
Remote system type is UNIX.
Using binary mode to transfer files.
ftp> cd /pub/lib/libblas
ftp> pwd
257 "/pub/lib/libblas" is current directory.
ftp> ls
200 PORT command successful.
150 Opening ASCII mode data connection for '/bin/ls'.
total 1595
-rw-r--r--   1 ftp      guest       2735 Mar  1 11:04 README
-rw-r--r--   1 ftp      guest     813228 Feb 26 10:20 libblas.a.Z
226 Transfer complete.

BLAS : "Basic Linear Algebra Subprograms" is the library used as a toolkit
	for the LAPACK project.

LAPACK : "Linear Algebra Package" is a project originated by Jack Dongarra
	from Oak Ridge National Lab.
	  This project, supported by the National Science Foundation (NSF), will
	put together a new set of linear algebra functions, intended to supplant
	both the LINPACK and EISPACK packages.
	  To achieve maximum efficiency across all types of hardware, the
	LAPACK routines are based on matrix-matrix BLAS 3 routines (e.g. DGEMM).
	This approach performs much better than anything based on
	vector-vector BLAS 1 routines (e.g. DAXPY), or even matrix-vector BLAS 2
	routines (e.g. DGEMV).
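
	The three BLAS levels named above can be sketched as follows. This is
	only an illustration in Python, not the library code: the function
	names mirror the BLAS routines, but the list-of-rows storage and the
	helper flops_per_operand are assumptions made for this example. The
	helper shows why level 3 pays off: the work done per element moved
	grows with the problem size n only for the matrix-matrix case.

```python
# Illustrative reference loops for the three BLAS levels.
# NOT the optimized library code; matrices are plain lists of rows.

def daxpy(alpha, x, y):
    """Level 1 (vector-vector): returns alpha*x + y."""
    return [alpha * xi + yi for xi, yi in zip(x, y)]

def dgemv(alpha, a, x, beta, y):
    """Level 2 (matrix-vector): returns alpha*A*x + beta*y."""
    return [alpha * sum(aij * xj for aij, xj in zip(row, x)) + beta * yi
            for row, yi in zip(a, y)]

def dgemm(alpha, a, b, beta, c):
    """Level 3 (matrix-matrix): returns alpha*A*B + beta*C."""
    cols_b = list(zip(*b))  # columns of B
    return [[alpha * sum(aik * bkj for aik, bkj in zip(row, col)) + beta * cij
             for col, cij in zip(cols_b, crow)]
            for row, crow in zip(a, c)]

def flops_per_operand(n):
    """Hypothetical helper: flops divided by elements read or written,
    for square problems of order n (a rough count, assumed for illustration)."""
    return {
        "daxpy": (2 * n) / (3 * n),              # read x, read y, write y
        "dgemv": (2 * n * n) / (n * n + 3 * n),  # read A, x, y; write y
        "dgemm": (2 * n ** 3) / (4 * n * n),     # read A, B, C; write C
    }

if __name__ == "__main__":
    for name, ratio in flops_per_operand(100).items():
        print(name, round(ratio, 2))
```

	For n = 100 the ratio stays below 1 for daxpy, is about 2 for dgemv,
	and reaches n/2 = 50 for dgemm, which is what lets a blocked
	matrix-matrix routine keep its operands in cache.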

	release time for LAPACK : April 1991

------------------------------------------------------------------------------
	WARNING --- This current version is an "alpha-version" !

	- The real (prefix S) and double precision (prefix D) routines of BLAS2
	and BLAS3 have been hand optimized and parallelized (in Fortran).

	- The only complex routines hand optimized/parallelized are CGEMM and
	ZGEMM (in Fortran).

	- The BLAS1 routines are not parallelized; the most important ones are
	hand-coded in assembly language.

	- Although these routines have been intensively tested, it is possible
	that a few bugs remain.
------------------------------------------------------------------------------
to load on your machine :

uncompress libblas.a
------------------------------------------------------------------------------
Example of performance :

        dgemm           double precision  >   60 Mflops on a 4D/380
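
A Mflops figure for dgemm is conventionally derived from the operation
count 2*m*n*k (one multiply and one add per inner-product term). The
sketch below assumes that convention; the function name and the sample
sizes and timing are hypothetical and are not measurements from the 4D/380.

```python
def dgemm_mflops(m, n, k, seconds):
    """Mflops for an (m x k) times (k x n) product, assuming the
    conventional 2*m*n*k flop count (alpha/beta scaling ignored)."""
    flops = 2.0 * m * n * k
    return flops / seconds / 1.0e6

if __name__ == "__main__":
    # Hypothetical timing: a 500 x 500 x 500 product completed in 2.5 s.
    print(dgemm_mflops(500, 500, 500, 2.5))
```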
------------------------------------------------------------------------------
known problem:

Performance may be reduced when arrays are exactly aligned on cache-size
boundaries. This may happen when "leading" dimensions are powers of two.
For example, it is better to declare a matrix as (1025,1024) rather than
(1024,1024).
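
The effect of the leading dimension can be illustrated with a little
address arithmetic. The sketch below walks along one row of a
column-major (Fortran-style) double precision array and counts how many
distinct cache sets the elements fall into. The cache geometry (64 KB,
direct-mapped, 16-byte lines) and the function name are assumptions chosen
for illustration, not a description of any particular machine.

```python
def distinct_cache_sets(lda, ncols, elem_bytes=8,
                        cache_bytes=64 * 1024, line_bytes=16):
    """Count the cache sets touched by elements (1,1), (1,2), ..., (1,ncols)
    of a column-major array with leading dimension lda.
    Assumes a hypothetical direct-mapped cache (one line per set)."""
    num_sets = cache_bytes // line_bytes
    sets = set()
    for j in range(ncols):
        byte_offset = j * lda * elem_bytes  # consecutive row elements are lda apart
        sets.add((byte_offset // line_bytes) % num_sets)
    return len(sets)

if __name__ == "__main__":
    for lda in (1024, 1025):
        print(lda, distinct_cache_sets(lda, 1024))
```

Under these assumptions, a row of the (1024,1024) array maps onto only 8
cache sets, so its elements keep evicting one another, while a row of the
(1025,1024) array spreads over 1024 sets.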
------------------------------------------------------------------------------
Send comments/complaints/bug reports to :

	Jean-Pierre Panziera
	Silicon Graphics
	fax   : (415)962-9601
	E-Mail: jpp at corp.sgi.com

--

Olivier Schreiber  Technical Marketing schreiber at sgi.com (415)335 7353 MS/7L580
Silicon Graphics Inc.,  2011 North Shoreline Blvd. Mountain View, Ca 94039-7311