VAXclusters and UN*X

Wed Jun 5 08:28:58 AEST 1985

Following is a very brief tutorial on VAXclusters, and how they relate
to unix*:

A cluster is defined by a set of proprietary protocols for implementing
a loosely-coupled multi-processing system.  Two of the key protocols
are System Communication Services (SCS), software which defines and coordinates
members of the cluster; and the Distributed Lock Manager, which allows
locks to be shared between processors.
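
To make the idea of shared locks concrete, here is a deliberately tiny
sketch. It is not the VMS lock manager: it runs in one process, knows only
exclusive locks, and every name in it (ToyLockManager, the node and
resource names) is made up for illustration. In a real cluster the lock
state is itself distributed among the members.

```python
from collections import defaultdict, deque


class ToyLockManager:
    """Single-process sketch: exclusive locks on named resources, FIFO waiters."""

    def __init__(self):
        self.owner = {}                      # resource name -> owning node
        self.waiters = defaultdict(deque)    # resource name -> queued nodes

    def request(self, node, resource):
        """Return True if granted now, False if the request was queued."""
        if resource not in self.owner:
            self.owner[resource] = node
            return True
        self.waiters[resource].append(node)
        return False

    def release(self, node, resource):
        """Release the lock and return the next owner, or None if nobody waits."""
        if self.owner.get(resource) != node:
            raise ValueError(f"{node} does not hold {resource}")
        queue = self.waiters[resource]
        if queue:
            # Hand the lock to the longest-waiting node.
            self.owner[resource] = queue.popleft()
            return self.owner[resource]
        del self.owner[resource]
        return None


# Hypothetical node and resource names.
locks = ToyLockManager()
granted_a = locks.request("VAX-A", "disk1:file42")
granted_b = locks.request("VAX-B", "disk1:file42")
next_owner = locks.release("VAX-A", "disk1:file42")
```

The second request is queued rather than refused, so when the first node
releases the resource the lock passes to the waiter.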

These protocols are entirely software-based: there are no hardware
dependencies in them except at the lowest levels.  Also, the
protocols are such that control is distributed dynamically between
members of the cluster; in fact, there is no such thing as a
"cluster controller" (the HSC50 is logically a peer of the VAX
processors).

The HSC50 is a high-speed IO server.  It services requests for
logical disk blocks.  It does not know anything about file structure:
this is imposed by the VAX processors via the MSCP protocol.
The HSC50 performs various sorts of optimizations (similar
to those done by the FFS) and has a peak transfer rate of nearly
4MB/sec.
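
The split between a block server and the file structure imposed by the
host can be sketched roughly as follows. This is a toy model, not MSCP:
the class names, the 512-byte block size, and the allocation scheme are
all assumptions made for the example.

```python
BLOCK_SIZE = 512  # assumed block size for this sketch


class BlockServer:
    """Hypothetical I/O server: stores blocks by number and nothing else."""

    def __init__(self, nblocks):
        self.blocks = [bytes(BLOCK_SIZE)] * nblocks

    def read(self, lbn):
        return self.blocks[lbn]

    def write(self, lbn, data):
        # Pad short writes with zero bytes to a full block.
        self.blocks[lbn] = data.ljust(BLOCK_SIZE, b"\0")[:BLOCK_SIZE]


class HostFileSystem:
    """Hypothetical host-side layer: maps file names onto block numbers."""

    def __init__(self, server):
        self.server = server
        self.directory = {}   # file name -> (list of block numbers, length)
        self.next_free = 0    # naive allocator: hand out blocks in order

    def create(self, name, contents):
        lbns = []
        for off in range(0, len(contents), BLOCK_SIZE):
            lbn = self.next_free
            self.next_free += 1
            self.server.write(lbn, contents[off:off + BLOCK_SIZE])
            lbns.append(lbn)
        self.directory[name] = (lbns, len(contents))

    def read_file(self, name):
        lbns, length = self.directory[name]
        data = b"".join(self.server.read(lbn) for lbn in lbns)
        return data[:length]  # drop the padding in the last block


server = BlockServer(nblocks=16)
fs = HostFileSystem(server)
fs.create("notes.txt", b"x" * 700)
```

Note that BlockServer never sees a file name: the directory and the
mapping from files to blocks live entirely on the host side.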

The RA-series disks are not dynamically dual-ported.  Dual-porting
was implemented in RA disks for the purpose of allowing the disk
to be accessed by a secondary controller in the event the primary
fails.  In a cluster, a typical configuration would be a disk
dual-ported between either 2 HSC50s or an HSC50 and a UDA50.
Only one path is active at a time: if the HSC50 fails, the disk
fails over dynamically to the alternate path.
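
The one-active-path arrangement can be illustrated with a small sketch.
The classes and the retry-once policy are assumptions of the example, not
a description of how the real controllers negotiate the switch.

```python
class ControllerDown(Exception):
    pass


class Controller:
    """Hypothetical controller: serves reads until marked failed."""

    def __init__(self, name):
        self.name = name
        self.failed = False

    def read(self, lbn):
        if self.failed:
            raise ControllerDown(self.name)
        return f"block {lbn} via {self.name}"


class DualPortedDisk:
    """One active path at a time; the alternate is used only after a failure."""

    def __init__(self, primary, secondary):
        self.active = primary
        self.standby = secondary

    def read(self, lbn):
        try:
            return self.active.read(lbn)
        except ControllerDown:
            # Fail over: swap paths, then retry once on the new active path.
            self.active, self.standby = self.standby, self.active
            return self.active.read(lbn)


hsc = Controller("HSC50")
uda = Controller("UDA50")
disk = DualPortedDisk(hsc, uda)
first = disk.read(7)     # served over the primary path
hsc.failed = True
second = disk.read(7)    # primary is down, so the alternate path takes over
```

If both controllers are down, the retry raises ControllerDown and the
caller sees the failure.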

DECnet is totally unrelated to clusters.  It is possible to run
DECnet over a CI bus (using SCS), but a cluster can
run fine without a byte of DECnet code (it IS extremely useful for
system management, however).

Allowing a unix (or any other) system to participate in a cluster would
require implementing, at a minimum, SCS, the connection manager (software
which decides when to form, change, and dissolve clusters), the
distributed lock manager, and MSCP.  This is a large amount of
code, much of it embedded in VMS (and therefore subject to VMS
licensing restrictions), and porting it would be a major
undertaking.  A major re-write of the file system would be necessary,
and adopting some sort of standard for file locking would be
highly recommended.

All the above work would just give you a distributed file system.
If you wanted distributed job and device queues, you would
have to implement the Distributed Job Controller as well.  Given
the VMS-ish flavor of this protocol, this task might be distasteful,
not to mention non-standard.

In conclusion, the bottom line is as follows:

	o  "cluster" of homogeneous UNIX systems with distributed
	   file system only:  technically feasible but a lot of
	   work (>> 1 man-year).

	o  the above with distributed queues: more work, plus problems
	   with maintaining a standard version of unix.

Regards,

				--- Paul Jensen
				    Digital Equipment Corporation

------------------------------------------------------------------------
Disclaimer:  All information in this response is drawn from public
	     sources.  All opinions expressed are solely my own.
	     In particular, I haven't the faintest idea of the
	     future or current plans of either Ultrix or VMS
	     engineering.

*unix is a trademark of AT&T.