Summary for Protection in Cray
John D. McCalpin
mccalpin at perelandra.cms.udel.edu
Fri Jan 11 10:55:03 AEST 1991
> On 10 Jan 91 23:07:15 GMT,chiueh at sprite.Berkeley.EDU (Tzi-cker Chiueh) said:
chiueh> chiueh at sprite.Berkeley.EDU (Tzi-cker Chiueh) writes:
> So why does Cray get rid of virtual memory altogether ? Or does anybody
> know how much performance improvement can we gain from getting rid of VM ?
chiueh> I suggest you see the IEEE proceedings from the Supercomputing
chiueh> conference that was held last month in New York. Cray
chiueh> published an article in these proceedings that describes their
chiueh> memory architecture and gives clock timings for current and
chiueh> future memory architectures.
chiueh> Summary of what follows:
chiueh> - Memory speed is THE supercomputing bottleneck.
chiueh> - Cray can fetch from memory in 17 cycles. Demand paging would
chiueh>   lengthen this time significantly.
chiueh> - Virtual memory trades speed for money. Supercomputers do not
chiueh>   compromise on speed.
chiueh> - Cray Y-MP/8s have 4 gigabyte per second memory bandwidths.
chiueh> - Supercomputing working sets and problem sizes tend to be equal.
chiueh> - Demand paging would complicate an already very complicated
chiueh>   instruction scheduler.
chiueh> Memory speed is THE bottleneck in supercomputing. It is what
chiueh> makes Cray king of the hill. The Japanese have faster peak
chiueh> CPU speeds, but their memory bandwidths are inferior. This is
chiueh> a key reason why Cray machines are the fastest computers
chiueh> available for most production benchmarks (with notable
chiueh> exceptions.)
chiueh> The number of cycles needed to transfer the first word from
chiueh> memory to a register is one of the most critical timings in
chiueh> the supercomputer. Cray can do this in 17 cycles. An SX3
chiueh> requires 70 cycles. An ETA 10 needed hundreds of cycles.
chiueh> Adding demand paging will significantly lengthen this cycle
chiueh> time. If you can add demand paging without adding cycles to
chiueh> this memory fetch time, then I am sure Cray will make you a
chiueh> rich person.
chiueh> Supercomputers with virtual memories have been tried. The CDC 205 and the
chiueh> ETA10 are examples. When these machines ran codes where the problem size
chiueh> exceeded the RAM size (paging), they ran 10 times slower than when paging
chiueh> did not occur.
chiueh> Virtual memory is a technique of trading time for money. Virtual memory
chiueh> costs less than real memory, but is slower. Slower memory is not an
chiueh> option for supercomputing. Witness the success of Cray and the demise of
chiueh> ETA.
chiueh> The Cray achieves two words read and one word written per clock per CPU.
chiueh> On a Y-MP/8 this is a memory bandwidth of 4 gigabytes per second. Disk
chiueh> bandwidths are not adequate to keep up with this type of demand.
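The bandwidth figure above can be checked with a little arithmetic. The sketch below is only illustrative: the 64-bit word size and the 6 ns clock period are assumptions that the post does not state, and the function name is hypothetical.

```python
# Assumed figures (NOT stated in the post): 64-bit words and a 6 ns clock.
WORD_BYTES = 8
CLOCK_NS = 6.0
# From the post: two words read plus one word written per clock per CPU.
WORDS_PER_CLOCK = 2 + 1


def bandwidth_gb_per_s(words_per_clock, word_bytes, clock_ns):
    """Bytes moved per nanosecond, which equals gigabytes (10**9 bytes) per second."""
    return words_per_clock * word_bytes / clock_ns


per_cpu = bandwidth_gb_per_s(WORDS_PER_CLOCK, WORD_BYTES, CLOCK_NS)
print(f"per CPU: {per_cpu:.1f} GB/s")  # prints "per CPU: 4.0 GB/s"
```

With these assumed values the quoted 4 gigabytes per second matches the rate for a single CPU. The post does not say whether its figure is per CPU or for the whole machine.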
chiueh> The theory of virtual memory depends on the working set being smaller than
chiueh> the problem size. In most supercomputer applications working set is the
chiueh> problem size. I am sure the architecture of these applications was
chiueh> influenced by programming for real-memory machines, so this is somewhat of
chiueh> a circular argument. However, for the status quo, this is true.
chiueh> Crays are vector machines with extremely sophisticated instruction
chiueh> schedulers. The Cray often has several instructions issued at once in the
chiueh> same CPU. X-MPs and Y-MPs scoreboard conflicts between instructions and
chiueh> are able to compensate for bank and section memory delays. These delays
chiueh> tend to be one to four cycles. The instruction scheduler architecture
chiueh> would be even more difficult if it had to account for page-fault delays of
chiueh> many thousands of cycles. An approach to this problem would be to require
chiueh> the compilers to never allow a vector sub-section to cross a page
chiueh> boundary.
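The last suggestion above (never letting a vector sub-section cross a page boundary) can be sketched roughly as follows. This is a minimal illustration, not anything a Cray compiler is known to do: the function name, the unit-stride restriction, and the page size are all hypothetical.

```python
def split_at_page_boundaries(start, length, page_words):
    """Split a unit-stride vector reference into pieces that each stay in one page.

    start      -- word address of the first element
    length     -- number of words referenced
    page_words -- hypothetical page size, in words
    Returns a list of (address, count) pairs covering the whole reference.
    """
    if length < 0 or page_words <= 0:
        raise ValueError("length must be >= 0 and page_words must be > 0")
    pieces = []
    addr = start
    remaining = length
    while remaining > 0:
        # Words left before the next page boundary.
        room = page_words - (addr % page_words)
        count = min(room, remaining)
        pieces.append((addr, count))
        addr += count
        remaining -= count
    return pieces


print(split_at_page_boundaries(1000, 100, 1024))
# prints [(1000, 24), (1024, 76)]
```

Each piece could then be issued as its own vector operation, so any page fault would be taken between pieces and not in the middle of one.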
chiueh> -- Kent
chiueh> --------------------------------------------------------------------------------
chiueh> Saw your information request about Crays, and thought that I might be
chiueh> able to point you towards some useful information:
chiueh> I suggest that you check up on Control Data's Cyber 180-series
chiueh> (currently Cyber 2000-series) machines - they are a full hardware
chiueh> Multics implementation, and have some truly "unique" virtual memory
chiueh> hardware. I can personally vouch that the address translation
chiueh> hardware, which also is doing access control checking, is VERY fast,
chiueh> and it has several extra levels of indirectness more than most
chiueh> other folks' virtual memory architectures. Cyber 180 is such a
chiueh> complete Multics that there is actually NO REAL MEMORY ADDRESSING
chiueh> MODE. It is NOT POSSIBLE to access memory by real memory address, the
chiueh> hardware doesn't have the capability!
chiueh> It is also interesting that when a Cyber 180 is emulating Cyber 170
chiueh> mode, it ALSO has base/limit register hardware in operation, since the
chiueh> 170 architecture is real-memory, and only has base/limit restrictions.
chiueh> When a Cyber 180 is running in 170 mode, it really is running a
chiueh> virtual real-memory machine on its virtual memory hardware (just
chiueh> saying this makes my mind feel like a pretzel).
chiueh> If nothing else, the CDC stuff should make interesting counter-culture
chiueh> reading material for you. It was/is truly different.
chiueh> I also suspect that in the Crays (although I have never read the
chiueh> hardware prints of a Cray, only the CDC machines), the bounds checking
chiueh> is being done on the VIRTUAL address, as it were, not the real memory
chiueh> address. This method allowed the old CDC machines (the ones Seymour
chiueh> Cray designed) to do their access checking in the CPU, not the memory
chiueh> controller, and thus kill off the references earlier in the
chiueh> instruction.
chiueh>
chiueh> -- Gregory
chiueh> ----------------------------------------------------------------------------
> Furthermore, this check is done for EVERY reference.
>If this is indeed the case, this protection check process should be as
>expensive as address mapping in machines that have VM.
chiueh> Why do you assume this? Given that the latency of Cray memory is 4
chiueh> cycles or so, the check can be done after the address is sent off to
chiueh> memory and can generate a fault before the data gets back.
>So why does Cray get rid of virtual memory altogether ?
chiueh> Well, many supercomputer applications can't page and have to swap. In
chiueh> that case, why provide VM?
chiueh> -- greg
chiueh> In article <1990Dec19.181343.10365 at agate.berkeley.edu> you write:
> The kind of protection I have in mind is access right control (e.g., read-only)
> "Normal virtual memory systems" perform this kind of protection check while
> doing logical-physical address mapping. The protection bits are either in page
> tables or TLB. Now, since Cray doesn't have virtual memory, the question is
> does it provide access control, if so, where does it put this check ?
chiueh> The Cray does not provide extensive access control. For each running program
chiueh> a (consecutive) part of actual memory is mapped to the logical address space
chiueh> of the program (which starts at 0). With each reference the logical address
chiueh> is compared to the logical bounds register, and the base register is added
chiueh> to it before going to memory.
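The base-and-bounds scheme described above can be modeled in a few lines. This is only a sketch under stated assumptions: the names `translate`, `base`, `limit`, and `BoundsFault` are hypothetical, and `limit` is assumed to be the size of the program's region in words, so valid logical addresses run from 0 to `limit - 1`.

```python
class BoundsFault(Exception):
    """Raised when a logical address falls outside the program's region."""


def translate(logical_addr, base, limit):
    """Map a logical address to a physical one using two registers.

    base  -- physical address where the program's region starts (assumed name)
    limit -- size of the region in words (assumed meaning of the bounds register)
    """
    # The add and the compare do not depend on each other, so hardware
    # could do both at once; Python simply runs them one after the other.
    physical = base + logical_addr
    in_bounds = 0 <= logical_addr < limit
    if not in_bounds:
        raise BoundsFault(f"logical address {logical_addr} outside 0..{limit - 1}")
    return physical


print(translate(10, base=4096, limit=1000))  # prints 4106
```

Only two values per program are involved, which is the contrast with a page table or TLB lookup.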
> From the previous responses, it seemed that Cray only provides out-of-bound
> protection check. Furthermore, this check is done for EVERY reference.
> If this is indeed the case, this protection check process should be as
> expensive as address mapping in machines that have VM.
chiueh> Clearly this is much less expensive than true VM; only two registers are needed
chiueh> to do everything (address translation and bound checking), and those two
chiueh> registers reside directly in the CPU.
> So why does Cray get rid of virtual memory altogether ? Or does anybody
> know how much performance improvement can we gain from getting rid of VM ?
chiueh> This is much less expensive because check and translation go on in parallel
chiueh> within a single clock cycle.
chiueh> -- dik
--
John D. McCalpin mccalpin at perelandra.cms.udel.edu
Assistant Professor mccalpin at brahms.udel.edu
College of Marine Studies, U. Del. J.MCCALPIN/OMNET