/750 Machine check 2 (again)
dave at edcaad.UUCP
Fri Aug 26 22:48:00 AEST 1983
Thanks to extensive discussions with our helpful local DEC people, and
several flaky /750s, I can add some details to the work of Peter
Collinson (ukc!pc), Tucker Withington (vaxine!ptw), and Dennis Ritchie
(research!dmr) on the /750's machine check 2 handler in 4.?BSD:
1. If the TB PARITY ERROR bit in the stored Error Summary Register is
set (mcf->mc5_mcesr&4), and irrespective of the state of the other
bits in this register, recovery may be attempted. We have seen
these errors with bits 0 and 3 set.
2. It appears that the TB must be invalidated, by mtpr(TBIA, 0), as
soon as possible, and in any case before the Error Summary Register
is cleared by mtpr(MCESR, 0xf).
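3. It is NOT always possible to recover from these errors. An
instruction may be resumed only if both of the following hold:
a) It has not affected the processor mode. This can be determined
by comparing the processor mode in the machine check frame with
the mode in the interrupt frame. A panic must be issued if they
differ.
b) The instruction is single-byte, and its op-code has a one bit in
the following table. Each row covers one value of the op-code's
high nibble, starting at 0; within a row the rightmost bit
corresponds to a low nibble of 0. The notes name some of the
instructions whose bits are zero: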
0000111101101011 REI,RET,etc.
1111111110111111 JSB
1111111111111111
1111111111111111
1111111111111111
0000000000101111 EMODF,CVTFD,etc.
0000111100000000 Double Prec. FP
1100000101001010 EMUL,EDIV,etc.
1111111111111111
1111111111111111
1111111111111111
0000001111111111 PUSHR,POPR,etc.
1111111111111111
1111111111111111
1111111111111111
0000000111111111 CALLG,CALLS,etc.
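Condition (a) can be sketched in isolation as follows. This is a minimal
userland illustration, not the kernel code: the struct and function names
are hypothetical, the mask is the VAX PSL current-mode field (bits 25:24),
and it assumes, as the handler in this post does, that the low two bits of
the saved mode word in the machine check frame hold the processor mode.
Note that in C != binds more tightly than &, so each masked operand needs
its own parentheses.

```c
#include <assert.h>

/* Current-mode field of the VAX PSL, bits 25:24. */
#define PSL_CURMOD 0x03000000UL

/* Hypothetical stand-in for the two frame fields the check needs. */
struct mode_sample {
	unsigned long svmode;	/* saved mode word from the machine check frame */
	unsigned long psl;	/* PSL from the interrupt frame */
};

/*
 * Return 1 if the mode recorded in the machine check frame differs
 * from the current mode in the PSL, 0 if they agree.
 * Assumption: the mode lives in the low two bits of svmode.
 */
int
mode_changed(const struct mode_sample *s)
{
	unsigned long frame_mode, psl_mode;

	assert(s != 0);
	frame_mode = s->svmode & 03;
	psl_mode = (s->psl & PSL_CURMOD) >> 24;
	return frame_mode != psl_mode;
}
```

A caller would panic when this returns 1 and go on to the op-code test
otherwise.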
Further, VMS disables the cache if cache errors happen less than 100ms
apart. Likewise, if it detects failures less than 100ms apart, it
disables half of the Translation Buffer and runs on the other half.
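The 100ms rule can be sketched as a small piece of bookkeeping. This is
only an illustration of the policy as described above, not VMS code and
not part of the fixes in this post; the structure, the function name and
the millisecond clock argument are all hypothetical.

```c
#include <assert.h>

#define ERR_WINDOW_MS 100UL	/* threshold quoted above */

/* Hypothetical per-unit state (one for the cache, one for the TB). */
struct err_throttle {
	unsigned long last_ms;	/* time of the previous error */
	int seen;		/* nonzero once an error has been recorded */
	int degraded;		/* nonzero once the unit should be cut back */
};

/*
 * Record an error at time now_ms (any monotonic millisecond clock).
 * Returns nonzero if the unit should now be disabled (cache) or
 * halved (TB) because two errors fell less than 100ms apart.
 */
int
note_error(struct err_throttle *t, unsigned long now_ms)
{
	assert(t != 0);
	if (t->seen && now_ms - t->last_ms < ERR_WINDOW_MS)
		t->degraded = 1;
	t->seen = 1;
	t->last_ms = now_ms;
	return t->degraded;
}
```

In this sketch the decision is sticky: once two errors have arrived inside
the window the unit stays cut back until something else resets the state.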
Code to implement all these features for 4.1c BSD has been written;
when it has been tested it will be posted. Unfortunately, testing is a
matter of sitting and waiting for the hardware. In the meantime, here
are the fixes to /usr/sys/vax/machdep.c to improve machine check
handling.
1. The error messages for the different machine check types for the
/750 should read as follows:
char *mc750[] = {
	0, "ctrl str par", "cp tbuf", 0,
	0, 0, "ucode lost", "bad ird"
};
2. The /750's case in the first switch in machinecheck() should look
like:
#if VAX750
	case VAX_750:
		printf("%s fault\n", mc750[type&0x7]);
		break;
#endif
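Several entries in mc750[] are 0, so the printf above can be handed a null
pointer for types with no message. What the kernel printf does with that
is not something this post establishes; a defensive lookup, sketched here
in userland with a hypothetical helper name and a made-up fallback string,
sidesteps the question.

```c
#include <assert.h>

static const char *mc750[] = {
	0, "ctrl str par", "cp tbuf", 0,
	0, 0, "ucode lost", "bad ird"
};

/*
 * Hypothetical helper: map a machine check type to its message,
 * substituting a fallback for the slots that hold 0.
 */
const char *
mc750_name(unsigned int type)
{
	unsigned int idx = type & 0x7;	/* same mask as the printf above */
	const char *msg;

	assert(idx < sizeof(mc750) / sizeof(mc750[0]));
	msg = mc750[idx];
	return msg ? msg : "unknown";
}
```

With this helper, type 2 yields "cp tbuf" and type 0 yields "unknown".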
3. The /750's case in the second switch in machinecheck() should be:
#if VAX750
	case VAX_750: {
		register struct mc750frame *mcf = (struct mc750frame *)cmcf;

		mtpr(TBIA, 0);		/* Assume bad - ala VMS */
		printf("\tva %x errpc %x mdr %x smr %x rdtimo %x tbgpar %x cacherr %x\n",
		    mcf->mc5_va, mcf->mc5_errpc, mcf->mc5_mdr, mcf->mc5_svmode,
		    mcf->mc5_rdtimo, mcf->mc5_tbgpar, mcf->mc5_cacherr);
		printf("\tbuserr %x mcesr %x pc %x psl %x mcsr %x\n",
		    mcf->mc5_buserr, mcf->mc5_mcesr, mcf->mc5_pc, mcf->mc5_psl,
		    mfpr(MCSR));
		mtpr(MCESR, 0xf);
		if ((type&0xf)==MC750_TBPAR
		    && (mcf->mc5_mcesr&0x4)
		    && ResumeableInstr(mcf)) {
			printf("tbuf par!?!: flushing and returning\n");
			return;
		}
		break;
	}
#endif
4. The following routine should be added to machdep.c:
#if VAX750
static u_short InstrBitMap[] = {
	0x0f6b, 0xffbf, 0xffff, 0xffff,
	0xffff, 0x002f, 0x0f00, 0xc18a,
	0xffff, 0xffff, 0xffff, 0x03ff,
	0xffff, 0xffff, 0xffff, 0x01ff
};

static int
ResumeableInstr(mcf)
	register struct mc750frame *mcf;
{
	register u_int OpCode;
	register u_int ret;

	/*
	 * If the instruction changed mode we cannot resume
	 * (this part untested)
	 */
	if ((mcf->mc5_svmode & 03) != ((mcf->mc5_psl & PSL_CURMOD) >> 24)) {
		printf("CP mode changed\n");
		return (0);
	}
	/*
	 * VMS has the process mapped in to the system's
	 * address space. Don't think UNIX does.
	 * (this part tested)
	 */
	OpCode = (mcf->mc5_errpc & 0x80000000 ?
	    *((char *) mcf->mc5_errpc) : fubyte(mcf->mc5_errpc));
	ret = ((InstrBitMap[(OpCode & 0xf0) >> 4]) >> (OpCode & 0xf)) & 1;
	printf("Instruction %x %sresumable\n", OpCode, (ret ? "" : "not "));
	return (ret);
}
#endif /* VAX750 */
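The lookup can be tried out in userland. The sketch below copies the
InstrBitMap values from the routine above and wraps the same
shift-and-mask in a hypothetical helper; the op-code values mentioned
afterwards are the standard VAX encodings, and nothing here touches a
machine check frame.

```c
#include <assert.h>

/* Values copied from InstrBitMap in the routine above. */
static const unsigned short instr_bitmap[16] = {
	0x0f6b, 0xffbf, 0xffff, 0xffff,
	0xffff, 0x002f, 0x0f00, 0xc18a,
	0xffff, 0xffff, 0xffff, 0x03ff,
	0xffff, 0xffff, 0xffff, 0x01ff
};

/*
 * Hypothetical helper: 1 if the single-byte op-code is marked
 * resumable, 0 otherwise.  The high nibble selects the word and
 * the low nibble selects the bit, counting from the least
 * significant bit.
 */
int
opcode_resumable(unsigned int opcode)
{
	assert(opcode <= 0xff);
	return (instr_bitmap[(opcode & 0xf0) >> 4] >> (opcode & 0xf)) & 1;
}
```

With these values opcode_resumable(0xd0) (MOVL) is 1, while REI (0x02),
RET (0x04), JSB (0x16), PUSHR (0xbb) and CALLS (0xfb) all give 0.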
David Rosenthal {vax135|mcvax}!edcaad!dave