problems with BPT/TRACE traps
rws at mit-bold.ARPA
Sat Apr 28 22:06:35 AEST 1984
From: "Robert W. Scheifler" <rws at mit-bold.ARPA>
Description:
We have been developing an in-process debugger, using the VAX
BPT instruction and the trace bit mechanism, with the signals
handled in the same (user) process. This works fine by itself.
We are also using keyboard generated SIGQUITs to interrupt the
program and get the debugger's attention, so the user can poke
around and then ultimately continue execution. This also works
fine by itself.
However, combining the two mechanisms got us into trouble. The
basic problem is that SIGTRAP needs to be handled synchronously,
but SIGQUIT (and others) can preempt it. Signals are not handled
on a first-come-first-served basis, but on a "find-first-set" basis,
which means lowest signal number first.
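To illustrate the "find-first-set" ordering, here is a minimal C sketch.
The helper names sigbit() and lowest_pending() are invented for this
example and are not the kernel's code; the sketch only assumes that
pending signals are kept in a bit mask with bit (n-1) standing for
signal n.

```c
#include <signal.h>

/* Hypothetical helper: mask bit for signal number sig (bit sig-1). */
static unsigned long sigbit(int sig)
{
    return 1UL << (sig - 1);
}

/* Hypothetical helper: return the lowest-numbered signal set in the
 * pending mask, or 0 if no signal is pending. Arrival order plays no
 * part in the choice. */
static int lowest_pending(unsigned long pending)
{
    int sig;

    for (sig = 1; sig <= 32; sig++)
        if (pending & sigbit(sig))
            return sig;
    return 0;
}
```

With both SIGTRAP and SIGQUIT pending, lowest_pending() picks SIGQUIT,
because SIGQUIT has the smaller signal number, even though SIGTRAP was
posted first.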
So the scenario is this. A BPT trap takes place, and a
psignal(SIGTRAP) takes place in trap(). Just then the user
types the quit character, and you eventually get to ttyinput(),
which does a gsignal(SIGQUIT). Now we continue on inside
trap(), doing "if (ISSIG(p)) psig()", and the signal chosen
is SIGQUIT, surprise. So we hack the stack for SIGQUIT,
and go off to the first instruction of the signal trampoline code.
However, the earlier psignal() did an aston(), so at this point
we take the AST, and we are back in trap() doing another
"if (ISSIG(p)) psig()", and so we hack the stack for SIGTRAP,
only now, lo and behold, the PC is no longer at the BPT instruction,
but at the start of the signal trampoline code instead,
which is mighty confusing.
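The sequence above can be mimicked with a toy model. This is only a
sketch under stated assumptions: struct toy_proc, toy_deliver() and the
TOY_TRAMPOLINE address are all made up, and the model ignores everything
about real signal delivery except the two facts that matter here: the
lowest pending signal goes first, and delivery redirects the PC to the
trampoline.

```c
enum { SIG_QUIT = 3, SIG_TRAP = 5 };    /* traditional signal numbers */

#define TOY_TRAMPOLINE 0x7ffff000UL     /* made-up trampoline address */

struct toy_proc {
    unsigned long pending;  /* bit (n-1) set: signal n is pending */
    unsigned long pc;       /* user PC the next handler will be shown */
};

/* One pass through "if (ISSIG(p)) psig()" in the toy model: take the
 * lowest pending signal, record the PC its handler would see, then
 * point the PC at the trampoline. Returns 0 if nothing is pending. */
static int toy_deliver(struct toy_proc *p, unsigned long *seen_pc)
{
    int sig;

    for (sig = 1; sig <= 32; sig++) {
        unsigned long bit = 1UL << (sig - 1);

        if (p->pending & bit) {
            p->pending &= ~bit;
            *seen_pc = p->pc;
            p->pc = TOY_TRAMPOLINE;
            return sig;
        }
    }
    return 0;
}
```

Starting with the PC at the BPT and both signals pending, the first call
delivers SIG_QUIT with the BPT address, and the second delivers SIG_TRAP
with TOY_TRAMPOLINE as the reported PC, which is the confusing case.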
But, you say, the solution is of course to mask out SIGTRAP inside
the SIGQUIT handler. But, I say, there are two problems with this. The
first, which I can live with, is that then the SIGQUIT handler
can't be debugged. The bigger problem is that it still doesn't work.
There is an "extraneous" REI in the signal trampoline code (that I
have complained about before for a different reason). This REI is
executed on the way out of a handler, and is a one-instruction
bridge back to user code that gets executed WITHOUT the signal mask
defined by the handler. So even if you mask SIGTRAP inside SIGQUIT,
you simply change the PC at the time of the SIGTRAP to be at the
REI rather than the CALLS in the trampoline code.
Our solution to this problem was to notice that, if the SIGTRAP
handler does nothing, the BPT instruction will be executed again
and we will get another trap. So, we don't mask SIGTRAP inside
SIGQUIT, and in the SIGTRAP handler we check the PC, and if it's
in the trampoline code, we just return and let the BPT execute
again.
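The PC check in the SIGTRAP handler might look roughly like the sketch
below. It is not the debugger's actual code: struct trampoline_range,
enum trap_action and classify_bpt_trap() are hypothetical names, and how
the handler learns the trampoline's address range is left as an
assumption (it is system-specific).

```c
/* Hypothetical description of where the signal trampoline lives,
 * as the half-open address range [start, end). */
struct trampoline_range {
    unsigned long start;
    unsigned long end;
};

enum trap_action {
    TRAP_IGNORE,          /* just return; the BPT re-executes and re-traps */
    TRAP_ENTER_DEBUGGER   /* the reported PC is believable; handle the BPT */
};

/* Decide what the SIGTRAP handler should do with the reported PC. */
static enum trap_action
classify_bpt_trap(unsigned long pc, const struct trampoline_range *t)
{
    if (pc >= t->start && pc < t->end)
        return TRAP_IGNORE;
    return TRAP_ENTER_DEBUGGER;
}
```

Returning without doing anything is safe only for BPT traps, since the
breakpoint instruction is still in place and will trap again.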
Having taken a BPT, we need to reinstall the actual instruction,
execute it using the T-bit, and then reinsert the BPT instruction.
Once again, the PC you get in the SIGTRAP handler can be bogus.
Just returning won't work, however, because the T-bit has been cleared
and you won't get another trap. Fortunately, the debugger can know it
is expecting a T-bit trap, and can save away the correct PC, and ignore
the PC reported by the kernel.
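A minimal sketch of that bookkeeping follows. The names struct
step_state, begin_step() and trace_trap_pc() are invented for
illustration; the only idea taken from the text is that the debugger
remembers the PC itself before single-stepping and prefers it over the
reported one.

```c
/* Hypothetical per-debugger state for a single step in progress. */
struct step_state {
    int expecting_trace;      /* nonzero once we have set the T-bit */
    unsigned long saved_pc;   /* PC recorded by the debugger itself */
};

/* Call just before resuming with the T-bit set. */
static void begin_step(struct step_state *s, unsigned long pc)
{
    s->saved_pc = pc;
    s->expecting_trace = 1;
}

/* Call from the SIGTRAP handler: if a trace trap was expected, trust
 * the saved PC and ignore the reported one; otherwise pass the
 * reported PC through unchanged. */
static unsigned long
trace_trap_pc(struct step_state *s, unsigned long reported_pc)
{
    if (s->expecting_trace) {
        s->expecting_trace = 0;
        return s->saved_pc;
    }
    return reported_pc;
}
```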
Actually, as it turns out, you CAN get multiple SIGTRAPs from setting
the T-bit. I don't think this was intended. Fix is provided below.
Repeat-By:
See above.
Fix:
In trap(), in trap.c, change
    case T_TRCTRAP+USER: /* trace trap */
        locr0[PS] &= ~PSL_T;
to
    case T_TRCTRAP+USER: /* trace trap */
        locr0[PS] &= ~(PSL_T|PSL_TP);
In sendsig(), in machdep.c, change
    regs[PS] &= ~(PSL_CM|PSL_FPD);
to
    regs[PS] &= ~(PSL_CM|PSL_FPD|PSL_T|PSL_TP);
I didn't bother to figure out if both changes are necessary, but
they can't hurt.
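The effect of the wider mask can be seen in isolation with a small
sketch. The bit values below are assumed from the VAX PSL layout (T is
bit 4, TP is bit 30) and should be checked against psl.h on the target
system; clear_trace_bits() is a made-up helper, not a kernel routine.

```c
/* Assumed VAX PSL bit values; verify against the system's psl.h. */
#define PSL_T   0x00000010UL    /* trace enable */
#define PSL_TP  0x40000000UL    /* trace pending */

/* Clear both trace bits, as the proposed fix does, so that neither a
 * new trace nor an already pending one survives into the handler. */
static unsigned long clear_trace_bits(unsigned long psl)
{
    return psl & ~(PSL_T | PSL_TP);
}
```

Clearing PSL_T alone leaves PSL_TP set if a trace was already pending,
which would be consistent with the extra SIGTRAPs described above.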