write_ID: timeout trying to write to VIPER (on QDSS displays)
Chris Torek
chris at mimsy.umd.edu
Wed Nov 28 16:48:19 AEST 1990
In article <1990Nov25.202720.11199 at watcgl.waterloo.edu>
idallen at watcgl.waterloo.edu (Ian! D. Allen [CGL]) writes:
>Using MIT's R4 xdvi on our MIT Xqdss colour VAXes often hangs the display.
The QDSS is a horrible thing.
(The Ultrix 1.2 QDSS driver is even worse. The story I heard was something
like `VMS programmer gets first Ultrix assignment: write kernel driver for
QDSS'.)
It is not possible for X11 and the kernel to stay completely in sync
always, but as long as something has console output captured (so that
console writes go to some pty rather than directly to the display) this
should not be a problem.
Here is what I wrote for my own documentation when I rewrote the QDSS
driver for my own purposes. This is not a solution, but might give people
some insights (and will tell you why I say the QDSS is a horrible thing.)
It was intended eventually to be standalone documentation, but at the
moment really needs the VCB02 hardware manual as a companion.
Incidentally, there is a section in the VCB02 manual that says `do not
do XYZ as it can short out the drivers in the vipers'. The Ultrix
driver did (and probably still does) XYZ. (Mine does not.)
------------------------------------------------------------------------
Before we begin, here is a short description of the hardware.
(Well, okay, so it is a long description. The hardware is very
complextificated.)
The QDSS (VCB02 or `Dragon') is composed of a bunch of
special-purpose chips. The simplest (from our point of view) are the
so-called `vipers', or video processors. There is one viper per memory
plane, with a maximum of 8. Say we have a four-plane system. Any
point on the screen has a `color value' between 0 and 15 inclusive.
This value is a composite of planes 0, 1, 2, and 3 as bits 0, 1, 2, and 3
respectively. Fiddling with plane 0 changes the low-order bit of the
color value. (The color value goes through a lookup table to produce
red, green, and blue intensities as in many conventional display
systems; like those systems, only the green intensity is used on
grey-scale monitors.)
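As a rough sketch of that composition (the function name and the
one-word-per-plane layout are illustrative assumptions, not anything
from the VCB02 manual; bit n of each word is taken to be pixel n):

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical helper: plane[i] holds one 16-pixel word from plane i.
 * Returns the color value of the pixel at bit position `bit`, built
 * by taking plane i's bit as bit i of the result.
 */
unsigned color_at(const uint16_t plane[], int nplanes, int bit)
{
    unsigned color = 0;

    assert(nplanes >= 1 && nplanes <= 8 && bit >= 0 && bit < 16);
    for (int i = 0; i < nplanes; i++)
        color |= ((plane[i] >> bit) & 1u) << i;
    return color;
}
```

With four planes, a pixel whose bit is set only in planes 0 and 2 comes
out as color 5.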
Each viper can only talk directly to its own plane. This creates an
interesting problem: how to communicate between vipers, e.g., to make
all pixels that were odd have the value 15 (copy plane 0 to all other
planes). This is accomplished through the `I/D' bus. The I/D bus is
only 8 bits wide, but each `cycle' is actually made of two bus cycles
to make it look 16 bits wide. It operates in pairs of these pairs,
alternating instructions and data (hence the name). The instructions
are piddly little things that can affect only 16 bits of bitmap memory
or viper register at a time, but a tremendous number of instructions
run each second, so things move reasonably fast. Moreover, all vipers
operate in parallel, so each instruction can diddle up to 8 16-bit
words at a time.
Often you do not want all the vipers to do the same thing. For
instance, often one viper should write bits from its plane onto the I/D
bus for others to see, but the others should not overwrite those bits
with their own. To disable some vipers, there is a separate chip
select register. When it is set to 0xff all 8 planes are enabled and
will work together. Setting it to 0x01 disables all but plane 0.
Typically a rasterop will have all planes enabled; the select register
is mainly used to set various viper registers differently, but
rasterops do honor the chip select, and have no effect on planes that
are not selected. There is a separate chip select used for scrolling
(to be described later), since it is a sort of `background' operation
and should not disturb normal rasterops.
The viper chips were designed with some flexibility in mind, and not
all of their `features' are used in the QDSS. In addition, there
appear to be some bits left over from earlier revisions. In a few
cases we just have to ignore some oddity and press on.
The (up to 8) vipers are all directed by a chip called the `adder'
(yes, this Dragon is full of snakes). The adder (`address processor')
mostly computes rasterops. It does a lot of other stuff too, but we
cannot make much use of that since it has some peculiar limitations.
So the adder does rasterops, telling the vipers when to do their things
by counting scan lines and pixels and providing the right instructions
at the right times. The adder is also responsible for moving data from
the I/D bus to or from the CPU (polled or DMA, but see below) so that
we can get bits into and out of the consarned thing in the first
place. In any case, a plane's memory can only be reached through its
viper; the adder and vipers have to cooperate, and the CPU has to tell
them how.
For the most part, `how' is determined by the contents of various
registers in each viper. A `logical function' register controls the
combination of a `source' and a `destination' to produce a new bit of a
new color; the new bit is written, or the old left alone, depending on
two mask values that are ANDed together. (The adder supplies a third
mask that is all ones in the center of the rasterop, and has zeroes as
needed along the horizontal fringes to keep the rasterop in bounds.)
The `destination' value is always whatever was on the screen before,
but the `source' value and the contents of both mask registers are
under control of still more viper registers. (Now how much would you
pay? But wait, there's more!...) The bits derived from the logical
function select either a foreground or background color. That is:
/* approximation of viper rasterop algorithm */
old_data = *screen_location; /* 16 bits */
switch (control_register) {
case discard: break;
case to_source: source_register = old_data; break;
case to_m1m2: mask1_register = mask2_register = old_data; break;
case to_m2: mask2_register = old_data; break;
}
v = apply_lf(logical_function_register, old_data, source_register);
new_data = (v & foreground_register) | (~v & background_register);
mask = mask1_register & mask2_register & rasterop_mask_from_adder;
*screen_location = (new_data & mask) | (old_data & ~mask);
Remember, though, that each plane gives only one bit of a displayed
color, so typically the fg and bg registers of all vipers are all 1s
and all 0s, and the `color' is determined by the result v of the
logical function. In essence, a zero in the fg and bg registers has
the effect of *suppressing* the result at that point; a one in both has
the effect of *setting* it; and a zero in fg and a one in bg *inverts*
it. (Confused yet? *I* was.) To write a solid color c, one sets the
source register to all 1s, the control register to `discard', and the
various vipers' fg registers to all-1s or all-0s according to the value
c (e.g., color 5 has vipers 0 and 2 all-1s, and 1 and 3 all-0s) (there
is an easy way to do this, called `Z-axis' register setting, described
below). But this is *still* not the whole story: each LF
register---there are four available---also contains bits that control
whether the source, mask 1, and mask 2 are complemented, and whether
something called `resolution mode' is applied. Resolution mode is only
for displays with fewer than 1024x864 pixels, so we shall ignore it,
and complementing the source can be done directly with the function, so
we shall ignore that too. (Perhaps the source complement mode is
useful with resolution mode. It seems completely useless otherwise.)
Before we can correct the approximation above, though, we need to know
a bit more about the way the hardware does rasterops.
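The fg/bg cases can be checked with a few lines of C. This is only a
sketch of the color-select step in the approximation above; both
function names are made up for illustration, and fg_for_plane() merely
models what one would want each plane's fg register to hold for a
solid color.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Color-select step: take fg where the logical-function result v
 * is 1, and bg where it is 0.
 */
uint16_t select_color(uint16_t v, uint16_t fg, uint16_t bg)
{
    return (uint16_t)((v & fg) | (~v & bg));
}

/*
 * Hypothetical helper: the fg register value plane `plane` needs in
 * order to paint solid color `color` (all 1s if that plane's bit is
 * set in the color value, otherwise all 0s).
 */
uint16_t fg_for_plane(unsigned color, int plane)
{
    assert(plane >= 0 && plane < 8);
    return ((color >> plane) & 1u) ? 0xFFFFu : 0x0000u;
}
```

With fg and bg both zero the result is cleared, with both all-1s it is
set, with fg zero and bg all-1s it is the complement of v, and with fg
all-1s and bg zero it is v unchanged.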
A rasterop specifies 0, 1 or 2 sources and 1 destination. (A
rasterop without a destination is pointless.) With only a destination,
the rasterop simply combines the source register with the bits at the
destination according to the given logical function. With one or two
sources, we get one or two memory read operations before the r/m/w that
updates the plane's screen memory. Those operations are dumped through
two control registers. These are chosen from one of two `banks' of
operand control registers. In addition to the disposition of the
screen data, the control register tells whether to read the I/D bus,
and whether to write it. The actual algorithm, then, is this:
/* viper registers */
static short ctl[2][4]; /* ctl[x][3] present but unused */
static short lf[4]; /* logical function registers */
static short src, m1, m2; /* source, mask1, mask2 regs */
static short fg, bg; /* fore- and background regs */
/* there are 0, 1, or 2 src_cycles (but can have #2 without #1) */
/* these parameters come from adder */
/* there is also a shift constant, which I have simplified away */
void rop_src_cycle(bank, mem, which)
int bank; /* bank 0 or 1 */
short *mem; /* bitmap memory address */
int which; /* source 1 or 2 */
{
short c = ctl[bank][which - 1], id, md;
/*
* All these operations occur in parallel. Presumably,
* if c&SEND, id will be the same as md, but while something
* like this is mentioned in passing in the manual, I would
* not count on it myself (it depends on the timing in
* the viper).
*/
md = *mem; /* shifted left or right if necessary */
if (c & SEND) *ID_BUS = md;
id = *ID_BUS;
/*
* These may occur in parallel too (i.e., do not route
* remote and local data into the same register; it is
* not guaranteed and might even break the hardware):
*/
RD_REGS(c) = id; /* NONE, SRC, M1M2, or M2 */
LD_REGS(c) = md; /* NONE, SRC, M1M2, or M2 again */
if (c & SS) magic(); else slow_to_half_speed();
}
/* again, parameters are from adder */
void rop_rmw_cycle(bank, mem, lfnum, edgemask)
int bank; /* bank 0 or 1 */
short *mem; /* bitmap memory address */
int lfnum; /* logical function 0/1/2/3 */
short edgemask; /* left or right edge mask, or ~0 */
{
short c = ctl[bank][2], id, md, s, v, mask, f;
/*
* The same comments as for rop_src_cycle above apply.
* In addition, it is not obvious that all of this really
* works in the hardware (the r/m/w timing is tighter).
* But RD_SRC does work for PTB X mode, despite the
* manual's claim that ctl[bank][2] ``may be unnecessary''
* since ``there may be no reason to program either
* destination CSR to other than 000000''.
*/
md = *mem;
if (c & SEND) *ID_BUS = md;
id = *ID_BUS;
RD_REGS(c) = id;
LD_REGS(c) = md;
if (c & SS) bad_stuff_happens_I_guess();
f = lf[lfnum];
s = f & LF_NOTNOTSRC ? src : ~src;
if ((f & LF_NORES) == 0)
s = smear(s); /* ``resolution mode'' */
mask = (f & LF_NOTM1 ? ~m1 : m1) &
(f & LF_NOTM2 ? ~m2 : m2) & edgemask;
v = apply_lf(LF_MASK(f), md, s);
*mem = (((v & fg) | (~v & bg)) & mask) | (md & ~mask);
}
Using two sources, and directing one of the sources to the mask
register(s) and the other to the source register, we can get the effect
of tiling or stippling, or more generally, writing under a mask. In
particular, for tiling, the adder has a way to specify that source 2
(but not source 1) has a size which is a small power of two; the adder
will feed the vipers a repeating address pattern (thus repeating the
tile apparently infinitely).
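The reason a power-of-two size makes the repeat cheap can be sketched
as follows. How the adder really generates source 2 addresses is not
described here, so this shows only the wraparound idea; tile_word() and
its arguments are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical: fetch word `offset` of a tile that is `size` words
 * long.  Because size is a power of two, the repeat is a single AND
 * rather than a divide.
 */
uint16_t tile_word(const uint16_t *tile, unsigned size, unsigned offset)
{
    assert(size != 0 && (size & (size - 1)) == 0);
    return tile[offset & (size - 1)];
}
```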
This brings us back to rasterops, and in particular, rasterop
`modes'. In addition to the two optional sources, the logical function
register index, and the control register bank index, the rasterop can
be in one of three modes: `normal', where the source and destination
are the same size (but see below); `linear pattern', where source 1
repeats as needed if it is smaller than the destination; and `fill',
for polygon filling, where the source and destination are not used as
rasterops at all. Two more bits are used for fill mode: X or Y fill;
and normal two-edge fill, or baseline fill. Filled polygons are
described below. For regular rasterops (and, presumably, polygons),
there are four more mode bits: hole fill enable (normally on, but off
for single-pixel-wide lines); source 1 index enable; source 2 index
enable; and pen down. Pen down must be set; if it is not, nothing
happens. (Pretty stupid, eh? But apparently REGIS wants it.)
`Indexing' is used to make up for the sins of scrolling. It should be
enabled whenever source 1 and/or the destination are in on-screen
memory and scrolling might be going on (more below).
[N.B.: the manual calls the banks 1 and 2, and the logical function
registers 1, 2, 3, and 4. I have subtracted 1 since things make more
sense that way.]
Before I can describe fill mode, I need to explain something else.
Some clever fellow observed that, if the destination of a rasterop were
defined by an arbitrary pair of vectors, the `rasterop' could draw
solid-color lines in arbitrary directions, or rotate text, or
accomplish all manner of uninteresting things. So, while sources 1 and
2 must be rectangular, the destination is described by a `fast vector'
and a `slow vector'. Bits are read and written along the fast vector
until it runs out, then the adder steps along the slow vector. If the
fast vector points along the X axis, and the slow vector along the Y
axis, we get a normal rectangular rasterop. It also goes much faster:
when the fast vector has no Y component, the adder does its thing 16
bits at a time. (The slow vector can have an X component; this does
not hobble the adder.)
These vectors are defined with origin-x, origin-y and delta-x,
delta-y pairs so as to make it convenient for the adder to use
Bresenham's Algorithm to paint the pixels. This (B's A) can result in
writing some pixels twice, or in skipping some; the hole fill enable is
used to fix up the latter, and the former only matters if the rasterop
uses the destination bits for exclusive-or, or complements them. Note
that holes and doubling cannot occur for normal (x/y axis aligned)
rectangular rasterops. For the most part, we can ignore these
phenomena. Note also that this does not write the last point along
the vector (so we get a half-open interval).
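To illustrate the half-open interval, here is a generic integer
Bresenham walk that stops before the endpoint. It is a textbook
variant, not the adder's actual stepping (it is not meant to reproduce
the holes or doubling mentioned above), and walk_vector() and its
parameters are invented for the example.

```c
#include <assert.h>
#include <stdlib.h>

/*
 * Walk from (x0,y0) along (dx,dy), storing each visited point in
 * xs[]/ys[].  The final point (x0+dx, y0+dy) is NOT stored, giving a
 * half-open interval.  Returns the number of points stored.
 */
int walk_vector(int x0, int y0, int dx, int dy,
                int xs[], int ys[], int max)
{
    int x1 = x0 + dx, y1 = y0 + dy;
    int adx = abs(dx), ady = abs(dy);
    int sx = dx < 0 ? -1 : 1, sy = dy < 0 ? -1 : 1;
    int err = adx - ady, n = 0;
    int x = x0, y = y0;

    while (x != x1 || y != y1) {    /* stop before the endpoint */
        assert(n < max);
        xs[n] = x;
        ys[n] = y;
        n++;
        int e2 = 2 * err;
        if (e2 > -ady) { err -= ady; x += sx; }
        if (e2 < adx)  { err += adx; y += sy; }
    }
    return n;
}
```

A vector from (10,5) with delta (4,0) therefore yields x = 10, 11, 12,
13 and leaves (14,5) for whatever is drawn next.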
Polygon filling is done by taking over the source 1 and destination
slow vectors. Starting from a point (normally one held in common), the
adder will draw lines along either the X or Y axis until one or both
vectors run out. (Thus, the `fast' vector has dy=0 [x axis] or dx=0 [y
axis] and has its dx or dy depend on the difference between the current
points along each of the two vectors, where that point is scanned in
the direction of that axis.) When a vector runs out, the adder says it
is done, and by reloading one or both vectors and doing the polygon
fill again, one can finish or continue the polygon. Really, this is a
fill-from-line-to-line operation, where the filling is done by drawing
horizontal (X mode) or vertical (Y mode) lines. Optionally, the source
2 vector can be replaced with a horizontal or vertical line; this is
the `baseline fill' mode. (Why it exists at all, when one of the two
edge vectors can be horizontal or vertical anyway, is beyond me.)
Polygon fill can suffer from doubling, but not from holes.
Polygon fill does write the last point: the lines it draws are over
the *closed* interval that includes the two edge points.
All rasterops and polygon fills use Bresenham's error-accumulation
technique to define which points will be plotted. Two `error adjustment'
registers in the adder allow changing the initial error value for the
fast and slow destination vectors (only occasionally useful) or for
the polygon lines. The latter allows shifting the polygon edges by
half a pixel, which *is* often useful.
Scrolling, and the index enable bits, are another clever hack.
Someone noted that since the display has to sweep across the screen
horizontally (as a `fast' vector) and vertically (as a `slow' one)
anyway, it should be possible to read bits from the screen offset from
where they would normally be displayed, and to copy them to their
`correct' position at the same time. The adder contains a set of
scroll registers for controlling this action. The scrolling area is a
rectangle somewhere on the screen (off-screen memory cannot be scrolled
this way since it is not displayed). Bits within that rectangle are
read at some offset. The offset can be any positive value in the Y
direction, but cannot be more than +15/-16 in the X direction. Bits
beyond the offset are replaced with a `scroll fill' value from the
viper's FILL register. A negative Y offset would cause duplication, so
negative offsets are not allowed; instead, another bit `everts' the
region, so that everything *not* in the scrolling region moves upward,
and the video-memory Y-offset register is adjusted when the display
frame is all displayed.
The index enable bits simply tell the adder that, if its operation
reaches into the area that is scrolling, it should add the new or old
index values to the x and y coordinates of those points, to compensate
for the fact that the bits are about to show up elsewhere.
Alas, the scroll hardware will only do vertical scrolls on a four-bit
boundary, so most of the time we cannot use it. When we can, it seems
like too much trouble anyway, as various operations must be done at the
start of a video frame, which appears to require instantaneous response
to a framing interrupt. X11 does not use the scrolling hardware.
Just when you thought you were done with rasterops...: The source
1 raster can also be scaled up or down during a rasterop (but not a
polygon fill). A 13-bit binary fraction is available for up- or down-
scaling. We have no particular use for it and never touch it.
Of course, there has to be a way to set the viper registers and the
two chip select registers. This is done with a `register load'
command. There are three kinds of register load (write) operations:
external, viper, and `Z-axis viper' loads. External loads are used to
set the chip selects; viper loads are used to set viper registers.
Each viper load sets that register in all the selected vipers, so to
load just one viper's foreground color register, for instance, we have
to disable all the others, do the load, and then reenable them. This
rapidly gets annoying, so there is the third kind of load. A Z-axis
load writes one (1) bit to each of the currently-selected vipers, by
writing 16 bits and having each viper pick up the one corresponding to
its plane number. The viper then makes 16 copies of that bit and
shoves it into the appropriate register. Only the foreground and
background color and the fill and source registers can be loaded this
way (but those are the ones needing all-1 or all-0 values most often).
These Z-axis loads also specify a `Z block', which must be 0 for the
VCB02. It appears to be intended for 24-bit color displays, which
appear never to have got off the ground. That appears to be a good thing.
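A Z-axis load can be modelled roughly like this. The sketch assumes
the register being loaded is represented as one array slot per viper
and ignores the Z block entirely; zaxis_load() and NVIPERS are names
invented for the example.

```c
#include <assert.h>
#include <stdint.h>

#define NVIPERS 8

/*
 * Model of a Z-axis load: each selected viper picks the bit of
 * `zword` that matches its plane number and replicates it 16 times
 * into its register.  Deselected vipers keep their old contents.
 */
void zaxis_load(uint16_t reg[NVIPERS], unsigned chip_select,
                uint16_t zword)
{
    assert(chip_select <= 0xFFu);
    for (int plane = 0; plane < NVIPERS; plane++) {
        if (!(chip_select & (1u << plane)))
            continue;
        reg[plane] = ((zword >> plane) & 1u) ? 0xFFFFu : 0x0000u;
    }
}
```

Loading the value 5 with chip select 0x0f leaves planes 0 and 2 at
all-1s and planes 1 and 3 at all-0s, which is the setup one wants in
the fg registers for writing solid color 5 on a four-plane system.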
All of this lets us move bits around on the screen, but not get them
there in the first place. Fortunately, the adder also supports CPU-to-
bitmap and bitmap-to-CPU (`processor') transfers, and in two modes.
PTB and BTP transfers can be done in `Z-axis' mode, where the 8 vipers
get or put one bit at a time from each screen position. The 8 bits are
assembled to (or disassembled from) a byte, which shows up as the low
byte on the I/D bus. Thus we can read or write the current color at
any pixel location. The other mode, `X-mode', lets us read from one
viper (which must be set up beforehand with the chip select register)
or write to one or more vipers (but only if they all get the same value
for each pixel---this can only write all 1s or all 0s; Z-axis transfers
are easier, so it will usually be one viper). Both of these are actually
implemented as a form of rasterop; most of the same features apply,
except that PTB rasterops do only an r/m/w cycle regardless of which
csr register is used (see the pseudo-code below). PTB and BTP transfers
can be assisted by the DMA gate array (more about this below).
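The PTB direction of a Z-axis transfer amounts to scattering one byte
across the planes. A minimal sketch, assuming all 8 planes are present
and selected, with one 16-pixel word per plane and bit n standing for
pixel n (ptb_z_put() is a hypothetical name, and the logical function
and masks are left out):

```c
#include <assert.h>
#include <stdint.h>

/*
 * Write color value `color` at pixel position `bit`: plane i
 * receives bit i of the byte.
 */
void ptb_z_put(uint16_t plane[8], int bit, uint8_t color)
{
    assert(bit >= 0 && bit < 16);
    uint16_t m = (uint16_t)(1u << bit);

    for (int i = 0; i < 8; i++) {
        if ((color >> i) & 1u)
            plane[i] |= m;
        else
            plane[i] &= (uint16_t)~m;
    }
}
```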
Finally, the adder also does all the timing and sync generation for
the QDSS display. It is explained a bit in the VCB02 manual, but is
irrelevant for our purposes, and need only be set up once, thence to
be correct forever (unless some goon fiddles with the knobs). The
MicroVAX hardware takes care of it for the console display; the driver
does it once for other displays.
Take heart, for we are done with the adder and viper chips. All we
have left are DMA and template RAM, and the DUART, video RAM CSR, and
color maps. Of these, only the DMA gate array is fancy.
The DGA acts as the interface between the rest of the QDSS and the
Q-bus, so it has interrupt enable registers that affect everyone. It
also takes care of displaying a cursor. The cursor is simply a 16x16
pixel object that either obscures the bits underneath it, or allows
them to show through; for each obscured bit, the cursor is either on or
off. The first 16 words (`A data') are the enables (obscures), and the
second 16 (`B data') the bits to show where enabled. The bits for the
cursor appear in the last 32 words of the `template RAM'. This
`template RAM' is an 8 Kword (16 KB) chunk of memory on the QDSS. The
first 64 words are used as a DMA FIFO.
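The A/B scheme for the cursor can be sketched as a per-pixel lookup.
The bit order within each word is an assumption (bit n is taken as
column n), and cursor_pixel() and the enum names are invented here.

```c
#include <assert.h>
#include <stdint.h>

enum cursor_px { CURSOR_TRANSPARENT = -1, CURSOR_OFF = 0, CURSOR_ON = 1 };

/*
 * a[] is the `A data' (enable/obscure bits) and b[] the `B data'
 * (bits to show where enabled).  Where A is 0 the screen shows
 * through; where A is 1 the pixel is B's bit.
 */
enum cursor_px cursor_pixel(const uint16_t a[16], const uint16_t b[16],
                            int row, int col)
{
    assert(row >= 0 && row < 16 && col >= 0 && col < 16);
    if (!((a[row] >> col) & 1u))
        return CURSOR_TRANSPARENT;
    return ((b[row] >> col) & 1u) ? CURSOR_ON : CURSOR_OFF;
}
```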
When it is not busy showing the cursor, the DMA gate array can be in
one of three modes: idle; doing PTB or BTP DMA; or processing `display
list' commands. The FIFO must always be empty before changing modes,
but otherwise (when doing another operation like the last one), it
suffices to wait only for the DMA byte count to reach zero.
PTB and BTP DMA are straightforward: simply set up the adder to do
the appropriate PTB or BTP, then ask the DGA to do it. If the DMA is a
Z-axis transfer, it can (but need not) be done with `byte packing'
mode, to make each 16-bit Qbus transaction carry two bytes to or from
the bitmap. The hardware appears to be able to transfer an odd number
of bytes even with packing enabled.
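Byte packing on the host side might look like the following. The byte
order within each 16-bit word is an assumption (low byte first, as one
would expect on a VAX), as is padding an odd trailing byte with zero;
pack_bytes() is a hypothetical helper, not a driver routine.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Pack n pixel bytes into 16-bit words, two per word, low byte
 * first.  Returns the number of words written to dst, which must
 * have room for (n + 1) / 2 words.
 */
size_t pack_bytes(const uint8_t *src, size_t n, uint16_t *dst)
{
    size_t w = 0;

    assert(src != NULL && dst != NULL);
    for (size_t i = 0; i < n; i += 2) {
        uint16_t lo = src[i];
        uint16_t hi = (i + 1 < n) ? src[i + 1] : 0;
        dst[w++] = (uint16_t)(lo | (hi << 8));
    }
    return w;
}
```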
Display list mode has nothing to do with real display lists; forget
whatever you may know about them. The QDSS uses it instead to mean
running a bunch of microcoded commands. The commands are loaded into
the FIFO before being run; a special command (JMPT) saves the current
FIFO-execution address (if running from the FIFO) and loads a new
address, which must be somewhere in template RAM. It continues to run
from that location until it gets another JMPT. A JMPT that jumps to
location 0 (actually, anywhere in words 0..63) fetches the saved
address and resumes. JMPT is thus a subroutine call instruction in the
FIFO and a branch or a return in template RAM. Another special command
(PTB n) tells the DGA to treat the next n words as data for the adder's
IDD register. (Byte unpacking is not available here.) Otherwise,
unless bit 15 is set, the command is treated as data to be stuffed into
the adder's ADCT register (thus indirecting into some other adder
register, since bit 15 is off).
If bit 15 is set, bits 14, 13, and 12 have special meanings. Bit 14
suppresses writing to the ADCT register. Unless it is on, bits 11..0
are sent to ADCT (along with bit 15, thus setting ADCT itself; my guess
is that all 16 bits are sent, and the adder ignores the extras). Bit
13 makes the DGA read and execute one word from the FIFO (even if it is
already running from the FIFO, though it is then pointless). Bit 12
forces the next `execute' cycle to treat the whole word as data, to be
stuffed into ADCT. Typically 14 and 13 would be set together, lest
the `fetch from the FIFO' command be written to ADCT. (But this could
be useful for, e.g., `write the next argument to the foo register'.)
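The bit assignments just described can be written down as a small
decoder. This is a sketch of the description above only: the encodings
of JMPT and PTB n are not given here, so they are not modelled (any
word with bit 15 clear is treated as plain ADCT data), and the struct
and function names are invented.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

struct dl_word {
    int writes_adct;    /* word (or its low bits) is sent to ADCT */
    int fetch_fifo;     /* bit 13: read and execute one FIFO word */
    int next_is_data;   /* bit 12: next word is raw data for ADCT */
    uint16_t low12;     /* bits 11..0 */
};

void dl_decode(uint16_t w, struct dl_word *out)
{
    assert(out != NULL);
    out->low12 = w & 0x0FFFu;
    if (!(w & 0x8000u)) {           /* bit 15 clear: plain data */
        out->writes_adct = 1;
        out->fetch_fifo = 0;
        out->next_is_data = 0;
        return;
    }
    out->writes_adct  = !(w & 0x4000u);     /* bit 14 suppresses */
    out->fetch_fifo   = (w & 0x2000u) != 0;
    out->next_is_data = (w & 0x1000u) != 0;
}
```

The typical `fetch from the FIFO' word, with bits 15, 14 and 13 set,
decodes as no ADCT write plus a FIFO fetch.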
The template RAM is thus used to hold `macros' for oft-repeated
operations. These can end in infinite JMPT-loops, provided the loop
reads the FIFO, as the loop will stop when the DMA byte count runs
out, and the next operation will start from the FIFO.
The DUART is a perfectly ordinary DUART, probably some flavour of
Intel or Signetics part. The color map (what do you mean, me
inconsistent? the QDSS manual says it is a color map, not a colour map)
is also perfectly ordinary, except that instead of a red, green, and
blue value for each position, it has a red table, a green table, and a
blue table, each 256 words long. Both the DUART and the color map are
entirely write-only.
Here is the overall rasterop algorithm again, with all the nonsense
compressed out. `which==3' is the r/m/w cycle. (Refer to the expanded
version for details.)
short ctl[2][4], src, m1, m2, lf[4], fg, bg;
do_rop_cycle(which, bank, is_ptb, mem, lfnum, edgemask)
int which, bank, is_ptb; short *mem; int lfnum; short edgemask;
{
short c = ctl[bank][which-1], id, md, s, v, mask, f;
md = *mem; if (c & SEND) *ID_BUS = md; id = *ID_BUS; /* simultaneous */
RD_REGS(c) = id; LD_REGS(c) = md; /* set null|src|m1m2|m2 */
/* should have c&SS iff which!=3 */
if (which == 3 || is_ptb) { /* do an r/m/w cycle */
f = lf[lfnum];
mask = (f&LF_NOTM1? ~m1:m1) & (f&LF_NOTM2? ~m2:m2) & edgemask;
v = apply_lf(LF_MASK(f), md, src);
*mem = (((v & fg) | (~v & bg)) & mask) | (md & ~mask);
}
}
--
In-Real-Life: Chris Torek, Univ of MD Comp Sci Dept (+1 301 405 2750)
Domain: chris at cs.umd.edu Path: uunet!mimsy!chris