Raw vs. block device

Sun Jan 8 07:08:49 AEST 1984

From:            Rich Wales <v.wales at ucla-locus>

Steve --

One statement you made in your explanation of raw vs. block I/O may need
a little bit of clarification or expansion.

	Physio() hands to the disk device strategy routine the
	"block number" of the request.  The block number is
	derived quite simply as u.u_offset>>BSHIFT.  u.u_offset
	is the current "lseek" position of the open raw device
	file, BSHIFT is log2(BSIZE).  Thus, all RAW I/O
	operations must occur on a BSIZE boundary.

This is true in most cases (in particular, it is true for the disk
drivers distributed with most UNIX systems), but it need not be true in
general.

The "physio" routine does set the "b_blkno" variable to the disk block
number, as you described.  However, a raw driver interface does not have
to use the "b_blkno" value proffered by "physio" if it doesn't want to
-- since it still has access to "u.u_offset" in the user structure.
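
To make the arithmetic concrete, here is a small user-space sketch (not
kernel code, and not taken from any real driver) of the derivation the
quoted text describes.  The BSHIFT value of 9 is an assumption chosen for
illustration; the real value is system-dependent.  The helper names are
hypothetical.

```c
#include <assert.h>

/* Assumed values for illustration only; real BSIZE/BSHIFT vary by system. */
#define BSHIFT 9
#define BSIZE  (1L << BSHIFT)
#define BMASK  (BSIZE - 1)

/* Block number derived as the quoted text describes: offset >> BSHIFT. */
long offset_to_blkno(long offset)
{
    assert(offset >= 0);
    return offset >> BSHIFT;
}

/* Bytes past the start of that block.  This part of the offset is
 * discarded if a driver looks only at the block number. */
long offset_within_block(long offset)
{
    assert(offset >= 0);
    return offset & BMASK;
}
```

With these assumed values, offsets 1024 and 1280 both map to block 2; the
extra 256 bytes in the second case are visible only through the offset
itself, which is why a driver that consults "u.u_offset" can do finer
positioning than one that relies on "b_blkno".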

So it is possible to design a raw interface that doesn't do operations
on a BSIZE boundary -- provided that the device in question supports
such activity.  Indeed, I did this very thing in an RX02 driver I wrote
-- since the sector size on an RX02 is either 128 or 256 bytes
(depending on the density of the disk), I ignored the "b_blkno"
calculated by "physio" and did my own computation based on "u.u_offset".
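
The driver code itself is not shown in this message, so the following is
only a guess at the kind of computation involved, written as ordinary
user-space C.  The function names, the density flag, and the choice to
reject unaligned offsets with -1 are all assumptions made for the sketch.

```c
#include <assert.h>

#define RX_SD_SECTOR 128   /* bytes per sector, single density */
#define RX_DD_SECTOR 256   /* bytes per sector, double density */

/* Hypothetical helper: sector size for a given density flag. */
int rx_sector_size(int double_density)
{
    return double_density ? RX_DD_SECTOR : RX_SD_SECTOR;
}

/* Logical sector number for a byte offset, or -1 if the offset does
 * not fall on a sector boundary for the current density. */
long rx_offset_to_sector(long offset, int double_density)
{
    int size = rx_sector_size(double_density);

    assert(offset >= 0);
    if (offset % size != 0)
        return -1;
    return offset / size;
}
```

For example, byte offset 384 is sector 3 on a single-density disk, but it
is not a sector boundary at double density.  If BSIZE were 512, the block
number for that offset would be 0, which is too coarse to tell sectors 0
through 3 apart.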

This brings up another interesting facet of UNIX, by the way -- namely,
"When can you safely refer to the user structure in kernel code?"

The basic rule is that you can safely refer to data in the user
structure only in those parts of the kernel code that are executed in
direct, immediate response to a request by the user program -- such as
a system call (CHMK instruction) or a trap.  In the case of a system
call or a trap, the user process context (including, in particular, the
process's user structure) is retained and can safely be referenced.

Kernel code executed asynchronously (e.g., as the result of an
interrupt), on the other hand, must not do anything with or to the user
structure, because the process that happened to be running at the time
of the interrupt is, in general, simply an innocent bystander with no
logical connection to the interrupt condition.  This is the reason, by
the way, why you can't put a "uprintf" (kernel-generated write to user's
terminal) in an interrupt routine -- "uprintf" identifies the user's
terminal by looking in the user structure (u.u_ttyp), and a "uprintf" in
an interrupt routine would end up writing to a random terminal.
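
The "innocent bystander" problem can be imitated in a toy user-space
model.  Nothing here is real kernel code: the "user" structure is a
stand-in with an integer terminal id in place of a tty pointer, and
"switch_to" is a hypothetical substitute for a context switch.

```c
#include <assert.h>

/* Stand-in for the user structure: a terminal id instead of a tty pointer. */
struct user {
    int u_ttyp;
};

/* Like the kernel's "u": always describes whichever process is running. */
struct user u;

/* Hypothetical context switch: the process on terminal `tty` now runs. */
void switch_to(int tty)
{
    u.u_ttyp = tty;
}

/* The terminal a uprintf-style routine would write to right now. */
int uprintf_target(void)
{
    return u.u_ttyp;
}

/* The process on `requester` starts an I/O and goes to sleep, the process
 * on `bystander` gets the CPU, and then the device interrupt arrives.
 * Returns the terminal a uprintf in the interrupt routine would hit. */
int interrupt_uprintf_target(int requester, int bystander)
{
    switch_to(requester);                 /* system call context */
    assert(uprintf_target() == requester);
    switch_to(bystander);                 /* requester sleeps */
    return uprintf_target();              /* interrupt fires here */
}
```

In this model, a request from terminal 3 that completes while the process
on terminal 7 is running would have its message delivered to terminal 7.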

In dev/hp.c, for example, the only routines where it is safe to refer to
the user structure are "hpread" and "hpwrite" (which are called as a
result of a "read" or "write" on a raw device).  Although "hpstrategy"
is also used for raw I/O, you can't refer to the user structure in it
because it is also used for block I/O (which is asynchronous to the
process or processes doing the "read"s or "write"s).

In the case of the RX02 driver I mentioned earlier, I put code in my
"rxstrategy" routine to compute a sector number based on "u.u_offset"
and the disk density.  I could safely do this because my driver
supported ONLY a "raw" interface, and thus my "strategy" routine would
always be invoked in the context of the user process that issued the
"read" or "write" system call.  If I had chosen to implement a "block"
interface as well, I would have had to use two different "strategy"
routines -- one for raw I/O (specified as an argument to the "physio"
calls), and one for block I/O (in the "bdevsw" array in dev/conf.c).
The "raw strategy" routine could safely use "u.u_offset"; the "block
strategy" routine, on the other hand, would have to make do with the
"b_blkno" value in the buffer header it is handed (set by the block I/O
system rather than by "physio").
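
The two-routine arrangement might look roughly like the sketch below.
This is a user-space mock-up, not the driver described above: "struct
buf" and "struct user" are reduced to the single fields of interest, the
routines only compute a starting sector instead of starting a transfer,
and the sector size (128) and BSHIFT (9) are assumed values.

```c
#include <assert.h>

#define SECTOR_SIZE       128   /* assumed: single-density RX02 sector */
#define BSHIFT            9     /* assumed block shift (BSIZE of 512) */
#define SECTORS_PER_BLOCK ((1 << BSHIFT) / SECTOR_SIZE)

struct buf  { long b_blkno; };   /* mock buffer header */
struct user { long u_offset; };  /* mock user structure */

struct user u;                   /* valid only in the caller's context */

/* Raw path: reached only via physio in the context of the process that
 * made the system call, so u.u_offset may be consulted.  The b_blkno
 * in the buffer header is deliberately ignored. */
long raw_strategy_sector(struct buf *bp)
{
    (void)bp;
    return u.u_offset / SECTOR_SIZE;
}

/* Block path: may run with an unrelated process current, so it must
 * rely on the block number carried in the buffer header. */
long block_strategy_sector(struct buf *bp)
{
    return bp->b_blkno * SECTORS_PER_BLOCK;
}
```

With an offset of 640, the raw routine arrives at sector 5, while a
routine limited to the block number (1 in this case) can only name the
first sector of that block, sector 4.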

-- Rich <v.wales at UCLA-LOCUS>