Asynchronous I/O under UNIX
Larry McVoy
lm at snafu.Sun.COM
Fri Dec 29 18:35:46 AEST 1989
peterson at crash.cts.com (John Peterson) writes:
> My colleagues and I have worked out a rough sketch of a way of doing
>asynchronous I/O. One would fork off a copy of your process, the child
>would 'nap' until an I/O request came from the parent. Upon receipt of
>an I/O request, the child goes off and issues a synchronous I/O request
>like one ordinarily does, and then sets a flag of some sort when the I/O
>has completed. The data to be moved would be stored in memory accessible
>to the parent and child processes, probably using System V shared memory.
Yeah, this will work. A couple things to note:
(1) This is a bad idea for writes, especially under SunOS 4.x. See
(2), (3), (4) below.
It's a great idea for reads, especially if you do it right. I would
keep a pool of helper processes around; i.e., don't fork per read,
fork only when no idle helper is hanging around (forks are not
cheap, contrary to popular opinion). Also, let read-ahead work for
you. Oh, yeah, do yourself a favor and valloc() your buffers rather
than allocating space off the stack. It won't help you now, but
I'm looking at ways of making I/O go fast and one game I can play
will only work if you give me a page-aligned buffer. And use mmap()
if you can. It's much nicer than System V shm, and it's in SVR4 (5.4).
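A minimal sketch of the read-helper scheme in modern POSIX C (nothing SunOS-specific), following the advice above: the helper is forked once and reused rather than forked per read, and parent and helper share an anonymous mmap() region instead of System V shm. All names (struct io_slot, helper_start(), helper_submit(), helper_stop(), SLOT_BUF) are hypothetical. Assumptions: one outstanding read per helper, the file is opened before helper_start() so the child inherits the descriptor, and the parent polls a C11 atomic flag in the shared mapping instead of being notified.

```c
#define _DEFAULT_SOURCE          /* MAP_ANONYMOUS, pread(), mkstemp() on glibc */
#include <assert.h>
#include <fcntl.h>
#include <stdatomic.h>
#include <stdlib.h>
#include <string.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

#define SLOT_BUF 65536           /* hypothetical per-request buffer size */

/* Shared between parent and helper. buf comes first so it starts on the
   page boundary that mmap() returns (the page-aligned buffer advice). */
struct io_slot {
    char buf[SLOT_BUF];
    int fd;
    off_t offset;
    size_t len;
    ssize_t result;              /* pread() return value, set by the helper */
    atomic_int done;             /* 1 once result and buf are valid */
};

struct helper {
    pid_t pid;
    int req_fd;                  /* parent writes one byte per request */
    struct io_slot *slot;
};

/* Child: "nap" in read() until poked, do an ordinary synchronous pread(),
   publish the result, then sleep again. EOF on the pipe means shut down. */
static _Noreturn void helper_loop(int req_fd, struct io_slot *slot) {
    char c;
    while (read(req_fd, &c, 1) == 1) {
        slot->result = pread(slot->fd, slot->buf, slot->len, slot->offset);
        atomic_store(&slot->done, 1);
    }
    _exit(0);
}

static int helper_start(struct helper *h) {
    int p[2];
    h->slot = mmap(NULL, sizeof *h->slot, PROT_READ | PROT_WRITE,
                   MAP_SHARED | MAP_ANONYMOUS, -1, 0);
    if (h->slot == MAP_FAILED)
        return -1;
    if (pipe(p) < 0) {
        munmap(h->slot, sizeof *h->slot);
        return -1;
    }
    if ((h->pid = fork()) < 0) {
        close(p[0]);
        close(p[1]);
        munmap(h->slot, sizeof *h->slot);
        return -1;
    }
    if (h->pid == 0) {           /* child: never returns */
        close(p[1]);
        helper_loop(p[0], h->slot);
    }
    close(p[0]);
    h->req_fd = p[1];
    return 0;
}

/* Hand a read to the helper and return at once; poll slot->done later. */
static int helper_submit(struct helper *h, int fd, off_t off, size_t len) {
    assert(len <= SLOT_BUF);
    h->slot->fd = fd;
    h->slot->offset = off;
    h->slot->len = len;
    atomic_store(&h->slot->done, 0);
    return write(h->req_fd, "r", 1) == 1 ? 0 : -1;
}

static void helper_stop(struct helper *h) {
    close(h->req_fd);            /* helper sees EOF and exits */
    waitpid(h->pid, NULL, 0);
    munmap(h->slot, sizeof *h->slot);
}
```

The parent is free to do other work between helper_submit() and the moment it sees slot->done set. One way to get the pool described above is an array of struct helper, starting a new helper only when every existing one is busy.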
(2) Writes are already async, especially so on SunOS 4.x. I think the
amount of pending write data is limited by segmap, which is around 4 MB.
On buffer-cache Unixes, you'll be limited to the size of the buffer cache
(no kidding), which is fairly small, around 10-20% of memory.
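A sketch for observing this write-behind on a current POSIX system: time a write() separately from the fsync() that follows it. write() normally returns once the data is copied into the kernel's cache, and fsync() is what waits for the device. The function names are hypothetical, and the timings depend entirely on the filesystem and disk, so no particular numbers are claimed (on a RAM-backed filesystem such as tmpfs, fsync() has almost nothing to do).

```c
#define _DEFAULT_SOURCE          /* clock_gettime(), mkstemp() on glibc */
#include <assert.h>
#include <stdlib.h>
#include <string.h>
#include <time.h>
#include <unistd.h>

static double now_sec(void) {
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (double)ts.tv_sec + (double)ts.tv_nsec / 1e9;
}

/* Write len bytes to fd, then force them to stable storage, recording how
   long each step took. A short write is treated as a failure. */
static int timed_write_then_sync(int fd, const void *buf, size_t len,
                                 double *write_s, double *sync_s) {
    double t0 = now_sec();
    if (write(fd, buf, len) != (ssize_t)len)
        return -1;
    double t1 = now_sec();
    if (fsync(fd) != 0)
        return -1;
    double t2 = now_sec();
    *write_s = t1 - t0;
    *sync_s = t2 - t1;
    return 0;
}
```

On a disk-backed filesystem the fsync() figure is usually the larger one: the application's write "finished" well before the disk did.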
(3) Having lots of outstanding writes doesn't buy you very much. In fact,
it can really lead to weird behavior. Everyone should know that (on
simple controllers, at least) writes go through disk sort, including
synchronous writes (NFS is a heavy user of sync writes). Well, given
that you go through disk sort, you won't ever get to starvation (i.e.,
a buffer will get written out) but you can get to something I call
being very hungry. Suppose you have a disk queue that starts out
with requests for cylinders 0 and 100. Then suppose you do a series of
writes onto cylinders >= 0 but < 100. The buffer waiting for cylinder 100
will wait until all of those I/Os (which came in after it did)
complete. That buffer waiting for 100 is in the "hungry" state.
Fortunately, this doesn't happen very often. Traces I've taken indicate
that disk requests (due to the BSD fs) are nicely grouped. You have to
have lots of busy processes doing unrelated I/O to get into this state.
I suspect the async I/O scheme could hit this problem.
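A toy model of the "hungry" request in the scenario above, as a sketch: a one-way elevator that always serves the lowest pending cylinder at or above the head (wrapping to the lowest overall when none is left), with a new write arriving just ahead of the head after each service. This is a simplification, not the actual BSD disksort() code, and the function names are hypothetical. Under this arrival pattern the request for cylinder victim is served only after every arrival below it, so the function returns victim itself; a FIFO queue would have served it second.

```c
#include <assert.h>
#include <stddef.h>

#define QCAP 4   /* this arrival pattern never has more than 2 pending */

/* One-way elevator: pick the lowest pending cylinder >= head; if there is
   none, wrap around to the lowest pending cylinder overall. */
static size_t pick_next(const int *q, size_t n, int head) {
    size_t best = n, lowest = n;
    for (size_t i = 0; i < n; i++) {
        if (q[i] >= head && (best == n || q[i] < q[best]))
            best = i;
        if (lowest == n || q[i] < q[lowest])
            lowest = i;
    }
    return best != n ? best : lowest;
}

/* Queue starts with requests for cylinders 0 and victim (victim > 0).
   Each time cylinder c is served, a new write arrives for c + 1 while
   c + 1 < victim. Returns how many requests were served before victim. */
static int served_before_victim(int victim) {
    int q[QCAP];
    size_t n = 0;
    int head = 0, served = 0;
    assert(victim > 0);
    q[n++] = 0;
    q[n++] = victim;
    while (n > 0) {
        size_t i = pick_next(q, n, head);
        int cyl = q[i];
        q[i] = q[--n];                 /* remove request i; order is irrelevant */
        head = cyl;
        if (cyl == victim)
            return served;
        served++;
        if (cyl + 1 < victim) {        /* the next write lands just ahead */
            assert(n < QCAP);
            q[n++] = cyl + 1;
        }
    }
    return -1;                         /* unreachable: victim is always served */
}
```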
(4) Those outstanding writes cost memory. You have to grab the user's data
before saying "I'm done". SunOS 4.x claims this is a feature; "Our
writes finish faster than your writes, especially for big ones" seems
to be the party line. Well, for what I do this is a waste of memory,
so I run a hacked version of ufs that limits outstanding writes
(mail me if you have src and want to try this - it's trivial to
implement and tunable. I'd be interested in outside comments).
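The ufs change mentioned above isn't shown, so this is only a user-level analogue of the idea, as a sketch: cap the bytes of write data in flight and refuse new writes until completions free up room. struct write_budget and its functions are hypothetical. A caller would reap completions (for example, from helper processes) whenever budget_try_begin() returns false, and would split any write larger than the cap into cap-sized pieces, since such a write is never admitted.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical bookkeeping for capping outstanding write data. */
struct write_budget {
    size_t limit;        /* max bytes allowed in flight */
    size_t outstanding;  /* bytes issued but not yet completed; <= limit */
};

/* Try to admit a write of len bytes; false means "reap completions first". */
static bool budget_try_begin(struct write_budget *b, size_t len) {
    if (len > b->limit - b->outstanding)   /* overflow-safe: outstanding + len > limit */
        return false;
    b->outstanding += len;
    return true;
}

/* Record that a previously admitted write of len bytes has completed. */
static void budget_end(struct write_budget *b, size_t len) {
    assert(len <= b->outstanding);
    b->outstanding -= len;
}
```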
(5) Reads could work really well.
What I say is my opinion. I am not paid to speak for Sun.
Larry McVoy, Sun Microsystems ...!sun!lm or lm at sun.com