Is Unix stdio slow?
Chris Torek
chris at mimsy.UUCP
Mon Sep 26 02:18:59 AEST 1988
[moved from comp.os.vms]
In article <3951 at psuvax1.cs.psu.edu> schwartz at shire (Scott Schwartz) writes:
>I've seen unix programs (things like a grep replacement) that got
>similar speedups by replacing stdio calls with read/write and a large
>buffer. I wonder how much of that 2-6x is from overhead in stdio,
>rather than in the filesystem.
Quite a bit. The overhead is not all in stdio itself, but in inefficient
expansions of stdio macros:
> while ((c = getchar()) != EOF)
> putchar(c);
If you are not going to inspect each character, it seems a bit silly
to bring them in and put them out one at a time, but let us take a
look at what the traditional dumb Unix C compiler generates for this:
jmp while_test
loop:
<<code for putchar>>
while_test: | while ((c = getchar()) != EOF)
sub $1,stdin_cnt | --cnt >= 0 ?
jls refill
mov stdin_ptr,r1 | *(unsigned char *)p++
clr r0
movb @r1+,r0
mov r1,stdin_ptr
jmp merge
refill: push stdin | : _filbuf(&stdin)
call _filbuf
tst @sp+ | (some sort of pop)
merge: mov r0,C(fp) | c = ...
cmp r0,$-1 | c == EOF?
jne loop | no, loop
<<code to exit>>
Now, stdio reads in blocks, typically 512 to 65536 characters at
a time, so the most common path through this code is
sub $1,stdin_cnt; no_jmp; mov stdin_ptr,r1 | 3
clr r0; movb @r1+,r0; mov r1,stdin_ptr | 3
jmp merge; merge: | 1
mov r0,C(fp); cmp r0,$-1; jmp loop | 3 = 10 instrs
Most of this is unnecessary! If we put stdin_ptr in a register
in advance, and note that r0 (and hence C) can never be -1 since it
is zero-extended from a byte, we should see something like
sub $1,stdin_cnt; no_jmp | 2
clr r0; movb @p+,r0 | 2
mov r0,C(fp); jmp loop | 2 = 6 instrs
The code to call `filbuf' has to save this extra pointer, but that is
no big deal. If BUFSIZ is 1024, we saved 4*1023=4092 instructions at
the expense of one instruction when calling filbuf, for a total savings
of 4091 instructions per read! We might save `only' ~2000 instructions
if stdin_ptr must be left global (which is true if the loop is
nontrivial).
What about putchar? It is even worse:
| putchar(C(fp))
sub $1,stdout_cnt | --cnt >= 0 ?
jls flush
mov stdout_ptr,r0 | *p = c,
mov C(fp), at r0
tstbit $7,stdout_flg | flag&IS_LINE_BUFFERED
jz no_nl_flush
cmp @r0,$10 | && *p == '\n' ?
jne no_nl_flush
push $10 | _flsbuf(stdout, '\n')
jmp fls0 | (merge)
no_nl_flush:
clr r1 | : *(unsigned char *)p++
movb @r0+,r1
mov r0,stout_ptr
mov r1,r0
jmp merge
flush: push C(fp) | : _flsbuf(stdout, c)
fls0: push stdout
call _flsbuf
cmp @sp+, at sp+
merge:
The most common case is, again, that the buffer does *not* fill.
Unfortunately, this case still has to test for line buffered
output, and the flag can (and in fact does) change after calls
to _flsbuf, so the test cannot easily be hoisted out of the
loop (it can be done, by splitting and cross-branching in the
flush code, but this is not often a good optimisation: this is
a peculiar case).
Eliminating line-buffered output descriptors shrinks putchar()
considerably:
| p is `unsigned char *'
| --cnt >= 0 ? *p++ = c : _flsbuf(stdout, c)
sub $1,stdout_cnt
jls flush
mov stdout_ptr
mov C(fp), at r1
clr r0
mov @r1+,r0
mov r1,stdout_ptr
jmp merge
flush: push C(fp)
push stdout
call _flsbuf
cmp @sp+, at sp+
merge:
The copy-a-buffer-at-a-time loop looks like this:
jmp while_tst
loop:
push n | write(1, buf, n)
push buf
push 1
call write
add $6,sp | (pop)
while_tst:
push size | n = read(0, buf, size)
push buf
push 0
call read
add $6,sp
mov r0,n
tst r0 | while (... > 0)
jgt loop
which has *no* per-character overhead, compared with a minimum 6+11
(getchar+putchar) instruction per-character overhead for stdio. When
the buffer size is large (e.g., 1024), that is 17000 instructions of
overhead per read and write call. read and write are each thousands of
instructions, but they no longer swamp the overhead in stdio. In
programs that do real work with each character, at least some of the
stdio `overhead' is in fact necessary, so you lose less.
Using fread and fwrite can help, if they are intelligently written.
(They are NOT so written in old BSD releases.)
--
In-Real-Life: Chris Torek, Univ of MD Comp Sci Dept (+1 301 454 7163)
Domain: chris at mimsy.umd.edu Path: uunet!mimsy!chris
More information about the Comp.unix.wizards
mailing list