Speed of seeks
John P. Linderman
jpl at allegra.UUCP
Tue May 20 00:58:34 AEST 1986
With a little help from Chris Torek, Peter da Silva noted
>>According to the library, the argument to [fseek] needs to be the
>>returned value from [an ftell()] call: it may indeed be a magic cookie
>>like a sector/offset pair (and in fact "magic cookie" is the way it's
>>described in the manual). It is under RSX/11M and on the ATARI 800.
and Chris goes on to add
>Actually, those who write libraries such that fseek takes a byte
>index either have missed that point, or are very clever: Because
>PDP-11 and Vax Unix systems have always had fseek and ftell return
>byte indicies, there is no doubt quite a bit of code that depends
>on that behaviour. These programs are technically broken, but it
>is hard to keep a customer happy by informing it [note non-sexist
>word :-)] of this fact.
This points up the typical bind faced by those who wish to write programs
that are portable but not pathetic. I wanted to write a sort program
that checkpointed itself... after sorting and writing a coreload of data,
the data is (kind of) safe, and, if the system crashes, it isn't
necessary to sort it again, if we can just seek past it in the input
file. In general, we don't know how many complete records will fit in a
coreload until we are part way through a record that is too big to fit.
If seek addresses are byte offsets, no big deal. Just keep track of how
many bytes have been read after each complete record. But if seek
addresses are magic cookies, bad news. We can use ftell to find the find
the address of the middle of the current, incomplete, record, but that
does us no good at all, since it is the address of the START of the
record that we need, and you cannot do cookie arithmetic. Or, we can do
an ftell after every complete record, just in case the next record won't
fit. Ughleee! That ftell boils down to a system call, and after stdio
goes to all the bother of minimizing system calls, we come along and add
one after every record. I have tried this, and it results in performance
that is anywhere from 10 to 100 times as bad! Completely unacceptable.
In the particular case, there is a way out. Since restarting from a
checkpoint is rare, and since input may be coming from a pipe anyway,
it is both tolerable and preferable to get to byte offset N by doing
N getc's (I like to call this a getceek.) But the solution is peculiar
to this problem, and I would have chosen to ``technically break'' the
program if I had been forced to choose between complete portability and
acceptable performance.
John P. Linderman Seek and Ye Shall Grind allegra!jpl
More information about the Comp.unix.wizards
mailing list