Portability across architectures
Steve Summit
scs at athena.mit.edu
Tue Sep 13 11:29:30 AEST 1988
In article <103 at simsdevl.UUCP> dandc at simsdevl.UUCP (Dan DeClerck) writes:
> I've run across a need to have data files in various forms of UN*X
> be portable to each other.
> I could write data out to files in ASCII, but this is cumbersome,
> slow and may hamper the products' marketability.
Please strongly consider using ASCII after all. The advantages
are many; the disadvantages are comparatively minor.
1. ASCII is well-nigh universal; portability is virtually
assured. Even if you ever want to go to an EBCDIC
machine, conversion utilities are bound to be readily
available (and conversion may indeed happen implicitly
when transferring a text file to such a machine).
2. It's usually not nearly as inefficient as you'd think.
Ironically, even sophisticated computer programmers
commonly ignore the fact that computers are just
blisteringly fast and can usually complete a seemingly
inefficient ASCII parse in far less time than it takes
to think about it. (I am aware that there are high-
bandwidth, high-performance systems which cannot afford
the luxury of an ASCII parse, and are well-advised to use
binary transfer methods. I maintain that surprisingly
many real applications do not fall into this category,
and can use ASCII without paying a performance penalty.)
3. Reading and writing ASCII formats isn't really that
cumbersome; in fact I'd argue that binary formats, when
properly designed to account for word ordering and other
difficulties which ASCII formats easily overcome, are
more cumbersome in the long run.
4. Don't overlook debugging. ASCII formats can be
inspected with cat, piped through grep and sed and other
familiar utilities, patched with ordinary text editors,
etc., etc. The first program you write for your binary
format is usually not the application you were trying to
write, but the disassembler you find you need for
debugging; getting the disassembler working is often a
prerequisite for getting the end application working.
5. ASCII formats can make good, backwards-compatible
version number schemes easy to implement. Data formats
inevitably require revision to accommodate new features.
Fixed binary formats, especially those that simply write
structures out as bytes, are usually not amenable to such
changes, unless you did a lot of work to make them
extensible (which is another aspect that makes binary
formats more, not less, cumbersome than ASCII).
Introducing a "version 2" format then requires a host of
extra translation utilities, and nasty incompatibility
problems when programs try to read files of the wrong
format. (These compatibility problems can be successfully
worked around, but only if all files contain a version
number, which is usually not recognized or implemented
until version 1 is in place and version 2 is being
contemplated, by which time it's too late.)
Suppose, on the other hand, that your ASCII format
consists of arbitrary lines of text, with a keyword at
the beginning of each line indicating what kind of data
(e.g. what field of a structure) that line contains. If
programs ignore unrecognizable lines (a good practice),
"version 1" programs can read "version 2" files without
modification, if the version 2 keywords are a superset of
version 1's. Version 1 filters and editors can even
modify version 2 files, without losing version-2-specific
information, by saving, and echoing to the output,
unrecognized lines without interpretation.
(It's true that a binary format employing variable-length
records with a type field in a consistent place would
also enjoy these advantages. Such records are in fact
common in network protocols.)
The only real problem I've ever had with ASCII data interchange
formats is that you tend to lose a bit of precision when reading
and writing doubles, but you can minimize this by printfing
things with %.ne, for n sufficiently large. If the precision
inherent in the data is less than that of a double, you're only
"losing" something you didn't have in the first place.
I'm not sure how using ASCII data formats could "hamper a
product's marketability." If it's not an efficiency concern, it's
probably some attempt to keep information hidden in a cryptic
binary format rather than having it in plain text that anyone
could read.
Steve Summit
scs at adam.pika.mit.edu
More information about the Comp.lang.c
mailing list