unix file system
Jon Campbell
jcampbell at mrfort.DEC
Fri Jul 26 03:53:22 AEST 1985
From: Jon Campbell
Digital Equipment Corp.
Marlboro, MA
617-467-6876
DECnode:MRFORT::JCAMPBELL
To: UNIX developers and users
Subject: problems with the UNIX file system
Some of us at Digital think we have found a basic problem with the UNIX
file system for FORTRAN. The problem is that there is no place to put
various kinds of information about the contents of the file. More
specifically:
1. The FORTRAN language requires that one be able to have "random
access" files, with a fixed "recordsize". The obvious UNIX
implementation is one which uses a fixed number of bytes (perhaps
even with a <newline> at the end) for each "record". However, there
is no way on UNIX that one can open such a file and find out the
size of each record. Thus it is impossible to write a utility to
look at, modify, or extract data from such a file without the user
having previous knowledge about the file.
2. As you probably know, most FORTRAN output data files reserve the
1st character position of each output line for a "FORTRAN carriage
control character". When the file is printed (or, in some
circumstances, typed) these control characters are supposed to be
translated into corresponding vertical motion characters (such as
one or more line-feeds, a form-feed, a vertical tab, etc.) and the
<newline> character at the end of the "record" is removed.
So FORTRAN output files are "different" than other files, even
though you cannot tell that by looking at them - they just have
"funny numbers" in the 1st character position of each line. UNIX
provides a utility for piping the FORTRAN output through a
translator module, so that the vertical motion characters appear
directly in the output file. But often that is not what is
desirable. Often one wants to leave the file in its original
("FORTRAN data file") state, modify it many weeks later, and then
print it. Again, as in the case above, the user must know that the
file was produced by a FORTRAN program and pipe it to a filter
program on the way out to the printer or terminal.
3. The ANSI Magnetic Tape Label Standard defines a set of file
attributes in the file labels which must be filled in when the tape
is written. Among them are record size and carriage control
(referred to in the Standard as "Form Control").
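A minimal sketch of the situation in item 1, assuming a file of fixed-length records. The function name `read_record` and the 8-byte sample layout are hypothetical, chosen only for illustration. The point is that `record_size` has to be passed in by the caller, because nothing stored with the file supplies it:

```python
import io

def read_record(f, index, record_size):
    """Return record number `index` (0-based) from a fixed-length-record file.

    record_size must come from the caller: the file itself carries no
    indication of it, which is the problem described in item 1.
    """
    if record_size <= 0:
        raise ValueError("record_size must be positive")
    f.seek(index * record_size)
    data = f.read(record_size)
    if len(data) != record_size:
        raise EOFError("record %d is missing or truncated" % index)
    return data

# Three 8-byte records, each ending in a <newline> (an assumed layout).
sample = io.BytesIO(b"AAAAAAA\nBBBBBBB\nCCCCCCC\n")
print(read_record(sample, 1, 8))   # the second record
print(read_record(sample, 1, 6))   # wrong size: bytes from two records, no error
```

With the wrong record size the call still succeeds and quietly returns bytes that straddle two records, so a general-purpose utility has no way to detect a bad guess.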
I would like to propose that UNIX users and developers begin thinking
about which "file attributes" (information about a file that generalized
programs, which cannot have previous knowledge of each file, could make
use of) would be worth attaching to UNIX files.
Keep in mind that these "attributes" would NOT in any way detract from
the simplicity of UNIX - one would not have to use them; they would be
there only for those users who wish to carry information about the files
along with the files. Nor would files with attribute information be
treated by UNIX any differently than they are now - they just
have some more information about them that can be discovered when they
are opened. No "file management layer" is implied for UNIX by the
creation of these "attributes".
We would not even have to make an "incompatible change" for the printing
of files with the "FORTRAN data file" attribute: a new command could be
introduced to take the place of LPR for those users who wish the utility
to find out whether the attribute is set and print the file accordingly;
many people would probably continue to use LPR.
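A rough sketch of what such a printing command might do, assuming the carriage-control attribute is already known to it. The names `render` and `translate_fortran` are hypothetical. The post does not list the control characters, so the table below uses the conventional FORTRAN meanings (blank = single space, 0 = double space, 1 = new page, + = overprint), and treating an unrecognized control character as single spacing is an assumption:

```python
# Conventional FORTRAN carriage-control characters and the vertical
# motion emitted BEFORE the text of the record.
ASA_PREFIX = {
    " ": "\n",     # advance one line
    "0": "\n\n",   # advance two lines
    "1": "\f",     # advance to top of next page
    "+": "\r",     # no advance: overprint the previous line
}

def translate_fortran(lines):
    """Replace column-1 control characters with vertical motion characters
    and drop the <newline> at the end of each record."""
    out = []
    for line in lines:
        record = line.rstrip("\n")
        control, text = record[:1], record[1:]
        # Unknown or missing control character: single spacing (assumption).
        out.append(ASA_PREFIX.get(control, "\n") + text)
    return "".join(out)

def render(lines, carriage_control="NONE"):
    """Produce printable text according to the file's carriage-control attribute."""
    if carriage_control == "FORTRAN":
        return translate_fortran(lines)
    if carriage_control == "LIST":
        # Records carry no <newline>; add one to each.
        return "".join(record + "\n" for record in lines)
    if carriage_control == "NONE":
        return "".join(lines)   # print the file as it appears
    raise ValueError("unknown carriage control: %r" % carriage_control)

report = ["1TITLE\n", " line one\n", "0line two\n", "+______\n"]
print(repr(render(report, "FORTRAN")))
```

The file on disk stays in its original form; only the rendering changes, and only when the attribute says it should.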
Below is a list of those "attributes" which I have found useful in my
work in implementing the FORTRAN runtime library for TOPS-10 and
TOPS-20. Many of them have been included in the ANSI Magnetic Tape
Label Standard:
Carriage control
    FORTRAN - funny numbers in char position 1, translated on printing
    LIST - take just the contents of the "record", add a <newline>. This
           is for files which have no <newline> characters in them
    NONE - print the file as it appears (the default)
Character set (for those folks who want to have both EBCDIC and ASCII files)
Record format - (refer to the Tape Label Standard)
    Delimited - each record has a 4-character byte count in front of it
    Fixed - all records have the same length, with no terminators
    Undefined - the default - no implied record format
Record size (For "fixed" record format, the size of all records;
    for variable-length records, this is usually interpreted
    as the maximum record length - zero means "unknown"
    maximum record length)
File type (for "data management" programs...)
    Sequential (the default)
    Others (user-definable, for various flavors of other types
        of access, such as [ugh] indexed sequential, database, etc.)
Bytesize (for typesetting applications which use 16- or 32-bit
    character sets)
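The list above could be modeled, purely as an illustration, by a small record type plus a reader that uses it. The names `FileAttributes` and `records` are hypothetical. The ASCII and 8-bit defaults are assumptions (the post names no default for character set or bytesize), and the "Delimited" count is assumed here to be four ASCII decimal digits counting only the data bytes that follow; the Tape Label Standard's own convention may differ:

```python
import io
from dataclasses import dataclass

@dataclass
class FileAttributes:
    carriage_control: str = "NONE"      # FORTRAN, LIST or NONE
    character_set: str = "ASCII"        # assumed default
    record_format: str = "UNDEFINED"    # DELIMITED, FIXED or UNDEFINED
    record_size: int = 0                # zero means "unknown"
    file_type: str = "SEQUENTIAL"
    bytesize: int = 8                   # assumed default

def records(f, attrs):
    """Yield the records of binary file f as described by attrs."""
    if attrs.record_format == "FIXED":
        if attrs.record_size <= 0:
            raise ValueError("fixed format needs a known record size")
        while True:
            chunk = f.read(attrs.record_size)
            if not chunk:
                return
            if len(chunk) < attrs.record_size:
                raise EOFError("truncated final record")
            yield chunk
    elif attrs.record_format == "DELIMITED":
        while True:
            header = f.read(4)
            if not header:
                return
            if len(header) < 4:
                raise EOFError("truncated byte count")
            length = int(header.decode("ascii"))   # assumed: decimal digits
            data = f.read(length)
            if len(data) < length:
                raise EOFError("truncated record")
            yield data
    else:
        raise ValueError("no implied record format; treat as a byte stream")

attrs = FileAttributes(record_format="DELIMITED")
print(list(records(io.BytesIO(b"0003abc0005hello"), attrs)))
```

A general-purpose utility written this way needs no previous knowledge of the file beyond the attributes themselves.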
I'm sure you'll all think of others that would be useful. Since I have
not looked at the UNIX internal file system much, I do not know how
difficult it would be to find a place to attach this large (and,
potentially, expanding) set of attributes, or what the FOPEN (or other)
interface would look like to set/get the attribute values.
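To make the set/get question concrete, here is one possible shape for such an interface. Everything in it is an assumption: the names `get_attributes` and `set_attributes` are hypothetical, and the attributes are kept in a companion file ending in `.attrs`, which is only a user-level stand-in for whatever place the file system itself might provide:

```python
import json
import os
import tempfile

def _sidecar(path):
    # Hypothetical convention: attributes live beside the file in "<name>.attrs".
    return path + ".attrs"

def get_attributes(path):
    """Return the attribute dictionary for path; empty if none were ever set."""
    try:
        with open(_sidecar(path)) as f:
            return json.load(f)
    except FileNotFoundError:
        return {}

def set_attributes(path, **values):
    """Merge the given attribute values into those already stored for path."""
    attrs = get_attributes(path)
    attrs.update(values)
    with open(_sidecar(path), "w") as f:
        json.dump(attrs, f)
    return attrs

# Usage: a file with no attributes behaves exactly as before.
with tempfile.TemporaryDirectory() as d:
    data_file = os.path.join(d, "results.dat")
    print(get_attributes(data_file))
    set_attributes(data_file, carriage_control="FORTRAN", record_size=80)
    print(get_attributes(data_file))
```

Programs that never call either function see no difference, which matches the intent that the attributes be optional.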
Thanks for your time,
Jon Campbell