unix file system

Jon Campbell jcampbell at mrfort.DEC
Fri Jul 26 03:53:22 AEST 1985


 
 
 
                                From: Jon Campbell
                                      Digital Equipment Corp.
                                      Marlboro, MA
                                      617-467-6876
                                      DECnode:MRFORT::JCAMPBELL
 
To:  UNIX developers and users
 
Subject:  problems with the UNIX file system
 
Some of us at Digital think we have found a basic problem with the  UNIX
file  system  for FORTRAN.  The problem is that there is no place to put
various kinds of information about  the  contents  of  the  file.   More
specifically:
 
    1.  The FORTRAN language requires that one be able to  have  "random
    access"   files,  with  a  fixed  "recordsize".   The  obvious  UNIX
    implementation is one which uses a fixed number  of  bytes  (perhaps
    even with a <newline> at the end) for each "record".  However, there
    is no way on UNIX that one can open such a file  and  find  out  the
    size  of  each  record.  Thus it is impossible to write a utility to
    look at, modify, or extract data from such a file without  the  user
    having previous knowledge about the file.
 
    2.  As you probably know, most FORTRAN output data files reserve the
    1st  character  position of each output line for a "FORTRAN carriage
    control  character".   When  the  file  is  printed  (or,  in   some
    circumstances,  typed)  these  control characters are supposed to be
    translated into corresponding vertical motion  characters  (such  as
    one  or  more line-feeds, a form-feed, a vertical tab, etc.) and the
    <newline> character at the end of the "record" is removed.
 
    So FORTRAN output files  are  "different"  from  other  files,  even
    though  you  cannot  tell  that  by looking at them - they just have
    "funny numbers" in the 1st character position of  each  line.   UNIX
    provides   a  utility  for  piping  the  FORTRAN  output  through  a
    translator module, so that the  vertical  motion  characters  appear
    directly  in  the  output  file.   But  often  that  is  not what is
    desirable.  Often one wants  to  leave  the  file  in  its  original
    ("FORTRAN  data  file")  state, modify it many weeks later, and then
    print it.  Again, as in the case above, the user must know that  the
    file  was  produced  by  a  FORTRAN  program and pipe it to a filter
    program on the way out to the printer or terminal.
 
    3.  The ANSI Magnetic Tape Label Standard  defines  a  set  of  file
    attributes  in the file labels which must be filled in when the tape
    is written.   Among  them  are  record  size  and  carriage  control
    (referred to in the Standard as "Form Control").
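
The translation described in item 2 can be sketched roughly as follows.
This is an illustration only, not the filter UNIX actually ships. It
assumes the conventional control characters (blank = single space,
"0" = double space, "1" = new page, "+" = overprint), and the function
name is hypothetical.

```python
def translate_carriage_control(records):
    """Turn FORTRAN output records into printer-ready text.

    Assumed conventions (not taken from the memo): ' ' advances one
    line, '0' advances two, '1' starts a new page, '+' overprints.
    Any other control character is treated like a blank.
    """
    out = []
    for index, record in enumerate(records):
        record = record.rstrip("\n")       # the trailing <newline> is removed
        control, text = record[:1], record[1:]
        if control == "1":
            motion = "\f"                  # form-feed
        elif control == "+":
            motion = "\r"                  # return to column 1, no line advance
        elif control == "0":
            motion = "\n\n"                # leave one blank line
        else:
            motion = "\n"                  # ordinary single spacing
        if index == 0 and control != "1":
            motion = motion[1:]            # no previous line to advance past
        out.append(motion + text)
    return "".join(out) + ("\n" if out else "")


if __name__ == "__main__":
    sample = ["1REPORT\n", " first line\n", "0second line\n", "+___________\n"]
    print(repr(translate_carriage_control(sample)))
```

Note that nothing in the input marks it as a FORTRAN data file; the
caller has to know to run it through the filter.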
 
I would like to propose that UNIX users and  developers  begin  thinking
about which "file attributes" (information about a file that generalized
programs need but cannot know in advance) would be useful to  attach  to
UNIX files.
Keep in mind that these "attributes" would NOT in any way  detract  from
the  simplicity of UNIX - one would not have to use them;  they would be
there only for those users who wish to carry information about the files
along with the files.  Nor would UNIX look at files with attribute
information any differently than it does now - they would just carry
some more information that can be discovered when they are opened.  No
"file management layer" is implied for UNIX by the
creation of these "attributes".
 
We would not even have to make an "incompatible change" for the printing
of files with the "FORTRAN data file" attribute:  a new command could be
introduced to take the place of LPR for those users who wish the utility
to find out whether the attribute is set and print the file accordingly;
many people would probably continue to use LPR.
 
Below is a list of those "attributes" which I have found  useful  in  my
work  in  implementing  the  FORTRAN  runtime  library  for  TOPS-10 and
TOPS-20.  Many of them have been included  in  the  ANSI  Magnetic  Tape
Label Standard:
 
Carriage control
  FORTRAN - funny numbers in char position 1, translated on printing
  LIST - take just the contents of the "record", add a <newline>. This
        is for files which have no <newline> characters in them
  NONE - print the file as it appears (the default)
 
Character set (for those folks who want to have both EBCDIC and ASCII files)
 
Record format - (refer to the Tape Label Standard)
  Delimited - each record has a 4-character byte count in front of it
  Fixed - all records have the same length, with no terminators
  Undefined - the default - no implied record format
 
Record size (for "fixed" record format, the size of every record;
        for variable-length records, this is usually interpreted
        as the maximum record length, where zero means that the
        maximum is unknown)
 
File type (for "data management" programs...)
  Sequential (the default)
  Others (user-definable, for various flavors of other types
        of access, such as [ugh] indexed sequential, database, etc.)
 
Bytesize (for typesetting applications which use 16- or 32-bit
        character sets)
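
To show how a generalized utility might use the record format and
record size attributes, here is a minimal sketch. All names are
hypothetical. It assumes the "Delimited" count is four ASCII decimal
digits giving the length of the data that follows, not counting the
digits themselves; the Tape Label Standard's convention may differ.

```python
import io
from dataclasses import dataclass


@dataclass
class FileAttributes:
    """Hypothetical container for a few of the attributes listed above."""
    carriage_control: str = "NONE"      # FORTRAN, LIST, or NONE
    record_format: str = "UNDEFINED"    # DELIMITED, FIXED, or UNDEFINED
    record_size: int = 0                # 0 means unknown


def read_records(stream, attrs):
    """Yield records (bytes) from a binary stream according to attrs."""
    if attrs.record_format == "FIXED":
        if attrs.record_size <= 0:
            raise ValueError("fixed format needs a positive record size")
        while True:
            record = stream.read(attrs.record_size)
            if not record:
                return
            yield record
    elif attrs.record_format == "DELIMITED":
        while True:
            count = stream.read(4)          # assumed: 4 ASCII digits
            if len(count) < 4:
                return
            yield stream.read(int(count.decode("ascii")))
    else:
        yield stream.read()                 # no implied record structure


def read_record_at(stream, attrs, index):
    """Random access to record number index (0-based) of a fixed-format file."""
    if attrs.record_format != "FIXED" or attrs.record_size <= 0:
        raise ValueError("random access needs a known fixed record size")
    stream.seek(index * attrs.record_size)
    return stream.read(attrs.record_size)


if __name__ == "__main__":
    fixed = FileAttributes(record_format="FIXED", record_size=4)
    data = io.BytesIO(b"AAAABBBBCCCC")
    print(list(read_records(data, fixed)))
    print(read_record_at(data, fixed, 1))
```

Without the record size, read_record_at has nothing to multiply by,
which is the problem described in item 1.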
 
I'm sure you'll all think of others that would be useful.  Since I  have
not  looked  at  the  UNIX  internal file system much, I do not know how
difficult it would be to  find  a  place  to  attach  this  large  (and,
potentially,  expanding) set of attributes, or what the FOPEN (or other)
interface would look like to set/get the attribute values.
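
Purely to illustrate the shape a set/get interface could take, here is a
hypothetical sketch. It keeps the attributes in a side file named
"<file>.attrs" only so that the example is self-contained; that storage
choice, the function names, and the attribute keys are all assumptions,
not a proposal for where UNIX should keep the data.

```python
import json


def _attribute_path(path):
    # Hypothetical storage location; a real design would attach the
    # attributes to the file itself rather than use a side file.
    return path + ".attrs"


def get_attributes(path):
    """Return the attribute dictionary for path (empty if none are set)."""
    try:
        with open(_attribute_path(path)) as handle:
            return json.load(handle)
    except FileNotFoundError:
        return {}


def set_attributes(path, **attributes):
    """Merge the given attribute values into those already stored."""
    current = get_attributes(path)
    current.update(attributes)
    with open(_attribute_path(path), "w") as handle:
        json.dump(current, handle)
    return current
```

A printing utility could then call get_attributes("results.dat") and
check whether "carriage_control" is "FORTRAN" before choosing a filter,
while programs that never ask see the file exactly as before.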
 
                                        Thanks for your time,
                                        Jon Campbell
   --------



More information about the Comp.unix.wizards mailing list