Unix/C program modularity

Thu Oct 24 07:42:25 AEST 1985

I don't think that Jon and I disagree very much.  My posting was
in response to a fellow who appeared to have not fully assimilated
the use of processes as opposed to subroutines.  Therefore I
emphasized the worth of processes, perhaps giving the impression
that I don't think much of library functions.  The following
lengthy discussion is mostly about software engineering, not UNIX.

KEY:  >>> original worrier  >> me  > Jon

> > Re-usability is obtained at the higher, process, level.

> Doug, you are entirely correct, but this does not negate the case in
> favor of lower level reusability.  Go figure out one day how much of
> your disk is being used by copies of the C library. Compile a program
>
> 	main(){}
>
> using the "-lc" option and multiply this size by the number of
> executable files on your system.  I think you will be astonished.

342 bytes on disk each, of which 216 bytes are overhead, leaving only
126 bytes of code and data.  I am astonished -- at how small it is!
Your example did not illustrate the point you were trying to make;
in any case, that point does not matter if you have "shared libraries"
(see below).

I am for good software economics at all levels.  The ultimate "low
level" is the UNIX kernel, which offers nice reusable facilities.

> Having modularity at the procedure level can be a great boon,
> particularly if you have a loader which links in libraries at run time
> and doesn't duplicate code unnecessarily.

I do agree that "shared libraries" can cut disk space significantly.
They should only be used for modules with absolutely stable interface
definitions, however; otherwise, at some future date a module change
can instantly break a lot of formerly correct executable binaries.

I have argued for the use of libraries as well as for processes.
Libraries are good when they implement access routines for some
relatively complicated object, e.g. frame buffers or B-trees.  They
are also a nice way to provide generally useful programming support
functions, e.g. complex arithmetic, polynomials, vector math, list
structures, etc.  They can be handy in enforcing file structure,
although often just having good #include files is sufficient.  But
libraries have real problems in some cases (see below).
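As a minimal sketch of the "programming support functions" kind of library, a complex-arithmetic module might look like this.  The type cplx and the routines c_make(), c_add(), c_mul() and c_print() are hypothetical names chosen for illustration; the point is that callers manipulate complex values only through these routines.

```c
#include <stdio.h>

/* Hypothetical complex-number type; callers use only the routines below. */
typedef struct {
    double re;
    double im;
} cplx;

cplx c_make(double re, double im)
{
    cplx z;

    z.re = re;
    z.im = im;
    return z;
}

cplx c_add(cplx a, cplx b)
{
    return c_make(a.re + b.re, a.im + b.im);
}

/* (a + bi)(c + di) = (ac - bd) + (ad + bc)i */
cplx c_mul(cplx a, cplx b)
{
    return c_make(a.re * b.re - a.im * b.im,
                  a.re * b.im + a.im * b.re);
}

/* Print z as e.g. "-5+10i" followed by a newline. */
void c_print(FILE *fp, cplx z)
{
    fprintf(fp, "%g%+gi\n", z.re, z.im);
}
```

For example, c_print(stdout, c_mul(c_make(1, 2), c_make(3, 4))) prints -5+10i.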

> > Many new applications should be produced by combining existing tools
> > rather than by writing code in the traditional sense.

Notice that I didn't say "all" or even "most".

Usually, an interesting application requires that its central
computational module(s) be implemented from scratch.  This
does not imply that the application should be a single monolithic
process, however.  I find that designing separate processes for
the hard computation and for the user interface often leads to
a better, more flexible, design.  At the last place I worked,
the Data Flow Diagram for a large new system ended up with
almost every bubble implemented as a separate UNIX process!
The user interface of that system consisted of a few screen-
oriented processes controlling and monitoring the data flows
between subordinate processes.  One of the nice things was
that the computational modules could be developed and tested
separately, they could be used in a batch mode quite easily,
and our blind programmer could operate the calculations via
a simple terminal interface (Bourne or C shell on a Braille
soft-copy terminal).  If we had bundled everything into one
bulky executable module, the system would have been much less
adaptable and effective.
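A minimal sketch of the split between a computational process and a controlling process, assuming the POSIX pipe(), fork(), fdopen() and waitpid() calls.  The compute() routine (which merely emits squares) and run_and_collect() are hypothetical stand-ins; in a real system the child would more likely exec a separate program, which could then just as easily be run in batch mode from the shell.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdio.h>
#include <stdlib.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical computational module: reports the squares of 1..n,
   one per line.  It knows nothing about who reads its output. */
static void compute(FILE *out, int n)
{
    int i;

    for (i = 1; i <= n; i++)
        fprintf(out, "%d\n", i * i);
}

/* Controlling process: runs compute() in a child connected by a pipe
   and totals what it reports.  Returns the total, or -1 on failure. */
long run_and_collect(int n)
{
    int fd[2], status;
    pid_t pid;
    FILE *in;
    long v, sum = 0;

    if (pipe(fd) == -1)
        return -1;
    pid = fork();
    if (pid == -1) {
        close(fd[0]);
        close(fd[1]);
        return -1;
    }
    if (pid == 0) {                       /* child: computation */
        FILE *out;

        close(fd[0]);
        out = fdopen(fd[1], "w");
        if (out == NULL)
            _exit(EXIT_FAILURE);
        compute(out, n);
        _exit(fclose(out) == 0 ? EXIT_SUCCESS : EXIT_FAILURE);
    }
    close(fd[1]);                         /* parent: control */
    in = fdopen(fd[0], "r");
    if (in == NULL) {
        close(fd[0]);
        waitpid(pid, &status, 0);
        return -1;
    }
    while (fscanf(in, "%ld", &v) == 1)
        sum += v;
    fclose(in);
    if (waitpid(pid, &status, 0) == -1 ||
        !WIFEXITED(status) || WEXITSTATUS(status) != 0)
        return -1;
    return sum;
}
```

With n = 4 the controlling process receives 1, 4, 9 and 16 and returns 30.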

> I agree, in a development environment.  In an applications environment
> this leads to systems which are inconsistent, have no reasonable error
> messages, are poorly documented, and are confusing as hell in
> general.  This is not an intrinsic property of the approach, except to
> the extent that the approach does not enforce programmer discipline in
> such matters.

As you say, not an intrinsic property of the approach.

I don't think any approach that really automatically enforces programmer
discipline is as good as simply having conscientious programmers.
UNIX was designed for skilled software developers for their own
use; it may well be true that it is not sufficiently rigid to keep
mediocre or incompetent programmers out of trouble.  I don't think
you can have it both ways.  (By the way, the people who were pushing
the raw UNIX shell interface as desirable for nontechnical end-users
were fools!  Such users need a controlled, "safe" environment, much
as poorer programmers need highly constrained language systems.)

> > This works especially well when one is trying to support a wide and growing
> > variety of graphic devices.

> Again, a general purpose device independent graphics library would
> probably serve the need better, and would almost certainly allow for
> more efficient collapsing of common code.  It would also narrow
> the scope of modifications necessary to support new devices, thereby
> helping to minimize support costs.

In non-"shared library" environments, that approach would require
rebuilding all executable binaries before they could be used with
a new device.  With device-independent intermediate files/pipes,
the code that generates graphics is cleanly separated from the code
that displays them, which makes new display devices much less hassle.
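A minimal sketch of such a device-independent intermediate stream.  The two-command text format ("m X Y" to move, "d X Y" to draw) and the functions gen_square() and render_text() are hypothetical, invented for illustration; the text "device" stands in for a real display driver.

```c
#include <stdio.h>

/* Hypothetical device-independent command stream, one command per line:
       m X Y   move the current point to (X, Y)
       d X Y   draw a line from the current point to (X, Y)
   Generators write this format; each display device gets its own
   interpreter, so adding a device does not touch any generator. */

/* Generator side: a square with lower-left corner (x, y). */
void gen_square(FILE *out, int x, int y, int side)
{
    fprintf(out, "m %d %d\n", x, y);
    fprintf(out, "d %d %d\n", x + side, y);
    fprintf(out, "d %d %d\n", x + side, y + side);
    fprintf(out, "d %d %d\n", x, y + side);
    fprintf(out, "d %d %d\n", x, y);
}

/* Device side: a stand-in "device" that describes each segment in text.
   Returns the number of segments drawn, or -1 on an unknown command.
   Reading stops at end of input or at the first line that does not parse. */
int render_text(FILE *in, FILE *out)
{
    char op;
    int x, y, cx = 0, cy = 0, segs = 0;

    while (fscanf(in, " %c %d %d", &op, &x, &y) == 3) {
        if (op == 'm') {
            cx = x;
            cy = y;
        } else if (op == 'd') {
            fprintf(out, "segment (%d,%d)-(%d,%d)\n", cx, cy, x, y);
            cx = x;
            cy = y;
            segs++;
        } else {
            return -1;
        }
    }
    return segs;
}
```

Fed the commands from gen_square(out, 0, 0, 10), render_text() reports four segments, the first being segment (0,0)-(10,0).  A plotter or terminal driver would simply be one more interpreter of the same stream, connected by a pipe or an intermediate file.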

Any viable form of modularity suffices to limit the scope of work
to add new devices.

The argument is often heard that interactive graphics requires a full-
duplex connection between the graphics-generating application and the
display/input device (this is true) and that that cannot be achieved
efficiently enough by separate processes (this is false).  We have a
counterexample in daily production use.
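The full-duplex arrangement needs nothing more exotic than two pipes, one in each direction.  A minimal sketch, assuming POSIX pipe(), fork(), fdopen() and waitpid(); device_loop() (which merely doubles each number it receives) and converse() are hypothetical stand-ins for a device-handling process and the application talking to it.  The essential detail is that each side flushes after every message, so neither waits on data still sitting in a stdio buffer.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdio.h>
#include <stdlib.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical device process: answers each request line at once
   (here the "answer" is just the value doubled). */
static void device_loop(FILE *in, FILE *out)
{
    long v;

    while (fscanf(in, "%ld", &v) == 1) {
        fprintf(out, "%ld\n", 2 * v);
        fflush(out);                  /* reply now; the other side waits */
    }
}

/* Application side: one request, one reply, repeated n times over two
   pipes.  Stores the replies in reply[] and returns 0, or -1 on failure. */
int converse(const long *req, long *reply, int n)
{
    int to_dev[2], from_dev[2], i, status;
    pid_t pid;
    FILE *w, *r;

    if (pipe(to_dev) == -1)
        return -1;
    if (pipe(from_dev) == -1) {
        close(to_dev[0]);
        close(to_dev[1]);
        return -1;
    }
    pid = fork();
    if (pid == -1) {
        close(to_dev[0]);
        close(to_dev[1]);
        close(from_dev[0]);
        close(from_dev[1]);
        return -1;
    }
    if (pid == 0) {                   /* child: the device handler */
        FILE *cin, *cout;

        close(to_dev[1]);
        close(from_dev[0]);
        cin = fdopen(to_dev[0], "r");
        cout = fdopen(from_dev[1], "w");
        if (cin == NULL || cout == NULL)
            _exit(EXIT_FAILURE);
        device_loop(cin, cout);
        _exit(EXIT_SUCCESS);
    }
    close(to_dev[0]);                 /* parent: the application */
    close(from_dev[1]);
    w = fdopen(to_dev[1], "w");
    r = fdopen(from_dev[0], "r");
    if (w == NULL || r == NULL) {
        if (w != NULL) fclose(w); else close(to_dev[1]);
        if (r != NULL) fclose(r); else close(from_dev[0]);
        waitpid(pid, &status, 0);
        return -1;
    }
    for (i = 0; i < n; i++) {
        fprintf(w, "%ld\n", req[i]);
        fflush(w);                    /* send the request now */
        if (fscanf(r, "%ld", &reply[i]) != 1)
            break;
    }
    fclose(w);                        /* EOF tells the device to finish */
    fclose(r);
    if (waitpid(pid, &status, 0) == -1)
        return -1;
    return (i == n && WIFEXITED(status) && WEXITSTATUS(status) == 0) ? 0 : -1;
}
```

Sending 1, 2 and 3 yields the replies 2, 4 and 6, each received before the next request is sent.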

The best interaction is obtained with device-specific code handling
the interaction, but that should not affect most of the application.

> > > As a result of this philosophy to design systems as a network of filters
> > > piped together:
> > > 
> > > 	o Much of the bulk of the code is involved in argument parsing,
> > > 	  most of which is not re-usable.

> If your argument parsing is so long, you haven't done it sensibly, or
> your argument conventions need to be rethought.  No matter how you
> slice the program, it has to take arguments, and the shell is doing
> all of the argument division anyway, so there is no added complexity
> to speak of involved in the use of pipes.

I think maybe he meant that there was a loss of efficiency in turning
binary data into character arguments to pass to a process and to decode
them in the invoked process.  If so, the counter-argument is that there
should not be much information passed as process arguments; if a large
number of parameters have to cross the process-process interface, then
either they should be in the major data flows (files, pipes) or someone
has not designed the module interfaces right.
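As an illustration of how little code argument handling needs when the conventions are simple, here is a minimal sketch using the getopt() routine (standardized by POSIX in <unistd.h>).  The options -n COUNT and -v, and the names struct opts and parse_args(), are hypothetical.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdlib.h>
#include <unistd.h>

/* Hypothetical options for a filter: -n COUNT and -v (verbose). */
struct opts {
    long count;
    int verbose;
};

/* Parse argv with getopt(); returns the index of the first operand
   (e.g. an input file name), or -1 on a usage error. */
int parse_args(int argc, char *argv[], struct opts *o)
{
    int c;
    char *end;

    o->count = 10;                   /* defaults */
    o->verbose = 0;
    optind = 1;                      /* start scanning at argv[1] */
    while ((c = getopt(argc, argv, "n:v")) != -1) {
        switch (c) {
        case 'n':
            o->count = strtol(optarg, &end, 10);
            if (end == optarg || *end != '\0')
                return -1;           /* not a number */
            break;
        case 'v':
            o->verbose = 1;
            break;
        default:                     /* getopt() has already complained */
            return -1;
        }
    }
    return optind;
}
```

Given prog -v -n 25 data, parse_args() sets count to 25 and verbose to 1 and returns 4, the index of the operand data.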

> > > 	o Error handling is minimal at best.  When your only link to the
> > > 	  outside world is a pipe, your only recourse when an error
> > > 	  occurs is to break the pipe.

> This is not really true.  I do agree, however, that error recovery is
> substantially harder if your disjoint pieces of code are not properly
> modularized.  If you and the next guy in the pipe sequence need to
> resynchronize your notion of the data stream, you have invoked a lot
> of code AND protocol overhead.  C's modularity and data sharing
> facilities are not what they could be.  It is one of the only features
> of Modula-2 which I like, though I think the Modula-2 approach is a
> pain.

All part of proper module design, no matter how implemented.

> > If a subordinate module is not able to perform its assigned task,
> > it should so indicate to its controlling module.

> Have you ever tried to implement this using only error(n)?  Talk about
> rendering your code complex...

Success/failure return requires only one bit, and UNIX already has
an established convention for this.  If a lot of complicated dialog
between master and slave modules is required to establish what has
gone wrong, then that too is part of necessary module interface
design no matter how implemented.  Often a pipe is used with a very
simple protocol for such communications.
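A minimal sketch of a master checking the one-bit success/failure convention (exit status 0 means success), assuming POSIX fork() and waitpid().  The function-pointer parameter is a simplification for illustration: task stands in for the subordinate's main(), whereas a real master would more likely fork() and exec() a separate program.  The names run_subordinate(), task_ok() and task_fails() are hypothetical.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdlib.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Run a subordinate in its own process and reduce its report to one bit.
   Returns 1 if it exited with status 0, otherwise 0. */
int run_subordinate(int (*task)(void))
{
    pid_t pid;
    int status;

    pid = fork();
    if (pid == -1)
        return 0;                 /* could not start it: report failure */
    if (pid == 0)
        _exit(task() == 0 ? EXIT_SUCCESS : EXIT_FAILURE);
    if (waitpid(pid, &status, 0) == -1)
        return 0;
    return WIFEXITED(status) && WEXITSTATUS(status) == 0;
}

/* Two hypothetical subordinates for demonstration. */
int task_ok(void)    { return 0; }
int task_fails(void) { return 1; }
```

The shell applies the same convention, so a master written as a shell script can test a subordinate with if, && or ||.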

An important point is that each level in the module hierarchy should
make its own assessment of the situation based on what its slaves
report, and after taking appropriate actions it should return a
boiled-down report to its own master.  The complexity at each level
of module interface should be about the same.
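A minimal sketch of one level making its own assessment before reporting upward.  The status codes, the simulated device, the retry count of three and the names dev_write() and save_record() are all hypothetical; what matters is that the detailed report stops at the middle level and only a boiled-down answer goes up.

```c
/* Hypothetical detailed status codes from a low-level device module. */
enum dev_status { DEV_OK, DEV_BUSY, DEV_MEDIA_ERROR };

/* Simulation knob for this sketch: the device reports DEV_BUSY this many
   more times before it accepts a write. */
static int busy_count;

/* Lowest level: reports exactly what happened, decides nothing. */
static enum dev_status dev_write(const char *data)
{
    (void)data;                       /* a real device would use this */
    if (busy_count > 0) {
        busy_count--;
        return DEV_BUSY;
    }
    return DEV_OK;
}

/* Middle level: "busy" is worth a few retries; anything else is not this
   level's problem to fix.  Its master gets only 1 (success) or 0. */
int save_record(const char *data)
{
    enum dev_status s = DEV_BUSY;
    int tries;

    for (tries = 0; tries < 3 && s == DEV_BUSY; tries++)
        s = dev_write(data);
    return s == DEV_OK;
}
```

If the device is busy twice, save_record() still succeeds; if it stays busy, its master sees only a 0 and decides at its own level what to do next.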

> > Error recovery is best performed at the higher strategic levels.

> Depends on the kind of error.  This argument comes back to the
> modularization point I made above.

Strategic decisions at low levels actually harm the goal of
reusability; if they are inappropriate for the application,
near-duplicate substitutes for the low levels must be developed.
A notorious example was the 4.2BSD network library module that
printed on stderr when a failure was detected.  Really!
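By way of contrast, a minimal sketch of a low-level routine that leaves strategy to its caller.  The routine parse_port() is hypothetical (it is not the 4.2BSD code referred to above); on bad input it prints nothing and does not exit, but returns -1 with errno set to EINVAL.

```c
#include <errno.h>
#include <stdlib.h>

/* Convert a port-number string (1..65535) to an int.  No message on
   stderr, no exit(): on bad input set errno to EINVAL and return -1,
   leaving the caller to decide whether to complain, retry, fall back to
   a default, or give up. */
int parse_port(const char *s)
{
    char *end;
    long v;

    errno = 0;
    v = strtol(s, &end, 10);
    if (end == s || *end != '\0' || errno == ERANGE || v < 1 || v > 65535) {
        errno = EINVAL;
        return -1;
    }
    return (int)v;
}
```

An interactive program might print a message and prompt again; a daemon might log the error and fall back to a default.  Neither choice belongs in the library.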

> > > 	o Programmers invent "homebrew" data access mechanisms to supplement
> > > 	  the lack of a standard Unix ISAM or other file management.  Much
> > > 	  of this code cannot be re-used...

> I know of no database system where speed is important which has ever
> been distributed using a standard record support library.  I believe
> that you will find that all of the major database products, even under
> VMS, ultimately have given up and gone to doing their own
> thing directly using Block I/O because the provided record structure
> facilities, by virtue of the fact that they are general, are
> necessarily not well optimized to any particular task.  In particular,
> the overhead associated with error checks which your database system
> doesn't need is hideous.
>
> On the other hand, a reasonable record structure facility is something
> sorely lacking in UNIX, and is very useful when simply trying to get
> the code running initially.  It allows you to leave the modifications
> for a database hacker and get it running.  For non-critical code, or
> code where the critical element is development time and ease of
> support, this is crucial.

UNIX almost trivially supports fixed-length records; for more complex
file organizations, as you imply, no matter what might be provided it
would probably not be what a particular application really needed.
This would not be a problem if a general facility could be made "good
enough", and sometimes that is possible.  I think a "good enough"
record locking primitive already exists (in some UNIXes), but there
is not yet a standard database access library that is "good enough".
(Not counting ones that nobody can get their hands on.)
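To show how little fixed-length records need, here is a minimal sketch using only lseek(), read() and write().  The record layout struct rec and the functions put_rec() and get_rec() are hypothetical.  Writing the raw structure ties the file format to one machine's byte order and padding, which is acceptable for a local file but not for interchange.

```c
#define _POSIX_C_SOURCE 200809L
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical fixed-length record.  Record n starts at byte offset
   n * sizeof(struct rec), so random access needs only lseek(). */
struct rec {
    long id;
    char name[24];
};

/* Seek to the start of record n; returns 0 or -1. */
static int seek_rec(int fd, long n)
{
    return lseek(fd, (off_t)n * (off_t)sizeof(struct rec), SEEK_SET)
        == (off_t)-1 ? -1 : 0;
}

/* Write record n (extending the file if needed); returns 0 or -1. */
int put_rec(int fd, long n, const struct rec *r)
{
    if (seek_rec(fd, n) == -1)
        return -1;
    return write(fd, r, sizeof *r) == (ssize_t)sizeof *r ? 0 : -1;
}

/* Read record n; returns 0, or -1 on error or if record n is past the end. */
int get_rec(int fd, long n, struct rec *r)
{
    if (seek_rec(fd, n) == -1)
        return -1;
    return read(fd, r, sizeof *r) == (ssize_t)sizeof *r ? 0 : -1;
}
```

Writing record 3 into a file that holds only record 0 leaves a gap where records 1 and 2 would be; reading either of them back returns a zero-filled record.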

> > It is relatively rare that UNIX applications have to be concerned
> > with detailed file access mechanisms.

> Is this cause, or effect?

> I hope that my replies here are not unduly long.  I went over them,
> and I believe that I have made my points succinctly: UNIX needs both a
> standard database facility and an intelligent notion of run-time
> libraries.

Amen! to the need for a database facility.  It need not be in the
kernel; simply nice ISAM and B-tree libraries would be a good start.
If these became generally available, then one could consider
establishing a "good enough" standard; that would be premature now.

More generally-available support library functions would be welcome;
I find the new ones in the System V standard C library to be quite
useful, but there are many others one can think of that should be
added (it looks like the directory access routines are on their way
to becoming standard, at long last).