Refined C

Fri Feb 26 06:28:00 AEST 1988

Several folk have requested that I post a brief  description
of refined C...  note that conformant arrays are a subset of
paramtypes.

                       Refined C (RC)
                          H. Dietz

     Refined C is one of the fruits of my thesis  work.   To
begin  with,  I  studied  conventional  C to determine which
language constructs unnecessarily  blur  flow  analysis  for
automatic  parallelization.   Among  the  worst of these are
interprocedural analysis  problems  in  supporting  separate
compilation,  the inability to precisely name a portion of a
structure (pointers do this, but not in a way  the  compiler
can understand), and the need to override the type system in
solving common problems.  Refined C - RC - is the result  of
refining  C  so that pretty much the same things can be done
with pretty much the same syntax (less than 5% change),  but
without confusing the compiler.

1.  Language Overview

     The following is a very brief description  of  how  the
current definition of RC differs from ANSI C:

1.1.  Access Right Specifications

     Function prototypes are required in RC, and have a syn-
tax  very similar to that used in ANSI C, however, one addi-
tionally specifies access rights being  transmitted  through
pointer  arguments  and  access rights for global data.  The
symbol "<" is used to indicate read-only, ">"  means  write-
then-read,  "^"  means modify, and "-" means no access.  For
example:

{ < x; } int f( ^ char *p);

where x is a global variable, means that  f  is  a  function
which  returns an int value, has modify access to characters
through its argument p, and  has  read-only  access  to  the
value  of  the  global  variable  x.  Violation of specified
access rights is a fatal compile-time  error.   Further,  no
function  can  call another function which has access rights
superior to those of the caller.

1.2.  Partitioning

     Partitioning allows one to refer to (and specify access
rights  on)  arbitrary non-overlapping pieces of data struc-
ture.  Given a declaration like:

struct s { int a, b; } c, d[100];

it is obvious that c, for example, can be  partitioned  into
c.a  and c.b; however, one can also partition d into d.a and
d.b, each of which acts much  like  an  array  of  100  int.
Further,  it  is  possble  to create new names for arbitrary
collections of elements from an array:

part(d[i], e, f(i), g);

would make e a name for the elements d[i] such that f(i)  is
true,  and  would make g a name for remaining elements of d.
Partitioning need not imply making a copy or even evaluating
the  separating  conditions - it simply serves to inform the
compiler that accesses made through the name e are  disjoint
from  those made using the name g, and it provides a formula
for confirming this fact.  Knowing that d[j]  and  d[k]  are
provably  non-overlapping  data  is  key  to  good automatic
parallelization - by turning this into e[j] and g[k]  parti-
tions efficiently provide such a proof.  Of course, the par-
titions of a struct or array  can  be  logically  subdivided
further.

1.3.  Paramtypes

     A paramtype is a variable-size parameterized type, each
instance  of  which is tagged with the values of its parame-
ters (and these values can be examined as though  they  were
elements  of  a  struct).  For  example,  a  variable-length
string type might be:

struct string(length) { char letters[length]; };

to allocate a string named s which could hold i  chars,  one
would simply write:

struct string(i) s;

One could then reference s.length and  s.letters.   Provided
that  there  is  at most one variable-length field, the com-
piler would typically sort that to the last position in  the
struct  and  addressing  would  work just like the current C
"hack" of declaring a  struct  and  malloc-ing  extra  space
after  it  for  the last field to grow into (e.g., declaring
letters[1]).  The big  difference  is  that  paramtypes  are
full-fledged  types,  hence you can have a function return a
paramtype, take sizeof a paramtype, etc., just as though  it
were  an  ordinary  struct.  Paramtypes also permit multiple
parameters  and  multiple  variable-sized  fields,  although
accesses  to  members  of  such paramtypes will typically be
slower because variables are involved in the address  calcu-
lations.

     Although, in retrospect, the type of a parameter should
also be specified in the declaration, the current definition
of RC assumes that they're all ints.

2.  Status

     A complete  optimizing/parallelizing  RC  compiler  has
been  built,  by  K.  Stein,  but it is owned by the company
which funded it.  In addition,  we  have  a  prototype  tool
called  CP  which,  using  rather complex and slow analysis,
converts ordinary C code into reasonable (but not great)  RC
code; CP is in the public domain.  At Purdue, we are working
on another tool called CR, to interactively guide  the  pro-
grammer    in   improving   RC   code.    There   are   also
definitions/research efforts in  RF  (refined  FORTRAN),  RP
(refined Pascal), RL (refined Lisp), and RA (refined Ada).

3.  Annotated References

[1]  H. Dietz and D. Klappholz,  "Refined  C:  A  Sequential
     Language  For Parallel Programming," 1985 International
     Conference on Parallel  Processing,  August  1985,  pp.
     442-449.
     Gives original (pre-ANSI) definition of RC.

[2]  H. Dietz and D. Klappholz,  "Refined  FORTRAN:  Another
     Sequential  Language  for  Parallel  Programming," 1986
     International Conference on Parallel Processing, August
     1986, pp. 184-191.
     Gives definition of refined FORTRAN (RF).

[3]  H. Dietz, The Refined-Language  Approach  to  Compiling
     for  Parallel Supercomputers, Ph.D. Thesis, Polytechnic
     University, June 1987.
     Most complete review of post-ANSI RC and  concepts,  as
     well  as  compilation  techniques.  Also available as a
     Purdue University internal report and soon  (hopefully)
     as  a published monograph incorporating compiler imple-
     mentation notes by K. Stein.