signed/unsigned char/short/int/long

Sun Dec 11 10:19:40 AEST 1988

In article <839 at quintus.UUCP> ok at quintus.UUCP (Richard A. O'Keefe) writes:

    In article <347 at aber-cs.UUCP> pcg at cs.aber.ac.uk (Piercarlo Grandi) writes:
    >This is not
>surprising, considering that C is a descendant of BCPL (whose single most
    >annoying feature is having to use putbyte() and getbyte() for string
    >manipulation, as it has just one length of integer).

    That hasn't been true of BCPL for a long time.  BCPL has two subscripting
    operators:
    	base!index		index is a word offset, form addresses a word
    	base%index		index is a byte offset, form addresses a byte

Handy syntactic sugar for putbyte() and getbyte(), which I appreciated for
a short while, as it was still fairly recent when I switched to C... (I did
not use BCPL a lot, after all).

But it was not there at the time C was derived from BCPL! I am persuaded that
Ritchie & Co. evolved C from BCPL for essentially two reasons: to have
different lengths of word recognized by the compiler, and to have a neater
syntax. BCPL has evolved a little bit... Still, you cannot define a byte
array in BCPL; this would go against the very fundamentals of the language.
Richards wrote that the easy portability of BCPL was largely based on having
the code generator deal with a single type.

    >In a sense C is a wonderfully equilibrated mix, BCPL with quite a good lot of
    >Algol68 thrown in, and this shows thru in things like some semantics
    >(BCPL-ish) of integer types, and their syntax (Algol68-ish).

    The semantics of C integral types resembles Algol 68 (which has e.g.
    int, short int, short short int, long int, long long int) rather than
    BCPL, which has only one type "word".

To me that is syntax. The semantics is that functionally C types are still
essentially "words", albeit of different lengths (to me, lengths do not change
the semantics of operations, and thus do not really introduce new "types").
Unsigned was a significant departure, especially in that it was defined
to obey the rules of modular arithmetic.

    The syntax of C constants resembles BCPL rather than Algol 68 (e.g. no
    general "radix" notation, characters as integer constants rather than
    char constants, \escapes -- BCPL uses *escapes, Algol 68 has no escapes
    at all).

Indeed, indeed; exactly what I meant. Apparently BCPL, going into B, and then
early C, remained quite BCPL-ish; on the one hand "struct" was clearly taken from
Algol68, but the fact that members of structs were essentially named offsets,
with no (in)visibility rules, made it an easy way to transpose BCPL's
"pointer!named.int.constant" into "pointer->field_name", which is arguably
nicer than the literal translation "pointer[named_int_constant]". Of course
there are a lot of other details... Enough for now of these.

Eventually Bourne & Co. (I surmise) did bring a lot more Algol68 lore
(actually, amusingly, Algol68*C* -- for Cambridge) into Bell Labs., and C and
Unix.  As a sign of this, I was always greatly amused that in the released V7
adb $a was described as "Algol68 stack backtrace", even if the Algol68
compiler (probably a derivative of Algol68C, an excellent piece of
engineering) was never released to the general public... (at least not to me,
unfortunately!).

    >    introducing the signed keyword and related paraphernalia instead of
    >    allowing "int char" (an existing unintentional "feature" of some
    >    compilers, by the way) to do the trick,

    I will make an argument against this.  "int" does *not* in general have
    the meaning "make it signed".

Yes, yes. But it could be construed to... Actually, as you have understood,
I would not like it to have that meaning. I would rather interpret "char int"
as meaning "short short int" than "int char" as meaning "signed char"...

    For example, "int unsigned", if accepted, is not signed!

Yes, according to existing rules; but "unsigned int" (and "signed int") are
exactly what I am trying to make obsolete!

In a sense you have spotted the weak point of my argument; if a declaration
were to be built of a length modifier and a base type (both optional), then
"unsigned int" would be illegal (two base types!), against existing common
practice. It could however be declared obsolescent and allowed as a special
case, which admittedly is ugly, but virtually painless.

    I would definitely expect that if "int char" or "char int" were accepted
    at all, they would be identical to "char" in every respect.

Yes, with some caveats, in the dpANS framework.  In my framework, char would
be a modifier, and unsigned/int base types. If the base type were omitted by
the programmer, either of the two base types could be defaulted by the
implementation, as currently is done. If not, "char int" would have to be
signed, and "char unsigned" not.

    What *would* have been consistent with C's intellectual ancestry, and
    *would* suggest signedness, would have been introducing "short short int"
    = "signed char" and "unsigned short short int" = "unsigned char".

Yes, again, except that I'd rather have "short short unsigned" mean "unsigned
char". I think that indeed one problem with Algol68 is that there is no
notion of unsigned. Since (in C, at least), unsigned behaves differently from
int, it ought to be regarded as a different base type, to which the same
length modifiers as int apply.

    But I'm quite sure that X3J11 considered this and rejected it for good
    reasons.

Essentially, that "short short" is superfluous, as "char" is in practice
used for that. On that I agree; after all, C is not Algol68.

As I have indicated, however, I'd rather dispose of Algol68-like length
indicators, except as an obsolescent feature; instead of wasting a keyword on
"signed", I'd rather waste it on "range" or whatever, and let the compiler
figure the appropriate number of bits.

As a more C-ish, and less radical alternative, I'd extend the bit field
notation to ordinary declarations. Let me quote from a reply I sent (no, I am
not yet like Prof. Dijkstra in quoting only my own works, diary and letters
:->) to somebody making points similar to yours:

    But with the current scheme I find myself doing things like

	#ifdef pdp11
	# define bits8 char
	# define bits16 int
	# define bits32 long
	#endif
	#ifdef vax
	# define bits8 char
	# define bits16 short
	# define bits32 int
	#endif

    (note use of #define and not typedef because I want to be able to say
    things like "bits8 unsigned") and then, as a consequence,

	typedef bits8 ascii;
	typedef bits16 procid;
	typedef bits32 dollars;

    The first step is useless and circuitous, and less portable, as you have to
    list explicitly as many cases as there are machine types and compilers; I'd
    rather say:

	typedef unsigned ascii : 7;
	typedef int procid : 16;
	typedef int dollars : 32;

THE END, FINALLY!

Now for some meta-discourse.

I thank you for your civil reply.

I also have another reason to thank you.

Evidently I have not been able to communicate to Mr. Wells and Mr. Gwin that I
do know the existing language described in the Classic C book by K&R, and the
one in dpANS C, even if I find the latter less brilliant :-), and even if it
has been a (now nearly fixed) moving target.

Evidently I have not been able to make them understand that I was trying to
show that with a little definitional legerdemain, for which there could be
some justification in existing or old compiler bugs, or in looking at Classic
C with a jaundiced, but historically justified, attitude, some potentially
confusing, and needless, X3J11 decisions could have been avoided, and the
Classic C syntax and pragmatics could have been made even simpler and more
symmetric, at virtually no cost in breaking existing programs.

A few people who have sent me messages by email have penetrated my admittedly
somewhat heavvvvvy prose, and have understood as much, whether agreeing
(mostly) or disagreeing with me (like you).

I thank you for posting a reply that demonstrates to our audience, and not
to me alone, that somebody can understand the points I make, and address
them, instead of confusing my inability to express myself in a way
palatable to themselves with something else.
-- 
Piercarlo "Peter" Grandi			INET: pcg at cs.aber.ac.uk
Sw.Eng. Group, Dept. of Computer Science	UUCP: ...!mcvax!ukc!aber-cs!pcg
UCW, Penglais, Aberystwyth, WALES SY23 3BZ (UK)