Multibyte characters
Mike Banahan
mikeb at inset.UUCP
Tue Jul 3 23:47:58 AEST 1990
On the interesting subject of wide characters, multibyte characters and
so on, I haven't noticed a discussion in this group which touches on
the following.
Let's say that I do have a multibyte execution character set which supports,
for the sake of argument, English and Greek, with Greek using a shift-in/
shift-out mechanism.
A string of the form "abc@d" is valid C (using @ to represent the Greek
character `alpha').
It will contain 8 bytes, counting the shift-in, shift-out and the null
at the end.
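To make that byte count concrete, here is a minimal sketch in C. The
shift codes (octal 016 and 017), the names GREEK_ON and GREEK_OFF, and the
use of the byte 'a' as the shifted-state encoding of alpha are all
assumptions invented for illustration; no real character set is implied.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical shift codes: assumed values, not from any real encoding. */
#define GREEK_ON  '\016'   /* the "shift-in" to Greek  */
#define GREEK_OFF '\017'   /* the "shift-out" back to the initial state */

/* The example string as it might be stored: a b c <on> alpha <off> d.
   Here 'a' stands in for the single byte assumed to encode alpha. */
static const char example[] = "abc\016a\017d";

/* Bytes occupied by the array, including the terminating null. */
size_t stored_bytes(void)
{
    return sizeof example;
}

/* Characters a reader would see: every byte except the shift codes. */
size_t visible_chars(const char *s)
{
    size_t n = 0;

    assert(s != NULL);
    for (; *s != '\0'; s++)
        if (*s != GREEK_ON && *s != GREEK_OFF)
            n++;
    return n;
}
```

Under these assumptions stored_bytes() counts the two shift codes and the
null as well as the five visible characters, while strlen(example) counts
everything except the null.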
Presumably the integral constant '@' is a three-byte constant, no matter
what it may look like? An alternative interpretation is that it violates
the constraint in 2.2.1.2 `a .. character constant .. shall begin
and end in the initial shift state', but presumably I can expect my
implementation to do the necessary good deeds and put a shift-out
in there too.
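The constraint can be pictured as a check over the bytes of the constant.
The following is only a sketch under the same invented scheme (shift codes
octal 016 and 017, names GREEK_ON and GREEK_OFF are hypothetical); it is not
how any particular compiler is known to work.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical shift codes: assumed values for illustration only. */
#define GREEK_ON  '\016'
#define GREEK_OFF '\017'

/* Returns 1 if the n bytes at s, read starting from the initial shift
   state, leave the decoder back in the initial shift state. */
int ends_in_initial_state(const char *s, size_t n)
{
    int shifted = 0;
    size_t i;

    assert(s != NULL);
    for (i = 0; i < n; i++) {
        if (s[i] == GREEK_ON)
            shifted = 1;
        else if (s[i] == GREEK_OFF)
            shifted = 0;
    }
    return !shifted;
}
```

With these assumptions, a constant holding only the shift-in and the alpha
byte fails the check, and the same bytes followed by a shift-out pass it,
which is the three-byte form discussed above.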
Since it is a three-byte constant (assuming I'm right), then can I be
sure that I do not get overflow when I assign it to a char variable?
3.1.3.4 says that the value of a multi-character character constant
will be implementation-defined, and 3.2.1.2 says that (paraphrase)
demoting an int to a char gives an implementation-defined result.
So to call it `overflow' is perhaps overstating the case, but I clearly
end up in implementation-defined territory twice over.
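A rough sketch of the two steps, in C. The packing rule in pack3() is just
one guess at what an implementation might do with a three-byte constant;
the function names and byte values are hypothetical. It uses long rather
than int only so that the arithmetic is safe where int is 16 bits.

```c
#include <assert.h>
#include <limits.h>

/* One possible (assumed) way to combine three bytes into a value:
   first byte most significant. The real rule is implementation-defined. */
long pack3(unsigned char b0, unsigned char b1, unsigned char b2)
{
    return ((long)b0 << 16) | ((long)b1 << 8) | (long)b2;
}

/* Can the value be stored in a plain char without any conversion issue? */
int fits_in_char(long v)
{
    return v >= CHAR_MIN && v <= CHAR_MAX;
}

/* What survives if an implementation simply keeps the low 8 bits.
   This is only one possible outcome of the demotion. */
unsigned char low_byte(long v)
{
    assert(v >= 0);
    return (unsigned char)(v & 0xFF);
}
```

With the assumed bytes 0x0E, 0x61, 0x0F the packed value is far outside the
range of char, and an implementation that keeps only the low byte would
store the shift-out code rather than anything resembling alpha.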
Sorry if this has been discussed before. If not, could someone enlighten
me as to the actual situation?
Thanks in advance,
Mike Banahan
--
Mike Banahan, Technical Director, The Instruction Set Ltd.
mcvax!ukc!inset!mikeb