wchar_t values
Erik M. van der Poel
erik at srava.sra.co.jp
Wed Apr 10 15:23:54 AEST 1991
Sorry, I'm a bit late with this reply. Just a few minor nits:
Al Harkcom writes:
> 'c' in all three of
> the popular multibyte encodings (EUC, JIS, SJIS) is 0x63 (same as
> ASCII). The most common wide character format (UJIS) has 'c' as
> 0x0063 (ASCII in 2 bytes).
EUC is the name of the scheme, while UJIS is the name of the Japanese
EUC. UJIS is not a wchar_t encoding.
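To make the multibyte/wide distinction concrete: in EUC, a byte below 0x80 is always a complete single-byte character, so 'c' stays 0x63 in the multibyte stream. The following is a minimal sketch, assuming the usual EUC-JP (UJIS) byte ranges: 0xA1-0xFE pairs for JIS X 0208, SS2 (0x8E) plus one byte for half-width katakana, and SS3 (0x8F) plus two bytes for JIS X 0212. The function name eucjp_char_len is hypothetical, not part of any standard library.

```c
#include <stddef.h>

/* Hypothetical helper: return the length in bytes of the EUC-JP
   character starting at s, or 0 if the sequence is malformed or
   truncated. n is the number of bytes available at s. */
size_t eucjp_char_len(const unsigned char *s, size_t n)
{
    if (n == 0)
        return 0;
    if (s[0] < 0x80)                        /* ASCII: one byte, e.g. 'c' == 0x63 */
        return 1;
    if (s[0] == 0x8E)                       /* SS2 + 1 byte: half-width katakana */
        return (n >= 2 && s[1] >= 0xA1 && s[1] <= 0xDF) ? 2 : 0;
    if (s[0] == 0x8F)                       /* SS3 + 2 bytes: JIS X 0212 */
        return (n >= 3 && s[1] >= 0xA1 && s[1] <= 0xFE
                       && s[2] >= 0xA1 && s[2] <= 0xFE) ? 3 : 0;
    if (s[0] >= 0xA1 && s[0] <= 0xFE)       /* 2 bytes: JIS X 0208 */
        return (n >= 2 && s[1] >= 0xA1 && s[1] <= 0xFE) ? 2 : 0;
    return 0;                               /* stray byte (0x80-0xA0 or 0xFF) */
}
```

Note that this only describes the multibyte form; how a given implementation maps these sequences to wchar_t values is a separate choice, which is the point being made above.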
> Keld Simonsen writes:
> =}Thus the internal widechar representation of 'c' and the external
> =}multibyte representation SHOULD not be the same for character sets
> =}like ISO 10646, JIS X 0208, KS C 5601 and GB 2312.
> =}At least this should hold for characters in the C character set.
>
> Huh? This doesn't follow... It doesn't even sound correct. A single
> byte wide character set using values above 0x80 in addition to the
> ASCII characters would become difficult...
You're probably referring to the European characters that have the 8th
bit set. These are not relevant to this discussion, since the ANSI C
wchar_t spec explicitly refers only to the basic character set, which
does not include these European characters.
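The ANSI C requirement referred to here is that each member of the basic character set has a wide-character code equal to its value as the lone character in an integer character constant, so L'c' must equal 'c'. A minimal sketch of a check for this, assuming the "C" locale and a hosted implementation; the function name basic_wide_mismatches is hypothetical.

```c
#include <locale.h>
#include <stdlib.h>
#include <wchar.h>

/* Graphic characters of the ANSI C basic character set, plus space. */
static const char basic_chars[] =
    "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
    "abcdefghijklmnopqrstuvwxyz"
    "0123456789"
    "!\"#%&'()*+,-./:;<=>?[\\]^_{|}~ ";

/* Hypothetical helper: count basic characters whose wide code, as
   produced by mbtowc() in the "C" locale, differs from the value of
   the same character in a constant such as 'c'. A conforming
   implementation should return 0. Note: changes the global LC_CTYPE. */
int basic_wide_mismatches(void)
{
    int mismatches = 0;
    size_t i;

    setlocale(LC_CTYPE, "C");
    mbtowc(NULL, NULL, 0);              /* reset any shift state */

    for (i = 0; basic_chars[i] != '\0'; i++) {
        wchar_t wc;
        if (mbtowc(&wc, &basic_chars[i], 1) != 1
            || wc != (wchar_t)basic_chars[i])
            mismatches++;
    }
    return mismatches;
}
```

A wide encoding in which 'c' is not 0x63 would make this count nonzero, which is exactly why such an encoding cannot serve directly as wchar_t.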
> =}The reason why the Japanese have not seen the problem before with
> =}JIS X 0208, but first with 10646, is beyond my understanding.
> =}Maybe some Japanese could enlighten us (me!) on this?
>
> What 'problem' do the 'Japanese' see with ISO 10646?
Keld is referring to the problem that I brought up in the first
article in this thread, i.e. that the ISO 10646 code for 'c' does not
have the same numeric value as ASCII 'c'.
--
Erik M. van der Poel erik at sra.co.jp
Software Research Associates, Inc., Tokyo, Japan TEL +81-3-3234-2692