Hex escape for quoted multibyte character
Teruhiko Kurosaka - Sun Intercon
kuro%shochu at Sun.COM
Wed Apr 26 04:18:40 AEST 1989
I have a question about the relationship among three new concepts and
notations introduced by the ANSI C draft: multibyte characters,
wide characters, and hexadecimal escape notation.
For the following discussion, let's assume that a character X is a
multibyte character represented by the three-byte sequence 0x8e 0xab 0xcd
on some system.
The first question I have is how to represent this three-byte character
with hexadecimal escape sequences within a double-quoted string.
The draft (12/7/88 p.30 line 14) says:
The hexadecimal digits that follow the backslash and the letter x in
a hexadecimal escape sequence are taken to be part of the construction of a
single character for an integer character constant or of a single wide
character for a wide character constant. The numeric value of the
hexadecimal integer so formed specifies the value of the desired
character or wide character.
If I take this literally, it would be:
char *the_multibyte_char="\x8eabcd"; /* I-1 */
However, I noticed that the draft sometimes uses the words "character" and
"byte" interchangeably. If "character" actually means a byte here, then
char *the_multibyte_char="\x8e\xab\xcd"; /* I-2 */
must be the right notation.
What I mean to express here is:
char the_multibyte_char_array[]={0x8e, 0xab, 0xcd, 0};
char *the_multibyte_char=the_multibyte_char_array;
A related question is how to use the hexadecimal escape in
a wide character string literal (L"..."). Let's say the wide character value
for this character X is 0xbcde. Then should a wide character string
that contains only the character X be written as:
wchar_t *the_wide_char_str=L"\xbcde"; /* II-1 */
or should it be:
wchar_t *the_wide_char_str=L"\xbc\xde"; /* II-2 */
to mean:
wchar_t the_wide_char_array[]={0xbcde, 0};
wchar_t *the_wide_char_str=the_wide_char_array;
?
And finally, which is right?
wchar_t the_wide_char=L'\xbcde'; /* III-1 */
wchar_t the_wide_char=L'\xbc\xde'; /* III-2 */
My personal choices are I-2, II-1 and III-1. This is based on my personal belief that
a hexadecimal escape sequence should describe the value of the 'atom' element
in a notation. Because a double-quoted string is of type (char *), its atom's data type
is char, which actually means a byte for historical reasons you all know. Therefore
an escape sequence should describe a byte. For the same reason, a hexadecimal
escape sequence within a wide character constant or string literal should describe
a wide character.
I would like to know what other people think about this.
In your response, please distinguish what you think ANSI C should have been from
how the ANSI C spec (draft) should be interpreted.
Thank you in advance.
-T.Kurosaka, Sun Microsystems