Report on WG15 Rapporteur Group

David Wheeler wheeler at
Sat Mar 17 09:35:09 AEST 1990

From: wheeler at (David Wheeler)

domo at (Dominic Dunlop):
= From: Dominic Dunlop <domo at>
= 	   Report on ISO/IEEE JTC1/SC22/WG15 Rapporteur Group on
= 	         Internationalization Meeting of 5th - 7th
= 	              March, 1990, Copenhagen, Denmark
= 	            Dominic Dunlop   --  domo at
= 	                  The Standard Answer Ltd.

I enjoyed your posting; thank you!  I especially appreciated the many
"what this phrase really means" explanations you included.

= 	 3. ISO 646[4], the earliest ISO standard for information
= 	    technology, is the international derivative of ASCII.
= 	    Its Danish variant replaces ASCII's } with aa.  Around
= 	    the world, #$@[\]^`{|}~, all of which have a special
= 	    meaning to the shell, are replaced by other characters
= 	    in standards derived from ISO 646.  See [5] for much
= 	    more information.

Isn't there an 8-bit standard character set that keeps the first 128
codes as a fixed set (say US ASCII; provincial, I'm afraid, but it
would break no Unix tools) and assigns the international characters
to the values above 127?  If the POSIX standard used it, wouldn't that
solve many problems for those using a Latin-based alphabet?  Or is this
standard unused in the real world?  Admittedly it leaves out the
non-Latin-alphabet world, and that is a weakness.

= 	Apart from all this organizational stuff, we did review some
= 	existing documents.  For example, DTR (draft technical
= 	report) 10176, a product of SC14, discusses the treatment of
= 	characters appearing in language constructs, variable names,
= 	literals and comments, and turns out to have implications
= 	for sh, awk, yacc and the other ``little languages'' defined
= 	in DP 9945-2, the forthcoming international standard for the
= 	shell and tools.  And a document from SC22's study group on
= 	character sets suggests that source files should have some
= 	means of announcing the character set that they're using.
= 	Could this mean typed files or resource forks for POSIX6?
= 	Gee.  How would we hide that?

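Some C programs would have to be fixed to deal with signed characters
(plain char may be signed, so bytes of 128 and above can show up as
negative values), but at least the rules would be simple: values of 128
and above are ordinary characters and can be used in identifiers, etc.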

Tagging source files with their character set sounds like an abomination!

--- David A. Wheeler
    wheeler at

Volume-Number: Volume 18, Number 80
