getch() and getche() in MSC 4.0
Steve Summit
scs at athena.mit.edu
Fri Oct 21 16:54:53 AEST 1988
This is a snide, whiney "I told you so" to the efficiency addicts
and macro panderers out there.
In article <10508 at dartvax.Dartmouth.EDU> Scott Horne writes:
>Has anyone else had trouble with getch() and getche() in Microsoft C v. 4.0?
>They often skip every other keypress on me--and in one case, they skip two
>keypresses out of three! Maybe it's my code. This occurs mainly when I try
>
> c = toupper(getch());
(getch and getche are fairly pointless and superfluous low-level
analogues to getchar, but this is irrelevant.)
In the old days, the toupper macro worked correctly only on
lowercase alphabetic characters, which meant that one often
ended up writing
if(islower(c))
    c = toupper(c);
The hackers at the Shady Hill home for arthritic-fingered
programmers got tired of typing this, so a variant appeared:
toupper could be made to work correctly (a laudable goal) with
an implementation such as:
#define _toupper(c) ((c) - ('a' - 'A'))
#define toupper(c) (islower(c) ? _toupper(c) : (c))
Now, there are three conventions for writing macros:
1. Parenthesize fully, inside and out
2. Use capital letters in the name, to remind the reader
it's a macro and may therefore act weird
3. Make every effort not to repeat "arguments," so that
side effects aren't replicated
A "side effect" is anything that an expression does other than
"return" a value, and is therefore a problem if something like
toupper(*p++)
is (textually, before the code generator gets to it) expanded to
islower(*p++) ? _toupper(*p++) : *p++
How many times is p incremented?
Besides pre- and postincrement and -decrement, the other classic
example of a side effect is I/O. What a coincidence: look at
what Scott Horne used as an argument to toupper, and note the
curious concordance between the period of its failure mode (two
out of three) and the number of times toupper's argument is
repeated in its expansion.
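The skipped keypresses can be reproduced without a keyboard. What follows is a minimal sketch, not Microsoft's actual header: UNSAFE_TOUPPER is a hypothetical copy of the macro shown earlier (ASCII assumed), and next_key() is a made-up stand-in for getch() that hands out characters from a string.

```c
#include <ctype.h>
#include <stddef.h>

/* Hypothetical copy of the "improved" macro. The name is capitalized
   so it cannot collide with the real toupper. ASCII is assumed. */
#define UNSAFE_TOUPPER(c) (islower(c) ? ((c) - ('a' - 'A')) : (c))

/* Stand-in for getch(): returns successive characters of a fixed
   string, then 0 once the string is exhausted. */
static const char *keys = "";

static int next_key(void)
{
    return *keys ? (unsigned char)*keys++ : 0;
}

/* Read "keypresses" through the unsafe macro into out (at most n-1). */
static void read_keys(const char *typed, char *out, size_t n)
{
    size_t i = 0;
    int c;

    keys = typed;
    /* Each use of the macro calls next_key() twice: once inside
       islower(), once in whichever branch of ?: is selected. */
    while (i + 1 < n && (c = UNSAFE_TOUPPER(next_key())) != 0)
        out[i++] = (char)c;
    out[i] = '\0';
}
```

Calling read_keys("abcdef", buf, sizeof buf) leaves "BDF" in buf. With this particular expansion the argument is evaluated twice per call, so every other character is consumed by the islower() test and lost.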
Rule 2 is occasionally broken by "standard library" facilities,
but generally only when rule 3 is observed, so that the
distinction between function and macro is transparent to the
caller.
The "improved" toupper macro, scrupulous as it is in its
adherence to rule 1, violates both rules 2 and 3, and is
therefore a perfect ticking time bomb long term booby trap of
a recurring nightmare for unsuspecting programmers everywhere.
If it is desirable for toupper to work correctly on characters
that are nonalphabetic or already upper-case (I believe this
property is called "idempotence," and as I said, it is a laudable
goal), then the macro implementation has to be sacrificed, and
toupper() made a proper function.
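As a rough sketch of what "a proper function" might look like (the name safe_toupper is hypothetical, and ASCII is assumed, as in the macro earlier in this article):

```c
/* Hypothetical function version of the checking toupper.
   Assumes ASCII, where 'a'..'z' are contiguous. Because this is a
   function, the caller's argument expression is evaluated exactly
   once, whatever side effects it has. */
int safe_toupper(int c)
{
    if (c >= 'a' && c <= 'z')
        return c - ('a' - 'A');
    return c;   /* nonalphabetic or already upper-case: unchanged */
}
```

With this, safe_toupper(*p++) increments p once, and safe_toupper(getch()) consumes one keypress.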
By the way, the fancy toupper macro also violates a fourth rule,
almost universally ignored today, which is that macros shouldn't
expand to "too much" code, because in the old days we only had
64K or so to play with, and every byte counted. The most famous
exception is the recent Berkeley line-buffered putc macro, which
is something like seven backslash-continued lines long, although,
believe it or not, it does manage to guarantee a single
evaluation of its first argument, so putc(*p++, fd) will work, as
indeed it must. One would try something ludicrous like
FILE *fdarray[10]; ... putc(c, fdarray[i++]) at one's extreme peril,
however.
Now, with respect to Microsoft, their run-time library gets
tugged in several directions as they try to maintain
compatibility with existing code while migrating toward ANSI, and
in version 4 I believe they had two separate versions of toupper,
depending on which header file you #included. To make things
even more confusing, I think one header file gave you the unsafe
macro I'm disparaging, and the other got you a real function.
(Of course, there was also a third implementation, called
"_toupper", which is the non-checking version, safely
implementable as a macro, such as appears in the example towards
the beginning of this article.)
(These difficulties may be resolved in Microsoft's Version 5.
Although I happen to use Microsoft V5, I don't pay much attention
to its or anyone's implementation of islower/toupper any more.
Any code of mine that cares protects itself with
#ifdef _toupper
#undef toupper
#define toupper _toupper
#endif
which recreates, with only the barest twinges of worry about
undermining _reserved ANSI identifiers, a cozy V7 environment.
I'll call islower() explicitly, thank you. Note that I do this
not for efficiency's sake but for safety; an even more likely
side-effect-containing argument for ctype macros than getch() is
*p++.)
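For illustration only, here is how calling code might look in that style. RAW_TOUPPER is a hypothetical stand-in for the non-checking _toupper (ASCII assumed), and upcase() is an invented helper, not anything from a real library.

```c
#include <ctype.h>

/* Hypothetical stand-in for the non-checking _toupper. ASCII assumed.
   The argument appears once in the expansion, so it is evaluated once. */
#define RAW_TOUPPER(c) ((c) - ('a' - 'A'))

/* Upper-case a string in place, calling islower() explicitly. */
static void upcase(char *s)
{
    for (; *s != '\0'; s++) {
        /* Cast so that a negative char is never passed to islower(). */
        if (islower((unsigned char)*s))
            *s = (char)RAW_TOUPPER(*s);
    }
}
```

The increment lives in the for statement rather than in a macro argument, so nothing here depends on how many times any macro evaluates its operand.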
The bottom line is, don't implement things with macros unless
it's absolutely safe. The potential efficiency improvements
simply aren't worth it when they lead to these "little
surprises." In those rare cases where the efficiency gain is
significant and important, capitalize the hell out of the macro
name and plaster the code and documentation with big warnings,
and budget some time for the confusion and stubborn bugs which
will still inevitably arise.
Speaking of documentation: some will haughtily tell the original
complainant to RTFM; Microsoft's manual may well state that
toupper is a macro and can't be used on arguments with side
effects. That's unacceptable. Someone coined a nice phrase
called the "principle of least surprise." Among other things, it
holds that there is a class of mistakes which are so easy to make
that no amount of documentation will rescue them; the only
solution is to remove the problem, in this case the dangerous
macro implementation.
Let's not get started on tweaks to the preprocessor to make
dangerous macros safer to write; we just spent a month or so
exhaustively treating how not to square numbers. If you want to
work on something, work on good inlining algorithms instead. And
before you think that your proposed improvements to the
preprocessor make whacko macros safe, or even that the three or
four rules listed above are sufficient, consider
putc(c,
fd);
which is what people like me write when we've indented ourselves
into a brick wall at the right margin but are for some stupid
reason reluctant to break out into another subroutine. Although
ANSI says macro invocations are allowed to cross newline
boundaries, there are a lot of existing preprocessors which can't
handle them without explicit backslash continuations. (I can't
say I blame them, macro invocations spanning newlines being
rather extremely painful to implement correctly.)
Steve Summit
scs at adam.pika.mit.edu