Generating code for the switch statement

Tue Aug 5 06:37:41 AEST 1986

> As I understand it, a switch/case setup compiles exactly the same as
>	if (var == const1) {.....};
>	else if (var == const2) {.....};
>	else  {default_action};
> anyway.

A compiler should be able to do better than that.  Generating good code for
switches is one place where compiler writers can show they know something
about searching.

Using a branch table will produce faster and smaller code if the cases
are consecutive integers.  The only thing that may be non-obvious is the
best way to test whether the expression being switched on is within
range before accessing the branch table.  A straightforward coding of
this test requires two comparisons, but Ritchie's compiler for the
PDP-11 generated only one comparison using the machine language
equivalent of the following code:

	if ((unsigned)(value -= MIN) > (MAX - MIN) goto default;

On the PDP-11, this generates one less instruction (or two less, if MIN
is zero) than:

	if (value < MIN || value > MAX) goto default;

(Learning hacks like these is one of the benefits of reading the
compiler.)

Since some people like to count starting at one, a possible enhancement
would be to change MIN from 1 to 0 in this case by adding an extra entry
to the beginning of the branch table.  But real C programmer count from
0; wimps who count starting with 1 deserve to have their switch statements
execute slower.  :-)

If the cases are not consecutive, the branch table must be filled with
entries for the "missing" cases, making the table larger.  The Ritchie
compiler will generate a branch table if the number of entries in the
branch table would be less than three times the number of cases.  If
the cases are mostly consecutive with a few outliers, a branch table
could be augmented with a few specific tests for the outliers, but I
don't know of any compiler that does this.

There are two general methods for implementing tables which do not rely
on the keys being nearly consecutive:  hashing and the binary search.
Hashing, which is what the Ritchie compiler uses, has the advantage that
its performance is independent of the number of cases, just as with a
branch table.  The modulo instruction is used as the hash function; the
C compiler tries a variety of values for the modulus and selects the
best one which means that the time required to generate the hash table
is proportional to the square of the number of cases.

The UN*X VAX compiler, on the other hand, uses a binary search.  A
binary search executes in time proportional to the log of the number of
cases.  This makes it sound worse that the hash search; I presume
someone discovered that most switch statements contained few enough
statements that the binary search was superior.  The binary search works
best on a machine with a three way branch like that provided by the
Fortran arithmentic IF statement.  On a machine with condition codes
like the VAX, each step of the binary search is implemented by a compare
instruction followed by two branches on the condition codes.

Several things may be done to improve the performance of the binary
search.  First, many machines execute a conditional branch instruction
faster when no branch is taken, so that rather than testing the middle
value in the table one can test a value somewhat to one side in order to
decrease the chance that a branch will be taken.  This is implemented in
the VAX compiler, but some other possible improvements are not.  The
binary search could be abandoned in favor of the linear search at some
level (e. g. when the number of possibilities has been reduced to 2 or
3).  If there are a series of consecutive cases, this can be used to
eliminate extra tests.  (For example, if a number is known to be less
than 5 it cannot be greater than 4, and if a number is greater than 2 and
less than 4 then it must be equal to 3, but the VAX compiler will code
to test for both these cases.)  Finally, each comparison generated by
the VAX compiler looks like:

	cmpl	r0,$7		# compare switch value with 7
	beql	L1		# if equal, branch to the code for case 7
	bgtr	L8		# if greater, branch to L8 for the next cmpl.
	# The next cmpl instruction goes here.

Since the branch to L8 is more likely to be executed that the branch to
L1, the bgtr instruction should come first to minimize the number of
instructions executed.

For small cases, the good old linear search is used by both the Ritchie
and the VAX compilers.  Several things are done to speed up the linear
search.  First, the switch expression is copied into a register.  This
helps in general but loses when the switch instruction is a register
variable or the switch contains only one case.  (This is hard to fix
because the compilers generate the code for a switch in two pieces;
first they generate code to place the value to be switched on in r0, and
then they generate the code to implement the switch.  Another problem
with this separation appears on the PDP-11, where it may be easier to
compute an expression in r1 than in r0 due to a brain damaged multiply
instruction.)  Second, the compiler implements the search as a series of
compares followed by branch if equal instructions.  This minimizes the
time required to reach the default case because in the default case none
of the branches will be taken.  In contrast, the code at the top of this
article will probably be compiled into a series of branch if not equal
instructions, all of which will branch if there is no match.

The Ritchie compiler at one time actually generated a table of values
and labels, and did a linear search on it using a loop.  This saves
space but takes more time.  Furthermore, it doesn't even save space if
there are 1 or 2 cases.  (The Ritchie compiler has special handling of
switch statements with no cases except the default.)  The Ritchie
compiler was changed to generate linear searches using a series of
compare instructions quite a long time ago, but I don't know if Dennis
was responsible for the change.

I don't know why the VAX compiler uses with a linear search at all,
rather than using the binary search.  Probably this was a holdover from
the Ritchie compiler, which must be prepared to generate a linear search
because a linear search will outperform a hash search on a few elements.

In general, then, a compiler should do pretty well by using a jump table
where possible and a binary search otherwise.  What is actually optimum
must be determined by measuring various approaches on a specific machine.
				Kenneth Almquist
				ihnp4!houxm!hropus!ka	(official name)
				ihnp4!opus!ka		(shorter path)