What I expect from a C optimizer
Stuart Gathman
stuart at bms-at.UUCP
Thu Jan 12 07:42:45 AEST 1989
1) Jump optimization - easy for machines, hard for humans and machine
dependent to boot, almost any decent compiler does it. This
includes choosing optimal 'switch' strategies.
2) Common subexpression elimination. This includes pointer expressions
and should apply to an entire function, not just one statement.
All variables used should be divided into two classes, 'alias'
and 'noalias', by one of three methods (in increasing order of expense):
a) 'register' variables are noalias.
b) 'register' variables and 'auto' variables whose address is
never taken within the function are noalias.
c) 'register' variables, 'auto' variables whose address is never
taken within the function, and 'static' variables whose
address is never taken within the module are noalias.
For structured variables, taking the address of any member at any
level cancels noalias status for the entire structure.
(One can see why compiler makers wanted the 'noalias' keyword.
One can also see why it would be dangerous.)
Subexpression trees are not considered identical when a member
of the tree is 'alias' and there is a function call or modify
indirect operation seperating the references. When expression
rearrangement allows, the offending operation should be moved.
Of course, modification of any variable member of the tree also
renders them non-identical. Some compilers may be able to detect
identities (e.g. *p++, *--p), but this is not required. The
key thing is that a->b->c->d used twice will not follow the pointer
chain each time.
CSE elimination should not be confined to values in registers.
The compiler should allocate and compute temporaries where
required.
3) Constant reduction. This is specified as a requirement in
K&R, but many compilers don't do it! Gcc does it in the
parse phase. In particular this includes:
if (0) { ... }
4) Register optimization: let CSE determine register usage.
There is no reason to assign a fixed register to a variable.
Registers that are never flushed do not need memory allocated.
Register flushes never followed by a reload should be eliminated.
Flush variables declared 'register' as a last resort.
Flushing strategy can range from a simple LRU over sequential
program flow to attempts to guess frequency of usage. The
latter should still give 'register' priority because the
compiler doesn't know usage context (apart from profiler
feedback used by some systems - but it is still unknown the
first time it is compiled, and the profile results from every
potential application would have to be combined to do library
routines properly).
LRU over program flow with 'register' priority would suit me
just fine.
5) Loop unrolling: this is OK as long as code size is not increased
significantly. If I want speed over size, I'd rather unroll it myself.
GCC does the important things in several steps:
1) constant reduction is done while parsing.
output: tree representation of program with all variables
as memory references. Gcc calls this form 'C-tree'
2) translate to program using infinite registers. Do CSE and
assign 1 register to each unique sub expression. GCC calls
this form 'RTL'.
3) assign virtual to real registers with appropriate flushes and
reloads from memory. This is still in RTL form.
4) translate to actual machine instructions with peephole
optimizations. This is programmable using a machine
description in Gcc, the other steps are portable.
5) 4 can be done before 3
Apparently, CSE is the hardest part since very few compilers do a decent
job of it.
--
Stuart D. Gathman <stuart at bms-at.uucp>
<..!{vrdxhq|daitc}!bms-at!stuart>
More information about the Comp.lang.c
mailing list