function calls
Michael Meissner
meissner at osf.org
Sat Mar 17 01:32:26 AEST 1990
In article <14272 at lambda.UUCP> jlg at lambda.UUCP (Jim Giles) writes:
| Careful register-allocation conventions are usually the ones that use the
| registers most greedily. This is because, in general, the more you use
| the registers, the faster your code goes. If your "careful" approach to
| registers is not greedy, how much performance am I losing to it by not
| getting full use out of the hardware available? Further: what's "an
| adequate supply of registers"? I know code which can use up as many as
| you give me. In fact, if interprocedural analysis were commonplace
| instead of rare, you would probably find that the register set was almost
| always completely full (this wouldn't matter since it would be a logical
| consequence of register allocation being done on the program instead of
| a routine at a time).
|
| These problems may all be solved in the future - even the very near future.
| But at present, only MIPS and Cray (the only ones mentioned anyway) have
| addressed this problem. And these two 'solutions' rely on the 'callee'
| not using lots of registers and the 'caller' deliberately leaving some
| spare ones - but this, in itself, may have a negative impact on performance.
Umm, the 88k also partitions the register set into caller save and
callee save. For the two machines that I'm familiar with, the
breakdown is as follows:
MIPS:   32 integer 32-bit registers
             1 register hardwired to 0
             2 registers used for return value and/or static link
             4 registers used for arguments
            10 registers not preserved across calls
             9 registers preserved across calls
             1 register for stack pointer
             1 register for return address
             1 register for accessing small static/global data
             1 register used by the assembler
             2 registers used by the kernel

        16 floating point 64-bit registers
             2 registers for return value
             2 registers for arguments
             6 registers not preserved across calls
             6 registers preserved across calls

88K:    32 32-bit registers (double precision takes 2 regs)
             1 register hardwired to 0
             8 registers for arguments & return value
             4 registers not preserved across calls
            13 registers preserved across calls
             4 registers reserved for linker/OS
             1 register for the stack pointer
             1 register for the return address
Note that the registers used for passing arguments and returning values
are also used as temps. If the return address is stored on the stack,
the register that holds it can also be used as a temporary. Neither
architecture requires the use of a frame pointer, though one can be
synthesized easily when variable-sized stack allocations make it
necessary. Finally, both machines' software defines static tables
that describe where registers are stored on the stack, and which
register and offset from that register are to be used as a virtual
frame pointer by the library and the debugger.
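
As a quick sanity check on the two breakdowns, here is a small C sketch that encodes the integer register lists above and adds them up. The struct, the table names, and the `scratch` flag are invented for illustration; marking the argument, return-value, and not-preserved rows as scratch is my reading of the note that those registers double as temps, not something either ABI document is quoted as saying.

```c
#include <stddef.h>

/* One row of a register-usage table. `scratch` is nonzero when this sketch
   counts the row as a temporary that need not survive a call (an
   assumption: argument, return-value, and not-preserved registers). */
struct reg_class {
    int count;
    int scratch;
    const char *role;
};

/* MIPS integer registers, copied from the list in the post. */
static const struct reg_class mips_int[] = {
    {  1, 0, "hardwired to 0" },
    {  2, 1, "return value and/or static link" },
    {  4, 1, "arguments" },
    { 10, 1, "not preserved across calls" },
    {  9, 0, "preserved across calls" },
    {  1, 0, "stack pointer" },
    {  1, 0, "return address" },
    {  1, 0, "small static/global data" },
    {  1, 0, "used by the assembler" },
    {  2, 0, "used by the kernel" },
};

/* 88K registers, copied from the list in the post. */
static const struct reg_class m88k_int[] = {
    {  1, 0, "hardwired to 0" },
    {  8, 1, "arguments and return value" },
    {  4, 1, "not preserved across calls" },
    { 13, 0, "preserved across calls" },
    {  4, 0, "reserved for linker/OS" },
    {  1, 0, "stack pointer" },
    {  1, 0, "return address" },
};

#define NELEMS(a) (sizeof(a) / sizeof((a)[0]))

/* Sum the counts in a table; if scratch_only is set, count scratch rows only. */
static int sum_regs(const struct reg_class *t, size_t n, int scratch_only)
{
    int total = 0;
    size_t i;

    for (i = 0; i < n; i++)
        if (!scratch_only || t[i].scratch)
            total += t[i].count;
    return total;
}

int mips_total(void)   { return sum_regs(mips_int, NELEMS(mips_int), 0); }
int mips_scratch(void) { return sum_regs(mips_int, NELEMS(mips_int), 1); }
int m88k_total(void)   { return sum_regs(m88k_int, NELEMS(m88k_int), 0); }
int m88k_scratch(void) { return sum_regs(m88k_int, NELEMS(m88k_int), 1); }
```

Both totals come to 32, so the lists account for every register. Under the scratch assumption above, MIPS has 16 integer temporaries against 9 preserved registers, and the 88K has 12 against 13; each scratch count goes up by one when the return address is saved on the stack.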
The MIPS compilers also have a -O3 option which does global register
allocation. Here is a fragment of the man page from a DECstation:

    -O3             Perform all optimizations, including global
                    register allocation. This option must
                    precede all source file arguments. With this
                    option, a ucode object file is created for
                    each C source file and left in a .u file.
                    The newly created ucode object files, the
                    ucode object files specified on the command
                    line, the runtime startup routine, and all
                    the runtime libraries are ucode linked.
                    Optimization is performed on the resulting
                    ucode linked file and then it is linked as
                    normal producing an a.out file. A resulting
                    .o file is not left from the ucode linked
                    result. In fact -c cannot be specified with
                    -O3.

    -feedback file  Use with the -cord option to specify the
                    feedback file. This file is produced by with
                    its -feedback option from an execution of the
                    program produced by

    -cord           Run the procedure-rearranger on the resulting
                    file after linking. The rearrangement is
                    performed to reduce the cache conflicts of
                    the program's text. The output is left in
                    the file specified by the -o output option or
                    a.out by default. At least one -feedback
                    file must be specified.

Because -c cannot be specified with -O3, I'm not sure how many
people use this in practice for large software. I would imagine that
for programs which use runtime binding (e.g., emacs, or C++ code
with virtual functions), it would fall back to the standard calling
sequence. I wonder how much it buys for typical software as opposed
to special cases.
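
To make the runtime-binding concern concrete, here is a minimal C sketch of a call through a table of function pointers. All of the names are hypothetical. The point is only that the callee is chosen at run time, so a whole-program register allocator cannot tailor the register usage of the call in `dispatch` to one particular callee; it says nothing about what the MIPS compilers actually do in this case.

```c
/* Type of the handlers that can be bound at run time. */
typedef int (*handler_fn)(int);

static int add_one(int x) { return x + 1; }
static int twice(int x)   { return 2 * x; }

/* Dispatch table, similar in spirit to a C++ virtual function table. */
static const handler_fn handlers[] = { add_one, twice };

/* The callee is not known until `which` is known, so every handler has to
   agree on a single calling sequence for this call site to work. */
int dispatch(unsigned which, int x)
{
    return handlers[which % 2u](x);
}
```

Here `dispatch(0, 5)` reaches `add_one` and `dispatch(1, 5)` reaches `twice` through the same call instruction, which is why I would expect a standard register convention to be needed at that site.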
--
Michael Meissner email: meissner at osf.org phone: 617-621-8861
Open Software Foundation, 11 Cambridge Center, Cambridge, MA
Catproof is an oxymoron, Childproof is nearly so