Integer Multiply/Divide on Sparc
Robert D. Silverman
bs at linus.UUCP
Sat Dec 30 00:18:05 AEST 1989
In article <15418 at vlsisj.VLSI.COM> davidc at vlsisj.UUCP (David Chapman) writes:
:In article <84768 at linus.UUCP> bs at linus.mitre.org (Robert D. Silverman) writes:
:>Does anyone have, or know of, software for the SPARC [SUN-4] that will
:>perform the following:
:>
:> [standard multiply and divide]
:>
:>The SPARC is brain dead [as were its designers] when it comes to doing
:>integer arithmetic. It can't multiply and it can't divide.
:
:There should be instructions on the order of "multiply step" and "divide
:step", each of which will do one of the 32 adds/subtracts and then shift.
There is a multiply step instruction. There is no such support for division.
It can take 200+ cycles to do a division on the SPARC [worst case].
A 32 x 32 bit unsigned multiply takes 45-47 cycles. Programs that have a
significant number of multiplies and divides can run SLOWER on a SPARC
than on a SUN-3, ONLY because of the slow multiply/divides. [I have such
programs!]
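For readers who have not seen a step-at-a-time multiply, here is a rough C
sketch of the shift-and-add idea that a multiply-step instruction supports.
It is only an illustration: the function name mul32x32 is made up for this
note, and the code does not model the actual SPARC instruction or its cycle
counts.

```c
#include <stdint.h>

/* Shift-and-add multiply: one conditional add and one shift per bit of
   the multiplier, 32 steps in total. Hypothetical helper for illustration;
   it is not the SPARC multiply-step instruction itself. */
uint64_t mul32x32(uint32_t a, uint32_t b)
{
    uint64_t product = 0;
    uint64_t addend = a;            /* multiplicand, shifted left each step */

    for (int step = 0; step < 32; step++) {
        if (b & 1u)                 /* low multiplier bit set: add */
            product += addend;
        addend <<= 1;               /* weight of the next multiplier bit */
        b >>= 1;                    /* expose the next multiplier bit */
    }
    return product;                 /* full 64-bit result */
}
```

Each pass through the loop corresponds to one of the 32 add-and-shift steps
mentioned in the quoted text.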
:I'm not particularly fond of the SPARC architecture (don't like register
:windows), but this is a theoretical viewpoint and is not based on any
:direct exposure to assembly-language programming for it (translation:
:sorry, I can't give you any more help).
:
:Neither SPARC nor its designers were brain-dead when it was built. It's just
I didn't say they were. I said they were with respect to arithmetic. I stand
by that assertion. Most programs may not need multiply/divide in hardware.
However, for those that do require it, not having it is a real KILLER
of algorithms.
:that it is difficult to get multiplication and division (especially the
:latter) to run in 1 or 2 clock cycles. All instructions are supposed to
I know of quite a few DSP chips that do multiplies in 1 cycle. Divides
take a little longer [but not much; Ernie Brickell of SANDIA invented a
hardware divide that works much faster than standard conditional-shift/
subtract].
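As a point of reference, the "standard conditional-shift/subtract" divide
can be sketched in C roughly as follows. This is the textbook restoring
division, one quotient bit per step; it is not the faster hardware method
mentioned above, and the name div32 is invented for this note.

```c
#include <stdint.h>

/* Restoring (conditional shift/subtract) division, one quotient bit per
   step. Hypothetical helper for illustration. The caller must make sure
   d is nonzero. */
uint32_t div32(uint32_t n, uint32_t d, uint32_t *rem)
{
    uint32_t q = 0;
    uint64_t r = 0;     /* 64 bits so the shifted remainder cannot overflow */

    for (int i = 31; i >= 0; i--) {
        r = (r << 1) | ((n >> i) & 1u);   /* bring down the next dividend bit */
        if (r >= d) {                     /* conditional subtract */
            r -= d;
            q |= (uint32_t)1 << i;        /* set this quotient bit */
        }
    }
    *rem = (uint32_t)r;
    return q;
}
```

Every one of the 32 steps needs a shift, a compare, and possibly a subtract,
which gives some feel for why a software divide built this way is costly.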
:execute in the ALU in 1 cycle; if the multiply and divide instructions take
:more time then the front of the processor pipeline has to be able to stall
:and this added complexity will slow down the entire processor.
:
:Thus they provide you with the tools to do your own multiply and divide.
See above. They are too slow.
:One of the benefits is that a compiler can optimize small multiplies and
:divides to make them execute quicker (i.e. multiply by 10 takes 4 steps
That's fine for multiply-by-constant. Most programs that NEED multiply/divide
are multiplying variables.
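To illustrate the multiply-by-constant case: a compiler can replace a
multiply by 10 with a couple of shifts and an add, as in the sketch below.
The quoted text is cut off before it says which steps it had in mind, so
this decomposition (10x = 8x + 2x) is just one plausible choice, and
times10 is a made-up name.

```c
#include <stdint.h>

/* Multiply by the constant 10 using shifts and one add:
   10*x = 8*x + 2*x. Illustrative only; results wrap modulo 2^32. */
uint32_t times10(uint32_t x)
{
    return (x << 3) + (x << 1);
}
```

Nothing comparable is available when both operands are variables, which is
the case at issue here.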
:P.S. Don't write a loop on the order of "MULSTEP, DEC, BNZ" or it will be
: incredibly slow. Unroll the loop 4 or 8 times (MULSTEP, MULSTEP,
: MULSTEP, MULSTEP, SUB 4, BNZ). Branches are expensive.
Agreed. In fact my 32 x 32 bit multiply consists of 32 calls to multstep
and no looping at all. It is still slow.
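The unrolling advice can be sketched in C as follows, with four steps per
loop iteration so the branch is taken 8 times instead of 32. This is an
illustration under assumptions: MUL_STEP and mul32x32_unrolled4 are names
invented for this note, and the macro stands in for a multiply-step
instruction without modelling it exactly.

```c
#include <stdint.h>

/* One shift-and-add step: conditionally add, then shift both operands.
   Hypothetical macro, standing in for a multiply-step instruction. */
#define MUL_STEP(acc, addend, m)            \
    do {                                    \
        if ((m) & 1u) (acc) += (addend);    \
        (addend) <<= 1;                     \
        (m) >>= 1;                          \
    } while (0)

/* 32 x 32 -> 64 bit unsigned multiply, unrolled four steps per iteration. */
uint64_t mul32x32_unrolled4(uint32_t a, uint32_t b)
{
    uint64_t acc = 0;
    uint64_t addend = a;

    for (int i = 0; i < 8; i++) {   /* 8 iterations x 4 steps = 32 steps */
        MUL_STEP(acc, addend, b);
        MUL_STEP(acc, addend, b);
        MUL_STEP(acc, addend, b);
        MUL_STEP(acc, addend, b);
    }
    return acc;
}
```

Writing out all 32 steps with no loop, as described above, removes the
remaining branches entirely at the cost of code size.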
--
Bob Silverman
#include <std.disclaimer>
Internet: bs at linus.mitre.org; UUCP: {decvax,philabs}!linus!bs
Mitre Corporation, Bedford, MA 01730