C Floating point arithmetic

Fri Dec 6 12:46:21 AEST 1985

> >> Any intelligent scientific programmer tries it both ways
> >> on his own and determines whether single precision is adequate
> >> (without the help of a numerical analyst, an analyst is consulted
> >> when one wants to understand the root of the problem in an attempt
> >> to rearrange the computation so that single precision is sufficient
> >> should it fail).  .... more stuff, largely irrelevant .....
> 
> And suppose that the numerical analyst says that double precision
> isn't enough precision regardless of the ordering of the computation;
> what do you do then?
> 
> Check out:
> 
> Linnainmaa, Seppo
> "Software for Doubled-Precision Floating-Point Computations",
> ACM Transactions on Mathematical Software,
> Vol. 7, No. 3, Sept 1981, pp. 272-283
> 
> -- 
> Ken Turkowski @ CIMLINC, Menlo Park, CA
> UUCP: {amd,decwrl,hplabs,seismo,spar}!turtlevax!ken
> ARPA: turtlevax!ken at DECWRL.DEC.COM

Double, triple, more? I routinely do computations that need hundreds of
digits of precision, e.g. in computational number theory and cryptography.
For such applications floating point is virtually useless on most machines
(and I've worked on a LOT of different ones). To do efficient multi-precision
arithmetic for numbers in this size range you can't do much better than
n^2 algorithms for multiplication and division. Floating point numbers
would spend too many bits of each word of a multi-precision number on the
exponent. Until the numbers get really large, i.e. thousands of digits, the
Schonhage-Strassen FFT-based multiply algorithm has too high an overhead.
Also for numbers in that range it is faster to do division by finding an
accurate inverse via Newton's method and then doing a multiply. Unfortunately
quite a few machines today will not multiply 2 full words together giving
a double length product, nor will they divide such a product by a full word
yielding a full word quotient and remainder. The 68010 is such a processor.
For multi-precision calculations you must therefore restrict your radix
to half the size of a full word, which slows your application down by about
a factor of 4: an n^2 algorithm then has twice as many digits to process.
Give me a 128 bit machine with double length registers!!!!!
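
To make the half-word restriction concrete, here is a minimal sketch of the
schoolbook n^2 multiply, assuming a 32-bit full word and hence radix 2^16
digits. The name mp_mul and its argument layout (least significant digit
first) are hypothetical, chosen only for illustration; the fixed-width types
are a modern C convenience, not something a compiler of this era provides.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical helper (illustration only): r = a * b by the schoolbook
 * O(na * nb) method. Digits are radix 2^16, least significant first.
 * r must have room for na + nb digits and must not overlap a or b.
 */
void mp_mul(uint16_t *r, const uint16_t *a, size_t na,
            const uint16_t *b, size_t nb)
{
    size_t i, j;

    assert(r != a && r != b);   /* output must not alias the inputs */

    for (i = 0; i < na + nb; i++)
        r[i] = 0;

    for (i = 0; i < na; i++) {
        uint32_t carry = 0;
        for (j = 0; j < nb; j++) {
            /*
             * A half-word times a half-word fits in one full word, even
             * with the two half-word addends:
             * (2^16 - 1)^2 + 2 * (2^16 - 1) = 2^32 - 1.
             */
            uint32_t t = (uint32_t)a[i] * b[j] + r[i + j] + carry;
            r[i + j] = (uint16_t)(t & 0xFFFFu);   /* low half: digit */
            carry = t >> 16;                      /* high half: carry */
        }
        r[i + nb] = (uint16_t)carry;   /* this digit is still zero here */
    }
}
```

For example, squaring 0xFFFFFFFF (digits {0xFFFF, 0xFFFF}) with
mp_mul(r, a, 2, a, 2) leaves {0x0001, 0x0000, 0xFFFE, 0xFFFF} in r, i.e.
0xFFFFFFFE00000001. With a true full-word multiply giving a double length
product, the same operands would be one digit each and the inner loop body
would run once instead of four times.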

Bob Silverman