From billa@entropy.sps.mot.com Tue Mar 29 09:41:29 PST 1994
Article: 4332 of comp.graphics.algorithms
Newsgroups: comp.graphics.algorithms
Path: kpc!news.kpc.com!sgiblab!swrinde!emory!sol.ctr.columbia.edu!howland.reston.ans.net!cs.utexas.edu!uunet!spsgate!mogate!newsgate!billa
From: billa@entropy.sps.mot.com (William C. Archibald)
Subject: Re: looking for reference
Message-ID: <1994Mar28.233939.29596@newsgate.sps.mot.com>
Sender: news@newsgate.sps.mot.com
Nntp-Posting-Host: 223.12.249.25
Organization: Motorola AMCU Division, Austin, Texas.
References:
Date: Mon, 28 Mar 1994 23:39:39 GMT
Lines: 170
td@tempel.research.att.com (Tom Duff <418821922> 0112730) responds:
> hbaker@netcom.com (Henry G. Baker @ nil) asks:
> >I remember seeing a very clever iterative algorithm for computing
> >Euclidean distances on the plane by Bentley??? in about 1985. It
> >converges cubically in the number of iterations -- i.e., faster than
> >Newton's method for the square root. Does anyone know this reference?
> >It's driving me crazy trying to find it. Thanks in advance.
> The paper in question is
> Cleve Moler and Donald Morrison,
> ``Replacing Square Roots by Pythagorean Sums,''
> IBM Journal of Research and Development,
> Vol. 27, Number 6,
> pp. 577-581,
> Nov. 1983
> [...]
Here's a reposting of the tail end of this discussion when it went through
comp.graphics. Ken Turkowski originally cited Tom Duff's SIGGRAPH '84 tutorial
on Numerical Methods for Computer Graphics (which I would love to get
a copy of...) as his source for the MolerMorrison algorithm. BTW, it also
works well in fixedpoint.
--
w. archibald

snip, snip, snip
From: turk@Apple.COM (Ken "Turk" Turkowski)
Newsgroups: comp.graphics,sci.math
Subject: Re: Iterative square root function?
Message-ID: <4331@internal.Apple.COM>
Date: 25 Sep 89 05:23:18 GMT
In article <1989Sep25.022915.5886@sun.soe.clarkson.edu> stadnism@image.soe!clutx.clarkson.edu writes:
>In the Ray Tracing News, vol. 2 # 4 (I think), someone mentioned a cubically
>convergent method for doing Pythagorean sums and square roots. Does anyone
>have info on this method, and any methods for computing quick trig and inverse
>trig functions? It would be much appreciated.
Here's a reposting of an article of mine of a few years ago:
[Moler-Morrison, "Replacing Square Roots by Pythagorean Sums," IBM
Journal of Research and Development, vol. 27, no. 6, pp. 577-581, Nov.
1983] describes an algorithm for computing sqrt(a^2+b^2) which is fast,
robust and portable. The naive approach to this problem (just writing
sqrt(a*a+b*b) ) has the problem that for many cases when the result is
a representable floating point number, the intermediate results may
overflow or underflow. This seriously restricts the usable range of a
and b. Moler and Morrison's method never causes harmful overflow or
underflow when the result is in range. The algorithm has cubic
convergence, and depending on implementation details may be even faster
than sqrt(a*a+b*b). Here is a C function that implements their
algorithm:
double hypot(p,q)
double p,q;
{
    double r, s;
    if (p < 0) p = -p;
    if (q < 0) q = -q;
    if (p < q) { r = p; p = q; q = r; }
    if (p == 0) return 0;
    for (;;) {
        r = q / p;
        r *= r;
        s = r + 4;
        if (s == 4)         /* q is now negligible relative to p */
            return p;
        r /= s;
        p += 2 * r * p;
        q *= r;
    }
}

And a relevant response:
> From decwrl!decvax!dartvax!chuck Mon Nov 19 23:15:02 1984
> Subject: Pythagorean Sums
>
>
> Ken 
>
> I recently worked on our assembly language routines which compute
> the square root and absolute value of a complex number. Thus, your
> posting of this article caught my attention. I went so far as to
> visit our library and read the paper by Moler and Morrison. I then
> reread the code that we use to implement our routines in hopes of
> improving them.
>
> In their paper, Moler and Morrison write "The [algorithm] is
> robust, portable, short, and, we think, elegant. It is also
> potentially faster than a square root." Unfortunately, I've come
> to the conclusion that our existing algorithm is about as robust,
> and that it is shorter, faster, and more elegant. (I'm not sure about
> portability.)
>
> I thought I'd send this letter to you, hoping that if you got bored
> one afternoon, you could point out any flaws in my thinking. And I
> thought that there might be a small chance you would be interested
> in a comparison of the Moler-Morrison algorithm with another algorithm.
>
> The algorithm that we use is:
>
> double hypot(p,q)
> double p,q;
> {
>     double r;
>     /* take |p|, |q| first, so the early returns are non-negative */
>     if (p < 0) p = -p;
>     if (q < 0) q = -q;
>     if (p == 0) return (q);
>     if (q == 0) return (p);
>     if (p < q) { r = p; p = q; q = r; }
>     r = q / p;
>     return p * sqrt(1 + r * r);
> }
>
> Moler and Morrison claim two advantages: their algorithm does
> not overflow very easily and it converges rapidly. However, depending
> on just how "sqrt" is implemented, our algorithm will not overflow
> very easily, either. This brings us to speed considerations. The
> standard method for computing the square root is the Newton-Raphson
> method, which converges quadratically. The Moler-Morrison
> algorithm converges cubically. However, the amount of work needed for
> one iteration of the M&M algorithm will allow us to perform two iterations
> of the Newton-Raphson method.
>
> Let me unroll these two methods side by side so we can see what is
> involved in computing each. We assume that the arguments 'p' and 'q'
> have been normalized so that 0 < q <= p. (Pardon my pseudocode.)
>
> Newton_Raphson(p,q)                  Moler_Morrison(p,q)
>     r = q/p;                             r = q/p;
>     r *= r;                              r *= r;
>     r += 1;
>
>     /* linear approx. to sqrt(r)
>        using magic constants 'c1' and
>        'c2'.  Note that 1 <= r <= 2.
>        See "Computer Evaluation of
>        Mathematical Functions"
>        by C.T. Fike. */
>
>     g = r * c1 + c2;    /* 7 bits */     s = r / (4 + r);
>     g = (g + r/g) * .5; /* 15 bits */    p = p + 2 * s * p;
>
>     /* Here NR has about 5 digits of accuracy, MM has about 6,
>        and NR has performed an extra addition. */
>
>     g = (g + r/g) * .5; /* 30 bits */    q = s * q;
>     g = (g + r/g) * .5; /* 60 bits */    r = q / p;
>                                          r = r * r;
>                                          s = r / (4 + r);
>                                          p = p + 2 * s * p;
>
>     /* Now, both algorithms have about 20 digits of accuracy.
>        Note that (depending on how one counts) MM uses from 2
>        to 3 more floating-point multiplications than NR on each
>        iteration.  NR used an extra addition to set up for the
>        iterations and will need an extra multiply to return its
>        value.  On each further iteration, the number of significant
>        digits will quadruple for the NR method and only triple
>        for the MM method. */
>
>     return p * g;                        return p;