0
101 Aug 01, 2008 at 21:53

I need a fast way to compute sqrt(x*x + y*y + z*z) that avoids intermediate overflow and underflow.

I know that most runtime libraries (C,C++,Java) provide a 2d hypot() function that does this for 2 variables, _AND_ I realize that I can achieve my result using 2 calls to it as follows:

hypot( hypot(x, y), z)

but that seems uneccessarily slow. Is there a faster way that still avoids intermediate overflow and underflow? I am _NOT_ looking for a fast sqrt() function here. I am well aware of the _NUMEROUS_ threads on fast sqrt() and fast reciprocal sqrt(). What I am looking for is a way to scale the 3 inputs before calling <any fast sqrt method here>, and then to un-scale afterwards to obtain the accurate result.

So I started looking at how to implement a 2d hypot() function:

Define hypot(x, y) as sqrt(x*x + y*y) but that avoids intermediate overflow and underflow:
Using the fact that sqrt(x\^2+y\^2) == k*sqrt((x/k)\^2+(y/k)\^2) for k>0
you can implement a fast hypot() by choosing k to be a power of 2 close to sqrt(x*y). Since you choose k to be a power of 2, computing k, x/k, and y/k can be done using bit twiddling and knowledge of IEEE floating point representation. I can go into that further, but my problem is how to generalize this for 3 components x,y,z without simply nesting the call as hypot(hypot(x,y),z). Any suggestions?

#### 8 Replies

0
167 Aug 01, 2008 at 21:58

Since sqrt(x*y) is the geometric mean of x and y, I’m guessing for 3D you want the geometric mean of x, y, and z, which would be the cube root of x*y*z. But I suspect that’s not as easy to estimate by bit-twiddling as sqrt(x*y) is…

0
101 Aug 01, 2008 at 22:07

Thanks for the reply; I see that my reference to hypot() causes confusion but actually all I want is the magnitude of a 3d vector that avoids overflow and underflow that one would normally get in the interim calculation of

sqrt(x*x + y*y + z*z).

{Square, not cube root in my case}.

0
101 Aug 01, 2008 at 22:08

Sorry now I understand. You’re saying cubeRoot(xyz) for the moderating factor. I see.

0
101 Aug 02, 2008 at 09:05

Hi.

Take a look at this paper:

http://torus.untergrund.net/misc/distance_by_pythagorean_sums.pdf

It shows how to calculate the vector length in a non obvious but mathematical much more stable way. It’s not *that* fast though.

Don’t bookmark the pdf please. I’ll take it offline in a couple of days..

Cheers,
Nils

0
101 Aug 08, 2008 at 19:55

Can you not extend your original idea?
sqrt(x\^2+y\^2+z\^2) == k*sqrt((x/k)\^2+(y/k)\^2+(z/k)\^2) for k>0
Assuming you want the geometric mean (as suggested by Reedbeta) you want k to approximate (xyz)\^(1/3) as a power of 2 so

2^n =  (xyz)^(1/3)
2^3n = xyz
n = (ln xyz) / 3(ln2)


Urgh maybe its not the way to go, but only one ln needs to be calculated. I don’t know of any useful approximations to ln x, but it would only have to be loosely accurate
After that, x/k, y/k and z/k can be done again with bit shifts.

0
101 Aug 09, 2008 at 00:31

(Thanks to Nils for the fascinating paper).

Yes I believe that is the idea. If you assume IEEE format for the floats you can do some bit-fiddling to approximate a log_2 function by extracting the exponent bits.

Something like this might work for positive x:

floor(log_2(float x)) is something like: (floatToIntBits(x) >> 23) - 127

where in C you might implement floatToIntBits(x) as: *(int*)&x
or (preferably?) use an anonymous union:
union {
int asInt;
float asFloat;
} u;
u.asFloat = x;
return (u.asInt >> 23) - 127;

0
101 Aug 09, 2008 at 01:52

So I think you want this. The 2 float/integer conversions are probably the slowest bits.

float get_k(const float &x, const float &y, const float &z)
{
union
{
float mult;
int bits;
};
mult = x*y*z;
const int exp = (((*reinterpret_cast<int*>(&mult)) >> 23) & 255) - 128;
bits = (bits & ~(255 << 23)) + 127 << 23;
mult = (((-1.0f/3.0f) * mult + 2.0f) * mult - 2.0f/3.0f + static_cast<float>(exp))* 0.69314718f; // mult = ln(xyz)
mult *= 0.48089834; // mult = (ln xyz) / 3(ln2) = n

// k = 2^n = 2^mult
return (1<<static_cast<int>(mult));
}

0
101 Aug 09, 2008 at 18:02

for(long i = 0; i < 500000; i++){
ax = (float)rand() + 1 / 100.0f;
ay = (float)rand() + 1 / 100.0f;
az = (float)rand() + 1 / 100.0f;
bx = (float)rand() + 1 / 100.0f;
by = (float)rand() + 1 / 100.0f;
bz = (float)rand() + 1 / 100.0f;

dx = ax - bx;
dy = ay - by;
dz = az - bz;

dist = sqrt(dx + dy + dz);
}

takes 0.07 seconds on a core2 duo 2.14ghz - of course, that’s all I was doing….